

























This lecture introduces the field of data science, outlining its key components and applications. It distinguishes between data scientists, data engineers, and data analysts, highlighting their roles and responsibilities. The lecture also covers the tools used in data science, including Databricks, Apache Spark, and Python. It provides an overview of Databricks notebooks and their functionality, emphasizing the use of Python for data analysis and visualization. The lecture concludes with a homework assignment: set up a Databricks account, create a notebook, and explore data frames and visualization techniques.
Professor David Harrison
2024 CSCI 443 4
Data Analytics, Visualization
GITHUB
Example files I create during class will be put on GitHub. The project is at https://github.com/dosirrah/CSCI443_25S_AdvancedDataScience
You will need to create a GitHub account independent of your olemiss accounts. GitHub is free for our purposes. I highly recommend committing any code you create to GitHub.
Last year I used the department GitLab. This semester I will only use GitHub.
Community edition is free. It offers a single instance with limited capabilities, but should be adequate for teaching.
Community edition is free. You do not need an AWS or Azure account. You do not need to sign up for the 14-day trial.
https://community.cloud.databricks.com
Once logged in, you should see options to start a notebook and to import data. Ignore “Upgrade now”.
Databricks provides cluster management and a notebook interface (akin to Jupyter) to Apache Spark. Spark unifies: