Quick Facts
Ages: 14–17
Grades: Rising ninth graders through rising 12th graders
Dates: June 23–27
Time: 9 a.m.–noon
Location: Zoom
Format: Virtual
Duration: One week
Cost: $350
Topic Area
Data Science
Program Description
This online program is designed for high school students who are interested in big data and analytics. Through interactive projects in data analysis and visualization, students will explore essential data science concepts with hands-on guidance. This program combines practical skill-building with analytical thinking to introduce participants to the expanding field of data science in an engaging and accessible way.
Sample Schedule
- Day 1: Introduction to Big Data and Data Science
  - What is big data? What is data science? Why do they matter?
  - Characteristics of big data
  - The data science lifecycle
  - Real-world examples from different fields
  - Introduction to R, Python, and Jupyter Notebook for big data analysis
- Day 2: Data Exploration, Preprocessing, and Visualization
  - Data structures
  - Handling missing values, outliers, and categorical data
  - Descriptive statistics
  - Data visualization
- Day 3: Data Analysis with Statistics and Machine Learning
  - Supervised and unsupervised methods, including regression, classification, and clustering
  - Statistical inference
- Day 4: Data Analysis (Continued), Model Evaluation, and Storytelling
- Day 5: Big Data Processing and Capstone Presentations
  - Brief introduction to databases using SQL and to big data frameworks such as Hadoop and Spark
  - Capstone project presentations and reflection
Example Activities
- Identify a problem of interest that each student wants to explore:
  - Because students come from different backgrounds and have different interests, the instructor will ask each student to name a topic they would like to learn more about
  - Students will consider how they might answer the questions they raise
- Real data analysis and storytelling, step by step:
  - Students will choose a real-world data set (e.g., COVID-19 trends, stock market, traffic) from sources such as Kaggle or the Chicago Data Portal
  - Students will generate a summary of the data that they think will be useful for their own projects
  - Students will formulate a question they can try to answer with their data
  - Students will develop a comprehensive story around their data sets and problems of interest
How Students Will Benefit
- Understanding the basics of data science and big data and their real-world applications
- Appreciating the value of data and preparing to carry out basic data analysis
- Understanding core analytics tasks such as exploratory data analysis, classification, prediction, optimization, and recommendation
- Hands-on experience with data science tools and real-world case studies, including basic coding in R and Python
- Exposure to the nature and potential of big data and the tools used to process it
Eligibility and Program Requirements
No prior experience is necessary!