TechNest
Analyze data with Python
Explore and get curious (2 steps)
Try things, experiment (2 steps)
Go deep, master it (2 steps)
Explore & Discover
Data scientists answer real questions using numbers. Start by exploring some of the most interesting publicly available datasets at kaggle.com/datasets — browse around and find something that surprises you. Utah's own data.utah.gov publishes free datasets on air quality along the Wasatch Front, school enrollment, and traffic patterns. Pick one dataset that makes you curious and look at it in a spreadsheet. What do the columns mean? What story might the numbers tell? Watch "What is Data Science?" by IBM Technology on YouTube for a quick overview of what people in this field actually do every day. You're ready for the next step when you can describe a real dataset you explored, name three variables in it, and write one question you could answer using that data.
Learn the Basics
Python is the most widely used language in data science. Start at kaggle.com/learn, where the free "Python" and "Intro to Programming" courses run entirely in your browser with no installation needed. Complete the first three lessons, which cover variables, data types, and basic math operations. Then move to their "Pandas" course. Pandas is the Python library that lets you load, filter, and sort data tables much like a spreadsheet. Learn how to use df.head(), df.describe(), and value_counts() (usually called on a single column, as in df["column"].value_counts()) to quickly understand any dataset. Khan Academy also has a free "Statistics and Probability" course that pairs well with this. You're ready for the next step when you can write a Python script that loads a CSV file using Pandas, prints the first five rows, and shows basic summary statistics.
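As a rough idea of what that first script might look like, here is a minimal sketch. The CSV text, its column names (city, month, pm25), and every value in it are invented for illustration. With a real file you would pass its path to pd.read_csv instead of the in-memory text used here.

```python
import io

import pandas as pd

# Stand-in for a real CSV file. The columns and values are made up.
csv_text = """city,month,pm25
Salt Lake City,Jan,18.2
Salt Lake City,Feb,12.5
Provo,Jan,15.1
Provo,Feb,9.8
Ogden,Jan,16.4
Ogden,Feb,10.9
"""

# With a real file, pass its path instead: pd.read_csv("my_data.csv")
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())                  # first five rows
print(df.describe())              # count, mean, std, min, quartiles, max (numeric columns)
print(df["city"].value_counts())  # number of rows for each city
```

Calling value_counts() on one column, as above, is usually the most readable way to see how often each category appears.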
Build Your First Project
Build your first data analysis project from start to finish. Go to kaggle.com and find the "Titanic" dataset — it is one of the most famous beginner datasets in data science. Load it using Pandas in a Kaggle notebook (free, runs in the browser). Answer three specific questions using code: What percentage of passengers survived? Which passenger class had the best survival rate? What was the average age of survivors vs. non-survivors? Use Matplotlib (already installed on Kaggle) to make at least one chart showing your findings. Write a short paragraph below your chart explaining what the data shows. You're ready for the next step when you have a working Kaggle notebook that answers all three questions with code, charts, and written explanations.
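One possible way to approach the three questions is sketched below. The column names Survived, Pclass, and Age match Kaggle's Titanic training file, but the eight rows here are invented so the example runs on its own. Treat the numbers it produces as placeholders, not as findings about the real data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented rows that reuse the Titanic column names.
# In a Kaggle notebook you would load the real file with pd.read_csv instead.
df = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 0, 1, 0, 1],
    "Pclass":   [1, 3, 1, 3, 2, 2, 3, 1],
    "Age":      [29.0, 22.0, 35.0, None, 40.0, 18.0, 30.0, 50.0],
})

# Question 1: Survived is 0 or 1, so its mean is the share of survivors.
survival_pct = df["Survived"].mean() * 100

# Question 2: survival rate within each passenger class.
rate_by_class = df.groupby("Pclass")["Survived"].mean()
best_class = rate_by_class.idxmax()

# Question 3: average age by outcome (missing ages are skipped automatically).
mean_age = df.groupby("Survived")["Age"].mean()

print(f"Survived: {survival_pct:.1f}%")
print(f"Best survival rate: class {best_class}")
print(mean_age)

# One chart: survival rate by class, shown as percentages.
fig, ax = plt.subplots()
ax.bar(rate_by_class.index.astype(str), rate_by_class.values * 100)
ax.set_xlabel("Passenger class")
ax.set_ylabel("Survival rate (%)")
ax.set_title("Survival rate by passenger class (sample data)")
```

In a notebook the chart renders below the cell. The same groupby pattern works for any other column you want to compare against survival.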
Experiment & Iterate
Switch from exploring old data to analyzing something current and local. Download air quality data for Salt Lake City from the EPA's AQS Data Mart at aqs.epa.gov or use Utah's data.utah.gov air monitoring datasets. Load the data into a Kaggle or Google Colab notebook (colab.research.google.com, free). Use Pandas to answer two questions: Which months have the worst air quality on the Wasatch Front? Has PM2.5 (fine particle pollution) gotten better or worse over the past five years? Make a line chart showing the trend over time using Matplotlib or Seaborn. Look up what causes inversion events in Salt Lake Valley and connect your findings to real weather patterns. You're ready for the next step when your notebook shows a time-series chart of SLC air quality with at least one written insight about a trend you found.
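The monthly and yearly questions both come down to grouping readings by part of the date. The sketch below shows that pattern on six invented readings. The column names date and pm25 are assumptions, and real EPA or data.utah.gov files will likely use different names and contain many more rows.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented readings for illustration only. Real files will differ.
df = pd.DataFrame({
    "date": ["2022-01-15", "2022-07-15", "2023-01-15",
             "2023-07-15", "2024-01-15", "2024-07-15"],
    "pm25": [22.0, 8.0, 20.0, 9.0, 18.0, 7.0],
})

# Convert text to real dates so month and year can be extracted.
df["date"] = pd.to_datetime(df["date"])

# Average PM2.5 for each calendar month across all years.
by_month = df.groupby(df["date"].dt.month)["pm25"].mean()
worst_month = by_month.idxmax()

# Average PM2.5 for each year, to see the longer trend.
by_year = df.groupby(df["date"].dt.year)["pm25"].mean()

print(f"Worst month (1 = January): {worst_month}")
print(by_year)

# Line chart of the yearly averages.
fig, ax = plt.subplots()
ax.plot(by_year.index, by_year.values, marker="o")
ax.set_xticks(by_year.index)
ax.set_xlabel("Year")
ax.set_ylabel("Average PM2.5")
ax.set_title("Yearly average PM2.5 (sample data)")
```

With real data, check the units and the column that holds the measurement before trusting any average.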
Advanced Techniques
Now learn to make predictions, not just observations. Complete Kaggle's free "Intro to Machine Learning" course, which teaches you to build decision trees and random forest models in Python using scikit-learn. Train a model to predict something: use the Titanic dataset to predict survival, or find a new dataset on Kaggle that interests you. Learn how to split your data into training and test sets using train_test_split, and measure your model's accuracy with accuracy_score. Watch StatQuest with Josh Starmer on YouTube. His videos on decision trees and random forests are among the clearest explanations available. You're ready for the next step when you have trained a machine learning model, measured its accuracy on test data, and can explain why splitting training and test data matters.
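Here is a minimal sketch of the split, train, and measure workflow. It uses synthetic data from scikit-learn's make_classification so it runs without any download. The sample size, feature count, test_size, and random_state values are arbitrary choices for illustration, and you would swap in your own features and target.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data: 200 rows, 5 numeric features, a 0/1 target.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hold back 25% of the rows. The model never sees them during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Accuracy on rows the model trained on versus rows it has never seen.
train_accuracy = accuracy_score(y_train, model.predict(X_train))
test_accuracy = accuracy_score(y_test, model.predict(X_test))

print(f"Training accuracy: {train_accuracy:.2f}")
print(f"Test accuracy:     {test_accuracy:.2f}")
```

Comparing the two numbers is the point of the split. A model can score very high on data it has memorized, so only the test score tells you how it might do on new data.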
Final Project Showcase
Your final project: analyze a dataset that matters to your community and present your findings publicly. Choose a topic with local relevance — Utah school test scores, Salt Lake City crime data, snowpack levels in the Wasatch Mountains, or air quality by zip code. Clean the data, explore it, build at least one predictive or descriptive model, and create a clear data story with three to five visualizations. Publish your notebook publicly on Kaggle or GitHub. Write a plain-English summary of your findings — imagine you are presenting to the Salt Lake City Mayor's office. Share your project link on the Kaggle discussion forums and respond to at least one comment. You're ready for the next step when your public notebook has been viewed by at least one person outside your household and you have received at least one piece of written feedback.
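Cleaning is often the least familiar part of the final project, so here is a small sketch of three common steps: removing duplicate rows, converting text to dates and numbers, and dropping rows that cannot be used. The table, its column names (station, date, snow_depth_in), and its values are all invented. Your dataset will need its own decisions about what counts as bad data.

```python
import pandas as pd

# Invented raw records with typical problems:
# a repeated row, a non-date string, and a non-numeric measurement.
raw = pd.DataFrame({
    "station": ["A", "A", "B", "B", "C"],
    "date": ["2024-01-01", "2024-01-01", "2024-01-01", "2024-01-02", "not recorded"],
    "snow_depth_in": ["40", "40", "n/a", "38", "55"],
})

# 1. Remove exact duplicate rows.
clean = raw.drop_duplicates().copy()

# 2. Convert text columns to proper types. Values that cannot be
#    converted become missing (NaT or NaN) instead of raising an error.
clean["date"] = pd.to_datetime(clean["date"], errors="coerce")
clean["snow_depth_in"] = pd.to_numeric(clean["snow_depth_in"], errors="coerce")

# 3. Drop rows that are missing a date or a measurement.
clean = clean.dropna(subset=["date", "snow_depth_in"])

print(f"Rows before: {len(raw)}, rows after: {len(clean)}")
print(clean)
```

Printing the row count before and after each step is a simple way to notice when a cleaning rule removes more data than you expected.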
Recommended materials and resources for this quest.
Graph paper notebook
Required. Sketch your data visualizations by hand before coding them. Drawing a chart first forces you to think about what story you are actually trying to tell, and reveals whether your data can support that story.
Amazon: $4–$9
Book: "Python Crash Course" by Eric Matthes
Required. One of the clearest beginner Python books available. The data visualization section alone is worth it: it walks through Matplotlib and real datasets with projects you build from scratch. Used by learners worldwide, including at Utah coding camps.
Amazon: $25–$35
Book: "Storytelling with Data" by Cole Nussbaumer Knaflic
A masterclass in turning raw numbers into visuals that actually communicate. Goes deep on chart choice, color, and layout — the difference between a confusing chart and one that makes your findings instantly obvious.
Amazon: $22–$32