Harvard CS 109A: Data Science 1: Introduction to Data Science
CS 109A — cross-listed as Stat 109A — is the first half of Harvard's data science sequence: data wrangling, exploratory analysis, regression, classification, and model evaluation in Python. Past course materials are published openly on the teaching team's site, giving it a large self-study audience beyond enrolled students.
Fennie is independent and not affiliated with Harvard University. This is an unofficial study guide.
What makes it hard
The course sits at the junction of programming, statistics, and judgment — homeworks demand clean pandas code, correct inference, and sensible modeling decisions all at once. Students with only one of those legs (strong coders weak on stats, or vice versa) feel the missing leg on every assignment.
What you'll cover
- Data wrangling with Python and pandas
- Exploratory data analysis and visualization
- Linear and logistic regression
- Model selection and regularization
- Classification and k-NN
- Cross-validation and model evaluation
The CS 109A study guide
How to study for Harvard CS 109A, step by step.
1. Audit both legs: Python and statistics
CS 109A assumes CS50-level programming and Stat 100-level statistics, and homeworks punish whichever one is weaker. Identify your weak leg in week one and put deliberate practice there.
2. Rebuild the lab notebooks from scratch
Running the provided notebooks feels like learning, but it isn't. After each lab, recreate the analysis in a blank notebook from the raw data; that is the skill the homeworks actually grade.
3. Narrate every modeling decision
Why this model, why these features, why this validation split — write one sentence per choice. The graders reward reasoning, and the habit catches errors before they propagate.
4. Keep a personal log of pandas and sklearn gotchas
Index alignment, data leakage, fit-versus-transform — the same handful of traps cost points all semester. Recording each one once prevents paying for it twice.
5. Pace the pipeline with Fennie
Upload the CS 109A syllabus — or the public materials if you're self-pacing — and Fennie's Daily Plan schedules labs and homework stages to your deadlines, with quizzes on the statistical concepts generated from the actual course content. Free to start.
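Two of the traps named in step 4, index alignment and fit-versus-transform, are easier to recognize once seen in miniature. The sketch below is illustrative only and is not taken from the course materials; the toy Series, the random data, and the 80/20 split are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Gotcha 1: pandas arithmetic aligns on index labels, not on position.
a = pd.Series([1.0, 2.0, 3.0], index=[0, 1, 2])
b = pd.Series([10.0, 20.0, 30.0], index=[1, 2, 3])
aligned_sum = a + b                            # labels 0 and 3 have no partner, so they become NaN
positional_sum = a.to_numpy() + b.to_numpy()   # plain element-by-element addition

# Gotcha 2: fit preprocessing on the training rows only, then transform both splits.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 1))  # hypothetical single feature
X_train, X_test = X[:80], X[80:]

scaler = StandardScaler().fit(X_train)         # mean and std come from training data only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)       # reuses the training mean and std

# The leaky version fits on all of X, so test rows influence
# the statistics used to scale the training rows.
leaky_scaler = StandardScaler().fit(X)

print(aligned_sum)
print(positional_sum)
```

A log entry for each trap can be as short as the two comments above: what went wrong, and the call that avoids it.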
How Fennie helps with CS 109A
Fennie's Daily Plans pace CS 109A's lab-and-homework pipeline so the statistics review happens before the code needs it. Chat through why a model overfits or what a regularization penalty is doing, and quiz yourself on the inference concepts that separate working code from correct analysis.
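One way to see what a regularization penalty is doing is to fit the same data at several penalty strengths and watch the coefficients shrink. The sketch below uses a closed-form ridge (L2) fit in NumPy on synthetic data; the helper `ridge_fit`, the data, and the penalty values are hypothetical choices for illustration, not the course's own code.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.normal(size=(n, p))
true_beta = np.array([3.0, -2.0, 0.0, 0.0, 1.0])   # assumed "true" coefficients
y = X @ true_beta + rng.normal(scale=1.0, size=n)


def ridge_fit(X, y, lam):
    """Minimise ||y - X b||^2 + lam * ||b||^2 (no intercept, for brevity)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


lambdas = [0.0, 1.0, 10.0, 100.0]
norms = []
for lam in lambdas:
    beta = ridge_fit(X, y, lam)
    norms.append(np.linalg.norm(beta))
    print(f"lambda={lam:6.1f}  coefficient norm={norms[-1]:.3f}")
```

With `lam=0.0` this reduces to ordinary least squares. As the penalty grows, the coefficient norm falls: the model gives up some fit to the training data in exchange for smaller, more stable coefficients, which is the trade-off behind reduced overfitting.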
FAQ
Is CS 109A hard?
Its difficulty comes from breadth: every homework combines programming, statistics, and modeling judgment. Students solid in Python and intro statistics find the workload heavy but fair.
What's the difference between CS 109A and Stat 110?
Stat 110 is probability theory; CS 109A is applied data science — wrangling, regression, and machine learning practice in Python. Stat 110 is recommended background, not a substitute.
Can I self-study CS 109A online?
Yes — past offerings publish lectures, labs, and homework notebooks openly. Work the homeworks honestly rather than reading the solutions; the judgment is built by doing.
Pass CS 109A with a plan, not a cram
Upload your CS 109A materials and Fennie generates a Daily Plan paced to your deadline — plus chat, flashcards, and quizzes built from the actual course content.
More Harvard courses
CS50 — Introduction to Computer Science
CS50 is Harvard's famous intro to computer science, taught by David Malan, and through CS50x on edX it is one of the most widely taken college courses in the world. It moves from C through data structures, memory, and algorithms to Python, SQL, and web development, ending with a final project.
CS 51 — Abstraction and Design in Computation
CS 51 is the standard course after CS50 for Harvard CS concentrators, teaching functional programming in OCaml alongside design principles — abstraction, modularity, and multiple programming paradigms. It's where students go from making code work to making it well-designed.
CS 124 — Data Structures and Algorithms
CS 124 is Harvard's algorithms course — divide and conquer, greedy algorithms, dynamic programming, graph algorithms, hashing, and NP-completeness — combining rigorous analysis with programming assignments. It's a core theory requirement for CS concentrators and a known interview-prep powerhouse.
CS 61 — Systems Programming and Machine Organization
CS 61 is Harvard's systems programming course — C and C++, assembly, memory, caching, process control, and concurrency — and one of the two standard follow-ons to CS50 for CS concentrators. Its course site publishes lecture notes and problem sets publicly, so it also draws self-learners looking for a systems sequel to CS50.