Welcome

The goal of the class is to learn how to apply microeconomic concepts to large and complex datasets. We will first revisit notions such as identification, inference and latent heterogeneity in classical contexts. We will then study the concerns that arise when the number of parameters is large, in order to understand over-fitting. Throughout the class, the emphasis will be on project-driven computational exercises involving large datasets. We will learn how to efficiently process and visualize such data using state-of-the-art tools in Python. Topics will include fitting models using TensorFlow and neural nets, creating event studies using pandas, solving large-scale SVDs, etc.

This website, together with the Slack group, will be the primary source of content.

Warning

I have been building the content of the course from scratch. The syllabus is not set in stone and will very likely change during the course of the term. Please bear with me!

Format

The lectures will be held in person. Class notes and related notebooks will be posted on this website. There will be a strong emphasis on take-home work in the form of assignments to run on the computer. These will be a mix of short psets (to be done individually) and long homeworks (done in teams of two).

Grading

The overall grade will combine a midterm, the long homeworks and the short homeworks. The weights will be 2/5 long homeworks, 1/5 short homeworks and 2/5 midterm.
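
As a rough illustration of how these weights combine, here is a minimal sketch. It assumes all three components are scored on a common scale (0 to 100 in the example), which the syllabus does not specify; the function name and the sample scores are hypothetical.

```python
from fractions import Fraction

# Weights as stated above: 2/5 long homeworks, 1/5 short homeworks, 2/5 midterm.
WEIGHTS = {
    "long_homework": Fraction(2, 5),
    "short_homework": Fraction(1, 5),
    "midterm": Fraction(2, 5),
}


def overall_grade(scores):
    """Weighted average of the three components.

    Assumes every component is reported on the same scale (an assumption,
    not something stated in the syllabus).
    """
    return float(sum(weight * scores[name] for name, weight in WEIGHTS.items()))


# Hypothetical scores, for illustration only.
example = {"long_homework": 90, "short_homework": 80, "midterm": 70}
print(overall_grade(example))
```

With these made-up scores, the long homeworks contribute 36 points, the short homeworks 16 and the midterm 28.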

List of topics:

Linear and parametric models notes
- review: population, estimand, identification, estimation
- finite sample versus asymptotic
- other models: non-linear conditional mean, IV, MLE
- homeworks: Pi pset, P&P pset, OLS pset
Topics on inference notes
- dependence in error and clustering standard errors
- bootstrap lab
- multiple testing
- weak IV
- homeworks: Inference pset
Beyond parametric models
- non-parametric estimators
- many regressors
Treatment evaluation
- potential outcome notations
- Diff in Diff, pre-trends and examples
- Event studies, assumptions and examples
- synthetic controls
- heterogeneous causal effects
- homeworks: yelp pset
Network problems
- incidental parameter problem, overfitting
- bias correction
- large scale PCA, clustering
- network formation
- homeworks: Effect of classroom pset

Tools we will learn about:

Creating deployable environments, maintaining code
- we will work with conda or poetry
- git and most likely github
- testing and logging
Working with databases
- mainly pandas, but possibly we will expand to modin, dask and pyspark
- merge, groupby and aggregate, method chaining
Plotting and reporting
- matplotlib and seaborn
- styling pandas
- creating latex tables
Creating workflows
- doit, but principles apply to Airflow, metaflow, nextflow or others.
Automatic differentiation and Stochastic EM
- tensorflow and pytorch
Computing in the cloud
- getting started with Amazon EC2
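
To give a flavor of the pandas items listed under "Working with databases" (merge, groupby and aggregate, method chaining), here is a minimal sketch. The two toy tables, their column names and their values are made up for illustration and are not course data.

```python
import pandas as pd

# Hypothetical toy tables; names and values are invented for this sketch.
students = pd.DataFrame({
    "student_id": [1, 2, 3, 4],
    "classroom": ["A", "A", "B", "B"],
})
scores = pd.DataFrame({
    "student_id": [1, 2, 3, 4],
    "score": [80.0, 90.0, 70.0, 60.0],
})

# Method chaining: each step returns a new DataFrame that feeds the next one.
summary = (
    students
    # merge: attach scores to students; validate checks the keys are unique
    .merge(scores, on="student_id", how="left", validate="one_to_one")
    # groupby + named aggregation: one row per classroom
    .groupby("classroom", as_index=False)
    .agg(mean_score=("score", "mean"), n_students=("student_id", "count"))
    .sort_values("mean_score", ascending=False)
    .reset_index(drop=True)
)
print(summary)
```

Writing the pipeline as a single chain avoids intermediate variables and keeps the order of the transformations easy to read; the `validate` argument makes the merge fail loudly if the key turns out not to be unique.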