The goal of this class is to learn how to apply microeconomic concepts to large and complex datasets. We will first revisit notions such as identification, inference, and latent heterogeneity in classical contexts. We will then study the problems that arise in the presence of a large number of parameters, in order to understand over-fitting. Throughout the class, emphasis will be put on project-driven computational exercises involving large datasets. We will learn how to efficiently process and visualize such data using state-of-the-art tools in Python. Topics will include fitting models with TensorFlow and neural nets, creating event studies with pandas, solving large-scale SVDs, and more.

This website, together with the Slack group, will be the primary source of content.


This is the first time this class is offered and I am building the content from scratch, so the syllabus is not set in stone and will very likely change over the course of the term.

In addition, this year I will try to adapt the content as much as possible to the online format necessitated by our campus shutdown.

Please bear with me!

Format of the lectures

Most lectures will be held live on Zoom, and the link will be shared on Slack. Class notes and related notebooks will be posted on this website. There will be a strong emphasis on take-home work in the form of computational assignments.

Schedule (TBD)

List of topics:

  • Linear models
    • review: population, estimand, identification, estimation
    • dependence in errors and clustered standard errors
    • multiple testing
  • Diff-in-diff, IV, event studies
  • incidental parameter problem, overfitting
    • bias correction
  • large-scale PCA and clustering, with applications to economics
    • using clustering to alleviate incidental parameter problems
    • (dimension reduction in recommendation systems)
  • heterogeneous causal effects
    • cross-validation
    • using neural nets to estimate heterogeneous returns
    • using causal trees
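To give a taste of the computational side of these topics, here is a minimal sketch of PCA via the SVD in plain NumPy, of the kind we will scale up in class; the data matrix and dimensions are randomly generated for illustration.

```python
import numpy as np

# Illustrative data: 500 observations of 20 features (not a real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
Xc = X - X.mean(axis=0)                 # center each column

# Economy-size SVD: Xc = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Project onto the first k principal components (rows of Vt).
k = 3
scores = Xc @ Vt[:k].T
explained = (s[:k] ** 2).sum() / (s ** 2).sum()   # share of variance captured
print(scores.shape, round(explained, 3))
```

For truly large matrices the full SVD becomes infeasible, which is where the large-scale methods covered in class come in.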

Tools we will learn about:

  • Creating deployable environments and maintaining code
    • we will work with conda
    • git and most likely github
    • testing and logging
  • Serialization
    • json, pickle, working with files
  • Working with databases
    • mainly pandas, but we may expand to modin, dask and pyspark
    • merge, groupby and aggregate, method chaining
  • Plotting and reporting
  • Creating workflows
  • Automatic differentiation and Stochastic EM
  • Computing in the cloud
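As a preview of the serialization tools, here is a minimal sketch contrasting json (text-based, interoperable) with pickle (binary, Python-only); the record being saved is made up for the example.

```python
import json
import os
import pickle
import tempfile

# An illustrative record to round-trip through both formats.
record = {"ticker": "ACME", "prices": [101.2, 99.8, 100.5]}

path = os.path.join(tempfile.mkdtemp(), "record")

# json writes human-readable text; pickle writes Python-specific bytes.
with open(path + ".json", "w") as f:
    json.dump(record, f)
with open(path + ".pkl", "wb") as f:
    pickle.dump(record, f)

with open(path + ".json") as f:
    from_json = json.load(f)
with open(path + ".pkl", "rb") as f:
    from_pickle = pickle.load(f)

print(from_json == record, from_pickle == record)
```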
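And a tiny preview of the pandas idioms listed above (merge, groupby and aggregate, method chaining); the tables and column names are invented for the example.

```python
import pandas as pd

# Two toy tables: transactions and store metadata.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "units": [10, 5, 7, 3],
})
stores = pd.DataFrame({
    "store": ["A", "B"],
    "region": ["east", "west"],
})

# Method chaining: attach region to each sale, then aggregate within region.
result = (
    sales
    .merge(stores, on="store", how="left")
    .groupby("region", as_index=False)
    .agg(total_units=("units", "sum"))
)
print(result)
```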


Grading

This is unfortunately a bit up in the air. I will put a lot of weight on the assignments; unfortunately, differentiating students based on assignments alone is quite difficult.