DiD and synthetic control

In this lecture, we are going to look a bit deeper into extensions of Diff-in-Diff when longer lags and leads are available to us.

We follow the presentation of Kellogg, Mogstad, Pouliot and Torgovitsky (2020).

We consider an outcome Y_{it} observed over multiple periods. We similarly assume a binary treatment assignment D_i for each unit i. Finally, we assume that the time-specific treatment status is of the form D_{it} = D_i 1[ t \geq t^* ].

Empirical example: California Prop 99

We are going to use the California tobacco tax (Prop 99) as a running example. This is a common example in the literature.

In 1988, California passed Prop 99, a law taxing tobacco products including cigarette packs. This law was the first of its kind in the US.

We would like to evaluate the effect of passing that regulation on tobacco consumption. We use state-level data on cigarette consumption, as in Abadie et al.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

data = pd.read_csv('../data/cigs.csv')
data

We start by plotting cigarette consumption in California before and after the enactment of the law.

dcal = data.query("state=='CA'")
plt.plot(dcal['year'],dcal['cigs']) 
plt.axvline(1988,linestyle=":",label="Prop 99")
plt.title("Cigaret consumption in Califirnia")
plt.legend()
plt.show()

[Figure: cigarette consumption in California over time, with a vertical line marking Prop 99 in 1988]

Measuring the causal effect using Before/After

As we know well by now, to measure the causal effect of the law we need some notion of the right control group.

A before-after approach compares cigarette consumption in California before and after the law. We can compute such a value for different windows around the law.
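In symbols, for a window of w years around 1988 (matching the computation below, which averages over the w-1 years strictly before and strictly after 1988), the before-after estimate is

\Delta_{BA}(w) = \bar{Y}_{CA, pre(w)} - \bar{Y}_{CA, post(w)}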

res_ba = pd.DataFrame([ {'window':w, 
  'before': dcal.query("year > 1988 - @w & (year < 1988)")['cigs'].mean(), 
  'after' : dcal.query("year < 1988 + @w & (year > 1988)")['cigs'].mean()} for w in range(2,10)])
res_ba['DBA'] = res_ba.eval("before - after")
res_ba
window      before       after        DBA
     2   97.500000   82.400000  15.100000
     3   98.600000   80.100000  18.500000
     4  100.000000   76.300000  23.700000
     5  101.200000   74.100000  27.100000
     6  103.120000   71.960000  31.160000
     7  105.166667   69.733333  35.433333
     8  107.085714   67.828571  39.257143
     9  108.725000   66.162500  42.562500

DiD using the rest of the US

We can do better and use the rest of the US states as the control group, to allow for a common trend in cigarette consumption.
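In symbols (matching the table below), for each window w we compute

\Delta_{DiD}(w) = ( \bar{Y}_{CA, pre(w)} - \bar{Y}_{CA, post(w)} ) - ( \bar{Y}_{US, pre(w)} - \bar{Y}_{US, post(w)} )

where the second difference is computed on the average of all non-California states.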

dcal = data.query("state=='CA'")
plt.plot(dcal['year'],dcal['cigs'],label="CA") 
data.query("state != 'CA'").groupby('year')['cigs'].mean().plot(label="US")
plt.axvline(1988,linestyle=":") 
plt.legend()
plt.show() 

[Figure: cigarette consumption in California vs. the average of the other US states, with Prop 99 marked in 1988]

res_did = pd.DataFrame([ 
{'window':w, 
  'before_t': data.query('state == "CA"').query("year>1988 - @w & (year<1988)")['cigs'].mean(), 
  'after_t' : data.query('state == "CA"').query("year<1988 + @w & (year>1988)")['cigs'].mean(),
  'before_c': data.query('state != "CA"').query("year>1988 - @w & (year<1988)")['cigs'].mean(), 
  'after_c' : data.query('state != "CA"').query("year<1988 + @w & (year>1988)")['cigs'].mean()}
                        for w in range(2,11)])
res_did['DBAT'] = res_did.eval("before_t - after_t")
res_did['DBAC'] = res_did.eval("before_c - after_c")
res_did['DiD']  = res_did.eval("DBAT - DBAC")
res_did
window    before_t    after_t    before_c     after_c       DBAT       DBAC        DiD
     2   97.500000  82.400000  117.586842  109.663158  15.100000   7.923684   7.176316
     3   98.600000  80.100000  119.090789  107.664474  18.500000  11.426316   7.073684
     4  100.000000  76.300000  120.432456  106.557018  23.700000  13.875439   9.824561
     5  101.200000  74.100000  121.550000  105.766447  27.100000  15.783553  11.316447
     6  103.120000  71.960000  123.490000  105.152105  31.160000  18.337895  12.822105
     7  105.166667  69.733333  125.624123  104.646491  35.433333  20.977632  14.455702
     8  107.085714  67.828571  127.390226  104.433835  39.257143  22.956391  16.300752
     9  108.725000  66.162500  128.727632  104.027632  42.562500  24.700000  17.862500
    10  110.188889  64.788889  129.767544  103.778947  45.400000  25.988596  19.411404

We note that the DiD estimates are noticeably smaller than the before-after estimates. This is not so surprising given the overall downward trend in cigarette consumption.

All leads and lags - checking the common trend

Remember that the assumption behind the DiD estimator is that changes in the non-treated outcome are mean independent of treatment status. Formally, we have been writing

E[ Y_{it}(0) - Y_{it'}(0) | D_i = 1 ] = E[ Y_{it}(0) - Y_{it'}(0) | D_i = 0 ]

This allowed us to use the change in outcome in the control group at each time t as a counterfactual for the change in outcome in the treatment group.

While this assumption does not provide any restrictions when there is only one pre-treatment period, it does if we have multiple lead periods. Indeed, for any two periods before t^* we can check whether the constraint is satisfied.

We then run a regression that includes time-specific terms interacted with treatment, leaving the interaction coefficients in the pre-treatment period unrestricted.
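In equation form (matching the regression cigs ~ C(year)*C(Di) estimated below, and assuming the first sample year is the omitted base category, which is the default dummy coding), the specification is

Y_{it} = \gamma_t + \delta D_i + \sum_{s \neq s_0} \beta_s 1[ t = s ] D_i + u_{it}

Under the common trend assumption, the pre-treatment coefficients \beta_s should be close to zero, while the post-treatment coefficients trace out the effect of the policy.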

data['Di'] = (data['state']=='CA')
data['Dit'] = (data['state']=='CA') & (data['year']>=1988) 
import statsmodels.formula.api as smf

# Initialise and fit linear regression model using `statsmodels`
model = smf.ols("cigs ~ C(year)*C(Di)", data=data)
model = model.fit()
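# The next two lines compare the model's prediction for California with the
# prediction obtained after switching Di off: for each year this recovers the
# treatment-group deviation (the Di main effect plus the year-by-Di interaction).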

data_cal = data.query("state == 'CA'").copy()
data_cal['dep'] = model.predict(data_cal) - model.predict( data_cal.eval("Di=False") )
data_cal.plot(x='year',y='dep',label="Treat")
plt.axvline(x=1988, linestyle=':',label="policy")
plt.axhline(data_cal.set_index('year')['dep'][1987],linestyle='--',color='grey')
plt.legend()

# the value at 1988 would be normalized at 0,  
# @fixme change the intercept to be centered at 0 on y-axis

[Figure: estimated year-by-treatment deviations for California from the interacted regression, with the policy year 1988 marked and a horizontal reference at the 1987 value]

#df_pred = pd.concat( [ data_cal.reset_index(), model.get_prediction(data_cal).summary_frame() ],axis=1)
#df_pred

#ax = df_pred.plot(x='year',y='mean')
#df_pred.plot(x='year',y='mean_ci_upper',color="red",linestyle=":",ax=ax,label="mean_ci")
#df_pred.plot(x='year',y='mean_ci_lower',color="red",linestyle=":",ax=ax,label="mean_ci")
#plt.axvline(x=1988, linestyle=':',label="policy")
#plt.legend()
#plt.show()

Constructing the non-treated outcome using observables

In this section we think about directly constructing a control group as a weighted sum of the available non-treated units. A simple choice would be to pick the single state with the most similar observables. Alternatively, we can weight states based on their distance in observables.

Let's look at both ideas: a quick nearest-neighbor comparison first, and then a counterfactual for California built as a weighted combination of states chosen on observable characteristics.
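As a first pass, here is a minimal nearest-neighbor sketch of the first idea above: pick the single non-treated state closest to California in observables. The file path and column names are the ones used in the cells below; standardizing the variables before computing distances, and the object names (cols, Z, dist), are my own choices.

import pandas as pd
import numpy as np

data = pd.read_csv('../data/cigs.csv')
cols = ['loggdppercap', 'retprice', 'pctage1524', 'beerpercap']

# state-level values of the observables (one row per state)
df_X = pd.pivot_table(data[['state'] + cols].drop_duplicates(), index="state")

# standardize each variable so that no single one dominates the distance
Z = (df_X - df_X.mean()) / df_X.std()

# Euclidean distance from every other state to California; the closest state
# is the nearest-neighbor control
dist = np.sqrt(((Z.drop('CA') - Z.loc['CA'])**2).sum(axis=1))
dist.sort_values().head()

We now turn to the weighted counterfactual.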

# we start by computing pair-wise distances between observations 

df_X = (pd
    .pivot_table(data[['state','loggdppercap','retprice','pctage1524','beerpercap']]
    .drop_duplicates(), index="state"))

X1 = df_X.query('state =="CA"').to_numpy()
X0 = df_X.query('state !="CA"').to_numpy()
state_order = df_X.query('state !="CA"').index

ns = X0.shape[0]
w0 = np.ones(ns)/ns

def loss_obj(w):
    return( ((X1 - w @ X0 )**2).sum())
from scipy.optimize import Bounds
# bound the weight in [0,1]
bounds = Bounds(np.zeros(ns), np.ones(ns))

# the weights sum to 1
from scipy.optimize import LinearConstraint
sum_to_one = LinearConstraint(np.ones(ns) , [1], [1])
from scipy.optimize import minimize
res = minimize(loss_obj, w0, constraints=sum_to_one, bounds=bounds)
weights = pd.DataFrame({'weight':res['x']},index=state_order)
weights.query('weight > 0.01')
state    weight
CO       0.122732
CT       0.426237
NH       0.167628
VA       0.283403
# we use the estimated weights to compute the counterfactual outcome.
data_t = (data.query("state !='CA'")
    .join(weights,on="state")
    .assign(dep = lambda d: d['cigs'] * d['weight'])
    .groupby('year')
    ['dep'].agg('sum'))
ax = data_t.plot(label="synthetic control")   # data_t is a Series indexed by year
data_cal.plot(x='year', y='cigs', ax=ax, label="CA")
plt.axvline(x=1988, linestyle=':', label="policy")
plt.legend()
plt.show()

[Figure: weighted synthetic control vs. actual cigarette consumption in California, with Prop 99 marked in 1988]

data_diff = data_cal.set_index('year')['cigs'] - data_t
data_diff = data_diff - data_diff[1987]
data_diff.plot()
plt.axvline(x=1988, linestyle=':',label="policy")
plt.axhline(0, linestyle='--',color='grey',label="policy")
plt.show()

[Figure: difference between California and the synthetic control, normalized to zero in 1987]

# plot all of these states

ax = data.query("state == 'CA'").plot(x='year', y='cigs',label="CA")
for s in weights.query('weight > 0.01').index:
    data.query("state == '{}'".format(s)).plot(x='year', y='cigs',ax=ax,label=s,alpha=0.3)
plt.axvline(x=1988, linestyle=':',label="policy")
plt.plot()

[Figure: cigarette consumption in California and in the states receiving positive synthetic-control weight]

Constructing the non-treated outcome using observables and pre-treatment outcomes

We can also use pre-treatment outcomes to select the weights of the synthetic control group. To this end we add these variables to our minimization problem.

df_X = pd.pivot_table(data[['state','loggdppercap','retprice','pctage1524','beerpercap','cigsalespercap88','cigsalespercap80','cigsalespercap75']].drop_duplicates(), index="state")

X1 = df_X.query('state =="CA"').to_numpy()
X0 = df_X.query('state !="CA"').to_numpy()
state_order = df_X.query('state !="CA"').index

ns = X0.shape[0]
w0 = np.ones(ns)/ns

def loss_obj(w):
    return( ( (X1 - w @ X0 )**2).sum())
res = minimize(loss_obj, w0, constraints=sum_to_one, bounds=bounds)
weights = pd.DataFrame({'weight':res['x']},index=state_order)
weights.query('weight > 0.01')
state    weight
CO       0.024353
MT       0.195348
ND       0.162775
NV       0.259584
UT       0.357162
ax = data.query("state == 'CA'").plot(x='year', y='cigs',label="CA")
for s in weights.query('weight > 0.01').index:
    data.query("state == '{}'".format(s)).plot(x='year', y='cigs',ax=ax,label=s,alpha=0.3)
plt.axvline(x=1988, linestyle=':',label="policy")
plt.plot() 

[Figure: cigarette consumption in California and in the positively weighted states when pre-treatment outcomes are included]

# we use the estimated weights to compute the counterfactual outcome.
data_t = (data.query("state !='CA'")
    .join(weights,on="state")
    .assign(dep = lambda d: d['cigs'] * d['weight'])
    .groupby('year')
    ['dep'].agg('sum'))

data_diff = data_cal.set_index('year')['cigs'] - data_t
data_diff = data_diff - data_diff[1987]
data_diff.plot()
plt.axvline(x=1988, linestyle=':',label="policy") 
plt.axhline(0, linestyle='--',color='grey',label="policy")
plt.show()

[Figure: difference between California and the synthetic control built with pre-treatment outcomes, normalized to zero in 1987]

An extra step: picking the distance in the best possible way

The Abadie et al. paper suggests choosing the weighting of the distance across variables so as to minimize the distance between the outcome variables in the pre-treatment period. For a given variable weighting \omega, the state weights solve

w_s(\omega) = \arg \min_w \sum_v \omega_v ( X_v(1) - w \times X_v(0) )^2

We then pick \omega to minimize the realized distance between the outcomes in the pre-treatment period:

\sum_{t<t^*} ( Y_{t,CA} - w_s(\omega) \times Y_{t,control} )^2
def loss_obj2(variable_weights):
    """
    variable_weights is the weighting of the variables entering the distance.

    Given this weighting we solve for the best state weights, then return the
    resulting overall distance on the pre-treatment outcomes.
    """

    def loss_within(state_weights):
        return(( variable_weights * (X1 - state_weights @ X0 )**2).sum())

    w0 = np.ones(ns)/ns
    res = minimize(loss_within, w0, constraints=sum_to_one, bounds=bounds)

    # collect the implied state weights
    weights = pd.DataFrame({'weight':res['x']},index=state_order)

    # compute the implied counterfactual on pre-treatment years
    data_pre = (data.query("state !='CA'").query('year < 1988')
        .join(weights,on="state")
        .assign(dep = lambda d: d['cigs'] * d['weight'])
        .groupby('year')
        ['dep'].agg('sum'))

    # squared distance between California and the counterfactual before 1988
    dist = ((data.query("state =='CA'").set_index('year').query('year < 1988')['cigs'] - data_pre)**2).sum()
    return(dist)

nv = X0.shape[1]
w2_0 = np.random.uniform(size=nv)
w2_0 = w2_0/w2_0.sum()

# variable weights are bounded in [0,1] and their sum is constrained to lie in [1,2]
bounds2 = Bounds(np.zeros(nv), np.ones(nv))
sum_to_one2 = LinearConstraint(np.ones(nv), [1], [2])
res2 = minimize(loss_obj2, w2_0, constraints=[sum_to_one2], bounds=bounds2)

res2
     fun: 158.99419348271718
     jac: array([-7.59670596e+04,  7.82092205e+04,  6.30907956e+04,  5.76088512e+04,
        4.44411431e+04, -6.96439667e+01,  5.65734923e+04])
 message: 'Optimization terminated successfully'
    nfev: 231
     nit: 18
    njev: 14
  status: 0
 success: True
       x: array([0.47950806, 0.00253885, 0.43252488, 0.00544681, 0.00329743,
       0.00317937, 0.08694906])
w2 = res2['x']

def loss_within(w):    
    return( ( w2 * (X1 - w @ X0 )**2).sum())

w0 = np.ones(ns)/ns
res3 = minimize(loss_within, w0, constraints=sum_to_one, bounds=bounds)
weights = pd.DataFrame({'weight':res3['x']},index=state_order)
weights.query('weight > 0.01')
state    weight
ID       0.037504
MT       0.200644
ND       0.142321
NV       0.248184
TX       0.015679
UT       0.349253
# we use the estimated weights to compute the counterfactual outcome.
data_t = (data.query("state !='CA'")
    .join(weights,on="state")
    .assign(dep = lambda d: d['cigs'] * d['weight'])
    .groupby('year')
    ['dep'].agg('sum'))

data_diff = data_cal.set_index('year')['cigs'] - data_t
data_diff = data_diff - data_diff[1988]
data_diff.plot()
plt.axvline(x=1988, linestyle=':',label="policy")
plt.axhline(0, linestyle='--',color='grey',label="policy")
plt.show()

[Figure: difference between California and the synthetic control with optimized variable weights, normalized to zero in 1988]

Taking a step back

Of course, now that we have used the pre-treatment outcome to form the control group, it becomes difficult to use the pre-trend to check the common trend assumption.

In each of these approaches, we are effectively building a model for the non-treated outcome. In DiD we simply use the change in the control group, i.e. we assume

Y_{it}(0) = \alpha_i + \gamma_t + u_{it}

The synthetic control, through its ability to pick the weights, constructs a more flexible model of the non-treated outcome. For instance:

Y_{it}(0) = \alpha_i + \gamma_t + \lambda_t \delta_i + u_{it}

The different parameters can be estimated using all the Y_{it}(0) at our disposal; this includes pre-treatment outcomes for D_i = 1 and all outcomes for D_i = 0.

A unit-specific trend would look like:

Y_{it}(0) = \alpha_i + \gamma_t + \delta_i \cdot t + u_{it}
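As a minimal sketch (my own illustration, not the specification estimated in the next cell), the unit-specific-trend model can be fit with statsmodels by interacting the state dummies with a linear year term; the common year effects \gamma_t are dropped here to keep the design full rank, and the object names (trend_model, data_ca, cigs_trend) are mine.

import pandas as pd
import statsmodels.formula.api as smf

data = pd.read_csv('../data/cigs.csv')
data['Dit'] = (data['state'] == 'CA') & (data['year'] >= 1988)

# Y_it(0) = alpha_i + delta_i * t, estimated on non-treated observations only
trend_model = smf.ols("cigs ~ C(state)*year", data=data.query("Dit == False")).fit()

# extrapolate California's estimated trend into the post-period as a counterfactual
data_ca = data.query("state == 'CA'").copy()
data_ca['cigs_trend'] = trend_model.predict(data_ca)
data_ca[['year', 'cigs', 'cigs_trend']].tail()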

# Fit a model with state and time 
# 'loggdppercap','retprice','pctage1524','beerpercap'
model = smf.ols("cigs ~ C(year) + C(state) ", data=data.query("Dit == False"))
model = model.fit()

# Predict the counterfactual
data_c = model.predict(data.query("state == 'CA'").assign(Dit=False))
data_did = data.query("state == 'CA'").assign(cigs_c = data_c)
data_did.eval("diff = cigs - cigs_c").plot(x='year',y="diff")

[Figure: difference between California's observed consumption and the two-way fixed-effects counterfactual]

A general view of the problem

In flexible terms, it looks like we are searching for a model of the form

E[ Y_{it}(0) | X ] = h(\alpha_i,\lambda_t,X_{it})
  • One can tackle this issue directly and jointly think about identification and overfitting.
  • Common parameters and individual-specific parameters can be estimated using pre-treatment data from the treated and all data from the controls.

  • One can add the pre-treatment values to the conditioning set (see the sketch after this list).

  • Instead, one can try to estimate unit-specific parameters.
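For the bullet on adding pre-treatment values to the conditioning set, here is one hedged sketch of what this could look like: fit the mapping from pre-period sales to post-period consumption on the control states, then apply it to California. The columns cigsalespercap75, cigsalespercap80 and cigsalespercap88 are the pre-period variables already used above; pooling all post-treatment years (and ignoring year effects), as well as the object names, are simplifications on my part.

import pandas as pd
import statsmodels.formula.api as smf

data = pd.read_csv('../data/cigs.csv')

# fit the mapping on control states and post-treatment years only
controls_post = data.query("state != 'CA' & year > 1988")
m_pre = smf.ols("cigs ~ cigsalespercap75 + cigsalespercap80 + cigsalespercap88",
                data=controls_post).fit()

# apply the fitted mapping to California to obtain an implied non-treated outcome
ca_post = data.query("state == 'CA' & year > 1988").copy()
ca_post['cigs_hat0'] = m_pre.predict(ca_post)
(ca_post['cigs'] - ca_post['cigs_hat0']).mean()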
data_fit = data.query("Dit == False")

from patsy import dmatrices, dmatrix
outcome, predictors = dmatrices("cigs ~ C(year) + C(state)", data=data_fit)

# number of columns in the design matrix (avoid overwriting the numpy alias np)
n_pred = predictors.shape[1]

from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import Adam

# we then create a neural net model that takes these variables as inputs
model3 = Sequential()
model3.add(Dense(20, input_shape=(n_pred,)))
model3.add(Activation('elu'))
model3.add(Dense(1))
model3.add(Activation('elu'))
model3.add(Dense(1))
model3.summary()

# compile and fit the model
model3.compile(optimizer=Adam(learning_rate=0.1), loss='mse')
hist = model3.fit(predictors, outcome, epochs=100, batch_size=50, verbose=1)
Model: "sequential_12"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_36 (Dense)             (None, 20)                1400      
_________________________________________________________________
activation_24 (Activation)   (None, 20)                0         
_________________________________________________________________
dense_37 (Dense)             (None, 1)                 21        
_________________________________________________________________
activation_25 (Activation)   (None, 1)                 0         
_________________________________________________________________
dense_38 (Dense)             (None, 1)                 2         
=================================================================
Total params: 1,423
Trainable params: 1,423
Non-trainable params: 0
_________________________________________________________________
Epoch 1/100
24/24 [==============================] - 0s 5ms/step - loss: 5270.6152
Epoch 2/100
24/24 [==============================] - 0s 3ms/step - loss: 622.4312
...
Epoch 99/100
24/24 [==============================] - 0s 3ms/step - loss: 14.7545
Epoch 100/100
24/24 [==============================] - 0s 4ms/step - loss: 15.2182
# use the fitted model to predict the entire history for California
outcome_c, predictors_c = dmatrices("cigs ~ C(year) + C(state)", data=data.eval("Dit = False"))
data['cigs_c'] = model3.predict(predictors_c)

# we plot treated and control for CA

ax = data.query("state=='CA'").plot(x='year',y='cigs',label = "CA - data")
data.query("state=='CA'").plot(x='year',y='cigs_c',ax=ax,label="CA - predicted control")

plt.show()

[Figure: observed cigarette consumption in California vs. the neural-net predicted control series]

data.query("state=='CA'").eval('diff = cigs - cigs_c').plot(x='year',y='diff')
plt.show()

[Figure: difference between California's observed consumption and the neural-net predicted control]
