Hi there,
Thank you for putting together this package! I'm trying to estimate several two-way models for my application and I get puzzling results. Here's a summary by case.
1. Standard AKM, without collapsing at the spell level and using leave-out-observation methods for HE.
The HE-corrected estimates do not sum to one (as shares of var(y)). The most suspicious change relative to the HO correction is in var(eps), but it is not obvious that that's where the problem is.

This is my pytwoway setup for these results:
```python
import pandas as pd
import bipartitepandas as bpd
import pytwoway as tw

# Prepare input DataFrame for PyTwoWay
input_df = pd.DataFrame({
    'i': panel['cluster_id'].values,
    'j': panel['hgroup_id_temp'].values,
    'y': panel['outcome'].values,
    't': panel['year'].values
})

# Cleaning parameters
clean_params = bpd.clean_params({
    'connectedness': 'leave_out_observation',
    'drop_single_stayers': True,
    'collapse_at_connectedness_measure': False,
    'copy': False
})

# Clean the data
bdf = bpd.BipartiteDataFrame(input_df, track_id_changes=True)
bdf = bdf.clean(clean_params)

# Estimation parameters
fe_params = tw.fe_params({
    'ncore': 12,
    'ndraw_trace_sigma_2': 200,
    'ndraw_trace_ho': 200,
    'he': True,
    'ndraw_trace_he': 200,
    'ndraw_lev_he': 1800,
    'attach_fe_estimates': True,
    'preconditioner': 'ichol'
})
fe_params['Q_var'] = [tw.Q.VarPsi(), tw.Q.VarAlpha()]
fe_params['Q_cov'] = [tw.Q.CovPsiAlpha()]

# Initialize and fit the FE estimator, then save results
fe_estimator = tw.FEEstimator(bdf, fe_params)
fe_estimator.fit()
```
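By "do not sum to one" I mean the decomposition identity fails. For concreteness, here is the sanity check I'm running, as a standalone sketch; the dictionary key names are my assumptions about how the results are labeled, not necessarily pytwoway's actual output names:

```python
def decomposition_check(res, suffix):
    """Check whether var(psi) + var(alpha) + 2*cov(psi, alpha) + var(eps)
    adds up to var(y) for a given correction ('fe', 'ho', or 'he').
    NOTE: the key names are assumed, not pytwoway's actual output names."""
    total = (res[f'var(psi)_{suffix}']
             + res[f'var(alpha)_{suffix}']
             + 2 * res[f'cov(psi, alpha)_{suffix}']
             + res[f'var(eps)_{suffix}'])
    return total, total / res['var(y)']

# Hypothetical numbers, for illustration only
res = {'var(y)': 1.00, 'var(psi)_he': 0.20, 'var(alpha)_he': 0.45,
       'cov(psi, alpha)_he': 0.05, 'var(eps)_he': 0.25}
total, share = decomposition_check(res, 'he')
print(total, share)
```

With the HO correction this share is essentially 1 for me; with HE it is noticeably off.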
2. Standard AKM, collapsing at the spell level and using leave-out-spell methods for HE.
I run exactly the same code on the same data as above but change the clean_params to:
```python
clean_params = bpd.clean_params({
    'connectedness': 'leave_out_spell',
    'drop_single_stayers': True,
    'collapse_at_connectedness_measure': True,
    'copy': False
})
```
Yet my results are even more puzzling: var(eps) now exceeds var(y) for all models (even plain FE!), and by a large amount, and var(alpha)_he is negative, which a variance clearly cannot be.
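One thing I tried to rule out is an apples-to-oranges comparison: after collapsing to spell means, the unweighted variance of the collapsed outcomes is not the same object as the observation-level var(y), unless spell lengths are used as weights. A toy check (plain numpy, my own sketch, not pytwoway code):

```python
import numpy as np

# One 3-period spell with constant outcome, one 1-period spell.
y = np.array([1.0, 1.0, 1.0, 5.0])
spell = np.array([0, 0, 0, 1])

spells = np.unique(spell)
means = np.array([y[spell == s].mean() for s in spells])     # spell means
weights = np.array([(spell == s).sum() for s in spells])     # spell lengths

var_obs = y.var()                # observation-level var(y)
var_collapsed = means.var()      # unweighted variance of collapsed outcomes
wmean = np.average(means, weights=weights)
var_weighted = np.average((means - wmean) ** 2, weights=weights)

# With no within-spell variation, spell-length weights recover the
# observation-level variance; unweighted collapsing overstates it here.
print(var_obs, var_collapsed, var_weighted)
```

So I'd expect the collapsed decomposition to be weighted, but even so the gap I see (var(eps) above var(y)) seems far too large to come from weighting alone.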
3. BLM with no controls and without collapsing at the spell level (analogous to case 1)
My setup is very basic:
```python
import numpy as np
import pandas as pd
import bipartitepandas as bpd
import pytwoway as tw

input_df = pd.DataFrame({
    'i': panel['cluster_id'],
    'j': panel['hgroup_id_temp'],
    'y': panel['patents'],
    't': panel['year']
})

nl = 3  # Number of worker types
nk = 4  # Number of firm types
blm_params = tw.blm_params({
    'nl': nl, 'nk': nk,
    'a1_mu': 0, 'a1_sig': 1.5, 'a2_mu': 0, 'a2_sig': 1.5,
    's1_low': 0.5, 's1_high': 1.5, 's2_low': 0.5, 's2_high': 1.5
})
cluster_params = bpd.cluster_params({
    'measures': bpd.measures.CDFs(),
    'grouping': bpd.grouping.KMeans(n_clusters=nk),
    'is_sorted': True,
    'copy': False
})
clean_params = bpd.clean_params({
    'drop_returns': 'returners',
    'copy': False
})

# Clean, collapse, cluster, and convert to event-study format
bdf = bpd.BipartiteDataFrame(input_df, track_id_changes=True)
bdf = bdf.clean(clean_params).collapse(is_sorted=True)
bdf = bdf.cluster(cluster_params)
bdf = bdf.to_eventstudy(is_sorted=True, copy=False)

# Split into movers and stayers
movers = bdf.get_worker_m(is_sorted=True)
jdata = bdf.loc[movers, :]
sdata = bdf.loc[~movers, :]

# Fit BLM, then run the variance decomposition and save results
blm_fit = tw.BLMEstimator(blm_params)
blm_fit.fit(jdata=jdata, sdata=sdata, n_init=100, n_best=5, ncore=12)
blm_model = blm_fit.model

var_decomp = tw.BLMVarianceDecomposition(blm_params)
var_decomp.fit(jdata=jdata, sdata=sdata, blm_model=blm_model, n_samples=100,
               Q_var=[tw.Q.VarPsi(), tw.Q.VarAlpha()], ncore=12)

results = {k: np.mean(v) for k, v in var_decomp.res.items()}
results_df = pd.DataFrame.from_dict(results, orient='index', columns=['mean_value'])
results_df.to_csv(temp_dir / "blm_variance_decomposition.csv")
```
Once again, var(eps) exceeds var(y).
What do you think could be going on? And do you have any suggestions for fixing it?
I haven't noticed a big difference from increasing the number of draws and iterations from the defaults to the levels I have now. They're not as high as they could be: I kept them manageable while I was still trying out different approaches. Do you think more draws/iterations would help? How high would you go? For reference, my non-collapsed connected set has about 3 million observations for 265k workers and 30k firms.
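On the draws question, my (possibly wrong) understanding is that the HO/HE corrections rely on a Hutchinson-style stochastic trace estimator, whose Monte Carlo error shrinks only like 1/sqrt(ndraw), so quadrupling the draws should roughly halve the noise rather than eliminate it. A standalone illustration of that scaling (plain numpy, not pytwoway internals):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n))
A = A @ A.T / n                      # a PSD matrix with a computable trace
true_trace = np.trace(A)

def hutchinson(A, ndraw, rng):
    """Estimate tr(A) as the mean of z' A z over Rademacher draws z."""
    z = rng.choice([-1.0, 1.0], size=(A.shape[0], ndraw))
    return np.einsum('ik,ik->k', z, A @ z).mean()

# Error should fall roughly like 1/sqrt(ndraw)
for ndraw in (50, 200, 800):
    est = hutchinson(A, ndraw, rng)
    print(ndraw, est, abs(est - true_trace))
```

If that intuition is right, more draws would tighten the corrections but shouldn't flip a sign or push var(eps) above var(y), which makes me suspect the problem is elsewhere.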
Could it be a version-compatibility issue with some of the package dependencies?

