
variance decomposition issues across several models #60

@pierpaolocreanza

Description


Hi there,

Thank you for putting together this package! I'm trying to estimate several two-way models for my application and I get puzzling results. Here's a summary by case.

1. Standard AKM, without collapsing at the spell level and using leave-out-observation methods for HE.

The HE-corrected estimates do not sum to one. The most suspicious change relative to the HO correction is in var(eps), but it is not obvious that that is where the problem lies.
[screenshot: variance decomposition results, case 1]

This is my pytwoway setup for these results:

import pandas as pd
import bipartitepandas as bpd
import pytwoway as tw

# Prepare input DataFrame for PyTwoWay
input_df = pd.DataFrame({
    'i': panel['cluster_id'].values,
    'j': panel['hgroup_id_temp'].values,
    'y': panel['outcome'].values,
    't': panel['year'].values
})

# Set cleaning parameters
clean_params = bpd.clean_params({
    'connectedness': 'leave_out_observation',
    'drop_single_stayers': True,
    'collapse_at_connectedness_measure': False,
    'copy': False
})

bdf = bpd.BipartiteDataFrame(input_df, track_id_changes=True)

# Clean the data
bdf = bdf.clean(clean_params)

# Estimation params
fe_params = tw.fe_params({
    'ncore': 12,
    'ndraw_trace_sigma_2': 200,
    'ndraw_trace_ho': 200,
    'he': True,
    'ndraw_trace_he': 200,
    'ndraw_lev_he': 1800,
    'attach_fe_estimates': True,
    'preconditioner': 'ichol'
})

# Variance components to estimate
fe_params['Q_var'] = [tw.Q.VarPsi(), tw.Q.VarAlpha()]
fe_params['Q_cov'] = [tw.Q.CovPsiAlpha()]

# Initialize and fit the FE estimator, then save results
fe_estimator = tw.FEEstimator(bdf, fe_params)
fe_estimator.fit()
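For concreteness, this is the identity I expect the corrected estimates to satisfy. The dictionary keys below are just shorthand for the columns in the screenshot (placeholder numbers, not my actual estimates, and not necessarily the literal `fe_estimator.res` keys):

```python
# Plug-in check of the decomposition identity
# var(y) = var(psi) + var(alpha) + 2*cov(psi, alpha) + var(eps).
res = {
    'var(y)': 1.00,
    'var(psi)_he': 0.20,
    'var(alpha)_he': 0.45,
    'cov(psi, alpha)_he': 0.05,
    'var(eps)_he': 0.25,
}

explained = (res['var(psi)_he'] + res['var(alpha)_he']
             + 2 * res['cov(psi, alpha)_he'] + res['var(eps)_he'])
share = explained / res['var(y)']
print(f"components sum to {share:.3f} of var(y)")  # should be ~1
```

With my HE numbers this share is visibly different from one, which is what prompted this issue.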

2. Standard AKM, collapsing at the spell level and using leave-out-spell methods for HE.

I run exactly the same code on the same data as above but change the clean_params to:

clean_params = bpd.clean_params({
    'connectedness': 'leave_out_spell',
    'drop_single_stayers': True,
    'collapse_at_connectedness_measure': True,
    'copy': False
})
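For what it's worth, my understanding of what collapsing at the spell level does, as a plain-pandas sketch (this is my mental model, not the bipartitepandas implementation):

```python
import pandas as pd

# Toy panel: worker i, firm j, outcome y, year t
df = pd.DataFrame({
    'i': [1, 1, 1, 1, 2, 2],
    'j': [10, 10, 20, 20, 10, 10],
    'y': [1.0, 3.0, 2.0, 4.0, 5.0, 7.0],
    't': [2000, 2001, 2002, 2003, 2000, 2001],
})

# A spell = a maximal run of consecutive observations of a worker at one firm
df = df.sort_values(['i', 't'])
new_spell = (df['i'] != df['i'].shift()) | (df['j'] != df['j'].shift())
df['spell'] = new_spell.cumsum()

# Collapse: one row per spell with the mean outcome
collapsed = df.groupby('spell').agg(
    i=('i', 'first'), j=('j', 'first'),
    y=('y', 'mean'), t=('t', 'first'),
).reset_index(drop=True)
print(collapsed)  # 3 spells: (1,10) y=2.0, (1,20) y=3.0, (2,10) y=6.0
```

If that understanding is wrong, it might explain why my collapsed results look so different from case 1.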

Yet my results are even more puzzling: var(eps) now exceeds var(y) for all models---even plain FE---and by a large amount, and var(alpha)_he is negative, which clearly cannot be the case.

[screenshot: variance decomposition results, case 2]
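One hypothesis I have is that the stochastic trace approximations behind the bias corrections are simply noisy at my draw counts, and that a noisy correction subtracted from a small component could push it negative. As a toy illustration of how a Hutchinson-type trace estimator fluctuates with few draws (an analogy only, not pytwoway's internals):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n))
A = A @ A.T / n  # symmetric PSD matrix with a well-defined trace

def hutchinson(A, ndraw, rng):
    # Rademacher probes: E[z' A z] = tr(A), variance shrinks with ndraw
    z = rng.choice([-1.0, 1.0], size=(A.shape[0], ndraw))
    return np.mean(np.einsum('id,id->d', z, A @ z))

true_tr = np.trace(A)
for ndraw in (10, 200, 5000):
    est = hutchinson(A, ndraw, rng)
    print(ndraw, est, abs(est - true_tr) / true_tr)
```

The relative error shrinks roughly like 1/sqrt(ndraw), which is why I ask below whether raising the draw counts is likely to matter.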

3. BLM with no controls and without collapsing at the spell level (analogous to case 1)

My setup is very basic:

import numpy as np
import pandas as pd
import bipartitepandas as bpd
import pytwoway as tw

input_df = pd.DataFrame({
    'i': panel['cluster_id'],
    'j': panel['hgroup_id_temp'],
    'y': panel['patents'],
    't': panel['year']
})

nl = 3  # Number of worker types
nk = 4  # Number of firm types
blm_params = tw.blm_params({
    'nl': nl, 'nk': nk,
    'a1_mu': 0, 'a1_sig': 1.5, 'a2_mu': 0, 'a2_sig': 1.5,
    's1_low': 0.5, 's1_high': 1.5, 's2_low': 0.5, 's2_high': 1.5
})
cluster_params = bpd.cluster_params({
    'measures': bpd.measures.CDFs(),
    'grouping': bpd.grouping.KMeans(n_clusters=nk),
    'is_sorted': True,
    'copy': False
})
clean_params = bpd.clean_params({
    'drop_returns': 'returners',
    'copy': False
})
bdf = bpd.BipartiteDataFrame(input_df, track_id_changes=True)
bdf = bdf.clean(clean_params).collapse(is_sorted=True)
bdf = bdf.cluster(cluster_params)
bdf = bdf.to_eventstudy(is_sorted=True, copy=False)

# Split movers (jdata) and stayers (sdata)
movers = bdf.get_worker_m(is_sorted=True)
jdata = bdf.loc[movers, :]
sdata = bdf.loc[~movers, :]

blm_fit = tw.BLMEstimator(blm_params)
blm_fit.fit(jdata=jdata, sdata=sdata, n_init=100, n_best=5, ncore=12)
blm_model = blm_fit.model

var_decomp = tw.BLMVarianceDecomposition(blm_params)
var_decomp.fit(
    jdata=jdata, sdata=sdata, blm_model=blm_model, n_samples=100,
    Q_var=[tw.Q.VarPsi(), tw.Q.VarAlpha()], ncore=12
)

# Average the decomposition over samples and save
results = {k: np.mean(v) for k, v in var_decomp.res.items()}
results_df = pd.DataFrame.from_dict(results, orient='index', columns=['mean_value'])
results_df.to_csv(temp_dir / "blm_variance_decomposition.csv")
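In case the jdata/sdata split matters here: my understanding is that a mover is simply a worker observed at more than one distinct firm, i.e. (plain pandas, not what `get_worker_m` literally does internally):

```python
import pandas as pd

df = pd.DataFrame({
    'i': [1, 1, 2, 2, 3],
    'j': [10, 20, 10, 10, 30],
})

# A mover is a worker observed at more than one distinct firm
n_firms = df.groupby('i')['j'].transform('nunique')
movers = df[n_firms > 1]    # -> analogous to jdata
stayers = df[n_firms == 1]  # -> analogous to sdata
print(len(movers), len(stayers))  # 2 3
```

If the split should instead be done on the event-study rows in some other way, that could be part of my problem.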

Once again, var(eps) exceeds var(y):

[screenshot: BLM variance decomposition results, case 3]

What do you think could be going on? And do you have any suggestions for fixing it?
I haven't noticed a big difference from increasing the number of draws and iterations from the defaults to the levels I have now. They're not as high as they could be---I kept them manageable while I was still trying out different approaches. Do you think more draws/iterations would help? How high would you go? For reference, my non-collapsed connected set has about 3 million observations for 265k workers and 30k firms.
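For context on those sizes: the connected set I refer to is the one linked by worker moves, which I think of as the largest component of the worker-firm graph (a toy union-find sketch of that notion, not the leave-one-out set pytwoway actually computes):

```python
import pandas as pd

def largest_connected_firm_set(df):
    """Largest set of firms linked by workers who move between them."""
    parent = {}

    def find(x):
        while parent.setdefault(x, x) != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Link all firms that share a worker
    for firms in df.groupby('i')['j'].unique():
        for a, b in zip(firms, firms[1:]):
            union(a, b)

    comps = pd.Series({j: find(j) for j in df['j'].unique()})
    biggest = comps.value_counts().idxmax()
    return set(comps[comps == biggest].index)

df = pd.DataFrame({'i': [1, 1, 2, 2, 3],
                   'j': [10, 20, 20, 30, 40]})
print(largest_connected_firm_set(df))  # firms 10, 20, 30; firm 40 is isolated
```

I mention this only because the leave-out restriction prunes this set further, and I wonder whether how much gets pruned differs across my three cases.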

Could it be a version compatibility issue with some of the package dependencies?
