Clarifying Sorkin cleaning procedure

Hello! Thanks for putting together this package. 

I'm following the instructions for the Sorkin estimator [here](https://hub.2i2c.mybinder.org/user/tlamadon-pytwoway-f670pbis/notebooks/docs/source/notebooks/sorkin_example.ipynb).  

In my actual sample, some workers exit the sample and later return. For example, I might see a worker move from firm A to firm B, and later from firm C to firm D, but miss their move from firm B to firm C.
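
To make the distinction concrete, here is a minimal sketch in plain Python (not bipartitepandas code) that splits each worker's firm changes into visible EE moves and gap-spanning transitions. It assumes observations are `(worker, period, firm)` tuples and that a change of firm counts as visible only when the two observations fall in adjacent periods; `classify_transitions` is a hypothetical helper.

```python
from itertools import groupby


def classify_transitions(observations):
    """Split each worker's firm changes into visible EE moves and gap-spanning transitions.

    observations: iterable of (worker, period, firm) tuples, one per worker-period.
    Assumption: a move is "visible" only if the two observations are in adjacent
    periods; if periods are skipped, the worker may have been unemployed or at an
    unobserved firm in between.
    """
    visible, gap_spanning = [], []
    # Sort by worker, then period, so consecutive rows are consecutive spells
    rows = sorted(observations)
    for worker, spells in groupby(rows, key=lambda r: r[0]):
        spells = list(spells)
        for (_, t1, j1), (_, t2, j2) in zip(spells, spells[1:]):
            if j1 == j2:
                continue  # same firm in both observations (even across a gap)
            if t2 - t1 == 1:
                visible.append((worker, j1, j2))
            else:
                gap_spanning.append((worker, j1, j2))
    return visible, gap_spanning


# The example above: A -> B observed, B -> C missed (2003 unobserved), C -> D observed
obs = [(1, 2001, "A"), (1, 2002, "B"), (1, 2004, "C"), (1, 2005, "D")]
visible, gap = classify_transitions(obs)
print(visible)  # [(1, 'A', 'B'), (1, 'C', 'D')]
print(gap)      # [(1, 'B', 'C')]
```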

Because Sorkin models these transitions as discrete choices, I think the strongly connected set should be constructed using only **visible** EE moves. (One important problem, for example, would arise if the worker moved from B into unemployment before moving to C, so that the B-to-C transition is not an EE move at all.)
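
A sketch of the construction I have in mind, again in plain Python rather than through the package: compute the largest strongly connected set of firms (here with Kosaraju's algorithm) from a list of `(origin_firm, destination_firm)` moves, passing in only the visible moves. The function name and the toy data are hypothetical, chosen only to show how a single gap-spanning transition can enlarge the set.

```python
from collections import defaultdict

_DONE = object()  # sentinel marking an exhausted child iterator


def largest_strongly_connected_set(moves):
    """Return the largest strongly connected set of firms (Kosaraju's algorithm).

    moves: iterable of (origin_firm, destination_firm) pairs. Pass only visible
    EE moves so that gap-spanning transitions cannot create connections.
    """
    graph, reverse = defaultdict(set), defaultdict(set)
    nodes = set()
    for j1, j2 in moves:
        graph[j1].add(j2)
        reverse[j2].add(j1)
        nodes.update((j1, j2))

    # Pass 1: iterative DFS on the graph, recording nodes in order of completion
    order, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, children = stack[-1]
            child = next(children, _DONE)
            if child is _DONE:
                stack.pop()
                order.append(node)
            elif child not in seen:
                seen.add(child)
                stack.append((child, iter(graph[child])))

    # Pass 2: DFS on the reversed graph in reverse completion order;
    # each search tree found here is one strongly connected component
    best, assigned = set(), set()
    for start in reversed(order):
        if start in assigned:
            continue
        component, stack = set(), [start]
        assigned.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for parent in reverse[node]:
                if parent not in assigned:
                    assigned.add(parent)
                    stack.append(parent)
        if len(component) > len(best):
            best = component
    return best


visible = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
gap_spanning = [("D", "A")]  # hypothetical unobserved D -> A transition
print(sorted(largest_strongly_connected_set(visible)))                 # ['A', 'B', 'C']
print(sorted(largest_strongly_connected_set(visible + gap_spanning)))  # ['A', 'B', 'C', 'D']
```

In the toy data, firm D only joins the strongly connected set through the unobserved D-to-A transition.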

The current order of steps is:
```python
# Clean
bdf = bdf.clean(clean_params)
# Collapse
bdf = bdf.collapse(is_sorted=True, copy=False)
# Convert to event study format
bdf = bdf.to_eventstudy(is_sorted=True, copy=False)
```

With this order of steps, I think the cleaning step that finds the strongly connected set runs before the conversion to event study format, so the events that actually appear in the final data may not form a strongly connected set. Would doing the event study conversion before cleaning fix this? Would that be a valid procedure?
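
One way to check the concern directly, sketched under the same assumptions (plain tuples, hypothetical helper names, not part of bipartitepandas): after cleaning, test whether the visible moves restricted to the retained firms still form a strongly connected graph. A directed graph is strongly connected exactly when some root reaches every node and every node reaches that root, which two breadth-first searches can verify.

```python
from collections import defaultdict, deque


def _reachable(adjacency, start):
    """Breadth-first search: return the set of nodes reachable from start."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_strongly_connected(moves, firms):
    """Check that `moves` connect every firm in `firms` in both directions.

    moves: (origin_firm, destination_firm) pairs, e.g. the visible EE moves that
    remain after cleaning. Moves touching firms outside `firms` are ignored.
    """
    firms = set(firms)
    if not firms:
        return True
    forward, backward = defaultdict(set), defaultdict(set)
    for j1, j2 in moves:
        if j1 in firms and j2 in firms and j1 != j2:
            forward[j1].add(j2)
            backward[j2].add(j1)
    root = next(iter(firms))
    # Strongly connected iff the root reaches every firm and every firm reaches the root
    return _reachable(forward, root) == firms and _reachable(backward, root) == firms


visible = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
gap_spanning = [("D", "A")]  # hypothetical unobserved D -> A transition
firms = {"A", "B", "C", "D"}
print(is_strongly_connected(visible + gap_spanning, firms))  # True
print(is_strongly_connected(visible, firms))                 # False
```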

Clarifying Sorkin cleaning procedure #64

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Clarifying Sorkin cleaning procedure #64

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions