HAC Standard Errors #1000
Conversation
Regarding tests, you can take a look at these: pyfixest/tests/test_vs_fixest.py, line 207 at 72b57a7.
Basically, we call r-fixest via rpy2 and then compare results. We could write a similar test for HAC and test fewer formulas?
… (else numba does not compile)
To do's:
Also, should we require a time_id value to be set in the vcov_kwargs for the HAC estimator? Advantage: we make very explicit what happens. Disadvantage: if users run multiple models, sorting the data just once before feols() should improve performance. Maybe a compromise would be to add a warning that informs users about the default behavior when they do not pass a time_id arg?
Hmm, I can definitely see the advantage of requiring the time_id value. But would the performance improvement be that large? I will add to the documentation.
@s3alfisc I'm working on the implementation for the Newey-West panel meat. Let me know what you think of this function for checking balance; it's different from what
Oh wait a second, I actually have some code to share! Will push it now. One sec
Changed the following:

What we still have to do:
    "The function argument `separation_check` must be a list of strings containing 'fe' and/or 'ir'."
)

if vcov_kwargs is not None:
I am wondering if we should keep all these checks here, or move them to the vcov method? Users can call vcov post-estimation and provide new arguments, and we would like to check those as well.
"""
return _capture_context(context + 2) if isinstance(context, int) else context

def _check_balanced(panel_arr: np.ndarray, time_arr: np.ndarray) -> bool:
Can you add a unit test for it?
Btw, I think we could also have done this easily in pandas with a groupby(["panel_id", "time_id"]).size() and thrown an error whenever we see a count that is not one.
Plus, please use the @njit decorator; maybe we can also parallelize?
Other comment: as we are already sorting in the _meat_hac_panel function, maybe we should call this function there and exploit the fact that we have sorted once? That should make things more efficient, right?
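For reference, the pandas variant suggested above could look like this. The column names `panel_id`/`time_id` and the function name are assumptions; the duplicate check follows the groupby idea, and balance is then a matter of every panel being observed in every time period:

```python
import pandas as pd


def check_balanced_pd(df: pd.DataFrame) -> bool:
    """Sketch of a pandas balance check (not the pyfixest implementation).

    First rule out duplicate (panel_id, time_id) pairs via
    groupby(...).size(), then check that every panel is observed in
    every time period.
    """
    counts = df.groupby(["panel_id", "time_id"]).size()
    if (counts != 1).any():
        raise ValueError("Duplicate (panel_id, time_id) observations.")
    obs_per_panel = df.groupby("panel_id")["time_id"].nunique()
    return bool((obs_per_panel == df["time_id"].nunique()).all())
```

The numba version would stay preferable on the hot path, but a pandas check like this is easy to unit-test against.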
I guess I don't need this anymore after your latest commit.
I think we should just implement DK-HAC as well in this PR while we still have this fresh in our minds. This way we can resolve more issues and potentially push cleaner, more optimized code. I'll take a look at the nesting thing in fixest and see what we're missing.
Do we want to implement the same weight logic as nw_meat and fixest here, with w[0] = 0.5 and the loop going from 0 to lag + 1?
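For context, a minimal sketch of that weight logic and how it would be used. Function names are illustrative, not the PR's code; the halved lag-0 weight lets a single loop add Gamma_l + Gamma_l' for every l without double-counting the symmetric l = 0 term:

```python
import numpy as np


def bartlett_weights(lag: int) -> np.ndarray:
    """Bartlett-kernel weights w_l = 1 - l / (lag + 1) for l = 0..lag,
    with w[0] halved so the symmetric accumulation below does not count
    the lag-0 autocovariance twice (this mirrors, as I understand it,
    what fixest's nw_meat does)."""
    w = 1.0 - np.arange(lag + 1) / (lag + 1)
    w[0] = 0.5
    return w


def nw_meat_sketch(scores: np.ndarray, lag: int) -> np.ndarray:
    """Minimal Newey-West meat for a single (T x k) series of scores;
    a sketch under the consecutive-time assumption, not the pyfixest
    implementation."""
    T, k = scores.shape
    w = bartlett_weights(lag)
    meat = np.zeros((k, k))
    for l in range(lag + 1):
        gamma = scores[l:].T @ scores[: T - l]  # lag-l autocovariance
        meat += w[l] * (gamma + gamma.T)  # w[0] = 0.5 makes l = 0 exact
    return meat
```

With lag = 0 this reduces to the plain sandwich meat `scores.T @ scores`, which is a handy sanity check for a unit test.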
Looks like fixest by default treats all time lags as consecutive: link. Edit: it looks like these options were only added in fixest 0.13, which is currently in dev. I suggest we wait until fixest 0.13.0 is released and then add all of this logic for handling lags, and until then assume that time series data is always consecutive?
Sounds good to me.
Yes, though I think it is not 100% needed, as we basically match fixest already?
pyfixest/estimation/vcov_utils.py
time_periods, k = ordered_scores.shape
time_scores = np.zeros((time_periods, k))
I think this aggregation is wrong, no? We need to aggregate over a fixed t for each unit?
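The aggregation being discussed, summing score rows over units within each fixed time period, could be sketched as follows. Function and argument names are assumptions for illustration:

```python
import numpy as np


def aggregate_scores_by_time(scores: np.ndarray, time_arr: np.ndarray) -> np.ndarray:
    """Sketch: for each fixed time period t, sum the score rows of all
    units observed at t, yielding a (T x k) matrix of per-period score
    sums (not the pyfixest implementation)."""
    periods = np.unique(time_arr)  # sorted unique time periods
    k = scores.shape[1]
    time_scores = np.zeros((periods.size, k))
    for i, t in enumerate(periods):
        time_scores[i] = scores[time_arr == t].sum(axis=0)
    return time_scores
```

The key point is that the reduction runs over units at a fixed t, so each row of the result pools all cross-sectional information for one period.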
Fixed with 707909a
New clean PR, updated for changes to master. Addresses #675. @s3alfisc let me know what you think. Hoping to wrap this up this weekend.
If you could review the _nw_meat function in vcov_utils.py, that would be great. Just want to confirm my understanding and the implementation.