Description
Describe the warning
IMO, we should stop using `query` and switch fully to `loc` when filtering data in chained operations. As an example, see this comment.
To summarize, not only does `loc` seem to be faster, it also makes it clearer to both users and automated tools exactly what is going on. For users, `query` uses syntax that is neither plain Python nor plain pandas; in particular, `query` interprets `and` as the element-wise operator `&` rather than as Python's boolean operator `and`. For automated tools, `query` requires you to put the expression into a string, which means that linters and similar tools can't detect every usage of a variable. This is annoying when refactoring, since the tool will miss some occurrences, but more importantly it also hides variables from debugging tools.
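To make the comparison concrete, here is a minimal sketch of the same chained filter written both ways. The frame, the column names, and `min_weight` are made up for illustration and are not taken from the codebase.

```python
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame(
    {
        "species": ["cat", "dog", "cat", "bird"],
        "weight": [4.0, 20.0, 5.5, 0.1],
    }
)
min_weight = 1.0

# query: the condition lives in a string. `and` is evaluated element-wise
# (like `&`), and the local variable is only reachable through `@`.
via_query = (
    df
    .assign(weight_g=lambda d: d["weight"] * 1000)
    .query("species == 'cat' and weight > @min_weight")
)

# loc with a callable: ordinary Python and pandas expressions, so linters,
# refactoring tools, and debuggers can see `min_weight`.
via_loc = (
    df
    .assign(weight_g=lambda d: d["weight"] * 1000)
    .loc[lambda d: (d["species"] == "cat") & (d["weight"] > min_weight)]
)

# Both spellings select the same rows.
assert via_query.equals(via_loc)
```

Passing a callable to `loc` keeps the filter inside the method chain, which is the main convenience `query` otherwise offers.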
Also, per the link above, `query` is about 50% slower for simple operations on large (10M rows) datasets. More rigorous investigation is needed here, though, as apparently this was not always the case.
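As a starting point for that investigation, a rough timing harness could look like the sketch below. The helper names (`make_frame`, `filter_query`, `filter_loc`, `compare`), the columns, and the default row count are assumptions chosen so the script runs quickly; they are not from the linked comment. Results will vary with the pandas version and with whether `numexpr` is installed, so no expected numbers are given.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(n_rows: int, seed: int = 0) -> pd.DataFrame:
    """Build a synthetic frame with one integer and one float column."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        {"a": rng.integers(0, 100, size=n_rows), "b": rng.random(n_rows)}
    )


def filter_query(df: pd.DataFrame) -> pd.DataFrame:
    return df.query("a > 50 and b < 0.5")


def filter_loc(df: pd.DataFrame) -> pd.DataFrame:
    return df.loc[lambda d: (d["a"] > 50) & (d["b"] < 0.5)]


def compare(n_rows: int = 100_000, repeats: int = 5) -> dict:
    """Return the best-of-`repeats` wall time in seconds for each approach."""
    df = make_frame(n_rows)
    # Sanity check: both filters must select the same rows.
    assert filter_query(df).equals(filter_loc(df))
    t_query = min(timeit.repeat(lambda: filter_query(df), number=1, repeat=repeats))
    t_loc = min(timeit.repeat(lambda: filter_loc(df), number=1, repeat=repeats))
    return {"query": t_query, "loc": t_loc}


if __name__ == "__main__":
    # Raise n_rows (e.g. to 10_000_000) to approach the size mentioned above.
    print(compare())
```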