[Warning] Usage of query versus loc in chained operators #132

@Eric-Liu-SANDAG

Description

Describe the warning

IMO, we should stop using `query` and switch fully to `loc` when filtering data in chained operators. As an example, see this comment

To summarize, not only does `loc` seem to be faster, but it also makes it clearer to both users and automated tools exactly what is going on. For users, `query` uses syntax that is neither standard Python nor standard pandas; in particular, `query` interprets `and` as the bitwise operator `&` instead of the actual Python boolean operator `and`. For automated tools, `query` requires you to put the condition into a string, which means that linters and similar tools can't detect all the usages of a variable. This can be annoying when refactoring, as the tool would miss some occurrences, but more importantly it also hides variables from debugging tools.
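To make the comparison concrete, here is a minimal sketch of the same filter written both ways inside a method chain. The DataFrame, its column names, and the `min_age` threshold are made up for illustration and are not taken from this repository.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame(
    {"age": [25, 40, 61, 17], "income": [30_000, 85_000, 52_000, 0]}
)
min_age = 18  # hypothetical threshold

# query: the condition lives in a string, `and` is treated like `&`,
# and the local variable is only reachable through the `@` prefix,
# so linters and refactoring tools do not see `min_age` being used.
via_query = df.assign(income_k=lambda d: d["income"] / 1000).query(
    "age >= @min_age and income_k > 40"
)

# loc with a callable: plain Python and pandas expressions,
# still usable in the middle of a chain.
via_loc = df.assign(income_k=lambda d: d["income"] / 1000).loc[
    lambda d: (d["age"] >= min_age) & (d["income_k"] > 40)
]

# Both approaches select the same rows.
assert via_query.equals(via_loc)
```

Passing a callable to `loc` keeps the chain intact because the callable receives the intermediate DataFrame, so no temporary variable is needed.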

Also, per the link above, `query` is about 50% slower when doing simple operations on large (10M rows) datasets. Some more rigorous investigation should be done here, though, as apparently this was not always the case.
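As a starting point for that investigation, a rough timing sketch could look like the following. The helper name `compare_filters`, the column `x`, and the cutoff of 50 are all assumptions for illustration; results will depend on the pandas version, on whether `numexpr` is installed, and on the data, so this is not a definitive benchmark.

```python
import timeit

import numpy as np
import pandas as pd


def compare_filters(n_rows=10_000_000, repeats=5):
    """Return the best-of-`repeats` wall time (seconds) for query and loc."""
    rng = np.random.default_rng(0)
    df = pd.DataFrame({"x": rng.integers(0, 100, size=n_rows)})

    # Time the string-based filter.
    t_query = min(
        timeit.repeat(lambda: df.query("x > 50"), number=1, repeat=repeats)
    )
    # Time the equivalent callable-based loc filter.
    t_loc = min(
        timeit.repeat(
            lambda: df.loc[lambda d: d["x"] > 50], number=1, repeat=repeats
        )
    )
    return t_query, t_loc


# A smaller size keeps this quick; pass n_rows=10_000_000 to match the issue.
t_query, t_loc = compare_filters(n_rows=100_000)
print(f"query: {t_query:.4f}s  loc: {t_loc:.4f}s")
```

Taking the minimum over several repeats reduces noise from other processes; a more careful study would also vary the row count and the complexity of the condition.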
