AIP-84 Limit and filters not applied correctly for dag runs API. #46572

Closed
1 of 2 tasks
tirkarthi opened this issue Feb 7, 2025 · 5 comments · Fixed by #46619
Labels
AIP-84 Modern Rest API area:API Airflow's REST/HTTP API kind:bug This is a clearly a bug

@tirkarthi
Contributor

Apache Airflow version

main (development)

If "Other Airflow 2 version" selected, which one?

No response

What happened?

The limit filter does not seem to be honored by the dag runs API. Although limit=14 is passed, all dag runs are returned. This is slightly tricky to reproduce: the limit is passed correctly to the API and the request works through curl, but it does not work through the UI. Ref #46504 (comment)

curl http://localhost:8000/public/dags/tutorial_taskflow_api/dagRuns\?limit\=14

What you think should happen instead?

No response

How to reproduce

  1. Visit the dags page for a dag with more than 14 runs; the limit filter is not honored.
  2. curl http://localhost:8000/public/dags/tutorial_taskflow_api/dagRuns\?limit\=14

Operating System

Ubuntu 20.04.3 LTS

Versions of Apache Airflow Providers

No response

Deployment

Virtualenv installation

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!


@tirkarthi tirkarthi added AIP-84 Modern Rest API kind:bug This is a clearly a bug labels Feb 7, 2025
@dosubot dosubot bot added the area:API Airflow's REST/HTTP API label Feb 7, 2025
@insomnes
Contributor

insomnes commented Feb 9, 2025

From your initial PR conversation, I understood that you are referring to the part where you extract total_entries from the response, and that it shows the full count even though the limit is set to 14.

If so, I believe this number is not affected by the pagination limit and offset; total_entries is based on a count over the filters= fields.

    dag_run_select, total_entries = paginated_select(
        statement=query,
        filters=[logical_date, start_date_range, end_date_range, update_at_range, state],
        order_by=order_by,
        offset=offset,
        limit=limit,
        session=session,
    )
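To illustrate the point about total_entries, here is a minimal sketch over a plain Python list rather than a SQL query. The name paginate_sketch and the sample rows are hypothetical and only mirror the shape of the call above; this is not the actual paginated_select implementation.

```python
def paginate_sketch(rows, filters, offset, limit):
    """Hypothetical stand-in for paginated_select, operating on a plain list."""
    # Apply every filter first; the total is counted on this filtered set.
    filtered = [row for row in rows if all(f(row) for f in filters)]
    total_entries = len(filtered)
    # Offset and limit only narrow the returned page, not the total.
    page = filtered[offset : offset + limit]
    return page, total_entries


# Made-up data: 127 runs, alternating between two states.
runs = [{"run_id": i, "state": "failed" if i % 2 else "success"} for i in range(127)]

page, total_entries = paginate_sketch(runs, filters=[], offset=0, limit=2)
print(len(page), total_entries)  # 2 127
```

Under that assumption, a response with 2 items in dag_runs and a total_entries of 127 is the expected result, not a sign that the limit was ignored.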

And with curl it works as you mention:

curl run for 2 entries
curl -X 'GET' \
      'http://localhost:29091/public/dags/example_bash_decorator/dagRuns?limit=2&order_by=-start_date' \
      -H 'accept: application/json' | jq
{
"dag_runs": [
  {
    "dag_run_id": "manual__2025-02-08T23:20:51.665298+00:00",
    "dag_id": "example_bash_decorator",
    "logical_date": "2025-02-08T23:20:51.673703Z",
    "queued_at": "2025-02-08T23:20:51.688648Z",
    "start_date": "2025-02-08T23:38:06.692506Z",
    "end_date": "2025-02-08T23:38:09.660110Z",
    "data_interval_start": "2025-02-08T23:20:51.673703Z",
    "data_interval_end": "2025-02-08T23:20:51.673703Z",
    "last_scheduling_decision": "2025-02-08T23:38:09.656040Z",
    "run_type": "manual",
    "state": "failed",
    "external_trigger": true,
    "triggered_by": "rest_api",
    "conf": {},
    "note": null
  },
  {
    "dag_run_id": "manual__2025-02-08T23:04:32.759689+00:00",
    "dag_id": "example_bash_decorator",
    "logical_date": "2025-02-08T23:04:32.759689Z",
    "queued_at": "2025-02-08T23:04:32.784971Z",
    "start_date": "2025-02-08T23:04:33.602888Z",
    "end_date": "2025-02-08T23:04:36.819802Z",
    "data_interval_start": "2025-02-08T23:04:32.759689Z",
    "data_interval_end": "2025-02-08T23:04:32.759689Z",
    "last_scheduling_decision": "2025-02-08T23:04:36.817639Z",
    "run_type": "manual",
    "state": "failed",
    "external_trigger": true,
    "triggered_by": "ui",
    "conf": {},
    "note": null
  }
],
"total_entries": 127
}

Or is the problem that the API returns the wrong number of entries in the dag_runs field?

@pierrejeambrun
Member

I have identified the problem.

I should be able to open a PR to fix that.

@pierrejeambrun pierrejeambrun self-assigned this Feb 10, 2025
@pierrejeambrun pierrejeambrun moved this to In Progress in AIP-84 MODERN REST API Feb 10, 2025
@pierrejeambrun pierrejeambrun added this to the Airflow 3.0.0 milestone Feb 10, 2025
@pierrejeambrun pierrejeambrun changed the title Limit and filters not applied correctly for dag runs API. AIP-84 Limit and filters not applied correctly for dag runs API. Feb 10, 2025
@tirkarthi
Contributor Author

Thanks @pierrejeambrun. From my understanding, QueryLimit and QueryOffset are module-level variables that are initialised only once, so when they are used in type annotations the same LimitFilter object is reused across requests. This is reproducible only in the UI, which sends parallel requests: the limit values of different API calls overwrite each other, and whichever value was written last is the one taken into account. With curl the requests are sequential, so the limit filter object is updated properly. This might also apply to other filters constructed in a similar manner.
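A minimal sketch of that failure mode, under stated assumptions: LimitFilterSketch, SHARED_LIMIT and handle_request are hypothetical names, not Airflow code, and asyncio is used here only to make the interleaving deterministic (the real server may interleave requests differently, for example across threads).

```python
import asyncio


class LimitFilterSketch:
    """Hypothetical stand-in for a filter object that stores one request's limit."""

    def __init__(self):
        self.value = None

    def set_value(self, value):
        self.value = value
        return self


# Created once at import time, like a module-level object reused by every request.
SHARED_LIMIT = LimitFilterSketch()


async def handle_request(limit, make_filter):
    limit_filter = make_filter().set_value(limit)
    # Yield control, as a handler would while waiting on I/O.
    await asyncio.sleep(0)
    # By now another request may have overwritten the shared object.
    return limit_filter.value


async def run_pair(make_filter):
    return await asyncio.gather(
        handle_request(14, make_filter),
        handle_request(100, make_filter),
    )


shared = asyncio.run(run_pair(lambda: SHARED_LIMIT))
fresh = asyncio.run(run_pair(LimitFilterSketch))
print(shared)  # [100, 100] -> the first request lost its limit of 14
print(fresh)   # [14, 100]  -> one object per request keeps the limits apart
```

Building a new filter object per request is one way to avoid the overwrite; it is shown only as an illustration and is not necessarily what the eventual fix does.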

@insomnes
Copy link
Contributor

Sorry for my previous question; the exact problem was initially unclear to me.
Yesterday I was able to reproduce the behavior you have now described, using async requests to the API over a single HTTP session, but I didn't have time to write up what I found.
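For anyone who wants to try this, a rough sketch of a parallel check follows. It uses a thread pool and the standard library rather than the async client I used, the helper names are hypothetical, the URL is the one from the issue description, and it assumes the endpoint is reachable without authentication.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# URL taken from the issue description; adjust host, port and dag id as needed.
BASE_URL = "http://localhost:8000/public/dags/tutorial_taskflow_api/dagRuns"


def fetch_dag_run_count(limit):
    """Return how many dag runs the API sends back for the given limit."""
    with urlopen(f"{BASE_URL}?limit={limit}") as response:
        return len(json.load(response)["dag_runs"])


def find_limit_violations(limits, fetch=fetch_dag_run_count):
    """Send one request per limit in parallel and list (limit, returned) pairs
    where more rows came back than were asked for."""
    with ThreadPoolExecutor(max_workers=len(limits)) as pool:
        counts = list(pool.map(fetch, limits))
    return [(limit, count) for limit, count in zip(limits, counts) if count > limit]


# Needs a running API server with enough dag runs, so it is left commented out:
# print(find_limit_violations([2, 14, 50, 100]))
```

An empty list means every response respected its own limit; any pair in the result points at a request that received more rows than it asked for. A single run without violations does not prove the absence of the race.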

Great to see that the root cause was identified already by Pierre!

@pierrejeambrun
Member

PR: #46619


3 participants