box_events: fix handling of large cursor offsets #14319

Merged
merged 2 commits into from
Jun 29, 2025

Conversation

Contributor

@efd6 efd6 commented Jun 25, 2025

Proposed commit message

box_events: fix handling of large cursor offsets

When a cursor stream offset is large (at least 1e6), the template renders
the value in e-notation. This is a consequence of the cursor being stored
as JSON and so being contaminated by JS number semantics. Another
threshold exists at 0x1p53 (9.0e15) where we lose exact integer
representation. We do see values as large as 3.0e16, so we are beyond
this value and cannot rely on numeric value representation at all.
This is exacerbated by the fact that the input converts from string to
integer values via float64.

To resolve this, explicitly convert the offset to an integer when
rendering the value into the parameter, and accept that we may either
recollect or miss documents from the API.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Screenshots

@efd6 efd6 self-assigned this Jun 25, 2025
@efd6 efd6 added Integration:box_events Box Events bugfix Pull request that fixes a bug issue Team:Security-Service Integrations Security Service Integrations team [elastic/security-service-integrations] labels Jun 25, 2025
@elastic-vault-github-plugin-prod

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

@efd6 efd6 marked this pull request as ready for review June 25, 2025 22:45
@efd6 efd6 requested a review from a team as a code owner June 25, 2025 22:45
@elasticmachine
Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@@ -9,7 +9,8 @@ vars:
# correspond to data_stream
data_stream:
  vars:
    interval: 10s
    stream_type: 'all'
    enable_request_tracer: true
Member

Can you please move enable_request_tracer to be a child of vars instead of data_stream.vars? Currently it's not being honored.

Contributor Author

Why is this not identified by ep? ISTM it is something that could (probably does) happen regularly without mechanical support.

      "type": "event"
    }
  ],
  "next_stream_position": 2152922976252290800
Member

In the request tracer logs I see ?stream_position=2152922976252290816 so I think we have lost precision.

{"log.level":"debug","@timestamp":"2025-06-27T18:08:05.810Z","message":"HTTP request","transaction.id":"HN21F8SIV161G-5","url.original":"http://svc-box-http:8080/2.0/events?stream_position=2152922976252290816&stream_type=all","url.scheme":"http","url.path":"/2.0/events","url.domain":"svc-box-http","url.port":"8080","url.query":"stream_position=2152922976252290816&stream_type=all","http.request.method":"GET","http.request.header":{"Accept":["application/json"],"Authorization":["Bearer c3FIOG9vSGV4VHo4QzAyg5T1JvNnJoZ3ExaVNyQWw6WjRsanRKZG5lQk9qUE1BVQ"],"User-Agent":["Elastic-Filebeat/8.18.2 (linux; arm64; 2651640ff23044732e551dd9139a298e0f833ac1; 2025-05-22 17:09:10 +0000 UTC)"]},"user_agent.original":"Elastic-Filebeat/8.18.2 (linux; arm64; 2651640ff23044732e551dd9139a298e0f833ac1; 2025-05-22 17:09:10 +0000 UTC)","http.request.body.content":"","http.request.body.truncated":false,"http.request.body.bytes":0,"http.request.mime_type":"","ecs.version":"1.6.0"}

This cursor on disk has:

{"k":"httpjson::httpjson-box_events.events-20eb7aed-40ef-4cca-bccb-d27053fcd2dc::http://svc-box-http:8080/2.0/events","v":{"ttl":1800000000000,"updated":[809677759,1751046234],"cursor":{"next_stream_position":"2.1529229762522908e+18"}}}

So I assume that httpjson is not unmarshaling with json.UseNumber. Without using json.Number and avoiding the number -> float64 -> int64 conversion, I'm not sure we can fix this with configuration only.

Contributor Author

Yes, the fix here is a reasonably non-invasive fix to something that is the consequence of some quite unfortunate decisions that are spread throughout the agent, the JSON serialisation spec and the data source. This is all discussed in the issue.

Member

@andrewkroh andrewkroh left a comment

It's better than it was.

I think an input change will be necessary to avoid the mandatory conversion to float64 so that we can pass through the next_stream_position as the literal text of the number.

@efd6 efd6 enabled auto-merge (squash) June 29, 2025 21:42
@efd6 efd6 merged commit d877b4c into elastic:main Jun 29, 2025
5 checks passed
@elasticmachine
💚 Build Succeeded

History

cc @efd6

@elastic-vault-github-plugin-prod

Package box_events - 2.14.1 containing this change is available at https://epr.elastic.co/package/box_events/2.14.1/

shmsr pushed a commit to shmsr/integrations that referenced this pull request Jun 30, 2025