Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch OTLP endpoints to ingest v2 #5283

Merged
merged 11 commits into from
Aug 23, 2024
Merged

Switch OTLP endpoints to ingest v2 #5283

merged 11 commits into from
Aug 23, 2024

Conversation

rdettai
Copy link
Collaborator

@rdettai rdettai commented Jul 31, 2024

Description

Closes #5209

How was this PR tested?

Integration tests

@rdettai rdettai self-assigned this Jul 31, 2024
@rdettai rdettai marked this pull request as draft July 31, 2024 16:32
@rdettai rdettai changed the title Boilerplate for switching OTLP to ingest v2 Switching OTLP endpoints to ingest v2 Jul 31, 2024
@rdettai rdettai changed the title Switching OTLP endpoints to ingest v2 SwitchOTLP endpoints to ingest v2 Jul 31, 2024
@rdettai rdettai changed the title SwitchOTLP endpoints to ingest v2 Switch OTLP endpoints to ingest v2 Jul 31, 2024
Copy link

github-actions bot commented Jul 31, 2024

On SSD:

Average search latency is 0.99x that of the reference (lower is better).
Ref run id: 2963, ref commit: 82c475b
Link

On GCS:

Average search latency is 0.939x that of the reference (lower is better).
Ref run id: 2964, ref commit: 82c475b
Link

@rdettai
Copy link
Collaborator Author

rdettai commented Aug 1, 2024

Integration tests for OTLP are hanging on shutdown

[INFO  quickwit_ingest::ingest_v2::ingester] decommissioning ingester

Likely related to #5068

@rdettai
Copy link
Collaborator Author

rdettai commented Aug 1, 2024

After some further analysis, it seems that the ingester decomissioning freeze happens when we stop the cluster right after the first ingestions. Likely race condition:

  • indexer has no shard, no ingest v2 indexing pipeline is scheduled
  • indexer receives events, creates a shard
  • the control plane is shut down before rebuilding the indexing plan, so the pipeline is never scheduled for the shard
  • the indexer hangs on shutdown because the shard is never indexed

@rdettai
Copy link
Collaborator Author

rdettai commented Aug 1, 2024

Adding a wait after the ingest to give time to the control plane to schedule an indexing pipeline works but:

  • on some runs, the indexing pipeline becomes irresponsive (no extra logs)
  • on runs where the indexing goes through, the decommissioning of the indexer still fails. It looks like it tries to index a new empty batch right in the middle of the teardown, and that gets stuck forever because the control plane and metastore are not available anymore:
[INFO  quickwit-indexing] index-doc-batches; index_id=otel-logs-v0_7 source_id=_ingest-source pipeline_uid=01J47CHTZPH4SX2J4D6HSVY5RH workbench_id=01J47CJDJEAQBP1NHP0GJTGDMP

This can be reproduced with:

cargo test --package quickwit-integration-tests --lib --all-features --tests::index_tests::test_ingest_logs_with_otlp_grpc_api --exact --show-output

Note: in the test_ingest_v2_happy_path test if we don't delete the index before shutting down the cluster we end up with the same error (trying to index an empty batch late in the tear down process)

@rdettai rdettai marked this pull request as ready for review August 2, 2024 13:41
@rdettai rdettai requested a review from fmassot August 2, 2024 13:44
@rdettai
Copy link
Collaborator Author

rdettai commented Aug 2, 2024

Note: the issue mentioned above

on some runs, the indexing pipeline becomes irresponsive (no extra logs)

was not addressed. I didn't manage to consistently reproduce or get more logs.

Copy link
Collaborator

@fmassot fmassot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments.

for span in spans {
if let Err(error) = doc_batch_builder.ingest_doc(&span.0) {
let doc_uid = doc_uid_generator.next_doc_uid();
if let Err(error) = doc_batch_builder.add_doc(doc_uid, span.0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idea: we could have a add_json_doc function on the DocBatchV2Builder.

Copy link
Collaborator Author

@rdettai rdettai Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied over the JsonDocBatchBuilder logic from V1. To use DocBatchV2Builder directly, we would either:

  1. need to first serialize the struct then add thoses bytes to the BytesMut, but that means an extra copy
  2. also use a Writer<BytesMut> in DocBatchV2Builder, but I think it is not 0 cost, otherwise it would have been done that way in V1 (going through the writer involves some extra if statements)

Copy link
Collaborator Author

@rdettai rdettai Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the other review comments, I simplified the JsonDocBatchV2Builder as much as possible by avoiding the unnecessary forth and back with DocBatchV2Builder. I would prefer not to unify the two structures without ensuring that there isn't a performance impact as this is on the critical write path. But measuring this could be a bit complicated. What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine for me!

@rdettai rdettai requested a review from fmassot August 19, 2024 14:10
@rdettai rdettai requested a review from fmassot August 22, 2024 13:15
@rdettai rdettai enabled auto-merge (squash) August 23, 2024 08:00
@rdettai rdettai merged commit d43f013 into main Aug 23, 2024
5 checks passed
@rdettai rdettai deleted the otlp-ingest-v2 branch August 23, 2024 08:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Plug OTLP to ingest v2
2 participants