-
Notifications
You must be signed in to change notification settings - Fork 1k
Implement Library Instrumentation for Apache Iceberg #15114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
...rary/src/main/java/io/opentelemetry/instrumentation/iceberg/v1_8/IcebergMetricsReporter.java
Outdated
Show resolved
Hide resolved
.../library/src/main/java/io/opentelemetry/instrumentation/iceberg/v1_8/ScanMetricsBuilder.java
Outdated
Show resolved
Hide resolved
.../library/src/test/java/io/opentelemetry/instrumentation/iceberg/v1_8/IcebergLibraryTest.java
Outdated
Show resolved
Hide resolved
.../library/src/test/java/io/opentelemetry/instrumentation/iceberg/v1_8/IcebergLibraryTest.java
Outdated
Show resolved
Hide resolved
| implementation("org.apache.iceberg:iceberg-core:1.8.1") { | ||
| artifact { | ||
| classifier = "tests" | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this second implementation needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaydeluca In the abstract unit test class AbstractIcebergTest, I am using the public classes TestTables and TestTable that are defined in the src/test/java directory of the iceberg core project, so the second implementation is meant to give me access to these classes in the unit tests of the testing project.
As a side note: I am not really an expert in Gradle, so it could really be that a better approach exists...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might be ok, but it might be brittle for us to have a dependency on an artifact that they are not publicly publishing. I don't know anything about iceberg, but what if we replaced the test setup with hadoop? if we add the implementation dependency here:
implementation("org.apache.hadoop:hadoop-common:3.4.2")
and then instead of using the test table, you do:
Configuration conf = new Configuration();
this.tables = new HadoopTables(conf);
String tableLocation = new File(tableDir, "test").toString();
this.table =
tables.create(
SCHEMA,
SPEC,
Collections.singletonMap("format-version", String.valueOf(FORMAT_VERSION)),
tableLocation);There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaydeluca yes this works, very good idea! thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaydeluca It turns out that hadoop won't work with a high JDK level without modifying some security settings (see screenshot for concrete error). Therefore, I added maxJavaVersionForTests.set(JavaVersion.VERSION_21) to build.gradke.kts, and with this, the tests work fine.
Alternatively, we can go back to depending on test library of Iceberg if you want. I don't see where the problem is since the dependency is only for testing, and we control the exact iceberg version that we are using for testing. What do you think?

There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think using the test dependency is probably preferable compared to not testing against newer versions of java, so It's probably best to switch back. Can you just include a small comment above the gradle config saying something like "Test classes are not published by default" or something similar that explains why it is necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaydeluca I reverted back to using the Iceberg test classes and added the required comment in the gradle file
Co-authored-by: Jay DeLuca <[email protected]>
...rary/src/main/java/io/opentelemetry/instrumentation/iceberg/v1_8/IcebergMetricsReporter.java
Outdated
Show resolved
Hide resolved
...rary/src/main/java/io/opentelemetry/instrumentation/iceberg/v1_8/IcebergMetricsReporter.java
Outdated
Show resolved
Hide resolved
.../library/src/main/java/io/opentelemetry/instrumentation/iceberg/v1_8/ScanMetricsBuilder.java
Outdated
Show resolved
Hide resolved
....8/library/src/main/java/io/opentelemetry/instrumentation/iceberg/v1_8/IcebergTelemetry.java
Outdated
Show resolved
Hide resolved
...rary/src/main/java/io/opentelemetry/instrumentation/iceberg/v1_8/IcebergMetricsReporter.java
Outdated
Show resolved
Hide resolved
laurit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...erg-1.8/library/src/test/java/io/opentelemetry/instrumentation/iceberg/v1_8/IcebergTest.java
Outdated
Show resolved
Hide resolved
...testing/src/main/java/io/opentelemetry/instrumentation/iceberg/v1_8/AbstractIcebergTest.java
Outdated
Show resolved
Hide resolved
...testing/src/main/java/io/opentelemetry/instrumentation/iceberg/v1_8/AbstractIcebergTest.java
Outdated
Show resolved
Hide resolved
...testing/src/main/java/io/opentelemetry/instrumentation/iceberg/v1_8/AbstractIcebergTest.java
Outdated
Show resolved
Hide resolved
...testing/src/main/java/io/opentelemetry/instrumentation/iceberg/v1_8/AbstractIcebergTest.java
Outdated
Show resolved
Hide resolved
....8/library/src/main/java/io/opentelemetry/instrumentation/iceberg/v1_8/IcebergTelemetry.java
Show resolved
Hide resolved
| implementation("org.apache.iceberg:iceberg-core:1.8.1") { | ||
| artifact { | ||
| classifier = "tests" | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might be ok, but it might be brittle for us to have a dependency on an artifact that they are not publicly publishing. I don't know anything about iceberg, but what if we replaced the test setup with hadoop? if we add the implementation dependency here:
implementation("org.apache.hadoop:hadoop-common:3.4.2")
and then instead of using the test table, you do:
Configuration conf = new Configuration();
this.tables = new HadoopTables(conf);
String tableLocation = new File(tableDir, "test").toString();
this.table =
tables.create(
SCHEMA,
SPEC,
Collections.singletonMap("format-version", String.valueOf(FORMAT_VERSION)),
tableLocation);
...rary/src/main/java/io/opentelemetry/instrumentation/iceberg/v1_8/IcebergMetricsReporter.java
Outdated
Show resolved
Hide resolved
...rary/src/main/java/io/opentelemetry/instrumentation/iceberg/v1_8/IcebergMetricsReporter.java
Outdated
Show resolved
Hide resolved
…est against latest JDK version
Co-authored-by: Jay DeLuca <[email protected]>
This PR aims at creating library instrumentation for Apache Iceberg scan metrics. The figure above summarizes the planned solution (the green classes constitute the instrumentation).
The Apache Iceberg API currently emits two types of metrics:
ScanMetricsthat are emitted when a table scan is executed, andCommitMetricsthat are emitted when modifications to the data or metadata are executed, e.g., insert, update, delete, drop column, time travel, etc.ScanMetricsare straightforward to report using a custom reporter. We can programmatically inject such a reporter into an exitingScanobject as described here. On the contrary,CommitMetricsare difficult to report using a custom reporter due to the absence of a programmatical interface to inject such a reporter. A configuration-based approach exists for this purpose as described here. This PR focuses onScanMetricsonly and creates a library-based instrumentation for them.CommitMetricsand agent-based instrumentation will be the subject of future PRs. Therefore, this PR partially solves #15113