Library crates should propagate errors instead of silently logging them

## Description

### Problem

The OpenTelemetry Rust SDK currently logs errors on behalf of applications, which is inappropriate for library crates. While [ADR-001](docs/adr/001_error_handling.md) provides error handling guidance, it includes a problematic allowance:

> Failures during regular operation should not panic, instead returning errors to the caller where appropriate, **_or_ logging an error if not appropriate**.

**This guidance needs to be updated.** For library crates, it is _never_ appropriate to log errors. Per [CONTRIBUTING.md](CONTRIBUTING.md), the SDK should either return errors to callers or delegate to a global error handler registered by the application. However, many codepaths are logging directly instead, leaving applications unable to respond to failures.

**Example from `span_processor.rs`:**

```rust
fn on_end(&self, span: SpanData) {
    let result = self.exporter.lock().map(|mut exporter| {
        exporter.export(vec![span])
    });

    if let Err(err) = result {
        otel_error!(
            name: "BatchSpanProcessor.Export.Error",
            error = format!("{:?}", err)
        );
    }
}
```

**Why this is problematic:**

Library crates logging on behalf of applications creates several problems:

1. **No error visibility**: Applications cannot detect, count, or respond to failures
2. **No integration**: Errors cannot be integrated with the application's monitoring, alerting, or metrics systems
3. **Inconsistent formatting**: Library logs don't match the application's logging format, style, or context (request IDs, etc.), causing confusion for operators and breaking log ingestion pipelines
4. **Policy violations**: The library makes policy decisions (what to log, when, how) that belong to the application

The Rust standard library and widely used ecosystem crates (`tokio`, `serde`, etc.) return errors and let applications decide how to handle them. OpenTelemetry Rust should follow the same pattern.

### Proposed Solution

**For synchronous operations with a direct caller:**

- Return `OTelSdkResult` or another appropriate error type defined in `opentelemetry_sdk::error`
- Let callers decide whether to log, retry, or propagate errors
- This aligns with the existing `SpanExporter`, `LogExporter`, and `PushMetricExporter` traits, which already return `OTelSdkResult`
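
A minimal sketch of the synchronous case, to make the shape of the change concrete. Everything here is a hypothetical stand-in written for this issue: `SdkError`, `SdkResult`, `Exporter`, `SimpleProcessor`, and `export_one` are not the SDK's real definitions, and spans are modelled as plain `String`s. The point is only that a poisoned lock and a failed export both reach the caller as a value instead of a log line.

```rust
use std::fmt;
use std::sync::Mutex;

/// Hypothetical stand-in for the SDK's error type (not the real definition).
#[derive(Debug, PartialEq)]
enum SdkError {
    InternalFailure(String),
}

impl fmt::Display for SdkError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SdkError::InternalFailure(msg) => write!(f, "internal failure: {msg}"),
        }
    }
}

impl std::error::Error for SdkError {}

/// Hypothetical stand-in for `OTelSdkResult`.
type SdkResult = Result<(), SdkError>;

/// Simplified exporter trait; spans are plain strings in this sketch.
trait Exporter {
    fn export(&mut self, batch: Vec<String>) -> SdkResult;
}

/// An exporter that always fails, used to show the error reaching the caller.
struct FailingExporter;

impl Exporter for FailingExporter {
    fn export(&mut self, _batch: Vec<String>) -> SdkResult {
        Err(SdkError::InternalFailure("connection refused".to_string()))
    }
}

struct SimpleProcessor<E: Exporter> {
    exporter: Mutex<E>,
}

impl<E: Exporter> SimpleProcessor<E> {
    /// Exports one span and returns the outcome instead of logging it.
    fn export_one(&self, span: String) -> SdkResult {
        // A poisoned lock is converted into an error, not swallowed.
        let mut exporter = self.exporter.lock().map_err(|e| {
            SdkError::InternalFailure(format!("exporter lock poisoned: {e}"))
        })?;
        // The export result is handed straight back to the caller.
        exporter.export(vec![span])
    }
}
```

With this shape, the caller chooses the policy: it can match on the returned error to retry, increment a counter, or log through its own logging setup.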

**For background/asynchronous operations without a direct caller:**

- Implement an **error callback mechanism** via `with_error_handler()` on processor builders
- The callback is invoked when background tasks fail
- Users can then log, emit metrics, trigger alerts, or implement custom strategies
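
The callback mechanism could look roughly like the sketch below. It is an illustration under stated assumptions, not the SDK's implementation: `ExportError`, `BatchProcessorBuilder`, `BatchProcessor`, and `fake_export` are hypothetical names, the background task is a plain `std::thread` fed by an `mpsc` channel, and spans are modelled as `String`s. Only the name `with_error_handler()` comes from the proposal above.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical error type for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct ExportError(String);

/// User-supplied callback invoked from the background worker.
type ErrorHandler = Box<dyn Fn(&ExportError) + Send + 'static>;

/// Stand-in for a real export call: empty span names fail.
fn fake_export(span: &str) -> Result<(), ExportError> {
    if span.is_empty() {
        Err(ExportError("empty span name".to_string()))
    } else {
        Ok(())
    }
}

struct BatchProcessorBuilder {
    error_handler: Option<ErrorHandler>,
}

impl BatchProcessorBuilder {
    fn new() -> Self {
        Self { error_handler: None }
    }

    /// Registers the callback that receives background export failures.
    fn with_error_handler<F>(mut self, handler: F) -> Self
    where
        F: Fn(&ExportError) + Send + 'static,
    {
        self.error_handler = Some(Box::new(handler));
        self
    }

    fn build(self) -> BatchProcessor {
        let (sender, receiver) = mpsc::channel::<String>();
        let handler = self.error_handler;
        // The worker owns the handler and reports every failed export to it.
        let worker = thread::spawn(move || {
            for span in receiver {
                if let Err(err) = fake_export(&span) {
                    if let Some(handler) = &handler {
                        handler(&err);
                    }
                }
            }
        });
        BatchProcessor { sender: Some(sender), worker: Some(worker) }
    }
}

struct BatchProcessor {
    sender: Option<mpsc::Sender<String>>,
    worker: Option<thread::JoinHandle<()>>,
}

impl BatchProcessor {
    /// Enqueues a span without waiting for the export; fails if the worker is gone.
    fn on_end(&self, span: String) -> Result<(), ExportError> {
        let sender = self
            .sender
            .as_ref()
            .ok_or_else(|| ExportError("already shut down".to_string()))?;
        sender
            .send(span)
            .map_err(|_| ExportError("worker is gone".to_string()))
    }

    /// Closes the channel, waits for the worker, and reports how that went.
    fn shutdown(&mut self) -> Result<(), ExportError> {
        drop(self.sender.take()); // closing the channel ends the worker loop
        match self.worker.take() {
            Some(worker) => worker
                .join()
                .map_err(|_| ExportError("worker panicked".to_string())),
            None => Err(ExportError("already shut down".to_string())),
        }
    }
}
```

An application would pass a closure to `with_error_handler()` that increments a metric, forwards to its own logger, or triggers an alert. If no handler is registered, this sketch drops the error; a real design would need to decide what the default should be.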

### Affected Areas

#### Traces (High Priority)

- [x] `opentelemetry-sdk/src/trace/span_processor.rs` - Remove error logging in batch/simple processors
- [x] `opentelemetry-sdk/src/trace/span_processor_with_async_runtime.rs` - Add error callback for background exports
- [x] `opentelemetry-sdk/src/trace/provider.rs` - Remove redundant error logging in shutdown

#### Metrics (High Priority)

- [ ] `opentelemetry/src/metrics/instruments.rs` - `InstrumentProvider` trait methods should return `Result`
- [ ] `opentelemetry-sdk/src/metrics/meter.rs` - Return errors instead of logging and creating no-op instruments
- [ ] `opentelemetry-sdk/src/metrics/meter_provider.rs` - Propagate shutdown errors per ADR-001 patterns
- [ ] Periodic reader implementations - Expose background export errors via error callback

#### Logs (High Priority)

- [ ] `opentelemetry-sdk/src/logs/log_processor.rs` - Make `LogProcessor::emit()` fallible
- [ ] `opentelemetry-sdk/src/logs/simple_log_processor.rs` - Return errors from emit operations
- [ ] `opentelemetry-sdk/src/logs/log_processor_with_async_runtime.rs` - Add error callback for background processing

#### Other

- [ ] `opentelemetry-zipkin/src/exporter/env.rs` - Replace `eprintln!` with proper error returns
- [ ] Update examples to demonstrate proper error handling
- [ ] Update tests to verify error propagation

### Implementation Strategy

1. **Phase 1: Traces**
   - Remove all `otel_error!`, `otel_warn!`, `otel_debug!` calls that mask export failures
   - Add `with_error_handler()` to `BatchSpanProcessorBuilder`
   - Background export errors invoke user-provided callback
   - Synchronous operations return `OTelSdkResult`

2. **Phase 2: Metrics**
   - Update trait definitions to return `Result` types per ADR-001 guidance
   - Implement error callbacks for periodic readers
   - Update meter implementation to propagate errors from instrument creation

3. **Phase 3: Logs**
   - Make `LogProcessor::emit()` fallible where appropriate
   - Add error callbacks for async log processors
   - Update log appenders to propagate errors

4. **Phase 4: Documentation & Examples**
   - **Update ADR-001** to remove the allowance for logging errors in library crates
   - Update all examples to demonstrate proper error handling
   - Add migration guide documenting breaking changes
   - Document error callback patterns and best practices

### Backward Compatibility

This is a **breaking change** that will require:

- Minor version bump (0.x -> 0.y, as the crate is pre-1.0)
- Migration guide for users updating from previous versions
- Updated examples and documentation
- **Update to ADR-001** clarifying that library crates must never log errors on behalf of applications

However, the benefits justify the breaking change:

- Proper library design following Rust best practices and standard library conventions
- Better error visibility and control for applications
- Enables custom error handling strategies (retry, metrics, alerting)
- Improved debuggability and observability in production

### Additional Context

**Why ADR-001 allows logging:**

The allowance for logging when returning an error is "not appropriate" likely stems from background operations, where there is no direct caller to return an error to. However, the solution is **not** to log, but rather to:

- Use error callbacks that applications can register
- Delegate to a global error handler if one is registered
- Return errors wherever possible
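
As a rough illustration of the second option, a process-wide handler can be a write-once slot that background code consults before doing anything else. The names `set_error_handler` and `report_error` below are hypothetical and chosen for this sketch; they are not presented as the SDK's API, and errors are reduced to `&str` messages to keep the example short.

```rust
use std::sync::OnceLock;

/// Application-provided handler; errors are plain messages in this sketch.
type GlobalHandler = Box<dyn Fn(&str) + Send + Sync + 'static>;

/// Write-once slot holding the handler, if the application registered one.
static HANDLER: OnceLock<GlobalHandler> = OnceLock::new();

/// Registers the handler; fails if one has already been registered.
fn set_error_handler<F>(handler: F) -> Result<(), &'static str>
where
    F: Fn(&str) + Send + Sync + 'static,
{
    HANDLER
        .set(Box::new(handler))
        .map_err(|_| "error handler already registered")
}

/// Delegates to the registered handler and reports whether one was present,
/// so the calling code can fall back to another strategy if not.
fn report_error(message: &str) -> bool {
    match HANDLER.get() {
        Some(handler) => {
            handler(message);
            true
        }
        None => false,
    }
}
```

Returning `bool` from `report_error` is a design choice of this sketch: it keeps the "no handler registered" case visible to the caller rather than silently discarding the error.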

**Technical feasibility:**

- The OpenTelemetry specification requires operations like `on_end()` to be fast and non-blocking, but **does not mandate void return types**
- Returning an error does not violate the non-blocking requirement
- Error callbacks provide a way to handle background failures without blocking the hot path
- This change aligns with the existing exporter traits (`SpanExporter`, `LogExporter`, `PushMetricExporter`) which already return `OTelSdkResult` from their methods

---

**Related:** This proposal follows Rust's error handling conventions, including those of the standard library, and the principle that libraries should be "honest" about failures, leaving all policy decisions about logging, retrying, and error handling to the application.
