feat(observability): add Prometheus metrics, audit logging, and OpenTelemetry tracing#21
Conversation
DeFiVC
left a comment
There was a problem hiding this comment.
Solid observability implementation that covers metrics, audit logging, and tracing — but must resolve merge conflicts before merging.
Blocking Issues
-
❌ Merge Conflicts —
mergeStateStatus: "DIRTY". The PR is based on stalemain.src/server.tshas new health check endpoints (/health,/health/live,/health/ready) and astellarClientimport that the PR's diff doesn't account for. Rebase needed:git fetch origin git rebase origin/main # resolve conflicts in src/server.ts git rebase --continue -
⚠️ CI Not Run —statusCheckRollupis empty (likely due to conflict state). Must verifynpm run typecheck,npm run lint, andnpm testpass after rebase.
Code Issues
-
Missing system gauges — Issue #9 requested
db_active_connectionsandredis_connectedgauges for infrastructure health. Not implemented. -
src/audit/index.ts— MissingipanduserAgentfields inAuditFieldsfor security event tracing (issue #9 specified these for failed auth attempts).
What's Good
- Clean module separation:
metrics/,audit/,tracing/ - Correct metric types: counters for totals, histograms for latency with sensible buckets
- Audit logging covers all financial events (reward claims, credential mints, quiz submissions)
- Tracing disabled by default via
OTEL_SDK_DISABLED— safe for dev initTracing()before Fastify setup captures full request lifecycleshutdownTracing()on clean exit prevents span data loss
Please rebase on current main, resolve conflicts, and verify CI passes.
…elemetry tracing - Add prom-client registry with HTTP latency histogram, Stellar tx duration histogram, quiz submissions counter, reward claims counter, and credentials minted counter; expose /metrics endpoint for scraping - Register Fastify onRequest/onResponse hooks to instrument every route with request count and duration labels (method, route, status_code) - Add structured audit logger (src/audit/index.ts) emitting pino entries with audit: true for quiz.submitted, reward.claimed/queued, and credential.minted - Initialize OpenTelemetry NodeSDK in src/tracing.ts with OTLP exporter and Fastify, pg, and ioredis auto-instrumentations; call initTracing() before Fastify setup and shutdownTracing() on SIGTERM/SIGINT - Instrument reward.service, credential.service, and quiz.service with metrics increments, Stellar tx timing, and audit log calls on every financial event Closes ChainLearnOfficial#9
…gent, resolve conflict - Add db_active_connections Gauge (samples pool.totalCount - pool.idleCount on each /metrics scrape) and redis_connected Gauge (0/1 via ioredis connect/close events) as requested in issue ChainLearnOfficial#9 and reviewer feedback - Export pool from config/database.ts so setupInfraMetrics() can bind to it; call setupInfraMetrics(pool, redis) in buildApp() before route registration - Add ip and userAgent fields to AuditFields interface for security event tracing - Resolve rebase conflict in quiz.service.ts: keep generateQuizFromAI import from upstream AI-quiz PR alongside observability imports Closes ChainLearnOfficial#9
fc9cab2 to
ef4c985
Compare
|
Hi @DeFiVC — all review items have been addressed and the branch has been force-pushed. Here's a summary of what was done: 1. Merge conflict resolved 2. Missing infra gauges added
3. 4. Typecheck passes Ready for re-review. |
DeFiVC
left a comment
There was a problem hiding this comment.
All blocking issues from the previous review have been addressed. The PR now delivers a complete observability stack that fully satisfies issue #9.
What's Good
- Infrastructure gauges:
db_active_connections(samplespool.totalCount - pool.idleCount) andredis_connected(tracks connect/close/error events) — both use prom-client'scollect()callback for lazy evaluation - Audit fields complete:
ipanduserAgentnow present inAuditFieldsfor security event tracing - Clean rebase: Conflicts resolved, upstream AI-quiz PR changes incorporated
- CI green: Lint & Typecheck and Test both pass
Summary
- 6 Prometheus metrics (HTTP, Stellar tx, quiz, reward, credential) + 2 infrastructure gauges
- Structured audit logging for all financial events
- OpenTelemetry tracing with Fastify, pg, and ioredis auto-instrumentation
/metricsendpoint for Prometheus scraping
Approving — the implementation is solid, well-structured, and addresses all requirements from issue #9.
Closes #9
What changed
src/metrics/index.ts— prom-client registry with 6 metrics: HTTP request counter, HTTP latency histogram, Stellar tx duration histogram (per-method, success/error), quiz submission counter (passed/failed), reward claims counter (success/queued), credentials minted counter; default Node.js process metrics collectedsrc/metrics/fastify-hook.ts— FastifyonRequest/onResponsehooks that record count and latency for every route, labelled by method, route pattern, and status codesrc/server.ts— registers metrics hook, exposesGET /metrics(Prometheus text format), importsinitTracing()before Fastify setup, callsshutdownTracing()on clean exitsrc/audit/index.ts— typedauditLog(event, fields)helper that emits a pino log entry withaudit: true; events:quiz.submitted,reward.claimed,reward.queued,credential.minted,auth.login,auth.login_failedsrc/tracing.ts— OTelNodeSDKwith OTLP/HTTP exporter (configurable viaOTEL_EXPORTER_OTLP_ENDPOINT), Fastify, pg, and ioredis auto-instrumentations; disabled whenOTEL_SDK_DISABLED=truereward.service.ts— Stellar tx duration measured aroundinvokeContract,rewardClaimsTotalincremented on success/queued,auditLogcalled on both pathscredential.service.ts— same Stellar tx timing pattern formint_credential,credentialsMintedTotalincremented,auditLogon successquiz.service.ts—quizSubmissionsTotalincremented withpassed/failedlabel,auditLogon submissionWhy
Closes issue #9 — without metrics there is no visibility into request latency, Stellar contract call health, or financial event throughput. Audit logs provide a tamper-evident record of every on-chain operation for compliance. OTel traces allow distributed request correlation across Fastify → Postgres → Redis → Stellar.
How to test