Introduce schema-driven structured extraction pathway with safe fallback mechanism by Arijit429 · Pull Request #355 · fireform-core/FireForm

Arijit429 · 2026-03-26T06:59:39Z

Closes #59
Closes #40
Closes #148

Overview

This pull request introduces an optional schema-driven structured extraction pathway to improve the consistency, performance, and future scalability of the LLM-based form filling pipeline in FireForm.

Currently, field extraction is iterative: the language model is prompted separately for each template field. This works functionally, but it can produce inconsistent outputs across fields, adds latency because the model is invoked once per field, and gives limited visibility into whether extraction is complete.
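To make the latency argument concrete, here is a minimal sketch that contrasts the two styles. It assumes the model is exposed as a plain callable that maps a prompt to raw text. `extract_per_field`, `extract_structured`, and the `CountingLLM` fake are hypothetical names, not FireForm's actual functions, and the prompt wording is illustrative only.

```python
import json
from typing import Callable, Dict, List

# Hypothetical LLM interface: a callable that takes a prompt and returns raw text.
LLM = Callable[[str], str]


def extract_per_field(llm: LLM, text: str, fields: List[str]) -> Dict[str, str]:
    """Iterative style: one model call per template field (N fields -> N calls)."""
    return {name: llm(f"Extract the value of '{name}' from:\n{text}").strip() for name in fields}


def extract_structured(llm: LLM, text: str, fields: List[str]) -> Dict[str, object]:
    """Structured style: a single call that asks for every field in one JSON object."""
    prompt = (
        "Return only a JSON object with exactly these keys: "
        + ", ".join(fields)
        + f"\n\nText:\n{text}"
    )
    return json.loads(llm(prompt))


class CountingLLM:
    """Fake model that returns a canned reply and counts how often it is called."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return self.reply
```

With three template fields, the per-field loop makes three model calls while the structured request makes one. The actual latency saved will depend on prompt length and on the model.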

This change introduces a structured extraction flow that allows the system to request all relevant fields in a single model interaction, while preserving the existing extraction mechanism as a safe fallback. This ensures backward compatibility and allows incremental adoption of the improved pipeline.

Key Changes

Added a centralized schema definition module to represent expected incident extraction fields.
Introduced a new structured extraction method in the LLM interaction layer that performs a single JSON-oriented extraction request.
Implemented safe parsing logic to validate model responses and detect malformed or incomplete outputs.
Integrated an automatic fallback mechanism to the existing per-field extraction loop if structured extraction fails.
Updated the file manipulation pipeline to attempt structured extraction prior to invoking the legacy extraction workflow.
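The schema, safe-parsing, and review-flag pieces could fit together roughly as follows. This sketch rests on stated assumptions: the field names in `INCIDENT_SCHEMA`, the `ParseResult` type, and the rule that unparseable JSON triggers the fallback while missing or mistyped fields only set `requires_review` (the flag added in commit aa98b33) are illustrative choices, not taken from the PR's code.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical schema: field name -> expected Python type.
INCIDENT_SCHEMA: Dict[str, type] = {
    "incident_type": str,
    "location": str,
    "date": str,
    "units_responding": int,
    "narrative": str,
}


@dataclass
class ParseResult:
    ok: bool = False  # False: the reply was not parseable JSON, so the caller should fall back
    values: Dict[str, object] = field(default_factory=dict)
    missing: List[str] = field(default_factory=list)

    @property
    def requires_review(self) -> bool:
        # Parsed successfully, but some fields are absent, empty, or of the wrong type.
        return self.ok and bool(self.missing)


def parse_structured_response(raw: str, schema: Dict[str, type] = INCIDENT_SCHEMA) -> ParseResult:
    """Validate a model reply against the schema without ever raising."""
    # Models sometimes wrap the JSON object in prose; keep only the outermost braces.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        return ParseResult()
    try:
        data = json.loads(raw[start : end + 1])
    except json.JSONDecodeError:
        return ParseResult()
    result = ParseResult(ok=True)
    for name, expected in schema.items():
        value = data.get(name)
        if value is None or value == "" or not isinstance(value, expected):
            result.missing.append(name)
        else:
            result.values[name] = value
    return result
```

Separating "could not parse" from "parsed but incomplete" lets the pipeline keep a partially useful structured result for human review instead of discarding it.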

Motivation

In real-world emergency documentation workflows, extraction reliability and response time are critical. Repeated field-level prompting increases the probability of inconsistent responses and adds unnecessary latency.

By enabling schema-driven extraction, this change aims to:

Reduce extraction variability
Improve performance by reducing the number of LLM calls per form
Prepare the system for stronger validation and confidence scoring strategies
Provide a cleaner foundation for future architectural improvements

Impact

This enhancement is fully backward compatible and does not modify existing API contracts or database schemas. If structured extraction encounters runtime errors or invalid model outputs, the system seamlessly reverts to the current extraction approach.
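The revert-on-failure behaviour can be sketched as a thin wrapper around the two pathways. `try_structured`, `fill_fields`, the prompt text, and the `(values, mode)` return shape are hypothetical. The point is only that any exception or incomplete JSON object sends control back to the per-field loop.

```python
import json
import logging
from typing import Callable, Dict, List, Optional, Tuple

logger = logging.getLogger(__name__)

LLM = Callable[[str], str]  # hypothetical: prompt in, raw text out
STRUCTURED_PROMPT = "Return only a JSON object with exactly these keys: {keys}\n\nText:\n{text}"


def try_structured(llm: LLM, text: str, fields: List[str]) -> Optional[Dict[str, object]]:
    """One JSON-oriented call; returns None on any error or incomplete output."""
    prompt = STRUCTURED_PROMPT.format(keys=", ".join(fields), text=text)
    try:
        data = json.loads(llm(prompt))
    except Exception:  # client/runtime errors or malformed JSON
        logger.exception("Structured extraction failed")
        return None
    if not isinstance(data, dict) or any(name not in data for name in fields):
        logger.warning("Structured output incomplete; expected keys %s", fields)
        return None
    return {name: data[name] for name in fields}


def fill_fields(llm: LLM, text: str, fields: List[str]) -> Tuple[Dict[str, object], str]:
    """Try the structured path first; fall back to the legacy per-field loop."""
    result = try_structured(llm, text, fields)
    if result is not None:
        return result, "structured"
    legacy = {name: llm(f"Extract the value of '{name}' from:\n{text}").strip() for name in fields}
    return legacy, "per_field"
```

Returning the mode alongside the values makes it easy to log how often the fallback fires.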

The implementation therefore enables experimentation with improved extraction strategies without disrupting existing functionality.

Future Work

This change lays groundwork for several potential improvements, including:

Confidence scoring and extraction quality metrics
Human-review workflows for incomplete structured outputs
Asynchronous extraction handling for better scalability
Template-aware schema validation
Performance benchmarking between structured and iterative extraction modes

Testing

The updated extraction flow was tested locally by:

Creating templates through the API
Submitting form fill requests using sample incident descriptions
Verifying successful PDF generation and database persistence
Confirming correct fallback behavior when structured extraction parsing fails
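The fallback check in the last item could also be automated with `unittest.mock`. The `Extractor` class below is a hypothetical stand-in for the LLM interaction layer; only its two-pathway shape mirrors the description above. The first test forces the structured path to raise and asserts that the per-field path runs once per field. The second asserts that a valid structured result bypasses the per-field path entirely.

```python
import unittest
from unittest import mock


class Extractor:
    """Hypothetical stand-in for the LLM layer; method names are illustrative."""

    def structured_extract(self, text, fields):
        raise NotImplementedError

    def extract_field(self, text, field):
        raise NotImplementedError

    def fill(self, text, fields):
        try:
            data = self.structured_extract(text, fields)
            if data is not None and all(f in data for f in fields):
                return data
        except Exception:
            pass  # any failure in the structured path triggers the fallback
        return {f: self.extract_field(text, f) for f in fields}


class FallbackTest(unittest.TestCase):
    def test_falls_back_when_parsing_fails(self):
        ex = Extractor()
        with mock.patch.object(ex, "structured_extract", side_effect=ValueError("bad JSON")), \
                mock.patch.object(ex, "extract_field", return_value="v") as per_field:
            result = ex.fill("incident text", ["location", "date"])
        self.assertEqual(result, {"location": "v", "date": "v"})
        self.assertEqual(per_field.call_count, 2)

    def test_uses_structured_result_when_valid(self):
        ex = Extractor()
        with mock.patch.object(ex, "structured_extract", return_value={"location": "A", "date": "B"}), \
                mock.patch.object(ex, "extract_field") as per_field:
            result = ex.fill("incident text", ["location", "date"])
        self.assertEqual(result, {"location": "A", "date": "B"})
        per_field.assert_not_called()
```

Saved as a module, this runs with `python -m unittest`.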

No breaking changes were observed in existing workflows.

Arijit429

I have run and checked all my changes for errors.

Arijit429 added 6 commits March 18, 2026 18:27

Fixing error handling message when PDF generation fails

a27df64

Added 30 seconds of timeout handling time for Api request on Ollamas

c96ab2a

Replace print statements with logging for better observability

a027545

Improve README with clearer local setup steps

dd6ac2d

Add requires_review flag for incomplete LLM extraction validation

aa98b33

feat: add structured extraction flow with safe fallback

7048088
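Commit c96ab2a adds a 30-second timeout to API requests sent to Ollama. A standard-library sketch of that idea follows. It assumes a local Ollama server at its default address and the non-streaming `/api/generate` endpoint. Requesting `"format": "json"` is an assumption that suits the structured pathway, and `build_request` and `generate` are hypothetical helpers rather than the code in the commit.

```python
import json
import logging
import urllib.request
from typing import Optional

logger = logging.getLogger(__name__)

# Assumptions: default local Ollama address and the non-streaming generate endpoint.
OLLAMA_URL = "http://localhost:11434/api/generate"
REQUEST_TIMEOUT_S = 30  # the 30-second limit described in commit c96ab2a


def build_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a POST request asking Ollama for a single, JSON-formatted reply."""
    payload = {"model": model, "prompt": prompt, "stream": False, "format": "json"}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, url: str = OLLAMA_URL,
             timeout: float = REQUEST_TIMEOUT_S) -> Optional[str]:
    """Return the model's text, or None if the request fails or times out."""
    request = build_request(model, prompt, url)
    try:
        # timeout applies to the connection attempt and each blocking socket operation
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = json.loads(response.read().decode("utf-8"))
    except OSError as exc:  # URLError, HTTPError and socket timeouts are all OSError subclasses
        logger.error("Ollama request failed or timed out after %ss: %s", timeout, exc)
        return None
    except ValueError as exc:  # includes json.JSONDecodeError
        logger.error("Ollama returned a non-JSON body: %s", exc)
        return None
    return body.get("response")
```

Note that `urlopen`'s `timeout` bounds each blocking socket operation rather than the total request time, so a response that trickles in slowly could in principle exceed 30 seconds overall.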

Arijit429 commented Mar 27, 2026


Arijit429 mentioned this pull request Apr 17, 2026

[UPDATE] Post-proposal contribution summary — Arijit Deb #456

Open

Merge branch 'main' into structured-extraction-flow

bf92aa6

Arijit429 wants to merge 7 commits into fireform-core:main from Arijit429:structured-extraction-flow
