Introduce schema-driven structured extraction pathway with safe fallback mechanism #355

Open
Arijit429 wants to merge 7 commits into fireform-core:main from Arijit429:structured-extraction-flow

@Arijit429 Arijit429 commented Mar 26, 2026

Closes #59
Closes #40
Closes #148

Overview

This pull request introduces an optional schema-driven structured extraction pathway to improve the consistency, performance, and future scalability of the LLM-based form filling pipeline in FireForm.

Currently, field extraction is iterative: the language model is prompted separately for each template field. This works, but it can produce inconsistent outputs across fields, adds latency through repeated model invocations, and gives little visibility into whether extraction is complete.

This change introduces a structured extraction flow that allows the system to request all relevant fields in a single model interaction, while preserving the existing extraction mechanism as a safe fallback. This ensures backward compatibility and allows incremental adoption of the improved pipeline.

Key Changes

  • Added a centralized schema definition module to represent expected incident extraction fields.
  • Introduced a new structured extraction method in the LLM interaction layer that performs a single JSON-oriented extraction request.
  • Implemented safe parsing logic to validate model responses and detect malformed or incomplete outputs.
  • Integrated an automatic fallback mechanism to the existing per-field extraction loop if structured extraction fails.
  • Updated the file manipulation pipeline to attempt structured extraction prior to invoking the legacy extraction workflow.
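
The safe parsing step listed above could look roughly like the sketch below. It is an illustration under assumptions, not the code in this PR: the field names in `INCIDENT_SCHEMA` and the function `parse_structured_response` are hypothetical, and the real schema module may define different fields.

```python
import json
from typing import Optional, Sequence

# Hypothetical schema: these field names are illustrative only and are
# not taken from the schema module added in this PR.
INCIDENT_SCHEMA = ("incident_type", "location", "date", "description")


def parse_structured_response(
    raw: Optional[str],
    required: Sequence[str] = INCIDENT_SCHEMA,
) -> Optional[dict]:
    """Return the required fields as a dict, or None if the response is unusable."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        # Malformed JSON, or a non-string response such as None.
        return None
    if not isinstance(data, dict):
        # Valid JSON, but not an object (for example a list or a number).
        return None
    if any(field not in data for field in required):
        # Incomplete output: at least one expected field is missing.
        return None
    # Keep only the schema fields; drop anything extra the model added.
    return {field: data[field] for field in required}
```

Returning `None` rather than raising keeps the caller's decision simple: any `None` result means the legacy per-field loop should run instead.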

Motivation

In real-world emergency documentation workflows, extraction reliability and response time are critical. Repeated field-level prompting increases the probability of inconsistent responses and adds unnecessary latency.

By enabling schema-driven extraction, this change aims to:

  • Reduce extraction variability
  • Improve performance by reducing the number of LLM calls
  • Prepare the system for stronger validation and confidence scoring strategies
  • Provide a cleaner foundation for future architectural improvements

Impact

This enhancement is backward compatible and does not modify existing API contracts or database schemas. If structured extraction raises a runtime error or the model returns invalid output, the system falls back to the existing per-field extraction loop.
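
A minimal sketch of this fallback behavior, assuming the two extraction paths are passed in as callables. The names `extract_with_fallback`, `structured_call`, and `per_field_call` are hypothetical and do not refer to functions in the FireForm codebase.

```python
import json
from typing import Callable, Dict, List


def extract_with_fallback(
    text: str,
    fields: List[str],
    structured_call: Callable[[str, List[str]], str],
    per_field_call: Callable[[str, str], str],
) -> Dict[str, str]:
    """Try one structured request; on any failure, use the per-field loop."""
    try:
        # Single model interaction that should return a JSON object.
        data = json.loads(structured_call(text, fields))
        if isinstance(data, dict) and all(f in data for f in fields):
            return {f: data[f] for f in fields}
    except Exception:
        # Deliberately broad: any runtime error in the structured path
        # (model failure, malformed JSON) should trigger the fallback.
        pass
    # Legacy behavior: one model call per template field.
    return {f: per_field_call(text, f) for f in fields}
```

Because the fallback reuses the existing per-field path unchanged, callers get the same result shape from either path.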

The implementation therefore enables experimentation with improved extraction strategies without disrupting existing functionality.

Future Work

This change lays groundwork for several potential improvements, including:

  • Confidence scoring and extraction quality metrics
  • Human-review workflows for incomplete structured outputs
  • Asynchronous extraction handling for better scalability
  • Template-aware schema validation
  • Performance benchmarking between structured and iterative extraction modes

Testing

The updated extraction flow was tested locally by:

  • Creating templates through the API
  • Submitting form fill requests using sample incident descriptions
  • Verifying successful PDF generation and database persistence
  • Confirming correct fallback behavior when structured extraction parsing fails

No breaking changes were observed in existing workflows.

@Arijit429 Arijit429 left a comment

I have run all of my changes and checked them for errors.
