This repository contains the implementation used in our manuscript: "LLM-Powered Borehole Data Automation for 3D Geological Modeling Workflows: An Eight-Model Evaluation and the Efficiency-Effectiveness Paradox".
Geo Drill Extractor is an end-to-end pipeline that:
- Parses borehole reports (Word documents) into structured borehole entities.
- Infers 3D coordinates from relative location descriptions using survey control points.
- Evaluates multiple LLMs under a unified prompting and parsing protocol that enforces machine-parseable JSON outputs.
- Reports both metric scores and failure modes (protocol/parsing vs. semantic/geometric errors) to avoid misattribution.
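The distinction between protocol/parsing failures and semantic/geometric failures can be illustrated with a minimal sketch. The field names in `REQUIRED_FIELDS`, the label strings, and the rule that a missing field counts as a protocol failure are assumptions made for illustration; the repository's actual schema and failure taxonomy may differ.

```python
import json

# Hypothetical required fields; the repository's real schema may differ.
REQUIRED_FIELDS = ("borehole_id", "x", "y", "z")


def classify_output(raw_text: str) -> str:
    """Label one raw model reply as 'ok', 'protocol_failure', or 'semantic_failure'."""
    try:
        payload = json.loads(raw_text)
    except json.JSONDecodeError:
        # The reply is not valid JSON at all: a protocol/parsing problem.
        return "protocol_failure"
    if not isinstance(payload, dict):
        # Valid JSON, but not the object shape the protocol asks for.
        return "protocol_failure"
    if any(field not in payload for field in REQUIRED_FIELDS):
        # Assumption: a missing field is treated as a protocol violation.
        return "protocol_failure"
    # Parsing succeeded; now check that the coordinates are usable numbers.
    coords = [payload[key] for key in ("x", "y", "z")]
    if not all(isinstance(c, (int, float)) and not isinstance(c, bool) for c in coords):
        return "semantic_failure"
    return "ok"


print(classify_output("Sure! Here is the borehole you asked for."))
print(classify_output('{"borehole_id": "BH-1", "x": 1.0, "y": 2.0, "z": 3.0}'))
```

Keeping the two labels separate means a model that wraps a correct answer in prose is not counted as having made a geometric error.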
Experiment design reported in the paper:
- Models: 8 core LLMs (see below)
- Documents: 30 reports
- Repetitions: 3 runs per model-document pair
- Total runs: 8 x 30 x 3 = 720
- Metrics (6D): Extraction Recall (ER), Location Recall (LR), Coordinate Success Rate (CSR), Efficiency Coefficient (EC), Processing Stability (PS), and Average Location Processing Time (ALPT)
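The run count above follows from a full cross product of models, documents, and repetitions. A short sketch of that enumeration is given here; the model and document names are placeholders, not the identifiers or file names used by the repository.

```python
from itertools import product

# Placeholder names standing in for the 8 LLMs and the 30 reports.
models = [f"model_{i}" for i in range(1, 9)]
documents = [f"report_{i:02d}.docx" for i in range(1, 31)]
repetitions = range(1, 4)  # 3 runs per model-document pair

# Every (model, document, repetition) triple is one run.
runs = list(product(models, documents, repetitions))
print(len(runs))  # 720
```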
- Install dependencies:
  python -m pip install -r requirements.txt
- Configure API keys:
  - Copy .env.example to .env and set ALIYUN_API_KEY and/or OPENROUTER_API_KEY.
- Run a low-cost end-to-end check using the synthetic dataset (auto-generated if missing):
  python scripts/run_incremental_experiment.py --dataset synthetic --documents 3 --repetitions 1
- Run the full 8-model paper configuration (requires you to provide the real documents and annotation files via config):
  python run_full_test.py
Results are written to experiment_results/<timestamp>/.
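Before launching a run, it can help to confirm which API keys are actually set. The sketch below is a standalone check, not the repository's own loading code (which may use a different mechanism); the helper names `read_env_file` and `available_keys` are hypothetical, and only simple KEY=VALUE lines are handled.

```python
import os
from pathlib import Path

KEY_NAMES = ("ALIYUN_API_KEY", "OPENROUTER_API_KEY")


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def available_keys(env_path=".env", environ=None):
    """Return the key names with a non-empty value in the file or the environment."""
    environ = os.environ if environ is None else environ
    file_values = read_env_file(env_path)
    return [name for name in KEY_NAMES if file_values.get(name) or environ.get(name)]


print(available_keys())
```

An empty list means neither key was found, so any run that calls a hosted model would be expected to fail at the first request.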
- Data paths are configured via configs/config.yaml (or overridden programmatically by scripts):
  - data.documents_dir: directory containing .docx reports
  - data.survey_points_file: survey control points CSV
  - data.ground_truth_file: ground-truth annotations CSV (for metric computation)
- Model routing is implemented in code (see src/llm/) and the model identifiers are defined in src/core/models.py.
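As a rough illustration of the three data settings, the sketch below checks a configuration that has already been loaded into a dictionary. It assumes the dotted names map to a nested `data` section; the function name `missing_data_entries` and the example paths are hypothetical placeholders.

```python
from pathlib import Path

DATA_KEYS = ("documents_dir", "survey_points_file", "ground_truth_file")


def missing_data_entries(config):
    """Return 'data.<key>' names that are absent or point to nothing on disk."""
    data = config.get("data") or {}
    problems = []
    for key in DATA_KEYS:
        value = data.get(key)
        if not value or not Path(value).exists():
            problems.append(f"data.{key}")
    return problems


# Placeholder paths for illustration; substitute the values from your own config.
example = {
    "data": {
        "documents_dir": "data/reports",
        "survey_points_file": "data/survey_points.csv",
    }
}
print(missing_data_entries(example))
```

Running a check like this before the full configuration avoids discovering a missing annotation file partway through a long run.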
The paper configuration (used by run_full_test.py) evaluates the following 8 model values:
- Aliyun Bailian (DashScope compatible API):
  - deepseek-r1-distill-qwen-7b-aliyun
  - deepseek-r1-distill-qwen-14b-aliyun
  - deepseek-r1-distill-qwen-32b-aliyun
  - qwen-max
  - qwq-32b
- OpenRouter:
  - gpt-3.5-turbo-openrouter
  - gpt-4o-mini-openrouter
  - gpt-4.1-openrouter
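The grouping above can be written down as a small lookup to see which models are usable with a given set of keys. This is only a sketch: the dictionary names, the provider labels, and the pairing of each provider with one environment variable are assumptions, and the authoritative definitions live in src/core/models.py.

```python
MODELS_BY_PROVIDER = {
    "aliyun": [
        "deepseek-r1-distill-qwen-7b-aliyun",
        "deepseek-r1-distill-qwen-14b-aliyun",
        "deepseek-r1-distill-qwen-32b-aliyun",
        "qwen-max",
        "qwq-32b",
    ],
    "openrouter": [
        "gpt-3.5-turbo-openrouter",
        "gpt-4o-mini-openrouter",
        "gpt-4.1-openrouter",
    ],
}

# Assumed pairing of provider to the environment variable it needs.
KEY_FOR_PROVIDER = {"aliyun": "ALIYUN_API_KEY", "openrouter": "OPENROUTER_API_KEY"}


def runnable_models(configured_keys):
    """List the models whose provider key is among the configured key names."""
    return [
        model
        for provider, models in MODELS_BY_PROVIDER.items()
        if KEY_FOR_PROVIDER[provider] in configured_keys
        for model in models
    ]


print(len(runnable_models({"ALIYUN_API_KEY", "OPENROUTER_API_KEY"})))  # 8
```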
Each experiment run produces a timestamped folder under experiment_results/ containing:
- experiment_results.json: metadata + exported artifacts
- raw_results.json: raw structured outputs per model
- metrics_results.json / metrics_results.csv: per-document metric table
- failure_modes_summary.json / failure_modes_breakdown.csv: failure mode aggregation and breakdown
- processing_summary.txt: a human-readable summary with file pointers
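To inspect a finished run, one option is to pick the newest folder under experiment_results/ and confirm that the files listed above are present. The sketch below selects by modification time because the timestamp format of the folder names is not stated here; the helper names `latest_run_dir` and `missing_outputs` are hypothetical.

```python
from pathlib import Path

EXPECTED_FILES = (
    "experiment_results.json",
    "raw_results.json",
    "metrics_results.json",
    "metrics_results.csv",
    "failure_modes_summary.json",
    "failure_modes_breakdown.csv",
    "processing_summary.txt",
)


def latest_run_dir(root="experiment_results"):
    """Return the most recently modified subfolder of root, or None if there is none."""
    root_path = Path(root)
    if not root_path.is_dir():
        return None
    run_dirs = [p for p in root_path.iterdir() if p.is_dir()]
    return max(run_dirs, key=lambda p: p.stat().st_mtime, default=None)


def missing_outputs(run_dir):
    """Return the expected file names that are not present in run_dir."""
    return [name for name in EXPECTED_FILES if not (Path(run_dir) / name).is_file()]


run_dir = latest_run_dir()
if run_dir is None:
    print("No run folder found")
else:
    print(run_dir, missing_outputs(run_dir))
```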
MIT License. See LICENSE.