Add geothermal electricity extraction support#400

Merged
ppinchuk merged 27 commits into main from feature/geothermal-extraction-pr
May 21, 2026

@bpulluta bpulluta commented Mar 20, 2026

Overview

This PR adds geothermal electricity as a supported extraction technology in COMPASS, including the extraction schema and plugin configuration needed to discover, retrieve, and extract structured ordinance data from jurisdictions governing utility-scale geothermal electricity generation.

Additional improvements:

  • Clarifies the CLI's output directory policy so users understand that --out_dir_exists must be set on the command line.
  • Fixes two bugs in the retrieval layer affecting all technologies.

New: Geothermal Electricity Extraction

Files added:

  • compass/extraction/geothermal_electricity/geothermal_schema.json — Defines 29 extractable features including setbacks, permitting, noise limits, zoning classifications, decommissioning, and drilling requirements.
  • compass/extraction/geothermal_electricity/geothermal_plugin_config.yaml — Configures search queries, website scoring keywords, heuristic filters, and document collection behavior tuned for geothermal electricity ordinances.

The schema follows the standard COMPASS one-shot extraction format and is compatible with the existing compass process pipeline.


CLI Improvements

  • The --out_dir_exists option is now available to control how an existing output directory is handled: overwrite it, fail, or create a new directory automatically.
  • Help text and logic ensure robust, user-friendly, and safe behavior for both interactive and automated runs.
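
As a rough illustration of the three behaviors listed above, here is a minimal sketch. It is not the code in this PR: the function name `resolve_out_dir`, the choice values `fail`, `overwrite`, and `new`, the `_1`, `_2` suffix scheme, and the use of `argparse` are all assumptions made for the example.

```python
import argparse
import shutil
from pathlib import Path


def resolve_out_dir(out_dir, policy="fail"):
    """Return a usable output directory (hypothetical helper, not COMPASS code)."""
    out_dir = Path(out_dir)
    if not out_dir.exists():
        out_dir.mkdir(parents=True)
        return out_dir
    if policy == "fail":
        # Safest default for automated runs: never touch existing results.
        raise FileExistsError(f"Output directory already exists: {out_dir}")
    if policy == "overwrite":
        # Remove the old directory tree and start from an empty one.
        shutil.rmtree(out_dir)
        out_dir.mkdir(parents=True)
        return out_dir
    if policy == "new":
        # Append a numeric suffix until an unused sibling name is found.
        n = 1
        while True:
            candidate = out_dir.with_name(f"{out_dir.name}_{n}")
            if not candidate.exists():
                candidate.mkdir(parents=True)
                return candidate
            n += 1
    raise ValueError(f"Unknown policy: {policy!r}")


def build_parser():
    """Illustrative parser; the choice names are assumed, not taken from the PR."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--out_dir", default="outputs")
    parser.add_argument(
        "--out_dir_exists",
        choices=["fail", "overwrite", "new"],
        default="fail",
        help="What to do when the output directory already exists.",
    )
    return parser
```

Defaulting to the failing behavior in this sketch means an unattended run cannot silently delete earlier results; the destructive option has to be requested explicitly.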

Bug Fixes

Bug Fix 1 — PDF URLs with spaces failed to download

  • crawl4ai can return document URLs with raw spaces in the path. These are now percent-encoded before download, ensuring correct retrieval.
  • File: compass/scripts/download.py
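
For readers unfamiliar with the failure mode, a minimal sketch of space-only percent-encoding follows. The function name `encode_spaces` and the example URL are hypothetical; the helper actually used in this PR is not reproduced here.

```python
def encode_spaces(url: str) -> str:
    """Percent-encode raw spaces in a URL and leave everything else untouched.

    Hypothetical illustration only. Replacing just the space character keeps
    existing escapes such as %20 intact, so the function is safe to call twice.
    """
    return url.replace(" ", "%20")


raw = "https://example.gov/docs/Zoning Ordinance 2024.pdf"
print(encode_spaces(raw))
```

Encoding only spaces, rather than re-quoting the whole URL, avoids double-encoding URLs that already contain percent escapes.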

Bug Fix 2 — Anchor text was never used in link scoring

  • Anchor text is now correctly populated and used in link scoring, improving document discovery.
  • File: compass/web/website_crawl.py
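
To show why anchor text matters for scoring, here is a small standalone sketch using only the Python standard library. The names `AnchorCollector`, `collect_links`, and `score_link`, the `href`/`text` keys, and the keyword-count scoring rule are assumptions for illustration; they do not reproduce the crawler or ELMLinkScorer.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect each link's href together with its visible anchor text."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside an <a href=...> element.
        if self._href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text_parts).split())
            self.links.append({"href": self._href, "text": text})
            self._href = None


def collect_links(html):
    collector = AnchorCollector()
    collector.feed(html)
    return collector.links


def score_link(link, keywords):
    """Toy score: count keywords found in the URL or the anchor text."""
    haystack = f"{link['href']} {link['text']}".lower()
    return sum(1 for kw in keywords if kw in haystack)
```

A link such as `<a href="/files/doc123.pdf">Geothermal Zoning Ordinance</a>` has an uninformative URL, so a scorer that never sees the anchor text has nothing to match on; with the text populated, the keywords in the label become visible to it.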

bpulluta and others added 10 commits March 17, 2026 12:11
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
…rmatting (#399)

* Initial plan

* Fix all review comments in skills documentation

Co-authored-by: bpulluta <115118857+bpulluta@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: bpulluta <115118857+bpulluta@users.noreply.github.com>
…eval

- percent-encode raw spaces in crawl4ai PDF source URLs before downstream use
- populate link text field from anchor text so ELMLinkScorer can score link labels
- add two regression tests covering both fixes
Copilot AI review requested due to automatic review settings March 20, 2026 23:49

bpulluta and others added 11 commits March 20, 2026 23:06
… test (#401)

* Initial plan

* Extract shared _sanitize_url to url_utils.py, simplify to space-only encoding, fix test robustness

Co-authored-by: bpulluta <115118857+bpulluta@users.noreply.github.com>
Agent-Logs-Url: https://github.com/NatLabRockies/COMPASS/sessions/ceb782b4-c312-41d1-b4eb-eccbbef67097

* fix failing test

* ruff error fix

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: bpulluta <115118857+bpulluta@users.noreply.github.com>
Bumps [release-drafter/release-drafter](https://github.com/release-drafter/release-drafter) from 7.0.0 to 7.1.1.
- [Release notes](https://github.com/release-drafter/release-drafter/releases)
- [Commits](release-drafter/release-drafter@3a7fb5c...139054a)

---
updated-dependencies:
- dependency-name: release-drafter/release-drafter
  dependency-version: 7.1.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Add COMPASS workflow skills

* Added one-shot skills

* update one-shot SKILL.md structure and trigger contracts

* Initial plan (#398)

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>

* Fix skills documentation: correct paths, caching behavior, and tab formatting (#399)

* Initial plan

* Fix all review comments in skills documentation

Co-authored-by: bpulluta <115118857+bpulluta@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: bpulluta <115118857+bpulluta@users.noreply.github.com>

* renamed skills and fixed minor comments

* updated skills Paul review march 26

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 5 to 6.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](codecov/codecov-action@v5...v6)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

@ppinchuk ppinchuk left a comment

I didn't dive deep into the schema yet, but there are a few code things I'd like to address before focusing on the schema.

I really like the idea of adding a configurable post-processing pipeline to the schema. I envision that we could build up a "library" of post-processing functions that are generic enough where different schemas can use them. Very cool idea and we should definitely use it.

I'm quite concerned about parsing the summary for drilling hour windows instead of relying on the LLM output. Are you sure this is required? Can you explain why you went with that choice @bpulluta? Is there not a way to shape the output of the LLM instead? I'm worried about the case where the summary has multiple windows, which we already saw in the sample documents you had.

Comment thread compass/_cli/process.py Outdated
Comment thread compass/_cli/process.py Outdated
Comment thread compass/_cli/process.py Outdated
Comment thread compass/_cli/process.py Outdated
Comment thread compass/plugin/one_shot/components.py
Comment thread compass/plugin/one_shot/components.py Outdated
Comment thread compass/plugin/one_shot/components.py Outdated
Comment thread compass/plugin/one_shot/components.py Outdated
Comment thread compass/web/url_utils.py Outdated
Comment thread compass/web/website_crawl.py Outdated
@ppinchuk

OK on second look, it looks like the summary is only being used as a fallback. That might be ok actually. I think you can disregard the comments about the summary parsing

Comment thread tox.ini

@ppinchuk ppinchuk left a comment

A few more comments specifically for the plugin and schema configs. Excited to get this merged in!

Comment thread compass/extraction/geothermal_electricity/geothermal_plugin_config.yaml Outdated
Comment thread compass/extraction/geothermal_electricity/geothermal_plugin_config.yaml Outdated
Comment thread compass/extraction/geothermal_electricity/geothermal_plugin_config.yaml Outdated
Comment thread compass/extraction/geothermal_electricity/geothermal_plugin_config.yaml Outdated
Comment thread compass/extraction/geothermal_electricity/geothermal_schema.json Outdated
Comment thread compass/extraction/geothermal_electricity/geothermal_schema.json
@castelao

How do we move this one forward? @ppinchuk's comments are relevant and should be addressed, or it would be best to discuss why not. @bpulluta, would you like help to address the open issues on this PR?

As we rely more and more on the one-shot approach, properly closing this PR is important to set the tone for the other technologies on the way. That will save us a lot of time down the road.

@bpulluta bpulluta left a comment

Reviewed and made the changes outlined in the comments. No regression was observed when rerunning the geothermal electricity extraction.

@ppinchuk ppinchuk left a comment

Thanks again @bpulluta! Looks great!

@ppinchuk ppinchuk merged commit 8ff9cd1 into main May 21, 2026
21 checks passed
@ppinchuk ppinchuk deleted the feature/geothermal-extraction-pr branch May 21, 2026 19:42