Add geothermal electricity extraction support #400
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
…rmatting (#399) * Initial plan * Fix all review comments in skills documentation Co-authored-by: bpulluta <115118857+bpulluta@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: bpulluta <115118857+bpulluta@users.noreply.github.com>
…eval - percent-encode raw spaces in crawl4ai PDF source URLs before downstream use - populate link text field from anchor text so ELMLinkScorer can score link labels - add two regression tests covering both fixes
This comment was marked as resolved.
… test (#401) * Initial plan * Extract shared _sanitize_url to url_utils.py, simplify to space-only encoding, fix test robustness Co-authored-by: bpulluta <115118857+bpulluta@users.noreply.github.com> Agent-Logs-Url: https://github.com/NatLabRockies/COMPASS/sessions/ceb782b4-c312-41d1-b4eb-eccbbef67097 * fix failing test * ruff error fix --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: bpulluta <115118857+bpulluta@users.noreply.github.com>
Bumps [release-drafter/release-drafter](https://github.com/release-drafter/release-drafter) from 7.0.0 to 7.1.1. - [Release notes](https://github.com/release-drafter/release-drafter/releases) - [Commits](release-drafter/release-drafter@3a7fb5c...139054a) --- updated-dependencies: - dependency-name: release-drafter/release-drafter dependency-version: 7.1.1 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Add COMPASS workflow skills * Added one-shot skills * update one-shot SKILL.md structure and trigger contracts * Initial plan (#398) Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> * Fix skills documentation: correct paths, caching behavior, and tab formatting (#399) * Initial plan * Fix all review comments in skills documentation Co-authored-by: bpulluta <115118857+bpulluta@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: bpulluta <115118857+bpulluta@users.noreply.github.com> * renamed skills and fixed minor comments * udpated skills Paul review march 26 --------- Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 5 to 6. - [Release notes](https://github.com/codecov/codecov-action/releases) - [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md) - [Commits](codecov/codecov-action@v5...v6) --- updated-dependencies: - dependency-name: codecov/codecov-action dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
ppinchuk
left a comment
I didn't dive deep into the schema yet, but there are a few code things I'd like to address before focusing on the schema.
I really like the idea of adding a configurable post-processing pipeline to the schema. I envision that we could build up a "library" of post-processing functions that are generic enough where different schemas can use them. Very cool idea and we should definitely use it.
I'm quite concerned about parsing the summary for drilling hour windows instead of relying on the LLM output. Are you sure this is required? Can you explain why you went with that choice, @bpulluta? Is there not a way to shape the output of the LLM instead? I'm worried about the case where the summary has multiple windows, which we already saw in the sample documents you had.
OK, on second look, it looks like the summary is only being used as a fallback. That might be OK actually. I think you can disregard the comments about the summary parsing.
ppinchuk
left a comment
A few more comments specifically for the plugin and schema configs. Excited to get this merged in!
How do we move this one forward? @ppinchuk's comments are relevant and should be addressed, or it would be best to discuss why not. @bpulluta, would you like help to address the open issues on this PR? As we rely more and more on the one-shot approach, properly closing this PR is important to set the tone for the other technologies on the way. That will save us a lot of time down the road.
bpulluta
left a comment
Reviewed and made the changes outlined in the comments. No regression was observed when rerunning the geothermal electricity extraction.
Overview
This PR adds geothermal electricity as a supported extraction technology in COMPASS, including the extraction schema and plugin configuration needed to discover, retrieve, and extract structured ordinance data from jurisdictions governing utility-scale geothermal electricity generation.
Additional improvements:
--out_dir_exists must be set via the CLI
New: Geothermal Electricity Extraction
Files added:
compass/extraction/geothermal_electricity/geothermal_schema.json: Defines 29 extractable features including setbacks, permitting, noise limits, zoning classifications, decommissioning, and drilling requirements.
compass/extraction/geothermal_electricity/geothermal_plugin_config.yaml: Configures search queries, website scoring keywords, heuristic filters, and document collection behavior tuned for geothermal electricity ordinances.
The schema follows the standard COMPASS one-shot extraction format and is compatible with the existing compass process pipeline.
CLI Improvements
A new --out_dir_exists option controls how an existing output directory is handled: users can overwrite it, fail, or have a new directory created automatically.
Bug Fixes
Bug Fix 1: PDF URLs with spaces failed to download
crawl4ai can return document URLs with raw spaces in the path. These are now percent-encoded before download, ensuring correct retrieval.
compass/scripts/download.py
Bug Fix 2: Anchor text was never used in link scoring
compass/web/website_crawl.py
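As a rough illustration of Bug Fix 1, the space-only percent-encoding described in the commit notes could look something like the sketch below. The function name encode_url_spaces and the example URL are hypothetical and used only for illustration; this is not the actual _sanitize_url helper in COMPASS, whose implementation is not shown in this PR description.

```python
def encode_url_spaces(url: str) -> str:
    """Percent-encode raw spaces in a URL, leaving all other characters as-is.

    Hypothetical helper for illustration only; not the COMPASS implementation.
    """
    # Only literal spaces are replaced, so sequences that are already
    # percent-encoded (e.g. "%20") pass through unchanged.
    return url.replace(" ", "%20")


if __name__ == "__main__":
    # Made-up URL with raw spaces in the path, as a crawler might return it.
    raw_url = "https://example.com/docs/Zoning Ordinance 2024.pdf"
    print(encode_url_spaces(raw_url))
    # prints: https://example.com/docs/Zoning%20Ordinance%202024.pdf
```

Restricting the replacement to spaces keeps the function idempotent: applying it to an already-encoded URL returns the same string, which avoids double-encoding.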