genai package #545
Merged
+196
−15
213 commits
ada8b06
Move prescreen inclusion criterion input to ops prescreen
4613597
update and rename workflows
geritwagner ca6d446
Update README.md
geritwagner ee364e2
crossref: catch Exception
geritwagner 5b077f4
refactor: pylint messages
geritwagner ac991d6
Run Update documentation weekly to avoid many PRs
geritwagner fa22b2f
Update documentation (#548)
github-actions[bot] e09ff9d
europe_pmc: catch ValueError in lock.release()
geritwagner fb6f4f2
Use posix paths for platform independence (#544)
julianprester 0329207
colrev project installation (making internal packages optional) (#530)
geritwagner 49c0e5d
docs: fix path
geritwagner e05779c
fix docker test
geritwagner 673aff8
docs: update path
geritwagner 3c2f278
docker tests: remove intermediate containers
geritwagner 3376b5a
package_manager: packages do not necessarily start with "colrev."
geritwagner 4145c33
update dependencies (#550)
github-actions[bot] 77416a2
paper_md: stop container
geritwagner e44ed40
fix import error: local_index.builder
geritwagner b7fdb14
add todo
geritwagner 3620b23
do not build paper in silent mode
geritwagner 6d207ee
Reduce dependencies and switch to pydantic (#551)
geritwagner f1e1c49
crossref: update printout
geritwagner 5bea258
docs: drop asciinema of package --init
geritwagner 345b927
docs: add note on search udpates
geritwagner ccaced5
cli: add instructions
geritwagner 015c436
upgrade: fix path-names in registry
geritwagner e4e429b
tei_parser: set defaults
geritwagner 5f59a78
testing/fixes
geritwagner 1001bf2
Export instead of print
22cf727
Remove instructor dependency
11f79aa
Split prompt into system and user
f82d39b
align screening output with prescreen file export
d6b0191
move packages asciinema to comments
geritwagner b73b80c
add command how to verify git credentials
dengdenglele 49a7ee7
fixes
geritwagner 29379ff
update dependencies (#553)
github-actions[bot] 984eec6
[pre-commit.ci] pre-commit autoupdate (#556)
pre-commit-ci[bot] 6fac7e9
prep polish: reset original state
geritwagner ac9f45b
crossref: raise ServiceNotAvailableException in crossref_query()
geritwagner 0154630
update set_prepared in record.run_quality_model()
geritwagner 1ac78c5
update sync
geritwagner 5cc6132
update validation
geritwagner 4723bc2
fix long line
geritwagner 8165407
no name-format defect for abbreviated names
geritwagner 44c022a
record.remove_field_provenance_note(): also remove IGNORE:note
geritwagner 01c0b67
record.change_entrytype(): run_quality_model() with set_prepared=True
geritwagner 8e81766
fixes
geritwagner e951b90
temporarily remove genai
geritwagner 47ac368
install all-internal-packages for devcontainer (pylint)
geritwagner 270280e
fix naming conventions
geritwagner 45452c5
fix naming conventions
geritwagner 189ef77
fix arxiv: pyproject.toml
geritwagner dd393a9
Relax prep (#529)
geritwagner 5546ac8
update upgrade of gh-actions
geritwagner 10defed
doi_org: use re instead of bs4
geritwagner cd4a147
record.has_fatal_quality_defects(): catch doi/numbers
geritwagner 63b1b28
update GROBID
geritwagner a505d86
fix tei_parser
geritwagner df01561
search_api_feed: make _add_record_to_feed public and remove redundant…
geritwagner 45003ef
colrev.files_dir: update rerun
geritwagner ae6e974
fix linter messages: colrev.files_dir
geritwagner 4787eb5
update grobid-0.8.1 tests
geritwagner 15b1a8d
update dependencies (#560)
github-actions[bot] 3dca2a8
Update documentation (#561)
github-actions[bot] c6203f5
colrev.github: support topic field, use inquirer
geritwagner e168f25
update dependencies (#562)
github-actions[bot] b065909
update links/docs
geritwagner 306229e
colrev.files_dir: catch exception
geritwagner e1e7f4a
update docs
geritwagner 8ea9e82
colrev.arxiv: fix feedparser dep
geritwagner b69ab1d
colrev.files_dir: ignore pylint
geritwagner a986ca0
release 0.13.0
geritwagner 658ce3d
update doi in CITATION.cff
geritwagner 111547d
colrev.files_dir: catch connection error
geritwagner 7e9f395
add comparison
geritwagner 6a60529
update README.md
geritwagner 4c97ae9
update overview
geritwagner 5c70901
update readme
geritwagner 1a5fd83
update comparison
geritwagner f8f26b0
update docs
geritwagner 5f40927
update docs/index
geritwagner 6ee03a8
udpate docs/index
geritwagner bbc67f3
colrev-packages: if direct-url does not start with file://, it is not…
geritwagner 237515a
colrev-packages: shallow clone
geritwagner 4bd032b
init: install package before instantiating
geritwagner d562273
update overview
geritwagner 381b5aa
add python venv instructions to getting started page
dengdenglele 2755e42
install setuptools with --break-system-packages
geritwagner e597929
drop click_completion, update zope.interface
geritwagner ad04893
fix pylint warnings
geritwagner 0306634
codespaces: py3.12
geritwagner 7f0282a
colrev init: add not to install internal_packages
geritwagner 5cf3ae1
temporarily deactivate is_installed()
geritwagner 83cd227
add "colrev install all-packages" to getting started install steps
dengdenglele 0ed1e50
fix previous wrong colrev package command
dengdenglele 40de2f9
set python version for workflow/tests
geritwagner c4b2d4d
fix pylint warnings
geritwagner e0d7b52
package-manager: import pypi packages containing colrev in package-name
geritwagner f6f0f04
add bs4 dependency for docs
geritwagner e753c86
update dependencies
geritwagner 07dd97e
deps: add mypy to dev
geritwagner dad9184
fix: package name validation
geritwagner 11add12
fix: do not call install() in init
geritwagner 00e027f
feat: create two commits upon init
geritwagner c02883e
deps: update bib-dedupe and Python (3.10-3.12)
geritwagner f682d5a
docs: reduce badges
geritwagner 78e7b90
add installation of diverse internal packages
dengdenglele 97560be
restructure installation of CoLRev and pre-commit hooks
dengdenglele 300142f
update
geritwagner 4dbce0c
dev: remove setup.md from devcontainer
geritwagner 2f98b76
docs: fix typo
geritwagner b0e2fd8
update dependencies
a09c34e
[pre-commit.ci] pre-commit autoupdate
pre-commit-ci[bot] 10bd5af
deps: add importlib_metadata for dash
geritwagner ff9d8c8
deps: poetry updates manually (not as a cronjob)
geritwagner 9b22ecf
crossref,dblp: update formatting
geritwagner 73c0523
docs: include colrev-scidb
geritwagner 51b1907
docs: update colrev-scidb
geritwagner 6e13e80
Update documentation (#573)
github-actions[bot] 072107c
Update documentation (#575)
github-actions[bot] 91ec2a8
fix: entering author details in package --init
geritwagner b2b6add
ui: add note for colrev package --init
geritwagner f00035c
fix: crossref - use cursor method for large queries
geritwagner 21d28c0
rename test files
geritwagner 14af26d
fix: package_manager init built-in
geritwagner d158ce3
release 0.13.1
geritwagner 2fc4fa6
docs: SearchSource
geritwagner 333a374
version: drop pre-python3.8
geritwagner 1aa5276
crossref: add get_dois()
geritwagner 09b12ee
format
geritwagner 6172de0
deps: update pre-commit to prevent InvalidManifest error
geritwagner 4ab4bf8
search: catch ModuleNotFound
geritwagner 7bf9832
update synergy-datasets
geritwagner 79e6e6a
release 0.13.2
geritwagner 6e5f897
update mypy-python version
geritwagner 62421f5
deps: update to bib-dedupe (silenced pandas warnings)
geritwagner 1312d72
fix: europe_pmc empty-list
geritwagner c10c62c
update pre-commit hooks
geritwagner 80bf32b
remove pylint flags
geritwagner 1b19e03
synergy: extract method
geritwagner c02d684
crossref: rename attribute
geritwagner 2206304
update docs
geritwagner 64349de
extract colrev.sync to separate PyPI package
geritwagner 6b58870
extract hooks-update to colrev-sync
geritwagner 41e942c
docs: remove hooks.update
geritwagner c29860b
fix: scope-prescreen optional with None
geritwagner 165f01d
add colrev convert (cli)
geritwagner ae90786
update gh-action workflows
geritwagner 8d568cb
gh-actions: update deploy
geritwagner e440e9d
update record_id_setter (for colrev convert)
geritwagner 2dde05a
colrev.plos package (#594)
olgagirona 8ab5a23
prospero searchsource (#586)
trathienphuc-tran e9573cb
Update documentation (#603)
github-actions[bot] 85ca03e
fix: ui-cli: detect SearchSource
geritwagner 34c1832
fix: search/missing query
geritwagner 6b15a28
update docs: repo_name
geritwagner b041ccf
deps: drop selenium for colrev core
geritwagner 9987659
replace pkg_resources with importlib (#605)
geritwagner 14a7cdf
Replace pybtex (#606)
geritwagner 99a5530
refactor
geritwagner de2b225
update coverage-badge
geritwagner 371354b
deps: remove importlib_metadata
geritwagner 09fa503
[pre-commit.ci] pre-commit autoupdate (#607)
pre-commit-ci[bot] 97b8da2
update docs
geritwagner 080a6db
fix pylint warnings in cli
geritwagner e7ffd5f
consistently update settings in add_package_to_settings
geritwagner d87ae05
fix pylint warning in plos
geritwagner dc8cc02
update scope_prescreen and docs
geritwagner d13ef5d
docs: update files_dir
geritwagner 2afc226
update docs
geritwagner 20b1f22
update cli handling
geritwagner 99bd114
fix: loader accept empty fields
geritwagner c06c6b4
update
geritwagner 99ced54
Replace zope-interfaces by abstract base classes (abc) (#610)
geritwagner 7803a3c
fix: drop repoze.sphinx.autointerface from docs/conf
geritwagner ced822e
update docs
geritwagner d3961cf
zope cleanup
geritwagner 2476197
poetry to uv (#611)
geritwagner 21efe43
docs: run sphinx in uv
geritwagner 8035199
fix pylint warning in scope-prescreen
geritwagner 18f00ec
revise package_manager
geritwagner bffe5e5
remove comment
geritwagner 9f54869
remove note
geritwagner b91235b
update pyproject tomls
geritwagner 5995a71
update publishing workflow
geritwagner 4709475
fix pyproject.tomls/dependencies
geritwagner 6f56cd3
release 0.14.0
geritwagner 6de42e8
update doi, release checklist
geritwagner 7cf2d71
update docs
geritwagner 6a4757e
update docs
geritwagner d28906e
update README/release-checklist
geritwagner 693fb52
fix link
geritwagner f0ab35b
update nr extensions
geritwagner 6aed9ca
update PLOS api/docs
geritwagner a882763
update covert: ris
geritwagner 3be4253
update ris writer
geritwagner 38a1687
update load-utils: load_df
geritwagner 5eb43ba
fixes
geritwagner 386d060
temporarily remove genai
geritwagner 0adc626
Update documentation (#575)
github-actions[bot] 2925aa7
Export instead of print
4dc6db7
fixes
geritwagner 2cfd79f
temporarily remove genai
geritwagner 0f305e2
Update documentation (#575)
github-actions[bot] 47be145
Update pyproject.toml
geritwagner 53d5bff
prescreen: use input() in package instead of operation
geritwagner 7653fd4
Merge branch 'main' into genai
geritwagner 8be9320
fix
geritwagner d366a48
fix
geritwagner 0e38f08
fix
geritwagner 9220533
switch from zope-interface to ABC
geritwagner 82f6169
udpate package
geritwagner fe1e73d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot]

    @@ -0,0 +1,11 @@
    ## Summary

    Gen-AI package.

    ## prescreen

    docs...

    ## Links

    ...

    @@ -0,0 +1,33 @@
    [project]
    name = "colrev.genai"
    description = "CoLRev package for GenAI"
    version = "0.1.0"
    license = "MIT"
    authors = [
        { name = "Julian Prester", email = "[email protected]" },
        { name = "Gerit Wagner", email = "[email protected]" }
    ]
    requires-python = ">=3.8, <4"
    dependencies = [
        "litellm>=1.37.0",
        "pydantic>=2.7.1",
    ]

    [project.urls]
    repository = "https://github.com/CoLRev-Environment/colrev/tree/main/colrev/packages/genai"

    [tool.hatch.build.targets.wheel]
    packages = ["src"]

    [tool.colrev]
    colrev_doc_description = "GenAI"
    colrev_doc_link = "README.md"
    search_types = []

    [project.entry-points.colrev]
    prescreen = "colrev.packages.genai.src.genai_prescreen:GenAIPrescreen"
    screen = "colrev.packages.genai.src.genai_screen:GenAIScreen"

    [build-system]
    requires = ["hatchling"]
    build-backend = "hatchling.build"
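The `[project.entry-points.colrev]` table above registers the package's classes under the `colrev` entry-point group. As a minimal sketch of how such registrations can be read back with the standard library (`list_entry_points` is a hypothetical helper for illustration; how CoLRev's own package manager discovers packages is not shown in this PR):

```python
from importlib.metadata import entry_points
from typing import List, Tuple


def list_entry_points(group: str = "colrev") -> List[Tuple[str, str]]:
    """Return (name, target) pairs registered under the given entry-point group.

    Hypothetical helper for illustration; not part of colrev.
    """
    all_eps = entry_points()
    if hasattr(all_eps, "select"):
        # Python 3.10+: entry points expose a selectable interface
        selected = all_eps.select(group=group)
    else:
        # Python 3.8/3.9: a plain dict keyed by group name
        selected = all_eps.get(group, [])
    return [(ep.name, ep.value) for ep in selected]


# Prints nothing if no installed distribution registers this group.
for name, target in list_entry_points():
    print(f"{name} -> {target}")
```

With this package installed, the list would be expected to contain the `prescreen` and `screen` entries declared above, alongside entries from any other installed package that uses the same group.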

    @@ -0,0 +1,152 @@
    #! /usr/bin/env python
    """Prescreen based on GenAI"""
    from __future__ import annotations

    import csv
    from pathlib import Path
    from typing import ClassVar

    import pandas as pd
    from litellm import completion
    from pydantic import BaseModel
    from pydantic import Field

    import colrev.package_manager.package_base_classes as base_classes
    import colrev.package_manager.package_manager
    import colrev.package_manager.package_settings
    import colrev.record.record
    from colrev.constants import Colors
    from colrev.constants import RecordState


    # pylint: disable=too-few-public-methods
    # pylint: disable=duplicate-code


    class PreScreenDecision(BaseModel):
        """
        Class for a prescreen
        """

        SYSTEM_PROMPT: ClassVar[str] = (
            "You are an expert screener of scientific literature. "
            "You are tasked with identifying relevant articles for a literature review. "
            "You are provided with the metadata of an article and are asked to determine "
            "whether the article should be included in the review based on an inclusion criterion."
        )
        included: bool = Field(
            description="Whether the article should be included in the review "
            + "based on the inclusion criterion."
        )
        explanation: str = Field(description="Explanation of the inclusion decision.")


    class GenAIPrescreen(base_classes.PrescreenPackageBaseClass):
        """GenAI-based prescreen"""

        ci_supported: bool = Field(default=True)
        export_todos_only: bool = True

        class GenAIPrescreenSettings(
            colrev.package_manager.package_settings.DefaultSettings, BaseModel
        ):
            """Settings for GenAIPrescreen"""

            # pylint: disable=invalid-name
            # pylint: disable=too-many-instance-attributes

            endpoint: str
            model: str = "gpt-4o-mini"

        settings_class = GenAIPrescreenSettings

        def __init__(
            self,
            *,
            prescreen_operation: colrev.ops.prescreen.Prescreen,
            settings: dict,
        ) -> None:
            self.review_manager = prescreen_operation.review_manager
            self.settings = self.settings_class(**settings)
            self.prescreen_decision_explanation_path = (
                self.review_manager.paths.prescreen
                / Path("prescreen_decision_explanation.csv")
            )

        # pylint: disable=unused-argument
        def run_prescreen(
            self,
            records: dict,
            split: list,
        ) -> dict:
            """Prescreen records based on GenAI"""

            if self.review_manager.settings.prescreen.explanation == "":
                print(
                    f"\n{Colors.ORANGE}Provide a short explanation of the prescreen{Colors.END} "
                    "(why should particular papers be included?):"
                )
                print(
                    'Example objective: "Include papers that focus on digital technology."'
                )
                self.review_manager.settings.prescreen.explanation = input("")
                self.review_manager.save_settings()
            else:
                print("\nIn the prescreen, the following process is followed:\n")
                print(" " + self.review_manager.settings.prescreen.explanation)
                print()

            # API key needs to be set as an environment variable
            inclusion_criterion = self.review_manager.settings.prescreen.explanation

            screening_decisions = []

            for record_dict in records.values():
                record = colrev.record.record.Record(record_dict)
                response = completion(
                    model=self.settings.model,
                    max_tokens=1024,
                    messages=[
                        {
                            "role": "user",
                            "content": f"{PreScreenDecision.SYSTEM_PROMPT}\n\n"
                            + f"INCLUSION CRITERION:\n\n{inclusion_criterion}\n\n"
                            + f"METADATA:\n\n{record}",
                        }
                    ],
                    response_format=PreScreenDecision,
                )
                prescreen_decision = PreScreenDecision.model_validate_json(
                    response.choices[0].message.content
                )
                if prescreen_decision.included:
                    record.set_status(RecordState.rev_prescreen_included)
                else:
                    record.set_status(RecordState.rev_prescreen_excluded)

                screening_decisions.append(
                    {
                        "Record": record.get_data()["ID"],
                        "Inclusion/Exclusion Decision": (
                            "Included" if prescreen_decision.included else "Excluded"
                        ),
                        "Explanation": prescreen_decision.explanation,
                    }
                )

            self.review_manager.paths.prescreen.mkdir(parents=True, exist_ok=True)
            screening_decisions_df = pd.DataFrame(screening_decisions)
            screening_decisions_df.to_csv(
                self.prescreen_decision_explanation_path, index=False, quoting=csv.QUOTE_ALL
            )
            self.review_manager.logger.info(
                f"Exported prescreening decisions to {self.prescreen_decision_explanation_path}"
            )

            self.review_manager.dataset.save_records_dict(records)
            self.review_manager.dataset.create_commit(
                msg="Pre-screen (GenAI)",
                manual_author=False,
            )

            return records
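In the diff above, `SYSTEM_PROMPT` is concatenated into a single `user` message. One of the commits in this PR is titled "Split prompt into system and user"; a minimal sketch of what such a split could look like follows. `build_messages` is a hypothetical helper (not part of colrev or litellm), and the prompt text is copied from `PreScreenDecision.SYSTEM_PROMPT`.

```python
from typing import Dict, List

# Copied from PreScreenDecision.SYSTEM_PROMPT in the diff above.
SYSTEM_PROMPT = (
    "You are an expert screener of scientific literature. "
    "You are tasked with identifying relevant articles for a literature review. "
    "You are provided with the metadata of an article and are asked to determine "
    "whether the article should be included in the review based on an inclusion criterion."
)


def build_messages(inclusion_criterion: str, metadata: str) -> List[Dict[str, str]]:
    """Hypothetical helper: build a chat message list with separate roles."""
    user_content = (
        f"INCLUSION CRITERION:\n\n{inclusion_criterion}\n\n"
        f"METADATA:\n\n{metadata}"
    )
    return [
        # Task framing goes into the system role ...
        {"role": "system", "content": SYSTEM_PROMPT},
        # ... and the per-record input goes into the user role.
        {"role": "user", "content": user_content},
    ]


messages = build_messages(
    "Include papers that focus on digital technology.",
    "title: An example record",
)
```

The resulting list could be passed as the `messages=` argument of the `completion(...)` call in `run_prescreen`; whether the merged code does exactly this is not visible in this diff.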
@geritwagner can you check this? I replaced the CLI printout of the screening decision table with a CSV export. Is this the right way to create the file here?
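For comparison, here is a minimal sketch of the same export using only the standard library instead of pandas. It assumes the three column names used in `run_prescreen`; `export_decisions` and `FIELDNAMES` are hypothetical names for illustration, not existing colrev functions.

```python
import csv
from pathlib import Path
from typing import Dict, List

# Column names as used for the decision table in run_prescreen.
FIELDNAMES = ["Record", "Inclusion/Exclusion Decision", "Explanation"]


def export_decisions(decisions: List[Dict[str, str]], target: Path) -> None:
    """Hypothetical helper: write screening decisions to a fully quoted CSV file."""
    # Create the parent directory first, as the PR does with mkdir(parents=True, exist_ok=True).
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle,
            fieldnames=FIELDNAMES,
            quoting=csv.QUOTE_ALL,  # explanations are free text and may contain commas
        )
        writer.writeheader()
        writer.writerows(decisions)
```

Quoting every field keeps free-text explanations containing commas or quotes intact, which matches the `quoting=csv.QUOTE_ALL` choice in the pandas call.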