genai package #545

Merged
merged 213 commits into from
Mar 12, 2025
Commits
ada8b06
Move prescreen inclusion criterion input to ops prescreen
Sep 7, 2024
4613597
update and rename workflows
geritwagner Sep 8, 2024
ca6d446
Update README.md
geritwagner Sep 8, 2024
ee364e2
crossref: catch Exception
geritwagner Sep 8, 2024
5b077f4
refactor: pylint messages
geritwagner Sep 8, 2024
ac991d6
Run Update documentation weekly to avoid many PRs
geritwagner Sep 8, 2024
fa22b2f
Update documentation (#548)
github-actions[bot] Sep 8, 2024
e09ff9d
europe_pmc: catch ValueError in lock.release()
geritwagner Sep 9, 2024
fb6f4f2
Use posix paths for platform independence (#544)
julianprester Sep 10, 2024
0329207
colrev project installation (making internal packages optional) (#530)
geritwagner Sep 10, 2024
49c0e5d
docs: fix path
geritwagner Sep 10, 2024
e05779c
fix docker test
geritwagner Sep 10, 2024
673aff8
docs: update path
geritwagner Sep 10, 2024
3c2f278
docker tests: remove intermediate containers
geritwagner Sep 10, 2024
3376b5a
package_manager: packages do not necessarily start with "colrev."
geritwagner Sep 10, 2024
4145c33
update dependencies (#550)
github-actions[bot] Sep 11, 2024
77416a2
paper_md: stop container
geritwagner Sep 11, 2024
e44ed40
fix import error: local_index.builder
geritwagner Sep 13, 2024
b7fdb14
add todo
geritwagner Sep 13, 2024
3620b23
do not build paper in silent mode
geritwagner Sep 13, 2024
6d207ee
Reduce dependencies and switch to pydantic (#551)
geritwagner Sep 13, 2024
f1e1c49
crossref: update printout
geritwagner Sep 14, 2024
5bea258
docs: drop asciinema of package --init
geritwagner Sep 14, 2024
345b927
docs: add note on search udpates
geritwagner Sep 15, 2024
ccaced5
cli: add instructions
geritwagner Sep 16, 2024
015c436
upgrade: fix path-names in registry
geritwagner Sep 16, 2024
e4e429b
tei_parser: set defaults
geritwagner Sep 19, 2024
5f59a78
testing/fixes
geritwagner Sep 19, 2024
1001bf2
Export instead of print
Sep 19, 2024
22cf727
Remove instructor dependency
Sep 19, 2024
11f79aa
Split prompt into system and user
Sep 19, 2024
f82d39b
align screening output with prescreen file export
Sep 19, 2024
d6b0191
move packages asciinema to comments
geritwagner Sep 19, 2024
b73b80c
add command how to verify git credentials
dengdenglele Sep 20, 2024
49a7ee7
fixes
geritwagner Sep 23, 2024
29379ff
update dependencies (#553)
github-actions[bot] Sep 24, 2024
984eec6
[pre-commit.ci] pre-commit autoupdate (#556)
pre-commit-ci[bot] Sep 24, 2024
6fac7e9
prep polish: reset original state
geritwagner Sep 28, 2024
ac9f45b
crossref: raise ServiceNotAvailableException in crossref_query()
geritwagner Sep 28, 2024
0154630
update set_prepared in record.run_quality_model()
geritwagner Sep 28, 2024
1ac78c5
update sync
geritwagner Sep 28, 2024
5cc6132
update validation
geritwagner Sep 28, 2024
4723bc2
fix long line
geritwagner Sep 28, 2024
8165407
no name-format defect for abbreviated names
geritwagner Sep 28, 2024
44c022a
record.remove_field_provenance_note(): also remove IGNORE:note
geritwagner Sep 28, 2024
01c0b67
record.change_entrytype(): run_quality_model() with set_prepared=True
geritwagner Sep 28, 2024
8e81766
fixes
geritwagner Sep 28, 2024
e951b90
temporarily remove genai
geritwagner Sep 28, 2024
47ac368
install all-internal-packages for devcontainer (pylint)
geritwagner Sep 28, 2024
270280e
fix naming conventions
geritwagner Sep 28, 2024
45452c5
fix naming conventions
geritwagner Sep 28, 2024
189ef77
fix arxiv: pyproject.toml
geritwagner Sep 28, 2024
dd393a9
Relax prep (#529)
geritwagner Sep 28, 2024
5546ac8
update upgrade of gh-actions
geritwagner Sep 29, 2024
10defed
doi_org: use re instead of bs4
geritwagner Sep 29, 2024
cd4a147
record.has_fatal_quality_defects(): catch doi/numbers
geritwagner Sep 29, 2024
63b1b28
update GROBID
geritwagner Oct 1, 2024
a505d86
fix tei_parser
geritwagner Oct 1, 2024
df01561
search_api_feed: make _add_record_to_feed public and remove redundant…
geritwagner Oct 1, 2024
45003ef
colrev.files_dir: update rerun
geritwagner Oct 1, 2024
ae6e974
fix linter messages: colrev.files_dir
geritwagner Oct 1, 2024
4787eb5
update grobid-0.8.1 tests
geritwagner Oct 1, 2024
15b1a8d
update dependencies (#560)
github-actions[bot] Oct 2, 2024
3dca2a8
Update documentation (#561)
github-actions[bot] Oct 2, 2024
c6203f5
colrev.github: support topic field, use inquirer
geritwagner Oct 3, 2024
e168f25
update dependencies (#562)
github-actions[bot] Oct 4, 2024
b065909
update links/docs
geritwagner Oct 4, 2024
306229e
colrev.files_dir: catch exception
geritwagner Oct 4, 2024
e1e7f4a
update docs
geritwagner Oct 4, 2024
8ea9e82
colrev.arxiv: fix feedparser dep
geritwagner Oct 4, 2024
b69ab1d
colrev.files_dir: ignore pylint
geritwagner Oct 4, 2024
a986ca0
release 0.13.0
geritwagner Oct 4, 2024
658ce3d
update doi in CITATION.cff
geritwagner Oct 4, 2024
111547d
colrev.files_dir: catch connection error
geritwagner Oct 7, 2024
7e9f395
add comparison
geritwagner Oct 8, 2024
6a60529
update README.md
geritwagner Oct 8, 2024
4c97ae9
update overview
geritwagner Oct 8, 2024
5c70901
update readme
geritwagner Oct 10, 2024
1a5fd83
update comparison
geritwagner Oct 12, 2024
f8f26b0
update docs
geritwagner Oct 12, 2024
5f40927
update docs/index
geritwagner Oct 12, 2024
6ee03a8
udpate docs/index
geritwagner Oct 12, 2024
bbc67f3
colrev-packages: if direct-url does not start with file://, it is not…
geritwagner Oct 14, 2024
237515a
colrev-packages: shallow clone
geritwagner Oct 14, 2024
4bd032b
init: install package before instantiating
geritwagner Oct 14, 2024
d562273
update overview
geritwagner Oct 14, 2024
381b5aa
add python venv instructions to getting started page
dengdenglele Oct 15, 2024
2755e42
install setuptools with --break-system-packages
geritwagner Oct 15, 2024
e597929
drop click_completion, update zope.interface
geritwagner Oct 16, 2024
ad04893
fix pylint warnings
geritwagner Oct 16, 2024
0306634
codespaces: py3.12
geritwagner Oct 16, 2024
7f0282a
colrev init: add not to install internal_packages
geritwagner Oct 16, 2024
5cf3ae1
temporarily deactivate is_installed()
geritwagner Oct 16, 2024
83cd227
add "colrev install all-packages" to getting started install steps
dengdenglele Oct 16, 2024
0ed1e50
fix previous wrong colrev package command
dengdenglele Oct 16, 2024
40de2f9
set python version for workflow/tests
geritwagner Oct 16, 2024
c4b2d4d
fix pylint warnings
geritwagner Oct 16, 2024
e0d7b52
package-manager: import pypi packages containing colrev in package-name
geritwagner Oct 16, 2024
f6f0f04
add bs4 dependency for docs
geritwagner Oct 17, 2024
e753c86
update dependencies
geritwagner Oct 17, 2024
07dd97e
deps: add mypy to dev
geritwagner Oct 19, 2024
dad9184
fix: package name validation
geritwagner Oct 19, 2024
11add12
fix: do not call install() in init
geritwagner Oct 19, 2024
00e027f
feat: create two commits upon init
geritwagner Oct 19, 2024
c02883e
deps: update bib-dedupe and Python (3.10-3.12)
geritwagner Oct 23, 2024
f682d5a
docs: reduce badges
geritwagner Oct 24, 2024
78e7b90
add installation of diverse internal packages
dengdenglele Oct 28, 2024
97560be
restructure installation of CoLRev and pre-commit hooks
dengdenglele Oct 28, 2024
300142f
update
geritwagner Oct 29, 2024
4dbce0c
dev: remove setup.md from devcontainer
geritwagner Oct 30, 2024
2f98b76
docs: fix typo
geritwagner Nov 1, 2024
b0e2fd8
update dependencies
Oct 30, 2024
a09c34e
[pre-commit.ci] pre-commit autoupdate
pre-commit-ci[bot] Oct 28, 2024
10bd5af
deps: add importlib_metadata for dash
geritwagner Nov 4, 2024
ff9d8c8
deps: poetry updates manually (not as a cronjob)
geritwagner Nov 4, 2024
9b22ecf
crossref,dblp: update formatting
geritwagner Nov 4, 2024
73c0523
docs: include colrev-scidb
geritwagner Nov 4, 2024
51b1907
docs: update colrev-scidb
geritwagner Nov 5, 2024
6e13e80
Update documentation (#573)
github-actions[bot] Nov 6, 2024
072107c
Update documentation (#575)
github-actions[bot] Nov 13, 2024
91ec2a8
fix: entering author details in package --init
geritwagner Nov 21, 2024
b2b6add
ui: add note for colrev package --init
geritwagner Nov 28, 2024
f00035c
fix: crossref - use cursor method for large queries
geritwagner Nov 28, 2024
21d28c0
rename test files
geritwagner Dec 5, 2024
14af26d
fix: package_manager init built-in
geritwagner Dec 6, 2024
d158ce3
release 0.13.1
geritwagner Dec 17, 2024
2fc4fa6
docs: SearchSource
geritwagner Dec 25, 2024
333a374
version: drop pre-python3.8
geritwagner Dec 25, 2024
1aa5276
crossref: add get_dois()
geritwagner Jan 6, 2025
09b12ee
format
geritwagner Jan 6, 2025
6172de0
deps: update pre-commit to prevent InvalidManifest error
geritwagner Jan 6, 2025
4ab4bf8
search: catch ModuleNotFound
geritwagner Jan 7, 2025
7bf9832
update synergy-datasets
geritwagner Jan 14, 2025
79e6e6a
release 0.13.2
geritwagner Jan 15, 2025
6e5f897
update mypy-python version
geritwagner Jan 19, 2025
62421f5
deps: update to bib-dedupe (silenced pandas warnings)
geritwagner Jan 22, 2025
1312d72
fix: europe_pmc empty-list
geritwagner Jan 22, 2025
c10c62c
update pre-commit hooks
geritwagner Jan 22, 2025
80bf32b
remove pylint flags
geritwagner Jan 22, 2025
1b19e03
synergy: extract method
geritwagner Jan 22, 2025
c02d684
crossref: rename attribute
geritwagner Jan 22, 2025
2206304
update docs
geritwagner Jan 22, 2025
64349de
extract colrev.sync to separate PyPI package
geritwagner Jan 22, 2025
6b58870
extract hooks-update to colrev-sync
geritwagner Jan 22, 2025
41e942c
docs: remove hooks.update
geritwagner Jan 22, 2025
c29860b
fix: scope-prescreen optional with None
geritwagner Jan 23, 2025
165f01d
add colrev convert (cli)
geritwagner Jan 23, 2025
ae90786
update gh-action workflows
geritwagner Jan 23, 2025
8d568cb
gh-actions: update deploy
geritwagner Jan 23, 2025
e440e9d
update record_id_setter (for colrev convert)
geritwagner Jan 24, 2025
2dde05a
colrev.plos package (#594)
olgagirona Jan 25, 2025
8ab5a23
prospero searchsource (#586)
trathienphuc-tran Jan 26, 2025
e9573cb
Update documentation (#603)
github-actions[bot] Jan 29, 2025
85ca03e
fix: ui-cli: detect SearchSource
geritwagner Feb 3, 2025
34c1832
fix: search/missing query
geritwagner Feb 4, 2025
6b15a28
update docs: repo_name
geritwagner Feb 4, 2025
b041ccf
deps: drop selenium for colrev core
geritwagner Feb 4, 2025
9987659
replace pkg_resources with importlib (#605)
geritwagner Feb 7, 2025
14a7cdf
Replace pybtex (#606)
geritwagner Feb 8, 2025
99a5530
refactor
geritwagner Feb 8, 2025
de2b225
update coverage-badge
geritwagner Feb 9, 2025
371354b
deps: remove importlib_metadata
geritwagner Feb 10, 2025
09fa503
[pre-commit.ci] pre-commit autoupdate (#607)
pre-commit-ci[bot] Feb 12, 2025
97b8da2
update docs
geritwagner Feb 12, 2025
080a6db
fix pylint warnings in cli
geritwagner Feb 12, 2025
e7ffd5f
consistently update settings in add_package_to_settings
geritwagner Feb 12, 2025
d87ae05
fix pylint warning in plos
geritwagner Feb 12, 2025
dc8cc02
update scope_prescreen and docs
geritwagner Feb 12, 2025
d13ef5d
docs: update files_dir
geritwagner Feb 14, 2025
2afc226
update docs
geritwagner Feb 14, 2025
20b1f22
update cli handling
geritwagner Feb 14, 2025
99bd114
fix: loader accept empty fields
geritwagner Feb 14, 2025
c06c6b4
update
geritwagner Feb 16, 2025
99ced54
Replace zope-interfaces by abstract base classes (abc) (#610)
geritwagner Feb 20, 2025
7803a3c
fix: drop repoze.sphinx.autointerface from docs/conf
geritwagner Feb 20, 2025
ced822e
update docs
geritwagner Feb 20, 2025
d3961cf
zope cleanup
geritwagner Feb 20, 2025
2476197
poetry to uv (#611)
geritwagner Feb 21, 2025
21efe43
docs: run sphinx in uv
geritwagner Feb 21, 2025
8035199
fix pylint warning in scope-prescreen
geritwagner Feb 21, 2025
18f00ec
revise package_manager
geritwagner Feb 21, 2025
bffe5e5
remove comment
geritwagner Feb 21, 2025
9f54869
remove note
geritwagner Feb 21, 2025
b91235b
update pyproject tomls
geritwagner Feb 21, 2025
5995a71
update publishing workflow
geritwagner Feb 21, 2025
4709475
fix pyproject.tomls/dependencies
geritwagner Feb 21, 2025
6f56cd3
release 0.14.0
geritwagner Feb 21, 2025
6de42e8
update doi, release checklist
geritwagner Feb 21, 2025
7cf2d71
update docs
geritwagner Feb 22, 2025
6a4757e
update docs
geritwagner Feb 22, 2025
d28906e
update README/release-checklist
geritwagner Feb 22, 2025
693fb52
fix link
geritwagner Feb 22, 2025
f0ab35b
update nr extensions
geritwagner Feb 22, 2025
6aed9ca
update PLOS api/docs
geritwagner Feb 22, 2025
a882763
update covert: ris
geritwagner Feb 23, 2025
3be4253
update ris writer
geritwagner Feb 25, 2025
38a1687
update load-utils: load_df
geritwagner Mar 5, 2025
5eb43ba
fixes
geritwagner Sep 28, 2024
386d060
temporarily remove genai
geritwagner Sep 28, 2024
0adc626
Update documentation (#575)
github-actions[bot] Nov 13, 2024
2925aa7
Export instead of print
Sep 19, 2024
4dc6db7
fixes
geritwagner Sep 28, 2024
2cfd79f
temporarily remove genai
geritwagner Sep 28, 2024
0f305e2
Update documentation (#575)
github-actions[bot] Nov 13, 2024
47be145
Update pyproject.toml
geritwagner Mar 10, 2025
53d5bff
prescreen: use input() in package instead of operation
geritwagner Mar 11, 2025
7653fd4
Merge branch 'main' into genai
geritwagner Mar 11, 2025
8be9320
fix
geritwagner Mar 11, 2025
d366a48
fix
geritwagner Mar 11, 2025
0e38f08
fix
geritwagner Mar 11, 2025
9220533
switch from zope-interface to ABC
geritwagner Mar 11, 2025
82f6169
udpate package
geritwagner Mar 11, 2025
fe1e73d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 11, 2025
15 changes: 0 additions & 15 deletions colrev/packages/colrev_cli_prescreen/src/prescreen_cli.py
@@ -76,21 +76,6 @@ def _fun_cli_prescreen(
         stat_len: int,
         padding: int,
     ) -> bool:
-        if self.review_manager.settings.prescreen.explanation == "":
-            print(
-                f"\n{Colors.ORANGE}Provide a short explanation of the prescreen{Colors.END} "
-                "(why should particular papers be included?):"
-            )
-            print(
-                'Example objective: "Include papers that focus on digital technology."'
-            )
-            self.review_manager.settings.prescreen.explanation = input("")
-            self.review_manager.save_settings()
-        else:
-            print("\nIn the prescreen, the following process is followed:\n")
-            print(" " + self.review_manager.settings.prescreen.explanation)
-            print()
-
         self.review_manager.logger.debug("Start prescreen")
 
         if 0 == stat_len:
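
The same prompt-or-reuse logic is added to `run_prescreen()` in genai_prescreen.py later in this diff. As a rough sketch only, it could be expressed as a small helper. `resolve_explanation` and its `ask` parameter are hypothetical names that do not appear in the PR, and saving the settings is left to the caller.

```python
from typing import Callable


def resolve_explanation(current: str, ask: Callable[[str], str] = input) -> str:
    """Return the stored explanation, or ask for one when it is empty.

    `ask` defaults to the built-in input(); passing another callable
    makes the function usable in tests and non-interactive runs.
    """
    if current != "":
        # An explanation was saved earlier: reuse it without prompting.
        return current
    # Nothing stored yet: prompt once and return whatever was entered.
    return ask("")
```

Injecting the prompt function keeps the interactive call out of the code path that tests exercise, which may matter for a package that declares `ci_supported`.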
11 changes: 11 additions & 0 deletions colrev/packages/genai/README.md
@@ -0,0 +1,11 @@
## Summary

Gen-AI package.

## prescreen

docs...

## Links

...
33 changes: 33 additions & 0 deletions colrev/packages/genai/pyproject.toml
@@ -0,0 +1,33 @@
[project]
name = "colrev.genai"
description = "CoLRev package for GenAI"
version = "0.1.0"
license = "MIT"
authors = [
{ name = "Julian Prester", email = "[email protected]" },
{ name = "Gerit Wagner", email = "[email protected]" }
]
requires-python = ">=3.8, <4"
dependencies = [
"litellm>=1.37.0",
"pydantic>=2.7.1",
]

[project.urls]
repository ="https://github.com/CoLRev-Environment/colrev/tree/main/colrev/packages/genai"

[tool.hatch.build.targets.wheel]
packages = ["src"]

[tool.colrev]
colrev_doc_description = "GenAI"
colrev_doc_link = "README.md"
search_types = []

[project.entry-points.colrev]
prescreen = "colrev.packages.genai.src.genai_prescreen:GenAIPrescreen"
screen = "colrev.packages.genai.src.genai_screen:GenAIScreen"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
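
The `[project.entry-points.colrev]` values above follow the common `module:attribute` convention. How CoLRev's package manager resolves them is not shown in this diff, so the following is only a sketch of the general convention using the standard library; `split_entry_point` and `load_entry_point` are hypothetical helper names.

```python
import importlib
from typing import Any, Tuple


def split_entry_point(value: str) -> Tuple[str, str]:
    """Split 'package.module:ClassName' into module path and attribute name."""
    module_path, sep, attribute = value.partition(":")
    if not sep or not module_path or not attribute:
        raise ValueError(f"expected 'module:attribute', got {value!r}")
    return module_path, attribute


def load_entry_point(value: str) -> Any:
    """Import the module and return the named top-level attribute."""
    module_path, attribute = split_entry_point(value)
    module = importlib.import_module(module_path)
    return getattr(module, attribute)


# The prescreen entry point declared above splits into these two parts.
module_path, class_name = split_entry_point(
    "colrev.packages.genai.src.genai_prescreen:GenAIPrescreen"
)
```

For installed distributions, `importlib.metadata.entry_points()` performs this lookup and loading itself; the manual version only illustrates what the strings in pyproject.toml encode.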
152 changes: 152 additions & 0 deletions colrev/packages/genai/src/genai_prescreen.py
@@ -0,0 +1,152 @@
#! /usr/bin/env python
"""Prescreen based on GenAI"""
from __future__ import annotations

import csv
from pathlib import Path
from typing import ClassVar

import pandas as pd
from litellm import completion
from pydantic import BaseModel
from pydantic import Field

import colrev.package_manager.package_base_classes as base_classes
import colrev.package_manager.package_manager
import colrev.package_manager.package_settings
import colrev.record.record
from colrev.constants import Colors
from colrev.constants import RecordState


# pylint: disable=too-few-public-methods
# pylint: disable=duplicate-code


class PreScreenDecision(BaseModel):
    """
    Class for a prescreen
    """

    SYSTEM_PROMPT: ClassVar[str] = (
        "You are an expert screener of scientific literature. "
        "You are tasked with identifying relevant articles for a literature review. "
        "You are provided with the metadata of an article and are asked to determine "
        "whether the article should be included in the review based on an inclusion criterion."
    )
    included: bool = Field(
        description="Whether the article should be included in the review "
        + "based on the inclusion criterion."
    )
    explanation: str = Field(description="Explanation of the inclusion decision.")


class GenAIPrescreen(base_classes.PrescreenPackageBaseClass):
    """GenAI-based prescreen"""

    ci_supported: bool = Field(default=True)
    export_todos_only: bool = True

    class GenAIPrescreenSettings(
        colrev.package_manager.package_settings.DefaultSettings, BaseModel
    ):
        """Settings for GenAIPrescreen"""

        # pylint: disable=invalid-name
        # pylint: disable=too-many-instance-attributes

        endpoint: str
        model: str = "gpt-4o-mini"

    settings_class = GenAIPrescreenSettings

    def __init__(
        self,
        *,
        prescreen_operation: colrev.ops.prescreen.Prescreen,
        settings: dict,
    ) -> None:
        self.review_manager = prescreen_operation.review_manager
        self.settings = self.settings_class(**settings)
        self.prescreen_decision_explanation_path = (
            self.review_manager.paths.prescreen
            / Path("prescreen_decision_explanation.csv")
        )

    # pylint: disable=unused-argument
    def run_prescreen(
        self,
        records: dict,
        split: list,
    ) -> dict:
        """Prescreen records based on GenAI"""

        if self.review_manager.settings.prescreen.explanation == "":
            print(
                f"\n{Colors.ORANGE}Provide a short explanation of the prescreen{Colors.END} "
                "(why should particular papers be included?):"
            )
            print(
                'Example objective: "Include papers that focus on digital technology."'
            )
            self.review_manager.settings.prescreen.explanation = input("")
            self.review_manager.save_settings()
        else:
            print("\nIn the prescreen, the following process is followed:\n")
            print(" " + self.review_manager.settings.prescreen.explanation)
            print()

        # API key needs to be set as an environment variable
        inclusion_criterion = self.review_manager.settings.prescreen.explanation

        screening_decisions = []

        for record_dict in records.values():
            record = colrev.record.record.Record(record_dict)
            response = completion(
                model=self.settings.model,
                max_tokens=1024,
                messages=[
                    {
                        "role": "user",
                        "content": f"{PreScreenDecision.SYSTEM_PROMPT}\n\n"
                        + f"INCLUSION CRITERION:\n\n{inclusion_criterion}\n\n"
                        + f"METADATA:\n\n{record}",
                    }
                ],
                response_format=PreScreenDecision,
            )
            prescreen_decision = PreScreenDecision.model_validate_json(
                response.choices[0].message.content
            )
            if prescreen_decision.included:
                record.set_status(RecordState.rev_prescreen_included)
            else:
                record.set_status(RecordState.rev_prescreen_excluded)

            screening_decisions.append(
                {
                    "Record": record.get_data()["ID"],
                    "Inclusion/Exclusion Decision": (
                        "Included" if prescreen_decision.included else "Excluded"
                    ),
                    "Explanation": prescreen_decision.explanation,
                }
            )

        self.review_manager.paths.prescreen.mkdir(parents=True, exist_ok=True)

Review comment (Collaborator, Author):
@geritwagner can you check this? I moved the screening decision table from a CLI print into a csv output instead. Is this the right way to create the file here?

        screening_decisions_df = pd.DataFrame(screening_decisions)
        screening_decisions_df.to_csv(
            self.prescreen_decision_explanation_path, index=False, quoting=csv.QUOTE_ALL
        )
        self.review_manager.logger.info(
            f"Exported prescreening decisions to {self.prescreen_decision_explanation_path}"
        )

        self.review_manager.dataset.save_records_dict(records)
        self.review_manager.dataset.create_commit(
            msg="Pre-screen (GenAI)",
            manual_author=False,
        )

        return records
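
On the question raised in the review comment: the PR writes the decision table with pandas. For comparison, here is a stdlib-only sketch that produces a fully quoted CSV with the same three columns. It is an alternative shown for illustration, not the approach taken in the PR; `write_decisions` and `decisions_to_csv` are hypothetical names, and the `"\n"` line terminator is an assumption.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

# Column names copied from the dictionaries built in run_prescreen().
FIELDNAMES = ["Record", "Inclusion/Exclusion Decision", "Explanation"]


def write_decisions(decisions: Iterable[Dict[str, str]], handle: TextIO) -> None:
    """Write screening decisions as CSV rows with every field quoted."""
    writer = csv.DictWriter(
        handle,
        fieldnames=FIELDNAMES,
        quoting=csv.QUOTE_ALL,  # same quoting constant as the to_csv() call
        lineterminator="\n",  # assumption: plain newlines instead of "\r\n"
    )
    writer.writeheader()
    writer.writerows(decisions)


def decisions_to_csv(decisions: Iterable[Dict[str, str]]) -> str:
    """Return the CSV text for the given decisions (convenient in tests)."""
    buffer = io.StringIO()
    write_decisions(decisions, buffer)
    return buffer.getvalue()
```

When writing to a real file with the `csv` module, the file should be opened with `newline=""` so that the writer controls line endings. Quoting every field keeps free-text explanations containing commas or quotes in a single column.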