
Commit 7988b01

Merge Relax prep #529
commit 096ae8f Author: Gerit Wagner <[email protected]> Date: Sat Sep 28 21:23:48 2024 +0200 merge main
commit d9a1a8a Author: Gerit Wagner <[email protected]> Date: Sat Sep 28 17:33:22 2024 +0200 fix arxiv: pyproject.toml
commit 38552d1 Author: Gerit Wagner <[email protected]> Date: Sat Sep 28 17:33:06 2024 +0200 fix naming conventions
commit 6831e5b Author: Gerit Wagner <[email protected]> Date: Sat Sep 28 17:07:16 2024 +0200 fix naming conventions
commit a4aefd0 Author: Gerit Wagner <[email protected]> Date: Sat Sep 28 17:00:42 2024 +0200 install all-internal-packages for devcontainer (pylint)
commit 5323329 Author: Gerit Wagner <[email protected]> Date: Sat Sep 28 16:57:18 2024 +0200 temporarily remove genai
commit 520a713 Author: Gerit Wagner <[email protected]> Date: Sat Sep 28 16:28:13 2024 +0200 fixes
commit 227bf71 Author: Gerit Wagner <[email protected]> Date: Sat Sep 28 16:21:47 2024 +0200 record.change_entrytype(): run_quality_model() with set_prepared=True
commit ba6ba8b Author: Gerit Wagner <[email protected]> Date: Sat Sep 28 16:19:14 2024 +0200 record.remove_field_provenance_note(): also remove IGNORE:note
commit 9f1c228 Author: Gerit Wagner <[email protected]> Date: Sat Sep 28 16:18:37 2024 +0200 no name-format defect for abbreviated names
commit 8df75d4 Author: Gerit Wagner <[email protected]> Date: Sat Sep 28 16:18:02 2024 +0200 fix long line
commit bbe831d Author: Gerit Wagner <[email protected]> Date: Sat Sep 28 16:17:47 2024 +0200 update validation
commit 12beaf5 Author: Gerit Wagner <[email protected]> Date: Sat Sep 28 16:03:27 2024 +0200 update sync Signed-off-by: Gerit Wagner <[email protected]>
commit 520192c Author: Gerit Wagner <[email protected]> Date: Sat Sep 28 16:01:14 2024 +0200 update set_prepared in record.run_quality_model()
commit ae108e5 Author: Gerit Wagner <[email protected]> Date: Sat Sep 28 16:00:42 2024 +0200 crossref: raise ServiceNotAvailableException in crossref_query()
commit 9d882f3 Author: Gerit Wagner <[email protected]> Date: Sat Sep 28 15:58:17 2024 +0200 prep polish: reset original state
commit ddea08f Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Tue Sep 24 06:16:49 2024 +0200 [pre-commit.ci] pre-commit autoupdate (#556) updates: - [github.com/astral-sh/ruff-pre-commit: v0.6.2 → v0.6.7](astral-sh/ruff-pre-commit@v0.6.2...v0.6.7) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
commit 77c055e Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Date: Tue Sep 24 06:16:39 2024 +0200 update dependencies (#553) Co-authored-by: Poetry updater <[email protected]>
commit d045a1d Author: Gerit Wagner <[email protected]> Date: Mon Sep 23 08:35:53 2024 +0200 fixes
commit a112c20 Author: Carlo <[email protected]> Date: Fri Sep 20 07:38:10 2024 +0200 add command how to verify git credentials
commit 81ebfb4 Author: Gerit Wagner <[email protected]> Date: Thu Sep 19 19:25:34 2024 +0200 move packages asciinema to comments
commit 700c805 Author: Gerit Wagner <[email protected]> Date: Thu Sep 19 07:50:11 2024 +0200 testing/fixes
commit 2605661 Author: Gerit Wagner <[email protected]> Date: Thu Sep 19 06:21:49 2024 +0200 tei_parser: set defaults
commit cd42141 Author: Gerit Wagner <[email protected]> Date: Mon Sep 16 11:24:19 2024 +0200 upgrade: fix path-names in registry
commit 36bfd05 Author: Gerit Wagner <[email protected]> Date: Mon Sep 16 11:08:04 2024 +0200 cli: add instructions
commit ecbad8f Author: Gerit Wagner <[email protected]> Date: Sun Sep 15 10:51:48 2024 +0200 docs: add note on search updates
commit aa1b676 Author: Gerit Wagner <[email protected]> Date: Sat Sep 14 12:00:44 2024 +0200 docs: drop asciinema of package --init Signed-off-by: Gerit Wagner <[email protected]>
commit e791cc5 Author: Gerit Wagner <[email protected]> Date: Sat Sep 14 09:41:21 2024 +0200 crossref: update printout
commit ffd9628 Author: Gerit Wagner <[email protected]> Date: Fri Sep 13 14:02:14 2024 +0200 Reduce dependencies and switch to pydantic (#551) * move dependencies to arxiv and dedupe * pin numpy<2.0 * add bib-dedupe * switch to pydantic * switch to pydantic * update * sources. use relative filenames * update docs * fix mypy
commit 1b84a37 Author: Gerit Wagner <[email protected]> Date: Fri Sep 13 10:13:18 2024 +0200 do not build paper in silent mode
commit 910dffa Author: Gerit Wagner <[email protected]> Date: Fri Sep 13 08:05:21 2024 +0200 add todo
commit 7277c71 Author: Gerit Wagner <[email protected]> Date: Fri Sep 13 08:05:10 2024 +0200 fix import error: local_index.builder
commit 026df75 Author: Gerit Wagner <[email protected]> Date: Wed Sep 11 07:09:15 2024 +0200 paper_md: stop container
commit dc6ed46 Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Date: Wed Sep 11 06:19:15 2024 +0200 update dependencies (#550) Co-authored-by: Poetry updater <[email protected]>
commit a4d12ec Author: Gerit Wagner <[email protected]> Date: Tue Sep 10 20:40:56 2024 +0200 package_manager: packages do not necessarily start with "colrev."
commit 5d23d21 Author: Gerit Wagner <[email protected]> Date: Tue Sep 10 14:36:08 2024 +0200 docker tests: remove intermediate containers
commit 5142f4d Author: Gerit Wagner <[email protected]> Date: Tue Sep 10 14:09:21 2024 +0200 docs: update path
commit 2d6c2c2 Author: Gerit Wagner <[email protected]> Date: Tue Sep 10 14:08:34 2024 +0200 fix docker test
commit 8c8f137 Author: Gerit Wagner <[email protected]> Date: Tue Sep 10 13:42:36 2024 +0200 docs: fix path
commit 7b8c31e Author: Gerit Wagner <[email protected]> Date: Tue Sep 10 09:23:19 2024 +0200 colrev project installation (making internal packages optional) (#530) * colrev project installation / make internal packages optional * drop optional extras from colrev * update * update gh-workflow Signed-off-by: Gerit Wagner <[email protected]> * format and docs * update upgrade * extract colrev-internal-package discovery * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * install packages after --add and init * add note on colrev install . to docs --------- Signed-off-by: Gerit Wagner <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
commit bea78ac Author: Julian Prester <[email protected]> Date: Tue Sep 10 16:42:30 2024 +1000 Use posix paths for platform independence (#544) * Convert all paths for docker to posix * PRISMA: as_posix() * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * OCRMyPDF: as_posix() * fix prisma: path unlink() --------- Co-authored-by: Julian Prester <[email protected]> Co-authored-by: Gerit Wagner <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Gerit Wagner <[email protected]>
commit cd944e1 Author: Gerit Wagner <[email protected]> Date: Mon Sep 9 09:19:49 2024 +0200 europe_pmc: catch ValueError in lock.release()
commit 685260b Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Date: Sun Sep 8 14:44:24 2024 +0200 Update documentation (#548) * Update documentation * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: github-actions <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
commit 046c09a Author: Gerit Wagner <[email protected]> Date: Sun Sep 8 09:52:40 2024 +0200 Run Update documentation weekly to avoid many PRs
commit 8fcb7e3 Author: Gerit Wagner <[email protected]> Date: Sun Sep 8 09:52:15 2024 +0200 refactor: pylint messages
commit 3ea50f6 Author: Gerit Wagner <[email protected]> Date: Sun Sep 8 09:21:37 2024 +0200 crossref: catch Exception
commit 163b5e1 Author: Gerit Wagner <[email protected]> Date: Sun Sep 8 09:19:55 2024 +0200 Update README.md
commit 8f2460e Author: Gerit Wagner <[email protected]> Date: Sun Sep 8 09:12:58 2024 +0200 update and rename workflows Signed-off-by: Gerit Wagner <[email protected]>
commit c8bafc8 Author: Gerit Wagner <[email protected]> Date: Fri Sep 6 08:26:07 2024 +0200 data endpoint. add and commit
commit b460db7 Author: Gerit Wagner <[email protected]> Date: Fri Sep 6 08:25:48 2024 +0200 init: check Docker available
commit 85ea667 Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Date: Wed Sep 4 08:58:41 2024 +0200 update dependencies (#539) Co-authored-by: Poetry updater <[email protected]>
commit 69d3eff Author: Gerit Wagner <[email protected]> Date: Wed Sep 4 07:41:42 2024 +0200 update relink_pdfs Signed-off-by: Gerit Wagner <[email protected]>
commit fbd4d78 Author: Gerit Wagner <[email protected]> Date: Tue Sep 3 09:11:58 2024 +0200 add note on the order of pre/screening packages
commit 09aa217 Author: Gerit Wagner <[email protected]> Date: Tue Sep 3 08:20:41 2024 +0200 update pdf text extraction
commit 6588b68 Author: Gerit Wagner <[email protected]> Date: Mon Sep 2 18:26:21 2024 +0200 europe_pmc: refactor
commit 9877da0 Author: Gerit Wagner <[email protected]> Date: Mon Sep 2 18:25:31 2024 +0200 crossref: refactor
commit 7ba35df Author: Gerit Wagner <[email protected]> Date: Mon Sep 2 18:24:35 2024 +0200 update print output
commit dad30ad Author: Gerit Wagner <[email protected]> Date: Mon Sep 2 18:24:05 2024 +0200 RecordNotInIndexException: ID mandatory
commit 45deaa8 Author: Gerit Wagner <[email protected]> Date: Mon Sep 2 18:22:44 2024 +0200 pdf-get: fix symlinks after renamed dirs
commit 91d5552 Author: Gerit Wagner <[email protected]> Date: Mon Sep 2 18:22:25 2024 +0200 load: warn on non-standardized fields (instead of raising exception)
commit 829e036 Author: Gerit Wagner <[email protected]> Date: Mon Sep 2 18:20:57 2024 +0200 paper_md: fix path
commit 5b4079e Author: Gerit Wagner <[email protected]> Date: Sat Aug 31 18:59:57 2024 +0200 sort SearchTypes
commit 84d7901 Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Date: Sat Aug 31 16:20:02 2024 +0200 Update documentation (#528) Co-authored-by: github-actions <[email protected]>
commit 9a0b265 Author: Gerit Wagner <[email protected]> Date: Sat Sep 28 16:53:19 2024 +0200 files_dir: stricter quality control
commit ed28672 Author: Gerit Wagner <[email protected]> Date: Sat Sep 28 16:40:27 2024 +0200 add instruction to cli-validation
commit 866ca2c Author: Gerit Wagner <[email protected]> Date: Sat Sep 28 16:32:18 2024 +0200 sync with main
commit 2b765e4 Merge: 4944933 5c4117c Author: Gerit Wagner <[email protected]> Date: Sat Aug 31 18:32:36 2024 +0200 Merge branch 'main' into relax_prep
commit 4944933 Author: Gerit Wagner <[email protected]> Date: Sat Aug 31 18:29:47 2024 +0200 update poetry.lock
commit 20e00b5 Author: Gerit Wagner <[email protected]> Date: Wed Aug 28 18:40:16 2024 +0200 update validation
commit cae8c98 Author: Gerit Wagner <[email protected]> Date: Wed Aug 28 18:12:59 2024 +0200 reorder imports
commit 0cc6350 Author: Gerit Wagner <[email protected]> Date: Wed Aug 28 18:12:41 2024 +0200 remove record notes
commit 9038f35 Author: Gerit Wagner <[email protected]> Date: Wed Aug 28 18:09:13 2024 +0200 add ref_check as a default package
commit 4a13a14 Merge: 90ad50e b4be8af Author: Gerit Wagner <[email protected]> Date: Wed Aug 28 17:22:12 2024 +0200 Merge branch 'main' into relax_prep
commit 90ad50e Author: Gerit Wagner <[email protected]> Date: Wed Aug 28 17:05:45 2024 +0200 record_test: ignore mypy errors
commit 73a0653 Merge: 3ceb0d5 861b5a5 Author: Gerit Wagner <[email protected]> Date: Wed Aug 28 16:38:30 2024 +0200 Merge branch 'main' into relax_prep
commit 3ceb0d5 Author: Gerit Wagner <[email protected]> Date: Fri Aug 23 18:38:52 2024 +0200 create package ref_check
commit becd843 Author: Gerit Wagner <[email protected]> Date: Fri Aug 23 16:56:35 2024 +0200 has_fatal_quality_defects()
1 parent d9a1a8a commit 7988b01

26 files changed: +687 -211 lines changed

colrev/ops/init/settings.json

Lines changed: 6 additions & 2 deletions
@@ -26,7 +26,7 @@
     "load": {},
     "prep": {
         "fields_to_keep": [],
-        "defects_to_ignore": ["inconsistent-with-url-metadata"],
+        "defects_to_ignore": [],
         "prep_rounds": [
             {
                 "name": "prep",
@@ -153,6 +153,10 @@
         ]
     },
     "data": {
-        "data_package_endpoints": []
+        "data_package_endpoints": [
+            {
+                "endpoint": "colrev.ref_check"
+            }
+        ]
     }
 }

colrev/ops/upgrade.py

Lines changed: 16 additions & 0 deletions
@@ -720,6 +720,22 @@ def _migrate_0_13_0(self) -> bool:
                 out.write(bibtex_str + "\n")
             self.repo.index.add([source["filename"]])

+        # Add "colrev.ref_check" to data endpoints
+        if "colrev.ref_check" not in [
+            e["endpoint"] for e in settings["data"]["data_package_endpoints"]
+        ]:
+            settings["data"]["data_package_endpoints"].append(
+                {"endpoint": "colrev.ref_check"}
+            )
+
+        # Remove "inconsistent-with-url-metadata" from settings["prep"]["defects_to_ignore"]
+        settings["prep"]["defects_to_ignore"] = [
+            d
+            for d in settings["prep"]["defects_to_ignore"]
+            if d != "inconsistent-with-url-metadata"
+        ]
+        self._save_settings(settings)
+
         # Rename LOCAL_ENVIRONMENT_DIR
         if not Filepaths.LOCAL_ENVIRONMENT_DIR.is_dir():
             shutil.move(
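
Both settings changes in this migration are written so that repeating them has no further effect. The following is a minimal standalone sketch of that logic on a plain dictionary, assuming the dictionary mirrors the layout of `settings.json`; the function name `migrate_settings` is hypothetical and not part of colrev.

```python
from copy import deepcopy


def migrate_settings(settings: dict) -> dict:
    """Return a migrated copy of a settings dict (hypothetical helper, not colrev API)."""
    migrated = deepcopy(settings)

    # Append the ref_check endpoint only if it is not registered yet.
    endpoints = migrated["data"]["data_package_endpoints"]
    if "colrev.ref_check" not in [e["endpoint"] for e in endpoints]:
        endpoints.append({"endpoint": "colrev.ref_check"})

    # Drop the defect code that is no longer ignored by default.
    migrated["prep"]["defects_to_ignore"] = [
        d
        for d in migrated["prep"]["defects_to_ignore"]
        if d != "inconsistent-with-url-metadata"
    ]
    return migrated


old = {
    "prep": {"defects_to_ignore": ["inconsistent-with-url-metadata"]},
    "data": {"data_package_endpoints": []},
}
new = migrate_settings(old)
```

Applying `migrate_settings` to `new` again returns an equal dictionary, which is the property that makes the upgrade step safe to rerun.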

colrev/package_manager/packages.json

Lines changed: 3 additions & 0 deletions
@@ -268,5 +268,8 @@
     },
     "colrev.github": {
         "dev_status": "maturing"
+    },
+    "colrev.ref_check": {
+        "dev_status": "experimental"
     }
 }

colrev/packages/arxiv/pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ repository = "https://github.com/CoLRev-Environment/colrev/tree/main/colrev/pack

 [[tool.poetry.packages]]
 include = "src"
+feedparser = "^6.0.10"

 [tool.poetry.dependencies]
 python = ">=3.8, <4"

colrev/packages/files_dir/src/files_dir.py

Lines changed: 19 additions & 0 deletions
@@ -457,6 +457,24 @@ def _index_file(
         )
         raise NotImplementedError

+    def _fix_grobid_errors(self, new_record: dict) -> None:
+        # Fix common GROBID errors that would cause problems in deduplication
+
+        # drop title if it is identical with journal
+        if Fields.TITLE in new_record and Fields.JOURNAL in new_record:
+            if new_record[Fields.TITLE] == new_record[Fields.JOURNAL]:
+                new_record.pop(Fields.TITLE)
+        # drop title if it starts with "doi:"
+        if Fields.TITLE in new_record:
+            if new_record[Fields.TITLE].lower().startswith("doi:"):
+                new_record.pop(Fields.TITLE)
+        # drop title if it has more numbers than characters
+        if Fields.TITLE in new_record:
+            if sum(c.isdigit() for c in new_record[Fields.TITLE]) > sum(
+                c.isalpha() for c in new_record[Fields.TITLE]
+            ):
+                new_record.pop(Fields.TITLE)
+
     def _index_pdf(
         self,
         *,
@@ -512,6 +530,7 @@ def _index_pdf(
         ):
             # otherwise, get metadata from grobid (indexing)
             new_record = self._get_grobid_metadata(file_path=file_path_abs)
+            self._fix_grobid_errors(new_record)

         new_record[Fields.FILE] = str(file_path)
         new_record = self._add_md_string(record_dict=new_record)
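
The three title checks in `_fix_grobid_errors` can be tried in isolation. The sketch below restates them as a pure function on plain strings; the name `title_looks_wrong` and its signature are assumptions made for illustration and do not appear in colrev.

```python
from typing import Optional


def title_looks_wrong(title: str, journal: Optional[str] = None) -> bool:
    """Hypothetical restatement of the GROBID title heuristics on plain strings."""
    # Title identical to the journal name: likely a mis-extracted running header.
    if journal is not None and title == journal:
        return True
    # Title is actually a DOI string.
    if title.lower().startswith("doi:"):
        return True
    # More digits than letters: likely an identifier or page range, not a title.
    digits = sum(c.isdigit() for c in title)
    letters = sum(c.isalpha() for c in title)
    return digits > letters
```

A caller would drop the title field whenever this returns `True`, which is what the method in the diff does by popping `Fields.TITLE` from the record.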
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v4.6.0
+    hooks:
+      - id: trailing-whitespace
+      - id: end-of-file-fixer
+        exclude: bib$|txt$|ris$|enl$|xml$
+      - id: check-docstring-first
+      - id: check-json
+      - id: check-yaml
+      - id: check-toml
+      - id: debug-statements
+      - id: name-tests-test
+  - repo: https://github.com/psf/black-pre-commit-mirror
+    rev: 24.4.2
+    hooks:
+      - id: black
+        language_version: python3
+  - repo: https://github.com/PyCQA/autoflake
+    rev: v2.3.1
+    hooks:
+      - id: autoflake
+  - repo: https://github.com/PyCQA/flake8
+    rev: 7.1.0
+    hooks:
+      - id: flake8
+        additional_dependencies: [flake8-typing-imports==1.12.0]
+        args: ['--max-line-length=110', '--extend-ignore=E203,TYP006']
+  - repo: https://github.com/asottile/reorder-python-imports
+    rev: v3.13.0
+    hooks:
+      - id: reorder-python-imports
+        args: [--py3-plus]
+  - repo: https://github.com/asottile/pyupgrade
+    rev: v3.16.0
+    hooks:
+      - id: pyupgrade
+        args: [--py36-plus, --keep-runtime-typing]
+  - repo: https://github.com/pre-commit/mirrors-mypy
+    rev: 'v1.11.0'
+    hooks:
+      - id: mypy
+        args: [--disallow-untyped-defs, --disallow-incomplete-defs, --disallow-untyped-calls]
+        additional_dependencies: [types-toml]
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.5.4
+    hooks:
+      - id: ruff # runs faster than pylint
+        args: [--fix, --exit-non-zero-on-fix]
+  - repo: local
+    hooks:
+      - id: pylint
+        name: pylint
+        entry: pylint
+        language: system
+        types: [python]
+        files: colrev
+        args:
+          [
+            "-rn", # Only display messages
+            "-sn", # Don't display the score
+          ]

colrev/packages/ref_check/LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) [2024] [Gerit Wagner]
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+# colrev.ref_check
+
+Checks the quality of reference metadata
+
+## Installation
+
+```bash
+colrev install colrev.ref_check
+```
+
+## Usage
+
+`colrev.ref_check` can be added as a data endpoint. It ensures that records are only set to `rev_synthesized` if there are no remaining defects in the record metadata.
+
+## License
+
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
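
The gating rule described in the README (a record reaches `rev_synthesized` only when no metadata defects remain) could look roughly like the sketch below. The record layout, the `defects` key, and the function name are all hypothetical and chosen for illustration; the actual `RefCheck` implementation is not part of this excerpt.

```python
def may_be_synthesized(record: dict) -> bool:
    """Hypothetical check: True only if no field carries a remaining defect.

    Assumes a made-up layout in which record["defects"] maps a field name
    to a list of defect codes. This is not colrev's internal representation.
    """
    defects_by_field = record.get("defects", {})
    return not any(defects_by_field.values())


clean = {"ID": "Smith2020", "defects": {"title": []}}
flagged = {"ID": "Doe2021", "defects": {"url": ["inconsistent-with-url-metadata"]}}
```

Under these assumptions, `clean` would be allowed to advance while `flagged` would be held back until its defect is resolved or explicitly ignored.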
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+## Installation
+
+```bash
+colrev install colrev.ref_check
+```
+
+## Usage
+
+`colrev.ref_check` can be added as a data endpoint. It ensures that records are only set to `rev_synthesized` if there are no remaining defects in the record metadata.
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+[tool.poetry]
+name = "colrev.ref_check"
+description = "Checks the quality of reference metadata"
+version = "0.1.0"
+license = "MIT"
+authors = ["Gerit Wagner <[email protected]>"]
+repository = "https://github.com/CoLRev-Environment/colrev/tree/main/colrev/packages/"
+
+[[tool.poetry.packages]]
+include = "src"
+
+[tool.poetry.dependencies]
+python = ">=3.9, <4"
+
+[tool.colrev]
+colrev_doc_description = "TODO"
+colrev_doc_link = "docs/README.md"
+search_types = []
+
+[tool.poetry.plugins.colrev]
+data = "colrev.packages.ref_check.src.ref_check:RefCheck"
+
+[build-system]
+requires = ["poetry-core>=1.0.0", "cython<3.0"]
+build-backend = "poetry.core.masonry.api"
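
The `[tool.poetry.plugins.colrev]` table declares a Python entry point named `data` in the group `colrev`. As a rough illustration of how such a `module:attribute` reference is interpreted, the sketch below builds the entry point by hand with the standard library's `importlib.metadata.EntryPoint` and splits it into its parts. It deliberately does not call `load()`, since that would require colrev to be installed; how colrev's own package manager discovers endpoints is not shown in this commit.

```python
from importlib.metadata import EntryPoint

# Mirror of the pyproject.toml declaration, constructed by hand for illustration.
entry = EntryPoint(
    name="data",
    value="colrev.packages.ref_check.src.ref_check:RefCheck",
    group="colrev",
)

# The part before the colon is the module to import,
# the part after it is the attribute (here: the endpoint class).
module_path = entry.module
class_name = entry.attr
```

With the package installed, `entry.load()` would import `module_path` and return the `RefCheck` class.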
