Add Tetrahedron Letters and Charles University dissertations IUPACs #14

Open
dehaenw wants to merge 1 commit into BlueObelisk:main from dehaenw:tetra_cuni

dehaenw commented Jul 26, 2025

Publicly available Charles University organic chemistry theses were taken from https://dspace.cuni.cz by searching for "dichloromethane".

Publicly available issues of Tetrahedron Letters on archive.org were taken from https://archive.org/details/pub_tetrahedron-letters

The plaintext was used as is, apart from corrections for trailing characters and line breaks. All 1-, 2- and 3-grams were checked with OPSIN.
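For readers unfamiliar with the n-gram step, here is a minimal sketch of how candidate names could be generated from plaintext. It is not the code from the gist. The trailing-character set and the function names (`tokenize`, `ngrams`, `candidate_names`) are assumptions made for illustration, and the OPSIN check is represented by a caller-supplied `is_valid_name` predicate rather than a real OPSIN call.

```python
from typing import Callable, Iterator, List

# Assumed set of trailing characters to strip; the actual corrections may differ.
TRAILING = ".,;:"


def tokenize(text: str) -> List[str]:
    """Split on any whitespace (including line breaks) and strip trailing punctuation."""
    tokens = (tok.rstrip(TRAILING) for tok in text.split())
    return [tok for tok in tokens if tok]


def ngrams(tokens: List[str], max_n: int = 3) -> Iterator[str]:
    """Yield every run of 1..max_n consecutive tokens, joined by single spaces."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def candidate_names(
    text: str,
    is_valid_name: Callable[[str], bool],
    max_n: int = 3,
) -> List[str]:
    """Return the unique n-grams accepted by is_valid_name, in first-seen order.

    is_valid_name is a hypothetical stand-in for a name-to-structure check
    such as "does OPSIN parse this string?".
    """
    seen = set()
    accepted = []
    for gram in ngrams(tokenize(text), max_n):
        if gram in seen:
            continue  # avoid checking the same string twice
        seen.add(gram)
        if is_valid_name(gram):
            accepted.append(gram)
    return accepted
```

As a usage example, `candidate_names("washed with ethyl acetate,\nthen dried.", lambda g: g == "ethyl acetate")` keeps only the 2-gram `ethyl acetate`. In practice the predicate would wrap whichever OPSIN interface is in use.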

The code used for the DSpace scrape and IUPAC name extraction is provided "as is" in this gist: https://gist.github.com/dehaenw/6a930d1c4375f3a4943b84810b0a003c

Each source contributes about 10K new IUPAC names.

Feel free to let me know if anything is unsuitable or needs changes!

@egonw egonw self-assigned this Jul 27, 2025