@ReadingTimeMachine

The Reading Time Machine

Welcome to the Reading Time Machine Project!

Our work focuses on digitizing the historical holdings (published before 1997) of the Astrophysics Data System (ADS) using optical character recognition (OCR) and document layout analysis.

The repository for our first paper, "Figure and Figure Caption Extraction for Mixed Raster and Vector PDFs: Digitization of Astronomical Literature with OCR Features," published at TPDL 2022, is figure_and_caption_extraction.

We discuss the issue of generalizability in document layout analysis in an invited talk at the November 2022 AEOLIAN Workshop, "Making More Sense With Machines: AI/ML Methods for Interrogating and Understanding Our Textual Heritage in the Humanities, Natural Sciences, and Social Sciences," and in the accompanying conference proceedings paper, "Generalizability in Document Layout Analysis for Scientific Article Figure & Caption Extraction." The associated repository is htrc_short_conf.

We expand on our TPDL 2022 submission in a follow-up contribution to an IJDL special issue, "The Digitization of Historical Astrophysical Literature with Highly-Localized Figures and Figure Captions." The repository for this work is digitization_at_high_localization.

Pinned

  1. figure_and_caption_extraction Public

    Jupyter Notebook

  2. htrc_short_conf Public

    Jupyter Notebook

Repositories

Showing 8 of 8 repositories
  • LMM_Figure_Parsing Public

    How well do LMMs parse figures? Let's find out! :D

    Jupyter Notebook · Apache-2.0 · Updated Aug 11, 2024
  • ocr_post_correction Public
    Jupyter Notebook · Apache-2.0 · Updated Jan 5, 2024
  • TexSoup Public Forked from alvinwan/TexSoup

    fault-tolerant Python3 package for searching, navigating, and modifying LaTeX documents

    Python · BSD-2-Clause · Updated Jun 19, 2023
  • ReadingTimeMachine.github.io Public

    Project repository for the GitHub Pages site

    HTML · Apache-2.0 · Updated Jun 6, 2023
  • .github Public
    Apache-2.0 · Updated Mar 10, 2023
  • digitization_at_high_localization Public
    Jupyter Notebook · Apache-2.0 · Updated Feb 8, 2023
  • htrc_short_conf Public
    Jupyter Notebook · Apache-2.0 · Updated Oct 28, 2022
  • figure_and_caption_extraction Public
    Jupyter Notebook · Apache-2.0 · Updated Oct 20, 2022
