dell-research-harvard

Popular repositories

  1. linktransformer Public

    A convenient way to link, deduplicate, aggregate and cluster data(frames) in Python using deep learning

    Python 118 10

  2. AmericanStories Public

    The official GitHub repository for the American Stories dataset as in {link}

    Python 116 9

  3. effocr Public

    A model(ing framework) for sample-efficient OCR

    Python 57 5

  4. HJDataset Public

    A Large Dataset of Historical Japanese Documents with Complex Layouts

    Jupyter Notebook 32 4

  5. NEWS-COPY Public

    Noise-robust de-duplication at scale

    Python 18 1

  6. newswire Public

    Python 8

Repositories

Showing 10 of 29 repositories
  • linktransformer Public

    A convenient way to link, deduplicate, aggregate and cluster data(frames) in Python using deep learning

    Python 118 GPL-3.0 10 2 1 Updated Apr 4, 2025
  • efficient_ocr Public

    Efficient OCR for Building a Diverse Digital History

    Python 7 Apache-2.0 1 0 0 Updated Apr 4, 2025
  • newswire Public
    Python 8 0 0 0 Updated Aug 15, 2024
  • newsdejavu Public

    Python package for News Deja Vu

    Python 4 MIT 0 0 0 Updated Apr 9, 2024
  • AmericanStories Public

    The official GitHub repository for the American Stories dataset as in {link}

    Python 116 9 7 0 Updated Mar 7, 2024
  • HomoglyphsCJKTraining Public

    Quantifying Character Similarity with Vision Transformers

    Python 6 0 0 0 Updated Oct 27, 2023
  • HomoglyphsCJK Public

    An efficient tool for fuzzy matching Japanese, Korean, Simplified Chinese, or Traditional Chinese words.

    Python 3 MIT 1 0 0 Updated Oct 13, 2023
  • Associating-Press Public

    Associating layout elements from newspapers into full articles

    2 0 0 0 Updated Sep 15, 2023
  • DPR Public Forked from facebookresearch/DPR

    Dense Passage Retriever is a set of tools and models for open-domain Q&A tasks.

    Python 1 312 0 0 Updated Aug 15, 2023
  • linktransformer-readthedocs
    Python 0 0 0 0 Updated Aug 6, 2023
