Personalized-Arxiv-digest

This repo aims to provide a better daily digest of newly published arXiv papers, based on your own research interests and descriptions.

What this repo does

Staying up to date on arXiv papers can take a considerable amount of time, with hundreds of new papers to filter through each day. There is an official daily digest service; however, large subtopics like cs.AI still have 50-100 papers a day. Determining whether these papers are relevant and important to you means reading through each title and abstract.

This repository provides a way to have this daily digest sorted by relevance via large language models:

1. You modify the configuration file config.yaml with an arXiv topic, some set of subtopics, and a natural language statement about the type of papers you are interested in.
2. The code pulls all the abstracts for papers in those subtopics and uses gpt-3.5-turbo to rank how relevant each one is to your interest on a scale of 1-10.
3. The code then emits an HTML digest listing all the relevant papers, and optionally emails it to you using SendGrid. You will need a SendGrid account with an API key for the email functionality to work.
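The ranking and filtering in steps 2 and 3 can be sketched as follows. This is a minimal illustration, not the repository's implementation: the `Paper` type, `rank_papers`, the threshold of 7, and `keyword_scorer` (a stand-in for the language model call) are all hypothetical names and values chosen for the example.

```python
from typing import Callable, List, NamedTuple, Tuple


class Paper(NamedTuple):
    title: str
    abstract: str


def rank_papers(
    papers: List[Paper],
    score_fn: Callable[[Paper], int],
    threshold: int = 7,  # assumed cutoff for "relevant"; not taken from the repo
) -> List[Tuple[int, Paper]]:
    """Score each paper from 1 to 10 and return the relevant ones, best first."""
    ranked = []
    for paper in papers:
        # Clamp to the 1-10 scale in case the scorer returns something out of range.
        score = max(1, min(10, int(score_fn(paper))))
        if score >= threshold:
            ranked.append((score, paper))
    # sorted() is stable, so papers with equal scores keep their input order.
    return sorted(ranked, key=lambda pair: pair[0], reverse=True)


def keyword_scorer(paper: Paper) -> int:
    """Hypothetical stand-in for the language model call."""
    text = (paper.title + " " + paper.abstract).lower()
    return 9 if "pretraining" in text else 3


if __name__ == "__main__":
    papers = [
        Paper("A survey of tax law", "We review recent tax rulings."),
        Paper("Scaling laws", "We study language model pretraining at scale."),
    ]
    for score, paper in rank_papers(papers, keyword_scorer):
        print(score, paper.title)
```

Passing the scorer in as a function keeps the sorting logic testable without an API key; a real scorer would send the abstract and your interest statement to the model and parse the returned number.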

Some examples:

Topic: cs.AI, cs.CL
Interest:
- Large language model pretraining and finetuning
- Multimodal machine learning
- Do not care about specific applications, for example, information extraction, summarization, etc.
- Not interested in papers focused on specific languages, e.g., Arabic, Chinese, etc.

Topic: q-fin
Interest: "making lots of money"
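To make the examples above concrete, here is a rough sketch of how a topic list and an interest statement might be combined into a rating prompt. The function names and the prompt wording are assumptions made for illustration; the prompt the repository actually sends may differ.

```python
from typing import List, Union


def interest_statement(interest: Union[str, List[str]]) -> str:
    """Accept either a single sentence or a list of bullet points."""
    if isinstance(interest, str):
        return interest.strip()
    return "\n".join(f"- {item.strip()}" for item in interest)


def build_prompt(
    topics: List[str],
    interest: Union[str, List[str]],
    title: str,
    abstract: str,
) -> str:
    """Assemble a hypothetical relevance-rating prompt for one paper."""
    return (
        f"I follow these arXiv topics: {', '.join(topics)}.\n"
        f"My research interests:\n{interest_statement(interest)}\n\n"
        "Rate the relevance of the following paper to my interests "
        "on a scale of 1-10 and reply with the number only.\n"
        f"Title: {title}\n"
        f"Abstract: {abstract}"
    )


if __name__ == "__main__":
    print(
        build_prompt(
            ["cs.AI", "cs.CL"],
            ["Large language model pretraining and finetuning",
             "Multimodal machine learning"],
            "An example title",
            "An example abstract.",
        )
    )
```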

Usage

Running as a GitHub action using SendGrid

The recommended way to get started using this repository is to:

1. Fork the repository.
2. Modify config.yaml and merge the changes into your main branch. If you want a schedule other than Sunday through Thursday at 1:25 PM UTC, also modify the file .github/workflows/daily_pipeline.yaml.
3. Create or fetch your API key for OpenAI. Note: you will need an OpenAI account.
4. Create or fetch your API key for SendGrid. You will need a SendGrid account. The free tier will generally suffice.
5. Set the following secrets:
   - OPENAI_API_KEY
   - SENDGRID_API_KEY
   - FROM_EMAIL (only if you don't have it set in config.yaml)
   - TO_EMAIL (only if you don't have it set in config.yaml)
6. Manually trigger the action or wait until the scheduled action takes place.
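If you change the schedule, note that GitHub Actions schedules are written as five-field cron expressions evaluated in UTC, with 0 standing for Sunday. The helper below is a small sketch (the function name and day abbreviations are made up for this example) that builds such an expression; the default schedule described above corresponds to `25 13 * * 0-4`.

```python
DAY_NUMBERS = {"sun": 0, "mon": 1, "tue": 2, "wed": 3, "thu": 4, "fri": 5, "sat": 6}


def cron_expression(hour: int, minute: int, first_day: str, last_day: str) -> str:
    """Build a 'minute hour * * days' cron string for a contiguous range of weekdays."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    start, end = DAY_NUMBERS[first_day], DAY_NUMBERS[last_day]
    if start > end:
        raise ValueError("first_day must not come after last_day (Sunday is 0)")
    return f"{minute} {hour} * * {start}-{end}"


if __name__ == "__main__":
    # Sunday through Thursday at 13:25 UTC
    print(cron_expression(13, 25, "sun", "thu"))  # 25 13 * * 0-4
```

The resulting string is what you would place in the workflow file's schedule entry.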

Running as a GitHub action with SMTP credentials

An alternative way to get started using this repository is to:

1. Fork the repository.
2. Modify config.yaml and merge the changes into your main branch. If you want a schedule other than Sunday through Thursday at 1:25 PM UTC, also modify the file .github/workflows/daily_pipeline.yaml.
3. Create or fetch your API key for OpenAI. Note: you will need an OpenAI account.
4. Find your email provider's SMTP settings and set the secret MAIL_CONNECTION to them, in the form smtp://user:password@server:port or smtp+starttls://user:password@server:port. Alternatively, if you are using Gmail, you can set MAIL_USERNAME and MAIL_PASSWORD instead. If you are (understandably) apprehensive about storing your email password here, you can create something like an application password instead.
5. Set the following secrets:
   - OPENAI_API_KEY
   - MAIL_CONNECTION (see above)
   - MAIL_PASSWORD (only if you don't have MAIL_CONNECTION set)
   - MAIL_USERNAME (only if you don't have MAIL_CONNECTION set)
   - FROM_EMAIL (only if you don't have it set in config.yaml)
   - TO_EMAIL (only if you don't have it set in config.yaml)
6. Manually trigger the action or wait until the scheduled action takes place.
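Before storing MAIL_CONNECTION as a secret, it can be worth checking that the string has the shape described above. The sketch below uses Python's standard `urllib.parse` to split such a URL; `MailSettings` and `parse_mail_connection` are hypothetical names for this example, and the action itself may interpret the value differently. The percent-decoding step is an assumption: a password containing characters such as `@` or `:` would typically need to be percent-encoded to fit in a URL.

```python
from typing import NamedTuple
from urllib.parse import unquote, urlsplit


class MailSettings(NamedTuple):
    host: str
    port: int
    username: str
    password: str
    starttls: bool


def parse_mail_connection(url: str) -> MailSettings:
    """Split smtp://user:password@server:port (or smtp+starttls://...) into parts."""
    parts = urlsplit(url)
    if parts.scheme not in ("smtp", "smtp+starttls"):
        raise ValueError("scheme must be smtp or smtp+starttls")
    if not parts.hostname or parts.port is None:
        raise ValueError("both server and port are required")
    return MailSettings(
        host=parts.hostname,
        port=parts.port,
        # Percent-decode so that, for example, %40 in the URL becomes @.
        username=unquote(parts.username or ""),
        password=unquote(parts.password or ""),
        starttls=parts.scheme == "smtp+starttls",
    )


if __name__ == "__main__":
    print(parse_mail_connection("smtp+starttls://alice:s3cret@mail.example.com:587"))
```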

Running as a GitHub action without emails

If you do not wish to create a SendGrid account or use your email authentication, simply do not create the SendGrid or SMTP secrets. The action will still emit the HTML digest, which you can access as an artifact of the GitHub action run.

Running from the command line

If you do not wish to fork this repository, and would prefer to clone and run it locally instead:

1. Install the requirements in src/requirements.txt.
2. Modify the configuration file config.yaml.
3. Create or fetch your API key for OpenAI. Note: you will need an OpenAI account.
4. Create or fetch your API key for SendGrid (optional, if you want the script to email you).
5. Set the following environment variables:
   - OPENAI_API_KEY
   - SENDGRID_API_KEY (only if using SendGrid)
   - FROM_EMAIL (only if using SendGrid and if you don't have it set in config.yaml)
   - TO_EMAIL (only if using SendGrid and if you don't have it set in config.yaml)
6. Run python action.py. If you are not using SendGrid, the HTML of the digest will be written to digest.html. You can then open it in your favorite web browser.

You may want to use something like crontab to schedule the digest.
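When scheduling a local run, keep in mind that cron jobs usually start with a minimal environment, so variables exported in your interactive shell may not be visible to the script. A small pre-flight check along these lines can catch that early. It is only a sketch: `missing_variables` and its parameters are hypothetical and not part of the repository.

```python
import os
from typing import List, Mapping


def missing_variables(
    environ: Mapping[str, str],
    use_sendgrid: bool,
    emails_in_config: bool = False,
) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    required = ["OPENAI_API_KEY"]
    if use_sendgrid:
        required.append("SENDGRID_API_KEY")
        # The addresses are only needed here when config.yaml does not set them.
        if not emails_in_config:
            required += ["FROM_EMAIL", "TO_EMAIL"]
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_variables(os.environ, use_sendgrid=False)
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Environment looks complete.")
```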

Running with a user interface

Install the requirements in src/requirements.txt as well as gradio. Set the environment variables OPENAI_API_KEY, FROM_EMAIL and SENDGRID_API_KEY.

Run python src/app.py and go to the local URL. From there you will be able to preview the papers from today, as well as the generated digests.

Extending and Contributing

You may (and are encouraged to) modify the code in this repository to suit your personal needs. If you think your modifications would be in any way useful to others, please submit a pull request.

These types of modifications include things like changes to the prompt, different language models, or additional ways for the digest to be delivered to you.

vamiller12/Arxiv-Digest
