https://github.com/datawraith/arxiv-frontpage

My personal ArXiv frontpage

Last synced: 9 months ago


Host: GitHub
Owner: DataWraith
License: mit
Created: 2025-04-13T08:41:53.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-06-05T05:32:09.000Z (about 1 year ago)
Last Synced: 2025-06-05T07:32:17.995Z (about 1 year ago)
Language: HTML
Size: 24.8 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# arxiv-frontpage

A tool that creates a personalized frontpage of arXiv computer science papers ranked by your interests.

## Demo

My frontpage for today can be viewed here:

## What does this do?

Inspired by , this project fetches new computer science papers from [arXiv](https://arxiv.org) and uses a classifier to infer tags from the paper metadata. Tags are displayed below each paper abstract once the classifier's confidence reaches a threshold.
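
As a rough illustration of the display rule, the sketch below keeps only the tags whose confidence reaches a threshold and lists them most-confident first. The threshold of 0.5 and the tag names are hypothetical; the project's actual cutoff is not stated in this README.

```python
# Hypothetical display cutoff; the real project may use a different value.
DISPLAY_THRESHOLD = 0.5


def tags_to_display(confidences: dict[str, float], threshold: float = DISPLAY_THRESHOLD) -> list[str]:
    """Return tags whose confidence reaches the threshold, most confident first."""
    shown = [tag for tag, p in confidences.items() if p >= threshold]
    return sorted(shown, key=lambda tag: confidences[tag], reverse=True)


if __name__ == "__main__":
    # Hypothetical classifier output for a single paper.
    print(tags_to_display({"llm": 0.92, "robotics": 0.31, "rl": 0.5}))
```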

Each tag is associated with an "interestingness" multiplier. A paper's score is computed by multiplying the classifier's confidence that each tag is present by that tag's interestingness multiplier and summing the results over all tags. The frontpage then ranks fresh papers by this score, giving you a personalized ranking.
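
Written as a formula, the score is score(paper) = ∑ over tags t of confidence(t) × multiplier(t). A minimal Python sketch of that rule follows, assuming each paper carries a hypothetical `confidences` mapping from tag to classifier probability and that tags missing from the multiplier table contribute nothing. It illustrates the ranking idea and is not the project's actual code.

```python
def score_paper(confidences: dict[str, float], multipliers: dict[str, float]) -> float:
    """Sum over tags of confidence * interestingness multiplier.

    Tags without a configured multiplier contribute 0 (an assumption).
    """
    return sum(p * multipliers.get(tag, 0.0) for tag, p in confidences.items())


def rank_papers(papers: list[dict], multipliers: dict[str, float]) -> list[dict]:
    """Sort papers by score, highest first.

    Each paper is assumed to have a hypothetical 'confidences' field (tag -> probability).
    """
    return sorted(
        papers,
        key=lambda paper: score_paper(paper["confidences"], multipliers),
        reverse=True,
    )


if __name__ == "__main__":
    multipliers = {"llm": 2.0, "rl": 0.5}  # hypothetical tag -> multiplier table
    papers = [
        {"id": "b", "confidences": {"llm": 0.2, "rl": 0.9}},
        {"id": "a", "confidences": {"llm": 0.9, "rl": 0.1}},
    ]
    for paper in rank_papers(papers, multipliers):
        print(paper["id"], round(score_paper(paper["confidences"], multipliers), 2))
```

Because the multipliers scale each tag's contribution, a paper that is only moderately likely to match a highly weighted tag can still outrank one that certainly matches a low-weighted tag.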

The repository's GitHub Actions automatically pull new data and regenerate the site once every weekday. If you fork the repo, you may need to change the repository settings to allow Actions to commit and push changes.

## How does it work?

1. **Tag Configuration**: Tags are defined in `data/tags.json` and mapped to their interestingness multiplier.
2. **Training Data**: Each tag must have an associated `.jsonl` file in the `data/train` directory.
3. **Paper Collection**: The system fetches recent papers from arXiv's CS categories via their RSS feeds.
4. **Classification**: A Probabilistic Label Tree classifier (via [napkinXC](https://napkinxc.readthedocs.io)) determines the relevance of each tag for each paper.
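5. **Ranking**: Each paper is scored from its tag confidences and the tags' interestingness multipliers, and the frontpage is generated with the highest-scoring papers first.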
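
Steps 1 and 2 can be sketched as a small consistency check: load the tag-to-multiplier mapping and flag any tag that has no training file. This assumes `data/tags.json` is a flat JSON object mapping tag names to numbers and that the training file for tag `t` is named `data/train/t.jsonl`. Both the layout and the helper names (`load_tag_config`, `missing_training_files`) are hypothetical, so check the repository before relying on them.

```python
import json
import tempfile
from pathlib import Path


def load_tag_config(data_dir: Path) -> dict[str, float]:
    """Read tags.json, assumed here to be a flat {"tag": multiplier} object."""
    with open(data_dir / "tags.json", encoding="utf-8") as f:
        raw = json.load(f)
    return {tag: float(multiplier) for tag, multiplier in raw.items()}


def missing_training_files(data_dir: Path, tags: dict[str, float]) -> list[str]:
    """Return tags with no train/<tag>.jsonl file (the naming scheme is an assumption)."""
    train_dir = data_dir / "train"
    return [tag for tag in tags if not (train_dir / f"{tag}.jsonl").is_file()]


def demo() -> list[str]:
    """Build a throwaway data directory and report which tags lack training data."""
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp)
        (data_dir / "train").mkdir()
        # Hypothetical config: two tags, but only one has a training file.
        (data_dir / "tags.json").write_text(json.dumps({"llm": 2, "rl": 0.5}), encoding="utf-8")
        (data_dir / "train" / "llm.jsonl").write_text("", encoding="utf-8")
        return missing_training_files(data_dir, load_tag_config(data_dir))


if __name__ == "__main__":
    print(demo())
```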

The generated frontpage includes a copy button that displays the JSON data you need to put into the training files to improve future classifications.
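
Since the training files use JSON Lines (one JSON object per line), adding a labeled example amounts to appending one line to the relevant file. The sketch below assumes nothing about the record schema beyond that; the `title`, `abstract`, and `label` fields and the `llm.jsonl` file name are hypothetical placeholders for whatever the copy button actually produces.

```python
import json
import tempfile
from pathlib import Path


def append_training_example(train_file: Path, example: dict) -> None:
    """Append one example as a single JSON line (JSON Lines format)."""
    with open(train_file, "a", encoding="utf-8") as f:
        f.write(json.dumps(example, ensure_ascii=False) + "\n")


def read_training_examples(train_file: Path) -> list[dict]:
    """Read every non-empty line back as a JSON object."""
    with open(train_file, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def demo() -> list[dict]:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "llm.jsonl"
        # Hypothetical records; paste whatever JSON the copy button actually shows.
        append_training_example(path, {"title": "An example paper", "abstract": "Abstract text.", "label": True})
        append_training_example(path, {"title": "Another paper", "abstract": "Abstract text.", "label": False})
        return read_training_examples(path)


if __name__ == "__main__":
    for record in demo():
        print(record)
```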

You can also run the project locally using [uv](https://github.com/astral-sh/uv); see the `Justfile` for the available commands.
