awesome-arxiv

Curated resources for discovering, reading, and working with arXiv papers
https://github.com/artnitolog/awesome-arxiv

Search & Discovery
- alphaXiv - open-access platform for discovering new papers and discussing arXiv preprints interactively, allowing researchers to comment line by line, ask questions, and engage directly with the authors.
- ArxivXplorer
- Connected Papers
- Emergent Mind - up questions and topic links.
- Litmaps
- Paper Digest
- PaperMatch - a semantic search engine that finds similar papers from natural language input or arXiv ID across all of arXiv.
- Paperscape - open-source project that visualizes the entire arXiv repository as an interactive map. Each paper is a node whose size indicates its citation count and whose position is determined by its citation relationships to other papers.
- ResearchRabbit
- Semantic Scholar
- searchthearXiv
Notifications & Recommenders
- AlphaSignal - daily summary that monitors sources from arXiv and social media.
- Benty Fields
- Scholar Inbox - helps researchers stay up to date with the most relevant progress based on personal research interests. Indexes all of arXiv, bioRxiv, medRxiv, ChemRxiv, and other open-access proceedings daily.
- huggingface Daily Papers - curated platform that highlights trending and impactful ML research, updated daily by contributors like AK and the broader AI community. Each paper entry includes metadata, links to related models or datasets, and a discussion section where users can engage with authors and peers.
- ML Papers of The Week
SDKs & CLI Tools
- ArXiv MCP Server
- arxiv-dl - opinionated CLI tool for downloading papers. Prioritizes ease of use for researchers.
- arxiv.py
- arxiv_summarizer - tool designed to fetch and summarize arXiv papers, supporting single-paper summaries, batch processing, and keyword-based searches.
- arXivScraper
- cli-arxiv - converted and are used to recommend future articles.
- Docling
Reading & Browser Enhancers
- arxiv-utils
- arxiv2notion - based enhancements.
- Explainpaper
- PaperMemory - page tools.
- SciSpace Copilot
- zotero-arxiv-workflow
Datasets
- Arxiver Dataset - markdown format, published between January 2023 and October 2023. The dataset includes original metadata such as IDs, titles, abstracts, authors, publication dates. Available under a CC BY-NC-SA 4.0 license.
- arXiv Paper Abstracts - label text classification. Contains paper titles, abstracts, and subject categories. Released under CC0 1.0 license.
- arxiv-summarisation - large-scale dataset designed for training and evaluating abstractive summarization models on scientific papers, comprising over 431k articles with corresponding abstracts.
- ArxivFormula
- Cornell University arXiv Dataset - text PDFs (the dataset doesn't contain parsed papers directly). CC0 1.0 license.
- MINT-1T (arXiv) - scale multimodal pre-training tasks. This is a subset of the MINT-1T collection, released under CC-BY-4.0 license.
- Multimodal ArXiv - language models. It comprises two subsets: ArXivCap, a figure-caption dataset containing 6.4M images and 3.9M captions, and ArXivQA, a QA dataset generated by prompting GPT-4V based on scientific figures with 100k questions. CC BY-NC-SA 4.0 license.
- S2ORC - ODC-By 1.0 license.
- SciEvo - large-scale dataset designed to support scientometric research and the study of scientific knowledge evolution. Provides a collection of over 2M publications, including detailed metadata and citation graphs. Available under the Apache 2.0 license.
- unarXive

Programming Languages

Python 12 JavaScript 6 TypeScript 2