https://github.com/lacerbi/paper2llm
Convert academic PDFs with figures into text-only Markdown — for humans and LLMs
llm-context llms ocr pdf-converter pdf2md
- Host: GitHub
- URL: https://github.com/lacerbi/paper2llm
- Owner: lacerbi
- License: mit
- Created: 2025-03-07T20:58:30.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-03-09T11:13:41.000Z (7 months ago)
- Last Synced: 2025-03-09T11:22:51.930Z (7 months ago)
- Topics: llm-context, llms, ocr, pdf-converter, pdf2md
- Language: TypeScript
- Homepage: https://lacerbi.github.io/paper2llm/
- Size: 2.29 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# `paper2llm` 📄→✨
Convert PDFs, with a focus on academic papers, into human-and-LLM-friendly **text-only Markdown files**.
### Features
- Text, tables and equations are parsed using [Mistral OCR](https://mistral.ai/en/news/mistral-ocr).
- Figures are converted to a textual description using a selected vision model (see below).
- Additional postprocessing is available, such as splitting the file into multiple parts (main, appendix, backmatter) and fetching a BibTeX entry.
- **Example:** We converted all [our research group](https://www.helsinki.fi/en/researchgroups/machine-and-human-intelligence) papers to Markdown in [this repo](https://github.com/acerbilab/pubs-llms).
### Requirements
- You need a Mistral AI [API key](https://console.mistral.ai/api-keys) to use `paper2llm`. Their [free API tier](https://docs.mistral.ai/deployment/laplateforme/tier/) is compatible with `paper2llm`, within rate limits.
- For the image-to-text conversion, multiple providers are supported.
- **You should read the [API Keys Security Guide](https://github.com/lacerbi/paper2llm/blob/main/paper2llm-web/docs/security/README.md) before using the app with your API keys.**
### Credits
`paper2llm` was initially written by [Luigi Acerbi](https://lacerbi.github.io/) using [Claude 3.7 Sonnet](https://www.anthropic.com/news/claude-3-7-sonnet) and [Athanor](https://github.com/lacerbi/athanor). Subsequent revisions are done with whatever the state-of-the-art coding assistant and agent is at the moment (e.g., [Claude Code](https://www.anthropic.com/claude-code)).
You can follow me on [X](https://x.com/AcerbiLuigi) and [Bluesky](https://bsky.app/profile/lacerbi.bsky.social).
## Image Descriptions and Vision Models
After the OCR step, figures are converted to a Markdown text description using vision models such as Mistral AI's [Mistral Small](https://mistral.ai/news/mistral-small-3-1) or Google's [Gemini 2.5 Flash](https://deepmind.google/technologies/gemini/flash/). You can select the desired vision model via a dropdown menu, based on which API keys you entered.
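The key-dependent dropdown described above can be sketched roughly as follows. This is a minimal illustration, not code taken from `paper2llm`: the `Provider` and `VisionModel` types, the `VISION_MODELS` list, and the `availableModels` function are hypothetical names, and the catalogue simply reuses the model names mentioned in this README.

```typescript
// Hypothetical sketch: all names here are illustrative, not paper2llm's actual code.
type Provider = "mistral" | "gemini" | "openai" | "anthropic";

interface VisionModel {
  label: string; // name shown in the dropdown
  provider: Provider; // which provider's API key the model requires
}

// Illustrative catalogue built from the models named in this README.
const VISION_MODELS: VisionModel[] = [
  { label: "Mistral Small", provider: "mistral" },
  { label: "Pixtral Large", provider: "mistral" },
  { label: "Gemini 2.5 Flash", provider: "gemini" },
  { label: "Gemini 2.5 Pro", provider: "gemini" },
  { label: "GPT-4o", provider: "openai" },
  { label: "Claude Sonnet 4", provider: "anthropic" },
];

// Keep only the models whose provider has a non-empty API key entered.
function availableModels(
  keys: Partial<Record<Provider, string>>
): VisionModel[] {
  return VISION_MODELS.filter((m) => (keys[m.provider] ?? "").trim() !== "");
}
```

Under these assumptions, a user who has entered only a Mistral AI key would be offered just the two Mistral models, and entering a Gemini key would add the Gemini entries to the list.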
Notes on choosing a vision model:
- Both Mistral AI and Google Gemini offer a **free API tier**.
- [**Gemini 2.5 Flash**](https://deepmind.google/technologies/gemini/flash/) is currently our recommended model for `paper2llm`. It is included in the [Gemini API free tier](https://ai.google.dev/gemini-api/docs/pricing), is very cheap beyond that, and performs very well.
- If you prefer to use only the Mistral AI API, the default free Mistral AI model, [Mistral Small](https://mistral.ai/news/mistral-small-3-1), is a top-performing model in its size category and generally works well.
- [Pixtral Large](https://mistral.ai/en/news/pixtral-large) may work better for understanding complex diagrams and concepts, but it is a premier model: if no API credits are available, the API call is not rejected but might be redirected to a free model.
- Other premium models such as OpenAI's GPT-4o, Anthropic's Claude Sonnet 4 or Google Gemini 2.5 Pro might work better for complex figures, but beware of API costs.
## Disclaimers
- We have no affiliation or financial relationship with Mistral AI, besides sympathy for a European AI company and appreciation for their AI models, nor with any other LLM providers.
- This is a _research preview_, as they say. Use at your own risk and with all the caveats of modern AI and LLM usage.
- In particular, image descriptions might be off in clear or subtle ways and you should double-check and fix them as needed.
## License
`paper2llm` is released under the terms of the [MIT License](LICENSE).