https://github.com/viveknathani/teachyourselfmath
A math PDF extraction engine, built for the internet. ❤️
- Host: GitHub
- URL: https://github.com/viveknathani/teachyourselfmath
- Owner: viveknathani
- License: mit
- Created: 2023-10-21T20:27:33.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2024-12-23T04:09:17.000Z (13 days ago)
- Last Synced: 2024-12-26T12:07:34.182Z (10 days ago)
- Topics: engine, math, node, parser, postgresql, queues, redis
- Language: TypeScript
- Homepage: https://teachyourselfmath.app/
- Size: 851 KB
- Stars: 65
- Watchers: 3
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# teachyourselfmath
![build](https://github.com/viveknathani/teachyourselfmath/actions/workflows/build.yaml/badge.svg) ![deployment](https://github.com/viveknathani/teachyourselfmath/actions/workflows/deploy.yaml/badge.svg) ![stars](https://img.shields.io/github/stars/viveknathani/teachyourselfmath?style=social)
I wished for a free website with a large list of math problems; teachyourselfmath is that website.
### Description
If a document containing math problems exists, we'd like to extract every problem from it and dump them into a database. LaTeX can be understood by both computers and humans, so the problem boils down to converting a PDF into LaTeX, removing the irrelevant parts, and storing what remains.
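As an illustration of the "removing the irrelevant parts" step, here is a minimal TypeScript sketch. It assumes, purely for the example, that the transcribed LaTeX lists each problem as an `\item` inside an `enumerate` environment. `extractProblems` is a hypothetical helper, not the project's actual code, and it does not handle nested lists.

```typescript
// Hypothetical helper (not the project's code): pulls individual problems out of
// transcribed LaTeX, assuming each problem is an \item inside an enumerate list.
function extractProblems(latex: string): string[] {
  const problems: string[] = [];
  // Non-greedy match of each enumerate body; nested lists are not supported.
  const listPattern = /\\begin\{enumerate\}([\s\S]*?)\\end\{enumerate\}/g;
  let match: RegExpExecArray | null;
  while ((match = listPattern.exec(latex)) !== null) {
    // Anything before the first \item in the list body is dropped by slice(1).
    const items = match[1].split(/\\item\b/).slice(1);
    for (const item of items) {
      const text = item.trim();
      if (text.length > 0) {
        problems.push(text);
      }
    }
  }
  // Prose outside the enumerate environments (headings, hints, answers) is discarded.
  return problems;
}

// Example input: a heading, two problems, and a trailing note.
const sample =
  "Chapter 1 exercises.\n" +
  "\\begin{enumerate}\n" +
  "\\item Prove that $\\sqrt{2}$ is irrational.\n" +
  "\\item Compute $\\int_0^1 x^2 \\, dx$.\n" +
  "\\end{enumerate}\n" +
  "Answers are at the end of the book.";

console.log(extractProblems(sample));
```

Running this logs the two problem statements and drops both the chapter heading and the note about answers. Real documents are messier, so a production filter would likely need more rules than this single pattern.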
When I started this project, I used [Meta's Nougat model](https://facebookresearch.github.io/nougat/) to parse academic PDF documents and find the LaTeX math in them. This approach was promising, but the model is computationally expensive to run. Thankfully, LLMs have gotten much better at understanding the contents of images. If I first convert the PDF into images, LaTeX extraction becomes easier, and this is exactly what I am doing right now.
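The flow described above (PDF to page images, images to LaTeX, LaTeX to stored problems) can be sketched as a small pipeline with each step injected. This is a hypothetical outline, not the project's real module layout: `PipelineSteps`, `processPdf`, and the step names are invented for illustration. The rendering step could be backed by a library such as pdf2pic, and the transcription step stands in for whatever LLM call is actually used.

```typescript
// Hypothetical shapes for the PDF -> images -> LaTeX -> database flow.
type PageImage = { page: number; bytes: Uint8Array };

interface PipelineSteps {
  // Render each PDF page to an image (a library such as pdf2pic could back this).
  renderPages(pdfPath: string): Promise<PageImage[]>;
  // Ask an image-capable LLM to transcribe one page image into LaTeX (placeholder).
  transcribe(image: PageImage): Promise<string>;
  // Keep only the problems, dropping headings, prose, and other irrelevant parts.
  extract(latex: string): string[];
  // Persist the problems, e.g. with an INSERT into PostgreSQL.
  save(problems: string[]): Promise<void>;
}

// Runs the steps in order and returns how many problems were stored.
async function processPdf(pdfPath: string, steps: PipelineSteps): Promise<number> {
  const pages = await steps.renderPages(pdfPath);
  const problems: string[] = [];
  // Pages are transcribed sequentially, one LLM call per page image.
  for (const page of pages) {
    const latex = await steps.transcribe(page);
    problems.push(...steps.extract(latex));
  }
  await steps.save(problems);
  return problems.length;
}
```

Injecting the steps keeps the orchestration independent of Ghostscript, the LLM provider, and the database, so each one can be stubbed out in tests or swapped without touching `processPdf`.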
For a longer and less technical rant about this project, read [this blog post](https://vivekn.dev/blog/teachyourselfmath).
### Setup
1. We use [pdf2pic](https://www.npmjs.com/package/pdf2pic), which requires the following system dependencies: Ghostscript and GraphicsMagick.
2. You will need PostgreSQL and Redis to run this.
3. `yarn`
4. `yarn build`
5. Set up the `.env` file using the `.env.example` file.
6. `yarn start`!

### Contributing
I am happy to accept pull requests. No hard rules.
### Acknowledgements
Created by Vivek Nathani ([@viveknathani_](https://twitter.com/viveknathani_)), licensed under the [MIT License](./LICENSE).