https://github.com/viveknathani/teachyourselfmath

A math PDF extraction engine, built for the internet. ❤️

Host: GitHub
URL: https://github.com/viveknathani/teachyourselfmath
Owner: viveknathani
License: mit
Created: 2023-10-21T20:27:33.000Z (over 1 year ago)
Default Branch: master
Last Pushed: 2025-01-09T05:08:52.000Z (about 1 month ago)
Last Synced: 2025-02-08T05:04:51.325Z (12 days ago)
Topics: engine, math, node, parser, postgresql, queues, redis
Language: TypeScript
Homepage: https://teachyourselfmath.app/
Size: 853 KB
Stars: 70
Watchers: 3
Forks: 6
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# teachyourselfmath

![build](https://github.com/viveknathani/teachyourselfmath/actions/workflows/build.yaml/badge.svg) ![deployment](https://github.com/viveknathani/teachyourselfmath/actions/workflows/deploy.yaml/badge.svg) ![stars](https://img.shields.io/github/stars/viveknathani/teachyourselfmath?style=social)

I wished for a free website with a large list of math problems. teachyourselfmath is that website.

### Description

If a document containing math problems exists, we'd like to extract every problem from it and store it in a database. LaTeX can be understood by both computers and humans. Hence, the problem boils down to converting a PDF into LaTeX, removing the irrelevant parts, and storing the remaining parts.
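The "removing the irrelevant parts" step can be sketched roughly as below. This is a minimal illustration, not the project's actual filter: it assumes (hypothetically) that candidate problems in the converted text are separated by blank lines, and that a block is worth keeping when it contains LaTeX math delimiters. The names `MATH_PATTERN` and `extractProblems` are made up for this example.

```typescript
// Hypothetical heuristic: a block counts as a math problem if it contains
// $...$, $$...$$, \(...\) or \[...\]. The real project may filter differently.
const MATH_PATTERN = /\$[^$]+\$|\\\([\s\S]+?\\\)|\\\[[\s\S]+?\\\]/;

function extractProblems(latex: string): string[] {
  return latex
    .split(/\n\s*\n/) // assumption: blank lines separate candidate blocks
    .map((block) => block.trim())
    .filter((block) => block.length > 0 && MATH_PATTERN.test(block));
}

// Usage: headings and page markers without math are dropped.
const converted =
  "Chapter 1\n\nSolve $x^2 = 4$ for $x$.\n\nPage 12\n\nEvaluate \\[\\int_0^1 x\\,dx\\]";
const problems = extractProblems(converted);
// problems has 2 entries: the "Solve ..." block and the "Evaluate ..." block
```

A delimiter check like this is cheap but crude: plain text containing two dollar signs (prices, for instance) would be kept by mistake, so a real filter would likely need something stricter.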

When I started this project, I used [Meta's Nougat model](https://facebookresearch.github.io/nougat/) to parse academic PDF documents and find the LaTeX math in them. This approach was promising, but the model is computationally expensive to run. Thankfully, LLMs have gotten much better at reading the contents of an image. If I first convert the PDF into images, LaTeX extraction becomes easier. And this is exactly what I am doing right now.
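The shape of that PDF → image → LaTeX flow might look like the sketch below. It is an assumption-laden outline, not code from this repository: the rendering step and the LLM call are passed in as functions, and the names `RenderPages`, `ImageToLatex`, and `pdfToLatex` are hypothetical. In practice the renderer could wrap a library such as pdf2pic and the second function could wrap whichever LLM API is in use.

```typescript
// Hypothetical types: both steps are injected so the sketch does not depend
// on any particular PDF renderer or LLM provider.
type RenderPages = (pdfPath: string) => Promise<string[]>; // PDF -> one image per page (e.g. base64)
type ImageToLatex = (pageImage: string) => Promise<string>; // page image -> LaTeX text

async function pdfToLatex(
  pdfPath: string,
  renderPages: RenderPages,
  imageToLatex: ImageToLatex,
): Promise<string[]> {
  const pages = await renderPages(pdfPath);
  const latexPages: string[] = [];
  // Pages are processed one at a time, which keeps the number of
  // in-flight LLM requests at one.
  for (const page of pages) {
    latexPages.push(await imageToLatex(page));
  }
  return latexPages;
}
```

Injecting the two steps also makes the flow easy to exercise with stubs, without Ghostscript, GraphicsMagick, or an API key.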

For a longer and less technical rant about this project, read [this blog post](https://vivekn.dev/blog/teachyourselfmath).

### Setup

1. We use [pdf2pic](https://www.npmjs.com/package/pdf2pic), which has the following system dependencies: Ghostscript and GraphicsMagick.
2. You will need PostgreSQL and Redis to run this.
3. Install the dependencies: `yarn`
4. Build the project: `yarn build`
5. Set up the `.env` file using the `.env.example` file.
6. `yarn start`!
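If startup fails, a missing `.env` entry is a likely cause. The helper below is a small sketch for checking that up front. The variable names `DATABASE_URL` and `REDIS_URL` are placeholders chosen for illustration. The real names are whatever `.env.example` lists. `missingEnv` is likewise a hypothetical helper, not part of this project.

```typescript
// Returns the names from `required` that are unset or empty in `env`.
function missingEnv(
  required: readonly string[],
  env: Record<string, string | undefined>,
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Placeholder names; replace them with the keys from .env.example.
const REQUIRED_VARS = ["DATABASE_URL", "REDIS_URL"];

// Example with an explicit object; in Node you would pass process.env instead.
const missing = missingEnv(REQUIRED_VARS, { DATABASE_URL: "postgres://localhost/tym" });
// missing is ["REDIS_URL"]
```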

### Contributing

I am happy to accept pull requests. No hard rules.

### Acknowledgements

Created by Vivek Nathani ([@viveknathani_](https://twitter.com/viveknathani_)), licensed under the [MIT License](./LICENSE).