Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/hazembz/pdf-fuzz
PoC: bulk search your PDF files using text lookup
elasticsearch pdf-document python react
- Host: GitHub
- URL: https://github.com/hazembz/pdf-fuzz
- Owner: HazemBZ
- Created: 2024-07-27T17:25:46.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-08-13T09:19:07.000Z (5 months ago)
- Last Synced: 2024-10-11T22:02:54.940Z (3 months ago)
- Topics: elasticsearch, pdf-document, python, react
- Homepage:
- Size: 3.91 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# pdf-fuzz
PoC: bulk search your PDF files using fuzzy text lookup.
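To make "fuzzy text lookup" concrete, here is a minimal sketch of what an Elasticsearch fuzzy match request body can look like. The README does not show the project's actual query, so the helper name `build_fuzzy_query` and the field name `content` are hypothetical; only the `match` query with its `fuzziness` option is standard Elasticsearch query DSL.

```python
import json


def build_fuzzy_query(text, field="content", fuzziness="AUTO"):
    """Build an Elasticsearch request body for a fuzzy full-text match.

    `field` is a hypothetical name for the indexed PDF text; the real
    mapping used by pdf-fuzz may differ.
    """
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    # "AUTO" lets Elasticsearch pick the allowed edit
                    # distance based on the length of each term.
                    "fuzziness": fuzziness,
                }
            }
        }
    }


if __name__ == "__main__":
    # A misspelled search term still produces a valid fuzzy query body.
    print(json.dumps(build_fuzzy_query("invoce"), indent=2))
```

With fuzziness enabled, a query such as `invoce` can still match documents containing `invoice`, which is the behaviour the project description suggests.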
## How to
__Requirements__
- Docker
- docker-compose

__Run this project__
1. Clone the project and its submodules: `git clone --recurse-submodules https://github.com/HazemBZ/pdf-fuzz`.
2. Drop a folder of PDF files into the `pdf_fuzz_back/assets` folder (fewer files means less processing time).
3. Spin up the containers: `docker-compose up`.
4. Once the containers are running, index the DB with the PDF contents from a second terminal: `docker-compose exec backend bash -c "python manage.py reindex"` (`docker-compose exec` only works against a running service).
__Update your PDF files__
After changing the contents of `pdf_fuzz_back/assets`, reindex with: `docker-compose exec backend bash -c "python manage.py reindex"`.
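If you reindex often, the command above can be wrapped in a small script. The following is only a sketch: the function names `reindex_command` and `reindex` are hypothetical and not part of the project, and the only thing taken from this README is the `docker-compose exec` invocation itself.

```python
import subprocess


def reindex_command(service="backend"):
    """Return the reindex command from this README as an argument list."""
    return [
        "docker-compose", "exec", service,
        "bash", "-c", "python manage.py reindex",
    ]


def reindex(service="backend", dry_run=False):
    """Run the reindex command, or just return it when dry_run is True."""
    cmd = reindex_command(service)
    if not dry_run:
        # check=True raises CalledProcessError if the command fails,
        # for example when the containers are not running.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    reindex()
```

Run it from the repository root while the containers are up. In a non-interactive context (such as a cron job) you may need to add `-T` after `exec` to disable TTY allocation.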
## TODOs
- [x] v0 PoC.
- [x] CI/CD: docker-compose -> one click project spin up.
- [x] BE: ETL solution for text lookup -> Faster lookups, extract once use forever.
- [ ] FE: Handle queries w/ ReactQuery -> DX.
- [ ] FE/BE: Files uploader -> QoL.
- [ ] BE: Task Queue solution for files processing -> Separation of concerns.