https://github.com/hazembz/pdf-fuzz
PoC: bulk search your PDF files using text lookup
elasticsearch pdf-document python react
- Host: GitHub
- URL: https://github.com/hazembz/pdf-fuzz
- Owner: HazemBZ
- Created: 2024-07-27T17:25:46.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-13T09:19:07.000Z (over 1 year ago)
- Last Synced: 2025-02-01T22:01:50.639Z (about 1 year ago)
- Topics: elasticsearch, pdf-document, python, react
- Homepage:
- Size: 3.91 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# pdf-fuzz
PoC: bulk search your PDF files using fuzzy text lookup.
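As a rough illustration of what a fuzzy text lookup can look like, the sketch below builds an Elasticsearch `match` query body with the `fuzziness` option. It is not taken from this project's code: the helper name `build_fuzzy_query` and the `content` field name are assumptions made for the example, and the real index mapping may differ.

```python
import json


def build_fuzzy_query(text, field="content", fuzziness="AUTO"):
    """Build an Elasticsearch Query DSL body for a fuzzy full-text match.

    `field` is a hypothetical field name for the extracted PDF text;
    adjust it to whatever the actual index mapping uses.
    """
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    # "AUTO" lets Elasticsearch pick the allowed edit
                    # distance based on the length of each term.
                    "fuzziness": fuzziness,
                }
            }
        }
    }


if __name__ == "__main__":
    # A misspelled search term that a fuzzy match is meant to tolerate.
    print(json.dumps(build_fuzzy_query("invocie"), indent=2))
```

The resulting dictionary can be sent as the request body of a search call against the index that holds the extracted text.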
## Setup
__Requirements__
- Docker
- docker-compose
__Run this project__
Clone project and submodules:
```sh
git clone --recurse-submodules https://github.com/HazemBZ/pdf-fuzz
```
Spin up containers:
```sh
docker-compose up
```
Access app at: `http://localhost:88`
## Documentation
System architectures are described [here](docs/diagrams/architecture.md).
## TODOs
__V0-PoC__
- [x] CI/CD: docker-compose -> one click project spin up.
- [x] BE: ETL solution for text lookup -> Faster lookups, extract once use forever.
- [x] FE: Handle queries w/ ReactQuery -> DX.
- [x] FE/BE: Files uploader -> QoL.
__V1__
- [x] BE: Task Queue solution for files processing -> Separation of concerns.
- [x] FE/BE: Basic file deduplication
- [x] BE: Refactor into pipelines and orchestrators
- [x] BE: Test code
- [x] BE: (docs) Add diagrams
- [x] CI/CD: Auto migration setup
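
The "basic file deduplication" item above could be approached in several ways. One common option, shown here as a minimal sketch and not as this project's actual implementation, is to hash each uploaded file's bytes and skip any file whose digest has already been seen. The function names `file_digest` and `split_duplicates` are hypothetical.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large PDFs are not loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def split_duplicates(paths):
    """Split paths into first-seen files and (duplicate, original) pairs."""
    seen = {}  # digest -> first path seen with that content
    unique, duplicates = [], []
    for path in paths:
        digest = file_digest(path)
        if digest in seen:
            duplicates.append((path, seen[digest]))
        else:
            seen[digest] = path
            unique.append(path)
    return unique, duplicates
```

Content hashing only catches byte-identical files; two PDFs with the same text but different metadata would still count as distinct under this approach.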