https://github.com/saidsef/tika-document-to-text
Apache Tika: extract text and metadata from any document format with this pre-built containerised solution. Kubernetes-ready deployment with an intuitive UI, API, and text-to-speech capabilities; well suited to content indexing, analysis, and document processing workflows.
docker-container document-to-text document-to-text-ui extract-text helm-chart kubernetes kubernetes-deployment nodejs python text-extraction text-to-speech
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/saidsef/tika-document-to-text
- Owner: saidsef
- License: mit
- Created: 2018-04-10T03:26:14.000Z (about 8 years ago)
- Default Branch: main
- Last Pushed: 2026-04-01T00:14:00.000Z (2 months ago)
- Last Synced: 2026-04-01T02:42:41.035Z (2 months ago)
- Topics: docker-container, document-to-text, document-to-text-ui, extract-text, helm-chart, kubernetes, kubernetes-deployment, nodejs, python, text-extraction, text-to-speech
- Language: JavaScript
- Homepage:
- Size: 676 KB
- Stars: 5
- Watchers: 1
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.md
README
# Apache Tika Implementation
The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.
## Prerequisites
- [Kubernetes Cluster](https://kubernetes.io/docs/tutorials/) >= 1.27
- [ArgoCD](https://argoproj.github.io/argo-cd/) (Optional)
## Deployment
### Kubernetes Deployment
> Create the namespace first: `kubectl create ns web`
> The commands below assume you have checked out this repo.
```shell
kubectl kustomize deployment/ | kubectl apply -f -
```
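The namespace and kustomize steps above can be combined into one re-runnable helper. This is a minimal sketch, not part of the repo: the function name `tika_deploy` is hypothetical, and it assumes the manifests under `deployment/` target the namespace you pass in (default `web`).

```shell
# Hypothetical helper; the name and defaults are illustrative only.
tika_deploy() {
  tika_ns="${1:-web}"

  # Render the namespace client-side and apply it, so a re-run does not
  # fail when the namespace already exists.
  kubectl create namespace "$tika_ns" --dry-run=client -o yaml | kubectl apply -f -

  # Build the manifests with kustomize and apply them, as in the step above.
  kubectl kustomize deployment/ | kubectl apply -f -

  # List pods as a quick check (assumes the manifests deploy into this namespace).
  kubectl get pods -n "$tika_ns"
}
```

Run `tika_deploy` from the repo root, or `tika_deploy my-namespace` if your overlay uses a different namespace.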
Or, to deploy via Argo CD:
```bash
kubectl apply -f deployment/argocd/application.yml
```
> *NOTE:* Remember to update the `Ingress` hostname
Take it for a test drive:
Via CLI:
> First forward the service to your machine: `kubectl port-forward -n web svc/tika-ui 8080`
```shell
curl -d @test/url.json http://localhost:8080/ -H 'Content-Type: application/json'
```
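The port-forward and the request above can also be wrapped in a single function, so the forward is started and stopped around the call. This is a rough sketch under stated assumptions: the function name `tika_test_drive` and the `TIKA_WAIT` variable are hypothetical, the fixed sleep is a crude stand-in for a readiness check, and the service name, port, and `test/url.json` payload are taken from the commands above.

```shell
# Hypothetical helper; the name, arguments, and TIKA_WAIT are illustrative only.
tika_test_drive() {
  tika_ns="${1:-web}"
  tika_port="${2:-8080}"
  tika_status=0

  # Forward the service in the background and remember its PID.
  kubectl port-forward -n "$tika_ns" svc/tika-ui "$tika_port" >/dev/null 2>&1 &
  tika_pf_pid=$!

  # Give the forward a moment to come up (assumption: 2 seconds is enough).
  sleep "${TIKA_WAIT:-2}"

  # Send the sample payload from the repo, exactly as in the command above.
  curl -sS -d @test/url.json "http://localhost:${tika_port}/" \
    -H 'Content-Type: application/json' || tika_status=$?

  # Stop the port-forward and report curl's exit status.
  kill "$tika_pf_pid" 2>/dev/null || true
  wait "$tika_pf_pid" 2>/dev/null || true
  return "$tika_status"
}
```

Call `tika_test_drive` from the repo root so that `test/url.json` resolves; pass a namespace and port as arguments if yours differ from `web` and `8080`.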
Or, via Web UI:
Using a browser visit:
```shell
http://localhost:8080/
```