Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/saidsef/tika-document-to-text
Apache Tika - Toolkit detects and extracts metadata
apache-tika docker-container docker-image document-to-text document-to-text-ui extract-text extracts-metadata hacktoberfest k8s kubernetes text-to-speech
- Host: GitHub
- URL: https://github.com/saidsef/tika-document-to-text
- Owner: saidsef
- License: mit
- Created: 2018-04-10T03:26:14.000Z (almost 7 years ago)
- Default Branch: main
- Last Pushed: 2024-10-17T17:19:41.000Z (3 months ago)
- Last Synced: 2024-10-20T01:51:20.875Z (3 months ago)
- Topics: apache-tika, docker-container, docker-image, document-to-text, document-to-text-ui, extract-text, extracts-metadata, hacktoberfest, k8s, kubernetes, text-to-speech
- Language: JavaScript
- Homepage:
- Size: 519 KB
- Stars: 5
- Watchers: 3
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.md
README
# Apache Tika Implementation [![CI](https://github.com/saidsef/faas-convert-to-text/actions/workflows/docker.yml/badge.svg)](#deployment) [![Tagging](https://github.com/saidsef/faas-convert-to-text/actions/workflows/tagging.yml/badge.svg)](#deployment) [![Release](https://github.com/saidsef/faas-convert-to-text/actions/workflows/release.yml/badge.svg)](#deployment)
The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.
## Prerequisite
- [Kubernetes Cluster](https://kubernetes.io/docs/tutorials/) >= 1.26
- [ArgoCD](https://argoproj.github.io/argo-cd/) (Optional)
## Deployment
### Kubernetes Deployment
> Create the `web` namespace via `kubectl create ns web`
> Assuming you've checked out this repo:
```shell
kubectl kustomize deployment/ | kubectl apply -f -
```
Or, to deploy via ArgoCD:
```bash
kubectl apply -f deployment/argocd/application.yml
```
> *NOTE:* Remember to update the `Ingress` hostname
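Before testing, it can help to confirm that the resources actually landed in the namespace. The following is a minimal sketch, not part of the repo: `check_deploy` is a hypothetical helper name, while the `web` namespace and the `tika-ui` Service name are taken from the commands in this README.

```shell
# Sketch: confirm the deployment landed in the target namespace.
# check_deploy is a hypothetical helper; "web" and "tika-ui" are the
# namespace and Service names used elsewhere in this README.
check_deploy() {
  ns="${1:-web}"
  # List the pods, then confirm the Service used for port-forwarding exists.
  kubectl get pods -n "$ns" && kubectl get svc tika-ui -n "$ns"
}
```

Call it as `check_deploy` for the default namespace, or `check_deploy <namespace>` if you deployed elsewhere.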
Take it for a test drive:
Via CLI:
> You'll first need to forward the service via `kubectl port-forward -n web svc/tika-ui 8080`
```shell
curl -d @test/url.json http://localhost:8080/ -H 'Content-Type: application/json'
```
Or, via Web UI:
Using a browser visit:
```shell
http://localhost:8080/
```
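The CLI test drive above can be wrapped in a small helper that fails loudly when the payload file is missing or the service returns an HTTP error. This is a sketch under assumptions: `post_json` and the `TIKA_URL` variable are illustrative names, not part of the repo, and the service is assumed to be port-forwarded to `localhost:8080` as described above.

```shell
# Sketch: POST a JSON payload to the port-forwarded tika-ui service.
# TIKA_URL and post_json are illustrative names, not part of the repo.
TIKA_URL="${TIKA_URL:-http://localhost:8080/}"

post_json() {
  payload="$1"
  # Fail early if the payload file does not exist.
  if [ ! -f "$payload" ]; then
    echo "payload file not found: $payload" >&2
    return 1
  fi
  # --fail makes curl exit non-zero on HTTP error responses (4xx/5xx).
  curl --fail --silent --show-error \
    -H 'Content-Type: application/json' \
    -d @"$payload" \
    "$TIKA_URL"
}
```

With the port-forward running, `post_json test/url.json` sends the same request as the `curl` command shown earlier.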