https://github.com/apache/tika

The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).

Host: GitHub
URL: https://github.com/apache/tika
Owner: apache
License: apache-2.0
Created: 2009-05-21T02:12:11.000Z (about 17 years ago)
Default Branch: main
Last Pushed: 2025-09-03T04:13:20.000Z (10 months ago)
Last Synced: 2025-09-04T00:47:23.242Z (10 months ago)
Topics: content, extraction, java, metadata, tika
Language: Java
Homepage: https://tika.apache.org/
Size: 238 MB
Stars: 3,186
Watchers: 97
Forks: 846
Open Issues: 56
Metadata Files:
- Readme: README.md
- Changelog: CHANGES.txt
- License: LICENSE.txt

Awesome Lists containing this project

awesome-java - Apache Tika
awesome-ccamel - apache/tika - The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). (Java)
