https://github.com/apache/tika
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
- Host: GitHub
- URL: https://github.com/apache/tika
- Owner: apache
- License: apache-2.0
- Created: 2009-05-21T02:12:11.000Z (almost 16 years ago)
- Default Branch: main
- Last Pushed: 2025-03-24T15:43:59.000Z (22 days ago)
- Last Synced: 2025-03-26T21:19:16.480Z (20 days ago)
- Topics: content, extraction, java, metadata, tika
- Language: Java
- Homepage: https://tika.apache.org/
- Size: 236 MB
- Stars: 2,877
- Watchers: 98
- Forks: 811
- Open Issues: 51
Metadata Files:
- Readme: README.md
- Changelog: CHANGES.txt
- License: LICENSE.txt
Awesome Lists containing this project
- awesome - apache/tika - The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). (Java)
- awesome-ccamel - apache/tika - The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). (Java)