https://github.com/apache/tika
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
- Host: GitHub
- URL: https://github.com/apache/tika
- Owner: apache
- License: apache-2.0
- Created: 2009-05-21T02:12:11.000Z (about 16 years ago)
- Default Branch: main
- Last Pushed: 2025-05-12T14:37:42.000Z (about 2 months ago)
- Last Synced: 2025-05-13T11:03:47.622Z (about 2 months ago)
- Topics: content, extraction, java, metadata, tika
- Language: Java
- Homepage: https://tika.apache.org/
- Size: 236 MB
- Stars: 2,981
- Watchers: 98
- Forks: 820
- Open Issues: 52
Metadata Files:
- Readme: README.md
- Changelog: CHANGES.txt
- License: LICENSE.txt
Awesome Lists containing this project
- awesome-ccamel - apache/tika - The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). (Java)