An open API service indexing awesome lists of open source software.

https://github.com/baughmann/tikara

The metadata and text content extractor for almost every file type.

Topics: apache-tika, content-extraction, document-parsing, document-processing, docx, image-to-text, java, language-detection, llm, metadata, metadata-extraction, ml, natural-language-processing, ocr, pdf-to-text, retrieval-augmented-generation, text-extraction, text-mining


