Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/kimtth/pyspark-tika-text-extraction

πŸš΄β€β™‚οΈβ›·Data Lake, Performance tuning for text extraction from a huge amount of files.

apache-spark apache-tika data-pipeline datalake multithreading pyspark spark tika-python
