https://github.com/beechit/solrfal-textextract
Add text extraction for Solr indexing of FileAbstractionLayer-based files in TYPO3 CMS
- Host: GitHub
- URL: https://github.com/beechit/solrfal-textextract
- Owner: beechit
- Created: 2017-01-11T10:02:34.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2021-03-19T10:26:02.000Z (over 5 years ago)
- Last Synced: 2025-07-02T10:51:17.043Z (12 months ago)
- Language: PHP
- Size: 20.5 KB
- Stars: 1
- Watchers: 1
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Text extraction for Apache Solr + TYPO3
=======================================
This TYPO3 extension provides a hook/aspect that listens to the signal emitted by ext:solrfal during indexing and extracts
the contents of known text files.
It uses the `pdftotext` binary for this when it is present on the machine, and falls back to the standalone Apache Tika JAR when that is present on the system.
When processing PDF files, it runs some additional checks to determine whether the content is encrypted.
If the file is encrypted, it tries the `tika` fallback instead.
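
The fallback order described above can be illustrated with a short sketch. This is not the extension's code (the extension itself is written in PHP); it is a minimal Python illustration under stated assumptions. The function names, the `tika_jar` path parameter, and the idea of passing the encryption check in as a callable are all hypothetical. The only external facts relied on are that `pdftotext <file> -` writes the extracted text to standard output and that the Tika app JAR accepts `--text`.

```python
import os
import shutil
import subprocess
from typing import Callable, Optional

# An extractor takes a file path and returns the text, or None if it
# could not run (tool missing) or failed.
Extractor = Callable[[str], Optional[str]]


def run_pdftotext(path: str) -> Optional[str]:
    """Extract text with the pdftotext binary, if it is installed."""
    if shutil.which("pdftotext") is None:
        return None
    # "-" as the output file makes pdftotext write to stdout.
    result = subprocess.run(
        ["pdftotext", path, "-"], capture_output=True, text=True
    )
    return result.stdout if result.returncode == 0 else None


def run_tika(path: str, tika_jar: str) -> Optional[str]:
    """Extract text with a standalone Tika app JAR (path is an assumption)."""
    if shutil.which("java") is None or not os.path.isfile(tika_jar):
        return None
    result = subprocess.run(
        ["java", "-jar", tika_jar, "--text", path],
        capture_output=True,
        text=True,
    )
    return result.stdout if result.returncode == 0 else None


def extract_text(
    path: str,
    primary: Extractor,
    fallback: Extractor,
    is_encrypted: Callable[[str], bool],
) -> Optional[str]:
    """Try the primary extractor unless the file is encrypted; else fall back."""
    if not is_encrypted(path):
        text = primary(path)
        if text is not None:
            return text
    # Reached when the file is encrypted, or the primary tool is
    # missing or failed.
    return fallback(path)
```

In a real setup, `primary` would be `run_pdftotext` and `fallback` a small wrapper such as `lambda p: run_tika(p, "/path/to/tika-app.jar")`, with the JAR location supplied by configuration. Passing the extractors and the encryption check in as callables keeps the decision logic testable on machines where neither tool is installed.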