https://github.com/beechit/solrfal-textextract

Add text extraction for Solr indexing of FileAbstractionLayer-based files in TYPO3 CMS



Host: GitHub
URL: https://github.com/beechit/solrfal-textextract
Owner: beechit
Created: 2017-01-11T10:02:34.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2021-03-19T10:26:02.000Z (over 5 years ago)
Last Synced: 2025-07-02T10:51:17.043Z (12 months ago)
Language: PHP
Size: 20.5 KB
Stars: 1
Watchers: 1
Forks: 3
Open Issues: 0
Metadata Files:
- Readme: README.md


README

Text extraction for Apache Solr + TYPO3
=======================================

This TYPO3 extension provides a hook (aspect) that listens to a signal emitted by ext:solrfal during indexing and extracts the contents of known text files.

It uses the `pdftotext` binary when it is installed on the machine, and falls back to the standalone Apache Tika jar when that is available on the system.

When processing PDF files, it additionally checks whether the contents are encrypted. If they are, it falls back to `tika`.
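The README does not say how the tools are invoked, so the following is only a rough sketch of the selection logic, written in Python for illustration (the extension itself is PHP). It assumes that the encryption check uses poppler's `pdfinfo`, that the Tika fallback is the `tika-app` jar run with `java -jar ... --text`, and that non-PDF files go straight to Tika. The function names `pdf_is_encrypted`, `choose_extractor` and `extract_text` are hypothetical and are not the extension's API.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def pdf_is_encrypted(pdfinfo_output: str) -> bool:
    """Parse `pdfinfo` output, which contains e.g. 'Encrypted:      yes (print:yes ...)'."""
    for line in pdfinfo_output.splitlines():
        if line.startswith("Encrypted:"):
            return line.split(":", 1)[1].strip().startswith("yes")
    return False  # no Encrypted line: assume the file is not encrypted


def choose_extractor(is_pdf: bool, has_pdftotext: bool,
                     has_tika: bool, encrypted: bool) -> Optional[str]:
    """Prefer pdftotext for unencrypted PDFs; otherwise fall back to Tika (assumption)."""
    if is_pdf and has_pdftotext and not encrypted:
        return "pdftotext"
    if has_tika:
        return "tika"
    return None  # no usable extractor on this machine


def extract_text(path: Path, tika_jar: Optional[Path] = None) -> Optional[str]:
    is_pdf = path.suffix.lower() == ".pdf"
    has_pdftotext = shutil.which("pdftotext") is not None
    # Tika counts as present only if the jar exists and a java binary is on PATH.
    has_tika = (tika_jar is not None and tika_jar.is_file()
                and shutil.which("java") is not None)

    encrypted = False
    if is_pdf and shutil.which("pdfinfo") is not None:
        info = subprocess.run(["pdfinfo", str(path)],
                              capture_output=True, text=True)
        encrypted = pdf_is_encrypted(info.stdout)

    tool = choose_extractor(is_pdf, has_pdftotext, has_tika, encrypted)
    if tool == "pdftotext":
        # An output file of "-" makes pdftotext write the text to stdout.
        cmd = ["pdftotext", str(path), "-"]
    elif tool == "tika":
        # The tika-app --text option prints the extracted plain text.
        cmd = ["java", "-jar", str(tika_jar), "--text", str(path)]
    else:
        return None
    # check=True raises CalledProcessError if the tool exits with a non-zero status.
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

For example, `extract_text(Path("report.pdf"), Path("/opt/tika/tika-app.jar"))` (both paths hypothetical) returns the extracted text, or `None` when neither tool is usable. Keeping the decision in a pure function like `choose_extractor` makes the fallback order easy to test without either binary installed.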
