https://github.com/alleyinteractive/searchpress-attachments
SearchPress Attachment File Contents
- Host: GitHub
- URL: https://github.com/alleyinteractive/searchpress-attachments
- Owner: alleyinteractive
- Created: 2021-05-27T16:53:35.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-02-09T13:57:48.000Z (almost 3 years ago)
- Last Synced: 2024-04-09T23:07:10.902Z (7 months ago)
- Language: PHP
- Size: 13.7 KB
- Stars: 1
- Watchers: 31
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# SearchPress Attachments Add-On
The [ingest attachment plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/current/ingest-attachment.html) lets Elasticsearch extract file attachments in common formats (such as PPT, XLS, and PDF) by using the Apache text extraction library Tika. The source field must be a base64 encoded binary.
## Requirements
* [SearchPress v0.4+](https://github.com/alleyinteractive/searchpress)
* Elasticsearch v7+
* [Elasticsearch's Ingest attachment plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/current/ingest-attachment.html)
## Instructions
To use this plugin, you must first ensure that the ingest attachment plugin is active on the Elasticsearch node that your project is using. You can confirm which plugins are currently active on your node by sending a GET request to `/_cat/plugins`. Installation instructions vary by hosting service, so confirm that your host supports the plugin before planning to use this add-on.
Once the ingest attachment plugin is installed on your node, ensure that SearchPress is also active. Once your project meets these two requirements, simply install and activate this add-on. No further configuration is necessary.
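As a rough illustration of that check, the sketch below inspects a `/_cat/plugins?format=json` response and looks for the `ingest-attachment` component. The function name `example_has_ingest_attachment` and the `http://localhost:9200` host are placeholders for illustration, not part of this add-on or of SearchPress.

```php
<?php
/**
 * Returns true if a `/_cat/plugins?format=json` response body lists the
 * ingest attachment plugin on at least one node.
 *
 * Hypothetical helper for illustration only.
 *
 * @param string $json Raw JSON response body.
 * @return bool
 */
function example_has_ingest_attachment( string $json ): bool {
	$rows = json_decode( $json, true );
	if ( ! is_array( $rows ) ) {
		return false; // Not valid JSON, or not the expected list of rows.
	}
	foreach ( $rows as $row ) {
		// Each row describes one plugin on one node; `component` is the plugin name.
		if ( is_array( $row ) && 'ingest-attachment' === ( $row['component'] ?? '' ) ) {
			return true;
		}
	}
	return false;
}

// Inside WordPress, the response could be fetched with wp_remote_get().
// The host below is a placeholder; substitute your node's address.
if ( function_exists( 'wp_remote_get' ) ) {
	$response = wp_remote_get( 'http://localhost:9200/_cat/plugins?format=json' );
	if ( ! is_wp_error( $response ) ) {
		$active = example_has_ingest_attachment( wp_remote_retrieve_body( $response ) );
	}
}
```

Note that this only reports whether at least one node lists the plugin; on a multi-node cluster you may want to confirm it for every node that handles ingest.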
You may find that your indexing operations fail if the index request attempts to send too much data. If so, you will likely need to find a good balance between the value set for `sp_attachments_max_file_size` and SearchPress' bulk size `\SP_Sync_Meta()->bulk`. Lowering both of these will result in smaller remote requests sent to the ES instance, but how far to lower either (or both) will depend on project specifics.
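To get a feel for this trade-off, the following sketch estimates a rough upper bound on the attachment payload of a single bulk request. It assumes every document in the batch carries a file at the size limit and that the limit is expressed in bytes, and it ignores all other document fields. The function name and the sample values are hypothetical. Because file contents are sent base64 encoded, every 3 bytes of file data become 4 characters in the request.

```php
<?php
/**
 * Rough upper bound, in bytes, on the base64-encoded attachment payload of
 * one bulk request. Hypothetical helper for illustration only.
 *
 * @param int $bulk          Number of documents per bulk request.
 * @param int $max_file_size Maximum file size in bytes (assumed unit).
 * @return int
 */
function example_worst_case_bulk_bytes( int $bulk, int $max_file_size ): int {
	// Base64 encodes every 3 input bytes as 4 output characters, padding the
	// final group, so the encoded length is 4 * ceil( size / 3 ).
	$encoded = 4 * intdiv( $max_file_size + 2, 3 );
	return $bulk * $encoded;
}

// Compare two hypothetical combinations of bulk size and file size limit.
$mb     = 1024 * 1024;
$before = example_worst_case_bulk_bytes( 100, 5 * $mb );
$after  = example_worst_case_bulk_bytes( 50, 2 * $mb );
```

Real requests will usually be much smaller than this bound, since most batches are not made up entirely of maximum-size attachments, but the estimate shows how the two settings multiply.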
## Filters
* `sp_attachments_max_file_size` - Filters the max file size for indexed attachments. If a file exceeds this limit, the file contents will not be added to the index for this document. The rest of the attachment's data will still be indexed. Defaults to 5 MB.
* `sp_attachments_index_file_contents` - Filters whether or not a given file's contents should be indexed. This can be used to force indexing of file contents that would otherwise be skipped by `sp_attachments_max_file_size`.
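A minimal sketch of hooking both filters follows. It rests on two assumptions that are not confirmed by this README: that `sp_attachments_max_file_size` passes a size in bytes, and that the first argument of `sp_attachments_index_file_contents` is a boolean. Any additional arguments the filters may pass are ignored here. The callback names and the `EXAMPLE_SKIP_ATTACHMENT_CONTENTS` constant are hypothetical.

```php
<?php
/**
 * Lowers the attachment size limit to 2 MB.
 * Assumption: the filtered value is a size in bytes.
 *
 * @param int $max_size Current limit.
 * @return int New limit.
 */
function example_attachments_max_file_size( $max_size ) {
	return 2 * 1024 * 1024;
}

/**
 * Skips file-content indexing when a (hypothetical) constant is set,
 * and otherwise leaves the incoming decision unchanged.
 * Assumption: the first filtered argument is a boolean.
 *
 * @param bool $should_index Whether the file's contents would be indexed.
 * @return bool
 */
function example_index_file_contents( $should_index ) {
	if ( defined( 'EXAMPLE_SKIP_ATTACHMENT_CONTENTS' ) && EXAMPLE_SKIP_ATTACHMENT_CONTENTS ) {
		return false;
	}
	return $should_index;
}

// Register the callbacks only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'sp_attachments_max_file_size', 'example_attachments_max_file_size' );
	add_filter( 'sp_attachments_index_file_contents', 'example_index_file_contents' );
}
```

Check the add-on's source for the exact arguments each filter receives before relying on this in a project.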