https://github.com/dadoonet/fscrawler
Elasticsearch File System Crawler (FS Crawler)
crawler elasticsearch java tika
- Host: GitHub
- URL: https://github.com/dadoonet/fscrawler
- Owner: dadoonet
- License: apache-2.0
- Created: 2012-06-08T17:23:03.000Z (almost 13 years ago)
- Default Branch: master
- Last Pushed: 2024-10-29T07:51:25.000Z (6 months ago)
- Last Synced: 2024-10-29T15:11:04.219Z (6 months ago)
- Topics: crawler, elasticsearch, java, tika
- Language: Java
- Homepage: https://fscrawler.readthedocs.io/
- Size: 14.9 MB
- Stars: 1,353
- Watchers: 73
- Forks: 300
- Open Issues: 139
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
# File System Crawler for Elasticsearch
Welcome to the FS Crawler for [Elasticsearch](https://elastic.co/).
This crawler helps to index binary documents such as PDF, Open Office, and MS Office files.
**Main features**:
* Crawls a local file system (or a mounted drive) to index new files, update existing ones, and remove old ones.
* Crawls remote file systems over SSH/FTP.
* Provides a REST interface that lets you "upload" your binary documents to Elasticsearch.

## Latest versions

Current "most stable" versions are:
| Elasticsearch | FS Crawler | Released | Docs |
|---------------|---------------|------------|-------------------------------------------------------------------------------|
| 6.x, 7.x, 8.x | 2.10-SNAPSHOT |            | [2.10-SNAPSHOT](https://fscrawler.readthedocs.io/en/latest/)                  |

Downloads: [Maven Central](https://repo1.maven.org/maven2/fr/pilato/elasticsearch/crawler/fscrawler-distribution/), [Sonatype snapshots](https://s01.oss.sonatype.org/content/repositories/snapshots/fr/pilato/elasticsearch/crawler/fscrawler-distribution/)


## Build and Quality Status
* [Build workflow (GitHub Actions)](https://github.com/dadoonet/fscrawler/actions/workflows/maven.yml)
* [Documentation build (Read the Docs)](https://fscrawler.readthedocs.io/en/latest/?badge=latest)
* [Code quality analysis (SonarCloud)](https://sonarcloud.io/summary/new_code?id=dadoonet_fscrawler)



## Documentation
The guide has been moved to [ReadTheDocs](https://fscrawler.readthedocs.io/en/latest/).

## Contribute
Works on my machine, and yours! Spin up a pre-configured, standardized dev environment for this repository with [Gitpod](https://gitpod.io/#/https://github.com/dadoonet/fscrawler).

# License

Read more about the [Apache2 License](https://fscrawler.readthedocs.io/en/latest/index.html#license).
# Thanks
Thanks to [JetBrains](https://www.jetbrains.com/?from=FSCrawler) for the IntelliJ IDEA License!
Thanks to [SonarCloud](https://sonarcloud.io/summary/new_code?id=dadoonet_fscrawler) for the free analysis!