https://github.com/fclairamb/myse
MySE
- Host: GitHub
- URL: https://github.com/fclairamb/myse
- Owner: fclairamb
- Created: 2015-03-20T17:20:20.000Z (about 11 years ago)
- Default Branch: master
- Last Pushed: 2020-10-17T13:59:08.000Z (over 5 years ago)
- Last Synced: 2025-03-23T22:53:24.810Z (about 1 year ago)
- Language: Java
- Homepage: http://myse.io
- Size: 1.56 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# MySE
MySE stands for _my search engine_.
## Goals
MySE indexes the content of multiple data sources into a simple search engine.
## Install
* Windows / Mac OS X / Linux (manual install): [the JAR](http://update.myse.io)
* Debian Linux with APT: [the APT repository](http://apt.webingenia.com)
## Current status
It's not production-ready; use it at your own risk. That being said, there's very little
risk in letting it run over your files, as it only reads them.
## Supported systems
It is written in Java, so it should work everywhere. It requires at least Java 1.7. It has been tested on:
* Linux Debian
* Windows 7+
* Mac OS X
## Supported sources
* Local disk
* Apache Commons VFS
* Samba (for Windows shared drive)
* Dropbox
* Google Drive
* FTPS / FTPeS (Apache Commons VFS did not work well with these)
* Web exploration (basic web crawler, buggy at this stage)
## How it works
All data is stored in `~/.myse/`.
The search engine lists all the files and tries to extract text from them.
The frontend uses [AngularJS](https://angularjs.org/).
Core libraries are:
* [Jetty 9](http://eclipse.org/jetty/) (embedded)
* [Elasticsearch](https://www.elastic.co/products/elasticsearch) (embedded)
* [H2](http://www.h2database.com/html/main.html) (embedded) with JPA
* [Apache Tika](https://tika.apache.org/)
* [Apache Commons VFS](http://commons.apache.org/proper/commons-vfs/)
* [JCIFS](https://jcifs.samba.org/)
* [Google Drive API](https://developers.google.com/drive/web/quickstart/quickstart-java)
* [Dropbox API](https://www.dropbox.com/developers/core/start/java)
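
The list-then-extract loop described above can be sketched with the JDK alone. This is a minimal illustration, not MySE's actual code: the class `MiniIndexer` and its methods are hypothetical, the plain UTF-8 read stands in for the text extraction that Apache Tika would perform, and the in-memory map stands in for the embedded Elasticsearch index.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical illustration; not a class from the MySE code base. */
public class MiniIndexer {
    // term -> identifiers of the documents containing it
    // (assumption: a stand-in for the real search index)
    private final Map<String, Set<String>> index = new HashMap<>();

    /** Step 1: list every regular file below the given root. */
    public static List<Path> listFiles(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Step 2: fetch text from a file (placeholder: decodes the bytes as UTF-8). */
    public static String extractText(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            return ""; // unreadable files contribute no text
        }
    }

    /** Step 3: record which lower-cased terms occur in which document. */
    public void addDocument(String id, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
    }

    /** Returns the identifiers of the documents containing the term. */
    public Set<String> search(String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        String query = args.length > 1 ? args[1] : "myse";
        MiniIndexer indexer = new MiniIndexer();
        for (Path file : listFiles(root)) {
            indexer.addDocument(file.toString(), extractText(file));
        }
        System.out.println(indexer.search(query));
    }
}
```

Running it with a directory and a search term prints the paths of the files containing that term. A real indexer would replace `extractText` with a format-aware extractor so that PDFs and office documents yield text as well.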
## Known issues
* Samba: Files whose name starts with a space are constantly marked for indexing and deletion but never actually indexed. It's a JCIFS issue; the only workaround at this stage is to remove that space from the file name.
* Indexing is slow: this is intentional, to avoid scanning too much at once.
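
For the Samba issue, a small helper can flag the file names that need renaming. This is a sketch under assumptions: `EdgeSpaceNames` is a hypothetical class that is not part of MySE or JCIFS, and it checks both ends of the name to be safe, since it is not certain from the note above which end triggers the problem.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; not part of MySE or JCIFS. */
public class EdgeSpaceNames {

    /** True when the name begins or ends with a space character. */
    public static boolean hasEdgeSpace(String name) {
        return !name.isEmpty()
                && (name.charAt(0) == ' ' || name.charAt(name.length() - 1) == ' ');
    }

    /** Returns the names that would need renaming, in input order. */
    public static List<String> affected(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (hasEdgeSpace(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // File names are passed as command-line arguments (quote them in the shell).
        for (String name : affected(Arrays.asList(args))) {
            System.out.println("rename: '" + name + "'");
        }
    }
}
```

Only the file name itself should be passed in, not the full path, because a space inside a directory component is not what the check looks for.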