https://github.com/nating/lucene-search-engine
Indexing and Searching for documents using Apache Lucene. 🔍
https://github.com/nating/lucene-search-engine
Last synced: 9 months ago
JSON representation
Indexing and Searching for documents using Apache Lucene. 🔍
- Host: GitHub
- URL: https://github.com/nating/lucene-search-engine
- Owner: nating
- License: mit
- Created: 2018-10-20T18:34:11.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-10-20T20:27:07.000Z (over 7 years ago)
- Last Synced: 2025-01-02T10:46:18.389Z (over 1 year ago)
- Language: C
- Homepage:
- Size: 8.52 MB
- Stars: 0
- Watchers: 3
- Forks: 2
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# lucene-search-engine 🔍
A [Lucene](https://lucene.apache.org/) implementation of indexing and searching documents from the [Cranfield collection](http://ir.dcs.gla.ac.uk/resources/test_collections/cran/), evaluated using [trec_eval](https://trec.nist.gov/trec_eval/).
## Usage
*(You will need [maven](https://maven.apache.org/), [java](https://www.java.com/), [python](https://www.python.org/) x)*
To download and run the code:
```bash
git clone https://github.com/nating/lucene-search-engine
cd lucene-search-engine
./run.sh
```
You will see the trec_eval results of the Lucene application.
## Output
The output of run.sh should look like this:
|Metric |Geoff's |Seamus' |Difference |
|---|---|---|---|
|num_q |225.0 |225.0 |0.0 |
|num_rel |218235.0|178376.0|39859.0 |
|um_rel |1837.0 |1837.0 |0.0 |
|num_rel_ret|1771.0 |1714.0 |57.0 |
|ap |0.425 |0.375 |0.05 |
|...|...|...|...|
These outputs represent:
* The metrics from trec_eval.
* The results of the application (Geoff's).
* The results given as an example/ground truth in the TCD Computer Science module CS7IS3 (Seamus').
* The difference between the two results for that metric.
## Files
* [**run.sh**](https://github.com/nating/lucene-search-engine/blob/master/run.sh) packages the maven app, indexes the cran files, performs the cran queries across the indexes, runs trec_eval, creates a results file and outputs the results of trec_eval.
* [**IndexFiles.java**](https://github.com/nating/lucene-search-engine/blob/master/luceneapp/src/main/java/com/mycompany/luceneapp/IndexFiles.java) contains the code for indexing the cranfield documents.
* [**CustomAnalyzer.java**](https://github.com/nating/lucene-search-engine/blob/master/luceneapp/src/main/java/com/mycompany/luceneapp/CustomAnalyzer.java) contains the custom analyzer for analyzing document and query content.
* [**SearchFiles.java**](https://github.com/nating/lucene-search-engine/blob/master/luceneapp/src/main/java/com/mycompany/luceneapp/SearchFiles.java) contains the code for running the cranfield queries and creating the results file.