https://github.com/clydedacruz/worddoc-indexer-py
Indexes Word documents after stopword removal and lemmatization. Supports simple boolean conjunctive (AND) queries over the index.
- Host: GitHub
- URL: https://github.com/clydedacruz/worddoc-indexer-py
- Owner: clydedacruz
- License: gpl-2.0
- Created: 2018-08-31T16:35:07.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2018-08-31T17:44:16.000Z (about 7 years ago)
- Last Synced: 2025-02-09T23:29:08.717Z (8 months ago)
- Topics: boolean-retrieval, information-retrieval, python
- Language: Python
- Size: 104 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# worddoc-indexer-py
## Dependencies
Python version: use Python 3.5 or greater.

To install dependencies, run: `pip install nltk`

Then download the NLTK data from the Python prompt:
```
import nltk
nltk.download('wordnet')
```
## Usage
To create the index: `python create_index.py data`

To query the index: `python query_index.py ...`
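
For readers unfamiliar with boolean conjunctive retrieval, the idea can be sketched as below. This is a minimal illustration and not the code in `create_index.py` or `query_index.py`: the function names, the tiny stopword list, and the sample documents are all hypothetical, and the text is assumed to have already been extracted from the Word files. The `lemmatize` parameter defaults to a no-op so the sketch runs without NLTK; passing `nltk.stem.WordNetLemmatizer().lemmatize` would be one way to plug in WordNet lemmatization.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; the project's real list is not shown).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "is"}


def tokenize(text, lemmatize=lambda term: term):
    """Lowercase the text, split it into alphabetic terms, drop stopwords, lemmatize."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [lemmatize(t) for t in tokens if t not in STOPWORDS]


def build_index(docs, lemmatize=lambda term: term):
    """Build an inverted index: each term maps to the set of documents containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for term in tokenize(text, lemmatize):
            index[term].add(name)
    return index


def query_and(index, query, lemmatize=lambda term: term):
    """Return the documents that contain every query term (boolean AND)."""
    terms = tokenize(query, lemmatize)
    if not terms:
        return set()
    # Intersect posting sets, smallest first, so the running result stays small.
    postings = sorted((index.get(t, set()) for t in terms), key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result


# Hypothetical documents standing in for text extracted from Word files.
docs = {
    "a.docx": "The cat sat on the mat",
    "b.docx": "A dog sat in the garden",
    "c.docx": "Cat and dog",
}
index = build_index(docs)
print(query_and(index, "cat sat"))
print(query_and(index, "cat dog"))
```

Because a conjunctive query only needs set intersection, a plain mapping from term to document set is enough here; ranked or disjunctive queries would need more structure.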