https://github.com/rexlow/curiouskid
- Host: GitHub
- URL: https://github.com/rexlow/curiouskid
- Owner: rexlow
- Created: 2021-02-05T10:05:02.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2021-02-07T04:50:32.000Z (over 5 years ago)
- Last Synced: 2025-01-23T11:23:58.314Z (over 1 year ago)
- Language: Python
- Size: 7.81 KB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Curious Kid
A POC repository to get some ideas out of my head. Some of the work that will be included in this repository:
1. Important word extraction
2. Identify important word segments from a sentence
3. Tokenization and Part-of-Speech (POS) tagging with `spacy`
4. Identify clauses and verbs
5. NER tagger
6. Generate questions from text blobs
7. Maybe deep learning approach?
## To build
Detailed instructions will be included when the work is done.
### Download Encoders and Word Vectors
```
bash download_importance.sh
```
### Install dependencies
```
pip3 install -r requirements.txt
```
### Install spaCy models
Pick either `en_core_web_sm` or `en_core_web_trf` for the named entity recognition task.
```
python -m spacy download en_core_web_sm
python -m spacy download en_core_web_trf
```
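To check that a model installed correctly, something like the sketch below may help. It is not part of this repository: the function names `installed_models` and `tag` and the sample sentence are made up for illustration. It relies on spaCy models being installed as regular Python packages, so they can be detected by name, and it prefers `en_core_web_trf` when both are present.

```python
import importlib.util

# Model names from the install step above; the order of preference is an assumption.
CANDIDATES = ("en_core_web_trf", "en_core_web_sm")


def installed_models(candidates=CANDIDATES):
    """Return the candidate spaCy model packages that can be imported."""
    return [name for name in candidates
            if importlib.util.find_spec(name) is not None]


def tag(text, model_name):
    """Return (token, POS) pairs and (entity, label) pairs for `text`."""
    import spacy  # imported lazily so installed_models() works without spaCy

    nlp = spacy.load(model_name)
    doc = nlp(text)
    pos = [(token.text, token.pos_) for token in doc]
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return pos, entities


if __name__ == "__main__":
    models = installed_models()
    if models:
        pos, entities = tag("Marie Curie won the Nobel Prize in 1903.", models[0])
        print(pos)
        print(entities)
    else:
        print("No spaCy model found; run one of the download commands above.")
```

The exact tags and entities printed depend on which model is loaded, so no expected output is shown here.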
## Usage
### Important word extraction
```
python3 importance.py
```
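For readers who want a feel for the task before the encoders and word vectors are downloaded, here is a deliberately simple stand-in. It is an assumption-laden sketch, not what `importance.py` does: it ranks words by plain frequency after dropping a tiny, hand-picked stopword list, and the name `important_words` is hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); a real run would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in",
             "and", "it", "that", "on", "for"}


def important_words(text, top_n=3):
    """Rank non-stopword tokens by how often they occur in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    sample = ("The cat chased the mouse. "
              "The mouse hid, and the cat waited for the mouse.")
    print(important_words(sample, top_n=2))  # ['mouse', 'cat']
```

Frequency counting ignores context entirely, which is exactly the gap an encoder-based approach is meant to close.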