https://github.com/rexlow/curiouskid



# Curious Kid

A proof-of-concept (POC) repository to get some ideas out of my head. Some of the work that will be included in this repository:

1. Important word extraction
2. Identify important word segments from a sentence
3. Tokenization and Part-of-Speech (POS) tagging with `spacy`
4. Identify clauses and verbs
5. NER tagger
6. Generate questions from text blobs
7. Maybe a deep learning approach?

## To build

Detailed instructions will be included when the work is done.

### Download Encoders and Word Vectors

```
bash download_importance.sh
```

### Install dependencies

```
pip3 install -r requirements.txt
```

### Install spaCy models

Pick either `en_core_web_sm` or `en_core_web_trf` for the named entity recognition (NER) task.

```
python -m spacy download en_core_web_sm
python -m spacy download en_core_web_trf
```
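Once a model is installed, it can be sanity-checked with a few lines of spaCy. The snippet below is a minimal sketch, not code from this repository: the helper names (`load_pipeline`, `pos_tags`, `verbs`, `entities`, `demo`) and the sample sentence are made up for illustration, and it assumes `en_core_web_sm` was the model you downloaded.

```python
def load_pipeline(model_name="en_core_web_sm"):
    """Load an installed spaCy pipeline by name.

    spaCy is imported lazily so the helpers below can be reused
    (or unit-tested) without spaCy being importable.
    """
    import spacy

    return spacy.load(model_name)


def pos_tags(doc):
    """Return (token text, coarse POS tag) pairs for every token in the doc."""
    return [(token.text, token.pos_) for token in doc]


def verbs(doc):
    """Return the text of tokens tagged as main or auxiliary verbs."""
    return [token.text for token in doc if token.pos_ in {"VERB", "AUX"}]


def entities(doc):
    """Return (entity text, entity label) pairs found by the NER component."""
    return [(ent.text, ent.label_) for ent in doc.ents]


def demo():
    """Hypothetical demo; requires spaCy and the chosen model to be installed."""
    nlp = load_pipeline()
    doc = nlp("Marie Curie discovered polonium in Paris in 1898.")
    print(pos_tags(doc))
    print(verbs(doc))
    print(entities(doc))
```

Calling `demo()` after the download step should print the tagged tokens, the verbs, and the recognised entities. To try the transformer model instead, pass `"en_core_web_trf"` to `load_pipeline`.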

## Usage

### Important word extraction

```
python3 importance.py
```
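The internals of `importance.py` are not described in this README. To give a rough feel for what "important word extraction" means, here is a deliberately simple stand-in that ranks words by frequency after dropping stopwords. It is an assumption-laden sketch, not the method used by this repository (which relies on downloaded encoders and word vectors); the function name `important_words` and the tiny stopword list are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {
    "a", "an", "the", "of", "to", "in", "on", "and", "or", "is", "are",
    "was", "were", "it", "that", "this", "with", "for", "as", "by",
}


def important_words(text, top_k=3):
    """Rank non-stopword tokens by raw frequency and return the top_k words."""
    # Lowercase the text and keep runs of letters/apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_k)]


if __name__ == "__main__":
    sample = "The mouse saw the cat. The mouse ran, and the mouse hid from the cat."
    print(important_words(sample, top_k=2))
```

A frequency count ignores context entirely, which is exactly the gap an encoder-based importance score is meant to close.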