https://github.com/zdavatz/fachinfo_ai
Doing NLTK and AI on Swiss Fachinfos with Python.
- Host: GitHub
- URL: https://github.com/zdavatz/fachinfo_ai
- Owner: zdavatz
- License: gpl-3.0
- Created: 2017-02-10T17:08:12.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2023-07-20T16:02:16.000Z (over 1 year ago)
- Last Synced: 2024-05-01T23:16:28.761Z (7 months ago)
- Topics: ai, fachinfo, nltk, python
- Language: Python
- Size: 119 KB
- Stars: 0
- Watchers: 5
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# fachinfo_ai
Doing NLTK and AI on Swiss Fachinfos with Python. The script parses the important words from all FIs (Fachinfos) in Switzerland.
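As a rough illustration of what counting the important words means, here is a minimal sketch using only the Python standard library. It is not the code in `smartinfo.py`: the function name `word_frequencies`, the regex tokenizer (a stand-in for NLTK's tokenizer), and the sample sentences are all assumptions made for this example.

```python
import re
from collections import Counter


def word_frequencies(texts, stopwords):
    """Count lowercase word tokens across texts, skipping stopwords and pure numbers."""
    stop = {w.lower() for w in stopwords}
    counts = Counter()
    for text in texts:
        # \w+ is a simple stand-in for a real tokenizer; it keeps umlauts intact.
        for token in re.findall(r"\w+", text.lower()):
            if token not in stop and not token.isdigit():
                counts[token] += 1
    return counts


# Hypothetical sample text, not taken from a real Fachinfo.
sample = [
    "Die Tablette wird mit Wasser eingenommen.",
    "Die Dosis der Tablette beträgt 500 mg.",
]
freq = word_frequencies(sample, ["die", "der", "mit", "wird"])
print(freq.most_common(3))
```

Words that occur very often across all documents are natural candidates for an automatically generated stopword list.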
#### Requirements:
* List of stopwords in folder input (filename: stopwords.txt)
* Amiko sqlite DB in folder dbs (filename: amiko_db_full_idx_de.db)
#### Setup:
* Create `dbs` dir and put the files `amiko_db_full_idx_de.db` and `amiko_db_full_idx_fr.db` generated with [cpp2sqlite](https://github.com/zdavatz/cpp2sqlite) there.
* From `$SRC_DIR` run with `/usr/local/bin/python3 smartinfo.py --lang=de`
#### Output:
* Frequency csv file in folder output (filename: frequency.csv)
* Auto-generated stopwords file in folder output (filename: auto_stopwords.csv)
#### Requirements for Linux
* `pip install nltk bs4 lxml`
* In a Python shell, run `import nltk`
* Then run `nltk.download('stopwords')` and `nltk.download('punkt')`
#### For Mac
* https://github.com/sashkab/homebrew-python
```
brew tap sashkab/python
brew install python35
cd $HOME/software
wget https://bootstrap.pypa.io/get-pip.py
sudo /usr/local/opt/python35/bin/python3.5 $HOME/software/get-pip.py
sudo /usr/local/Cellar/python35/3.5.6_2/Frameworks/Python.framework/Versions/3.5/bin/pip3.5 install nltk
sudo /usr/local/Cellar/python35/3.5.6_2/Frameworks/Python.framework/Versions/3.5/bin/pip3.5 install bs4
sudo /usr/local/Cellar/python35/3.5.6_2/Frameworks/Python.framework/Versions/3.5/bin/pip3.5 install lxml
/usr/local/opt/python35/bin/python3.5
cd $SRC
mkdir dbs
```
In the Python interactive shell, run `import nltk`, then `nltk.download('stopwords')` and `nltk.download('punkt')`.
Then run `/usr/local/opt/python35/bin/python3.5 smartinfo.py --lang=fr`
#### sqlite Database to download under the GPLv3.0 License
* http://pillbox.oddb.org/amiko_frequency.db
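
To get a first look at a downloaded database, you can list its tables with Python's built-in `sqlite3` module. This is a minimal sketch: the README does not document the schema, so the code only reads table names from `sqlite_master`. The helper name `list_tables` and the local filename are assumptions.

```python
import os
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        con.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    # Assumed local path of the downloaded file; adjust as needed.
    path = "amiko_frequency.db"
    # sqlite3.connect() would silently create an empty file, so check first.
    if os.path.exists(path):
        print(list_tables(path))
    else:
        print("File not found:", path)
```

The same helper should work for the `amiko_db_full_idx_de.db` and `amiko_db_full_idx_fr.db` files in the `dbs` folder.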