https://github.com/andrewrporter/wikipedia-stoplist
Build robust StopLists from Wikipedia articles
natural-language-processing nlp python stoplist wikipedia wikipedia-api
- Host: GitHub
- URL: https://github.com/andrewrporter/wikipedia-stoplist
- Owner: AndrewRPorter
- License: mit
- Created: 2019-07-02T23:11:46.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2019-10-17T21:40:05.000Z (over 6 years ago)
- Last Synced: 2025-01-28T03:21:36.978Z (over 1 year ago)
- Topics: natural-language-processing, nlp, python, stoplist, wikipedia, wikipedia-api
- Language: Python
- Homepage:
- Size: 23.4 KB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
wikipedia-stoplist
==================
This project explores building robust stoplists from the contents of Wikipedia articles.
Usage
=====
`$ python main.py --num-pages 50 --term-freq 0.6 --limit 200`
This generates a CSV file called `output.csv`. You can pass this file back in to be analyzed again with:
`$ python main.py --input output.csv --term-freq 0.6 --limit 200`
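The README does not define what `--term-freq` and `--limit` mean, so the following is only a sketch of one plausible reading, not the code in `main.py`. It assumes that `--term-freq` is the minimum fraction of pages a term must appear in, and that `--limit` caps the size of the resulting stoplist. The function name `build_stoplist` and the sample documents are hypothetical.

```python
import re
from collections import Counter


def build_stoplist(documents, term_freq=0.6, limit=200):
    """Return up to `limit` terms found in at least `term_freq` of the documents.

    Assumption: this document-frequency rule is an illustration only; the
    project's actual selection logic may differ.
    """
    doc_counts = Counter()
    for text in documents:
        # Count each term once per document (document frequency).
        doc_counts.update(set(re.findall(r"[a-z']+", text.lower())))

    threshold = term_freq * len(documents)
    frequent = [(term, n) for term, n in doc_counts.items() if n >= threshold]
    # Most widespread terms first; ties broken alphabetically for stable output.
    frequent.sort(key=lambda item: (-item[1], item[0]))
    return [term for term, _ in frequent[:limit]]


# Hypothetical stand-ins for downloaded article text.
docs = [
    "The cat sat on the mat",
    "The dog chased the cat",
    "A bird sang in the tree",
]
print(build_stoplist(docs, term_freq=0.6, limit=5))
```

Under this reading, raising the threshold yields a shorter, more conservative stoplist, while the limit keeps the list bounded when many terms pass the threshold.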