Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ggteixeira/corpus-cleaner
Linguistic tool (made by a linguist, for linguists) that scrapes corpora, automatically cleans them up, and generates n-grams.
Last synced: 5 days ago
- Host: GitHub
- URL: https://github.com/ggteixeira/corpus-cleaner
- Owner: ggteixeira
- Created: 2021-12-17T20:09:06.000Z (almost 3 years ago)
- Default Branch: master
- Last Pushed: 2021-12-18T17:24:57.000Z (almost 3 years ago)
- Last Synced: 2023-03-05T19:58:13.591Z (over 1 year ago)
- Topics: beautifulsoup4, bs4, corpora, corpus, corpus-linguistics, crawler, linguistics, nlp, python, scraper, web-scraping
- Language: Python
- Homepage:
- Size: 2.75 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
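
For readers unfamiliar with the n-gram step named in the description, here is a minimal sketch of how cleaned text can be turned into n-grams in Python. It is not taken from the corpus-cleaner repository; the function names `tokenize` and `ngrams` and the sample sentence are hypothetical, and the cleaning rule (lowercasing and keeping only word characters) is an assumption chosen for illustration.

```python
import re
from collections import Counter


def tokenize(raw):
    """Lowercase the text and return its word tokens (assumed cleaning rule)."""
    return re.findall(r"\w+", raw.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample text standing in for a scraped corpus.
sample = "The cat sat. The cat ran!"
tokens = tokenize(sample)

# Count how often each bigram (n = 2) occurs.
bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(3))
```

Tuples are used for the n-grams so they can serve as `Counter` keys; a real pipeline would likely add language-specific tokenization and stop-word handling.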