Projects in Awesome Lists tagged with corpus-builder
A curated list of projects in awesome lists tagged with corpus-builder.
https://github.com/adbar/trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
article-extractor corpus corpus-builder corpus-tools crawler html-to-markdown html2text news news-aggregator news-crawler nlp readability rss-feed scraping tei text-cleaning text-extraction text-mining text-preprocessing web-scraping
Last synced: 14 Mar 2025
https://github.com/google/corpuscrawler
Crawler for linguistic corpora
corpus-builder corpus-linguistics crawling linguistics minority-language
Last synced: 14 Mar 2025
https://github.com/dohliam/ebook-corpus
Ebook Corpus - A parser and extractor for electronic books
corpus corpus-builder corpus-linguistics ebook-parsing ebooks epub fb2 mobi
Last synced: 14 Apr 2025
https://github.com/andythefactory/article-extraction-dataset
Article title, authors, date and body extraction dataset.
article-extractor corpus corpus-builder corpus-tools dataset datasets html-to-markdown html2text news news-aggregator news-crawler readability scraping scraping-websites text-cleaning text-extraction text-mining text-preprocessing web-scraping
Last synced: 18 Feb 2025
https://github.com/tubone24/askfm-qa-crawler
Crawl Ask.fm Q&A lists and create a corpus for ML.
askfm chromedriver corpus-builder crawler selenium
Last synced: 14 May 2025
https://github.com/writecrow/crow_frontend
The user interface for the Corpus & Repository of Writing, built in Angular
angular corpora corpus corpus-builder corpus-linguistics natural-language-processing
Last synced: 21 Mar 2025
https://github.com/writecrow/crow_backend
The canonical resources to build the backend for a corpus/repository management framework for Crow, the Corpus and Repository of Writing
api backend corpus corpus-builder corpus-generator corpus-linguistics natural-language-processing
Last synced: 21 Mar 2025
https://github.com/c0ntradicti0n/corpuscookapp
App and scripts that work with the corpus-builder CorpusCook to keep a corpus updated with corrections for wrong predictions
amp corpus-builder corpus-linguistics kivy-application nlp-machine-learning python3 twisted
Last synced: 11 Mar 2025