Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/trinker/textreadr

Tools to uniformly read in text data, including semi-structured transcripts

doc docx pdf-reading r read-transcripts text-data text-mining

Last synced: 20 May 2024

https://github.com/PedroBarcha/old-books-dataset

Old book pages (with ground truth), formerly used for OCR studies. There are several versions of the set, differing in resolution and binarization. Noisy and denoised versions (produced by several methods) will eventually be uploaded.

binarization binarized-dataset books-dataset dataset ground-truth groundtruth ocr-database ocr-dataset old-books old-documents text text-data text-database

Last synced: 21 Apr 2024

https://github.com/asyml/texar-pytorch

Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/

bert casl-project data-processing deep-learning dialog-systems gpt-2 machine-learning machine-translation natural-language-processing python pytorch roberta texar texar-pytorch text-data text-generation xlnet

Last synced: 19 Apr 2024

https://github.com/asyml/texar

Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

bert casl-project data-processing deep-learning dialog-systems gpt-2 machine-learning machine-translation natural-language-processing python tensorflow texar text-data text-generation xlnet

Last synced: 11 Apr 2024