An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with text-data

A curated list of projects in awesome lists tagged with text-data.

https://github.com/asyml/texar

Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

bert casl-project data-processing deep-learning dialog-systems gpt-2 machine-learning machine-translation natural-language-processing python tensorflow texar text-data text-generation xlnet

Last synced: 14 May 2025

https://github.com/asyml/texar-pytorch

Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/

bert casl-project data-processing deep-learning dialog-systems gpt-2 machine-learning machine-translation natural-language-processing python pytorch roberta texar texar-pytorch text-data text-generation xlnet

Last synced: 08 Oct 2025

https://github.com/asyml/forte

Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/

data-processing deep-learning information-retrieval machine-learning natural-language natural-language-processing pipeline python text-data

Last synced: 04 Apr 2025

https://github.com/lolei/redditcleaner

Cleans Reddit Text Data 📜 🧹

data-cleaning hacktoberfest nlp praw psaw pushshift python reddit text-data

Last synced: 22 Jul 2025

https://github.com/trinker/textreadr

Tools to uniformly read in text data including semi-structured transcripts

doc docx pdf-reading r read-transcripts text-data text-mining

Last synced: 16 Mar 2025

https://github.com/balaka-18/rake_new2

A Python library that enables smooth keyword extraction from any text using the RAKE (Rapid Automatic Keyword Extraction) algorithm.

keyword-extraction keyword-search keywords nlp python-library text text-data

Last synced: 30 Jun 2025

https://github.com/tylerjthomas9/scrapesec.jl

Scrape EDGAR filings from https://www.sec.gov/

edgar finance financial-data julia scraper sec text-data

Last synced: 07 May 2025

https://github.com/hsankesara/the-tweets-of-wisdom

A dataset containing 30k+ so-called "self-help" tweets from 100+ authors.

nlp text-data text-datasets tweepy tweets

Last synced: 13 Oct 2025

https://github.com/mrchypark/gomsubtitledata

Code for collecting GOM TV subtitle data

data drama korean movies r subtitles text text-data

Last synced: 17 Oct 2025

https://github.com/signaln/parallelio

For reading from and writing to parallel data files in Python

machine-learning natural-language-processing pre-processing preprocessing text text-data

Last synced: 14 Jan 2026

https://github.com/infinitode/crsd

A synthetic customer review sentiment dataset for sentiment analysis generated using different AI models.

ai data dataset datasets huggingface-datasets mit-license ml nlp open-source python sentiment sentiment-analysis sentiment-classification text-data

Last synced: 10 Jun 2026

https://github.com/mhenderson/pages2df

Read morning pages into a data frame in R.

morning-pages rstats rstats-package text-data

Last synced: 05 Mar 2025

https://github.com/putuwaw/slr-emotion-classification

Systematic Literature Review: Machine Learning Methods in Emotion Classification in Textual Data

emotion-classification sisfokom systematic-literature-review text-data

Last synced: 20 Feb 2026

https://github.com/klaragtknst/text_topic

This repository implements a pipeline that extracts and stores various data fields for the files in a large unstructured dataset. These fields are used for topic modeling (wordclouds, based on low-dimensional versions of embedding vectors, named entity clustering, and document-topic incidences). The results are aggregated and visualised using FCA.

documents elasticsearch embeddings fca ner ner-clustering sentence-transformers text-data top2vec topic-aggregation topics-modeling visualisation

Last synced: 26 Feb 2025

https://github.com/infinitode/duplipy

DupliPy is a quick and easy-to-use package that can handle text formatting and data augmentation tasks for NLP in Python. It now offers support for image augmentation tasks as well.

ai augmentation data-analysis data-preprocessing data-science images language-models nlp preprocessing text-data text-datasets text-formatting

Last synced: 15 Apr 2026

https://github.com/fareedkhan-dev/nlp-1k-stories-dataset-genres-100

This repository hosts a diverse NLP dataset comprising 1,000 stories spanning 100 genres for comprehensive language understanding tasks.

dataset deep-learning llm machine-learning nlp python text-data

Last synced: 09 Jun 2026