Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with text-mining

A curated list of projects in awesome lists tagged with text-mining.

https://github.com/deanmalmgren/textract

extract text from any document. no muss. no fuss.

data-mining natural-language-processing python text-mining

Last synced: 16 Dec 2024

https://github.com/chiphuyen/lazynlp

Library to scrape and clean web pages to create massive datasets.

artificial-intelligence data-science language-model natural-language-processing nlp open python text-mining

Last synced: 21 Dec 2024

https://github.com/ujjwalkarn/datasciencer

A curated list of R tutorials for Data Science, NLP and Machine Learning

data-science datascience r text-mining

Last synced: 20 Dec 2024

https://github.com/ujjwalkarn/DataScienceR

A curated list of R tutorials for Data Science, NLP and Machine Learning

data-science datascience r text-mining

Last synced: 16 Nov 2024

https://github.com/konlpy/konlpy

Python package for Korean natural language processing.

hacktoberfest korean korean-nlp morphology nlp python text-mining

Last synced: 21 Dec 2024

https://github.com/dgrtwo/tidy-text-mining

Manuscript of the book "Tidy Text Mining with R" by Julia Silge and David Robinson

book bookdown r text-mining tidyverse

Last synced: 19 Dec 2024

https://github.com/juliasilge/tidytext

Text mining using tidy tools ✨📄✨

natural-language-processing r text-mining tidy-data tidyverse

Last synced: 19 Dec 2024

https://github.com/shangjingbo1226/autophrase

AutoPhrase: Automated Phrase Mining from Massive Text Corpora

automatic compound-words lexicon multi-language phrase quality-phrases text-mining

Last synced: 21 Dec 2024

https://github.com/kavgan/nlp-in-practice

Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.

gensim machine-learning natural-language-processing nlp text-classification text-mining tf-idf word2vec

Last synced: 21 Dec 2024

https://github.com/csurfer/rake-nltk

Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.

algorithm keyword-extraction nltk python text-mining

Last synced: 20 Dec 2024
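For readers unfamiliar with the algorithm named in the rake-nltk entry above, the following is a minimal sketch of the core RAKE idea in plain Python: split text into candidate phrases at stopwords and punctuation, score each word by degree divided by frequency, and score each phrase as the sum of its word scores. It does not use rake-nltk or reproduce its API. The function names `candidate_phrases` and `rake_scores` and the tiny `STOPWORDS` set are illustrative assumptions; a real implementation would use a full stopword list and a proper tokenizer.

```python
import re
from collections import defaultdict

# Assumption: a tiny illustrative stopword list, not the one any library ships.
STOPWORDS = frozenset(
    {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}
)


def candidate_phrases(text, stopwords=STOPWORDS):
    """Split text into runs of non-stopwords, breaking at punctuation."""
    phrases = []
    # Any character that is not a word character, whitespace, or hyphen ends a phrase.
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in stopwords:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text, stopwords=STOPWORDS):
    """Return (phrase, score) pairs sorted from highest to lowest score."""
    phrases = candidate_phrases(text, stopwords)
    freq = defaultdict(int)    # how often each word occurs in candidate phrases
    degree = defaultdict(int)  # total length of the phrases each word occurs in
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    scores = {
        " ".join(phrase): sum(degree[w] / freq[w] for w in phrase)
        for phrase in set(phrases)
    }
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    sample = "Rapid automatic keyword extraction is a method for keyword extraction."
    for phrase, score in rake_scores(sample):
        print(f"{score:5.1f}  {phrase}")
```

Longer phrases made of rarer words rank highest under this scoring, which is why multi-word technical terms tend to surface first.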

https://github.com/gsh199449/spider

A configurable web spider with an easy-to-use web console

cralwer gatherplatform spider text-mining web-console

Last synced: 18 Nov 2024

https://github.com/opensemanticsearch/open-semantic-search

Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)

annotation faceted-search fulltext-search investigative-journalism journalism named-entity-recognition ocr ontologies osint python research-tool search search-engine search-interface semantic skos text-analysis text-mining thesaurus ui

Last synced: 18 Dec 2024

https://github.com/dselivanov/text2vec

Fast vectorization, topic modeling, distances and GloVe word embeddings in R.

glove latent-dirichlet-allocation natural-language-processing text-mining topic-modeling vectorization word-embeddings word2vec

Last synced: 25 Oct 2024

https://github.com/cpsievert/ldavis

R package for web-based interactive topic model visualization.

javascript r text-mining topic-modeling visualization

Last synced: 21 Dec 2024

https://github.com/cpsievert/LDAvis

R package for web-based interactive topic model visualization.

javascript r text-mining topic-modeling visualization

Last synced: 27 Oct 2024

https://github.com/adbar/German-NLP

Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German

computational-linguistics corpus-linguistics german-language natural-language-processing nlp text-mining

Last synced: 26 Oct 2024

https://github.com/lining0806/textmining

Python text mining system (Research of Text Mining System)

jieba sklearn stopwords text-mining tf-idf user-dict

Last synced: 17 Dec 2024

https://github.com/jalajthanaki/nlpython

This repository contains code for Natural Language Processing using the Python scripting language. All of the code accompanies my book "Python Natural Language Processing"

deep-learning feature-engineering feature-extraction feature-selection natural-language-processing parsing part-of-speech python-scripting-language python2 text-mining

Last synced: 22 Dec 2024

https://github.com/jalajthanaki/NLPython

This repository contains code for Natural Language Processing using the Python scripting language. All of the code accompanies my book "Python Natural Language Processing"

deep-learning feature-engineering feature-extraction feature-selection natural-language-processing parsing part-of-speech python-scripting-language python2 text-mining

Last synced: 27 Nov 2024

https://github.com/ropensci-archive/rplos

⚠️ ARCHIVED ⚠️ R client for the PLoS Journals API

metadata pdf plos r r-package rstats text-mining web-api xml

Last synced: 29 Nov 2024

https://github.com/mcs07/chemdataextractor

Automatically extract chemical information from scientific documents

chemistry information-extraction natural-language-processing nlp python text-mining

Last synced: 18 Dec 2024

https://github.com/neomatrix369/nlp_profiler

A simple NLP library that profiles datasets with one or more text columns. Given a dataset and the name of a column containing text data, NLP Profiler returns either high-level insights or low-level, granular statistical information about the text in that column.

google-colab grammar-checks hacktoberfest jupyter kaggle-kernels natural-language-processing nlp nlp-keywords-extraction nlp-library nlp-machine-learning nlp-parsing nlp-profiler profiler profiling profiling-datasets text-mining

Last synced: 21 Dec 2024

https://github.com/blueprints-for-text-analytics-python/blueprints-text

Jupyter notebooks for our O'Reilly book "Blueprints for Text Analysis Using Python"

machine-learning natural-language-processing python text-mining

Last synced: 17 Nov 2024

https://github.com/bnosac/udpipe

R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit

conll dependency-parser lemmatization natural-language-processing nlp pos-tagging r r-package r-pkg rcpp text-mining tokenizer udpipe

Last synced: 21 Dec 2024

https://github.com/bookieio/breadability

Reworked https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now the living alternative)

html-extraction html-extractor html-parsing python text-extraction text-mining

Last synced: 28 Oct 2024

https://github.com/giacbrd/ShallowLearn

An experiment in re-implementing supervised learning models based on shallow neural network approaches (e.g. fastText), with some additional exclusive features and a nice API. Written in Python and fully compatible with scikit-learn.

fasttext gensim machine-learning neural-network online-learning scikit-learn shallow-learning supervised-learning text-classification text-mining word-embeddings word2vec

Last synced: 27 Nov 2024

https://github.com/giacbrd/shallowlearn

An experiment in re-implementing supervised learning models based on shallow neural network approaches (e.g. fastText), with some additional exclusive features and a nice API. Written in Python and fully compatible with scikit-learn.

fasttext gensim machine-learning neural-network online-learning scikit-learn shallow-learning supervised-learning text-classification text-mining word-embeddings word2vec

Last synced: 18 Dec 2024

https://github.com/ropensci/tokenizers

Fast, Consistent Tokenization of Natural Language Text

nlp peer-reviewed r r-package rstats text-mining tokenizer

Last synced: 22 Nov 2024

https://github.com/trinker/qdap

Quantitative Discourse Analysis Package: Bridging the gap between qualitative data and quantitative analysis

qdap quantitative-discourse-analysis text-analysis text-mining text-plotting

Last synced: 21 Dec 2024

https://github.com/karolzak/support-tickets-classification

This case study shows how to create a model for text analysis and classification and deploy it as a web service in the Azure cloud to automatically classify support tickets. This project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava (http://endava.com/en)

ai artificial-intelligence azure azure-app-service azure-machine-learning azure-web-app-service azure-webapp classification classifier machine-learning ml model numpy pandas python text-analysis text-classification text-mining text-processing web-service

Last synced: 18 Dec 2024

https://github.com/luozhouyang/autophrasex

Automated Phrase Mining from Massive Text Corpora in Python.

autophrase phrase-extraction phrase-mining text-mining

Last synced: 17 Dec 2024

https://github.com/mkearney/textfeatures

👷‍♂️ A simple package for extracting useful features from character objects 👷‍♀️

feature-extraction machine-learning mkearney-r-package neural-network neural-networks r rstats text-mining word2vec

Last synced: 17 Dec 2024

https://github.com/EmilHvitfeldt/R-text-data

List of textual data sources to be used for text mining in R

data-science nlp rstats text-analysis text-analytics-in-r text-mining tidytext

Last synced: 22 Nov 2024

https://github.com/emilhvitfeldt/r-text-data

List of textual data sources to be used for text mining in R

data-science nlp rstats text-analysis text-analytics-in-r text-mining tidytext

Last synced: 18 Dec 2024

https://github.com/Planeshifter/text-miner

text mining utilities for Node.js

nlp text-mining

Last synced: 10 Nov 2024

https://github.com/planeshifter/text-miner

text mining utilities for Node.js

nlp text-mining

Last synced: 20 Dec 2024

https://github.com/brandonrobertz/sparselsh

A Locality Sensitive Hashing (LSH) library with an emphasis on large, high-dimensional datasets.

clustering data-mining machine-learning sparse-matrices text-mining

Last synced: 16 Dec 2024

https://github.com/josiahparry/genius

Easily access song lyrics from Genius in a tibble.

music-information-retrieval song-lyrics text-mining

Last synced: 06 Nov 2024

https://github.com/tiesdekok/python_nlp_tutorial

This repository provides everything to get started with Python for Text Mining / Natural Language Processing (NLP)

computational-linguistics natural-language-processing nlp nltk python research spacy text-mining textblob textual-analysis

Last synced: 14 Oct 2024

https://github.com/dipanjans/learning-social-media-analytics-with-r

This repository contains code and bonus content which will be added from time to time for the book "Learning Social Media Analytics with R" by Packt

analytics facebook flickr foursquare ggplot2 github guardian news r sentiment-analysis social-data social-media social-network-analysis stackexchange stackoverflow text-mining topic-modeling twitter

Last synced: 16 Nov 2024

https://github.com/aphp/edsnlp

Modular, fast NLP framework, compatible with PyTorch and spaCy, offering tailored support for French clinical notes.

clinical-data-warehouse deep-learning fast french medical multi-task nlp pytorch rule-based spacy text-mining

Last synced: 21 Dec 2024

https://github.com/YaleDHLab/intertext

Detect and visualize text reuse

data-visualization minhash text-mining web-app

Last synced: 20 Nov 2024

https://github.com/trinker/lexicon

A data package containing lexicons and dictionaries for text analysis

hash lexicon lookup names-frequent r stopwords text-dictionaries text-mining

Last synced: 20 Dec 2024

https://github.com/NicholasMamo/multiplex-plot

Multiplex: visualizations that tell stories—A Python library to create and annotate beautiful network graph visualizations, text visualizations and more.

data-science data-visualisation graph-visualization graphs information-retrieval matplotlib natural-language-processing network-visualization python text-mining text-visualisation text-visualization visualisation visualizations viz vizualisation

Last synced: 25 Nov 2024

https://github.com/nicholasmamo/multiplex-plot

Multiplex: visualizations that tell stories—A Python library to create and annotate beautiful network graph visualizations, text visualizations and more.

data-science data-visualisation graph-visualization graphs information-retrieval matplotlib natural-language-processing network-visualization python text-mining text-visualisation text-visualization visualisation visualizations viz vizualisation

Last synced: 19 Dec 2024

https://github.com/bnosac/ruimtehol

R package to Embed All the Things! using StarSpace

classification embeddings natural-language-processing nlp r similarity starspace text-mining

Last synced: 17 Dec 2024

https://github.com/juliasilge/janeaustenr

An R Package for Jane Austen's Complete Novels 📙

jane-austen novels r text-mining

Last synced: 16 Dec 2024

https://github.com/SentometricsResearch/sentometrics

An integrated framework in R for textual sentiment time series aggregation and prediction

nlp prediction sentiment-analysis text-mining time-series

Last synced: 11 Nov 2024

https://github.com/AllenDang/PipeIt

PipeIt is a text transformation, conversion, cleansing and extraction tool.

text-mining text-processing

Last synced: 12 Nov 2024

https://github.com/dmitryryumin/emnlp-2023-papers

EMNLP 2023 Papers: Explore cutting-edge research from EMNLP 2023, the premier conference for advancing empirical methods in natural language processing. Stay updated on the latest in machine learning, deep learning, and natural language processing with code included. ⭐ Support NLP!

bert computational-linguistics emnlp emnlp2023 gpt language-models llms machine-learning machine-translation multilingual-nlp named-entity-recognition natural-language-processing ner nlp nlp-applications sentiment-analysis syntax-and-semantics text-mining transformers word-embeddings

Last synced: 15 Nov 2024

https://github.com/allendang/pipeit

PipeIt is a text transformation, conversion, cleansing and extraction tool.

text-mining text-processing

Last synced: 08 Nov 2024

https://github.com/trinker/textreadr

Tools to uniformly read in text data including semi-structured transcripts

doc docx pdf-reading r read-transcripts text-data text-mining

Last synced: 27 Oct 2024

https://github.com/wrathematics/ngram

Fast n-Gram Tokenization

ngram r text text-mining

Last synced: 17 Dec 2024
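As a quick illustration of what n-gram tokenization means in the ngram entry above, here is a minimal sketch in plain Python. It is not the ngram R package or its interface; the function name `word_ngrams` and the whitespace-based word splitting are assumptions made for brevity.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return all contiguous word n-grams of `text`, in order of appearance."""
    if n < 1:
        raise ValueError("n must be at least 1")
    words = text.split()  # Assumption: whitespace splitting is good enough here.
    # A text with fewer than n words yields an empty list.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


if __name__ == "__main__":
    bigrams = word_ngrams("the quick brown fox", 2)
    print(bigrams)
    # Counting n-grams is the usual next step in text mining pipelines.
    print(Counter(word_ngrams("a b a b", 2)).most_common())
```

Sliding a fixed-size window over the word list keeps the sketch linear in the length of the text.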

https://github.com/bnosac/pattern.nlp

R package to perform sentiment analysis and Parts of Speech tagging for Dutch/French/English/German/Spanish/Italian

nlp pattern pos-tagging r sentiment-analysis text-mining

Last synced: 11 Nov 2024

https://github.com/amcrisan/Adjutant

Runs a PubMed query, returns results, and allows the user to explore the high-level structure of the returned documents

pubmed r shiny summaries text-mining

Last synced: 04 Dec 2024

https://github.com/kevalmorabia97/sedtwik-event-detection-from-tweets

Segmentation-based event detection from Tweets. Published at NAACL SRW 2019.

event-detection machine-learning nlp segment-based-event-detection text-mining tweets twitter

Last synced: 21 Nov 2024

https://github.com/greenelab/snorkeling

Extracting biomedical relationships from literature with Snorkel 🏊

analysis dataset hetnet machine-learning methodology nlp script snorkel text-mining tool workflow

Last synced: 13 Nov 2024

https://github.com/pjhampton/woolly

The Text Mining Elixir

text-analysis text-mining

Last synced: 17 Nov 2024

https://github.com/inaridiy/webforai

The best HTML-to-Markdown library: ESM-native, with useful utilities; simple, lightweight, and of epic quality.

article-extractor extractor html-to-markdown html2markdown html2md html2text readability scraping text-mining

Last synced: 19 Dec 2024

https://github.com/duhaime/minhash

Quickly estimate the similarity between many sets

locality-sensitive-hashing lsh minhash text-mining

Last synced: 14 Oct 2024
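To make the idea behind the minhash entry above concrete, here is a minimal sketch of MinHash similarity estimation in plain Python. It does not use the duhaime/minhash package or reflect its API. The names `minhash_signature` and `estimate_jaccard`, the default of 256 hash functions, and the salted BLAKE2b hashing scheme are all illustrative assumptions.

```python
import hashlib


def _hash(token, seed):
    """Deterministic 64-bit hash of a string token, salted with an integer seed."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items, num_hashes=256):
    """For each seeded hash function, keep the minimum hash over the set."""
    items = set(items)
    if not items:
        raise ValueError("cannot build a signature for an empty set")
    return [min(_hash(item, seed) for item in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; this estimates Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    a = {"a", "b", "c", "d"}
    b = {"c", "d", "e", "f"}
    true_jaccard = len(a & b) / len(a | b)
    estimate = estimate_jaccard(minhash_signature(a), minhash_signature(b))
    print(f"true={true_jaccard:.3f} estimate={estimate:.3f}")
```

Signatures have a fixed length regardless of set size, so many sets can be compared cheaply once their signatures are computed. More hash functions reduce the variance of the estimate at the cost of more hashing.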

https://github.com/ropensci/jstor

Import journal data from DfR (JSTOR)

jstor peer-reviewed r r-package rstats text-analysis text-mining

Last synced: 22 Nov 2024

https://github.com/juliasilge/learntidytext

Learn about text mining 📄 with tidy data principles

online-course rstats text-mining

Last synced: 28 Oct 2024

https://github.com/trinker/textstem

Tools for fast text stemming & lemmatization

lemmatization r stemming text-mining

Last synced: 27 Oct 2024

https://github.com/greenelab/pubtator

Retrieve and process PubTator annotations

data nlp pubmed pubtator snorkel text-mining tool

Last synced: 13 Nov 2024