Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/CogComp/cogcomp-nlpy

CogComp's light-weight Python NLP annotators

data-mining natural-language-processing nlp text-mining text-processing

Last synced: 29 Jun 2024

https://github.com/YaleDHLab/intertext

Detect and visualize text reuse

data-visualization minhash text-mining web-app

Last synced: 27 Jun 2024

https://github.com/bhattbhavesh91/texthero-demo

Tutorial demonstrating the power of Texthero, a library for text preprocessing, representation and visualization, from zero to hero.

nlp nlp-pipeline text-clustering text-mining text-preprocessing text-representation text-visualization texthero texthero-tutorial word-embeddings

Last synced: 20 Jun 2024

https://github.com/Planeshifter/text-miner

text mining utilities for Node.js

nlp text-mining

Last synced: 19 Jun 2024

https://github.com/opensemanticsearch/open-semantic-search

Open source research tool to search, browse, analyze and explore large document collections with a semantic search engine and an open source text mining and text analytics platform (integrates ETL for document processing, OCR for images and PDFs, named entity recognition for persons, organizations and locations, metadata management by thesaurus and ontologies, and a search user interface and search apps for full-text search, faceted search and knowledge graph)

annotation faceted-search fulltext-search investigative-journalism journalism named-entity-recognition ocr ontologies osint python research-tool search search-engine search-interface semantic skos text-analysis text-mining thesaurus ui

Last synced: 09 Jun 2024

https://github.com/AllenDang/PipeIt

PipeIt is a text transformation, conversion, cleansing and extraction tool.

text-mining text-processing

Last synced: 05 Jun 2024

https://github.com/sebastiz/EndoMineR

Extraction of various endoscopic and pathological data

cancer-research endoscopy gastroenterology peer-reviewed r r-package rstats semi-structured-data text-mining

Last synced: 04 Jun 2024

https://github.com/amcrisan/Adjutant

Runs a PubMed query, returns the results and lets the user explore the high-level structure of the returned documents

pubmed r shiny summaries text-mining

Last synced: 04 Jun 2024

https://github.com/konlpy/konlpy

Python package for Korean natural language processing.

hacktoberfest korean korean-nlp morphology nlp python text-mining

Last synced: 27 May 2024

https://github.com/gsh199449/spider

A configurable web spider with an easy-to-use web console

cralwer gatherplatform spider text-mining web-console

Last synced: 26 May 2024

https://github.com/giocomai/castarter

Content Analysis Starter Toolkit for the R programming language

rstats tada text-mining

Last synced: 20 May 2024

https://github.com/trinker/qdap

Quantitative Discourse Analysis Package: Bridging the gap between qualitative data and quantitative analysis

qdap quantitative-discourse-analysis text-analysis text-mining text-plotting

Last synced: 20 May 2024

https://github.com/trinker/readability

Fast readability scores for text data

r readability text-mining

Last synced: 20 May 2024

https://github.com/ropensci-archive/microdemic

⚠️ ARCHIVED ⚠️ Microsoft Academic client

api-client r r-package rstats scholarly-articles scholarly-metadata text-mining

Last synced: 20 May 2024

https://github.com/trinker/textreadr

Tools to uniformly read in text data including semi-structured transcripts

doc docx pdf-reading r read-transcripts text-data text-mining

Last synced: 20 May 2024

https://github.com/trinker/lexicon

A data package containing lexicons and dictionaries for text analysis

hash lexicon lookup names-frequent r stopwords text-dictionaries text-mining

Last synced: 20 May 2024

https://github.com/bnosac/pattern.nlp

R package to perform sentiment analysis and part-of-speech tagging for Dutch/French/English/German/Spanish/Italian

nlp pattern pos-tagging r sentiment-analysis text-mining

Last synced: 20 May 2024

https://github.com/mkearney/textfeatures

👷‍♂️ A simple package for extracting useful features from character objects 👷‍♀️

feature-extraction machine-learning mkearney-r-package neural-network neural-networks r rstats text-mining word2vec

Last synced: 20 May 2024

https://github.com/trinker/textstem

Tools for fast text stemming & lemmatization

lemmatization r stemming text-mining

Last synced: 20 May 2024

https://github.com/systats/textlearnR

A simple collection of well working NLP models (Keras, H2O, StarSpace) tuned and benchmarked on a variety of datasets.

classification hyperparameter-optimization keras nlp r text-mining

Last synced: 20 May 2024

https://github.com/ropensci/jstor

Import journal data from DfR (JSTOR)

jstor peer-reviewed r r-package rstats text-analysis text-mining

Last synced: 20 May 2024

https://github.com/ropensci/tokenizers

Fast, Consistent Tokenization of Natural Language Text

nlp peer-reviewed r r-package rstats text-mining tokenizer

Last synced: 20 May 2024

https://github.com/trinker/gofastr

Make a DocumentTermMatrix faster

data-reshaping document-term-matrix manipulation r text-mining

Last synced: 20 May 2024

https://github.com/contefranz/OpTop

Optimal topic identification from a pool of Latent Dirichlet Allocation models

latent-dirichlet-allocation lda model-selection natural-language-processing nlp text-mining topic-modeling

Last synced: 19 May 2024

https://github.com/trinker/sentimentpy

A Python port of the #rstats sentimentr package

emotion nlp polarity sentiment text-mining

Last synced: 19 May 2024

https://github.com/chiphuyen/lazynlp

Library to scrape and clean web pages to create massive datasets.

artificial-intelligence data-science language-model natural-language-processing nlp open python text-mining

Last synced: 18 May 2024

https://github.com/NicholasMamo/multiplex-plot

Multiplex: visualizations that tell stories—A Python library to create and annotate beautiful network graph visualizations, text visualizations and more.

data-science data-visualisation graph-visualization graphs information-retrieval matplotlib natural-language-processing network-visualization python text-mining text-visualisation text-visualization visualisation visualizations viz vizualisation

Last synced: 16 May 2024

https://github.com/goldblat/klapa

Text patterns clustering in PostgreSQL

clustering patterns postgresql text-mining

Last synced: 13 May 2024

https://github.com/EmilHvitfeldt/R-text-data

List of textual data sources to be used for text mining in R

data-science nlp rstats text-analysis text-analytics-in-r text-mining tidytext

Last synced: 13 May 2024

https://github.com/kavgan/nlp-in-practice

Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.

gensim machine-learning natural-language-processing nlp text-classification text-mining tf-idf word2vec

Last synced: 13 May 2024

https://github.com/raminrahimzada/az-corpus-nlp

Dataset materials, NLP for the Azerbaijani language

azerbaijan dataset machine-learning text-analysis text-mining text-processing

Last synced: 13 May 2024

https://github.com/cortega26/PDF-Text-Analizer

This repository houses a script that can download PDFs from a specified URL, convert them to text, and perform text analysis. This analysis includes identifying the language, eliminating stopwords, and counting word and phrase frequency. It's worth noting that the script is capable of analyzing texts in multiple languages.

nlp ocr pdf pdf-converter text-analysis text-mining text-summarization

Last synced: 10 May 2024

https://github.com/Yingjie4Science/SDGdetector

A novel R package that can identify and visualize the 17 Sustainable Development Goals and their 169 associated Targets in text

cran r r-package sdg sdgs sustainability sustainable-development-goals text-mining

Last synced: 09 May 2024

https://github.com/keon/awesome-nlp

📖 A curated list of resources dedicated to Natural Language Processing (NLP)

awesome awesome-list deep-learning language machine-learning natural-language-processing nlp text-mining

Last synced: 08 May 2024

https://github.com/koshort/koshort

(deprecated) 🐱 koshort is a Python package for Korean internet spoken language crawling and processing... or maybe Korean domestic cat.

crawler korean nlp python streaming text-mining

Last synced: 08 May 2024

https://github.com/caufieldjh/awesome-bioie

🧫 A curated list of resources relevant to doing Biomedical Information Extraction (including BioNLP)

awesome awesome-list bioinformatics biomedical biomedical-data biomedical-language information-extraction medical-informatics natural-language-processing nlp text-mining

Last synced: 05 May 2024

https://github.com/ujjwalkarn/DataScienceR

A curated list of R tutorials for Data Science, NLP and Machine Learning

data-science datascience r text-mining

Last synced: 02 May 2024

https://github.com/cpsievert/LDAvis

R package for web-based interactive topic model visualization.

javascript r text-mining topic-modeling visualization

Last synced: 02 May 2024

https://github.com/dselivanov/text2vec

Fast vectorization, topic modeling, distances and GloVe word embeddings in R.

glove latent-dirichlet-allocation natural-language-processing text-mining topic-modeling vectorization word-embeddings word2vec

Last synced: 02 May 2024

https://github.com/bnosac/ruimtehol

R package to Embed All the Things! using StarSpace

classification embeddings natural-language-processing nlp r similarity starspace text-mining

Last synced: 01 May 2024

https://github.com/nalimilan/R.TeMiS

R.TeMiS: R Text Mining Solution

r text-mining

Last synced: 01 May 2024

https://github.com/pjhampton/woolly

The Text Mining Elixir

text-analysis text-mining

Last synced: 01 May 2024

https://github.com/blueprints-for-text-analytics-python/blueprints-text

Jupyter notebooks for our O'Reilly book "Blueprints for Text Analysis Using Python"

machine-learning natural-language-processing python text-mining

Last synced: 29 Apr 2024

https://github.com/deanmalmgren/textract

extract text from any document. no muss. no fuss.

data-mining natural-language-processing python text-mining

Last synced: 28 Apr 2024

https://github.com/giacbrd/ShallowLearn

An experiment about re-implementing supervised learning models based on shallow neural network approaches (e.g. fastText) with some additional exclusive features and nice API. Written in Python and fully compatible with Scikit-learn.

fasttext gensim machine-learning neural-network online-learning scikit-learn shallow-learning supervised-learning text-classification text-mining word-embeddings word2vec

Last synced: 27 Apr 2024

https://github.com/jalajthanaki/NLPython

This repository contains code for natural language processing in Python. All of the code accompanies my book "Python Natural Language Processing".

deep-learning feature-engineering feature-extraction feature-selection natural-language-processing parsing part-of-speech python-scripting-language python2 text-mining

Last synced: 17 Apr 2024

https://github.com/giovanni-cutri/lyrics-text-mining

Text mining techniques applied to the lyrics of some popular songs.

lyrics quanteda r sentiment-analysis text-mining udpipe

Last synced: 16 Apr 2024

https://github.com/jakelever/pubrunner

A framework for keeping biomedical text mining results up to date

bionlp infrastructure pubmed pubmed-central python snakemake text-mining

Last synced: 15 Apr 2024

https://github.com/currentslab/extractnet

A fork of Dragnet that also extracts author, headline, date and keywords from context, with built-in metadata extraction, all in one package

author-extraction content-extraction date-extraction machine-learning news news-articles news-extraction news-extractor python text-cleaning text-mining web-scraping webscraping

Last synced: 12 Apr 2024

https://github.com/lasigeBioTM/BiONT

BiOnt: Deep Learning using Multiple Biomedical Ontologies for Relation Extraction

biomedical-text-mining deep-learning ontologies relation-extraction text-mining

Last synced: 08 Apr 2024

https://github.com/0x0be/scrapeadvisor

A user-friendly Python-based GUI that provides sentiment analysis of user reviews of a specific TripAdvisor facility

data-mining data-science python3 r scraping sentiment-analysis sentiment-classification text-mining tripadvisor tripadvisor-scraper web-scraping

Last synced: 07 Apr 2024

https://github.com/dgrtwo/tidy-text-mining

Manuscript of the book "Tidy Text Mining with R" by Julia Silge and David Robinson

book bookdown r text-mining tidyverse

Last synced: 31 Mar 2024

https://github.com/SentometricsResearch/sentometrics

An integrated framework in R for textual sentiment time series aggregation and prediction

nlp prediction sentiment-analysis text-mining time-series

Last synced: 31 Mar 2024

https://github.com/neomatrix369/nlp_profiler

A simple NLP library that allows profiling datasets with one or more text columns. When given a dataset and a column name containing text data, NLP Profiler will return either high-level insights or low-level/granular statistical information about the text in that column.

google-colab grammar-checks hacktoberfest jupyter kaggle-kernels natural-language-processing nlp nlp-keywords-extraction nlp-library nlp-machine-learning nlp-parsing nlp-profiler profiler profiling profiling-datasets text-mining

Last synced: 27 Mar 2024

https://github.com/adbar/German-NLP

Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German

computational-linguistics corpus-linguistics german-language natural-language-processing nlp text-mining

Last synced: 27 Mar 2024

https://github.com/bookieio/breadability

Reworked https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now the living alternative)

html-extraction html-extractor html-parsing python text-extraction text-mining

Last synced: 27 Mar 2024

https://github.com/wrathematics/ngram

Fast n-Gram Tokenization

ngram r text text-mining

Last synced: 26 Mar 2024

https://github.com/ropensci-archive/rplos

⚠️ ARCHIVED ⚠️ R client for the PLoS Journals API

metadata pdf plos r r-package rstats text-mining web-api xml

Last synced: 24 Mar 2024

https://github.com/juliasilge/tidytext

Text mining using tidy tools ✨📄✨

natural-language-processing r text-mining tidy-data tidyverse

Last synced: 21 Mar 2024