Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Natural language processing
Natural language processing (NLP) is a field of computer science that studies how computers and humans interact through human language. In 1950, Alan Turing published an article proposing a measure of intelligence now called the Turing test. More modern techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.
- GitHub: https://github.com/topics/nlp
- Wikipedia: https://en.wikipedia.org/wiki/Natural_language_processing
- Created by: Alan Turing
- Aliases: natural-language-processing, nlp-machine-learning, nlp-resources
- Last updated: 2024-11-11 00:19:59 UTC
https://github.com/lan-ce-lot/pythorch-text-classification
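Text classification and sentiment analysis of Douban movie reviews: reviews are scraped from Douban with a crawler, then cleaned and tokenized, and models such as BERT, CNN, and LSTM are trained, with tensorboardX used to visualize the training process. An NLP project for text classification, based on torch 1.7.1.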
bert cnn douban lstm natural-language-processing nlp qt qt5 qt6 rnn scrapy sentiment-analysis tensorboard tensorboardx text-classification ui
Last synced: 12 Oct 2024
https://github.com/ZhixiuYe/Intra-Bag-and-Inter-Bag-Attentions
Code for NAACL 2019 paper: Distant Supervision Relation Extraction with Intra-Bag and Inter-Bag Attentions
deeplearning distant-supervision nlp pytorch relation-extraction
Last synced: 01 Nov 2024
https://github.com/clovaai/focusseq2seq
[EMNLP 2019] Mixture Content Selection for Diverse Sequence Generation (Question Generation / Abstractive Summarization)
emnlp2019 generation nlp pytorch question-generation summarization
Last synced: 12 Nov 2024
https://github.com/Nipun1212/Claude_api
Claude_api is a Python package that provides a convenient way to interact with Claude 2 from Anthropic.
anthropic anthropic-claude claude claude-ai claude-api nlp
Last synced: 11 Nov 2024
https://github.com/logpai/bughub
A collection of free-text bug reports for duplicate issue identification
bug-reports datasets duplicate-detection nlp
Last synced: 07 Nov 2024
https://github.com/aphp/edsnlp
Modular, fast NLP framework, compatible with PyTorch and spaCy, offering tailored support for French clinical notes.
clinical-data-warehouse deep-learning fast french medical multi-task nlp pytorch rule-based spacy text-mining
Last synced: 14 Oct 2024
https://github.com/McGill-NLP/weblinx
WebLINX is a benchmark for building web navigation agents with conversational capabilities
agent agents computer-vision llm multimodal navigation nlp web
Last synced: 20 Oct 2024
https://github.com/shibing624/nerpy
🌈 NERpy: Implementation of Named Entity Recognition using Python. A named entity recognition toolkit supporting models such as BertSoftmax and BertSpan, usable out of the box.
bert bert-softmax bert-span named-entity-recognition ner nlp pytorch transformers
Last synced: 31 Oct 2024
https://github.com/bayeru/chat-to-your-database
Chat to your database with AI. An experimental app to test the abilities of LLMs to query SQL databases using natural language.
chatgpt chatgpt-app database langchain langchain-typescript llm llms mysql natural-language-processing nlp openai postgres sql sqlite
Last synced: 10 Aug 2024
https://github.com/winkjs/wink-nlp-utils
NLP functions for amplifying negations, managing elisions, creating n-grams, stems, and phonetic codes for tokens, and more.
bag-of-words natural-language-processing ngrams nlp phonetize sentence-boundary-detection stem stop-words tokenize
Last synced: 09 Nov 2024
https://github.com/johnbumgarner/wordhoard
This Python module can be used to obtain antonyms, synonyms, hypernyms, hyponyms, homophones and definitions.
antonyms bag-of-words definitions dictionary homophones hypernyms hyponyms lexicon nlp python python3 synonyms text-analysis textual-analysis wordlists wordnet wordnets wordsearch
Last synced: 04 Aug 2024
https://github.com/MagedSaeed/farasapy
A Python implementation of Farasa toolkit
arabic arabic-nlp diacritization farasa named-entity-recognition nlp postagging python-library python3 python36 stemmers tokenizer
Last synced: 03 Aug 2024
https://github.com/pooya-mohammadi/deep_utils
An open-source toolkit which is full of handy functions, including the most used models and utilities for deep-learning practitioners!
augmentation coco computer-vision cutmix deep-learning face-detection face-recognition machine-learning modelcheckpoint nlp object-detection python pytorch senet tensorflow utils vggface2 yolov5
Last synced: 09 Nov 2024
https://github.com/deep-diver/en-fr-mlt-tensorflow
English-to-French machine translation in TensorFlow
deep-learning english-to-french machine-translation nlp tensorflow
Last synced: 01 Nov 2024
https://github.com/guo-yong-zhi/wordcloud.jl
Word cloud generator in Julia
collision-detection julia layout-algorithm nlp packing-algorithm visualization wordcloud
Last synced: 30 Oct 2024
https://github.com/yohasebe/lemmatizer
Lemmatizer for text in English. Inspired by Python's nltk.corpus.reader.wordnet.morphy
lemmatizer nlp ruby rubynlp wordnet
Last synced: 08 Nov 2024
https://github.com/gagan3012/project-code-py
LeetCode using AI
leetcode ml nlp python python-questions streamlit transformer
Last synced: 27 Oct 2024
https://github.com/DFKI-NLP/TRE
[AKBC 19] Improving Relation Extraction by Pre-trained Language Representations
information-extraction machine-learning multi-task-learning nlp relation-extraction transformer
Last synced: 01 Nov 2024
https://github.com/clipperhouse/jargon
Tokenizers and lemmatizers for Go
data-science go lemmatizer nlp tokenizer
Last synced: 14 Nov 2024
https://github.com/proycon/flat
FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic annotation. Flat allows users to view annotated FoLiA documents and enrich these documents with new annotations. A wide variety of linguistic annotation types is supported through the FoLiA paradigm.
annotation-tool clariah clarin computational-linguistics folia javascript linguistic-annotation-framework linguistics nlp python web-application
Last synced: 31 Oct 2024
https://github.com/ahmedbesbes/media-agent
Scrape data from social media and chat with it using Langchain
langchain large-language-models llms nlp nlproc python tweepy
Last synced: 06 Nov 2024
https://github.com/ahmedbesbes/twitter-agent
Scrape data from social media and chat with it using Langchain
langchain large-language-models llms nlp nlproc python tweepy
Last synced: 22 Aug 2024
https://github.com/prrao87/tweet-stance-prediction
Applying NLP transfer learning techniques to predict Tweet stance toward a topic
natural-language-processing nlp openai-gpt python text-classification transfer-learning transformers ulmfit
Last synced: 02 Nov 2024
https://github.com/textlint-rule/sentence-splitter
Split Japanese and English text into sentences.
english japanese javascript nlp segement sentence
Last synced: 04 Aug 2024
https://github.com/kororo/excelcy
Excel Integration with spaCy. Training NER using Excel/XLSX from PDF, DOCX, PPT, PNG or JPG.
entity excel nlp python python3 spacy spacy-extensions spacy-nlp spacy-pipeline training xlsx
Last synced: 14 Oct 2024
https://github.com/martinomensio/spacy-dbpedia-spotlight
A spaCy wrapper for DBpedia Spotlight
dbpedia-spotlight hacktoberfest natural-language-processing nlp spacy
Last synced: 14 Oct 2024
https://github.com/rubenszimbres/repo-2016
R, Python and Mathematica Codes in Machine Learning, Deep Learning, Artificial Intelligence, NLP and Geolocation
autoencoder deep-learning face-recognition keras lasagne lstm lstm-neural-networks mathematica natural-language-processing nlp nlp-machine-learning python python-3 python3 rstats theano theano-models time-series-analysis timeseries word2vec
Last synced: 07 Nov 2024
https://github.com/martinomensio/spacy-sentence-bert
Sentence Transformers models for spaCy
bert models nlp sentence-bert sentence-transformers spacy
Last synced: 14 Oct 2024
https://github.com/orthagonal/langchainex
Language Chain Library for Elixir
Last synced: 01 Nov 2024
https://github.com/SunLemuria/OpenGPTAndBeyond
Open efforts to implement ChatGPT-like models and beyond.
alpaca chatbot chatglm chatgpt large-language-models llm nlp openai opensource
Last synced: 06 Nov 2024
https://mcgill-nlp.github.io/weblinx/
WebLINX is a benchmark for building web navigation agents with conversational capabilities
agent agents computer-vision llm multimodal navigation nlp web
Last synced: 03 Aug 2024
https://github.com/hongzhaohua/jstarcraft-nlp
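Focuses on several core problems in natural language processing: lexical analysis, syntactic analysis, semantic analysis, language detection, information extraction, text clustering, and text classification. Provides complete general-purpose designs and reference implementations for researchers and developers in related fields. Covers a variety of NLP algorithms and is adapted to multiple NLP frameworks. Compatible with Lucene/Solr/ElasticSearch plugins.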
ansj corenlp elasticsearch hanlp ik java jcseg jieba language-detection lucene mmseg mynlp nlp solr thulac word
Last synced: 08 Nov 2024
https://github.com/adamlui/chatgpt-widescreen
🖥️ Adds Widescreen + Fullscreen modes to ChatGPT for enhanced viewing
ai artificial-intelligence chat chatbot chatgpt chatgpt3 chrome-extension gpt gpt-3 gpt-4 greasemonkey javascript machine-learning nlp openai ui userscripts ux widescreen
Last synced: 12 Oct 2024
https://github.com/jmisilo/clip-gpt-captioning
CLIPxGPT Captioner is an image captioning model based on OpenAI's CLIP and GPT-2.
computer-vision cv deep-learning image-caption image-caption-generator image-captioning machine-learning nlp python pytorch
Last synced: 04 Nov 2024
https://github.com/SergeyShk/ruTS
A library for extracting statistics from Russian-language texts.
computational-linguistics natural-language-processing nlp russian-specific text-analytics
Last synced: 07 Aug 2024
https://github.com/davidberenstein1957/crosslingual-coreference
A multi-lingual approach to AllenNLP CoReference Resolution along with a wrapper for spaCy.
coreference coreference-resolution hacktoberfest natural-language-processing nlp python spacy
Last synced: 01 Nov 2024
https://github.com/awslabs/speech-representations
Code for DeCoAR (ICASSP 2020) and BERTphone (Odyssey 2020)
deep-learning nlp speech-recognition
Last synced: 25 Oct 2024
https://github.com/deepset-ai/haystack-demos
Fully working applications that demonstrate how to use Haystack to implement common NLP use cases
nlp python question-answering semantic-search
Last synced: 06 Nov 2024
https://github.com/lettier/lda-topic-modeling
A PureScript, browser-based implementation of LDA topic modeling.
bayesian bulma bulma-css clustering data-science functional-programming gibbs-sampling latent-dirichlet-allocation lda machine-learning machine-learning-algorithms natural-language-processing nlp nlp-machine-learning purescript reactive reactive-programming text-mining thermite topic-modeling
Last synced: 14 Oct 2024
https://github.com/rerender2021/echo
A simple ASR translator powered by Avernakis React.
asr ave avernakis nlp offline translation
Last synced: 06 Nov 2024
https://github.com/pommedeterresautee/fastrtext
R wrapper for fastText
classification embeddings fasttext machine-learning neural-network nlp rstats text-classification word-embeddings
Last synced: 26 Oct 2024
https://github.com/bnosac/ruimtehol
R package to Embed All the Things! using StarSpace
classification embeddings natural-language-processing nlp r similarity starspace text-mining
Last synced: 11 Nov 2024
https://github.com/lonePatient/BERT-chinese-text-classification-pytorch
This repo contains a PyTorch implementation of a pretrained BERT model for text classification.
bert chinese chinese-text-classification nlp pytorch text-classification
Last synced: 02 Nov 2024
https://github.com/KudoAI/bravegpt
🦁 Brave Search add-on that brings the magic of ChatGPT to search results (powered by GPT-4!)
ai artificial-intelligence brave brave-search chat chatbot chatgpt chatgpt3 gpt gpt-3 gpt-4 greasemonkey javascript machine-learning nlp openai search userscripts web websearch
Last synced: 08 Nov 2024
https://github.com/makaveli10/stockprediction_transformer
Intraday stock prediction 10 minutes into the future
intraday-stock-trading nlp stock-price-prediction transformer
Last synced: 27 Oct 2024
https://github.com/alvinwan/timefhuman
Convert natural language date-like strings--dates, date ranges, and lists of dates--to Python objects
date-parser datetime datetime-inputs nlp python3
Last synced: 26 Oct 2024
https://github.com/d99kris/spacy-cpp
C++ wrapper library for the NLP library spaCy
c-plus-plus linux nlp nlp-libraries spacy
Last synced: 14 Oct 2024
https://github.com/pku-yuangroup/hallucination-attack
An attack that induces hallucinations in LLMs
adversarial-attacks ai-safety deep-learning hallucinations llm llm-safety machine-learning nlp
Last synced: 10 Nov 2024
https://github.com/talschuster/crosslingualcontextualemb
Cross-Lingual Alignment of Contextual Word Embeddings
allennlp bert contextual-embeddings crosslingual elmo nlp pytorch wordembeddings zeroshot-learning
Last synced: 08 Nov 2024
https://github.com/princeton-nlp/LLMBar
[ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following
Last synced: 10 Nov 2024
https://chats-lab.github.io/KokoMind/
KokoMind: Can LLMs Understand Social Interactions?
chatgpt deep-learning gpt-4 language-model neural-network nlp
Last synced: 03 Aug 2024
https://github.com/etherealengine/digital-beings
A platform for letting researchers connect an intelligent AI directly to real time communication networks and 3D worlds. Your AI, Anywhere.
ai artificial-intelligence bot computer-vision cv digital-beings digital-humans machine-learning ml nlp telegram
Last synced: 12 Nov 2024
https://github.com/oxford-cs-deepnlp-2017/practical-open
Oxford Deep NLP 2017 course - Open practical
deep-learning machine-learning natural-language-processing nlp oxford
Last synced: 26 Oct 2024
https://github.com/harunzafer/nuve
Natural Language Processing Library for Turkish in C#
ngram-extraction nlp nuve turkish
Last synced: 12 Nov 2024
https://github.com/xlang-ai/icl-selective-annotation
[ICLR 2023] Code for our paper "Selective Annotation Makes Language Models Better Few-Shot Learners"
active-learning in-context-learning language-model natural-language-processing nlp sample-selection
Last synced: 13 Nov 2024
https://github.com/dengbocong/nlp-dialogue
A full-process dialogue system that can be deployed online
bot bots chatbot conversational-ai deep-learning machine-learning natural-language-processing nlp nlu
Last synced: 08 Nov 2024
https://github.com/clovaai/webvicob
Official Implementation of Web-based Visual Corpus Builder (Webvicob), ICDAR 2023
Last synced: 12 Nov 2024
https://github.com/xv44586/toolkit4nlp
Transformers implementations (architectures, task examples, serving, and more)
Last synced: 13 Oct 2024
https://github.com/thunlp/multird
Code and data of the AAAI-20 paper "Multi-channel Reverse Dictionary Model"
Last synced: 10 Nov 2024
https://github.com/salvatorera/ml-news-of-the-week
A collection of the best ML and AI news every week (research, news, resources)
agents ai artificial-intelligence computer-vision llms machine-learning nlp python rag retrieval-augmented-generation transformer
Last synced: 26 Oct 2024
https://github.com/zlxy9892/ml_code
A repository for recording machine learning code
apriori artificial-intelligence bayesian clustering cv decision-tree deep-learning k-means keras knn logistic-regression machine-learning mnist neural-network nlp pca scikit-learn sklearn svd tensorflow
Last synced: 10 Oct 2024
https://github.com/google-research-datasets/wiki-atomic-edits
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.
deep-learning deep-neural-networks nlp nlp-machine-learning wikipedia
Last synced: 08 Nov 2024
https://github.com/JDongian/python-jamo
Hangul syllable decomposition and synthesis using jamo.
Last synced: 03 Aug 2024
https://github.com/mgechev/ngx-tfjs
🤖 TensorFlow.js bindings for Angular
angular machine-learning nlp tensorflowjs
Last synced: 01 Nov 2024
https://github.com/adhikary97/Sharetape-Open-Source
A script that takes any long-form video or podcast and outputs clips for social media
instagram-reels nlp podcast tiktok video-clipper video-clips youtube
Last synced: 04 Aug 2024
https://github.com/chunml/nlp
This is where I put all my work in Natural Language Processing
natural-language-processing nlp python tensorflow tensorflow-experiments tensorflow-tutorials
Last synced: 12 Nov 2024
https://github.com/MoritzLaurer/GPT-google-sheets
Code and documentation for running generative LLMs like ChatGPT or GPT-4 in Google Sheets without any coding knowledge. Transform unstructured text into structured data.
chatgpt gpt3 gpt4 nlp nlp-machine-learning
Last synced: 03 Aug 2024
https://github.com/tokestermw/spacy_hunspell
✏️ Hunspell extension for spaCy 2.0.
hunspell hunspell-extension nlp spacy spacy-extension spell-check spellchecker spelling spelling-correction
Last synced: 13 Nov 2024
https://github.com/feldberlin/timething
Timething is a library for aligning text transcripts with their audio recordings.
alignment audio cli forced-alignment huggingface nlp python speech speech-recognition tts
Last synced: 27 Oct 2024
https://github.com/ropensci-archive/monkeylearn
⛔ ARCHIVED ⛔ Accesses the MonkeyLearn API for text classifiers and extractors
classifier extractor monkeylearn nlp nlp-machine-learning peer-reviewed r r-package rstats
Last synced: 25 Oct 2024
https://github.com/IlyaGusev/tgcontest
Telegram Data Clustering contest solution by Mindful Squirrel
classification clustering cpp data-science document-similarity fasttext machine-learning nlp
Last synced: 04 Nov 2024
https://github.com/explosion/spacy-experimental
🧪 Cutting-edge experimental spaCy components and features
lemmatizer machine-learning natural-language-processing nlp spacy spacy-extension spacy-pipeline tokenizer
Last synced: 07 Oct 2024
https://github.com/cisnlp/Glot500?tab=readme-ov-file
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages (ACL 2023)
acl dataset glot glot500 multilingual multilingual-models multilingual-nlp natural-language-processing nlp xlm xlm-r
Last synced: 05 Oct 2024
https://github.com/GlobalMaksimum/sadedegel
A General Purpose NLP library for Turkish
acikhack2 ai artificial-intelligence bert binder corpus data-science deep-learning embeddings heroku machine-learning natural-language-processing neural-network neural-networks news-summarizer nlp python
Last synced: 12 Nov 2024
https://github.com/explosion/spacy-lookups-data
📂 Additional lookup tables and data resources for spaCy
lemmatization machine-learning natural-language-processing nlp spacy
Last synced: 07 Oct 2024
https://github.com/cambridgeltl/visual-spatial-reasoning
[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.
computer-vision multimodal-deep-learning nlp vision-and-language
Last synced: 04 Nov 2024
https://github.com/hpcaitech/CachedEmbedding
A memory efficient DLRM training solution using ColossalAI
colossal-ai deep-learning dlrm embeddings nlp pytorch recommandation-system
Last synced: 07 Nov 2024
https://github.com/fdalvi/neurox
A Python library that encapsulates various methods for neuron interpretation and analysis in Deep NLP models.
explainable-ai natural-language-processing neurons nlp nlp-machine-learning
Last synced: 07 Nov 2024
https://github.com/bjascob/pyinflect
A Python module for word inflections designed for use with spaCy.
inflection nlp python spacy spacy-extension
Last synced: 07 Nov 2024
https://github.com/kyubyong/name2nat
name2nat: a Python package for nationality prediction from a name
Last synced: 10 Nov 2024
https://github.com/lonepatient/electra_pytorch
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
bert deeplearning electra glue language-model nlp pytorch
Last synced: 06 Nov 2024
https://github.com/ikegami-yukino/oseti
Dictionary based Sentiment Analysis for Japanese
japanese-language nlp sentiment-analysis sentiment-polarity
Last synced: 26 Oct 2024
https://github.com/epwalsh/nlp-models
NLP research experiments, built on PyTorch within the AllenNLP framework.
allennlp nlp pytorch pytorch-nlp
Last synced: 01 Nov 2024
https://github.com/nlp-uoregon/okapi
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
bloom chatbot dataset instruction-tuning language-model large-language-models llama multilingual natural-language-processing nlp question-answering reinforcement-learning reinforcement-learning-from-human-feedback rlhf
Last synced: 11 Nov 2024
https://github.com/saidziani/arabic-news-article-classification
Automatic document categorization, which consists of assigning a category to a text based on the information it contains. We'll follow different supervised machine learning approaches.
arabic-language arabic-nlp corpora machine-learning nlp nltk python3 text-categorization
Last synced: 28 Oct 2024
https://github.com/lsys/lexicalrichness
😸 💬 A module to compute textual lexical richness (aka lexical diversity).
data-mining data-science information-retrieval lexical-analysis lexical-analyzer linguistic-analysis natural-language natural-language-processing nlp python
Last synced: 02 Nov 2024
https://github.com/qdata/lamp
ECML 2019: Graph Neural Networks for Multi-Label Classification
computer-vision graph-attention-networks graph-neural-networks multi-label-classification nlp transformers
Last synced: 12 Nov 2024