Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Natural language processing

Natural language processing (NLP) is a field of computer science that studies how computers process and interact with human language. In 1950, Alan Turing published an article that proposed a measure of machine intelligence, now called the Turing test. More modern techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.

https://github.com/linkedin/detext

DeText: A Deep Neural Text Understanding Framework for Ranking and Classification Tasks

classification deep-neural-networks detext-framework nlp ranking text-embeddings

Last synced: 14 Oct 2024

https://github.com/seanlee97/xmnlp

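xmnlp: provides Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, text correction, text-to-pinyin conversion, text summarization, radical lookup, sentence representation, and text similarity computation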

lexical-analysis ner nlp pinyin postagging radical segmentation sentence-embeddings sentence-similarity sentiment-analysis spell-checker

Last synced: 15 Oct 2024

https://github.com/SeanLee97/xmnlp

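xmnlp: provides Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, text correction, text-to-pinyin conversion, text summarization, radical lookup, sentence representation, and text similarity computation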

lexical-analysis ner nlp pinyin postagging radical segmentation sentence-embeddings sentence-similarity sentiment-analysis spell-checker

Last synced: 30 Oct 2024

https://github.com/amaiya/ktrain

ktrain is a Python library that makes deep learning and AI more accessible and easier to apply

computer-vision deep-learning graph-neural-networks keras machine-learning nlp python tabular-data tensorflow

Last synced: 15 Oct 2024

https://github.com/aurelio-labs/semantic-router

Superfast AI decision making and intelligent processing of multi-modal data.

ai artificial-intelligence chatbot computer-vision generative-ai machine-learning nlp

Last synced: 15 Oct 2024

https://github.com/Hyperparticle/one-pixel-attack-keras

Keras implementation of "One pixel attack for fooling deep neural networks" using differential evolution on Cifar10 and ImageNet

cifar10 cnn deep-learning image-processing imagenet keras machine-learning neural-network nlp tensorflow

Last synced: 04 Nov 2024

https://github.com/hyperparticle/one-pixel-attack-keras

Keras implementation of "One pixel attack for fooling deep neural networks" using differential evolution on Cifar10 and ImageNet

cifar10 cnn deep-learning image-processing imagenet keras machine-learning neural-network nlp tensorflow

Last synced: 15 Oct 2024

https://github.com/natasha/natasha

Solves basic Russian NLP tasks, API for lower level Natasha projects

embeddings morphology ner nlp python russian sentence-segmentation syntax tokenizer visualization

Last synced: 14 Oct 2024

https://github.com/huggingface/hmtl

🌊HMTL: Hierarchical Multi-Task Learning - A State-of-the-Art neural network model for several NLP tasks based on PyTorch and AllenNLP

multi-task-learning natural-language-processing nlp pytorch

Last synced: 29 Oct 2024

https://github.com/dengbocong/nlp-paper

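Papers in the field of natural language processing (with reading notes), model reproductions, data processing, and more (code available in both TensorFlow and PyTorch versions)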

bert dialogue nlp nlp-machine-learning paper pytorch speech tensorflow2

Last synced: 14 Oct 2024

https://github.com/bheinzerling/bpemb

Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)

embeddings multilingual natural-language-processing nlp subword-embeddings

Last synced: 14 Oct 2024

https://github.com/MilaNLProc/contextualized-topic-models

A python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021.

bert embeddings multilingual-models multilingual-topic-models neural-topic-models nlp nlp-library nlp-machine-learning text-as-data topic-coherence topic-modeling transformer

Last synced: 04 Nov 2024

https://github.com/DengBoCong/nlp-paper

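Papers in the field of natural language processing (with reading notes), model reproductions, data processing, and more (code available in both TensorFlow and PyTorch versions)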

bert dialogue nlp nlp-machine-learning paper pytorch speech tensorflow2

Last synced: 03 Aug 2024

https://github.com/jncraton/languagemodels

Explore large language models in 512MB of RAM

llm nlp python

Last synced: 15 Oct 2024

https://github.com/yeyzheng/kgqa-based-on-medicine

An intelligent question-answering system based on a medical knowledge graph

kgqa nlp python3 qa rdf

Last synced: 30 Oct 2024

https://github.com/robocorp/rpaframework

Collection of open-source libraries and tools for Robotic Process Automation (RPA), designed to be used with both Robot Framework and Python

ai automation documentai nlp ocr opencv python robocorp robot robotframework rpa rpa-robots

Last synced: 09 Oct 2024

https://github.com/ahmetaa/zemberek-nlp

NLP tools for Turkish.

language morphology nlp turkish zemberek-nlp

Last synced: 14 Oct 2024

https://github.com/kavgan/nlp-in-practice

Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.

gensim machine-learning natural-language-processing nlp text-classification text-mining tf-idf word2vec

Last synced: 30 Oct 2024

https://github.com/chizhu/kgqa_hlm

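A knowledge-graph-based system for visualizing character relationships in "Dream of the Red Chamber" and answering questions about them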

kgqa nlp

Last synced: 14 Oct 2024

https://github.com/uber-research/pplm

Plug and Play Language Model implementation. Allows steering the topic and attributes of text generated by GPT-2 models.

deep-learning language-modeling machine-learning natural-language-generation natural-language-processing nlp

Last synced: 29 Oct 2024

https://github.com/roatienza/Deep-Learning-Experiments

Videos, notes and experiments to understand deep learning

artificial-intelligence deep-learning deep-learning-tutorial nlp pytorch speech vision

Last synced: 30 Oct 2024

https://github.com/pemistahl/lingua-py

The most accurate natural language detection library for Python, suitable for short text and mixed-language text

language-classification language-detection language-identification language-recognition natural-language-processing nlp python-library

Last synced: 09 Oct 2024

https://github.com/uber-research/PPLM

Plug and Play Language Model implementation. Allows steering the topic and attributes of text generated by GPT-2 models.

deep-learning language-modeling machine-learning natural-language-generation natural-language-processing nlp

Last synced: 04 Aug 2024

https://github.com/NVIDIA-Merlin/Transformers4Rec

Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation and works with PyTorch.

bert gtp huggingface language-model nlp pytorch recommender-system recsys seq2seq session-based-recommendation tabular-data transformer xlnet

Last synced: 05 Nov 2024

https://github.com/roatienza/deep-learning-experiments

Videos, notes and experiments to understand deep learning

artificial-intelligence deep-learning deep-learning-tutorial nlp pytorch speech vision

Last synced: 15 Oct 2024

https://github.com/nvidia-merlin/transformers4rec

Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation and works with PyTorch.

bert gtp huggingface language-model nlp pytorch recommender-system recsys seq2seq session-based-recommendation tabular-data transformer xlnet

Last synced: 15 Oct 2024

https://github.com/liucongg/gpt2-newstitle

Chinese news title generation project using GPT2, with extremely detailed code comments.

chinese gpt2 news-summarization nlp text-generation torch transformer

Last synced: 30 Oct 2024

https://github.com/sajal2692/data-science-portfolio

Portfolio of data science projects completed by me for academic, self learning, and hobby purposes.

data-science keras machine-learning nlp pandas portfolio python scikit-learn

Last synced: 10 Oct 2024

https://github.com/datumbox/datumbox-framework

Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.

big-data data-science java machine-learning nlp statistics

Last synced: 15 Oct 2024

https://github.com/mihail911/nlp-library

Curated collection of papers for the NLP practitioner 📖👩‍🔬

deep-learning dialogue language-model machine-learning neural-machine-translation neural-network nlp nlp-datasets

Last synced: 04 Nov 2024

https://github.com/liucongg/GPT2-NewsTitle

Chinese news title generation project using GPT2, with extremely detailed code comments.

chinese gpt2 news-summarization nlp text-generation torch transformer

Last synced: 02 Aug 2024

https://github.com/makcedward/nlp

📝 This repository records my NLP journey.

ai data-science deep-learning machine-learning nlp

Last synced: 29 Oct 2024

https://github.com/tatsu-lab/alpaca_eval

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.

deep-learning evaluation foundation-models instruction-following large-language-models leaderboard nlp rlhf

Last synced: 15 Oct 2024

https://tatsu-lab.github.io/alpaca_eval/

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.

deep-learning evaluation foundation-models instruction-following large-language-models leaderboard nlp rlhf

Last synced: 28 Oct 2024

https://github.com/chenyuntc/PyTorchText

1st place solution for the Zhihu Machine Learning Challenge (Zhihu Kanshan Cup). Implementations of various text-classification models.

fasttext lstm nlp pytorch textcnn textrcnn textrnn

Last synced: 31 Oct 2024

https://github.com/chenyuntc/pytorchtext

1st place solution for the Zhihu Machine Learning Challenge (Zhihu Kanshan Cup). Implementations of various text-classification models.

fasttext lstm nlp pytorch textcnn textrcnn textrnn

Last synced: 30 Oct 2024

https://github.com/kakaobrain/kogpt

KakaoBrain KoGPT (Korean Generative Pre-trained Transformer)

deeplearning generative-model gpt gpt3 huggingface kakaobrain kogpt korean nlp transformers

Last synced: 04 Nov 2024

https://github.com/localminimum/QANet

A Tensorflow implementation of QANet for machine reading comprehension

cnn machine-comprehension nlp squad tensorflow

Last synced: 06 Nov 2024

https://github.com/uber-archive/plato-research-dialogue-system

This is the Plato Research Dialogue System, a flexible platform for developing conversational AI agents.

conversational-agent conversational-ai conversational-ui deep-learning dialogue-systems machine-learning nlp

Last synced: 04 Aug 2024

https://github.com/allenai/dolma

Data and tools for generating and inspecting OLMo pre-training data.

data-processing large-language-models llm machile-learning nlp

Last synced: 05 Nov 2024

https://github.com/iwangjian/Paper-Reading-ConvAI

📖 Paper reading list in conversational AI (constantly updating 🤗).

conversational-ai dialogue-generation dialogue-systems natural-language-generation nlp paper-list

Last synced: 04 Nov 2024

https://github.com/graykode/gpt-2-pytorch

Simple Text-Generator with OpenAI gpt-2 Pytorch Implementation

gpt-2 gpt2 implementation natural-language-processing nlp pytorch story-telling text-generator

Last synced: 30 Oct 2024

https://github.com/bigscience-workshop/bigscience

Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.

machine-learning models nlp training

Last synced: 04 Nov 2024

https://github.com/greyblake/whatlang-rs

Natural language detection library for Rust. Try demo online: https://whatlang.org/

ai algorithm classifier detect-language language language-recognition nlp rust rustlang text-analysis text-classification text-classifier whatlang

Last synced: 12 Oct 2024

https://github.com/graykode/gpt-2-Pytorch

Simple Text-Generator with OpenAI gpt-2 Pytorch Implementation

gpt-2 gpt2 implementation natural-language-processing nlp pytorch story-telling text-generator

Last synced: 26 Oct 2024

https://github.com/thunlp/OpenDelta

A plug-and-play library for parameter-efficient-tuning (Delta Tuning)

deep-learning nlp nlp-library parameter-efficient-learning pretrained-language-model

Last synced: 03 Aug 2024

https://github.com/google-research-datasets/wit

WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

cc-by-sa-3 machine-learning multilingual multimodal nlp wikipedia

Last synced: 08 Nov 2024

https://github.com/VKCOM/YouTokenToMe

Unsupervised text tokenizer focused on computational efficiency

bpe natural-language-processing nlp tokenization word-segmentation

Last synced: 04 Nov 2024

https://github.com/vkcom/youtokentome

Unsupervised text tokenizer focused on computational efficiency

bpe natural-language-processing nlp tokenization word-segmentation

Last synced: 25 Sep 2024

https://github.com/autoliuweijie/K-BERT

Source code of K-BERT (AAAI2020)

aaai2020 bert k-bert nlp

Last synced: 03 Aug 2024

https://github.com/SCIR-HI/Med-ChatGLM

Repo for Chinese Medical ChatGLM: instruction fine-tuning of ChatGLM on Chinese medical knowledge

chinese medical medqa nlp

Last synced: 27 Oct 2024

https://github.com/wikipedia2vec/wikipedia2vec

A tool for learning vector representations of words and entities from Wikipedia

embeddings natural-language-processing nlp python text-classification wikipedia

Last synced: 06 Nov 2024

https://github.com/lixiang0/web_kg

Crawls Chinese-language Baidu Baike pages, extracts triples, and builds a Chinese knowledge graph

baidu baike knowledge-graph neo4j nlp spider wiki

Last synced: 10 Oct 2024

https://github.com/lionsoul2014/jcseg

Jcseg is a lightweight NLP framework developed in Java. It provides CJK and English segmentation based on the MMSEG algorithm, along with keyword extraction, key sentence extraction, and summary extraction based on the TextRank algorithm. Jcseg has a built-in HTTP server and search modules for Lucene, Solr, Elasticsearch, and OpenSearch.

chinese-nlp chinese-text-segmentation chinese-word-segmentation elasticsearch-analyzer elasticsearch-tokenizer java jcseg jcseg-analyzer keywords-extraction lucene-analyzer lucene-tokenizer mmseg natural-language-processing nlp nlp-keywords-extraction opensearch-analyzer opensearch-tokenizer pos-tagging solr-plugin

Last synced: 09 Nov 2024

https://github.com/cluebenchmark/cluecorpus2020

Large-scale pre-training corpus for Chinese (100G)

albert bert chinese chinese-corpus corpus datasets nlp pretrain roberta

Last synced: 09 Nov 2024

https://github.com/ymcui/chinese-llama-alpaca-3

Chinese Llama-Alpaca large language model project, phase three (Chinese Llama-3 LLMs), developed from Meta Llama 3

alpaca large-language-models llama llama-2 llama-3 llama3 llm nlp

Last synced: 10 Oct 2024

https://github.com/ymcui/Chinese-LLaMA-Alpaca-3

Chinese Llama-Alpaca large language model project, phase three (Chinese Llama-3 LLMs), developed from Meta Llama 3

alpaca large-language-models llama llama-2 llama-3 llama3 llm nlp

Last synced: 03 Aug 2024

https://github.com/CLUEbenchmark/CLUECorpus2020

Large-scale pre-training corpus for Chinese (100G)

albert bert chinese chinese-corpus corpus datasets nlp pretrain roberta

Last synced: 03 Aug 2024

https://github.com/morvanzhou/nlp-tutorials

Simple implementations of NLP models. Tutorials are written in Chinese on my website https://mofanpy.com

attention bert elmo gpt nlp seq2seq transformer tutorial w2v

Last synced: 06 Nov 2024

https://github.com/grammarly/gector

Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite" (BEA-20) and "Text Simplification by Tagging" (BEA-21)

bert grammatical-error-correction natural-language-processing nlp roberta sequence-labeling text-simplification transformers xlnet

Last synced: 03 Aug 2024

https://github.com/anupamchugh/iowncode

A curated collection of iOS, ML, AR resources sprinkled with some UI additions

alamofire arkit computer-vision coreml coremltools ios keras ml-kit natural-language-processing nlp realitykit swift swiftui vision vision-framework

Last synced: 09 Aug 2024

https://github.com/explosion/curated-transformers

🤖 A PyTorch library of curated Transformer models and their composable components

albert bert camembert dolly2 falcon gptneox llama llm llms nlp pytorch roberta transformer transformers xlm-roberta

Last synced: 07 Oct 2024

https://github.com/rodrigopivi/Chatito

🎯🗯 Dataset generation for AI chatbots, NLP tasks, named entity recognition or text classification models using a simple DSL!

chatbot chatbots chatito dataset dataset-generation named-entity-recognition nlg nlp nlu text-classification

Last synced: 31 Oct 2024

https://github.com/stanford-oval/WikiChat

WikiChat stops the hallucination of large language models by retrieving data from Wikipedia.

chatbot emnlp2023 factuality language-model natural-language-processing nlp

Last synced: 05 Nov 2024

https://github.com/stanford-oval/wikichat

WikiChat stops the hallucination of large language models by retrieving data from Wikipedia.

chatbot emnlp2023 factuality language-model natural-language-processing nlp

Last synced: 06 Nov 2024

https://github.com/lonepatient/bert-multi-label-text-classification

This repo contains a PyTorch implementation of a pretrained BERT model for multi-label text classification.

albert bert fine-tuning multi-label-classification nlp pytorch pytorch-implmention text-classification transformers xlnet

Last synced: 06 Nov 2024

https://github.com/lonePatient/Bert-Multi-Label-Text-Classification

This repo contains a PyTorch implementation of a pretrained BERT model for multi-label text classification.

albert bert fine-tuning multi-label-classification nlp pytorch pytorch-implmention text-classification transformers xlnet

Last synced: 04 Aug 2024

https://github.com/tensorlayer/seq2seq-chatbot

Chatbot in 200 lines of code using TensorLayer

bot chat chatbot corpus lstm nlp python rnn tensorflow tensorlayer

Last synced: 10 Nov 2024

https://github.com/whylabs/langkit

🔍 LangKit: An open-source toolkit for monitoring Large Language Models (LLMs). 📚 Extracts signals from prompts & responses, ensuring safety & security. 🛡️ Features include text quality, relevance metrics, & sentiment analysis. 📊 A comprehensive tool for LLM observability. 👀

large-language-models machine-learning nlg nlp observability prompt-engineering prompt-injection

Last synced: 31 Oct 2024