Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Natural language processing

Natural language processing (NLP) is a field of computer science that studies how computers and humans interact through natural language. In 1950, Alan Turing published an article that proposed a measure of intelligence, now called the Turing test. More modern techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.

https://github.com/jncraton/languagemodels

Explore large language models in 512MB of RAM

llm nlp python

Last synced: 01 Aug 2024

https://github.com/ahmetaa/zemberek-nlp

NLP tools for Turkish.

language morphology nlp turkish zemberek-nlp

Last synced: 01 Aug 2024

https://github.com/kavgan/nlp-in-practice

Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.

gensim machine-learning natural-language-processing nlp text-classification text-mining tf-idf word2vec

Last synced: 31 Jul 2024

https://github.com/uber-research/PPLM

Plug and Play Language Model implementation. Allows steering the topic and attributes of text generated by GPT-2 models.

deep-learning language-modeling machine-learning natural-language-generation natural-language-processing nlp

Last synced: 04 Aug 2024

https://github.com/roatienza/Deep-Learning-Experiments

Videos, notes and experiments to understand deep learning

artificial-intelligence deep-learning deep-learning-tutorial nlp pytorch speech vision

Last synced: 31 Jul 2024

https://github.com/robocorp/rpaframework

Collection of open-source libraries and tools for Robotic Process Automation (RPA), designed to be used with both Robot Framework and Python

ai automation documentai nlp ocr opencv python robocorp robot robotframework rpa rpa-robots

Last synced: 01 Aug 2024

https://github.com/mihail911/nlp-library

Curated collection of papers for the NLP practitioner 📖👩‍🔬

deep-learning dialogue language-model machine-learning neural-machine-translation neural-network nlp nlp-datasets

Last synced: 01 Aug 2024

https://github.com/liucongg/GPT2-NewsTitle

Chinese news title generation project using GPT-2, with very detailed code comments.

chinese gpt2 news-summarization nlp text-generation torch transformer

Last synced: 02 Aug 2024

https://github.com/makcedward/nlp

📝 This repository records my NLP journey.

ai data-science deep-learning machine-learning nlp

Last synced: 30 Jul 2024

https://github.com/NVIDIA-Merlin/Transformers4Rec

Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation and works with PyTorch.

bert gtp huggingface language-model nlp pytorch recommender-system recsys seq2seq session-based-recommendation tabular-data transformer xlnet

Last synced: 01 Aug 2024

https://github.com/tatsu-lab/alpaca_eval

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.

deep-learning evaluation foundation-models instruction-following large-language-models leaderboard nlp rlhf

Last synced: 31 Jul 2024

https://tatsu-lab.github.io/alpaca_eval/

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.

deep-learning evaluation foundation-models instruction-following large-language-models leaderboard nlp rlhf

Last synced: 31 Jul 2024

https://github.com/chenyuntc/PyTorchText

1st place solution for the Zhihu Machine Learning Challenge (Zhihu Kanshan Cup). Implementation of various text-classification models.

fasttext lstm nlp pytorch textcnn textrcnn textrnn

Last synced: 31 Jul 2024

https://github.com/kakaobrain/kogpt

KakaoBrain KoGPT (Korean Generative Pre-trained Transformer)

deeplearning generative-model gpt gpt3 huggingface kakaobrain kogpt korean nlp transformers

Last synced: 01 Aug 2024

https://github.com/localminimum/QANet

A Tensorflow implementation of QANet for machine reading comprehension

cnn machine-comprehension nlp squad tensorflow

Last synced: 01 Aug 2024

https://github.com/uber-archive/plato-research-dialogue-system

This is the Plato Research Dialogue System, a flexible platform for developing conversational AI agents.

conversational-agent conversational-ai conversational-ui deep-learning dialogue-systems machine-learning nlp

Last synced: 04 Aug 2024

https://github.com/pemistahl/lingua-py

The most accurate natural language detection library for Python, suitable for short text and mixed-language text

language-classification language-detection language-identification language-recognition natural-language-processing nlp python-library

Last synced: 01 Aug 2024

https://github.com/bigscience-workshop/bigscience

Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.

machine-learning models nlp training

Last synced: 01 Aug 2024

https://github.com/thunlp/OpenDelta

A plug-and-play library for parameter-efficient-tuning (Delta Tuning)

deep-learning nlp nlp-library parameter-efficient-learning pretrained-language-model

Last synced: 03 Aug 2024

https://github.com/greyblake/whatlang-rs

Natural language detection library for Rust. Try demo online: https://whatlang.org/

ai algorithm classifier detect-language language language-recognition nlp rust rustlang text-analysis text-classification text-classifier whatlang

Last synced: 31 Jul 2024

https://github.com/iwangjian/Paper-Reading-ConvAI

📖 Paper reading list in dialogue systems and natural language generation (constantly updating 🤗).

conversational-ai dialogue-generation dialogue-systems natural-language-generation nlp paper-list

Last synced: 01 Aug 2024

https://github.com/vkcom/youtokentome

Unsupervised text tokenizer focused on computational efficiency

bpe natural-language-processing nlp tokenization word-segmentation

Last synced: 02 Aug 2024

https://github.com/autoliuweijie/K-BERT

Source code of K-BERT (AAAI2020)

aaai2020 bert k-bert nlp

Last synced: 03 Aug 2024

https://github.com/graykode/gpt-2-Pytorch

Simple Text-Generator with OpenAI gpt-2 Pytorch Implementation

gpt-2 gpt2 implementation natural-language-processing nlp pytorch story-telling text-generator

Last synced: 30 Jul 2024

https://github.com/wikipedia2vec/wikipedia2vec

A tool for learning vector representations of words and entities from Wikipedia

embeddings natural-language-processing nlp python text-classification wikipedia

Last synced: 01 Aug 2024

https://github.com/lionsoul2014/jcseg

Jcseg is a lightweight NLP framework developed in Java. It provides CJK and English segmentation based on the MMSEG algorithm, along with keyword extraction, key sentence extraction, and summary extraction based on the TextRank algorithm. Jcseg has a built-in HTTP server and search modules for Lucene, Solr, Elasticsearch, and OpenSearch.

chinese-nlp chinese-text-segmentation chinese-word-segmentation elasticsearch-analyzer elasticsearch-tokenizer java jcseg jcseg-analyzer keywords-extraction lucene-analyzer lucene-tokenizer mmseg natural-language-processing nlp nlp-keywords-extraction opensearch-analyzer opensearch-tokenizer pos-tagging solr-plugin

Last synced: 01 Aug 2024

https://github.com/ymcui/Chinese-LLaMA-Alpaca-3

Chinese LLaMA-Alpaca large language model project, phase 3 (Chinese Llama-3 LLMs), developed from Meta Llama 3

alpaca large-language-models llama llama-2 llama-3 llama3 llm nlp

Last synced: 03 Aug 2024

https://github.com/CLUEbenchmark/CLUECorpus2020

Large-scale pre-training corpus for Chinese (100 GB)

albert bert chinese chinese-corpus corpus datasets nlp pretrain roberta

Last synced: 03 Aug 2024

https://github.com/grammarly/gector

Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite" (BEA-20) and "Text Simplification by Tagging" (BEA-21)

bert grammatical-error-correction natural-language-processing nlp roberta sequence-labeling text-simplification transformers xlnet

Last synced: 03 Aug 2024

https://github.com/anupamchugh/iowncode

A curated collection of iOS, ML, AR resources sprinkled with some UI additions

alamofire arkit computer-vision coreml coremltools ios keras ml-kit natural-language-processing nlp realitykit swift swiftui vision vision-framework

Last synced: 09 Aug 2024

https://github.com/rodrigopivi/Chatito

🎯🗯 Dataset generation for AI chatbots, NLP tasks, named entity recognition or text classification models using a simple DSL!

chatbot chatbots chatito dataset dataset-generation named-entity-recognition nlg nlp nlu text-classification

Last synced: 31 Jul 2024

https://github.com/explosion/curated-transformers

🤖 A PyTorch library of curated Transformer models and their composable components

albert bert camembert dolly2 falcon gptneox llama llm llms nlp pytorch roberta transformer transformers xlm-roberta

Last synced: 02 Aug 2024

https://github.com/stanford-oval/WikiChat

WikiChat stops the hallucination of large language models by retrieving data from Wikipedia.

chatbot emnlp2023 factuality language-model natural-language-processing nlp

Last synced: 01 Aug 2024

https://github.com/lonePatient/Bert-Multi-Label-Text-Classification

This repo contains a PyTorch implementation of a pretrained BERT model for multi-label text classification.

albert bert fine-tuning multi-label-classification nlp pytorch pytorch-implmention text-classification transformers xlnet

Last synced: 04 Aug 2024

https://github.com/tensorlayer/seq2seq-chatbot

Chatbot in 200 lines of code using TensorLayer

bot chat chatbot corpus lstm nlp python rnn tensorflow tensorlayer

Last synced: 31 Jul 2024

https://github.com/cltk/cltk

The Classical Language Toolkit

ai greek historical-linguistics latin ling nlp nltk python spacy stanza

Last synced: 01 Aug 2024

https://github.com/pemistahl/lingua-rs

The most accurate natural language detection library for Rust, suitable for short text and mixed-language text

language-classification language-detection language-identification language-processing language-recognition natural-language-processing nlp nlp-machine-learning rust rust-crate rust-library

Last synced: 01 Aug 2024

https://github.com/Separius/BERT-keras

Keras implementation of BERT with pre-trained weights

keras language-modeling nlp pretrained-models tensorflow theano transfer-learning transformer

Last synced: 01 Aug 2024

https://github.com/kengz/aiva

AIVA (A.I. Virtual Assistant): General-purpose virtual assistant for developers.

bot fb nlp nodejs slack telegram

Last synced: 31 Jul 2024

https://github.com/thunlp/THUOCL

THUOCL (THU Open Chinese Lexicon), a Chinese lexicon

chinese nlp

Last synced: 01 Aug 2024

https://github.com/allenai/dolma

Data and tools for generating and inspecting OLMo pre-training data.

data-processing large-language-models llm machile-learning nlp

Last synced: 01 Aug 2024

https://github.com/patrickschur/language-detection

A language detection library for PHP. Detects the language from a given text string.

language language-detection n-grams natural-language-processing nlp php training

Last synced: 30 Jul 2024

https://github.com/princeton-nlp/PURE

[NAACL 2021] A Frustratingly Easy Approach for Entity and Relation Extraction https://arxiv.org/abs/2010.12812

nlp relation-extraction

Last synced: 01 Aug 2024

https://github.com/bin123apple/autocoder

We introduce a new model designed for the code generation task. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April 2024) and GPT-4o.

code-generation code-interpreter humaneval llm nlp nlp-machine-learning text-generation

Last synced: 02 Aug 2024

https://github.com/denis2054/transformers-for-nlp-2nd-edition

Transformer models from BERT to GPT-4, environments from Hugging Face to OpenAI. Fine-tuning, training, and prompt engineering examples. A bonus section covers ChatGPT, GPT-3.5-turbo, GPT-4, and DALL-E, including jump starting GPT-4, speech-to-text, text-to-speech, text-to-image generation with DALL-E, Google Cloud AI, HuggingGPT, and more.

bert chatgpt chatgpt-api dall-e dall-e-api deep-learning gpt-3-5-turbo gpt-4 gpt-4-api huggingface-transformers machine-learning natural-language-processing nlp openai python pytorch roberta-model transformers trax

Last synced: 02 Aug 2024

https://github.com/PaddlePaddle/RocketQA

🚀 RocketQA, dense retrieval for information retrieval and question answering, including both Chinese and English state-of-the-art models.

dense-retrieval information-retrieval nlp question-answering

Last synced: 02 Aug 2024

https://github.com/dbpedia-spotlight/dbpedia-spotlight

DBpedia Spotlight is a tool for automatically annotating mentions of DBpedia resources in text.

content-tagging dbpedia-spotlight entity-extraction entity-linking nlp rdfa-annotation semantic-web text-annotation

Last synced: 03 Aug 2024

https://github.com/openvenues/pypostal

Python bindings to libpostal for fast international address parsing/normalization

address address-parser binding international nlp

Last synced: 31 Jul 2024

https://github.com/yvann-ba/Robby-chatbot

AI chatbot 🤖 for chat with CSV, PDF, TXT files 📄 and YTB videos 🎥 | using Langchain🦜 | OpenAI | Streamlit ⚡

ai chatbot gpt-4 langchain nlp openai streamlit

Last synced: 31 Jul 2024

https://github.com/cocacola-lab/chatie

The online version is temporarily unavailable because we cannot afford the API key. You can clone the repository and run it locally. Note: we set a default OpenAI key. If the key exceeds its plan or becomes invalid, please tell us. Response speed depends on OpenAI (sometimes the official service is crowded and slow).

ai chatgpt chatgpt-app event-extraciton event-extraction eventexecutor information-extraction knowledge-graph llm ner nlp openai relation-extraction tool zero-shot

Last synced: 02 Aug 2024

https://github.com/Tencent/PatrickStar

PatrickStar enables Larger, Faster, Greener Pretrained Models for NLP and democratizes AI for everyone.

bert gpt nlp pretrained-models pytorch

Last synced: 03 Aug 2024

https://github.com/foochane/books

A collection of books covering C & C++, git, Java, Keras, Linux, NLP, Python, Scala, TensorFlow, big data, recommender systems, databases, data mining, machine learning, deep learning, algorithms, and more.

big-data c cpp database datamining dl git java keras ml nlp python scala tensorflow

Last synced: 01 Aug 2024

https://github.com/alvations/pywsd

Python Implementations of Word Sense Disambiguation (WSD) Technologies.

lesk nlp python wordnet wsd

Last synced: 31 Jul 2024

https://github.com/openeventdata/mordecai

Full text geoparsing as a Python library

geocoding geonames geoparsing nlp spacy toponym-resolution

Last synced: 07 Aug 2024

https://github.com/whylabs/langkit

🔍 LangKit: An open-source toolkit for monitoring Large Language Models (LLMs). 📚 Extracts signals from prompts & responses, ensuring safety & security. 🛡️ Features include text quality, relevance metrics, & sentiment analysis. 📊 A comprehensive tool for LLM observability. 👀

large-language-models machine-learning nlg nlp observability prompt-engineering prompt-injection

Last synced: 31 Jul 2024

https://github.com/keras-team/keras-nlp

Modular Natural Language Processing workflows with Keras

deep-learning keras machine-learning nlp tensorflow

Last synced: 01 Aug 2024

https://github.com/huggingface/naacl_transfer_learning_tutorial

Repository of code for the tutorial on Transfer Learning in NLP held at NAACL 2019 in Minneapolis, MN, USA

naacl nlp transfer-learning tutorial

Last synced: 01 Aug 2024

https://github.com/salesforce/xgen

Salesforce open-source LLMs with 8k sequence length.

language-model large-language-models llm nlp

Last synced: 01 Aug 2024

https://github.com/lonePatient/albert_pytorch

A Lite BERT for Self-Supervised Learning of Language Representations

albert bert language-model mask ngram nlp pytorch

Last synced: 01 Aug 2024

https://github.com/ibrahimjelliti/Deeplearning.ai-Natural-Language-Processing-Specialization

This repository contains my full work and notes on Coursera's Natural Language Processing (NLP) Specialization, taught by instructors Younes Bensouda Mourri and Łukasz Kaiser and offered by deeplearning.ai

attention-mechanism coursera deep-learning deeplearning-ai encoder-decoder logistic-regression machine-learning naive-bayes neural neural-networks nlp probabilistic-models sequence-models specialization

Last synced: 07 Aug 2024

https://github.com/curiosity-ai/catalyst

🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the-box support for training word and document embeddings, and flexible entity recognition models.

ai artificial-intelligence csharp embeddings machine-learning natural-language-processing natural-language-understanding nlp

Last synced: 31 Jul 2024

https://github.com/bilibili/Index-1.9B

A SOTA lightweight multilingual LLM

llm nlp

Last synced: 01 Aug 2024

https://github.com/microsoft/tutel

Tutel MoE: An Optimized Mixture-of-Experts Implementation

mixture-of-experts moe nlp pytorch transformer

Last synced: 01 Aug 2024

https://github.com/naver/splade

SPLADE: sparse neural search (SIGIR21, SIGIR22)

bert information-retrieval nlp passage-retrieval sparse splade

Last synced: 02 Aug 2024