Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Natural language processing

Natural language processing (NLP) is a field of computer science that studies how computers interact with human language. In 1950, Alan Turing published an article that proposed a measure of machine intelligence, now called the Turing test. More modern techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.

https://github.com/stanford-oval/wikichat

WikiChat stops the hallucination of large language models by retrieving data from Wikipedia.

chatbot emnlp2023 factuality language-model natural-language-processing nlp

Last synced: 13 Nov 2024

https://github.com/thunlp/thuocl

THUOCL (THU Open Chinese Lexicon): a Chinese lexicon

chinese nlp

Last synced: 10 Nov 2024

https://github.com/lonePatient/Bert-Multi-Label-Text-Classification

This repo contains a PyTorch implementation of a pretrained BERT model for multi-label text classification.

albert bert fine-tuning multi-label-classification nlp pytorch pytorch-implmention text-classification transformers xlnet

Last synced: 04 Aug 2024

https://github.com/tensorlayer/seq2seq-chatbot

Chatbot in 200 lines of code using TensorLayer

bot chat chatbot corpus lstm nlp python rnn tensorflow tensorlayer

Last synced: 17 Nov 2024

https://github.com/whylabs/langkit

🔍 LangKit: An open-source toolkit for monitoring Large Language Models (LLMs). 📚 Extracts signals from prompts & responses, ensuring safety & security. 🛡️ Features include text quality, relevance metrics, & sentiment analysis. 📊 A comprehensive tool for LLM observability. 👀

large-language-models machine-learning nlg nlp observability prompt-engineering prompt-injection

Last synced: 31 Oct 2024

https://github.com/bilibili/Index-1.9B

A SOTA lightweight multilingual LLM

llm nlp

Last synced: 07 Nov 2024

https://github.com/carpedm20/memn2n-tensorflow

"End-To-End Memory Networks" in Tensorflow

memory-network nlp tensorflow

Last synced: 14 Nov 2024

https://github.com/kengz/aiva

AIVA (A.I. Virtual Assistant): General-purpose virtual assistant for developers.

bot fb nlp nodejs slack telegram

Last synced: 13 Nov 2024

https://github.com/goru001/inltk

Natural Language Toolkit for Indic Languages aims to provide out-of-the-box support for various NLP tasks that an application developer might need.

data-augmentation deep-learning indic-languages nlp pytorch sentence-embeddings sentence-encoding sentence-similarity word-embeddings

Last synced: 18 Nov 2024

https://github.com/cltk/cltk

The Classical Language Toolkit

ai greek historical-linguistics latin ling nlp nltk python spacy stanza

Last synced: 14 Oct 2024

https://github.com/pemistahl/lingua-rs

The most accurate natural language detection library for Rust, suitable for short text and mixed-language text

language-classification language-detection language-identification language-processing language-recognition natural-language-processing nlp nlp-machine-learning rust rust-crate rust-library

Last synced: 14 Oct 2024

https://github.com/Separius/BERT-keras

Keras implementation of BERT with pre-trained weights

keras language-modeling nlp pretrained-models tensorflow theano transfer-learning transformer

Last synced: 02 Nov 2024

https://github.com/bin123apple/autocoder

We introduce a new model designed for the code generation task. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April 2024) and GPT-4o.

code-generation code-interpreter humaneval llm nlp nlp-machine-learning text-generation

Last synced: 09 Nov 2024

https://github.com/patrickschur/language-detection

A language detection library for PHP. Detects the language from a given text string.

language language-detection n-grams natural-language-processing nlp php training

Last synced: 25 Oct 2024

https://github.com/denis2054/transformers-for-nlp-2nd-edition

Transformer models from BERT to GPT-4, environments from Hugging Face to OpenAI. Fine-tuning, training, and prompt engineering examples. A bonus section with ChatGPT, GPT-3.5-turbo, GPT-4, and DALL-E, including jump-starting GPT-4, speech-to-text, text-to-speech, text-to-image generation with DALL-E, Google Cloud AI, HuggingGPT, and more.

bert chatgpt chatgpt-api dall-e dall-e-api deep-learning gpt-3-5-turbo gpt-4 gpt-4-api huggingface-transformers machine-learning natural-language-processing nlp openai python pytorch roberta-model transformers trax

Last synced: 09 Nov 2024

https://github.com/princeton-nlp/pure

[NAACL 2021] A Frustratingly Easy Approach for Entity and Relation Extraction https://arxiv.org/abs/2010.12812

nlp relation-extraction

Last synced: 11 Nov 2024

https://github.com/ethz-adrl/ifopt

An Eigen-based, light-weight C++ Interface to Nonlinear Programming Solvers (Ipopt, Snopt)

catkin cmake cpp eigen ipopt mathematical-programming nlp nonlinear-optimization optimization robotics ros snopt trajectory-optimization

Last synced: 17 Nov 2024

https://github.com/foochane/books

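A curated collection of books covering C & C++, Git, Java, Keras, Linux, NLP, Python, Scala, TensorFlow, big data, recommender systems, databases, data mining, machine learning, deep learning, algorithms, and more.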

big-data c cpp database datamining dl git java keras ml nlp python scala tensorflow

Last synced: 01 Nov 2024

https://github.com/junruxiong/IncarnaMind

Connect and chat with multiple documents (PDF and TXT) through GPT-3.5, GPT-4 Turbo, Claude, and local open-source LLMs

ai chatbot generative-ai gpt langchain llm nlp openai pdf

Last synced: 06 Nov 2024

https://github.com/xiaomi/minlp

XiaoMi Natural Language Processing Toolkits

nlp python3 tensorflow

Last synced: 13 Oct 2024

https://github.com/naver/splade

SPLADE: sparse neural search (SIGIR21, SIGIR22)

bert information-retrieval nlp passage-retrieval sparse splade

Last synced: 15 Nov 2024

https://github.com/yvann-ba/robby-chatbot

AI chatbot 🤖 for chatting with CSV, PDF, and TXT files 📄 and YouTube videos 🎥 | using Langchain🦜 | OpenAI | Streamlit ⚡

ai chatbot gpt-4 langchain nlp openai streamlit

Last synced: 10 Oct 2024

https://github.com/cocacola-lab/chatie

The online version is temporarily unavailable because we cannot afford the API key. You can clone the repository and run it locally. Note: a default OpenAI key is set. If the key exceeds its plan and becomes invalid, please tell us. Response speed depends on OpenAI (the official service is sometimes crowded and slow).

ai chatgpt chatgpt-app event-extraciton event-extraction eventexecutor information-extraction knowledge-graph llm ner nlp openai relation-extraction tool zero-shot

Last synced: 09 Nov 2024

https://github.com/openvenues/pypostal

Python bindings to libpostal for fast international address parsing/normalization

address address-parser binding international nlp

Last synced: 16 Nov 2024

https://github.com/PaddlePaddle/RocketQA

🚀 RocketQA, dense retrieval for information retrieval and question answering, including both Chinese and English state-of-the-art models.

dense-retrieval information-retrieval nlp question-answering

Last synced: 11 Nov 2024

https://github.com/keras-team/keras-hub

Modular Natural Language Processing workflows with Keras

deep-learning keras machine-learning nlp tensorflow

Last synced: 07 Oct 2024

https://github.com/shibing624/similarities

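Similarities: a toolkit for similarity calculation and semantic search. Supports text-to-text, text-to-image, and image-to-image search over datasets of hundreds of millions of records; developed in Python 3 and usable out of the box.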

bm25 deep-learning faiss image-search image-similarity matching nlp pytorch search-engine similarity similarity-search text-matching

Last synced: 22 Oct 2024

https://github.com/dbpedia-spotlight/dbpedia-spotlight

DBpedia Spotlight is a tool for automatically annotating mentions of DBpedia resources in text.

content-tagging dbpedia-spotlight entity-extraction entity-linking nlp rdfa-annotation semantic-web text-annotation

Last synced: 17 Nov 2024

https://github.com/tencent/patrickstar

PatrickStar enables Larger, Faster, Greener Pretrained Models for NLP and democratizes AI for everyone.

bert gpt nlp pretrained-models pytorch

Last synced: 13 Nov 2024

https://github.com/alvations/pywsd

Python Implementations of Word Sense Disambiguation (WSD) Technologies.

lesk nlp python wordnet wsd

Last synced: 31 Oct 2024

https://github.com/soskek/bookcorpus

Crawl BookCorpus

bookcorpus corpus crawler nlp scraper

Last synced: 16 Nov 2024

https://github.com/openeventdata/mordecai

Full text geoparsing as a Python library

geocoding geonames geoparsing nlp spacy toponym-resolution

Last synced: 14 Oct 2024

https://github.com/web-arena-x/webarena

Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"

agent nlp

Last synced: 16 Nov 2024

https://github.com/microsoft/Tutel

Tutel MoE: An Optimized Mixture-of-Experts Implementation

mixture-of-experts moe nlp pytorch transformer

Last synced: 01 Nov 2024

https://github.com/huggingface/naacl_transfer_learning_tutorial

Repository of code for the tutorial on Transfer Learning in NLP held at NAACL 2019 in Minneapolis, MN, USA

naacl nlp transfer-learning tutorial

Last synced: 04 Nov 2024

https://github.com/salesforce/xgen

Salesforce open-source LLMs with 8k sequence length.

language-model large-language-models llm nlp

Last synced: 15 Nov 2024

https://github.com/lonepatient/albert_pytorch

A Lite BERT for Self-Supervised Learning of Language Representations

albert bert language-model mask ngram nlp pytorch

Last synced: 13 Nov 2024

https://github.com/curiosity-ai/catalyst

🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the box support for training word and document embeddings, and flexible entity recognition models.

ai artificial-intelligence csharp embeddings machine-learning natural-language-processing natural-language-understanding nlp

Last synced: 28 Oct 2024

https://github.com/mila-iqia/babyai

BabyAI platform. A testbed for training agents to understand and execute language commands.

imitation-learning nlp nlp-machine-learning openai-gym reinforcement-learning-environments

Last synced: 13 Nov 2024

https://github.com/ibrahimjelliti/Deeplearning.ai-Natural-Language-Processing-Specialization

This repository contains my full work and notes on Coursera's Natural Language Processing (NLP) Specialization, taught by instructors Younes Bensouda Mourri and Łukasz Kaiser and offered by deeplearning.ai

attention-mechanism coursera deep-learning deeplearning-ai encoder-decoder logistic-regression machine-learning naive-bayes neural neural-networks nlp probabilistic-models sequence-models specialization

Last synced: 07 Aug 2024

https://github.com/scofield7419/sequence-labeling-bilstm-crf

The BiLSTM-CRF model implementation in Tensorflow, for sequence labeling tasks.

bilstm-crf ner nlp python35 sequence-labeling tensorflow

Last synced: 11 Nov 2024

https://github.com/thunlp/openattack

An Open-Source Package for Textual Adversarial Attack.

adversarial-attacks adversarial-example natural-language-processing nlp pytorch

Last synced: 17 Nov 2024

https://github.com/messense/jieba-rs

The Jieba Chinese Word Segmentation Implemented in Rust

chinese-word-segmentation jieba jieba-chinese nlp wasm

Last synced: 16 Oct 2024

https://github.com/inspirehep/magpie

Deep neural network framework for multi-label text classification

classification deep-learning machine-learning multi-label-classification neural-network nlp prediction word2vec

Last synced: 15 Nov 2024

https://github.com/google-research/long-range-arena

Long Range Arena for Benchmarking Efficient Transformers

attention deep-learning flax jax nlp transformers

Last synced: 17 Nov 2024

https://github.com/decalogue/chat

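A chatbot based on natural language understanding and machine learning, supporting concurrent multi-user access and customizable multi-turn dialogue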

algorithm chat chatbot context database graph kb machine-learning natural-language-processing natural-language-understanding neo4j nlp nlu python python3 qa question-answering sentence-similarity

Last synced: 10 Oct 2024

https://github.com/koursaros-ai/nboost

NBoost is a scalable, search-api-boosting platform for deploying transformer models to improve the relevance of search results on different platforms (e.g., Elasticsearch)

cloud deep-learning docker elasticsearch helm kubernetes machine-learning microservices nboost nlp proxy python pytorch search-api search-engine semantic-search tensorflow

Last synced: 01 Nov 2024

https://github.com/mayabot/mynlp

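A production-grade, high-performance, modular, and extensible Chinese NLP toolkit. (Chinese word segmentation, averaged perceptron, fastText, pinyin, new word discovery, segmentation error correction, BM25, person name recognition, named entities, custom dictionaries)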

fasttext nlp pinyin segment starspace

Last synced: 13 Nov 2024

https://github.com/polyrabbit/wecron

✔️ Scheduled reminders on WeChat - Cron on WeChat

angular cron crontab ionic nlp postgresql python timer wechat weixin

Last synced: 13 Oct 2024

https://github.com/tomaarsen/attention_sinks

Extend existing LLMs way beyond the original training length with constant memory usage, without retraining

llm llms nlp python transformers

Last synced: 19 Oct 2024

https://github.com/cbaziotis/ekphrasis

Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, the latter comprising 330 million English tweets).

nlp nlp-library semeval spell-corrector spelling-correction text-processing text-segmentation tokenization tokenizer word-normalization word-segmentation

Last synced: 06 Nov 2024

https://github.com/dgarnitz/vectorflow

VectorFlow is a high volume vector embedding pipeline that ingests raw data, transforms it into vectors and writes it to a vector DB of your choice.

ai data-engineering embeddings machine-learning nlp vectors

Last synced: 03 Sep 2024

https://github.com/michaelthwan/searchGPT

Grounded search engine (i.e. with source reference) based on LLM / ChatGPT / OpenAI API. It supports web search, file content search etc.

ai chatgpt grounded-api grounded-bot language-model llm machine-learning nlp nlp-machine-learning openai python retrieval retrieval-model

Last synced: 11 Nov 2024

https://github.com/google-research/prompt-tuning

Original implementation of Prompt Tuning from Lester et al., 2021

flax jax language-model machine-learning nlp prompt-tuning

Last synced: 17 Nov 2024

https://github.com/ymcui/MacBERT

Revisiting Pre-trained Models for Chinese Natural Language Processing (MacBERT)

bert language-model macbert nlp pytorch tensorflow transformers

Last synced: 16 Nov 2024

https://github.com/hit-scir/chinese-mixtral-8x7b

Chinese Mixtral-8x7B (Chinese-Mixtral-8x7B)

large-language-models llm mixtral-8x7b nlp

Last synced: 17 Nov 2024

https://github.com/abadojack/whatlanggo

Natural language detection library for Go

go language nlp text-processing

Last synced: 26 Oct 2024

https://github.com/ICLRandD/Blackstone

⚫ A spaCy pipeline and model for NLP on unstructured legal text.

caselaw law legaltech nlp spacy-models

Last synced: 09 Nov 2024