Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Natural language processing

Natural language processing (NLP) is a field of computer science that studies the interaction between computers and human language. In 1950, Alan Turing published an article, "Computing Machinery and Intelligence", that proposed a measure of machine intelligence now called the Turing test. More recent techniques, such as deep learning, have produced state-of-the-art results in language modeling, parsing, and many other natural-language tasks.

https://github.com/GaoQ1/rasa_nlu_gq

Turn natural language into structured data (supports Chinese; defines N custom models for different scenarios and tasks)

bert bilstm-idcnn jieba natural-language nlp nlu rasa rasa-nlu rasa-nlu-gao tensorflow

Last synced: 02 Nov 2024

https://github.com/gentaiscool/code-switching-papers

A curated list of research papers and resources on code-switching

bilingual code-mixed code-mixing code-switch code-switching language nlp papers research speech

Last synced: 31 Dec 2024

https://github.com/hankcs/multi-criteria-cws

Simple Solution for Multi-Criteria Chinese Word Segmentation

bi-lstm-crf cws dynet multi-criteria-cws nlp

Last synced: 03 Jan 2025

https://github.com/dair-ai/nlp_newsletter

📰Natural language processing (NLP) newsletter

deep-learning machine-learning nlp

Last synced: 08 Jan 2025

https://github.com/hhstore/blog

My Tech Blog: about Mojo / Rust / Golang / Python / Kotlin / Flutter / VueJS / Blockchain etc.

ai android blockchain blog dart docker flutter golang gpt ios k8s kotlin mojo nlp python rust vuejs web3 zig

Last synced: 08 Jan 2025

https://github.com/kevinlu1248/pyate

PYthon Automated Term Extraction

ai nlp symbolic-ai term-extraction

Last synced: 28 Sep 2024

https://github.com/hankcs/hanlp-lucene-plugin

HanLP Chinese word segmentation plugin for Lucene, supporting Lucene-based systems including Solr

chinese-text-segmentation hanlp lucene nlp solr traditional-chinese

Last synced: 06 Jan 2025

https://github.com/daac-tools/vibrato

🎤 vibrato: Viterbi-based accelerated tokenizer

japanese morphological-analysis nlp rust segmentation tokenization tokenizer

Last synced: 08 Jan 2025

https://github.com/jameshwade/gpttools

gpttools extends gptstudio for package development to help you document code, write tests, or even explain code

chatgpt nlp openai package-development rstats rstudio-addin

Last synced: 08 Jan 2025

https://github.com/princeton-nlp/webshop

[NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents

decision-making language language-grounding ml nlp rl rl-environment shopping sim-to-real web-based

Last synced: 06 Jan 2025

https://github.com/feedly/transfer-nlp

NLP library designed for reproducible experimentation management

framework language-model natural-language-understanding nlp playground pytorch transfer-learning

Last synced: 04 Jan 2025

https://github.com/jsksxs360/AHANLP

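AHANLP (the "Aha" natural language processing package) provides services including word segmentation, dependency parsing, semantic role labeling, automatic summarization, semantic similarity computation, LDA topic prediction, and word clouds.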

chinese nlp

Last synced: 30 Oct 2024

https://github.com/sekwiatkowski/Komputation

Komputation is a neural network framework for the Java Virtual Machine written in Kotlin and CUDA C.

artificial-intelligence convolutional-neural-networks cuda framework gpu jvm kotlin machine-learning neural-networks nlp nvidia recurrent-neural-networks seq2seq

Last synced: 02 Nov 2024

https://github.com/igorbrigadir/stopwords

Default English stopword lists from many different sources

en-stopwords english-stopwords natural-language-processing nlp stopwords

Last synced: 06 Jan 2025

https://github.com/deepset-ai/haystack-tutorials

Here you can find all the Tutorials for Haystack 📓

generative-qa haystack llm nlp semantic-search text-generation tutorials

Last synced: 04 Jan 2025

https://github.com/natasha/corus

Links to Russian corpora + Python functions for loading and parsing

corpora datasets nlp python russian

Last synced: 03 Jan 2025

https://github.com/ludwigstumpp/llm-leaderboard

A joint community effort to create one central leaderboard for LLMs.

leaderboard llm machine-learning nlp

Last synced: 03 Jan 2025

https://github.com/thunlp/NSC

Neural Sentiment Classification

nlp

Last synced: 27 Nov 2024

https://github.com/google-research/retvec

RETVec is an efficient, multilingual, and adversarially-robust text vectorizer.

deep-learning natural-language-processing nlp python tensorflow text-classification

Last synced: 04 Jan 2025

https://github.com/zhongkaifu/rnnsharp

RNNSharp is a toolkit for deep recurrent neural networks that is widely used for many kinds of tasks, such as sequence labeling and sequence-to-sequence. It is written in C# and based on .NET Framework 4.6 or above. RNNSharp supports many types of networks, such as forward and bi-directional networks and sequence-to-sequence networks, and many types of layers, such as LSTM, Softmax, sampled Softmax, and others.

c-sharp crf deep-learning dotnet lstm machine-learning nlp recurrent-neural-networks rnn rnn-model sequence-labeling

Last synced: 08 Jan 2025

https://github.com/amirshnll/persian-swear-words

Persian Swear Dataset: a dataset of inappropriate and offensive Persian words that you can use in production to filter unwanted content from text.

dataset datasets farsi farsiswear farsiswearword nlp nlp-dataset persian persiandataset persianswearword swear sweardataset swearword

Last synced: 04 Jan 2025

https://github.com/merantix-momentum/squirrel-core

A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way 🌰

ai cloud-computing collaboration computer-vision cv data-ingestion data-mesh data-science dataops datasets deep-learning distributed internal machine-learning ml natural-language-processing nlp python pytorch tensorflow

Last synced: 02 Nov 2024

https://github.com/sunzeyeah/RLHF

Implementation of Chinese ChatGPT

chatgpt deep-learning deepspeed glm nlp pangu pytorch

Last synced: 31 Oct 2024

https://github.com/boat-group/fancy-nlp

NLP for human. A fast and easy-to-use natural language processing (NLP) toolkit, satisfying your imagination about NLP.

bert bert-chinese bert-classifier bert-embeddings bert-ner bilstm-crf bimpm chinese-nlp crf esim keras named-entity-recognition nlp python-library semantic-similarity tensorflow text-classification tf2

Last synced: 04 Jan 2025

https://github.com/rizerphe/obsidian-companion

Autocomplete your Obsidian notes with AI, including ChatGPT, through a Copilot-like interface.

ai ai21labs chatgpt groq groq-ai large-language-models llm llm-local nlp obsidian-md obsidian-plugin ollama oobabooga openai

Last synced: 08 Jan 2025

https://github.com/extreme-bert/extreme-bert

ExtremeBERT is a toolkit that accelerates the pretraining of customized language models on customized datasets, described in the paper “ExtremeBERT: A Toolkit for Accelerating Pretraining of Customized BERT”.

bert deep-learning language-model language-models machine-learning natural-language-processing nlp python pytorch transformer

Last synced: 16 Nov 2024

https://github.com/hmunachi/nanodl

A Jax-based library for designing and training transformer models from scratch.

attention attention-mechanism deep-learning distributed-training flax gpt jax llama machine-learning mistral nlp transformer

Last synced: 04 Jan 2025

https://github.com/jenojp/negspacy

spaCy pipeline object for negating concepts in text

negation negation-phrases negex nlp python spacy spacy-extension spacy-pipeline

Last synced: 03 Jan 2025

https://github.com/pen-ho/medical_knowledge_graph_app-master

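Implementation of an automated question-answering system over a medical knowledge graph, including knowledge graph construction, pipeline-based question answering over the graph, and a front end. Covers entity recognition (dictionary + BERT_CRF), entity linking (matching with Sentence-BERT), and intent recognition (dictionaries of question words and domain words).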

django-application echarts entity-linking kbqa kgqa knowledge-graph mention-detection neo4j ner nlp pytorch-transformers relation-detection relation-extraction

Last synced: 08 Jan 2025

https://github.com/bonzanini/nlp-tutorial

Tutorial: Natural Language Processing in Python

natural-language-processing nlp python

Last synced: 27 Nov 2024

https://github.com/microsoft/vert-papers

This repository contains code and datasets related to entity/knowledge papers from the VERT (Versatile Entity Recognition & disambiguation Toolkit) project, by the Knowledge Computing group at Microsoft Research Asia (MSRA).

bertel can-ner cross-lingual-ner entity-disambiguation entity-extraction entity-linking entity-resolution grn language-understanding linkingpark ml named-entity-recognition ner nlp nlp-resources unitrans xl-ner

Last synced: 06 Jan 2025

https://github.com/krishnap25/mauve

Package to compute Mauve, a similarity score between neural text and human text. Install with `pip install mauve-text`.

deep-learning huggingface-transformers nlp pytorch text-generation

Last synced: 04 Jan 2025

https://github.com/RTIInternational/gobbli

Deep learning with text doesn't have to be scary.

deep-learning docker nlp python

Last synced: 04 Nov 2024

https://github.com/linonetwo/segmentit

A Chinese word segmentation package usable in any JS environment, forked from leizongmin/node-segment

chinese chinese-nlp nlp segmentation

Last synced: 06 Jan 2025

https://github.com/shineware/KOMORAN

Korean Morphological Analyzer by shineware

komoran korean-nlp korean-text-processing morphological-analysis nlp shineware

Last synced: 12 Nov 2024

https://github.com/hsankesara/deepresearch

This repository is a collection of research papers on deep learning, computer vision, and NLP.

computer-vision deep-learning keras machine-learning nlp nueral-networks python3 research-paper

Last synced: 03 Jan 2025

https://github.com/phantominsights/summarizer

A Reddit bot that summarizes news articles written in Spanish or English. It uses a custom-built algorithm to rank words and sentences.

nlp praw python3 reddit-bot spacy web-scraper wordcloud

Last synced: 08 Jan 2025

https://github.com/tensorchord/modelz-llm

OpenAI compatible API for LLMs and embeddings (LLaMA, Vicuna, ChatGLM and many others)

llm nlp openai-api transformer

Last synced: 05 Jan 2025

https://github.com/stanford-oval/genie-server

The home server version of Almond

hacktoberfest nlp raspberrypi voice

Last synced: 06 Jan 2025

https://github.com/neuml/txtchat

💭 Retrieval augmented generation (RAG) and language model powered search applications

large-language-models llm machine-learning nlp python rag retrieval-augmented-generation search txtai

Last synced: 28 Oct 2024

https://github.com/quadrismegistus/prosodic

Prosodic: a metrical-phonological parser, written in Python. For English and Finnish, with flexible language support.

finnish-language-analysis linguistics metrical-parser nlp poetry rhythm

Last synced: 08 Jan 2025

https://github.com/rameshaditya/scoper

Fuzzy and semantic search for captioned YouTube videos.

fuzzy-search machine-learning ml nlp search search-algorithm semantic youtube youtube-api

Last synced: 09 Jan 2025

https://github.com/nisaaragharia/advanced_rag

Advanced Retrieval-Augmented Generation (RAG) through practical notebooks, using LangChain, OpenAI GPTs, Meta Llama 3, and agents.

agent agents ai chatgpt genai langchain llama3 llm machine-learning nlp openai rag retrival-augmented vectordb

Last synced: 07 Jan 2025

https://github.com/opensemanticsearch/open-semantic-etl

Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelines & ingestor to Solr or Elastic search index & linked data graph database

annotation documents elasticsearch enrichment etl extract extract-information extract-text extractor ingest ingestion-pipeline ingests-documents named-entity-recognition nlp ocr pdf python rdf solr solr-dataimporter

Last synced: 06 Jan 2025

https://github.com/polm/cutlet

Japanese to romaji converter in Python

japanese nlp romaji

Last synced: 12 Nov 2024

https://github.com/bjascob/lemminflect

A python module for English lemmatization and inflection.

inflection lemmatization nlp nlp-machine-learning python spacy spacy-extensions

Last synced: 03 Jan 2025

https://github.com/yohasebe/engtagger

English Part-of-Speech Tagger Library; a Ruby port of Lingua::EN::Tagger

english nlp pos-tagging ruby rubynlp

Last synced: 04 Jan 2025

https://github.com/30lm32/ml-projects

ML based projects such as Spam Classification, Time Series Analysis, Text Classification using Random Forest, Deep Learning, Bayesian, Xgboost in Python

ab-testing deep-learning docker gensim geolocation imbalanced-data kdtree keras lstm-neural-networks machine-learning mlflow nlp random-forest spam-classification svm tensorboard tensorflow text-classification timeseries-analysis word2vec

Last synced: 15 Nov 2024

https://github.com/sakuranew/BERT-AttributeExtraction

Using BERT for attribute extraction in knowledge graphs: BERT-based fine-tuning and feature-extraction methods for extracting the attributes of person entries in Baidu Baike.

ai attribute-extraction bert deeplearning feature-extraction fine-tuning knowledge-graph nlp relation-extraction

Last synced: 02 Nov 2024

https://github.com/esteininger/vector-search

The definitive guide to using Vector Search to solve your semantic search production workload needs.

lucene nlp search-engine vector-search

Last synced: 07 Nov 2024

https://github.com/lucasxlu/LagouJob

Data Analysis & Mining for lagou.com

data-analysis data-mining lagou machine-learning nlp python3 web-crawler

Last synced: 25 Nov 2024

https://github.com/lucasjinreal/weibo_terminator_workflow

Updated version of weibo_terminator: a workflow version aimed at getting the job done.

crawler nlp scraper sentiment-analysis weibo-terminator

Last synced: 09 Jan 2025

https://github.com/grumpyp/aixplora

AIxplora is an open-source tool that lets you query all kinds of files, with no limit on length or format.

audio chat chatbot chatgpt embeddings embeddings-model generativeai llm llms nlp openai ownfiles pdf question-answering search second-brain vectorstore

Last synced: 06 Nov 2024

https://github.com/gmihaila/ml_things

This is where I put things I find useful that speed up my work with machine learning. Ever looked through your old projects to reuse those cool functions you created before? This repo is designed to be a Python library of functions I created in previous projects that can be reused. I also share some notebook tutorials and Python code snippets.

google-colab machine-learning nlp nlp-machine-learning notebooks python-snippets pytorch snippets transformer

Last synced: 05 Jan 2025

https://github.com/akanyaani/gpt-2-tensorflow2.0

OpenAI GPT2 pre-training and sequence prediction implementation in Tensorflow 2.0

gpt gpt-2 gpt2 implementation nlp openai pre-training pretraining tensorflow tensorflow2 text-generation transformer

Last synced: 13 Nov 2024

https://github.com/likejazz/siamese-lstm

Siamese LSTM for evaluating semantic similarity between sentences of the Quora Question Pairs Dataset.

deep-learning keras lstm nlp

Last synced: 09 Jan 2025

https://github.com/houbb/pinyin

A high-performance pinyin tool for Java that converts Chinese to pinyin and supports homophones.

dfa high-performance nlp pinyin pinyin-analysis pinyin-data pinyin-segmentation pinyin4j segment tiny tiny-pinyin tongyinzi

Last synced: 04 Jan 2025

https://github.com/oxford-cs-deepnlp-2017/practical-1

Oxford Deep NLP 2017 course - Practical 1: word2vec

deep-learning natural-language-processing nlp oxford word2vec

Last synced: 09 Jan 2025

https://github.com/zachnagengast/similarity-search-kit

🔎 SimilaritySearchKit is a Swift package providing on-device text embeddings and semantic search functionality for iOS and macOS applications.

apple-neural-engine coreml information-retrieval nlp pretrained-models question-answering semantic-search semantic-similarity swift text-embeddings vector-embeddings

Last synced: 08 Jan 2025

https://github.com/gabeur/mmt

Multi-Modal Transformer for Video Retrieval

fusion language multimodal nlp video vision

Last synced: 18 Nov 2024

https://github.com/abelriboulot/onnxt5

Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.

inference nlp nlp-machine-learning onnx onnxruntime sentiment-analysis summarization text-classification text-generation transformer transformers translation

Last synced: 07 Nov 2024

https://github.com/quanteda/spacyr

R wrapper to spaCy NLP

extract-entities nlp r spacy speech-tagging

Last synced: 06 Jan 2025

https://github.com/dongjunlee/text-cnn-tensorflow

Convolutional Neural Networks for Sentence Classification (TextCNN), implemented in TensorFlow

classification deep-learning hb-experiment nlp sentiment-analysis tensorflow tensorflow-models text-cnn

Last synced: 09 Jan 2025

https://github.com/tomasonjo/langchain2neo4j

Integrating the Neo4j database into the LangChain ecosystem

chatbot chatgpt gpt-3 gpt-4 langchain langchain-python neo4j nlp

Last synced: 26 Sep 2024

https://github.com/kyubyong/nlp_made_easy

Explains NLP building blocks in a simple manner.

beam-search bpe nlp seq2seq

Last synced: 27 Dec 2024

https://github.com/amirbar/rnn.wgan

Code for training and evaluation of the model from "Language Generation with Recurrent Generative Adversarial Networks without Pre-training"

gan gans nlp text-generation text-generator wgan

Last synced: 08 Jan 2025

https://github.com/bheinzerling/pyrouge

A Python wrapper for the ROUGE summarization evaluation package

evaluation-metrics nlp rouge summarization

Last synced: 09 Jan 2025