Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Natural language processing

Natural language processing (NLP) is a field of computer science that studies how computers process and interact with human (natural) language. In 1950, Alan Turing published the article "Computing Machinery and Intelligence", which proposed a measure of intelligence now called the Turing test. More modern techniques, such as deep learning, have produced strong results in language modeling, parsing, and other natural-language tasks.

https://github.com/IngestAI/embedditor

⚡ GUI for editing LLM vector embeddings. No more blind chunking. Upload content in any file extension, join and split chunks, edit metadata and embedding tokens + remove stop-words and punctuation with one click, add images, and download in .veml to share it with your team.

datapreprocessing datascience embedding-vectors embeddings genai laravel llm markup-language ml nlp nltk php vector-database vector-search vectorization veml

Last synced: 31 Oct 2024

https://github.com/naver/claf

CLaF: Open-Source Clova Language Framework

clova framework language natural-language-processing nlp pytorch

Last synced: 08 Nov 2024

https://github.com/cohere-ai/sandbox-topically

Topic modeling helpers using managed language models from Cohere. Name text clusters using large GPT models.

machine-learning nlp python topic-modeling

Last synced: 07 Oct 2024

https://github.com/ticki/eudex

A blazingly fast phonetic reduction/hashing algorithm.

nlp

Last synced: 02 Nov 2024

https://github.com/brutalcoding/aub.ai

AubAI brings you on-device gen-AI capabilities, including offline text generation and more, directly within your app.

android dart flutter gemini gemini-nano gen-ai genai indiedev ios ipados linux llamacpp localllama macos mistral-7b native-apps nlp on-device on-device-ai pubdev

Last synced: 10 Oct 2024

https://github.com/akaza-im/akaza

Yet another Japanese IME for IBus/Linux

ibus ime nlp rust

Last synced: 07 Nov 2024

https://github.com/jaidevd/numerizer

A Python module to convert natural language numerics into ints and floats.

information-extraction nlp regular-expressions spacy spacy-extension

Last synced: 14 Oct 2024

https://github.com/vipul-sharma20/sharingan

Tool to extract news articles from newspapers and provide context about the news

context-extraction news-extraction nlp opencv

Last synced: 10 Nov 2024

https://github.com/davidberenstein1957/classy-classification

This repository contains an easy and intuitive approach to few-shot classification using sentence-transformers or spaCy models, or zero-shot classification with Huggingface.

few-shot-classifcation hacktoberfest machine-learning natural-language-processing nlp nlu sentence-transformers spacy text-classification

Last synced: 14 Oct 2024

https://github.com/akoksal/Turkish-Word2Vec

Pre-trained Word2Vec Model for Turkish

gensim nlp turkish word2vec

Last synced: 02 Aug 2024

https://github.com/ajdavidl/Portuguese-NLP

List of resources and tools developed with focus on Portuguese.

nlp portuguese portuguese-language

Last synced: 02 Aug 2024

https://github.com/Fixy-TR/fixy

Our goal is to build an open-source spelling assistant/checker that can solve many different problems in the Turkish NLP literature together, proposes unique approaches, and fills the gaps in existing work. It corrects spelling mistakes in users' texts with a deep-learning approach and also performs semantic analysis of the texts, so that it can detect and correct the errors that arise in that context.

acikhack2 ai artificial-intelligence bert data-science deep-learning deeplearning keras natural-language-processing neural-network neural-networks nlp python

Last synced: 02 Aug 2024

https://github.com/ayaka14732/llama-2-jax

JAX implementation of the Llama 2 model

jax llama llama2 natural-language-processing nlp

Last synced: 26 Oct 2024

https://github.com/nisaaragharia/advanced_rag

Advanced Retrieval-Augmented Generation (RAG) through practical notebooks, using the power of LangChain, OpenAI GPTs, Meta Llama 3, and agents.

agent agents ai chatgpt genai langchain llama3 llm machine-learning nlp openai rag retrival-augmented vectordb

Last synced: 10 Oct 2024

https://github.com/thunlp/thuctc

An Efficient Chinese Text Classifier

chinese-nlp nlp

Last synced: 10 Nov 2024

https://github.com/neuml/rag

🚀 Retrieval Augmented Generation (RAG) with txtai. Combine search and LLMs to find insights with your own data.

large-language-models llm machine-learning nlp python rag retrieval-augmented-generation search txtai

Last synced: 20 Oct 2024

https://github.com/coteries/cedille-ai

✒️ Cedille is a large French language model (6B), released under an open-source license

machine-learning nlg nlp

Last synced: 04 Nov 2024

https://github.com/thunlp/THUCTC

An Efficient Chinese Text Classifier

chinese-nlp nlp

Last synced: 08 Nov 2024

https://github.com/erfanzar/easydel

Accelerate and optimize performance with streamlined training and serving options in JAX.

easydel flax gpt jax machine-learning mojo nlp optax transformers

Last synced: 07 Nov 2024

https://github.com/explosion/displacy-ent

💥 displaCy-ent.js: An open-source named entity visualiser for the modern web

css javascript named-entities natural-language-processing nlp spacy visualization

Last synced: 25 Sep 2024

https://github.com/sea-snell/implicit-language-q-learning

Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning"

implicit-q-learning iql language-model nlp offline-rl python pytorch q-learning reinforcement-learning

Last synced: 27 Oct 2024

https://github.com/kavgan/rouge-2.0

ROUGE automatic summarization evaluation toolkit. Support for ROUGE-[N, L, S, SU], stemming and stopwords in different languages, unicode text evaluation, CSV output.

evaluation evaluation-toolkit java metrics nlp rouge rouge-l rouge-n rouge-s rouge-su text-summarization unicode-text

Last synced: 30 Oct 2024

https://github.com/MaartenGr/Concept

Concept Modeling: Topic Modeling on Images and Text

computer-vision image-processing nlp topic-modeling

Last synced: 05 Nov 2024

https://github.com/dkpro/dkpro-core

Collection of software components for natural language processing (NLP) based on the Apache UIMA framework.

dkpro java natural-language-processing nlp uima uima-components

Last synced: 30 Oct 2024

https://github.com/vishwasg217/fin-sight

FinSight - Financial Insights at Your Fingertip: FinSight is a cutting-edge AI assistant tailored for portfolio managers, investors, and finance enthusiasts. It streamlines the process of gaining crucial insights and summaries about a company in a user-friendly manner.

fintech langchain llama-index llms nlp streamlit

Last synced: 10 Oct 2024

https://github.com/OpenNewsLabs/guri-vr

https://gurivr.com

nlp virtual-reality vr webvr

Last synced: 06 Aug 2024

https://github.com/stanford-oval/genie-toolkit

The Genie open-source kit for voice assistants (formerly known as Almond)

hacktoberfest natural-language nlp semantic-parsers voice-assistant

Last synced: 06 Nov 2024

https://github.com/maartengr/concept

Concept Modeling: Topic Modeling on Images and Text

computer-vision image-processing nlp topic-modeling

Last synced: 26 Oct 2024

https://github.com/textvec/textvec

Text vectorization tool to outperform TFIDF for classification tasks

machine-learning natural-language-processing nlp python text-analysis text-classification text-processing tf-idf

Last synced: 29 Oct 2024

https://github.com/iPieter/RobBERT

A Dutch RoBERTa-based language model

bert bert-model language-model nlp nlp-resources roberta transformers

Last synced: 03 Aug 2024

https://github.com/rizerphe/obsidian-companion

Autocomplete your Obsidian notes with AI, including ChatGPT, through a Copilot-like interface.

ai ai21labs chatgpt groq groq-ai large-language-models llm llm-local nlp obsidian-md obsidian-plugin ollama oobabooga openai

Last synced: 10 Oct 2024

https://github.com/WZBSocialScienceCenter/tmtoolkit

Text Mining and Topic Modeling Toolkit for Python with parallel processing power

evaluation nlp parallel-processing python socialscience text-processing topic-modeling

Last synced: 02 Aug 2024

https://github.com/houbb/word-checker

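🇨🇳🇬🇧 Chinese and English word spelling corrector. (Detection of commonly confused Chinese characters, Chinese spelling detection and correction, and an English word spell-checking tool.)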

cc csc english-word java nlp spelling spelling-correction word

Last synced: 07 Nov 2024

https://github.com/milaan9/python_natural_language_processing

This repository is a complete guide to natural language processing (NLP) in Python. It covers various techniques for implementing NLP, including parsing and text processing, and shows how to use NLP for text feature engineering.

bag-of-words inversedocumentfrequency ipython-notebook lemmatization named-entity-recognition nlp partofspeech-tagger python4datascience python4everybody sentence-segmentation stemming stopwords termfrequency tf-idf tokenization tutor-milaan9 vocabulary-matching

Last synced: 11 Oct 2024

https://github.com/yanndubs/hash-embeddings

PyTorch implementation of Hash Embeddings (NIPS 2017). Submission to the NIPS Implementation Challenge.

embeddings hashing nips nips-challenge nlp pytorch reproducible-research word-embeddings

Last synced: 27 Oct 2024

https://github.com/guotong1988/NL2SQL-RULE

Content Enhanced BERT-based Text-to-SQL Generation https://arxiv.org/abs/1910.07179

bert deep-learning knowledge knowledge-representation nl2sql nlp pytorch rule-inject-to-model semantic-parsing text2sql

Last synced: 02 Aug 2024

https://github.com/soumyadip007/microsoft-student-partner-workshop-learning-materials-ai-nlp

This repository contains all code and materials for the current session, including the required code on natural language processing and artificial intelligence.

ai cloud distributed-networking microsoft nlp peer-to-peer workshop

Last synced: 27 Oct 2024

https://github.com/intelligo-mn/neuro

🔮 Neuro.js is a machine learning library for building AI assistants and chat-bots.

ai ai-assistants bot chat-bot chat-bots chatbot machine-learning natural-language-processing nlp nodejs

Last synced: 28 Aug 2024

https://github.com/dair-ai/emotion_dataset

😄 Dataset for Emotion Recognition Research

dataset machine-learning nlp pytorch

Last synced: 10 Nov 2024

https://github.com/ines/spacy-js

🎀 JavaScript API for spaCy with Python REST API

javascript natural-language-processing nlp python rest-api spacy

Last synced: 30 Oct 2024

https://github.com/franck-dernoncourt/pubmed-rct

PubMed 200k RCT dataset: a large dataset for sequential sentence classification.

corpus machine-learning medical nlp randomized-controlled-trials sentence-classification

Last synced: 14 Oct 2024

https://github.com/ropensci/tokenizers

Fast, Consistent Tokenization of Natural Language Text

nlp peer-reviewed r r-package rstats text-mining tokenizer

Last synced: 05 Aug 2024

https://github.com/tomasonjo/neogpt-explorer

Knowledge-graph based chatbot using GPT3 and Neo4j

chatbot gpt-3 graph neo4j nlp streamlit

Last synced: 10 Oct 2024

https://github.com/ShawnyXiao/2017-CCF-BDCI-AIJudge

2017 CCF BDCI, "Let AI Be the Judge" (preliminary round): 7th/415 (Top 1.68%)

2017 bdci ccf data-mining multiclass-classification nlp

Last synced: 01 Nov 2024

https://github.com/Attempto/APE

Parser for Attempto Controlled English (ACE)

ace attempto cnl nlp swi-prolog

Last synced: 02 Aug 2024

https://github.com/beader/ruijin_round2

Ruijin Hospital MMC AI-assisted knowledge graph construction competition, second round

nlp relation-extraction tianchi

Last synced: 02 Aug 2024

https://github.com/explosion/spacymoji

💙 Emoji handling and meta data for spaCy with custom extension attributes

emoji emoji-unicode emojis natural-language-processing nlp spacy spacy-extension spacy-pipeline

Last synced: 07 Oct 2024

https://github.com/houbb/nlp-hanzi-similar

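A hanzi similarity tool. (Computes the similarity of Chinese characters, with an algorithm for visually similar characters. Can be used for correcting handwritten Chinese character recognition, text obfuscation, and more.)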

chinese data han nlp ocr word-correction

Last synced: 07 Nov 2024

https://github.com/hrwhisper/SpamMessage

Chinese spam SMS detection (classifiers implemented by hand)

machine-learning nlp python

Last synced: 04 Aug 2024

https://github.com/thammegowda/nllb-serve

Meta's "No Language Left Behind" models served as web app and REST API

machine-translation multilingual nlp transformers translation

Last synced: 30 Oct 2024

https://github.com/martinomensio/spacy-universal-sentence-encoder

Google USE (Universal Sentence Encoder) for spaCy

models nlp spacy tensorflow-hub use

Last synced: 30 Oct 2024

https://github.com/opensemanticsearch/open-semantic-entity-search-api

Open Source REST API for named entity extraction, named entity linking, named entity disambiguation, recommendation & reconciliation of entities like persons, organizations and places for (semi)automatic semantic tagging & analysis of documents by linked data knowledge graph like SKOS thesaurus, RDF ontology, database(s) or list(s) of names

api disambiguation entity-extraction knowledge-graph knowledgebase linked-data linked-data-api linkeddata named-entities named-entity-recognition natural-language-processing nlp python reconciliation reconciliation-service rest-api semantic semantic-analysis semantic-annotation thesaurus

Last synced: 27 Oct 2024

https://github.com/PaddlePaddle/PALM

A fast, flexible, extensible, and easy-to-use NLP large-scale pretraining and multi-task learning framework.

baidu multi-task-learning nlp paddlepaddle pretrain-model transformers

Last synced: 07 Aug 2024

https://github.com/simongray/clojure-dsl-resources

A curated list of Clojure resources for dealing with domain-specific languages.

data-transformation domain-specific-language dsl nlp parsing

Last synced: 22 Oct 2024

https://github.com/xatkit-bot-platform/xatkit

The simplest way to build all types of smart chatbots and digital assistants

bot chatbot-framework chatbots conversational-ai digital-assistant dsl low-code nlp no-code

Last synced: 07 Nov 2024

https://github.com/coastalcph/lex-glue

LexGLUE: A Benchmark Dataset for Legal Language Understanding in English

benchmark lawtech legal legaltech nlp

Last synced: 02 Nov 2024

https://github.com/sorgerlab/indra

INDRA (Integrated Network and Dynamical Reasoning Assembler) is an automated model assembly system interfacing with NLP systems and databases to collect knowledge, and through a process of assembly, produce causal graphs and dynamical models.

bioinformatics biology computational-biology indra modeling nlp pysb sbml systems-biology

Last synced: 14 Oct 2024

https://github.com/hscspring/all4nlp

All For NLP, especially Chinese.

ai deeplearning machinelearning nlp

Last synced: 27 Oct 2024

https://github.com/dengbocong/text-similarity

Text similarity (matching) computation, providing baselines, training, inference, metric analysis... The code includes both TensorFlow and PyTorch versions.

bert deep-learning mechine-learing model nlp pytorch similarity text-classification transformer

Last synced: 08 Nov 2024

https://github.com/uzay-g/espial

Espial is an engine for automated organization and discovery of personal knowledge

knowledge knowledge-graph nlp python

Last synced: 27 Oct 2024

https://github.com/Uzay-G/espial

Espial is an engine for automated organization and discovery of personal knowledge

knowledge knowledge-graph nlp python

Last synced: 01 Nov 2024

https://github.com/daspartho/prompt-extend

Extending Stable Diffusion prompts with suitable style cues using text generation

deep-learning gpt-2 huggingface-spaces huggingface-transformers machine-learning nlp prompt stable-diffusion text-generation

Last synced: 03 Aug 2024

https://github.com/CyberZHG/keras-xlnet

Implementation of XLNet that can load pretrained checkpoints

glue keras language-model nlp xlnet

Last synced: 03 Aug 2024

https://github.com/cyberzhg/keras-xlnet

Implementation of XLNet that can load pretrained checkpoints

glue keras language-model nlp xlnet

Last synced: 27 Sep 2024

https://github.com/mannefedov/compling_nlp_hse_course

Materials for the computational linguistics course at the School of Linguistics, HSE University (NRU HSE)

computational-linguistics course hse machine-learning natural-language-processing nlp python

Last synced: 02 Aug 2024

https://github.com/yohasebe/wp2txt

A command-line toolkit to extract text content and category data from Wikipedia dump files

corpus machine-learning nlp ruby wikipedia wikipedia-dump

Last synced: 08 Nov 2024

https://github.com/HKUSTDial/NL2SQL_Handbook

This is a continuously updated handbook that helps readers easily track the latest NL2SQL techniques in the literature and provides practical guidance for researchers and practitioners.

awesome finetuning llms nl-to-code nl-to-sql nl2sql nlp nlp-resources survey text-to-sql text2sql tutorial

Last synced: 02 Nov 2024

https://github.com/shjwudp/shu

A curated and organized collection of Chinese books

books dataset nlp

Last synced: 27 Oct 2024

https://github.com/j2kao/fcc_nn_research

(somewhat) cleaned-up notebooks used in researching public comments for FCC Proceeding 17-108 (Net Neutrality Repeal)

fcc net-neutrality nlp

Last synced: 09 Aug 2024

https://github.com/dccuchile/wefe

WEFE: The Word Embeddings Fairness Evaluation Framework. WEFE is a framework that standardizes the bias measurement and mitigation in Word Embeddings models. Please feel welcome to open an issue in case you have any questions or a pull request if you want to contribute to the project!

bias-detection bias-reduction fairness-ai fairness-ml library nlp nlp-library python3 word-embedding-evaluation word-embedding-fairness word-embeddings

Last synced: 05 Aug 2024

https://github.com/ownthink/semantic

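Semantic understanding / spoken language understanding. The project includes lexical analysis (Chinese word segmentation, part-of-speech tagging, named entity recognition) and spoken language understanding (domain classification, slot filling, intent recognition).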

nlp nlu slu

Last synced: 07 Nov 2024

https://github.com/IlyaGusev/summarus

Models for automatic abstractive summarization

deep-learning machine-learning nlp pytorch summarization

Last synced: 04 Nov 2024

https://github.com/rylans/getlang

Natural language detection package in pure Go

language-model natural-language nlp

Last synced: 26 Oct 2024

https://github.com/ymcui/lert

LERT: A Linguistically-motivated Pre-trained Language Model (LERT, a pre-trained model enhanced with linguistic information)

bert lert nlp plm pre-train pytorch tensorflow transformer

Last synced: 28 Oct 2024