Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Natural language processing

Natural language processing (NLP) is a field of computer science that studies how computers and humans interact through natural language. In 1950, Alan Turing published an article, "Computing Machinery and Intelligence", that proposed a measure of machine intelligence now called the Turing test. More recent techniques, such as deep learning, have produced strong results in language modeling, parsing, and other natural-language tasks.

https://github.com/prajjwal1/fluence

A deep learning library based on PyTorch, focused on low-resource language research and robustness

attention deep-learning nlp pytorch transformers

Last synced: 07 Aug 2024

https://github.com/jantrienes/nereval

Evaluation script for named entity recognition (NER) systems based on entity-level F1 score.

evaluation-metrics machine-learning named-entity-recognition nlp

Last synced: 07 Aug 2024
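
As an illustration of what an entity-level F1 score measures, here is a minimal sketch of the strict exact-match variant, in which a predicted entity counts as correct only if its label and both boundaries match a gold entity. This is an assumption for illustration and not necessarily the definition nereval itself uses; the `Entity` tuple layout and the function name `entity_f1` are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical representation: (label, start_offset, end_offset)
Entity = Tuple[str, int, int]


def entity_f1(gold: Set[Entity], pred: Set[Entity]) -> float:
    """Exact-match entity-level F1 (assumed variant, not nereval's API)."""
    true_positives = len(gold & pred)  # same label and same span
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0  # nothing matched, or both sets are empty
    return 2 * precision * recall / (precision + recall)


gold = {("PER", 0, 2), ("LOC", 5, 6)}
pred = {("PER", 0, 2), ("ORG", 5, 6)}  # second entity has the wrong label
score = entity_f1(gold, pred)
```

With one of two predictions correct and one of two gold entities found, precision and recall are both 0.5, so `score` is 0.5.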

https://github.com/Flight-School/sentiment

A command-line utility that evaluates the emotional sentiment of natural language text.

macos nlp polarity sentiment-analysis swift

Last synced: 09 Aug 2024

https://github.com/proycon/lamachine

LaMachine - A software distribution of our in-house and some third-party NLP software, available as a virtual machine, Docker image, or local compilation/installation script

clam computational-linguistics docker-image flat folia frog installer linux linux-distribution natural-language-processing nlp python software-distribution vagrant virtual-machine webservices

Last synced: 12 Oct 2024

https://github.com/adapter-hub/hub

ARCHIVED. Please use https://docs.adapterhub.ml/huggingface_hub.html || 🔌 A central repository collecting pre-trained adapter modules

adapter-hub adapters natural-language-processing nlp

Last synced: 06 Nov 2024

https://github.com/argilla-io/biome-text

Custom Natural Language Processing with big and small models 🌲🌱

allennlp data-science natural-language-processing nlp pytorch

Last synced: 30 Sep 2024

https://github.com/bnosac/pattern.nlp

R package for sentiment analysis and part-of-speech tagging in Dutch, French, English, German, Spanish, and Italian

nlp pattern pos-tagging r sentiment-analysis text-mining

Last synced: 11 Nov 2024

https://github.com/opennyai/opennyai

Opennyai : An efficient NLP Pipeline for Indian Legal documents

indian-laws indian-legal-judgements legalnlp machine-learning natural-language-processing nlp python spacy

Last synced: 30 Oct 2024

https://github.com/aiwithqasim/Free-Artificial-Intelligence-Resources

Welcome to this open source repository of free artificial intelligence resources. Make use of the free resources mentioned, and kindly give it a star and fork it so that it reaches more people and everyone can take advantage.

ai article artificial-intelligence artificial-neural-networks blog data-science datascientist deep-learning freeresources hacktoberfest hecktoberfest2021 jobs machine-learning machine-learning-algorithms natural-language-processing nlp project python3 youtube

Last synced: 02 Nov 2024

https://github.com/charmve/paperweeklyai

📚「@MaiweiAI」Studying papers in the fields of computer vision, NLP, and machine learning algorithms every week.

advanced applied-machine-learning computer-vision data-mining data-science deep-learning machine-learning machine-learning-algorithms nlp paper-with-code papers study-papers tutorials

Last synced: 28 Oct 2024

https://github.com/RUCAIBox/MVP

This repository is the official implementation of our paper MVP: Multi-task Supervised Pre-training for Natural Language Generation.

data-to-text dialog multi-task-learning natural-language-generation natural-language-processing nlg nlp plm pre-trained-model question-answering question-generation seq2seq sequence-to-sequence story-generation summarization text-generation

Last synced: 03 Aug 2024

https://github.com/Zlasejd/HuangDI

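Huang-Di model repository: a large language model for question answering over ancient Traditional Chinese Medicine texts, based on Ziya-LLaMA-13B-V1.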

ancient-books llm nlp qusetion-answer tcm

Last synced: 02 Nov 2024

https://github.com/chnsh/deep-semantic-code-search

Deep Semantic Code Search aims to explore a joint embedding space for code and description vectors and then use it for a code search application

deep-learning deep-neural-networks nlp nlp-machine-learning

Last synced: 27 Oct 2024

https://github.com/zaibacu/rita-dsl

A Domain Specific Language (DSL) for building language patterns. These can be later compiled into spaCy patterns, pure regex, or any other format

dsl language natural-language-processing nlp parsing python regex rule-based spacy

Last synced: 12 Oct 2024

https://github.com/kensho-technologies/sequence_align

Efficient implementations of Needleman-Wunsch and other sequence alignment algorithms written in Rust with Python bindings via PyO3.

bioinformatics hirschberg natural-language-processing needleman-wunsch nlp pyo3 python rust sequence-alignment

Last synced: 13 Nov 2024
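
For readers unfamiliar with Needleman-Wunsch, the following is a minimal pure-Python sketch of the global alignment score recurrence. It is not the sequence_align API; the function name and the default scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration. It returns only the optimal score, not the alignment itself.

```python
from typing import Sequence


def needleman_wunsch_score(
    a: Sequence, b: Sequence, match: int = 1, mismatch: int = -1, gap: int = -1
) -> int:
    """Optimal global alignment score, keeping only two rows of the DP table."""
    # Row 0: aligning an empty prefix of `a` against prefixes of `b` costs gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of `a` against an empty `b`
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(
                max(
                    prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                    prev[j] + gap,               # gap in b
                    curr[j - 1] + gap,           # gap in a
                )
            )
        prev = curr
    return prev[-1]


identical = needleman_wunsch_score("GATTACA", "GATTACA")  # 7 matches
one_gap = needleman_wunsch_score("AB", "A")               # one match, one gap
```

Here `identical` is 7 and `one_gap` is 0 under the assumed scoring values.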

https://github.com/Zlasejd/HuangDi

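Huang-Di model repository: a large language model for question answering over ancient Traditional Chinese Medicine texts, based on Ziya-LLaMA-13B-V1.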

ancient-books llm nlp qusetion-answer tcm

Last synced: 03 Aug 2024

https://github.com/deneutoy/spacy-vis

A visualisation tool for spaCy using Hierplane.

dependency-parsing nlp spacy visualization

Last synced: 01 Nov 2024

https://github.com/abitdodgy/gibran

Gibran is an Elixir natural language processor, and a port of WordsCounted.

elixir-lang natural-language-processing nlp

Last synced: 01 Nov 2024

https://github.com/LanguageMachines/ucto

Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation and splits sentences. It offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules for several languages and can be easily extended to suit other languages. It has been incorporated for tokenizing Dutch text in Frog, our Dutch morpho-syntactic processor. http://ilk.uvt.nl/ucto

computational-linguistics folia language natural-language-processing nlp punctuation tokeniser

Last synced: 30 Oct 2024

https://github.com/winkjs/wink-pos-tagger

English Part-of-speech (POS) tagger

nlp part-of-speech pos tagger wink

Last synced: 09 Nov 2024

https://github.com/techcentaur/PyLex

Perform lexical analysis on words, one word at a time.

cli lexical-analysis nlp poets python3 scraping words

Last synced: 26 Oct 2024

https://github.com/alexeyev/abae-pytorch

PyTorch implementation of 'An Unsupervised Neural Attention Model for Aspect Extraction' by He et al., ACL 2017

aspect-extraction autoencoder nlp pytorch pytorch-implementation topic-modeling unsupervised-machine-learning

Last synced: 11 Nov 2024

https://github.com/asappresearch/dialog-intent-induction

Code and data for the paper "Dialog Intent Induction with Deep Multi-View Clustering", Hugh Perkins and Yi Yang, EMNLP 2019

clustering deep-learning emnlp nlp pytorch

Last synced: 11 Nov 2024

https://github.com/liuzl/ling

Natural Language Processing Toolkit in Golang

corenlp lemmatization nlp normalization opencc spacy tokenization

Last synced: 12 Oct 2024

https://github.com/linuxscout/arabicnlptoolslist

An inventory list of Arabic NLP tools

arabic catalogue nlp nlp-resources

Last synced: 25 Oct 2024

https://github.com/mirfan899/Urdu

Collection of Urdu datasets for POS, NER, Sentiment, Summarization and NLP tasks.

machine-learning ner nlp sentiment-analysis spacy-models summarization urdu-language urdu-model

Last synced: 03 Aug 2024

https://github.com/usyiyi/nlp-py-2e-zh

:book: [Translation] Natural Language Processing with Python, Chinese second edition

book nlp nltk python

Last synced: 12 Nov 2024

https://github.com/yohasebe/ruby-spacy

A wrapper module for using the spaCy natural language processing library from the Ruby programming language via PyCall

gpt natural-language nlp openai parsing ruby spacy word-embeddings

Last synced: 14 Oct 2024

https://github.com/dayyass/qaner

Unofficial implementation of QaNER: Prompting Question Answering Models for Few-shot Named Entity Recognition.

data-science machine-learning named-entity-recognition natural-language-processing ner nlp python python3 question-answering

Last synced: 07 Nov 2024

https://github.com/datasciencecampus/pyGrams

Extracts key terminology (n-grams) from any large collection of documents (>1000) and forecasts emergence

dsc-projects emergence-calculations natural-language-processing nlp nltk patents python scikit-learn tf-idf

Last synced: 27 Oct 2024

https://github.com/zimmerrol/keras-utility-layer-collection

Collection of custom layers and utility functions for Keras which are missing in the main framework.

attention deep-learning keras layers lstm nlp normalization rnn

Last synced: 09 Nov 2024

https://github.com/bnosac/crfsuite

Labelling Sequential Data in Natural Language Processing with R - using CRFsuite

chunking conditional-random-fields crf crfsuite data-science intent-classification natural-language-processing ner nlp r r-package

Last synced: 11 Nov 2024

https://github.com/winkjs/wink-sentiment

Accurate and fast sentiment scoring of phrases with #hashtags, emoticons :) & emojis 🎉

emoji emoticons hashtag nlp sentiment sentiment-analysis sentiment-classification sentiment-scores wink

Last synced: 09 Nov 2024

https://github.com/thudm/multilingual-glm

The multilingual variant of GLM, a general language model trained with an autoregressive blank infilling objective

deep-learning language-model nlp pytorch

Last synced: 14 Nov 2024

https://github.com/yasinkuyu/turkish.php

Turkish Suffix Library for PHP - Turkish inflectional and derivational suffixes

nlp php stem vowel

Last synced: 06 Nov 2024

https://github.com/batzner/tensorlm

Wrapper library for text generation / language models at char and word level with RNN in TensorFlow

char-lm char-rnn language-model nlp tensorflow tensorflow-library

Last synced: 30 Oct 2024

https://github.com/proycon/folia

FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl, this contains higher-level tools that use the library as well as the full documentation, validation schemas, and set definitions

computational-linguistics corpus file-format folia language library linguistic-annotation-framework linguistics nlp python xml

Last synced: 14 Oct 2024

https://github.com/swabhs/joint-lstm-parser

Transition-based joint syntactic dependency parser and semantic role labeler using a stack LSTM RNN architecture.

conll dynet joint-parser natural-language-processing nlp semantic-parser syntactic-parser transition-based-parser

Last synced: 28 Oct 2024

https://github.com/avilum/llama-saas

A client/server for LLaMA (Large Language Model Meta AI) that can run ANYWHERE.

ai client-server facebook llama llm nlp

Last synced: 28 Oct 2024

https://github.com/google-research-datasets/textnormalizationcoveringgrammars

Covering grammars for English and Russian text normalization

nlp speech-recognition text-to-speech

Last synced: 08 Nov 2024

https://github.com/deepset-ai/haystack-integrations

🚀 A list of Haystack Integrations, maintained by the community or deepset.

community machine-learning nlp open-source

Last synced: 06 Nov 2024

https://github.com/dbklim/voice_chatbot

Chatbot in Russian with speech recognition using PocketSphinx and speech synthesis using RHVoice. The AttentionSeq2Seq model is used. Implemented using Python3+TensorFlow+Keras.

attention-model bot chatbot flask gensim keras lstm natural-language-processing nlp pocketsphinx restful-api rhvoice russian seq2seq speech-recognition speech-synthesis tensorflow text-processing word2vec

Last synced: 11 Nov 2024

https://github.com/messense/fasttext-serving

fastText model serving service

fasttext model-server model-serving nlp

Last synced: 14 Nov 2024

https://github.com/takelab/podium

Podium: a framework agnostic Python NLP library for data loading and preprocessing

data-loading datasets natural-language-processing nlp preprocessing python

Last synced: 08 Nov 2024

https://github.com/ohnlp/MedTagger

MedTagger is a lightweight clinical NLP system built upon Apache UIMA.

nlp uima

Last synced: 04 Aug 2024

https://github.com/openeventdata/plover

Next generation event data ontology

event-data nlp political-science shared-tasks

Last synced: 12 Nov 2024

https://github.com/maum-ai/pnlp-mixer

Unofficial PyTorch Implementation for pNLP-Mixer: an Efficient all-MLP Architecture for Language (https://arxiv.org/abs/2202.04350)

mlp-mixer nlp pytorch pytorch-lightning

Last synced: 11 Nov 2024

https://github.com/ymcui/expmrc

ExpMRC: Explainability Evaluation for Machine Reading Comprehension

cmrc2022 dataset explainable-ai expmrc machine-reading-comprehension nlp question-answering xai

Last synced: 28 Oct 2024

https://github.com/omarsar/nlp_newsletter

Natural language processing (NLP) newsletter right on GitHub

deep-learning natural-language-processing nlp nlp-machine-learning

Last synced: 13 Oct 2024

https://github.com/louisbrulenaudet/ragoon

High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡

ai embeddings embeddings-similarity faiss generative-ai groq groqapi llama llama-index llm nlp rag retrieval-augmented-generation vector-database vector-search vectorization

Last synced: 12 Nov 2024

https://github.com/salesforce/factualNLG

Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond"

factual-consistency factuality large-language-models llm nlp summarization

Last synced: 27 Oct 2024

https://github.com/greenelab/snorkeling

Extracting biomedical relationships from literature with Snorkel 🏊

analysis dataset hetnet machine-learning methodology nlp script snorkel text-mining tool workflow

Last synced: 13 Nov 2024

https://github.com/josefalbers/roy

Roy: A lightweight, model-agnostic framework for crafting advanced multi-agent systems using large language models.

agent agentgpt autogen autogpt baby-agi chat chatbot code-generation code-generator gpt langchain llm llm-agent multi-agent nlp prompt-engineering quantization retrieval-augmented-generation vector-index wizardcoder

Last synced: 08 Nov 2024

https://github.com/natasha/nerus

Large silver-standard Russian corpus with NER, morphology and syntax markup

corpus-linguistics morphology ner nlp python russian syntax

Last synced: 10 Nov 2024

https://github.com/salesforce/factualnlg

Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond"

factual-consistency factuality large-language-models llm nlp summarization

Last synced: 08 Nov 2024

https://github.com/wjbmattingly/spacyex

SpaCyEx allows the creation of spaCy Matcher patterns with regex-like syntax.

nlp spacy

Last synced: 31 Oct 2024

https://github.com/messense/fasttext-rs

fastText Rust binding

fasttext nlp

Last synced: 14 Nov 2024

https://github.com/searchableai/kitanaqa

KitanaQA: Adversarial training and data augmentation for neural question-answering models

adversarial-attacks adversarial-training bert data-augmentation ml-automation natural-language-processing nlp pytorch question-answering transformer

Last synced: 13 Oct 2024

https://github.com/OpenSUM/CPSUM

Code and Data Repo for COLING'22 paper "Noise-injected Consistency Training and Entropy-constrained Pseudo Labeling for Semi-supervised Extractive Summarization"

extractive-summarization nlp semi-supervised-learning

Last synced: 03 Aug 2024

https://github.com/doragd/text-classification-pytorch

Implementation of papers for text classification task on SST-1/SST-2

bilstm-attention nlp sentiment-classification text-classification textcnn

Last synced: 29 Oct 2024

https://github.com/saturncloud/dask-pytorch-ddp

dask-pytorch-ddp is a Python package that makes it easy to train PyTorch models on dask clusters using distributed data parallel.

computer-vision dask deep-learning distributed-computing machine-learning nlp pytorch

Last synced: 03 Aug 2024

https://github.com/argilla-io/adept-augmentations

A Python library aimed at dissecting and augmenting NER training data.

dataset datasets few-shot-learning machine-learning natural-language-processing nlp spacy

Last synced: 18 Oct 2024

https://github.com/Legilibre/legi.py

Tools for manipulating the LEGI archives (French laws)

france laws legi legislation natural-language-processing nlp opendata python

Last synced: 03 Sep 2024

https://github.com/nicolay-r/AREkit

Document level Attitude and Relation Extraction toolkit (AREkit) for sampling and processing large text collections with ML and for ML

bert datasets frames language-models neural-networks nlp pandas pandas-dataframe prompt prompting relation-extraction sentiment-analysis tensorflow

Last synced: 01 Nov 2024

https://github.com/thunlp/paragraph2vec

Paragraph Vector Implementation

nlp

Last synced: 10 Nov 2024

https://github.com/cluebenchmark/mobileqa

Offline on-device reading comprehension app: QA for mobile, Android & iPhone

albert android bert chinese iphone machine-reading-comprehension nlp qa tensorflow tflite

Last synced: 09 Nov 2024

https://github.com/doccano/doccano-mini

Annotation meets Large Language Models (ChatGPT, GPT-3 and alike).

annotation langchain nlp openai

Last synced: 01 Nov 2024

https://github.com/NISH1001/tag-generator

A simple tool to generate tags for the given text (document) using TF-IDF.

nlp tagging tf-idf tfidf

Last synced: 05 Nov 2024
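
To make the idea of TF-IDF-based tagging concrete, here is a minimal sketch that ranks the terms of one tokenized document against a small reference corpus. It does not reproduce tag-generator's code; the function name `tfidf_tags`, the smoothed IDF formula, and the toy data are all assumptions for illustration.

```python
import math
from collections import Counter
from typing import List


def tfidf_tags(doc_tokens: List[str], corpus: List[List[str]], top_k: int = 3) -> List[str]:
    """Return the top_k terms of doc_tokens ranked by TF-IDF (assumed smoothed IDF)."""
    term_counts = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in term_counts.items():
        # Document frequency: number of corpus documents containing the term.
        doc_freq = sum(1 for tokens in corpus if term in tokens)
        # Smoothed IDF avoids division by zero for unseen terms.
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1
        scores[term] = (count / len(doc_tokens)) * idf
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_k]]


corpus = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran", "fast"]]
tags = tfidf_tags(["the", "cat", "cat", "purrs"], corpus, top_k=2)
```

In this toy example `tags` is `["cat", "purrs"]`: "the" appears in every reference document, so its IDF is lowest and it drops out of the top two.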

https://github.com/knadh/indic.page

A directory of Indic (Indian) language computing resources.

datasets indian-language indic-languages language linguistics nlp

Last synced: 28 Oct 2024

https://github.com/kyubyong/koparadigm

KoParadigm: Korean Inflectional Paradigm Generator

inflection korean linguistics morphology nlp paradigm

Last synced: 10 Nov 2024

https://github.com/princeton-nlp/calm-textgame

[EMNLP 2020] Keep CALM and Explore: Language Models for Action Generation in Text-based Games

calm gpt n-gram nlp rl text-based-game

Last synced: 11 Nov 2024