Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Natural language processing

Natural language processing (NLP) is a field of computer science that studies how computers process, analyze, and generate human language. In 1950, Alan Turing published an article that proposed a measure of machine intelligence, now called the Turing test. More modern techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.

https://github.com/fdalvi/NeuroX

A Python library that encapsulates various methods for neuron interpretation and analysis in Deep NLP models.

explainable-ai natural-language-processing neurons nlp nlp-machine-learning

Last synced: 03 Aug 2024

https://github.com/princeton-nlp/LLMBar

[ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following

evaluation llm nlp

Last synced: 02 Aug 2024

https://github.com/saidziani/Arabic-News-Article-Classification

Automatic categorization of documents consists of assigning a category to a text based on the information it contains. We follow several supervised machine learning approaches.

arabic-language arabic-nlp corpora machine-learning nlp nltk python3 text-categorization

Last synced: 03 Aug 2024

https://github.com/cambridgeltl/visual-spatial-reasoning

[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.

computer-vision multimodal-deep-learning nlp vision-and-language

Last synced: 01 Aug 2024

https://github.com/foxminchan/LawKnowledge

A legal knowledge search and Q&A application based on Vietnam's Legal Code and legal document database ⚖️

generative-ai microservice natural-language-processing nlp nx searching semantic-search

Last synced: 03 Aug 2024

https://github.com/mohabmes/Arabycia

Arabic NLP tool used to perform text search, POS tagging, translation, auto-diacritization, etc.

arabic-language arabic-nlp nlp

Last synced: 03 Aug 2024

https://github.com/maxoodf/russian_news_corpus

Corpus of lemmatized (morphologically normalized) texts from Russian mass media

articles corpus machine-learning ml nlp nlp-machine-learning russian text word2vec

Last synced: 15 Aug 2024

https://github.com/arazd/ProgressivePrompts

Progressive Prompts: Continual Learning for Language Models

continual-learning llms nlp prompt-tuning

Last synced: 09 Aug 2024

https://github.com/feldberlin/timething

Timething is a library for aligning text transcripts with their audio recordings.

alignment audio cli forced-alignment huggingface nlp python speech speech-recognition tts

Last synced: 31 Jul 2024

https://github.com/philgooch/abbreviation-extraction

Python3 implementation of the Schwartz-Hearst algorithm for extracting abbreviation-definition pairs

abbreviations information-extraction keyword-extraction nlp python3

Last synced: 01 Aug 2024

https://github.com/legacyai/tf-transformers

State-of-the-art, faster Transformers with TensorFlow 2.0 (NLP, computer vision, audio).

bert gpt2 keras language-model natural-language-processing nlp nlp-library tensorflow tensorflow2 text-classification text-generation transformer

Last synced: 01 Aug 2024

https://github.com/thushv89/manning_tf2_in_action

The official code repository for "TensorFlow in Action" by Manning.

computer-vision deep-learning machine-learning nlp notebook python tensorflow tensorflow2 tf tf2

Last synced: 31 Jul 2024

https://github.com/forzagreen/n2words

Convert numbers to their written form, in 25+ languages.

convert-numbers language natural-language nlp

Last synced: 03 Aug 2024

https://github.com/Dumbris/trunklucator

Python module for data scientists to quickly create annotation projects.

active-learning annotation annotation-tool data-science machine-learning nlp

Last synced: 01 Aug 2024

https://github.com/ARBML/tkseem

Arabic Tokenization Library. It provides many tokenization algorithms.

arabic-nlp nlp tkseem tokenization

Last synced: 31 Jul 2024

https://github.com/aaaton/golem

A lemmatizer implemented in Go

golang lemmatizer nlp

Last synced: 01 Aug 2024

https://github.com/SentometricsResearch/sentometrics

An integrated framework in R for textual sentiment time series aggregation and prediction

nlp prediction sentiment-analysis text-mining time-series

Last synced: 02 Aug 2024

https://github.com/datquocnguyen/jLDADMM

A Java package for the LDA and DMM topic models

gibbs-sampling lda nlp short-text topic-modeling topic-models

Last synced: 02 Aug 2024

https://github.com/hyunwoongko/summarizers

Package for controllable summarization

nlp summarization

Last synced: 02 Aug 2024

https://github.com/krystalan/Multi-hopRC

📔 Notes for multi-hop reading comprehension and open-domain question answering

machine-reading-comprehension natural-language-processing nlp open-domain-qa paper-list question-answering

Last synced: 02 Aug 2024

https://github.com/unilight/R-NET-in-Tensorflow

R-NET implementation in TensorFlow.

machine-comprehension nlp squad tensorflow

Last synced: 07 Aug 2024

https://github.com/delph-in/pydelphin

Python libraries for DELPH-IN

delph-in hpsg mrs nlp profile python semantics

Last synced: 03 Aug 2024

https://github.com/Neuraxio/New-Empty-Python-Project-Base

The Perfect Python Project Template. Bored of coding anew the same thing for your new Python projects? Here is what you need. Click below on the "use this template" green button to start using it instantly. Rename the "project" folder and all references to this folder to customize your project name.

base computer-vision library nlp project-template project-templates pypi-package python-library time-series

Last synced: 03 Aug 2024

https://github.com/felladrin/MiniSearch

Minimalist web-searching app with an AI assistant that runs directly from your browser. Uses Web-LLM, Ratchet-ML, Wllama and SearXNG. Demo: https://felladrin-minisearch.hf.space

ai artificial-intelligence generative-ai gpu-accelerated information-retrieval llm llm-inference machine-learning nlp question-answering ratchet-ml retrieval-augmented-generation search search-engine searxng typescript web-llm webapp wllama

Last synced: 31 Jul 2024

https://github.com/bnosac/textrank

Summarise text by finding relevant sentences and keywords using the Textrank algorithm

natural-language-processing nlp r textrank textrank-algorithm

Last synced: 05 Aug 2024

https://github.com/Flight-School/ner

A command-line utility for extracting names of people, places, and organizations from text on macOS.

cli macos named-entity-recognition nlp swift

Last synced: 05 Aug 2024

https://github.com/cbilgili/zemberek-nlp-server

REST Docker server built on the Zemberek Turkish NLP Java library

docker javascript nlp part-of-speech-tagger rest sentence-tokenizer spark turkish turkish-language zemberek

Last synced: 02 Aug 2024

https://github.com/bukosabino/justicio

Building an assistant for Boletin Oficial del Estado (BOE) using Retrieval Augmented Generation (RAG)

legal legaltech nlp spanish

Last synced: 31 Jul 2024

https://github.com/telecombcn-dl/2017-persontyle

Applied Deep Learning Workshop London 2017

computer-vision deep-learning nlp

Last synced: 07 Aug 2024

https://github.com/LanguageMachines/frog

Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.

computational-linguistics dependency-parser dutch folia lemmatiser morphological-analyser morphology named-entity-recognition natural-language-processing nlp pos-tagger syntax text-processing

Last synced: 31 Jul 2024

https://github.com/adrien2p/nestjs-dialogflow

Dialogflow module that simplifies webhook handling for your NLP application using NestJS 📡

addons dialogflow google-dialogflow nestjs nlp typescript webhook

Last synced: 01 Aug 2024

https://github.com/sorenlind/lemmy

🤘Lemmy is a lemmatizer for Danish 🇩🇰 and Swedish 🇸🇪

danish lemma lemmatizer nlp spacy swedish

Last synced: 31 Jul 2024

https://github.com/nalbion/whisper-server

Streaming speech-to-text server using Whisper

idiolect nlp whisper

Last synced: 02 Aug 2024

https://github.com/PhantomInsights/comments-generator

A Reddit bot that generates new context-aware comments using Markov chains trained on the comment history of a given set of users or subreddits.

markov-chain nlp praw python3 reddit-bot requests

Last synced: 01 Aug 2024

https://github.com/akashp1712/nlp-akash

Natural Language Processing notes and implementations.

natural-language-processing nlp nlp-akash nltk summarization text-summarization

Last synced: 03 Aug 2024

https://github.com/Loodos/turkish-language-models

Transformer based Turkish language models

language-models natural-language-processing nlp turkish

Last synced: 02 Aug 2024

https://github.com/jantrienes/nereval

Evaluation script for named entity recognition (NER) systems based on entity-level F1 score.

evaluation-metrics machine-learning named-entity-recognition nlp

Last synced: 07 Aug 2024

https://github.com/prajjwal1/fluence

A deep learning library based on PyTorch, focused on low-resource language research and robustness

attention deep-learning nlp pytorch transformers

Last synced: 07 Aug 2024

https://github.com/Flight-School/sentiment

A command-line utility that evaluates the emotional sentiment of natural language text.

macos nlp polarity sentiment-analysis swift

Last synced: 09 Aug 2024

https://github.com/leehanchung/cs182

Berkeley CS182/282A Designing, Visualizing and Understanding Deep Neural Networks

berkeley cnn cs182 cs231n cs231n-assignment natural-language-processing nlp pytorch reinforcement-learning tensorflow transformer

Last synced: 31 Jul 2024

https://github.com/RUCAIBox/MVP

This repository is the official implementation of our paper MVP: Multi-task Supervised Pre-training for Natural Language Generation.

data-to-text dialog multi-task-learning natural-language-generation natural-language-processing nlg nlp plm pre-trained-model question-answering question-generation seq2seq sequence-to-sequence story-generation summarization text-generation

Last synced: 03 Aug 2024

https://github.com/bnosac/pattern.nlp

R package to perform sentiment analysis and Parts of Speech tagging for Dutch/French/English/German/Spanish/Italian

nlp pattern pos-tagging r sentiment-analysis text-mining

Last synced: 05 Aug 2024

https://github.com/Zlasejd/HuangDi

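Repository for the Huang-Di (黄帝) model, a large language model for question answering over ancient Traditional Chinese Medicine texts, based on Ziya-LLaMA-13B-V1.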

ancient-books llm nlp qusetion-answer tcm

Last synced: 03 Aug 2024

https://github.com/abitdodgy/gibran

Gibran is an Elixir natural language processor, and a port of WordsCounted.

elixir-lang natural-language-processing nlp

Last synced: 01 Aug 2024

https://github.com/LanguageMachines/ucto

Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation and splits sentences. It offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules for several languages and can be easily extended to suit other languages. It has been incorporated for tokenizing Dutch text in Frog, our Dutch morpho-syntactic processor. http://ilk.uvt.nl/ucto

computational-linguistics folia language natural-language-processing nlp punctuation tokeniser

Last synced: 31 Jul 2024

https://github.com/chnsh/deep-semantic-code-search

Deep Semantic Code Search aims to explore a joint embedding space for code and description vectors and then use it for a code search application

deep-learning deep-neural-networks nlp nlp-machine-learning

Last synced: 31 Jul 2024

https://github.com/techcentaur/PyLex

Perform lexical analysis on words, one word at a time.

cli lexical-analysis nlp poets python3 scraping words

Last synced: 30 Jul 2024

https://github.com/aiwithqasim/Free-Artificial-Intelligence-Resources

Welcome to this open source repository of free artificial intelligence resources. Make use of the free resources listed, and please star and fork the repository so that it reaches more people and everyone can benefit.

ai article artificial-intelligence artificial-neural-networks blog data-science datascientist deep-learning freeresources hacktoberfest hecktoberfest2021 jobs machine-learning machine-learning-algorithms natural-language-processing nlp project python3 youtube

Last synced: 01 Aug 2024

https://github.com/usyiyi/nlp-py-2e-zh

📖 [Translation] Natural Language Processing with Python, second edition, in Chinese

book nlp nltk python

Last synced: 02 Aug 2024

https://github.com/mirfan899/Urdu

Collection of Urdu datasets for POS, NER, Sentiment, Summarization and NLP tasks.

machine-learning ner nlp sentiment-analysis spacy-models summarization urdu-language urdu-model

Last synced: 03 Aug 2024

https://github.com/bnosac/crfsuite

Labelling Sequential Data in Natural Language Processing with R - using CRFsuite

chunking conditional-random-fields crf crfsuite data-science intent-classification natural-language-processing ner nlp r r-package

Last synced: 31 Jul 2024

https://github.com/winkjs/wink-sentiment

Accurate and fast sentiment scoring of phrases with #hashtags, emoticons :) & emojis 🎉

emoji emoticons hashtag nlp sentiment sentiment-analysis sentiment-classification sentiment-scores wink

Last synced: 31 Jul 2024

https://github.com/datasciencecampus/pyGrams

Extracts key terminology (n-grams) from any large collection of documents (>1000) and forecasts emergence

dsc-projects emergence-calculations natural-language-processing nlp nltk patents python scikit-learn tf-idf

Last synced: 31 Jul 2024

https://github.com/batzner/tensorlm

Wrapper library for text generation / language models at char and word level with RNN in TensorFlow

char-lm char-rnn language-model nlp tensorflow tensorflow-library

Last synced: 31 Jul 2024

https://github.com/proycon/folia

FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use the library, as well as the full documentation, validation schemas, and set definitions.

computational-linguistics corpus file-format folia language library linguistic-annotation-framework linguistics nlp python xml

Last synced: 03 Aug 2024

https://github.com/avilum/llama-saas

A client/server for LLaMA (Large Language Model Meta AI) that can run ANYWHERE.

ai client-server facebook llama llm nlp

Last synced: 01 Aug 2024

https://github.com/ohnlp/MedTagger

MedTagger is a lightweight clinical NLP system built upon Apache UIMA.

nlp uima

Last synced: 04 Aug 2024

https://github.com/OpenSUM/CPSUM

Code and Data Repo for COLING'22 paper "Noise-injected Consistency Training and Entropy-constrained Pseudo Labeling for Semi-supervised Extractive Summarization"

extractive-summarization nlp semi-supervised-learning

Last synced: 03 Aug 2024

https://github.com/salesforce/factualNLG

Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond"

factual-consistency factuality large-language-models llm nlp summarization

Last synced: 31 Jul 2024

https://github.com/saturncloud/dask-pytorch-ddp

dask-pytorch-ddp is a Python package that makes it easy to train PyTorch models on dask clusters using distributed data parallel.

computer-vision dask deep-learning distributed-computing machine-learning nlp pytorch

Last synced: 03 Aug 2024

https://github.com/Legilibre/legi.py

Tools for working with the LEGI archives (French laws)

france laws legi legislation natural-language-processing nlp opendata python

Last synced: 03 Sep 2024

https://github.com/xyntopia/pydoxtools

Effortlessly extract information from unstructured data with this library, utilizing advanced AI techniques. Compose AI components into customizable pipelines over diverse sources for your projects.

chatgpt document-analysis document-extraction extraction information-retrieval llm nlp pdf python

Last synced: 03 Aug 2024

https://github.com/NISH1001/tag-generator

A simple tool to generate tags for the given text (document) using TF-IDF.

nlp tagging tf-idf tfidf

Last synced: 01 Aug 2024

https://github.com/SamEdwardes/spacytextblob

A TextBlob sentiment analysis pipeline component for spaCy.

natural-language-processing nlp python spacy

Last synced: 04 Aug 2024

https://github.com/LaVi-Lab/CLEVA

[EMNLP 2023 Demo] CLEVA: Chinese Language Models EVAluation Platform

chinese evaluation nlp

Last synced: 03 Aug 2024

https://github.com/nicolay-r/AREkit

Document level Attitude and Relation Extraction toolkit (AREkit) for sampling and processing large text collections with ML and for ML

bert datasets frames language-models neural-networks nlp pandas pandas-dataframe prompt prompting relation-extraction sentiment-analysis tensorflow

Last synced: 01 Aug 2024

https://github.com/minasmz/Persian-Summarization

Statistical and semantic text summarizer for the Persian language

doc2vec-model gensim nlp persian-language persian-nlp text-summarization textrank-algorithm

Last synced: 04 Aug 2024

https://github.com/MrinmoiHossain/Udacity-Deep-Learning-Nanodegree

The course covers knowledge that is useful for working on deep learning as an engineer. It includes simple neural networks and training, CNNs, autoencoders and feature extraction, transfer learning, RNNs, LSTMs, NLP, data augmentation, GANs, hyperparameter tuning, and model deployment and serving.

convolutional-networks convolutional-neural-networks deep-learning gans generative-adversarial-network long-short-term-memory-models lstms machine-learning nanodegree neural-network nlp pytorch recurrent-neural-networks rnn sentiment-analysis style-transfer transfer-learning udacity udacity-nanodegree

Last synced: 07 Aug 2024

https://github.com/Zlasejd/HuangDI

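Repository for the Huang-Di (黄帝) model, a large language model for question answering over ancient Traditional Chinese Medicine texts, based on Ziya-LLaMA-13B-V1.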

ancient-books llm nlp qusetion-answer tcm

Last synced: 01 Aug 2024

https://github.com/nzw0301/lightLDA

Fast sampling algorithm based on collapsed Gibbs sampling (CGS)

lda machine-learning nlp python topic-modeling

Last synced: 01 Aug 2024

https://github.com/arne-cl/discoursegraphs

Linguistic converter and merging tool for multi-level annotated corpora. Graph-based (using Python and NetworkX).

conversion converter natural-language-processing networkx nlp python

Last synced: 01 Aug 2024

https://github.com/masci/banks

LLM prompt language based on Jinja

chatgpt llm nlp openai prompt-engineering prompt-toolkit

Last synced: 31 Jul 2024

https://github.com/lonePatient/bert-sentence-similarity-pytorch

This repo contains a PyTorch implementation of a pretrained BERT model for the sentence similarity task.

bert nlp pytorch sentence-similarity text-classification

Last synced: 01 Aug 2024

https://github.com/FerdinandZhong/punctuator

A small seq2seq punctuator tool based on DistilBERT

bert bert-ner chinese-nlp deep-learning nlp punctuation pytorch seq2seq

Last synced: 03 Aug 2024

https://github.com/LanguageMachines/PICCL

A set of workflows for corpus building through OCR, post-correction and normalisation

computational-linguistics corpus-linguistics corpus-tools folia nlp ocr workflow

Last synced: 01 Aug 2024

https://github.com/onesuper/HuggingFace-Datasets-Text-Quality-Analysis

Retrieves Parquet files from Hugging Face, then identifies and quantifies junk data, duplication, contamination, and biased content in the dataset using pandas

dataset huggingface-datasets llm machine-learning nlp streamlit text-processing

Last synced: 09 Aug 2024

https://github.com/kootenpv/spacy_api

Server/client wrapper around spaCy that loads spaCy only once

api machine-learning nlp spacy

Last synced: 07 Aug 2024

https://github.com/khakhulin/compressed-transformer

Compression of NMT transformer model with tensor methods

compression deep-learning mnist nlp nmt pytorch tensor-train transformer translation tucker

Last synced: 04 Aug 2024

https://github.com/obss/trapper

State-of-the-art NLP through transformer models, with a modular design and consistent APIs.

allennlp deep-learning natural-language-processing nlp python pytorch pytorch-transformers transformer transformers

Last synced: 02 Aug 2024

https://github.com/ChenghaoMou/pytorch-pQRNN

Implementation of pQRNN in PyTorch

nlp pqrnn pytorch text-classification

Last synced: 03 Aug 2024