Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Natural language processing

Natural language processing (NLP) is a field of computer science that studies how computers process and interact with human language. In 1950, Alan Turing published an article that proposed a measure of machine intelligence, now called the Turing test. More recent techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.

https://github.com/lucasmccabe/emailGPT

A quick and easy interface to generate emails with ChatGPT

chatgpt gpt nlp openai productivity streamlit

Last synced: 01 Aug 2024

https://github.com/vrasneur/pyfasttext

Yet another Python binding for fastText

fasttext machine-learning nlp numpy python python-bindings word-vectors

Last synced: 01 Aug 2024

https://github.com/BLLIP/bllip-parser

BLLIP reranking parser (also known as Charniak-Johnson parser, Charniak parser, Brown reranking parser). See http://pypi.python.org/pypi/bllipparser/ for the Python module.

ai artificial-intelligence computational-linguistics machine-learning natural-language-processing nlp nlp-library parsing

Last synced: 31 Jul 2024

https://github.com/daac-tools/vaporetto

🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer

analyzer japanese morphological-analysis nlp rust segmentation tokenization tokenizer

Last synced: 01 Aug 2024

https://github.com/hxu296/nlp-resume-parser

NLP-powered, GPT-3 enabled Resume Parser from PDF to JSON.

gpt-3 nlp nlp-parsing open-ai parser resume resume-parer

Last synced: 02 Aug 2024

https://github.com/FedML-AI/FedNLP

FedNLP: An Industry and Research Integrated Platform for Federated Learning in Natural Language Processing, Backed by FedML, Inc. The Previous Research Version is Accepted to NAACL 2022

federated-learning machine-learning natural-language-processing nlp

Last synced: 02 Aug 2024

https://github.com/sunyilgdx/NSP-BERT

The code for our paper "NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task —— Next Sentence Prediction"

bert correference-resolution entity-linking entity-typing natural-language-inference nlp prompt-learning sentence-classification sentiment-analysis tensorflow text-classification zero-shot

Last synced: 03 Aug 2024

https://github.com/vzhong/embeddings

Fast, DB Backed pretrained word embeddings for natural language processing.

deep-learning neural-network nlp

Last synced: 31 Jul 2024

https://github.com/hppRC/bert-classification-tutorial

[2023 edition] Text classification with BERT

bert deep-learning japanese nlp python pytorch transformers

Last synced: 01 Aug 2024

https://github.com/soskek/bert-chainer

Chainer implementation of "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"

bert chainer google natural-language-processi natural-language-understanding nlp transformer

Last synced: 01 Aug 2024

https://github.com/openvenues/node-postal

NodeJS bindings to libpostal for fast international address parsing/normalization

address address-parser binding international native nlp

Last synced: 28 Aug 2024

https://github.com/mindflowai/mindflow

🧠 AI-powered CLI git wrapper, boilerplate code generator, chat history manager, and code search engine to streamline your dev workflow 🌊

chat-gpt cli code-generation command-line-interface dev-tools git git-wrapper information-retrieval large-language-models llm machine-learning modern-dev-tools nlp openai openai-api python search search-engine

Last synced: 31 Jul 2024

https://github.com/natasha/slovnet

Deep Learning based NLP modeling for Russian language

bert deep-learning machine-learning morphology ner nlp python pytorch russian syntax

Last synced: 07 Aug 2024

https://github.com/liyucheng09/selective_context

Compress your input to ChatGPT or other LLMs, to let them process 2x more content and save 40% memory and GPU time.

chatgpt llms nlp

Last synced: 02 Aug 2024

https://github.com/IngestAI/embedditor

⚡ GUI for editing LLM vector embeddings. No more blind chunking. Upload content in any file extension, join and split chunks, edit metadata and embedding tokens + remove stop-words and punctuation with one click, add images, and download in .veml to share it with your team.

datapreprocessing datascience embedding-vectors embeddings genai laravel llm markup-language ml nlp nltk php vector-database vector-search vectorization veml

Last synced: 31 Jul 2024

https://github.com/mmxgn/spacy-clausie

Implementation of the ClausIE information extraction system for python+spacy

clausie information-extraction nlp problog python-spacy spacy

Last synced: 03 Aug 2024

https://github.com/vipul-sharma20/sharingan

Tool to extract news articles from newspapers and provide context about the news

context-extraction news-extraction nlp opencv

Last synced: 02 Aug 2024

https://github.com/akaza-im/akaza

Yet another Japanese IME for IBus/Linux

ibus ime nlp rust

Last synced: 01 Aug 2024

https://github.com/ticki/eudex

A blazingly fast phonetic reduction/hashing algorithm.

nlp

Last synced: 01 Aug 2024

https://github.com/akoksal/Turkish-Word2Vec

Pre-trained Word2Vec Model for Turkish

gensim nlp turkish word2vec

Last synced: 02 Aug 2024

https://github.com/ajdavidl/Portuguese-NLP

List of resources and tools developed with a focus on Portuguese.

nlp portuguese portuguese-language

Last synced: 02 Aug 2024

https://github.com/irlab-sdu/fuzi.mingcha

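Fuzi.Mingcha (夫子•明察) is a Chinese judicial large language model jointly developed by Shandong University, Inspur Cloud, and China University of Political Science and Law. It uses ChatGLM as its base model and is trained on a large corpus of unsupervised Chinese judicial text and supervised judicial fine-tuning data. The model supports statute retrieval, case analysis, syllogistic reasoning for judgments, and judicial dialogue, and aims to provide users with comprehensive, high-precision legal consultation and answers.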

chatglm-6b judicial large-language-models legal legal-ai legalai llms nlp pretrained-models

Last synced: 01 Aug 2024

https://github.com/Fixy-TR/fixy

Our goal is to build an open-source writing assistant and spell checker that tackles many different problems in the Turkish NLP literature together, proposes novel approaches, and fills the gaps in existing work. It corrects spelling mistakes in users' texts with a deep learning approach and also performs semantic analysis of the texts, so that errors arising in that context can be detected and corrected as well.

acikhack2 ai artificial-intelligence bert data-science deep-learning deeplearning keras natural-language-processing neural-network neural-networks nlp python

Last synced: 02 Aug 2024

https://github.com/coteries/cedille-ai

✒️ Cedille is a large French language model (6B), released under an open-source license

machine-learning nlg nlp

Last synced: 01 Aug 2024

https://github.com/thunlp/THUCTC

An Efficient Chinese Text Classifier

chinese-nlp nlp

Last synced: 01 Aug 2024

https://github.com/OpenNewsLabs/guri-vr

https://gurivr.com

nlp virtual-reality vr webvr

Last synced: 06 Aug 2024

https://github.com/iPieter/RobBERT

A Dutch RoBERTa-based language model

bert bert-model language-model nlp nlp-resources roberta transformers

Last synced: 03 Aug 2024

https://github.com/textvec/textvec

Text vectorization tool to outperform TFIDF for classification tasks

machine-learning natural-language-processing nlp python text-analysis text-classification text-processing tf-idf

Last synced: 01 Aug 2024

https://github.com/WZBSocialScienceCenter/tmtoolkit

Text Mining and Topic Modeling Toolkit for Python with parallel processing power

evaluation nlp parallel-processing python socialscience text-processing topic-modeling

Last synced: 02 Aug 2024

https://github.com/guotong1988/NL2SQL-RULE

Content Enhanced BERT-based Text-to-SQL Generation https://arxiv.org/abs/1910.07179

bert deep-learning knowledge knowledge-representation nl2sql nlp pytorch rule-inject-to-model semantic-parsing text2sql

Last synced: 02 Aug 2024

https://github.com/MaartenGr/Concept

Concept Modeling: Topic Modeling on Images and Text

computer-vision image-processing nlp topic-modeling

Last synced: 01 Aug 2024

https://github.com/intelligo-mn/neuro

🔮 Neuro.js is a machine learning library for building AI assistants and chat-bots.

ai ai-assistants bot chat-bot chat-bots chatbot machine-learning natural-language-processing nlp nodejs

Last synced: 28 Aug 2024

https://github.com/ines/spacy-js

🎀 JavaScript API for spaCy with Python REST API

javascript natural-language-processing nlp python rest-api spacy

Last synced: 04 Aug 2024

https://github.com/ropensci/tokenizers

Fast, Consistent Tokenization of Natural Language Text

nlp peer-reviewed r r-package rstats text-mining tokenizer

Last synced: 05 Aug 2024

https://github.com/ShawnyXiao/2017-CCF-BDCI-AIJudge

2017 CCF BDCI, "Let AI Be the Judge" (preliminary round): 7th/415 (Top 1.68%)

2017 bdci ccf data-mining multiclass-classification nlp

Last synced: 01 Aug 2024

https://github.com/Attempto/APE

Parser for Attempto Controlled English (ACE)

ace attempto cnl nlp swi-prolog

Last synced: 02 Aug 2024

https://github.com/beader/ruijin_round2

Ruijin Hospital MMC AI-Assisted Knowledge Graph Construction Competition, second round

nlp relation-extraction tianchi

Last synced: 02 Aug 2024

https://github.com/hrwhisper/SpamMessage

Chinese spam SMS detection (classifiers implemented from scratch)

machine-learning nlp python

Last synced: 04 Aug 2024

https://github.com/PaddlePaddle/PALM

A fast, flexible, extensible, and easy-to-use framework for large-scale NLP pretraining and multi-task learning.

baidu multi-task-learning nlp paddlepaddle pretrain-model transformers

Last synced: 07 Aug 2024

https://github.com/xatkit-bot-platform/xatkit

The simplest way to build all types of smart chatbots and digital assistants

bot chatbot-framework chatbots conversational-ai digital-assistant dsl low-code nlp no-code

Last synced: 01 Aug 2024

https://github.com/daspartho/prompt-extend

Extending Stable Diffusion prompts with suitable style cues using text generation

deep-learning gpt-2 huggingface-spaces huggingface-transformers machine-learning nlp prompt stable-diffusion text-generation

Last synced: 03 Aug 2024

https://github.com/mannefedov/compling_nlp_hse_course

Materials for the computational linguistics course at the School of Linguistics, HSE University (NRU HSE)

computational-linguistics course hse machine-learning natural-language-processing nlp python

Last synced: 02 Aug 2024

https://github.com/CyberZHG/keras-xlnet

Implementation of XLNet that can load pretrained checkpoints

glue keras language-model nlp xlnet

Last synced: 03 Aug 2024

https://github.com/dccuchile/wefe

WEFE: The Word Embeddings Fairness Evaluation Framework. WEFE is a framework that standardizes the bias measurement and mitigation in Word Embeddings models. Please feel welcome to open an issue in case you have any questions or a pull request if you want to contribute to the project!

bias-detection bias-reduction fairness-ai fairness-ml library nlp nlp-library python3 word-embedding-evaluation word-embedding-fairness word-embeddings

Last synced: 05 Aug 2024

https://github.com/j2kao/fcc_nn_research

(somewhat) cleaned-up notebooks used in researching public comments for FCC Proceeding 17-108 (Net Neutrality Repeal)

fcc net-neutrality nlp

Last synced: 09 Aug 2024

https://github.com/opensemanticsearch/open-semantic-entity-search-api

Open Source REST API for named entity extraction, named entity linking, named entity disambiguation, recommendation & reconciliation of entities like persons, organizations and places for (semi)automatic semantic tagging & analysis of documents by linked data knowledge graph like SKOS thesaurus, RDF ontology, database(s) or list(s) of names

api disambiguation entity-extraction knowledge-graph knowledgebase linked-data linked-data-api linkeddata named-entities named-entity-recognition natural-language-processing nlp python reconciliation reconciliation-service rest-api semantic semantic-analysis semantic-annotation thesaurus

Last synced: 01 Aug 2024

https://github.com/Uzay-G/espial

Espial is an engine for automated organization and discovery of personal knowledge

knowledge knowledge-graph nlp python

Last synced: 01 Aug 2024

https://github.com/coastalcph/lex-glue

LexGLUE: A Benchmark Dataset for Legal Language Understanding in English

benchmark lawtech legal legaltech nlp

Last synced: 01 Aug 2024

https://github.com/rizerphe/obsidian-companion

Autocomplete your obsidian notes with AI, including ChatGPT, through a copilot-like interface.

ai ai21labs chatgpt groq groq-ai large-language-models llm llm-local nlp obsidian-md obsidian-plugin ollama oobabooga openai

Last synced: 01 Aug 2024

https://github.com/rylans/getlang

Natural language detection package in pure Go

language-model natural-language nlp

Last synced: 30 Jul 2024

https://github.com/IlyaGusev/summarus

Models for automatic abstractive summarization

deep-learning machine-learning nlp pytorch summarization

Last synced: 01 Aug 2024

https://github.com/sorgerlab/indra

INDRA (Integrated Network and Dynamical Reasoning Assembler) is an automated model assembly system interfacing with NLP systems and databases to collect knowledge, and through a process of assembly, produce causal graphs and dynamical models.

bioinformatics biology computational-biology indra modeling nlp pysb sbml systems-biology

Last synced: 09 Aug 2024

https://github.com/princeton-nlp/OptiPrompt

[NAACL 2021] Factual Probing Is [MASK]: Learning vs. Learning to Recall https://arxiv.org/abs/2104.05240

nlp probing prompt

Last synced: 04 Aug 2024

https://github.com/natasha/navec

Compact high quality word embeddings for Russian language

embeddings glove nlp python quantization russian word2vec

Last synced: 07 Aug 2024

https://github.com/avidale/compress-fasttext

Tools for shrinking fastText models (in gensim format)

fasttext fasttext-embeddings nlp python word-embeddings

Last synced: 02 Aug 2024

https://github.com/crazyofapple/Reading_groups

A paper and resource list on large language models, including courses, papers, demos, and figures

chatgpt gpt-3 gpt-4 large-language-models llm llms natural-language-processing nlp

Last synced: 02 Aug 2024

https://github.com/akshaynagpal/w2n

Convert number words (e.g. twenty one) to numeric digits (21)

nlp numeric-digits python word-to-number

Last synced: 05 Aug 2024

https://github.com/NPCai/Open-IE-Papers

Open Information Extraction (OpenIE) and Open Relation Extraction (ORE) papers and data.

information-extraction literature-review nlp openie papers relation-extraction tuples

Last synced: 02 Aug 2024

https://github.com/geekjr/quickai

QuickAI is a Python library that makes it extremely easy to experiment with state-of-the-art Machine Learning models.

ai artificial-intelligence bert deep-learning dl easy-to-use fast gpt gpt-neo huggingface-transformers ml neural-network nlp object-detection python pytorch quickai research tensorflow2 yolo

Last synced: 02 Aug 2024

https://github.com/irudnyts/openai

An R package wrapper around the OpenAI API

api ml nlp openai package r

Last synced: 13 Aug 2024

https://github.com/indix/whatthelang

Lightning Fast Language Prediction 🚀

fasttext language-detection languages nlp python

Last synced: 01 Aug 2024

https://github.com/kuutsav/information-retrieval

Neural information retrieval / semantic-search / Bi-Encoders

information-retrieval machine-learning nlp semantic-search

Last synced: 03 Aug 2024

https://github.com/platisd/duplicate-code-detection-tool

A simple Python3 tool to detect similarities between files within a repository

code-duplication gensim nlp

Last synced: 31 Jul 2024

https://github.com/Lancern/asm2vec

An unofficial implementation of asm2vec as a standalone python package

asm2vec binary-analysis machine-learning nlp numpy python python3 unofficial word2vec

Last synced: 03 Aug 2024

https://github.com/microsoft/presidio-research

This package features data-science related tasks for developing new recognizers for Presidio. It is used for the evaluation of the entire system, as well as for evaluating specific PII recognizers or PII detection models.

deep-learning flair machine-learning named-entity-recognition natural-language-processing ner nlp pii privacy spacy transformers

Last synced: 02 Aug 2024

https://github.com/fiddler-labs/fiddler-auditor

Fiddler Auditor is a tool to evaluate language models.

ai-observability evaluation generative-ai langchain llms nlp robustness

Last synced: 01 Aug 2024

https://github.com/Yachay-AI/byt5-geotagging

Confidence- and ByT5-based geotagging model predicting coordinates from text alone.

coordinates deep-learning geo-location geotagging machine-learning neural-network nlp nlp-machine-learning python pytorch transformers

Last synced: 01 Aug 2024

https://github.com/husseinmozannar/SOQAL

Arabic Open Domain Question Answering System using Neural Reading Comprehension

arabic arabic-language arabic-nlp deep-learning nlp question-answering reading-comprehension tf-idf

Last synced: 03 Aug 2024

https://github.com/apple/ml-mkqa

We introduce MKQA, an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages (260k question-answer pairs in total). The goal of this dataset is to provide a challenging benchmark for question answering quality across a wide set of languages. For details, please refer to our paper, "MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering".

dataset multilingual-evaluation nlp

Last synced: 01 Aug 2024

https://github.com/the-javapocalypse/Twitter-Sentiment-Analysis

This script can tell you people's sentiment regarding any event happening in the world by analyzing tweets related to that event

nlp python python3 sentiment sentiment-analysis textblob tweepy tweets twitter twitter-sentiment-analysis

Last synced: 30 Jul 2024

https://github.com/microsoft/ASTRA

Self-training with Weak Supervision (NAACL 2021)

machine-learning nlp weak-supervision weakly-supervised-learning

Last synced: 01 Aug 2024

https://github.com/zhpmatrix/BERTem

Paper implementation (ACL 2019): "Matching the Blanks: Distributional Similarity for Relation Learning"

acl2019 bert-pytorch fewrel matching-the-blanks nlp relation-extraction

Last synced: 01 Aug 2024

https://github.com/smyja/blackmaria

Python package for webscraping in Natural language

gpt-3 nlp openai python webscraping

Last synced: 09 Aug 2024