Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Natural language processing

Natural language processing (NLP) is a field of computer science that studies how computers process and analyze human (natural) language. In 1950, Alan Turing published an article that proposed a measure of intelligence, now called the Turing test. More modern techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.

https://github.com/chunelfeng/caiss

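A simple, easy-to-use, cross-platform, multi-language, high-performance retrieval engine for similar vectors, similar words, and similar sentences. Stars and forks are welcome. Build together! Power another!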

ai ann chatbot deep-learning faiss hnsw mrpt nlp search-engine similarity-search

Last synced: 28 Oct 2024

https://github.com/phantominsights/subreddit-analyzer

A comprehensive Data and Text Mining workflow for submissions and comments from any given public subreddit.

matplotlib nlp pandas python3 seaborn spacy wordcloud

Last synced: 13 Nov 2024

https://github.com/Brokenwind/BertSimilarity

Computing the similarity of two sentences with Google's BERT algorithm. Covers semantic similarity and text similarity calculation.

bert nlp python semantic similarity tensorflow

Last synced: 02 Nov 2024

https://github.com/salesforce/matchbox

Write PyTorch code at the level of individual examples, then run it efficiently on minibatches.

deep-learning minibatch nlp pytorch

Last synced: 14 Nov 2024

https://github.com/phantominsights/mexican-government-report

Text mining on the 2019 Mexican Government Report, covering everything from extracting text from a PDF file to plotting the results.

geopandas matplotlib nlp numpy pandas seaborn spacy

Last synced: 13 Nov 2024

https://github.com/shibing624/pytextclassifier

pytextclassifier is a toolkit for text classification. It implements classification models including LR, XGBoost, TextCNN, FastText, TextRNN, and BERT, ready to use out of the box.

bert classification focalloss-pytorch hierarchical machine-learning nlp pytextclassifier python pytorch softmax text-classification text-classifier

Last synced: 11 Oct 2024

https://github.com/dccuchile/beto

BETO - Spanish version of the BERT model

bert bert-model nlp spanish transformers transformers-library

Last synced: 05 Aug 2024

https://github.com/proycon/pynlpl

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

computational-linguistics evaluation-metrics folia language-modelling library linguistics machine-learning natural-language-processing nlp nlp-library python search-algorithms text-processing

Last synced: 19 Oct 2024

https://github.com/PrithivirajDamodaran/Styleformer

A Neural Language Style Transfer framework to transfer natural language text smoothly between fine-grained language styles like formal/casual, active/passive, and many more. Created by Prithiviraj Damodaran. Open to pull requests and other forms of collaboration.

active formal-languages informal-sentences nlp passive slang style-transfer text-style text-style-transfer text-style-transfer-benchmark

Last synced: 03 Nov 2024

https://github.com/CogComp/cogcomp-nlp

CogComp's Natural Language Processing Libraries and Demos: Modules include lemmatizer, ner, pos, prep-srl, quantifier, question type, relation-extraction, similarity, temporal normalizer, tokenizer, transliteration, verb-sense, and more.

big-data cogcomp data-mining dependency-parsing lemmatization lemmatizer named-entity-recognition natural-language-processing natural-language-understanding ner nlp parts-of-speech-tagging pos pos-tagging relation-extraction similarity tokenizer transliteration

Last synced: 30 Oct 2024

https://github.com/Beomi/KcBERT

🤗 Pretrained BERT model and WordPiece tokenizer trained on Korean comments, along with the dataset

bert bert-model korean-nlp nlp transformers

Last synced: 09 Nov 2024

https://github.com/koaning/whatlies

Toolkit to help understand "what lies" in word embeddings. Also benchmarking!

embeddings nlp visualisations

Last synced: 29 Oct 2024

https://github.com/huggingface/node-question-answering

Fast and production-ready question answering in Node.js

bert nlp nodejs question-answering tensorflow transformers typescript

Last synced: 30 Oct 2024

https://github.com/ematvey/hierarchical-attention-networks

Document classification with Hierarchical Attention Networks in TensorFlow. WARNING: project is currently unmaintained, issues will probably not be addressed.

deep-learning document-classification hierarchical-attention-networks machine-learning nlp tensorflow

Last synced: 06 Nov 2024

https://github.com/ynqa/wego

Word Embeddings (e.g. Word2Vec) in Go!

glove go machine-learning nlp word-embeddings word2vec

Last synced: 29 Oct 2024

https://github.com/LingDong-/cope

A modern IDE for writing classical Chinese poetry (an editor for regulated verse)

bag-of-words chinese chinese-poetry editor electron ide nlp poetry

Last synced: 01 Nov 2024

https://github.com/hendrikstrobelt/detecting-fake-text

Giant Language Model Test Room

ai nlp visualization

Last synced: 13 Nov 2024

https://github.com/farukalamai/advanced-machine-learning-engineer-roadmap-2024

A Full Stack ML (Machine Learning) Roadmap involves learning the necessary skills and technologies to become proficient in all aspects of machine learning, including data collection and preprocessing, model development, deployment, and maintenance.

aws computer-vision data-analysis data-science data-visualization deep-learning git-github machine-learning machine-learning-roadmap mlops natural-language-processing neural-network nlp opencv pandas python pytorch statistics tensorflow yolo

Last synced: 14 Nov 2024

https://github.com/jina-ai/examples

Jina examples and demos to help you get started

deep-learning examples jina neural-search nlp onboarding python semantic-search tutorials

Last synced: 01 Nov 2024

https://github.com/judahpaul16/gpt-home

ChatGPT at home! Basically a better Google Nest Hub or Amazon Alexa home assistant. Built on the Raspberry Pi using the OpenAI API.

ai async automation chatgpt docker fastapi home-assistant home-automation iot llm nginx nlp nodejs openai python raspberry-pi react speech-recognition spotify typescript

Last synced: 12 Nov 2024

https://github.com/ruu3f/freegpt

freeGPT provides free access to text and image generation models.

ai artificial-intelligence chatgpt deep-learning freegpt gpt gpt4all gpt4free llama llm machine-learning nlp python

Last synced: 10 Oct 2024

https://github.com/imgarylai/bert-embedding

🔡 Token level embeddings from BERT model on mxnet and gluonnlp

bert gluonnlp mxnet natural-language-processing nlp word-embeddings

Last synced: 02 Nov 2024

https://github.com/huggingface/large_language_model_training_playbook

An open collection of implementation tips, tricks and resources for training large language models

cuda large-language-models llm nccl nlp performance python pytorch scalability troubleshooting

Last synced: 11 Nov 2024

https://github.com/johanmodin/clifs

Contrastive Language-Image Forensic Search allows free text searching through videos using OpenAI's machine learning model CLIP

ai machine-learning nlp openai python search text video

Last synced: 01 Nov 2024

https://github.com/vas3k/infomate.club

RSS feed aggregator with collections and NLP article summarization

feed nlp nltk python rss telegram

Last synced: 29 Oct 2024

https://github.com/adbar/German-NLP

Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German

computational-linguistics corpus-linguistics german-language natural-language-processing nlp text-mining

Last synced: 26 Oct 2024

https://github.com/ayoungprogrammer/nlquery

Natural Language Engine on WikiData

dbpedia nlp wikidata

Last synced: 04 Aug 2024

https://github.com/magpie-align/magpie

Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality synthetic data generation pipeline!

alignment dataset gemma llama2 llama3 llm nlp paper phi3 qwen2 supervised-finetuning synthetic-data synthetic-dataset-generation

Last synced: 10 Oct 2024

https://github.com/Cartus/AGGCN

Attention Guided Graph Convolutional Networks for Relation Extraction (authors' PyTorch implementation for the ACL19 paper)

deep-learning graph-convolutional-networks graph-neural-networks information-extraction nlp relation-extraction

Last synced: 02 Nov 2024

https://github.com/intelligo-mn/intelligo

Intelligo is a powerful chatbot builder that enables anyone to create and deploy chatbots anywhere.

ai artificial-intelligence bot bot-framework bots chatbot machine-learning messenger-api messenger-bot messenger-chatbots nlp nodejs slack slack-bot

Last synced: 13 Nov 2024

https://github.com/zzy99/epidemic-sentence-pair

First-place online solution for the Tianchi epidemic similar-sentence-pair judgment competition

nlp

Last synced: 03 Aug 2024

https://github.com/bdbc-kg-nlp/ie-survey

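A survey of the information extraction field by the natural language processing research team at the Big Data Advanced Innovation Center, Beihang University. It covers subtasks including entity recognition, relation extraction, and attribute extraction, and surveys both academia and industry for each subtask.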

extraction nlp survey

Last synced: 12 Nov 2024

https://github.com/pochih/RL-Chatbot

🤖 Deep Reinforcement Learning Chatbot

chatbot deep-learning nlp reinforcement-learning seq2seq-model tensorflow

Last synced: 11 Nov 2024

https://github.com/hyunwoongko/kss

KSS: Korean String processing Suite

korean korean-nlp kss nlp sentences split-sentences

Last synced: 13 Nov 2024

https://github.com/airaria/visual-chinese-llama-alpaca

Multimodal Chinese LLaMA & Alpaca large language model (VisualCLA)

alpaca chinese llama llm lora multimodal nlp vision-language

Last synced: 11 Nov 2024

https://github.com/the-finai/pixiu

This repository introduces PIXIU, an open-source resource featuring the first financial large language models (LLMs), instruction tuning data, and evaluation benchmarks to holistically assess financial LLMs. Our goal is to continually push forward the open-source development of financial artificial intelligence (AI).

aifinance chatgpt fintech gpt-4 large-language-models llama machine-learning named-entity-recognition natural-language-processing nlp pixiu question-answering sentiment-analysis stock-price-prediction text-classification

Last synced: 09 Nov 2024

https://github.com/houbb/opencc4j

🇨🇳 Open Chinese Convert is an open-source project for conversion between Traditional Chinese and Simplified Chinese (Java Traditional/Simplified Chinese conversion).

chinese dfa java java7 nlp opencc simple-tranditional trie trie-tree

Last synced: 14 Nov 2024

https://github.com/jjangsangy/ExplainToMe

Automatic Web Article Summarizer

docker heroku nlp python textrank

Last synced: 26 Oct 2024

https://github.com/llhthinker/NLP-Papers

Natural Language Processing Papers

deep-learning nlp

Last synced: 10 Nov 2024

https://github.com/IntelLabs/RAGFoundry

Framework for enhancing LLMs for RAG tasks using fine-tuning.

evaluation fine-tuning information-retrieval llm nlp question-answering rag semantic-search

Last synced: 21 Aug 2024

https://github.com/Droidtown/ArticutAPI

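API of Articut, a Chinese word segmenter with semantic part-of-speech tagging. Word segmentation (also called tokenization) is the foundation of Chinese information processing. Articut uses no machine learning and needs no data model; relying only on the grammar rules of modern vernacular Chinese, it achieves an F1-measure above 94% and recall above 96% on SIGHAN 2005.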

artificial-intelligence cws natural-language-processing natural-language-understanding nlp nlu part-of-speech-embdding part-of-speech-tagger pos-tagger pos-tagging

Last synced: 30 Oct 2024

https://github.com/MuQiuJun-AI/bert4pytorch

An ultra-lightweight PyTorch version of BERT, with extensive Chinese comments, an easily modifiable structure, and ongoing updates

bert nlp pytorch transformer

Last synced: 06 Nov 2024

https://github.com/microsoft/rat-sql

A relation-aware semantic parsing model from English to SQL

dbqa nl2sql nlp program-synthesis question-answering semantic-parsing transformers

Last synced: 07 Oct 2024

https://github.com/erickrf/nlpnet

A neural network architecture for NLP tasks, using cython for fast performance. Currently, it can perform POS tagging, SRL and dependency parsing.

natural-language-processing neural-network nlp parsing pos-tagging semantic-role-labeling

Last synced: 15 Nov 2024

https://github.com/kunalj101/Data-Science-Hacks

Data Science Hacks consists of tips and tricks to help you become a better data scientist. Data science hacks are for everyone, from beginner to advanced. They cover Python, Jupyter Notebook, pandas, and so on.

computer-vision data data-analysis data-science data-visualization dataset hacks image-augmentation ipynb machine-learning nlp nlp-machine-learning numpy pandas pandas-dataframe pandas-python pandas-tutorial python python3 tips-and-tricks

Last synced: 13 Nov 2024

https://github.com/microsoft/azureml-bert

End-to-End recipes for pre-training and fine-tuning BERT using Azure Machine Learning Service

azure-machine-learning azureml-bert bert bert-model finetuning language-model nlp pretrained-models pretraining pytorch tuning

Last synced: 02 Nov 2024

https://github.com/tomaarsen/spanmarkerner

SpanMarker for Named Entity Recognition

huggingface ner nlp spacy spacy-extension transformers

Last synced: 14 Oct 2024

https://github.com/shibing624/nlp-tutorial

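Natural language processing (NLP) tutorial covering word embeddings, lexical analysis, pretrained language models, text classification, semantic text matching, information extraction, translation, and dialogue.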

dialogue language-model machine-translation nlp seq2seq text-classification text-generation torch word-embedding

Last synced: 14 Nov 2024

https://github.com/openmoss/collie

Collaborative Training of Large Language Models in an Efficient Way

deep-learning deepspeed nlp pytorch

Last synced: 09 Nov 2024

https://github.com/shixzie/nlp

[UNMAINTAINED] Extract values from strings and fill your structs with nlp.

go golang natural-language-processing nlp parse text text-extraction

Last synced: 02 Nov 2024

https://github.com/kermitt2/delft

A deep learning framework for text: https://delft.readthedocs.io/

deep-learning keras ner nlp sequence-labeling text-classification

Last synced: 13 Nov 2024

https://github.com/msg-systems/holmes-extractor

Information extraction from English and German texts based on predicate logic

information-extraction machine-learning nlp ontology python semantics spacy spacy-extension

Last synced: 14 Nov 2024

https://github.com/huggingface/tflite-android-transformers

DistilBERT / GPT-2 for on-device inference thanks to TensorFlow Lite with Android demo apps

android nlp tensorflow tensorflow-lite transformers

Last synced: 08 Nov 2024