Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jsksxs360/AHANLP
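Aha natural language processing package, providing services including word segmentation, dependency parsing, semantic role labeling, automatic summarization, semantic similarity computation, LDA topic prediction, and word clouds.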
Last synced: 03 Jul 2024
https://github.com/arian-askari/ChatGPT-RetrievalQA
A dataset for training/evaluating Question Answering Retrieval models on ChatGPT responses, with the option of training/evaluating on real human responses.
ai chatgpt chatgpt-information-retrieval chatgpt-ir data-augmentation dataset deep-learning gpt-3 gpt2 gpt3 information-retrieval information-retrieval-chatgpt ir ir-chatgpt machine-learning nlp openai python sequence-to-sequence text-retrieval
Last synced: 03 Jul 2024
https://github.com/Anbani/word-embeddings
anbani georgian natural-language-processing nlp word-embeddings
Last synced: 03 Jul 2024
https://github.com/microsoft/tutel
Tutel MoE: An Optimized Mixture-of-Experts Implementation
mixture-of-experts moe nlp pytorch transformer
Last synced: 03 Jul 2024
https://github.com/bigscience-workshop/bigscience
Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.
machine-learning models nlp training
Last synced: 03 Jul 2024
https://github.com/explosion/spacy-models
💫 Models for the spaCy Natural Language Processing (NLP) library
machine-learning machine-learning-models models natural-language-processing nlp spacy spacy-models statistical-models
Last synced: 02 Jul 2024
https://github.com/gkiril/benchie
Comprehensive evaluation framework for Open Information Extraction.
benchmark-framework dataset information-extraction natural-language-processing natural-language-understanding nlp nlp-datasets open-information-extraction
Last synced: 02 Jul 2024
https://github.com/tim5go/zhopenie
Chinese Open Information Extraction (Tree-based Triple Relation Extraction Module)
chinese chinese-nlp nlp relation-extraction semantic-web
Last synced: 02 Jul 2024
https://github.com/philipperemy/Stanford-OpenIE-Python
Stanford Open Information Extraction made simple!
extraction nlp python-wrapper stanford stanford-openie
Last synced: 02 Jul 2024
https://github.com/gkiril/MinSCIE
MinScIE is an Open Information Extraction system which provides structured knowledge enriched with semantic information about citations.
information-extraction natural-language-processing natural-language-toolkit natural-language-understanding nlp nlp-apis nlp-resources open-information-extraction
Last synced: 02 Jul 2024
https://github.com/crownpku/awesome-chinese-nlp
A curated list of resources for Chinese NLP (Chinese natural language processing)
Last synced: 02 Jul 2024
https://github.com/brightmart/text_classification
all kinds of text classification models and more with deep learning
attention-mechanism classification convolutional-neural-networks fasttext memory-networks multi-class multi-label nlp sentence-classification tensorflow text-classification textcnn textrnn
Last synced: 02 Jul 2024
https://github.com/allenai/allennlp
An open-source NLP research library, built on PyTorch.
data-science deep-learning natural-language-processing nlp python pytorch
Last synced: 02 Jul 2024
https://github.com/openvinotoolkit/openvino
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
ai computer-vision deep-learning deploy-ai diffusion-models generative-ai good-first-issue inference llm-inference natural-language-processing nlp openvino optimize-ai performance-boost recommendation-system speech-recognition stable-diffusion transformers yolo
Last synced: 02 Jul 2024
https://microsoft.github.io/AI-For-Beginners/?id=offline-access
12 Weeks, 24 Lessons, AI for All!
ai artificial-intelligence cnn computer-vision deep-learning gan machine-learning nlp rnn
Last synced: 02 Jul 2024
https://github.com/babylonhealth/fastText_multilingual
Multilingual word vectors in 78 languages
distributed-representations machine-learning machine-translation natural-language-processing nlp word-vectors
Last synced: 02 Jul 2024
https://github.com/vzhong/embeddings
Fast, DB-backed pretrained word embeddings for natural language processing.
deep-learning neural-network nlp
Last synced: 02 Jul 2024
https://github.com/mideind/GreynirServer
The greynir.is Icelandic natural language processing API and website.
earley grammar icelandic icelandic-language icelandic-news-sites information-extraction natural-language-processing natural-language-queries nlp parse-forests parse-trees parser python tf-idf tokenizer
Last synced: 02 Jul 2024
https://github.com/Beomi/KcBERT
🤗 Pretrained BERT model, WordPiece tokenizer, and dataset trained on Korean comments
bert bert-model korean-nlp nlp transformers
Last synced: 02 Jul 2024
https://github.com/SKTBrain/KoBERT
Korean BERT pre-trained cased (KoBERT)
bert korean-nlp language-model nlp pytorch transformers
Last synced: 02 Jul 2024
https://github.com/cosmoquester/2021-dialogue-summary-competition
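[2021 Hunminjeongeum Korean Speech & Natural Language AI Competition] A repository for sharing the dialogue summarization training and inference code of team 알라꿍달라꿍 in the dialogue summarization track.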
dialogue huggingface-transformers nlp pytorch-lightning summarization
Last synced: 02 Jul 2024
https://github.com/km1994/recommendation_advertisement_search
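Introductory notes, paper study notes, and interview materials for AI fields such as natural language processing, recommender systems, and search engines (Things You Didn't Know About NLP, Things You Didn't Know About Recommender Systems, and the "百面百搭" interview Q&A collections for NLP, recommender systems, and search engines)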
advertisement nlp recommendation-system search-engine
Last synced: 02 Jul 2024
https://github.com/JackHCC/Arxiv-NLP-Reporter
Automatically fetches the latest NLP papers from arXiv every day (Arxiv Natural Language Processing Paper Automatic Crawl Daily)
Last synced: 02 Jul 2024
https://github.com/techcentaur/PyLex
Perform lexical analysis on words, one word at a time.
cli lexical-analysis nlp poets python3 scraping words
Last synced: 01 Jul 2024
https://github.com/ARBML/tkseem
Arabic Tokenization Library. It provides many tokenization algorithms.
arabic-nlp nlp tkseem tokenization
Last synced: 01 Jul 2024
https://github.com/ThuCCSLab/Awesome-LM-SSP
A reading list on large model safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
adversarial-attacks awesome-list diffusion-models jailbreak language-model llm nlp privacy safety security vlm
Last synced: 01 Jul 2024
https://github.com/datasciencecampus/pyGrams
Extracts key terminology (n-grams) from any large collection of documents (>1000) and forecasts emergence
dsc-projects emergence-calculations natural-language-processing nlp nltk patents python scikit-learn tf-idf
Last synced: 01 Jul 2024
https://github.com/rguthrie3/DeepLearningForNLPInPytorch
An IPython Notebook tutorial on deep learning for natural language processing, including structure prediction.
deep-learning lstm neural-network nlp pytorch tutorial
Last synced: 01 Jul 2024
https://github.com/LanguageMachines/PICCL
A set of workflows for corpus building through OCR, post-correction and normalisation
computational-linguistics corpus-linguistics corpus-tools folia nlp ocr workflow
Last synced: 30 Jun 2024
https://github.com/indix/whatthelang
Lightning Fast Language Prediction 🚀
fasttext language-detection languages nlp python
Last synced: 30 Jun 2024
https://github.com/davikawasaki/utfpr-ce-undergrad-final-project
UTFPR Computer Engineering Undergrad Final Project - Computing Exam Questions Classification Using Natural-Language Processing
adaptive-teaching computing-classification machine-learning natural-language-processing nlp nltk python sklearn
Last synced: 30 Jun 2024
https://github.com/makcedward/nlp
:memo: This repository recorded my NLP journey.
ai data-science deep-learning machine-learning nlp
Last synced: 30 Jun 2024
https://github.com/oxford-cs-deepnlp-2017/practical-open
Oxford Deep NLP 2017 course - Open practical
deep-learning machine-learning natural-language-processing nlp oxford
Last synced: 30 Jun 2024
https://github.com/love-irish/spellchecker
A Ruby spellchecker library that works well with Irish
Last synced: 30 Jun 2024
https://github.com/thunlp/PromptPapers
Must-read papers on prompt-based tuning for pre-trained language models.
ai bert machine-learning nlp pre-trained-language-models prompt prompt-based prompt-learning prompt-toolkit
Last synced: 30 Jun 2024
https://github.com/ku-nlp/jumanpp
Juman++ (a Morphological Analyzer Toolkit)
cjk japanese juman morphological-analyser morphological-analysis nlp part-of-speech-tagger pos-tagger pos-tagging tokenizer word-segmentation
Last synced: 30 Jun 2024
https://github.com/vi3k6i5/flashtext
Extract keywords from sentences or replace keywords in sentences.
data-extraction keyword-extraction nlp search-in-text word2vec
Last synced: 29 Jun 2024
https://github.com/anupamchugh/iowncode
A curated collection of iOS, ML, AR resources sprinkled with some UI additions
alamofire arkit computer-vision coreml coremltools ios keras ml-kit natural-language-processing nlp realitykit swift swiftui vision vision-framework
Last synced: 29 Jun 2024
https://github.com/mesejo/trex
Efficient string matching with regular expressions
keyword-extraction nlp pandas python python-library regex regular-expression search-in-text string-matching text-mining trie
Last synced: 29 Jun 2024
https://github.com/web-arena-x/webarena
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
Last synced: 29 Jun 2024
https://github.com/explosion/spacy-llm
🦙 Integrating LLMs into structured NLP pipelines
anthropic claude cohere dolly falcon gpt-3 gpt-4 large-language-models llama llm machine-learning named-entity-recognition natural-language-processing nlp openai prompt-engineering spacy text-classification
Last synced: 29 Jun 2024
https://github.com/CogComp/cogcomp-nlpy
CogComp's light-weight Python NLP annotators
data-mining natural-language-processing nlp text-mining text-processing
Last synced: 29 Jun 2024
https://github.com/rynst/awesome-llm-engineering
💻 An awesome & curated list of resources for large language model engineering (application layer: prompt engineering, fine tuning, etc.) [ Work In Progress, feel free to contribute! ]
gpt-3 machine-learning nlp prompt-engineering
Last synced: 29 Jun 2024
https://github.com/the-finai/pixiu
This repository introduces PIXIU, an open-source resource featuring the first financial large language models (LLMs), instruction tuning data, and evaluation benchmarks to holistically assess financial LLMs. Our goal is to continually push forward the open-source development of financial artificial intelligence (AI).
aifinance chatgpt fintech gpt-4 large-language-models llama machine-learning named-entity-recognition natural-language-processing nlp pixiu question-answering sentiment-analysis stock-price-prediction text-classification
Last synced: 29 Jun 2024
https://github.com/ukairia777/tensorflow-nlp-tutorial
A Deep Learning NLP repository that uses TensorFlow to cover everything from text preprocessing to downstream tasks with recent models such as Topic Models, BERT, GPT, and LLMs.
bert bert-ner dpo huggingface keras-tutorial llama llm lora named-entity-recognition natural-language-processing nlp nlp-tutorial question-answering sft tensorflow trainer transformers
Last synced: 29 Jun 2024
https://github.com/dsgiitr/d2l-pytorch
This project reproduces the book Dive Into Deep Learning (https://d2l.ai/), adapting the code from MXNet into PyTorch.
book computer-vision d2l data-science deep-learning dive-into-deep-learning mxnet nlp pytorch pytorch-implmention
Last synced: 29 Jun 2024
https://github.com/Curated-Awesome-Lists/awesome-llms-fine-tuning
Explore a comprehensive collection of resources, tutorials, papers, tools, and best practices for fine-tuning Large Language Models (LLMs). Perfect for ML practitioners and researchers!
ai awesome-list deep-learning fine-tuning gpt large-language-models llms machine-learning nlp transformers
Last synced: 29 Jun 2024
https://github.com/ART-Group-it/GASP
GASP! Dataset - Generating Abstracts of Scientific Papers from Abstracts of Cited Papers
corpus dataset machine-learning natural-language-processing nlp
Last synced: 28 Jun 2024
https://github.com/jerryji1993/DNABERT
DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome
deep-learning dnabert-model genome gpu kmer kmer-format machine-learning natural-language-processing nlp sequence
Last synced: 28 Jun 2024
https://github.com/zjunlp/OntoProtein
Code and datasets for the ICLR2022 paper "OntoProtein: Protein Pretraining With Gene Ontology Embedding"
bert gene-ontology iclr iclr2022 knowledge-graph nlp ontoprotein pretrained-models pretraining protein protein-function-prediction protein-pretraining protein-protein-interaction protein-structure-prediction pytorch
Last synced: 28 Jun 2024
https://github.com/CogStack/OpenGPT
A framework for creating grounded, instruction-based datasets and training conversational domain-expert Large Language Models (LLMs).
chatgpt gpt-4 health healthcare huggingface llm medicine nlp opengpt
Last synced: 28 Jun 2024
https://github.com/CornellNLP/ConvoKit
ConvoKit is a toolkit for extracting conversational features and analyzing social phenomena in conversations. It includes several large conversational datasets along with scripts exemplifying the use of the toolkit on these datasets.
computational-social-science conversational-ai conversational-analysis conversations dataset dialogs machine-learning nlp toolkit
Last synced: 28 Jun 2024
https://github.com/Guitaricet/relora
Official code for ReLoRA from the paper Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
deep-learning distributed-training llama nlp peft transformer
Last synced: 28 Jun 2024
https://github.com/salesforce/factualNLG
Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond"
factual-consistency factuality large-language-models llm nlp summarization
Last synced: 28 Jun 2024
https://github.com/filyp/autocorrect
Spelling corrector in Python
autocorrect autocorrection czech english languages levenshtein-distance multilanguage multilingual nlp ocr polish portuguese python russian spanish spellchecker spelling spelling-corrector turkish ukrainian
Last synced: 27 Jun 2024
https://github.com/SamEdwardes/spacytextblob
A TextBlob sentiment analysis pipeline component for spaCy.
natural-language-processing nlp python spacy
Last synced: 27 Jun 2024
https://github.com/mhezarei/ai-bot
2020 AI bot challenge (ai-bot.ir) repository. This program answers a given question with a specific format and subject.
Last synced: 27 Jun 2024
https://github.com/jxmorris12/language_tool_python
A free Python grammar checker 📝✅
grammar grammar-checker grammar-parser languagetool nlp python spellchecker
Last synced: 27 Jun 2024
https://github.com/srstevenson/keyword-extractor
Extract keywords from plain text documents
Last synced: 27 Jun 2024
https://github.com/htaghizadeh/PersianStemmer-Python
PersianStemmer-Python
information-retrieval nlp persian persian-language persian-nlp persian-stemmer stemmer
Last synced: 27 Jun 2024
https://github.com/AlirezaTheH/perke
A keyphrase extractor for Persian
data-mining data-processing information-retrieval keyphrase keyphrase-extraction keyphrase-extractor keyword keyword-extraction keyword-extractor machine-learning ml natural-language-processing nlp persian persian-language python text-mining text-processing unsupervised-learning
Last synced: 27 Jun 2024
https://github.com/kargaranamir/parstdex
A package that extracts Persian time and date markers by applying regexes -- AACL 2022
datetime event-extract event-extraction hengam hengamtagger information-extraction nlp parstdex persian persian-calendar persian-datetime persian-time regex-pattern time-date
Last synced: 27 Jun 2024
https://github.com/pooya-mohammadi/persian-spell-checker-kenlm
Complete instructions for training a Persian spell checker and a language model, based on SymSpell and KenLM respectively, using a Wikipedia dataset.
bash kenlm language-model nlp persian python spellcheck spellchecker symspell
Last synced: 27 Jun 2024
https://github.com/mohadese-yousefi/spell-correction
Simple autocorrection of misspelled words based on distance.
Last synced: 27 Jun 2024
https://github.com/minasmz/Persian-Summarization
Statistical and semantic text summarizer for the Persian language
doc2vec-model gensim nlp persian-language persian-nlp text-summarization textrank-algorithm
Last synced: 27 Jun 2024
https://github.com/johnbumgarner/wordhoard
This Python module can be used to obtain antonyms, synonyms, hypernyms, hyponyms, homophones and definitions.
antonyms bag-of-words definitions dictionary homophones hypernyms hyponyms lexicon nlp python python3 synonyms text-analysis textual-analysis wordlists wordnet wordnets wordsearch
Last synced: 27 Jun 2024
https://github.com/roshan-research/hazm
Persian NLP Toolkit
dependency-parser embeddings farsi lemmatization natural-language-processing nlp normalization persian persian-nlp pos-tagging python text-processing tokenizer
Last synced: 27 Jun 2024
https://github.com/neilgupta/Sherlock
Natural-language event parser for Javascript
datetime event-parser javascript natural-language-processing nlp regex
Last synced: 27 Jun 2024
https://github.com/theamrzaki/text_summurization_abstractive_methods
Multiple implementations of abstractive text summarization, using Google Colab
abstractive-text-summarization ai artificial-intelligence deep-learning deeplearning encoder-decoder google-colab google-colaboratory machine-learning machinelearning nlp pointer-generator policy-gradient reinforcement-learning rnn seq2seq tensorflow text-summarization word2vec
Last synced: 26 Jun 2024
https://github.com/DanAnastasyev/DeepNLP-Course
Deep NLP Course
colab-notebook deep-learning keras nlp pytorch
Last synced: 26 Jun 2024
https://github.com/PaddlePaddle/models
Officially maintained and supported by PaddlePaddle, including CV, NLP, Speech, Rec, TS, large models, and more.
computer-vision cv deep-learning models natural-language-processing neural-network nlp paddlepaddle recommendation speech
Last synced: 26 Jun 2024
https://github.com/HLasse/TextDescriptives
A Python library for calculating a large variety of metrics from text
dependency-distance descriptive-statistics nlp python readability readability-scores spacy spacy-extension statistics syntactic-analysis
Last synced: 26 Jun 2024
https://github.com/ymcui/Chinese-Mixtral
Chinese Mixtral mixture-of-experts large language models (Chinese Mixtral MoE LLMs)
32k 64k large-language-models llm mixtral mixture-of-experts moe nlp
Last synced: 26 Jun 2024
https://github.com/sanjibnarzary/awesome-llm
Curated list of open source and openly accessible large language models
alpaca bloom chatgpt chatgpt-alternative chatgpt-alternatives chatllama gpt gpt-j koala llama llm llms nlp opt peft rlhf transformer transformers vicuna xturing
Last synced: 26 Jun 2024
https://github.com/kili-technology/awesome-datasets
A comprehensive list of annotated training datasets classified by use case.
annotation awesome-data-science awesome-datasets awesome-public-datasets corpora data dataset datasets document-processing entity-extraction entity-recognition ner nlp ocr open-datasets opendata opendatasets public-data public-dataset public-datasets
Last synced: 25 Jun 2024
https://github.com/seanghay/awesome-khmer-language
A large collection of Khmer language resources. Khmer is the official language of Cambodia.
ai cambodia cambodian g2p khmer khmer-dataset khmer-language khmer-nlp khmer-research khmer-resource machine-learning nlp research segmentation seq2seq transformer
Last synced: 25 Jun 2024
https://github.com/koayon/awesome-adaptive-computation
A curated reading list of research in Adaptive Computation, Dynamic Compute & Mixture of Experts (MoE).
adaptive-computation computer-vision machine-learning mixture-of-experts nlp pytorch tensorflow transformers
Last synced: 25 Jun 2024
https://github.com/maastrichtlawtech/awesome-legal-nlp
📖 A curated list of LegalNLP resources from all around the web.
artificial-intelligence law legal-ai nlp
Last synced: 25 Jun 2024
https://github.com/banglakit/awesome-bangla
A collection of tools, datasets and resources on Bangla computing
bangla bangla-computing bengali nlp
Last synced: 25 Jun 2024
https://github.com/kennethleungty/Llama-2-Open-Source-LLM-CPU-Inference
Running Llama 2 and other open-source LLMs locally with CPU inference for document Q&A
c-transformers chatgpt cpu cpu-inference deep-learning document-qa faiss langchain language-models large-language-models llama llama-2 llm machine-learning natural-language-processing nlp open-source-llm python sentence-transformers transformers
Last synced: 25 Jun 2024
https://github.com/different-ai/obsidian-ava
Quickly format your notes with ChatGPT in Obsidian
ai computer-vision deep-learning knowledge-management language-model machine-learning natural-language-processing nlp personal-knowledge-management second-brain
Last synced: 25 Jun 2024
https://github.com/Shark-NLP/OpenICL
OpenICL is an open-source framework to facilitate research, development, and prototyping of in-context learning.
in-context-learning language-model nlp
Last synced: 25 Jun 2024
https://github.com/AliOsm/arabic-text-diacritization
Benchmark Arabic text diacritization dataset
arabic-language comparison dataset diacritization iccais nlp sequence-labeling
Last synced: 24 Jun 2024
https://github.com/AliOsm/shakkelha
Neural Arabic text diacritization
arabic-language comparison dataset diacritization ffnn nlp rnn sequence-labeling
Last synced: 24 Jun 2024
https://github.com/AliAbdelaal/ATKSpy
A Python package that supports the SOAP interface for communicating with Microsoft ATKS
arabic arabic-nlp atks microsoft natural-language-processing nlp parser pos-tagger pos-tagging python3 soap-web-services
Last synced: 24 Jun 2024
https://github.com/forzagreen/n2words
Convert numbers to their written-out form in 25+ languages.
convert-numbers language natural-language nlp
Last synced: 24 Jun 2024
https://github.com/ha-lins/MetaLearning4NLP-Papers
A list of recent papers about Meta / few-shot learning methods applied in NLP areas.
dialogue-systems few-shot-learning low-resource meta-learning nlp papers-collection semantic-parsing
Last synced: 24 Jun 2024
https://github.com/sinaahmadi/ScriptNormalization
Script Normalization for Unconventional Writing of Perso-Arabic scripts (ACL2023)
acl2023 arabic azeri gilaki gorani kashmiri kurdish kurdish-language-processing kurmanji less-resource-languages mazanderani nlp persian preprocessing script-normalization sindhi sorani turkish urdu
Last synced: 23 Jun 2024
https://github.com/airaria/Visual-Chinese-LLaMA-Alpaca
Multimodal Chinese LLaMA & Alpaca large language model (VisualCLA)
alpaca chinese llama llm lora multimodal nlp vision-language
Last synced: 23 Jun 2024
https://github.com/thunlp/OpenBackdoor
An open-source toolkit for textual backdoor attack and defense (NeurIPS 2022 D&B, Spotlight)
Last synced: 23 Jun 2024
https://github.com/grammarly/gector
Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite" (BEA-20) and "Text Simplification by Tagging" (BEA-21)
bert grammatical-error-correction natural-language-processing nlp roberta sequence-labeling text-simplification transformers xlnet
Last synced: 23 Jun 2024
https://github.com/Qznan/QizNLP
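Quick run NLP in many tasks: a TensorFlow framework for quickly running NLP tasks such as classification, sequence labeling, matching, and generation (Chinese NLP, with distributed training support)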
beam-search chinese classification horovod match nlp sequence-labeling sequence-to-sequence tensorflow
Last synced: 23 Jun 2024
https://github.com/ChenghaoMou/pytorch-pQRNN
Implementation of pQRNN in PyTorch
nlp pqrnn pytorch text-classification
Last synced: 23 Jun 2024