Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with nlp-machine-learning

A curated list of projects in awesome lists tagged with nlp-machine-learning.

https://github.com/zhaoyingjun/chatbot

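ChatGPT has made chatbots popular, and the mainstream trend has shifted to GPT-style models. This project is keeping pace and will release a GPT-style version soon. With this project and your own corpus, you can train the chatbot you want for scenarios such as intelligent customer service, online Q&A, and casual chat.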

ai chatbot nlp-machine-learning python pytorch seq2seq-chatbot seqgan seqgan-tensorflow tensorflow2

Last synced: 17 Dec 2024

https://github.com/esbatmop/MNBVC

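MNBVC (Massive Never-ending BT Vast Chinese corpus) is a very large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.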

chinese chinese-language chinese-nlp chinese-simplified corpus-data nlp nlp-machine-learning

Last synced: 03 Nov 2024

https://github.com/esbatmop/mnbvc

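MNBVC (Massive Never-ending BT Vast Chinese corpus) is a very large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.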

chinese chinese-language chinese-nlp chinese-simplified corpus-data nlp nlp-machine-learning

Last synced: 19 Dec 2024

https://github.com/cbamls/ai_tutorial

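Curated learning materials for artificial intelligence fields such as machine learning, NLP, image recognition, and deep learning, plus a compilation of technical materials on the architecture and algorithms of search, recommendation, and advertising systems. A collection of notes from algorithm experts.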

artificial-intelligence artificial-intelligence-algorithms deep-learning-tutorial deep-neural-networks elasticsearch graph-neural-networks machine-learning machine-learning-tutorials nlp-machine-learning recommender-systems search-system

Last synced: 20 Dec 2024

https://github.com/chrismattmann/tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.

buffer covid-19 detection extraction memex mime nlp nlp-library nlp-machine-learning parse parser-interface python recognition text-extraction text-recognition tika-python tika-server tika-server-jar translation-interface usc

Last synced: 17 Dec 2024

https://github.com/dengbocong/nlp-paper

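Papers in the field of natural language processing (with reading notes), along with reproduced models, data processing, and more (code is provided in both TensorFlow and PyTorch versions).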

bert dialogue nlp nlp-machine-learning paper pytorch speech tensorflow2

Last synced: 21 Dec 2024

https://github.com/DengBoCong/nlp-paper

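Papers in the field of natural language processing (with reading notes), along with reproduced models, data processing, and more (code is provided in both TensorFlow and PyTorch versions).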

bert dialogue nlp nlp-machine-learning paper pytorch speech tensorflow2

Last synced: 14 Nov 2024

https://github.com/MilaNLProc/contextualized-topic-models

A python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021.

bert embeddings multilingual-models multilingual-topic-models neural-topic-models nlp nlp-library nlp-machine-learning text-as-data topic-coherence topic-modeling transformer

Last synced: 04 Nov 2024

https://github.com/google-research/tapas

End-to-end neural table-text understanding models.

nlp-machine-learning question-answering table-parsing tensorflow

Last synced: 19 Dec 2024

https://github.com/pemistahl/lingua-rs

The most accurate natural language detection library for Rust, suitable for short text and mixed-language text

language-classification language-detection language-identification language-processing language-recognition natural-language-processing nlp nlp-machine-learning rust rust-crate rust-library

Last synced: 17 Dec 2024

https://github.com/veekaybee/what_are_embeddings

A deep dive into embeddings starting from fundamentals

embeddings machine-learning machine-learning-algorithms nlp-machine-learning

Last synced: 30 Oct 2024

https://github.com/bin123apple/autocoder

We introduced a new model designed for the Code generation task. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April 2024) and GPT-4o.

code-generation code-interpreter humaneval llm nlp nlp-machine-learning text-generation

Last synced: 09 Nov 2024

https://github.com/bin123apple/AutoCoder

We introduced a new model designed for the Code generation task. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April 2024) and GPT-4o.

code-generation code-interpreter humaneval llm nlp nlp-machine-learning text-generation

Last synced: 05 Nov 2024

https://github.com/mila-iqia/babyai

BabyAI platform. A testbed for training agents to understand and execute language commands.

imitation-learning nlp nlp-machine-learning openai-gym reinforcement-learning-environments

Last synced: 16 Dec 2024

https://github.com/michaelthwan/searchGPT

Grounded search engine (i.e. with source reference) based on LLM / ChatGPT / OpenAI API. It supports web search, file content search etc.

ai chatgpt grounded-api grounded-bot language-model llm machine-learning nlp nlp-machine-learning openai python retrieval retrieval-model

Last synced: 11 Nov 2024

https://github.com/howl-anderson/chinese_models_for_spacy

Models for SpaCy that support Chinese

chinese-nlp nlp nlp-dependency-parsing nlp-machine-learning spacy-models

Last synced: 21 Dec 2024

https://github.com/michaelthwan/searchgpt

Grounded search engine (i.e. with source reference) based on LLM / ChatGPT / OpenAI API. It supports web search, file content search etc.

ai chatgpt grounded-api grounded-bot language-model llm machine-learning nlp nlp-machine-learning openai python retrieval retrieval-model

Last synced: 09 Nov 2024

https://github.com/namuan/dr-doc-search

Converse with book - Built with GPT-3

gpt3 langchain nlp-machine-learning python summarization

Last synced: 20 Dec 2024

https://github.com/lpty/nlp_base

Basic models for natural language processing

nlp-machine-learning

Last synced: 21 Dec 2024

https://github.com/carrychang/customer_satisfaction_analysis

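An opinion mining project based on user-generated content (UGC) from online homestay listings, covering data mining and NLP processing, with tasks including data collection, topic extraction, and sentiment analysis. The goal is to overcome inconsistencies between user ratings and reviews and to evaluate satisfaction with online homestays in real time, including online review collection and sentiment visualization. The project provides a Baidu Maps POI query entry point for automated batch querying of POI information; builds an LDA automatic topic clustering model on an online homestay corpus, using topic center words to find the corresponding topic attribute dictionary; uses user ratings as labels and applies the character-level TextCNN bundled with litNlp for sentiment analysis, taking the sentiment classification probability distribution as the sentiment trend; and finally presents homestay satisfaction across regions as a POI heat map. See the link for software versions.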

customer-satisfaction-analysis litnlp nlp-machine-learning sentiment-analysis

Last synced: 21 Dec 2024

https://github.com/kunalj101/Data-Science-Hacks

Data Science Hacks consists of tips and tricks to help you become a better data scientist. Data science hacks are for everyone, from beginner to advanced. They cover Python, Jupyter Notebook, pandas hacks, and so on.

computer-vision data data-analysis data-science data-visualization dataset hacks image-augmentation ipynb machine-learning nlp nlp-machine-learning numpy pandas pandas-dataframe pandas-python pandas-tutorial python python3 tips-and-tricks

Last synced: 13 Nov 2024

https://github.com/kunalj101/data-science-hacks

Data Science Hacks consists of tips and tricks to help you become a better data scientist. Data science hacks are for everyone, from beginner to advanced. They cover Python, Jupyter Notebook, pandas hacks, and so on.

computer-vision data data-analysis data-science data-visualization dataset hacks image-augmentation ipynb machine-learning nlp nlp-machine-learning numpy pandas pandas-dataframe pandas-python pandas-tutorial python python3 tips-and-tricks

Last synced: 11 Oct 2024

https://github.com/MAIF/melusine

📧 Melusine: Use python to automatize your email processing workflow

courriels datascience emails natural-language-processing nlp nlp-machine-learning python python3

Last synced: 03 Nov 2024

https://github.com/maif/melusine

📧 Melusine: Use python to automatize your email processing workflow

courriels datascience emails natural-language-processing nlp nlp-machine-learning python python3

Last synced: 15 Dec 2024

https://github.com/bjascob/lemminflect

A python module for English lemmatization and inflection.

inflection lemmatization nlp nlp-machine-learning python spacy spacy-extensions

Last synced: 20 Dec 2024

https://github.com/gmihaila/ml_things

This is where I put things I find useful that speed up my work with Machine Learning. Ever looked in your old projects to reuse those cool functions you created before? Well, this repo is designed to be a Python Library of functions I created in my previous project that can be reused. I also share some Notebooks Tutorials and Python Code Snippets.

google-colab machine-learning nlp nlp-machine-learning notebooks python-snippets pytorch snippets transformer

Last synced: 15 Dec 2024

https://github.com/abelriboulot/onnxt5

Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.

inference nlp nlp-machine-learning onnx onnxruntime sentiment-analysis summarization text-classification text-generation transformer transformers translation

Last synced: 07 Nov 2024

https://github.com/DevinZ1993/Chinese-Poetry-Generation

An RNN-based Chinese Poem Generator

nlp-machine-learning tensorflow

Last synced: 14 Nov 2024

https://github.com/neomatrix369/nlp_profiler

A simple NLP library that allows profiling datasets with one or more text columns. When given a dataset and a column name containing text data, NLP Profiler will return either high-level insights or low-level/granular statistical information about the text in that column.

google-colab grammar-checks hacktoberfest jupyter kaggle-kernels natural-language-processing nlp nlp-keywords-extraction nlp-library nlp-machine-learning nlp-parsing nlp-profiler profiler profiling profiling-datasets text-mining

Last synced: 21 Dec 2024

https://github.com/prakhar21/TextAugmentation-GPT2

Fine-tuned pre-trained GPT2 for custom topic-specific text generation. Such a system can be used for text augmentation.

gpt-2 natural-language-generation natural-language-processing nlp-machine-learning text-augmentation textclassification transformer-architecture

Last synced: 15 Nov 2024

https://github.com/hamelsmu/ktext

Utilities for preprocessing text for deep learning with Keras

deep-learning keras machine-learning ml-infrastructure nlp-machine-learning python-3

Last synced: 16 Dec 2024

https://github.com/chambliss/multilingual_ner

Applying BERT to named entity recognition in English and Russian.

bert english-language named-entity-recognition nlp-machine-learning pytorch russian-language spacy

Last synced: 19 Dec 2024

https://github.com/Yachay-AI/byt5-geotagging

Confidence- and ByT5-based geotagging model predicting coordinates from text alone.

coordinates deep-learning geo-location geotagging machine-learning neural-network nlp nlp-machine-learning python pytorch transformers

Last synced: 05 Nov 2024

https://github.com/chewxy/lingo

package lingo provides the data structures and algorithms required for natural language processing

conll-u go golang inflection language-model natural-language-processing nlp nlp-dependency-parsing nlp-library nlp-machine-learning nlp-parsing part-of-speech part-of-speech-tagger

Last synced: 19 Dec 2024

https://github.com/zcgzcgzcg1/MRC_book

Code for the book "Machine Reading Comprehension: Algorithms and Practice"

machine-reading-comprehension nlp-machine-learning

Last synced: 06 Nov 2024

https://github.com/howl-anderson/microtokenizer

MicroTokenizer: A lightweight yet full-featured Chinese tokenizer designed for educational and research purposes, helping students understand how tokenizers work. Provides a practical, hands-on approach to understanding NLP concepts, featuring multiple tokenization algorithms and customizable models. Ideal for students, researchers, and NLP enthusiasts.

chinese-nlp chinese-tokenizer chinese-word-segmentation dag-network educational-project nlp-machine-learning tokenizer

Last synced: 16 Dec 2024

https://github.com/hamelsmu/seq2seq_tutorial

Code For Medium Article "How To Create Data Products That Are Magical Using Sequence-to-Sequence Models"

data-science deep-learning deeplearning keras keras-tutorials machine-learning medium-article nlp-machine-learning rnn-encoder-decoder seq2seq-tutorial sequence-to-sequence

Last synced: 27 Oct 2024

https://github.com/hamelsmu/Seq2Seq_Tutorial

Code For Medium Article "How To Create Data Products That Are Magical Using Sequence-to-Sequence Models"

data-science deep-learning deeplearning keras keras-tutorials machine-learning medium-article nlp-machine-learning rnn-encoder-decoder seq2seq-tutorial sequence-to-sequence

Last synced: 29 Oct 2024

https://github.com/boudinfl/ake-datasets

Large, curated set of benchmark datasets for evaluating automatic keyphrase extraction algorithms.

benchmarking datasets information-retrieval keyphrase-extraction keyphrase-generation keyword-extraction natural-language-processing nlp nlp-machine-learning

Last synced: 14 Oct 2024

https://github.com/moritzlaurer/gpt-google-sheets

Code and documentation for running generative LLMs like ChatGPT or GPT4 in google sheets without any coding knowledge. Transform unstructured text to structured data.

chatgpt gpt3 gpt4 nlp nlp-machine-learning

Last synced: 28 Nov 2024

https://github.com/kavgan/phrase-at-scale

Detect common phrases in large amounts of text using a data-driven approach. Size of discovered phrases can be arbitrary. Can be used in languages other than English

collocation-extraction multiword-expressions multiword-extraction natural-language-processing nlp nlp-machine-learning phrase-discovery phrase-extraction pyspark spark

Last synced: 30 Oct 2024

https://github.com/gauravbh1010tt/dl-text

Text pre-processing library for deep learning (Keras, tensorflow).

deep-learning nlp-machine-learning

Last synced: 09 Nov 2024

https://github.com/jxwuyi/atnre

Adversarial Training for Neural Relation Extraction

adversarial-machine-learning nlp-machine-learning relation-extraction tensorflow-experiments

Last synced: 12 Nov 2024

https://github.com/ibrahimsharaf/doc2vec

📓 Long(er) text representation and classification using Doc2Vec embeddings

doc2vec gensim nlp-machine-learning scikit-learn sentiment-analysis text-classification

Last synced: 15 Dec 2024

https://github.com/algorithmica-repository/datascience

Examples and assignments discussed in the data science course taught at Algorithmica.

algorithms coding-interview-challenges datastructures deep-learning java machine-learning nlp-machine-learning problem-solving python

Last synced: 27 Nov 2024

https://github.com/MoritzLaurer/GPT-google-sheets

Code and documentation for running generative LLMs like ChatGPT or GPT4 in google sheets without any coding knowledge. Transform unstructured text to structured data.

chatgpt gpt3 gpt4 nlp nlp-machine-learning

Last synced: 18 Nov 2024

https://github.com/google-research-datasets/wiki-atomic-edits

A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

deep-learning deep-neural-networks nlp nlp-machine-learning wikipedia

Last synced: 08 Nov 2024

https://github.com/ropensci-archive/monkeylearn

⛔ ARCHIVED ⛔ Accesses the Monkeylearn API for Text Classifiers and Extractors

classifier extractor monkeylearn nlp nlp-machine-learning peer-reviewed r r-package rstats

Last synced: 25 Oct 2024

https://github.com/fdalvi/neurox

A Python library that encapsulates various methods for neuron interpretation and analysis in Deep NLP models.

explainable-ai natural-language-processing neurons nlp nlp-machine-learning

Last synced: 18 Dec 2024

https://github.com/fdalvi/NeuroX

A Python library that encapsulates various methods for neuron interpretation and analysis in Deep NLP models.

explainable-ai natural-language-processing neurons nlp nlp-machine-learning

Last synced: 17 Nov 2024

https://github.com/maxoodf/russian_news_corpus

Russian mass media stemmed texts corpus: a corpus of lemmatized (morphologically normalized) texts from Russian mass media

articles corpus machine-learning ml nlp nlp-machine-learning russian text word2vec

Last synced: 07 Dec 2024

https://github.com/codelucas/cracking-the-da-vinci-code-with-google-interview-problems-and-nlp-in-python

A guide on how to crack combinatorics puzzles shown in The Da Vinci Code movie using CS fundamentals and NLP

combinatorics interview-questions nlp nlp-machine-learning python

Last synced: 15 Nov 2024

https://github.com/pragativerma18/mlh-quizzet

This is a smart Quiz Generator that generates a dynamic quiz from any uploaded text/PDF document using NLP. This can be used for self-analysis, question paper generation, and evaluation, thus reducing human effort.

css flask-application hackathon html html-css-javascript javascript jinja2 machine-learning mlh mlh-fellowship nlp nlp-machine-learning python question-answering question-generation question-generator quiz upload-file

Last synced: 16 Dec 2024

https://github.com/google-research-datasets/query-wellformedness

25,100 queries from the Paralex corpus (Fader et al., 2013) annotated with human ratings of whether they are well-formed natural language questions.

deep-learning deep-neural-networks information-retrieval nlp nlp-machine-learning search-engine

Last synced: 08 Nov 2024

https://github.com/ars-linguistica/mlconjug3

A Python library to conjugate verbs in French, English, Spanish, Italian, Portuguese and Romanian (more soon) using Machine Learning techniques.

conjugation conjugator devops linguistics machine-learning nlp nlp-library nlp-machine-learning python3 test-driven-development

Last synced: 20 Dec 2024