https://github.com/ankitamungalpara/huggingface-nlp

This repository introduces the 🤗 Transformers library, covering how Transformer models work, how to fine-tune them on a dataset, and how to share the results. You’ll learn to handle classic NLP tasks using 🤗 Datasets and 🤗 Tokenizers. Finally, it explores advanced applications in speech processing and computer vision, preparing you to build and optimize models for diverse ML problems.

huggingface-transformers nlp python3 pytorch tensorflow


# Natural Language Processing (NLP) with Hugging Face

This repository provides a comprehensive guide to using 🤗 Transformers.

## Modules

### Module 1: Fundamentals of 🤗 Transformers: From Basics to Fine-Tuning

Learn the fundamental concepts of the 🤗 Transformers library, including how Transformer models function. By the end of this section, you’ll know how to utilize a model from the Hugging Face Hub, fine-tune it on a dataset, and share your results.

- [01. Introduction to Transformers Pipeline](https://github.com/AnkitaMungalpara/HuggingFace-NLP/blob/main/00_Transformers_Pipeline_Introduction.ipynb)

- [02. Transformer Pipelines: Behind the Scenes](https://github.com/AnkitaMungalpara/HuggingFace-NLP/blob/main/01_Behind_the_scenes_pipeline.ipynb)

- [03. Models and Tokenizers](https://github.com/AnkitaMungalpara/HuggingFace-NLP/blob/main/02_Transformers_Models_and_Tokenizers.ipynb)

- [04. Handling Multiple Sequences](https://github.com/AnkitaMungalpara/HuggingFace-NLP/blob/main/03_Handling_Multiple_Sequences_Transformers.ipynb)
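
The idea behind handling multiple sequences can be illustrated without any library: sequences of different lengths are padded to a common length, and an attention mask marks which positions hold real tokens. The sketch below is a standard-library stand-in, not code from the notebooks; the token IDs, `PAD_ID`, and the helper `pad_batch` are made-up names for illustration. A 🤗 tokenizer called with `padding=True` returns the same two fields, `input_ids` and `attention_mask`.

```python
# Illustrative only: the token IDs and PAD_ID are made-up values,
# not taken from a real vocabulary.
PAD_ID = 0


def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad variable-length ID lists and build matching attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        # Real tokens first, then padding up to the longest sequence.
        input_ids.append(list(seq) + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = pad_batch([[101, 7592, 2088, 102], [101, 7592, 102]])
print(batch["input_ids"])
print(batch["attention_mask"])
```

Without the mask, the padded position would take part in attention and could change the model's output for the shorter sequence, which is why the two fields are always passed together.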

### Module 2: Fundamentals of 🤗 Datasets and Tokenizers for NLP

This module covers the foundational concepts of working with 🤗 Datasets and 🤗 Tokenizers, preparing you to solve common NLP tasks independently.

- [01. Processing Large Data](https://github.com/AnkitaMungalpara/HuggingFace-NLP/blob/main/04_Processing_Data_Hugging_Face_Transformers.ipynb)

- [02. Full Training with GPU and Accelerator](https://github.com/AnkitaMungalpara/HuggingFace-NLP/blob/main/06_Full_Training_HuggingFace_Transformers.ipynb)

- [03. Datasets in HuggingFace](https://github.com/AnkitaMungalpara/HuggingFace-NLP/blob/main/07_Datasets_in_HuggingFace.ipynb)

- [04. Semantic Search with FAISS](https://github.com/AnkitaMungalpara/HuggingFace-NLP/blob/main/08_Semantic_Search_with_FAISS.ipynb)

| Section | Description | Links |
| --- | --- | --- |
| Using embeddings for semantic search | Introduction to building a semantic search engine using embeddings. | Transformers Documentation |
| Loading and Preparing Dataset | Loading the GitHub Issues dataset and filtering out pull requests to focus on issues with comments. | GitHub Issues Dataset |
| Creating Text Embeddings | Using the sentence-transformers library to create embeddings for text data, with a focus on pooling techniques. | Sentence-Transformers Documentation |
| Using FAISS for Efficient Similarity Search | Implementing FAISS to create an index for fast similarity searches on the embeddings and conducting nearest neighbor searches. | FAISS Documentation |

- [05. Training New Tokenizer](https://github.com/AnkitaMungalpara/HuggingFace-NLP/blob/main/09_Training_new_tokenizer.ipynb)
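
The two core steps of the semantic search notebook listed above, pooling token embeddings into one vector per text and then running a nearest-neighbor search, can be sketched with the standard library alone. This is a conceptual stand-in under stated assumptions: the vectors are toy values rather than model outputs, the helpers `mean_pool`, `cosine`, and `nearest` are hypothetical names, and the search is an exact brute-force scan rather than a FAISS index.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions whose mask is 0 (padding)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    return [t / max(count, 1) for t in total]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query, corpus, k=2):
    """Exact search: score every corpus vector and return the top k as (score, index)."""
    scored = [(cosine(query, vec), idx) for idx, vec in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy data: three token vectors, the last one is padding and is ignored.
pooled = mean_pool([[1.0, 0.0], [3.0, 2.0], [9.0, 9.0]], [1, 1, 0])  # [2.0, 1.0]
corpus = [[2.0, 1.0], [-1.0, 0.5], [0.0, 1.0]]
hits = nearest(pooled, corpus, k=2)
print([idx for _, idx in hits])
```

A brute-force scan like this costs one comparison per stored vector for every query. Libraries such as FAISS exist to make that search fast on large collections, so the sketch only shows what is being computed, not how to compute it at scale.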