https://github.com/ankitamungalpara/huggingface-nlp
This repository introduces the 🤗 Transformers library, covering Transformer models, fine-tuning on datasets, and result sharing. You’ll learn to handle classic NLP tasks using 🤗 Datasets and 🤗 Tokenizers. Finally, explore advanced applications in speech processing and computer vision, preparing to build and optimize models for diverse ML.
- Host: GitHub
- URL: https://github.com/ankitamungalpara/huggingface-nlp
- Owner: AnkitaMungalpara
- Created: 2024-08-06T21:50:47.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-26T18:57:35.000Z (over 1 year ago)
- Last Synced: 2025-02-26T16:49:44.859Z (over 1 year ago)
- Topics: huggingface-transformers, nlp, python3, pytorch, tensorflow
- Language: Jupyter Notebook
- Homepage:
- Size: 258 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Natural Language Processing (NLP) with Hugging Face
This repository provides a comprehensive guide to using 🤗 Transformers.
## Modules
### Module 1: Fundamentals of 🤗 Transformers: From Basics to Fine-Tuning
Learn the fundamental concepts of the 🤗 Transformers library, including how Transformer models function. By the end of this section, you’ll know how to utilize a model from the Hugging Face Hub, fine-tune it on a dataset, and share your results.
- [01. Introduction to Transformers Pipeline](https://github.com/AnkitaMungalpara/HuggingFace-NLP/blob/main/00_Transformers_Pipeline_Introduction.ipynb)
- [02. Transformer Pipelines: Behind the Scenes](https://github.com/AnkitaMungalpara/HuggingFace-NLP/blob/main/01_Behind_the_scenes_pipeline.ipynb)
- [03. Models and Tokenizers](https://github.com/AnkitaMungalpara/HuggingFace-NLP/blob/main/02_Transformers_Models_and_Tokenizers.ipynb)
- [04. Handling Multiple Sequences](https://github.com/AnkitaMungalpara/HuggingFace-NLP/blob/main/03_Handling_Multiple_Sequences_Transformers.ipynb)
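Handling multiple sequences usually comes down to padding a batch to a common length and telling the model which positions are padding. The notebook contents are not reproduced here, so the following is only a minimal pure-Python sketch of that idea; `pad_batch`, the pad id of `0`, and the token ids are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

# Assumption: 0 is the padding id. With a real tokenizer, read it from
# tokenizer.pad_token_id instead of hard-coding it.
PAD_TOKEN_ID = 0


def pad_batch(
    sequences: List[List[int]], pad_id: int = PAD_TOKEN_ID
) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad token-id sequences to a common length and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


# Hypothetical token ids for two sentences of different lengths.
batch = [[101, 2023, 2003, 102], [101, 102]]
input_ids, attention_mask = pad_batch(batch)
```

In practice a 🤗 tokenizer called with `padding=True` returns both `input_ids` and `attention_mask`, so a helper like this is mainly useful for seeing what the tokenizer does on your behalf.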
### Module 2: Fundamentals of 🤗 Datasets and Tokenizers for NLP
This module covers the foundational concepts of working with 🤗 Datasets and 🤗 Tokenizers, preparing you to solve common NLP tasks independently.
- [01. Processing Large Data](https://github.com/AnkitaMungalpara/HuggingFace-NLP/blob/main/04_Processing_Data_Hugging_Face_Transformers.ipynb)
- [02. Full Training with GPU and Accelerator](https://github.com/AnkitaMungalpara/HuggingFace-NLP/blob/main/06_Full_Training_HuggingFace_Transformers.ipynb)
- [03. Datasets in HuggingFace](https://github.com/AnkitaMungalpara/HuggingFace-NLP/blob/main/07_Datasets_in_HuggingFace.ipynb)
- [04. Semantic Search with FAISS](https://github.com/AnkitaMungalpara/HuggingFace-NLP/blob/main/08_Semantic_Search_with_FAISS.ipynb)

| Section | Description | Links |
| --- | --- | --- |
| Using embeddings for semantic search | Introduction to building a semantic search engine using embeddings. | Transformers Documentation |
| Loading and Preparing Dataset | Loading the GitHub Issues dataset and filtering out pull requests to focus on issues with comments. | GitHub Issues Dataset |
| Creating Text Embeddings | Using the sentence-transformers library to create embeddings for text data, with a focus on pooling techniques. | Sentence-Transformers Documentation |
| Using FAISS for Efficient Similarity Search | Implementing FAISS to create an index for fast similarity searches on the embeddings and conducting nearest neighbor searches. | FAISS Documentation |

- [05. Training New Tokenizer](https://github.com/AnkitaMungalpara/HuggingFace-NLP/blob/main/09_Training_new_tokenizer.ipynb)
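
The semantic search notebook (04) is described as pooling token embeddings into one vector per text and then running nearest neighbor searches over those vectors. The sketch below illustrates both steps in plain Python under stated assumptions: the embeddings are made-up 2-dimensional numbers rather than model outputs, and `search` is a hypothetical exhaustive cosine-similarity scan standing in for a FAISS index, not the FAISS API.

```python
import math
from typing import List, Tuple

Vector = List[float]


def mean_pool(token_embeddings: List[Vector], attention_mask: List[int]) -> Vector:
    """Average token vectors, skipping positions where the mask is 0 (padding)."""
    total = [0.0] * len(token_embeddings[0])
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    return [value / max(count, 1) for value in total]


def normalize(vec: Vector) -> Vector:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def search(index: List[Vector], query: Vector, k: int) -> List[Tuple[float, int]]:
    """Return the k (score, position) pairs with the highest cosine similarity."""
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(normalize(doc), q)), i)
        for i, doc in enumerate(index)
    ]
    return sorted(scored, reverse=True)[:k]


# Made-up token embeddings and attention masks for three short texts.
docs = [
    ([[1.0, 0.0], [1.0, 0.0], [9.0, 9.0]], [1, 1, 0]),  # last token is padding
    ([[0.0, 2.0], [0.0, 4.0]], [1, 1]),
    ([[1.0, 1.0]], [1]),
]
index = [mean_pool(tokens, mask) for tokens, mask in docs]
results = search(index, query=[2.0, 0.1], k=2)
top_positions = [position for _, position in results]
```

An exhaustive scan like this compares the query against every stored vector, which is fine for a handful of texts; a dedicated index such as FAISS is what makes the same lookup practical on large collections.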