Text Generation with TensorFlow from Scratch
https://github.com/asiff00/text-generation-with-tensorflow-from-scratch

Text Generation with TensorFlow from Scratch: Understanding the Basics
======================================================================

[![Open In Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/code/asif00/text-generation-with-tensorflow-nlp-rnn) [![TensorFlow](https://img.shields.io/badge/TensorFlow-%23FF6F00.svg?style=for-the-badge&logo=TensorFlow&logoColor=white)](https://www.tensorflow.org/) [![Jupyter Notebook](https://img.shields.io/badge/jupyter-%23FA0F00.svg?style=for-the-badge&logo=jupyter&logoColor=white)](https://jupyter.org/) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

Large language models have come a long way, and modern LLMs can produce fluent text on almost any topic. At their core, however, they are text generators that predict the next likely word in a sequence. This notebook covers the fundamental concepts of text generation in order: building a corpus, tokenization, embeddings, padding, N-grams, and finally generating text.
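To make "predicting the next likely word" concrete, here is a minimal sketch that uses plain bigram counts instead of a neural network. It is not the notebook's code: the toy corpus and the helper names `build_bigram_counts` and `generate` are assumptions made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, invented for this illustration.
corpus = "the cat sat on the mat . the cat ate the fish ."


def build_bigram_counts(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def generate(counts, seed, num_words):
    """Greedily append the most frequent follower of the last word."""
    output = [seed]
    for _ in range(num_words):
        followers = counts.get(output[-1])
        if not followers:
            break  # the last word was never seen with a continuation
        output.append(followers.most_common(1)[0][0])
    return " ".join(output)


counts = build_bigram_counts(corpus)
print(generate(counts, "sat", 3))  # sat on the cat
```

A neural text generator follows the same loop of predicting one word, appending it, and repeating, but replaces the count table with a learned model that can generalize beyond word pairs it has seen.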

Concepts Covered
----------------

* **Corpus**: A large collection of text that the model is trained on.
* **Tokenization**: The process of breaking text into smaller pieces, or 'tokens', and mapping each one to an integer ID that the model can process.
* **Embeddings**: Numerical vectors that represent tokens, allowing the model to capture semantic relationships between words.
* **Padding**: A technique that fills shorter sequences with a placeholder value so that all sequences in a batch have the same length.
* **N-grams**: Contiguous sequences of 'n' items from a given sample of text, which are used as examples for predicting the next word in a sequence.
* **Text Generation**: The process of using a trained model to generate new text.
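The data-preparation concepts in this list can be sketched in plain Python. This is an illustrative stand-in rather than the notebook's implementation (which may rely on TensorFlow utilities instead). The three-line corpus and the helpers `build_vocab`, `ngram_sequences`, and `pad_left` are hypothetical names introduced here. Embeddings and the model itself are left out.

```python
# Hypothetical three-line corpus, invented for this illustration.
corpus = [
    "the sun rises in the east",
    "the sun sets in the west",
    "stars shine at night",
]


def build_vocab(lines):
    """Tokenization: map each distinct word to an integer ID (0 is reserved for padding)."""
    vocab = {}
    for line in lines:
        for word in line.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def ngram_sequences(lines, vocab):
    """N-grams: turn each line into growing prefixes [w1, w2], [w1, w2, w3], ..."""
    sequences = []
    for line in lines:
        ids = [vocab[word] for word in line.lower().split()]
        for end in range(2, len(ids) + 1):
            sequences.append(ids[:end])
    return sequences


def pad_left(sequences, pad_id=0):
    """Padding: prepend pad_id so that every sequence has the same length."""
    max_len = max(len(seq) for seq in sequences)
    return [[pad_id] * (max_len - len(seq)) + seq for seq in sequences]


vocab = build_vocab(corpus)
padded = pad_left(ngram_sequences(corpus, vocab))

# The last token of each row is the word to predict; the rest is the input.
inputs = [row[:-1] for row in padded]
labels = [row[-1] for row in padded]

print(padded[0])             # [0, 0, 0, 0, 1, 2]
print(inputs[0], labels[0])  # [0, 0, 0, 0, 1] 2
```

Padding on the left keeps the most recent words next to the label, which is a common choice when the model reads the sequence in order and predicts what comes last.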

Simplified Explanations
-----------------------

The notebook explains each of these concepts in plain language, with the goal of giving a clear understanding of the basics of text generation.

Built with TensorFlow
---------------------

This notebook uses TensorFlow because it keeps the code easy to follow. TensorFlow is a popular open-source machine learning framework for building and deploying machine learning models.

Run the Code on Kaggle
----------------------

You can run the code in this notebook on Kaggle at the following link: [Text Generation with TensorFlow NLP RNN](https://www.kaggle.com/code/asif00/text-generation-with-tensorflow-nlp-rnn)

Happy Learning!
---------------

I hope this notebook helps you understand the concepts of text generation better. Enjoy learning!

License
-------

This project is licensed under the MIT License - see the [LICENSE](https://opensource.org/licenses/MIT) file for details.