https://github.com/truongd3/simple-language-models
- Host: GitHub
- URL: https://github.com/truongd3/simple-language-models
- Owner: truongd3
- Created: 2023-08-10T20:08:15.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-08-10T20:35:15.000Z (almost 2 years ago)
- Last Synced: 2025-01-22T02:31:43.079Z (4 months ago)
- Language: Jupyter Notebook
- Size: 790 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Simple Language Models
**Simple language model** is a model that predicts the next unit of text. A statistical model known as a language model is employed to estimate **the likelihood of a word sequence** appearing in a particular language. This model is trained on extensive text collections and serves the purpose of generating fresh text or assessing the probability of a specified word sequence.
For example, the system receives *characters*, *words*, or *sentences* as inputs, then computes a probability distribution in order to output the predicted *characters*, *words*, or *sentences*. It is easy to understand, but it's not easy to build (personally hehe).
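More formally (this is the standard formulation, not something specific to this repo), a language model factors the probability of a token sequence with the chain rule, and a bigram model keeps only the previous token in each factor:

$$P(w_1, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1}) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-1})$$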
Below are some general stages for training a language model:
- Process and sanitize the text data.
- Convert the textual content into a numerical representation.
- Establish the model's structure: pick the neural network type, and set parameters like layer count, hidden units, and activation functions.
- Specify the loss function: options include max-margin, negative log-likelihood, cross-entropy loss, RMSE, etc.
- Train the model: initialize model parameters, define the optimizer, and execute the training loop for a specific number of epochs to minimize the loss function.
- Adjust parameters: fine-tune hyperparameters or the model itself based on gradients derived from the loss function.
- Assess the model's performance.
- Generate novel text.

## Project's Details:
- In **Part 1**, the requirement is to build a neural network called a **Multi-Layer Perceptron (MLP)** using `JAX Autodiff`. This model follows the concept of updating the parameters by calculating their gradients. The `Autodiff` applies the chain rule as I step-by-step dot-multiply the matrices of weights and biases (see the sketch after this list).
- In **Part 2**, the requirement is to build a **Bigram Language Model**. Again, as mentioned, the concept is easy to understand but very challenging to implement. A **Bigram Language Model** counts the frequency of every pair of **2** adjacent characters (or words) in the corpus and then calculates a probability distribution from those counts. From that distribution, we are able to generate further characters (or words) to create more of them.
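Below is a minimal sketch of the Part 1 idea: a small MLP whose parameters are updated with gradients from `jax.grad`. The layer sizes, loss choice, and all function names here are illustrative assumptions, not the repo's actual code.

```python
import jax
import jax.numpy as jnp

def init_params(key, sizes):
    """Randomly initialize a (weights, biases) pair per layer (illustrative sizes)."""
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (n_in, n_out)) * 0.01, jnp.zeros(n_out)))
    return params

def mlp(params, x):
    """Forward pass: dot-multiply by each weight matrix, add the bias, apply tanh."""
    for W, b in params[:-1]:
        x = jnp.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b  # logits over the vocabulary

def loss_fn(params, x, y):
    """Cross-entropy loss (one of the loss options listed above)."""
    log_probs = jax.nn.log_softmax(mlp(params, x))
    return -jnp.mean(log_probs[jnp.arange(y.shape[0]), y])

@jax.jit
def update(params, x, y, lr=0.1):
    """One SGD step: jax.grad applies the chain rule through the whole forward pass."""
    grads = jax.grad(loss_fn)(params, x, y)
    return [(W - lr * dW, b - lr * db) for (W, b), (dW, db) in zip(params, grads)]
```

Note that `update` returns new parameters instead of mutating them, which is the functional style JAX expects.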
## Bi-character Frequency
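A rough sketch of how such a bi-character frequency table can be built and normalized into probabilities; the helper names and the toy corpus are hypothetical, not taken from the notebook.

```python
from collections import Counter

def bigram_counts(corpus):
    """Count how often each pair of adjacent characters appears."""
    counts = Counter()
    for word in corpus:
        chars = ["<S>"] + list(word) + ["<E>"]  # start/end markers
        for a, b in zip(chars, chars[1:]):
            counts[(a, b)] += 1
    return counts

def bigram_probs(counts):
    """Normalize the counts into P(next character | current character)."""
    totals = Counter()
    for (a, _), c in counts.items():
        totals[a] += c
    return {(a, b): c / totals[a] for (a, b), c in counts.items()}

# Toy example with a hypothetical corpus:
counts = bigram_counts(["emma", "olivia", "ava"])
probs = bigram_probs(counts)
print(probs[("a", "<E>")])  # 0.75: how often a word ends right after 'a' in this toy corpus
```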
