Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/savinrazvan/attention
Visualize BERT's attention mechanism with a user-friendly script. Input text with a masked token, predict the masked word, and generate attention diagrams to understand BERT's focus. Ideal for AI enthusiasts and NLP researchers.
- Host: GitHub
- URL: https://github.com/savinrazvan/attention
- Owner: SavinRazvan
- License: mit
- Created: 2024-07-30T22:44:55.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-08-03T06:23:28.000Z (4 months ago)
- Last Synced: 2024-08-03T07:31:24.431Z (4 months ago)
- Topics: ai, attention-mechanism, bert, cs50, data-visualization, harvard-cs50, masked-language-model, natural-language-processing, nlp, python, transformer
- Language: Jupyter Notebook
- Homepage:
- Size: 2.8 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# Attention
This project demonstrates how to visualize the attention mechanism of a pre-trained BERT model. The script lets users input text containing a `[MASK]` token, retrieves the top predictions for the masked word, and generates attention diagrams showing the attention weights across different layers and heads.
## Description
This project uses the BERT (Bidirectional Encoder Representations from Transformers) model to predict masked words within a text sequence. BERT is a transformer-based language model developed by Google, trained to predict a masked word based on its surrounding context. The project involves two main tasks:
1. **Masked Word Prediction**: Using the Hugging Face transformers library, the program predicts masked words in an input text. The user inputs text containing a `[MASK]` token, and the program outputs the top predictions for the masked word.
2. **Attention Visualization**: The program generates diagrams visualizing attention scores for each of BERT's 144 attention heads (12 layers × 12 heads). These diagrams help users understand which words BERT focuses on when making predictions; a minimal sketch of both tasks follows this list.
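Both tasks follow the standard Hugging Face `transformers` workflow. The sketch below is an illustration rather than the repository's actual code: the TensorFlow model class, the `bert-base-uncased` checkpoint, and the value of `K` are assumptions, and the notebook may use the PyTorch API or different settings.
```python
# Minimal sketch: predict a [MASK] token and pull out BERT's attention weights.
# Assumes the TensorFlow variant of Hugging Face transformers; the repository's
# notebook may differ in model class, checkpoint, and configuration.
import tensorflow as tf
from transformers import AutoTokenizer, TFBertForMaskedLM

MODEL = "bert-base-uncased"   # assumed checkpoint
K = 3                         # number of top predictions to show

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = TFBertForMaskedLM.from_pretrained(MODEL, output_attentions=True)

text = "We turned down a narrow lane and passed through a small [MASK]."
inputs = tokenizer(text, return_tensors="tf")

# Locate the [MASK] position in the tokenized input
mask_index = int(tf.where(inputs["input_ids"][0] == tokenizer.mask_token_id)[0][0])

outputs = model(**inputs)

# Task 1: top-K predictions for the masked position
logits = outputs.logits[0, mask_index]
for token_id in tf.math.top_k(logits, K).indices.numpy():
    print(text.replace(tokenizer.mask_token, tokenizer.decode([token_id])))

# Task 2: attention weights -- a tuple with one tensor per layer, each of shape
# (batch, num_heads, seq_len, seq_len); 12 layers x 12 heads = 144 heads total
attentions = outputs.attentions
print(f"{len(attentions)} layers, {attentions[0].shape[1]} heads per layer")
```
Running something like this prints `K` completed sentences for the masked position and confirms that the model exposes 12 layers of 12 heads each, which is where the 144 attention diagrams come from.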
## Features
- **Masked Word Prediction**: Predicts the masked word in a text sequence using BERT.
- **Attention Diagrams**: Generates and saves visualizations of attention scores for each attention head across all layers.
- **Configurable Parameters**: Allows customization of the number of predictions, font settings, grid size, and output options.
- **Console Output**: Optionally prints predictions and attention values to the console.
- **Output Directory**: Saves generated diagrams to a specified output folder.

## Goal
The goal of this project is to provide insights into the inner workings of the BERT model by visualizing its attention mechanism. By analyzing the attention diagrams, users can gain a better understanding of what BERT pays attention to when processing language, which can be particularly useful for debugging and improving language models.
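To make the idea of an attention diagram concrete, the snippet below sketches one way to render a single head's attention matrix as a grayscale grid with Pillow, where brighter cells indicate stronger attention. The function and constant names (`generate_diagram`, `BASE_GRID_SIZE`, `PIXELS_PER_WORD`) are hypothetical and only echo the configuration parameters listed later; the repository's implementation may differ.
```python
# Hypothetical sketch: render one attention head as a grayscale grid image.
# Constant and function names mirror the configuration parameters described
# below, but the repository's actual implementation may differ.
import os
from PIL import Image, ImageDraw

BASE_GRID_SIZE = 40     # side length of each attention cell, in pixels (assumed)
PIXELS_PER_WORD = 200   # margin reserved for token labels (assumed)

def generate_diagram(layer, head, tokens, attention_weights, out_dir="output"):
    """Draw attention_weights[i][j] (0..1) as a grid; brighter = more attention."""
    n = len(tokens)
    size = PIXELS_PER_WORD + n * BASE_GRID_SIZE
    img = Image.new("RGBA", (size, size), "black")
    draw = ImageDraw.Draw(img)

    # Token labels along the left edge (rows) and the top edge (columns)
    for i, token in enumerate(tokens):
        draw.text((10, PIXELS_PER_WORD + i * BASE_GRID_SIZE + 10), token, fill="white")
        draw.text((PIXELS_PER_WORD + i * BASE_GRID_SIZE + 10, 10), token, fill="white")

    # One grid cell per (row token attends to column token) pair
    for i in range(n):
        for j in range(n):
            shade = int(attention_weights[i][j] * 255)
            x = PIXELS_PER_WORD + j * BASE_GRID_SIZE
            y = PIXELS_PER_WORD + i * BASE_GRID_SIZE
            draw.rectangle((x, y, x + BASE_GRID_SIZE, y + BASE_GRID_SIZE),
                           fill=(shade, shade, shade, 255))

    os.makedirs(out_dir, exist_ok=True)
    img.save(os.path.join(out_dir, f"Attention_Layer{layer}_Head{head}.png"))
```
Iterating a helper like this over every layer and head, using the attention weights from the earlier sketch, would produce the full set of 144 diagrams.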
## Getting Started
1. **Installation**:
- Clone the repository and navigate to the project directory.
- Create a virtual environment and activate it.
- Install the required packages using `pip install -r requirements.txt`.
2. **Running the Script**:
- Execute the script using `python script.py`.
- Input the text with the `[MASK]` token when prompted.
- View the predictions and attention diagrams as per the configured output options.

## Example Usage
```bash
$ python script.py
Text: We turned down a narrow lane and passed through a small [MASK].
Top Predictions:
1. We turned down a narrow lane and passed through a small field.
2. We turned down a narrow lane and passed through a small clearing.
3. We turned down a narrow lane and passed through a small park.
```

## Configuration Parameters
- **MODEL**: Identifier for the pre-trained masked language model (e.g., `"bert-base-uncased"`).
- **K**: Number of top predictions to generate for the mask token.
- **FONT**: Default font for drawing text on images; can be adjusted as needed.
- **BASE_GRID_SIZE**: Base size of each grid cell in the attention diagram.
- **PIXELS_PER_WORD**: Pixels allocated per word for image dimensions.
- **PRINT_TO_CONSOLE**: If `True`, prints the predictions and attention values to the console.
- **SAVE_TO_OUTPUT_FOLDER**: If `True`, saves the generated diagrams to the output folder.

## License
This project is licensed under the MIT License.
> For further details and updates, you can refer to the [CS50 AI course page](https://cs50.harvard.edu/ai/2024/projects/6/attention/).