{"id":22423114,"url":"https://github.com/giuseppebellamacina/little_language_model","last_synced_at":"2025-04-13T14:40:38.535Z","repository":{"id":262713920,"uuid":"888120971","full_name":"GiuseppeBellamacina/Little_Language_Model","owner":"GiuseppeBellamacina","description":"Implementation of a Transformer and training on Dante's Divina Commedia","archived":false,"fork":false,"pushed_at":"2024-11-13T21:18:34.000Z","size":269,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-01T10:42:30.254Z","etag":null,"topics":["llm","pythorch","transformer"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/GiuseppeBellamacina.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-13T21:12:48.000Z","updated_at":"2025-01-19T23:32:26.000Z","dependencies_parsed_at":"2024-11-13T22:24:13.908Z","dependency_job_id":"bf1b11e2-f81f-4301-9b7c-b1c05ec16c8c","html_url":"https://github.com/GiuseppeBellamacina/Little_Language_Model","commit_stats":null,"previous_names":["giuseppebellamacina/little_language_model"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GiuseppeBellamacina%2FLittle_Language_Model","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GiuseppeBellamacina%2FLittle_Language_Model/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GiuseppeBellamacina%2FLittle_Langua
ge_Model/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GiuseppeBellamacina%2FLittle_Language_Model/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/GiuseppeBellamacina","download_url":"https://codeload.github.com/GiuseppeBellamacina/Little_Language_Model/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245791918,"owners_count":20672666,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llm","pythorch","transformer"],"created_at":"2024-12-05T18:09:19.179Z","updated_at":"2025-03-27T05:33:00.800Z","avatar_url":"https://github.com/GiuseppeBellamacina.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LittleLanguageModel\n\nLittleLanguageModel is a Transformer-based language model, designed for text generation and sequence modeling tasks. This repository contains the implementation of a transformer architecture, with a specific instance called DanteGPT, trained on a dataset of Italian text.\n\n## Overview\n\nThe model is based on the Transformer architecture, utilizing multiple layers of self-attention heads and feedforward neural networks to generate human-like text. DanteGPT is a variant of LittleLanguageModel, specifically trained to generate text inspired by Dante Alighieri’s *Divine Comedy*. 
\n\n### Features:\n- **Transformer Architecture**: Multiple attention heads and layers to handle complex relationships within the input sequence.\n- **Text Generation**: Capability to generate text based on an initial prompt using a temperature-controlled sampling technique.\n- **Training**: The model is trained on a corpus of Italian text to capture language structure and style.\n\n## Installation\n\nTo set up the environment, clone the repository and install the dependencies:\n\n```bash\ngit clone https://github.com/GiuseppeBellamacina/Little_Language_Model.git\ncd Little_Language_Model\npip install -r requirements.txt\n```\n\n## Usage\n\n### Training the Model\n\nTo train the model, use the `train_model()` function with the desired parameters. It trains the model on the dataset, periodically evaluating and logging the loss.\n\n```python\nfrom model import LittleLanguageModel\nfrom train import train_model\nimport torch\n\n# Load the raw text and build a character-level vocabulary\nfile = 'file.txt'\nwith open(file, 'r', encoding='utf-8') as f:\n    text = f.read()\nvocab = sorted(set(text))\n\n# Map characters to integer ids and back\nstoi = {c: i for i, c in enumerate(vocab)}\nencode = lambda s: [stoi[c] for c in s]\ndecode = lambda l: \"\".join([vocab[c] for c in l])\n\n# Split the dataset into training and validation sets\nx = int(0.9*len(text))\ntext = torch.tensor(encode(text), dtype=torch.long)\ntrain, val = text[:x], text[x:]\n\n# Define the model parameters\nvocab_size = len(vocab)  # Based on the dataset\nembed_size = 512\nnum_heads = 8\nhead_size = embed_size // num_heads\nnum_layers = 6\nblock_size = 128  # Example sequence length\nbatch_size = 64\n\n# Find the device (GPU or CPU)\ndevice = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n\n# Instantiate the model\nmodel = LittleLanguageModel(vocab_size, head_size, embed_size, block_size, num_heads, num_layers, device).to(device)\noptimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)\n\n# Train the model\nEPOCHS = 5000\ntrain_losses, val_losses = train_model(model, train, val, block_size, batch_size, device, 
optimizer, EPOCHS)\n```\n\n### Generating Text\n\nTo generate text with the trained model, call the `generate()` method with an initial prompt, the number of tokens to generate, and a sampling temperature.\n\n```python\n# Example for generating text\nids = torch.tensor(\n    encode(\"Nel mezzo del cammin di nostra vita\"),\n    dtype=torch.long\n).unsqueeze(0).to(device)\n\ngenerated_ids = model.generate(\n    ids,\n    max_new_tokens=2000,\n    temperature=0.8\n)\n\nprint(decode(generated_ids[0].tolist()))\n```\n\n### Available Functions:\n- **`train_model()`**: Trains the model on the training dataset and evaluates on the validation set.\n- **`generate()`**: Generates text based on an initial prompt.\n\n## Model Architecture\n\n- **Embedding Layer**: Encodes input tokens into dense vectors.\n- **Positional Encoding**: Adds positional information to the token embeddings so the model can use token order.\n- **Multi-Head Attention**: Multiple self-attention heads to capture different types of relationships in the data.\n- **Feedforward Layers**: A fully connected network that further processes the attention output.\n- **Layer Normalization**: Normalizes the input to each layer to speed up training and improve stability.\n\n## Training Data\n\nDanteGPT is trained on a text corpus that includes a selection of classical Italian literature, primarily focusing on Dante Alighieri’s *Divine Comedy*.\n\n## Acknowledgments\n\n- The Transformer model architecture is based on the paper: [\"Attention is All You Need\"](https://arxiv.org/abs/1706.03762).\n- Special thanks to the open-source community for the contributions that made this project possible.
\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgiuseppebellamacina%2Flittle_language_model","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgiuseppebellamacina%2Flittle_language_model","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgiuseppebellamacina%2Flittle_language_model/lists"}