{"id":24822256,"url":"https://github.com/lazerlambda/rapmachine","last_synced_at":"2025-03-25T21:23:47.324Z","repository":{"id":44675422,"uuid":"439038993","full_name":"LazerLambda/RapMachine","owner":"LazerLambda","description":"GPT-2-based Rap-Tweet-Bot","archived":false,"fork":false,"pushed_at":"2022-01-31T22:03:24.000Z","size":235,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-30T18:46:02.388Z","etag":null,"topics":["bert","computational-linguistics","data-science","gpt-2","lmu-munich","machine-learning","nlp","statistics","t5","transformer","tweet-bot"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LazerLambda.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-12-16T15:30:06.000Z","updated_at":"2024-05-14T14:13:00.000Z","dependencies_parsed_at":"2022-09-12T13:42:13.545Z","dependency_job_id":null,"html_url":"https://github.com/LazerLambda/RapMachine","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LazerLambda%2FRapMachine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LazerLambda%2FRapMachine/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LazerLambda%2FRapMachine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LazerLambda%2FRapMachine/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LazerLambda","download_url":"https://co
deload.github.com/LazerLambda/RapMachine/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245544217,"owners_count":20632780,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","computational-linguistics","data-science","gpt-2","lmu-munich","machine-learning","nlp","statistics","t5","transformer","tweet-bot"],"created_at":"2025-01-30T18:39:58.515Z","updated_at":"2025-03-25T21:23:47.282Z","avatar_url":"https://github.com/LazerLambda.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RM - RapMachine\n\n## Use\n\n - Run `python BotScript.py` for the Tweetbot (This script runs only with `GPT2-rap-recommended`)\n - Run `python src/test_generation.py` with proper parameters to test language generation. 
\n - To quit the program, use `CTRL` + `C` \n\n\n## Installation\n\n- Install requirements with `pip install -r requirements.txt`\n- Alternatively run `./install.sh`\n\n\n## Prerequisites\n\n- Apply for a [Twitter Developer Account with elevated access](https://developer.twitter.com/en)\n- Create a `.env` file including the variables:\n    - `CONSUMER_API_KEY`\n    - `CONSUMER_API_KEY_SECRET`\n    - `ACCESS_TOKEN`\n    - `ACCESS_TOKEN_SECRET`\n    and provide the necessary credentials to each variable.\n- Download [fasttext's language identification model](https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin) and \n  place it in the same folder as this file.\n- Create a folder called `.model` in the same folder as this file and place the proper finetuned GPT-2 model (see Models section) inside it \n  (`.model/GPT2-rap-recommended/config.json pytorch...`). The model is available [here](https://drive.google.com/drive/folders/116WlytHENvyNia_xZr7GxUEym20SjeQn?usp=sharing)\n- Hardware that can deal with GPT-2.\n\n\n## Data Documentation\n \n- We gathered raps from genius.com, ohhla.com and battlerap.com. For genius.com, we used the official [API](https://docs.genius.com/) \n  (GeniusLyrics and GetRankings repos) while battlerap.com and ohhla.com were scraped using a specifically tailored scrapy scraper.\n  In total we gathered ~70k raps which we used for finetuning. GPT-2 was finetuned by creating one large text, while T5 was finetuned\n  on prompts. The prompts had the form of `KEYWORDS: \u003ckeywords\u003e RAP-LYRICS: \u003crap text\u003e` which proved to be insufficient for our task.\n  Eventually we chose to use the fine-tuned GPT2 model. 
Experimental and succeeding scripts can be found in `./preprocessing/finetuning`.\n  Additionally, a RoBERTa model was finetuned on data from the English Wikipedia, tweets regarding hate speech, the CNN/Dailymail dataset \n  and 4k rap lyrics data (data can be found under `Data`) to classify the quality of the generated raps.\n\n### `preprocessing`\n - `finetuning`\n    - `FineTuneRapMachineExp.ipynb` Experimental script\n    - `FineTuneRapMachineGPT2.ipynb` GPT2 finetuning script\n    - `T5.ipynb` Finetuning Script for T5 on a key2text approach\n    - `keytotext.ipynb` Using the keytotext library for finetuning\n    - `FineTuneRapMachineExp2.ipynb` Another experimental script, in which GPT-J and GPT-NEO were used, yet didn't succeed\n - `data_analysis`\n    - `CreateAdvData.ipynb` Script to create a balanced dataset to train the ranker model\n    - `LyricsAnalyye.ipzng` Script to analyze the scraped data.\n - `lyrics_spider`\n    - Includes scrapy program to obtain lyrics\n - `cleaning_and_keywords`\n    - `data_cleaner` Script for removing noise from the 70k scraped rap corpus\n    - `kw_extraction` Script that starts building a TF-IDF model either from scratch or from an existing model to generate keywords for the rap corpus\n    - `tf_idf` TF-IDF model script\n - `ranker`\n    - `roberta_ranker.ipynb` RoBERTa finetuning script\n\n### Sources\n - ohhla.com - Scraped \n - BattleRap.com - Scraped \n - Genius.com - Accessed through API, [GeniusLyrics](https://github.com/LazerLambda/GeniusLyrics) and [GetRankings](https://github.com/LazerLambda/GetRankings/) used.\n\n### Genius API\n - To obtain lyrics from genius.com, two programs were implemented which are based on different, yet outdated, repositories.\n    - [GeniusLyrics](https://github.com/LazerLambda/GeniusLyrics)\n    - [GetRankings](https://github.com/LazerLambda/GetRankings)\n - Both programs are part of this project\n\n\n## Models\n - GPT2-rap-recommended 
[Download](https://drive.google.com/drive/folders/1zl_Zn7hUzsnr7FpdtV9VBo3SmmvM4jQO?usp=sharing) (Necessary to use BotScript.py)\n - GPT2-small-key2text [Download](https://drive.google.com/drive/folders/1FOrFDQgpnnBcSbXfGsBG2RkrjzggEaqx?usp=sharing) (Approach did not work out, trained on 4k corpus)\n - Roberta Ranker [Download](https://drive.google.com/drive/folders/1IztahoA0rfnHZ4dkHIZ46f-Sh1sqxr2e?usp=sharing) (Ranker trained on 8k data with 4k rap corpus and 4k non-rap corpus)\n - T5-large-key2text [Download](https://drive.google.com/drive/folders/1dIsp7LmHwRXng8GX2fs__4JYrjpk-W4D?usp=sharing) (Approach did not work out, trained on 70k corpus)\n - T5-small-key2text [Download](https://drive.google.com/drive/folders/1KyxvhLMDG2z1gCQ9aCSm4TmIL5CXq8Nz?usp=sharing) (Approach did not work out, trained on 4k corpus)\n - tf-idf pickle [Download](https://drive.google.com/drive/folders/1R8HYgaADOhOQ2BdAMEsLryA2XqUxLTMm?usp=sharing) (Approach did not work out, trained on 70k corpus)\n\n## Data\n - Our data can be downloaded [here](https://drive.google.com/drive/folders/1XJ-tnf0VgORbo7qS3rHaXVjsX01nFKuT?usp=sharing)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flazerlambda%2Frapmachine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flazerlambda%2Frapmachine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flazerlambda%2Frapmachine/lists"}