{"id":20340847,"url":"https://github.com/taylor-eos/lstm-classifier","last_synced_at":"2026-05-28T13:05:05.245Z","repository":{"id":257878557,"uuid":"871294258","full_name":"Taylor-eOS/lstm-classifier","owner":"Taylor-eOS","description":"Machine learning experiment to classify audio segments","archived":false,"fork":false,"pushed_at":"2024-10-23T14:37:20.000Z","size":226,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-10-24T23:08:08.034Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Taylor-eOS.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-11T16:48:39.000Z","updated_at":"2024-10-23T14:38:06.000Z","dependencies_parsed_at":"2024-10-26T03:18:14.198Z","dependency_job_id":null,"html_url":"https://github.com/Taylor-eOS/lstm-classifier","commit_stats":null,"previous_names":["taylor-eos/sltm-classifier"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Taylor-eOS/lstm-classifier","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Taylor-eOS%2Flstm-classifier","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Taylor-eOS%2Flstm-classifier/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Taylor-eOS%2Flstm-classifier/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Taylor-eOS%2Flstm-classifier/manifests","owner_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/owners/Taylor-eOS","download_url":"https://codeload.github.com/Taylor-eOS/lstm-classifier/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Taylor-eOS%2Flstm-classifier/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33609267,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-28T02:00:06.440Z","response_time":99,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-14T21:24:00.512Z","updated_at":"2026-05-28T13:05:05.226Z","avatar_url":"https://github.com/Taylor-eOS.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## LSTM-Classifier\n\nThis was an exploratory project meant to get familiar with machine learning and neural network programming. The script uses an LSTM (Long Short-Term Memory) architecture to classify chunks of audio into two content types. The project extracts and processes MFCC features from audio files in order to train a neural net to learn the patterns that distinguish the types of input data. It can then predict what type an unseen chunk of audio is. It is a complete reassembly of a project from [amsterg](https://github.com/amsterg/Podcast-Ad-Detection), to whom I concede the creative credit.\\\nThis version is optimized for home computers. 
It is less fine-grained and uses lower quality settings than the original. Training is fast and finishes in a tolerable time on a CPU. Most of the time is actually spent converting the files to wav, since the MFCC extraction had problems with mp3 input. My tests show near-100% accurate predictions on unseen data after just a few training epochs and with only five input episodes.\\\nThe project contains tools for labeling ground truth data, but these require some time investment to be used correctly.\\\nNote: This is a hobby project. It consists of crude source code that is not set up to be user friendly. The functions are specific to a particular use case and include hardcoded values that may not work with all inputs. The script does not always check for deviations from the expected use. Getting this to work requires understanding the source code and manually adapting it to your content.\\\nThis project is an excellent simple example of neural net technology in use and can be used for studying the technology.\n\n### Instructions\n\n1. **Creating ground truth:**\n   - Put your unsegmented wav files and a space-separated `segments.txt` containing the ground truth timestamps into the `input` folder. These labels can be created using `tool_manually_label_audio.py`.\n   - Run `tool_cut_segments.py` to split your files up according to the labels.\n   - Run `tool_separate_chunks.py` to separate your files into training-sized chunks. The script will randomly set aside a given proportion of your files as a validation set and move the files into the right folders.\n   - Optionally, you can use `tool_shuffle_more_files` to synthesize more files to even out the number of `A` and `B` types.\n   \n2. **Training the Model:**\n   - Run the training script:\n     ```bash\n     python main.py\n     ```\n   - If everything was placed correctly, the model will be trained and saved.\n\n3. 
**Running Inference:**\n   - To classify a single audio chunk and test that everything works, use the inference mode:\n     ```bash\n     python main.py file.wav\n     ```\n   - The output will indicate whether the file is classified as `A` or `B`.\n\n4. **Batch Inference for Evaluation:**\n   - To evaluate multiple chunks at once and get a percentage accuracy value for model evaluation, place your wav files in the `eval` directory with an `_a` or `_b` at the end of the file name. Run:\n     ```bash\n     python tool_evaluate.py\n     ```\n   - The script will run inference on every file, compare the results to the labels in the file names, and output the classification accuracy over the whole evaluation set. You can use this to optimize the model parameters.\n\nTransformer distillation currently doesn't work because of dimension mismatches, but there is a file to train a transformer from ground truth.\n\n### Requirements\n- Python 3\n- torch\n- torchvision\n- librosa\n- numpy\n- pydub\n- pygame (tools only)\n- Others I might have forgotten (you will know from the errors)\n\nDependencies can be installed via:\n  ```bash\n  pip install -r requirements.txt\n  ```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftaylor-eos%2Flstm-classifier","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftaylor-eos%2Flstm-classifier","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftaylor-eos%2Flstm-classifier/lists"}