{"id":29026099,"url":"https://github.com/mtg/music-ner","last_synced_at":"2025-10-19T03:08:38.333Z","repository":{"id":53533179,"uuid":"192496847","full_name":"MTG/music-ner","owner":"MTG","description":" Musical Named Entity Recognition System for Twitter ","archived":false,"fork":false,"pushed_at":"2021-03-25T22:42:08.000Z","size":129747,"stargazers_count":1,"open_issues_count":1,"forks_count":0,"subscribers_count":3,"default_branch":"mtg_branch","last_synced_at":"2024-04-15T00:15:00.073Z","etag":null,"topics":["entity-recognition","information-extraction","music-information-retrieval","natural-language-processing","trompa"],"latest_commit_sha":null,"homepage":"https://github.com/LPorcaro/musicner","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MTG.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-06-18T08:18:33.000Z","updated_at":"2023-09-03T09:44:58.000Z","dependencies_parsed_at":"2022-08-20T13:20:51.922Z","dependency_job_id":null,"html_url":"https://github.com/MTG/music-ner","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/MTG/music-ner","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MTG%2Fmusic-ner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MTG%2Fmusic-ner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MTG%2Fmusic-ner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MTG%2Fmusic-ner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MTG","download_url":"https://codelo
ad.github.com/MTG/music-ner/tar.gz/refs/heads/mtg_branch","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MTG%2Fmusic-ner/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262003990,"owners_count":23243358,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["entity-recognition","information-extraction","music-information-retrieval","natural-language-processing","trompa"],"created_at":"2025-06-26T05:08:25.079Z","updated_at":"2025-10-19T03:08:38.318Z","avatar_url":"https://github.com/MTG.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Recognizing Musical Entities in User-generated Content\n\nWe present a novel method for detecting musical entities from user-generated content, modelling linguistic features with statistical models and extracting contextual information from a radio schedule.  We analyzed tweets related to a classical music radio station, integrating its schedule to connect users' messages to tracks broadcasted. \n\nThis repository contains code to reproduce the results of our [arXiv paper](https://arxiv.org/abs/1904.00648).\n\n#### Reference:\n\u003e Lorenzo Porcaro, Horacio Saggion (2019). Recognizing Musical Entities in User-generated Content. Paper presented at the International Conference on Computational Linguistics and Intelligent Text Processing (CICLing) 2019, University of La Rochelle, La Rochelle, 7-13 April.\n\n#### Contact:\n\u003elorenzo.porcaro at gmail.com\n\n\n## Reproduce our results\n\n#### Installation:\nCreate a python 2.7 (sorry!) 
virtual environment and install the dependencies with `pip install -r src/requirements.txt`\n\n#### Update config file:\nUpdate the file `etc/config.yaml` by inserting your consumer key, consumer secret, access token, and access secret from the Twitter API. More info about the API: https://developer.twitter.com/\n\n#### Import data:\nTo receive the data for reproducing the experiment, please contact `lorenzo.porcaro at gmail.com`. Once you have received it, see the data [README](https://github.com/LPorcaro/musicner/tree/master/data) page for more info.\n\n#### Pre-process data:\nTo pre-process the data, run:\n\n`python src/hydrate_tweet.py -i ../path/to/input/file.json`\n\nIt reads the tweet IDs and related annotations from the input file and creates the following output files:\n1) **INPUTFILE_entities.csv**: list of annotated entities\n2) **INPUTFILE_summary.csv**: summary information for each tweet (creation date, raw text, etc.)\n3) **INPUTFILE_text_tkn.txt**: tokenized raw text of the tweets\n\n#### Extract features:\nTo extract the required features from the data, run:\n\n`python src/extract_features.py -i ../path/to/INPUTFILE_summary.csv -e ../path/to/INPUTFILE_entities.csv -o ../path/to/OUTPUTFILE_WEKA.csv -n ../path/to/OUTPUTFILE_biLSTM_CRF.csv`\n\nIt extracts several features from the input tweets for performing the experiments. 
It takes **INPUTFILE_summary.csv** and **INPUTFILE_entities.csv** as input and creates two output files: one that can be used as input to [WEKA](https://www.cs.waikato.ac.nz/ml/weka/), and one that can be used as input to this [BiLSTM-CNN-CRF sequence tagging implementation](https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf).\n\n#### Schedule matching:\nTo run the matching against the schedule, run:\n\n`python src/schedule_matcher.py -w work_tsl -c contr_tsl -t time_tsl -i ../path/to/UGC_INPUTFILE_summary.csv -s ../path/to/SCHEDULE_INPUTFILE_summary.csv`\n\nIt searches for matches between entities annotated in the schedule and entities in user-generated tweets, and writes the results to a text file in CoNLL format. The input parameters are the input summary files and the thresholds:\n- time_tsl (int): time-distance threshold (in seconds) between a schedule tweet and a user-generated tweet\n- work_tsl (float): string similarity threshold for Musical Work entities\n- contr_tsl (float): string similarity threshold for Contributor entities\n\nThe output file is written to `results/schedule_matcher_%s_%s_%s.txt`, where each `%s` in the file path is replaced by one of the threshold values used.\n\nTo evaluate the results obtained from the schedule matching, run:\n\n`src/conlleval \u003c results/schedule_matcher_%s_%s_%s.txt \u003e results/score.schedule_matcher_%s_%s_%s.txt`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmtg%2Fmusic-ner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmtg%2Fmusic-ner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmtg%2Fmusic-ner/lists"}