{"id":19457664,"url":"https://github.com/liaad/tweet2event-pt","last_synced_at":"2025-02-25T11:40:56.564Z","repository":{"id":71825239,"uuid":"543253250","full_name":"LIAAD/tweet2event-pt","owner":"LIAAD","description":"Dataset linking Portuguese tweets to events, with annotated relevance using BM25","archived":false,"fork":false,"pushed_at":"2022-10-12T15:43:14.000Z","size":426,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-01-08T01:48:06.894Z","etag":null,"topics":["dataset","tweets"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LIAAD.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-09-29T17:54:18.000Z","updated_at":"2022-11-10T16:34:06.000Z","dependencies_parsed_at":null,"dependency_job_id":"e50e8974-53db-498f-a90d-7919c3e8f326","html_url":"https://github.com/LIAAD/tweet2event-pt","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LIAAD%2Ftweet2event-pt","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LIAAD%2Ftweet2event-pt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LIAAD%2Ftweet2event-pt/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LIAAD%2Ftweet2event-pt/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LIAAD","download_url":"https://codeloa
d.github.com/LIAAD/tweet2event-pt/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240662325,"owners_count":19837366,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataset","tweets"],"created_at":"2024-11-10T17:23:20.117Z","updated_at":"2025-02-25T11:40:56.531Z","avatar_url":"https://github.com/LIAAD.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Tweet2Event-PT-Dataset\n\nTweet2Event-PT is a Portuguese dataset containing tweets related to events from the Wikipedia page \"2021 in Portugal\".\n\nIn the *dataset* directory, we make available the list of Twitter IDs and their relevance (annotated by the BM25 function), as well as a CSV/JSONL file with the 12 events selected for the dataset, with manually curated keywords. We also make available a sample of the dataset (473 tweets) manually annotated with their relevance (0 for non-relevant, 1 for relevant).\n\nIf you want to reproduce our method for retrieving all the events and related tweets, you can follow the steps in the sections below. With this method, the keywords will be extracted automatically using YAKE, so the retrieved tweets will be different from the ones in our dataset. Make sure to run the scripts while inside the *reproduction* directory.\n\n\n## Extract events from Wikipedia\n\n 1. Register on Wikipedia and set your username in *userconfig.py*:\n         \n        usernames['wikipedia']['pt'] = 'your-username'\n        usernames['wikinews']['pt'] = 'your-username'\n          \n 2.  
To create a CSV with events from the Wikipedia page, run:\n\n         python retrieve-events.py\n\nFor more information regarding Pywikibot, the library used to obtain Wikipedia content, see its [manual](https://www.mediawiki.org/wiki/Manual:Pywikibot).\n\n\n## Extract related tweets\n          \n 1.  Set your Twitter API credentials in *.env*\n\n 2.  Retrieve tweets and create a CSV with:\n\n         python retrieve-tweets.py\n \n 3.  Clean the tweets with:\n\n         python clean-tweets.py\n\n\n## Contact\n\nFor more information about the dataset or any problems with it, contact mafalda.r.castro@inesctec.pt\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fliaad%2Ftweet2event-pt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fliaad%2Ftweet2event-pt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fliaad%2Ftweet2event-pt/lists"}