{"id":18894260,"url":"https://github.com/takelab/ashnee","last_synced_at":"2026-01-28T15:01:48.080Z","repository":{"id":175531302,"uuid":"653527389","full_name":"TakeLab/ashnee","owner":"TakeLab","description":"Automatically Scraped Hard News Event Extraction dataset.","archived":false,"fork":false,"pushed_at":"2023-06-17T14:17:03.000Z","size":2876,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-05-31T16:42:44.323Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TakeLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-06-14T08:19:47.000Z","updated_at":"2023-06-14T08:19:50.000Z","dependencies_parsed_at":null,"dependency_job_id":"ac0e6cd3-09f8-4f5c-9788-e3426c3491ad","html_url":"https://github.com/TakeLab/ashnee","commit_stats":null,"previous_names":["takelab/ashnee"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/TakeLab/ashnee","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TakeLab%2Fashnee","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TakeLab%2Fashnee/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TakeLab%2Fashnee/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TakeLab%2Fashnee/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TakeLab","download_url":"https://codeload.gith
ub.com/TakeLab/ashnee/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TakeLab%2Fashnee/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28846083,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-28T13:02:32.985Z","status":"ssl_error","status_checked_at":"2026-01-28T13:02:04.945Z","response_time":57,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T08:20:29.531Z","updated_at":"2026-01-28T15:01:48.072Z","avatar_url":"https://github.com/TakeLab.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# ashnee\n\n**WIP. Additional information and code will be added soon.**\n\n**A**utomatically **S**craped **H**ard **N**ews **E**vent **E**xtraction dataset.\n\n## Statistics\n\nThe dataset contains $2279$ articles in total, spread across $26$ hard-news\nevent types and an additional class *Other*. 
The table below shows the number of\ndocuments for each event type.\n\n| **Event Type**                    | **#Documents** | **Event Type**              | **#Documents** |\n| :-------------------------------- | :------------: | :-------------------------- | :------------: |\n| Air crash                         |       55       | Mass Poisoning              |       7        |\n| Armed Conflict                    |       76       | Military Exercise           |       70       |\n| Bank Robbery                      |       7        | Mine Collapses              |       4        |\n| Disease Outbreaks                 |       59       | Mudslides                   |       21       |\n| Droughts                          |       18       | Other                       |      1229      |\n| Earthquakes                       |       56       | Protest_Online Condemnation |       68       |\n| Environment Pollution             |       39       | Regime Change               |       2        |\n| Famine                            |       12       | Riot                        |       16       |\n| Financial Crisis                  |       27       | Road Crash                  |       86       |\n| Fire                              |       77       | Shipwreck                   |       37       |\n| Floods                            |       84       | Strike                      |       65       |\n| Gas explosion                     |       23       | Train collisions            |       6        |\n| Hurricanes_Tornado_Storm_Blizzard |       98       | Tsunamis                    |       0        |\n| Insect Disaster                   |       24       | Volcano Eruption            |       13       |\n\n## Data sources\n\nFor the majority of articles, the source URL can be found in the `ashnee_url.csv` file.\n\nArticles were mainly scraped from the following portals/domains: *dailymail.co.uk*,\n*thewest.com.au*, *bbc.com*, *allafrica.com*, *thetimes.co.uk*, 
*nzherald.co.nz*,\n*indiatimes.com*, *sputniknews.com*, *independent.co.uk*, *9news.com.au*,\n*inquirer.net*, *theguardian.com*, *mb.com.ph*, *punchng.com*, *thestar.com.my*,\n*sott.net*, and *news.com.au*.\n\nMost articles were published between 2019 and 2022.\n\n## Models\n\nModels we fine-tuned for event detection: [roberta-base](https://huggingface.co/roberta-base), [roberta-large](https://huggingface.co/roberta-large), [deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base), [deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large), [distilroberta-base](https://huggingface.co/distilroberta-base), and [albert-base-v2](https://huggingface.co/albert-base-v2).\n\nModels we fine-tuned for argument extraction: [roberta-base](https://huggingface.co/deepset/roberta-base-squad2), [roberta-large](https://huggingface.co/deepset/roberta-large-squad2), [deberta-v3-base](https://huggingface.co/deepset/deberta-v3-base-squad2), [deberta-v3-large](https://huggingface.co/deepset/deberta-v3-large-squad2), [distilroberta-base](https://huggingface.co/squirro/distilroberta-base-squad_v2), and [albert-base-v2](https://huggingface.co/squirro/albert-base-v2-squad_v2).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftakelab%2Fashnee","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftakelab%2Fashnee","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftakelab%2Fashnee/lists"}