{"id":50938099,"url":"https://github.com/simula/datasets.simula.no","last_synced_at":"2026-06-17T11:03:46.288Z","repository":{"id":50633096,"uuid":"460213049","full_name":"simula/datasets.simula.no","owner":"simula","description":"Public datasets published by Simula.","archived":false,"fork":false,"pushed_at":"2026-06-09T11:13:52.000Z","size":49773,"stargazers_count":21,"open_issues_count":0,"forks_count":4,"subscribers_count":5,"default_branch":"main","last_synced_at":"2026-06-09T13:10:56.570Z","etag":null,"topics":["artificial-intelligence","machine-learning","open-datasets","research"],"latest_commit_sha":null,"homepage":"https://datasets.simula.no","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/simula.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-02-16T23:30:12.000Z","updated_at":"2026-06-09T11:14:00.000Z","dependencies_parsed_at":"2025-10-02T12:24:47.275Z","dependency_job_id":null,"html_url":"https://github.com/simula/datasets.simula.no","commit_stats":{"total_commits":75,"total_committers":3,"mean_commits":25.0,"dds":0.09333333333333338,"last_synced_commit":"9d8f4480709b89b48275ef6e166a481eb476ed25"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/simula/datasets.simula.no","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simula%2Fdatasets.simula.no","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simula%2Fdatasets.simula.no/tags",
"releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simula%2Fdatasets.simula.no/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simula%2Fdatasets.simula.no/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/simula","download_url":"https://codeload.github.com/simula/datasets.simula.no/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simula%2Fdatasets.simula.no/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34445186,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-17T02:00:05.408Z","response_time":127,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","machine-learning","open-datasets","research"],"created_at":"2026-06-17T11:03:45.513Z","updated_at":"2026-06-17T11:03:46.282Z","avatar_url":"https://github.com/simula.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# datasets.simula.no\nA collection of open datasets published by Simula Research Laboratory and SimulaMet.\n\nCurrently, we have published the following datasets: \n\n**Medical and Biology Datasets**\n* Cellular, A cell autophagy dataset. 
[[project](https://github.com/simula/cellular)]\n* Depresjon, A Motor Activity Database of Depression Episodes in Unipolar and Bipolar Patients. [[publication](https://dl.acm.org/doi/10.1145/3204949.3208125) | [project](https://datasets.simula.no/depresjon/)]\n* GastroVision, A multicenter gastrointestinal endoscopy image dataset. [[publication](https://arxiv.org/abs/2307.08140) | [project](https://github.com/DebeshJha/GastroVision)]\n* HTAD, A Home-Tasks Activities Dataset with Wrist-accelerometer and Audio Features. [[publication](https://link.springer.com/chapter/10.1007/978-3-030-67835-7_17) | [project](https://osf.io/4dnh8/)]\n* HYPERAKTIV, A Motor Activity Database of Patients with ADHD. [[publication](https://dl.acm.org/doi/10.1145/3458305.3478454) | [project](https://github.com/simula/hyperaktiv)]\n* HyperKvasir, The Largest Gastrointestinal Dataset. [[publication](https://www.nature.com/articles/s41597-020-00622-y) | [project](https://github.com/simula/hyper-kvasir)]\n* Kvasir, A Multi-Class Image Dataset for Computer Aided Gastrointestinal Disease Detection. [[publication](https://doi.org/10.1145/3083187.3083212) | [project](https://datasets.simula.no/kvasir/)]\n* Kvasir Capsule, The largest gastrointestinal PillCam dataset. [[publication](https://www.nature.com/articles/s41597-021-00920-z) | [project](https://github.com/simula/kvasir-capsule)]\n* Kvasir Instrument, A gastrointestinal instrument dataset. [[publication](https://doi.org/10.1007/978-3-030-67835-7_19) | [project](https://osf.io/kp6my/)]\n* Kvasir SEG, A Segmented Polyp Dataset for Computer Aided Gastrointestinal Disease Detection. [[publication](https://dl.acm.org/doi/10.1007/978-3-030-37734-2_37) | [project](https://datasets.simula.no/kvasir-seg/)]\n* Kvasir-VQA, A Text-Image Pair GI Tract Dataset. [[publication](https://doi.org/10.1145/3689096.3689458) | [project](https://huggingface.co/datasets/SimulaMet-HOST/Kvasir-VQA)]\n* Kvasir-VQA-x1, A Large-Scale Multi-Task Benchmark for GI Tract Visual Question Answering. 
[[publication](https://doi.org/10.1007/978-3-032-08009-7_6) | [project](https://github.com/simula/Kvasir-VQA-x1)]\n* KvasirCapsule SEG, A Capsule Endoscopy Segmentation Dataset. [[publication](https://arxiv.org/abs/2104.11138) | [project](https://github.com/DebeshJha/NanoNet)]\n* MedMultiPoints, A Multimodal Dataset for Object Detection, Localization, and Counting in Medical Imaging. [[publication](https://arxiv.org/abs/2505.16647) | [project](https://github.com/Simula/PointDetectCount)]\n* Medico Multimedia - VISEM Tracking, A sperm tracking dataset. [[publication](https://doi.org/10.1145/3304109.3325814) | [project](https://multimediaeval.github.io/editions/2022/)]\n* Nerthus, A Bowel Preparation Quality Video Dataset. [[publication](https://doi.org/10.1145/3083187.3083216) | [project](https://datasets.simula.no/nerthus/)]\n* Psykose, A Motor Activity Database of Patients with Schizophrenia. [[publication](https://ieeexplore.ieee.org/document/9182896) | [project](https://osf.io/dgjzu/)]\n* VISEM, A Multimodal Video Dataset of Human Spermatozoa. [[publication](https://dl.acm.org/doi/10.1145/3304109.3325814) | [project](https://datasets.simula.no/visem/)]\n* VISEM QC, A sperm quality control dataset. [[project](https://datasets.simula.no/visem-qc/)]\n\n**Sport and Activity Datasets**\n* Alfheim, A soccer video and player position dataset. [[publication](https://dl.acm.org/doi/10.1145/2557642.2563677) | [project](https://datasets.simula.no/alfheim/)]\n* Arx, A Text-Classification Dataset Consisting of Norwegian Soccer Articles from VG and TV2. [[publication](https://ieeexplore.ieee.org/abstract/document/8877417/) | [project](https://datasets.simula.no/arx/)]\n* ExposureEngine, A Dataset for Oriented Logo Detection and Sponsor Visibility Analytics in Sports Broadcasts. [[project](https://huggingface.co/datasets/SimulaMet-HOST/ExposureEngine)]\n* Heimdallr, A Dataset for Sport Analysis. 
[[project](https://datasets.simula.no/heimdallr/)]\n* HockeyAI, A Multi-Class Ice Hockey Dataset for Object Detection. [[publication](https://dl.acm.org/doi/10.1145/3712676.3718335) | [project](https://github.com/acmmmsys/2025-HockeyAI)]\n* HockeyOrient, A Dataset for Ice Hockey Player Orientation Classification. [[publication](https://dl.acm.org/doi/10.1145/3712676.3718342) | [project](https://github.com/acmmmsys/2025-HockeyOrient)]\n* HockeyRink, A Dataset for Precise Ice Hockey Rink Keypoint Mapping and Analytics. [[publication](https://dl.acm.org/doi/10.1145/3712676.3718338) | [project](https://github.com/acmmmsys/2025-HockeyRink)]\n* PMData, A lifelogging dataset of 16 people collected over 5 months using Fitbit, Google Forms, and PMSys. [[publication](https://dl.acm.org/doi/10.1145/3339825.3394926) | [project](https://osf.io/vx4bk/)]\n* ScopeSense, An 8.5-month sport, nutrition, and lifestyle lifelogging dataset. [[project](https://osf.io/v5acr/)]\n* Soccer Summarization, English-language soccer game captions and summaries for game summarization. [[publication](https://dl.acm.org/doi/10.1145/3552463.3557019) | [project](https://github.com/simula/soccer-summarization)]\n* SoccerChat, A Multimodal Video-Text Dataset for Natural Language Soccer Game Understanding. [[publication](https://arxiv.org/abs/2505.16630) | [project](https://github.com/simula/SoccerChat)]\n* SoccerMon, Subjective and objective data collected over two years from two elite women's soccer teams. [[project](https://osf.io/uryz9/)]\n* SoccerNet-Echoes, A Soccer Game Audio Commentary Dataset. [[publication](https://arxiv.org/abs/2405.07354) | [project](https://github.com/SoccerNet/sn-echoes)]\n* SoccerSum, A Dataset for Automated Detection, Segmentation, and Tracking of Objects on the Soccer Pitch. [[publication](https://doi.org/10.1145/3625468.3652180) | [project](https://github.com/simula/SoccerSum)]\n* TACDEC, A Dataset of Tackle Events in Soccer Game Videos. 
[[publication](https://doi.org/10.1145/3625468.3652166) | [project](https://github.com/simula/tacdec)]\n\n**Other Datasets**\n* Anarchy Online, Server-side Network Traffic from Anarchy Online: Analysis, Statistics and Applications. [[publication](https://datasets.simula.no/ao/mmsys2012-dataset.pdf) | [project](https://datasets.simula.no/ao/)]\n* European Cloud Cover, A dataset containing reanalysis data from ERA5 and satellite retrievals from Meteosat Second Generation. [[publication](https://www.mdpi.com/2504-2289/5/4/62/pdf) | [project](https://osf.io/kqdgx/)]\n* Eye Tracker, A Serious Game Based Dataset. [[publication](http://ceur-ws.org/Vol-1345/gamifir15_5.pdf) | [project](https://datasets.simula.no/eye-tracker/)]\n* HSDPA, HSDPA bandwidth logs for mobile HTTP streaming scenarios. [[publication](http://home.ifi.uio.no/paalh/publications/files/mmsys2013-dataset.pdf) | [project](https://datasets.simula.no/hsdpa/)]\n* Image Sentiment, A dataset for image sentiment analysis. [[publication](https://arxiv.org/pdf/2009.03051.pdf) | [project](https://osf.io/xakp2/)]\n* Njord, A fishing boat dataset. [[project](https://github.com/simula/njord)]\n* Right Inflight, A Dataset for Exploring the Automatic Prediction of Movies Suitable for a Watching Situation. [[project](https://zenodo.org/record/1118338)]\n* THREAT, A Large Annotated Corpus for Detection of Violent Threats. [[project](https://datasets.simula.no/threat/)]\n* Toadstool, A Dataset for Training Emotional and Intelligent Machines Playing Super Mario Bros. [[publication](https://dl.acm.org/doi/10.1145/3339825.3394939) | [project](https://github.com/simula/toadstool)]\n* WICO Graph Dataset, A Labeled Dataset of Twitter Subgraphs based on Conspiracy Theory and 5G-Corona Misinformation Tweets. [[publication](https://dl.acm.org/doi/10.1145/3472720.3483617) | [project](https://osf.io/5m3by/)]\n* WICO Text, A labeled dataset of conspiracy theory and 5G-corona misinformation tweets. 
[[publication](https://doi.org/10.1145/3472720.3483617) | [project](https://datasets.simula.no/wico-text/)]\n\n## How to contribute\n\nDatasets are added via pull request. See [CONTRIBUTING.md](CONTRIBUTING.md) for the full walkthrough.\n\n## Contact\n\nIf you have any questions or need assistance, please open an issue in the repository or contact steven@simula.no.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimula%2Fdatasets.simula.no","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsimula%2Fdatasets.simula.no","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimula%2Fdatasets.simula.no/lists"}