{"id":19740795,"url":"https://github.com/duo-labs/datasci-ctf","last_synced_at":"2025-04-30T05:33:41.749Z","repository":{"id":85971832,"uuid":"236785319","full_name":"duo-labs/datasci-ctf","owner":"duo-labs","description":"A capture-the-flag exercise based on data analysis challenges","archived":false,"fork":false,"pushed_at":"2020-01-30T02:16:29.000Z","size":7989,"stargazers_count":18,"open_issues_count":0,"forks_count":9,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-04-05T23:11:26.497Z","etag":null,"topics":["ctf","data-science"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc-by-4.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/duo-labs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-01-28T16:46:37.000Z","updated_at":"2024-12-19T08:11:22.000Z","dependencies_parsed_at":null,"dependency_job_id":"b76a7978-c94d-47db-bb10-b7d9c6d1fd3c","html_url":"https://github.com/duo-labs/datasci-ctf","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duo-labs%2Fdatasci-ctf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duo-labs%2Fdatasci-ctf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duo-labs%2Fdatasci-ctf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duo-labs%2Fdatasci-ctf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/duo-labs","download_url
":"https://codeload.github.com/duo-labs/datasci-ctf/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251650300,"owners_count":21621686,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ctf","data-science"],"created_at":"2024-11-12T01:23:30.542Z","updated_at":"2025-04-30T05:33:41.718Z","avatar_url":"https://github.com/duo-labs.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Science Capture the Flag\n\nWhat is a data science capture the flag (CTF)? To learn more checkout [this blog post](https://duo.com/labs/research/gamifying-data-science-education).\n\nIn short, the data science team at Duo Security created an internal training workshop to teach exploratory data analysis skills in a gamified way. In the CTF participants compete in teams to solve data analysis challenges.\n\nThe CTF sessions we led were very well received and we hope other organizations run similar exercises. To decrease the effort needed to run a data science CTF, we've open sourced much of the material we used our CTF sessions.\n\n## Materials\n\nThere are four datasets in the following directories:\n\n* Intro dataset\n* Movie Ratings\n* Jeopardy\n* Challenge death\n\nEach directory contains the data in a CSV, a README that describes the dataset, and a challenges.md with the CTF challenges and answers.\n\nWe found that participants got the most out of the session when they came prepared. There are preparation instructions in `Preparing for the CTF.md`. 
These instructions lead participants to the \"Tutorials\" directory, which contains short tutorials to help them get familiar with data analysis environments, namely Google Sheets, Excel, or Python in a Jupyter notebook.\n\n`Template Slides.pdf` is a set of slides based on the presentation we give at the beginning of the session.\n\n## Running a CTF session\nSome suggested steps:\n\n* Host a scoring platform like [CTFd](https://github.com/CTFd/CTFd) so that it’s accessible to participants.\n* Use our datasets and challenges to create flags. Optionally add new datasets and challenges.\n* A week before the session, send out the preparation materials to participants. Prep materials are everything in this repo except for the challenge documents and the template slides.\n* Start the session off with a presentation on exploratory data analysis (EDA); you can base it on `Template Slides.pdf`. Then let participants loose on the challenges!\n\n\n## Issues/Questions\nIssues should be filed using GitHub Issues.\n\n## License\n\n\u003ca rel=\"license\" href=\"http://creativecommons.org/licenses/by/4.0/\"\u003e\u003cimg alt=\"Creative Commons License\" style=\"border-width:0\" src=\"https://i.creativecommons.org/l/by/4.0/88x31.png\" /\u003e\u003c/a\u003e\u003cbr /\u003eThis work is licensed under a \u003ca rel=\"license\" href=\"http://creativecommons.org/licenses/by/4.0/\"\u003eCreative Commons Attribution 4.0 International License\u003c/a\u003e.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fduo-labs%2Fdatasci-ctf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fduo-labs%2Fdatasci-ctf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fduo-labs%2Fdatasci-ctf/lists"}