{"id":28070108,"url":"https://github.com/neverbot/doom-paper","last_synced_at":"2025-05-12T19:37:20.437Z","repository":{"id":290598862,"uuid":"974989621","full_name":"neverbot/doom-paper","owner":"neverbot","description":null,"archived":false,"fork":false,"pushed_at":"2025-04-29T16:05:08.000Z","size":107,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-29T17:24:15.732Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/neverbot.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-29T15:55:59.000Z","updated_at":"2025-04-29T16:05:11.000Z","dependencies_parsed_at":"2025-04-29T17:24:18.086Z","dependency_job_id":"67e4987f-71dd-4b9d-ae2f-5f619fca76cc","html_url":"https://github.com/neverbot/doom-paper","commit_stats":null,"previous_names":["neverbot/doom-paper"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neverbot%2Fdoom-paper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neverbot%2Fdoom-paper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neverbot%2Fdoom-paper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neverbot%2Fdoom-paper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/neverbot","download_url":"https://codeload.github.com/neverbot/doom-paper/
tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253808832,"owners_count":21967604,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-05-12T19:37:19.879Z","updated_at":"2025-05-12T19:37:20.429Z","avatar_url":"https://github.com/neverbot.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# doom-paper\n\n## What am I doing here?\n\n## Project Structure\n\nThis repository uses Git submodules for some tools:\n\n- [wadconvert](https://github.com/neverbot/wadconvert) - Tool to convert WAD files to JSON and other formats.\n- [waddup](https://github.com/neverbot/waddup) - Tool to detect and filter duplicate WAD files.\n\n```bash\n# Clone the repository with submodules (the --recursive flag is important)\ngit clone --recursive git@github.com:neverbot/doom-paper.git\ncd doom-paper\n```\n\n## Steps followed\n\n### Get data\n\nI've downloaded [a big WAD file repository](https://archive.org/details/idgames_202003) (~20GB) from the Internet Archive. They have [a bigger one](https://archive.org/details/wadarchive) (~975GB), but I currently do not have the space to process it, so let's work with the \"smaller\" one for a while until we get some results.\n\nUncompress the zip file (twice, as it contains another zip inside) to get a big collection of smaller zip files inside `idgames_202003/idgames`.\n\nUse the script in `scripts/unzip-everything.sh` to uncompress all the zip files. This will take a while, so be patient. 
After the script finishes, you will have a really big collection of directories (~41GB) containing WAD files and TXT files with descriptions of and information about those WAD files.\n\n```bash\ncd scripts\n./unzip-everything.sh\n```\n\n### Filter data\n\nI've built the [`waddup` command line tool](https://github.com/neverbot/waddup) in C++ to filter the WAD files. It takes a directory containing WAD files (mixed with any other kind of file) and creates a new directory with only the non-repeated WAD files.\n\n```bash\ncd tools/waddup\n\n# The tool is built with conan and make\nconan profile detect\nmake\n\nbuild/waddup ../../idgames_202003/idgames/ ../../wads\n```\n\nThe tool takes only a few seconds to run. For the curious, it groups WAD files by file size and then by SHA256 hash, and copies the first instance of each unique WAD file to the new directory.\n\nThe output will look like this:\n\n```bash\n$ build/waddup ../../idgames_202003/idgames/ ../../wads\n\nWAD DUPlicate detector:\nSearching for WAD files in directory: ../../idgames_202003/idgames/\nTotal WAD files found: 7614\nProgress: 100.0%\nProcessing complete.\n\nFound duplicate WAD files with hash 4eb361c32b6bf1db350488f592e65eeeea17f0821d2b26966bf92106e84ccdb7:\n  ../../idgames_202003/idgames/combos/rdnd/rdndtest.wad (4294 bytes)\n  ../../idgames_202003/idgames/combos/rdnd15/rdndtest.wad (4294 bytes)\nCopied first instance to destination: ../../wads/007425_combos_rdnd_rdndtest.wad\n\n...\n```\n\nWith this filter, the collection shrinks from ~41GB of mixed files to ~27GB of non-repeated WADs.\n\n### Convert data\n\nNow we can convert the WAD files to JSON format. I've built the C++ [`wadconvert` command line tool](https://github.com/neverbot/wadconvert), which takes a WAD file and converts it to a JSON file. 
The tool is built with conan and make, so you need to build it first.\n\n```bash\ncd tools/wadconvert\n\n# The tool is built with conan and make\nconan profile detect\nmake\n\n# example of use\nbuild/wadconvert -json ../../wads/000001_deathmatch_deathtag_behetag_Behetag.wad ../../test.json\n```\n\nAfter using it for a while, instead of continuing to use JSON as the storage format, I decided to create a custom DSL (Domain-Specific Language) to store the information in a more readable way. The DSL is a simple text format that can be easily parsed and converted. Take a look at the [`wadconvert`](https://github.com/neverbot/wadconvert) readme file for more information about the DSL format. The tool can convert WAD files to DSL format with the `-dsl` option.\n\n```bash\n# example of use\nbuild/wadconvert -dsl ../../wads/000001_deathmatch_deathtag_behetag_Behetag.wad ../../test.dsl\n```\n\nUse the script in `scripts/wads-convert.sh` to convert all the WAD files in the `wads` directory to the DSL format (edit the script if you want to convert them to JSON). The script runs the `wadconvert` tool on every WAD file in the `wads` directory and stores the DSL files in the destination directory (`wads-dsl`). It will take a while to run, so be patient. After the script finishes, you will have a new directory with all the WAD files converted to the new format.\n\n```bash\ncd scripts\n./wads-convert.sh\n```\n\nBecause the script can take a long time, it processes the files in alphabetical order, so you can stop it at any point (`ctrl + c`) and run it again later, resuming from wherever you want. If you provide the `--start` argument, it skips files until it finds one that matches the provided wildcard pattern. 
For example, if you have converted the files up to the one that starts with `000123_`, you can run the script again with the argument `--start=000123*`.\n\n```bash\n$ cd scripts\n$ ./wads-convert.sh --start=000008*\n\nSkipping: ../wads/000001_deathmatch_deathtag_behetag_Behetag.wad\nSkipping: ../wads/000002_deathmatch_doombot_dbot51_ctflevel.wad\nSkipping: ../wads/000003_deathmatch_doombot_dbot51_ZDoomB.wad\nSkipping: ../wads/000004_deathmatch_facility_Facility.wad\nSkipping: ../wads/000005_deathmatch_skulltag_99coop_99coop.wad\nSkipping: ../wads/000006_deathmatch_skulltag_mek-ttge_Mek-TTGE.wad\nSkipping: ../wads/000007_deathmatch_skulltag_basebldm_BASEBALLDM.wad\n000008_deathmatch_skulltag_td2_td2.wad converted to dsl.\n000009_deathmatch_skulltag_td5_TD5.wad converted to dsl.\n...\n```\n\nAfter converting all the WAD files, I noticed that some of them had not been converted. I created the `check-conversions.sh` script to check the differences between the `wads` and `wads-dsl` directories. Of a total of 7440 files, only 10 could not be converted, because they turned out not to be valid WAD files. 
Just remove them from the `wads` directory so they won't be processed again.\n\n\u003cdetails\u003e\n\u003csummary\u003eWAD files not converted to DSL\u003c/summary\u003e\n\n```bash\n$ ./check-conversions.sh\n\nWAD files not converted to DSL:\n000346_graphics_junkcity___MACOSX_._junkcity.wad\n000588_levels_doom_Ports_s-u_sigil_v1_21___MACOSX_._SIGIL_COMPAT_v1_21.wad\n000589_levels_doom_Ports_s-u_sigil_v1_21___MACOSX_._SIGIL_v1_21.wad\n002336_levels_doom2_Ports_g-i_gws___MACOSX_._GWS.wad\n002508_levels_doom2_Ports_j-l_jphouse_jphouse.wad\n002688_levels_doom_Ports_j-l_lijiang___MACOSX_lijiang_._lijiang.wad\n005150_levels_doom2_a-c_cesspool_cesspool.wad\n005933_levels_doom2_Ports_s-u_testfcil___MACOSX_._testfcil.wad\n006744_levels_doom2_Ports_s-u_stranger___MACOSX_._STRANGER.wad\n007380_levels_doom2_s-u_ultra_ultra.wad\n```\n\n\u003c/details\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneverbot%2Fdoom-paper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fneverbot%2Fdoom-paper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneverbot%2Fdoom-paper/lists"}