{"id":18019932,"url":"https://github.com/bryceco/parseosmchangesetfile","last_synced_at":"2025-04-04T17:11:56.042Z","repository":{"id":151882141,"uuid":"584639767","full_name":"bryceco/ParseOsmChangesetFile","owner":"bryceco","description":"A crude fast parser for analyzing OSM changeset history files ","archived":false,"fork":false,"pushed_at":"2025-03-09T06:43:07.000Z","size":852,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-29T00:41:31.319Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bryceco.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-01-03T06:03:39.000Z","updated_at":"2025-03-09T06:43:11.000Z","dependencies_parsed_at":"2024-10-30T05:54:03.799Z","dependency_job_id":null,"html_url":"https://github.com/bryceco/ParseOsmChangesetFile","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bryceco%2FParseOsmChangesetFile","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bryceco%2FParseOsmChangesetFile/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bryceco%2FParseOsmChangesetFile/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bryceco%2FParseOsmChangesetFile/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bryceco","download_
url":"https://codeload.github.com/bryceco/ParseOsmChangesetFile/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247217222,"owners_count":20903009,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-30T05:13:11.060Z","updated_at":"2025-04-04T17:11:56.009Z","avatar_url":"https://github.com/bryceco.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ParseOsmChangesetFile\nA fast parser for analyzing OpenStreetMap changeset history files.\n\nWritten for macOS, but should be easily ported to any *nix. \n\nOSM changeset history files are large XML files that require a long time to process using typical XML parsing libraries. As of early 2023 \nthe changeset file (https://planet.openstreetmap.org/planet/changesets-latest.osm.bz2) is over 62 GB when uncompressed. \n\nThis project implements an app for gathering changeset statistics that are interesting to me, but includes a framework for\nparsing changeset history files with minimal overhead, processing a 62 GB history file in 2-3 minutes \non a MacBook Pro with M1 Pro processor and 32 GB memory. 
Fast processing times make iterative refinement of analysis functions quicker and easier.\n\nPerformance is improved using a combination of techniques:\n* Written in C++.\n* A custom minimal XML parser designed solely for parsing changeset files (ChangesetParser.cpp).\n* The history file is memory mapped, and there are no memory allocations for strings during processing, except for the specific \nvalues that are needed by the analysis functions.\n* When the analysis only applies to changesets after a particular date (e.g. the last year) the raw XML file is binary searched for the\nchangeset at the cut-off date, avoiding the need to parse any XML before that date.\n\nThe parser is designed to be minimal but extensible. Rather than providing every piece of data that any analysis might need, you can add \nadditional fields as needed by your analysis functions.\n\nHere's an analysis function that counts and prints the total edits for every user in the history:\n~~~\n#include \u003ciostream\u003e\n#include \u003cmap\u003e\n#include \u003cstring\u003e\n#include \"ChangesetParser.hpp\"\nclass UserEditCount: public ChangesetReader {\n\tstd::map\u003cstd::string,long\u003e userCounts;\n\tvoid initialize() {}\n\tvoid process(const Changeset \u0026 changeset)\n\t{\n\t\tauto it = userCounts.find(changeset.user);\n\t\tif ( it == userCounts.end() ) {\n\t\t\t// first changeset seen for this user: start the count at zero\n\t\t\tit = userCounts.insert(std::pair\u003cstd::string,long\u003e(changeset.user, 0)).first;\n\t\t}\n\t\tit-\u003esecond += changeset.editCount;\n\t}\n\tvoid finalize()\n\t{\n\t\tfor ( const auto \u0026user: userCounts ) {\n\t\t\tstd::cout \u003c\u003c user.first \u003c\u003c \" = \" \u003c\u003c user.second \u003c\u003c \"\\n\";\n\t\t}\n\t}\n};\n~~~\nand here's the main function that processes and analyzes everything after Jan 1, 2021:\n~~~\nint main(int argc, const char * argv[])\n{\n\tconst char * startDate = \"2021-01-01\";\n\tconst char * path = NULL;\n\tif ( argc == 2 ) {\n\t\tpath = argv[1];\n\t} else {\n\t\treturn 
1;\n\t}\n\tChangesetParser * parser = new ChangesetParser();\n\tparser-\u003eaddReader(new UserEditCount());\n\tif ( parser-\u003eparseXmlFile( path, startDate ) ) {\n\t\treturn 0;\n\t} else {\n\t\treturn 1;\n\t}\n}\n~~~\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbryceco%2Fparseosmchangesetfile","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbryceco%2Fparseosmchangesetfile","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbryceco%2Fparseosmchangesetfile/lists"}