{"id":16821261,"url":"https://github.com/andrewharvey/abs2pgsql","last_synced_at":"2025-06-27T21:33:33.940Z","repository":{"id":3138036,"uuid":"4166797","full_name":"andrewharvey/abs2pgsql","owner":"andrewharvey","description":"PostgreSQL schemas and loading scripts for ABS data releases","archived":false,"fork":false,"pushed_at":"2017-04-02T02:57:20.000Z","size":1073,"stargazers_count":8,"open_issues_count":4,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-11T03:08:56.247Z","etag":null,"topics":["abs","census","postgresql"],"latest_commit_sha":null,"homepage":null,"language":"Perl","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/andrewharvey.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2012-04-28T11:24:10.000Z","updated_at":"2019-05-27T03:32:04.000Z","dependencies_parsed_at":"2022-08-06T13:15:59.630Z","dependency_job_id":null,"html_url":"https://github.com/andrewharvey/abs2pgsql","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/andrewharvey/abs2pgsql","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrewharvey%2Fabs2pgsql","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrewharvey%2Fabs2pgsql/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrewharvey%2Fabs2pgsql/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrewharvey%2Fabs2pgsql/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/andrewharvey","download_url":"https://codeload.github.com/andrewharvey/abs2pgsql
/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrewharvey%2Fabs2pgsql/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262334967,"owners_count":23295527,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["abs","census","postgresql"],"created_at":"2024-10-13T10:59:36.840Z","updated_at":"2025-06-27T21:33:33.887Z","avatar_url":"https://github.com/andrewharvey.png","language":"Perl","funding_links":[],"categories":[],"sub_categories":[],"readme":"# About\nThis project exists to provide a means for loading [Australian Bureau of\nStatistics (ABS)](http://www.abs.gov.au/) data into a PostgreSQL\ndatabase.\n\nIn many cases the ABS only publish Microsoft Excel spreadsheets\nsummarising statistics they have collected in a report style. While this\ncaters for a large component of users, it leaves other users who would\nlike the data delivered in the form of machine readable statistics (\neither for use in a database directly or for building other applications\non top of) out in the dark. This project is designed to help out these\nusers of the data.\n\nThe technical implementation of such a project involves firstly obtaining\nthe statistics from the ABS, defining the structure of that data in the\ndatabase, and finally writing programs to actually transform and load\nthe data. 
I suppose this is an [extract, transform and load](https://en.wikipedia.org/wiki/Extract,_transform,_load)\noperation.\n\n## Design Principles\nCurrently this project has some fixed constraints limiting the scope of\nthe project and some guiding principles to adhere to.\n\n* Code written mostly in shell (bash flavoured) and Perl.\n* Targeting the PostgreSQL database.\n* Code covers the whole process as much as possible, i.e. the user should\n  be able to simply run make and all the data will be downloaded from the\n  ABS (or a mirror), transformed and loaded with minimal intervention.\n* Eventually target schemas will be released as stable versions to allow\n  third parties to rely on the schema.\n* Terminology should be kept consistent with the source statistics.\n* Use a normalised database model.\n\n## ASGS\nIn cases where ABS data is linked to the ASGS (Australian Statistical\nGeography Standard) such fields reference the ASGS via the asgs schema as\ncreated by [asgs2pgsql](https://github.com/andrewharvey/asgs2pgsql).\n\n## Development Status\nCurrently everything should be considered under active development and\nunstable. Please don't let that stop you from either using or helping out\nthough.\n\nPlease fork and send a pull request (or email your patch if you prefer)\nwith any contributions you would like to make. This isn't a one-person\nproject.\n\n## Comments and Feedback\nPlease use the bug tracker for any bugs, feature requests or\nquestions. Or if you would prefer, you can email the maintainer.\n\n# ABS Product Releases\nThis project is actually a meta-project. It contains code for the\nfollowing ABS releases,\n* 8731.0 - Building Approvals, Australia, March 2012\n* 2011-census - Census of Population and Housing, 2011 (cat no. 2001.0,\n  2002.0, 2003.0, 2069.0.30.008)\n\n# Preparation\n## PostgreSQL Environment Variables\nAll scripts expect you have set up PG* environment variables. 
These are\nused to control which PostgreSQL database, hostname, port, username, etc.\nis used to load the data into. Refer to the [PostgreSQL documentation](http://www.postgresql.org/docs/current/static/libpq-envars.html)\nfor help.\n\nFor example, in your terminal window before running the scripts first run,\n\n    export PGDATABASE=abs\n\n## PostgreSQL Performance Tweaks\nThere are several ways you can speed up this initial load (assuming you\nwant to \"build from source\" rather than using a pre-made dump as the\npre-made dump will always be faster). These suggestions are based on the\nfact that you are loading existing data and you don't need durability.\nThat is, if the server crashes part way through the load you are happy to\njust start the load from the start again.\n\nI would recommend you follow the advice for [non-durable settings](http://www.postgresql.org/docs/current/static/non-durability.html) at least for the time you are actually loading the data.\n\n# License\nThe licensing of this repository is slightly more complicated than usual.\nFor this reason I use a [DEP-5 style copyright file](http://dep.debian.net/deps/dep5/)\ncalled `copyright`.\n\nThe short version is the contents of this repository are licensed under the \n[Creative Commons Zero 1.0](http://creativecommons.org/publicdomain/zero/1.0/)\nlicense, except for some select files which contain\nalmost verbatim copied data from copyrighted works. 
Again refer to the\n`copyright` file for details.\n\nAlthough not required, I would prefer you give Attribution to the project\nif you distribute it and release derived works or modifications under the same\nCC0 license.\n\n    To the extent possible under law, the person who associated CC0\n    with this work has waived all copyright and related or neighboring\n    rights to this work.\n    http://creativecommons.org/publicdomain/zero/1.0/\n\n# Running the scripts\nRefer to the README within each of the separate loaders within this\nrepository for specific help and instructions for that product.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandrewharvey%2Fabs2pgsql","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fandrewharvey%2Fabs2pgsql","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandrewharvey%2Fabs2pgsql/lists"}