{"id":17247604,"url":"https://github.com/joshdata/dcregisterer","last_synced_at":"2025-03-26T05:41:21.788Z","repository":{"id":66126353,"uuid":"162018674","full_name":"JoshData/dcregisterer","owner":"JoshData","description":"An unofficial website that makes it a little easier to find and view District of Columbia Register Notices.","archived":false,"fork":false,"pushed_at":"2020-06-07T21:45:38.000Z","size":13,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"primary","last_synced_at":"2025-01-31T07:16:47.995Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://s3.amazonaws.com/dcregisterer/index.html","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JoshData.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-12-16T16:15:03.000Z","updated_at":"2018-12-16T16:27:19.000Z","dependencies_parsed_at":"2023-04-23T13:18:06.801Z","dependency_job_id":null,"html_url":"https://github.com/JoshData/dcregisterer","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JoshData%2Fdcregisterer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JoshData%2Fdcregisterer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JoshData%2Fdcregisterer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JoshData%2Fdcregisterer/manifests","owner_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/owners/JoshData","download_url":"https://codeload.github.com/JoshData/dcregisterer/tar.gz/refs/heads/primary","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245598294,"owners_count":20641882,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-15T06:38:26.345Z","updated_at":"2025-03-26T05:41:21.768Z","avatar_url":"https://github.com/JoshData.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"DC Register Unofficial Open Data\n================================\n\nThis is a repository of open data for the District of Columbia Register, the DC government's collection of administrative issuances, October 2009-present, from https://dcregs.dc.gov/default.aspx. Why? Because the DC website provides a hopelessly bad interface for actually finding anything.\n\nThe notices in the Register have been converted into browser and search engine friendly formats using ocrmypdf and tesseract (for PDFs) and LibreOffice (for Word docs).\n\nDevelopment\n-----------\n\nTo run the scripts in this repository to build your own copy of our open data, you'll need an Ubuntu 18.04 machine. 
Install:\n\n\tapt-get install ocrmypdf\n\tpip3 install rtyaml tqdm python-magic\n\nThen download the notices \u0026 metadata into `notices/*.blob` and `notices/*.yaml`:\n\n\tpython3 download_dc_register_notices.py\n\nMake symbolic links at `documents/*.{pdf,doc,docx,html,rtf}` to the raw notice files by automatically determining the file type of each notice:\n\n\tpython3 make_document_symlinks.py\n\nProduce new document files in alternative formats (OCR'd PDFs, plain text, and HTML):\n\n\tpython3 make_document_formats.py\n\nFinally, produce the `index.json` file that the website in `index.html` loads:\n\n\tpython3 make_index.py\n\nDeployment\n----------\n\nThe site is currently hosted statically on AWS S3. Upload with:\n\n\ts3cmd -P --no-preserve sync index.* s3://dcregisterer/\n\ts3cmd -P --no-preserve -F sync documents/ s3://dcregisterer/documents/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjoshdata%2Fdcregisterer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjoshdata%2Fdcregisterer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjoshdata%2Fdcregisterer/lists"}