# Register

`Register` is a project that distills the [Federal
Register](https://www.federalregister.gov/) data into a more digestible format,
with an emphasis on reproducibility for anyone else interested in the data. It
takes the 7 GB of XML data from 2005 to the current year and condenses it into
a 70 MB CSV. For more details and background, see the introductory blog post,
[Back to the Register: Distilling the Federal Register for
All](https://nbsoftsolutions.com/blog/back-to-the-register-distilling-the-federal-register-for-all).
The `sample.R` script generates the graphs used in that post.

The latest CSV data is available on the [Releases
page](https://github.com/nickbabcock/register/releases/latest). The headers are:

- date: The date the document appeared in the Federal Register
- type: Presidential / rule / proposed-rule / notice
- agency: The agency that issued the document (e.g. Department of transportation)
- sub agency: The sub agency that issued the document. For instance, while the agency may be "Health and human services", the sub agency may be "Food and drug administration"
- subject: The subject or title of the document
- names: Names associated with the document (semicolon delimited)
- rin: Regulation Identifier Numbers (RINs) associated with the document (semicolon delimited)

Here's a sample of the data, with the subject column removed because Federal Register titles are quite long (the sub agency column is not shown either):

```
date        type    agency                             names               rin            docket
2013-03-20  notice  Department of transportation       G. Kelly Leone                     2013-06361
2015-04-02  notice  Department of veterans affairs     Rebecca Schiller                   2015-07509
2012-11-14  notice  Department of commerce             Gwellnar Banks                     2012-27621
2013-07-22  notice  Federal communications commission  Marlene H. Dortch                  2013-17626
2005-10-19  notice  Environmental protection agency    Vicki A. Simons                    05-20709
2016-02-09  notice  Office of personnel management     Beth F. Cobert                     2016-02615
2013-09-19  rule    Department of the interior         Stephen Guertin     RIN 1018-AY52  2013-22702
2009-05-05  notice  Department of labor                Elliott S. Kushner                 E9-10237
2010-08-03  notice  Small business administration      Karen G. Mills                     2010-19068
2007-09-05  notice  Environmental protection agency    James B. Gulliford                 E7-17542
```

## Generating the data

There are two ways to download, parse, and generate the data shown above: with
[Docker](https://www.docker.com/products/container-runtime) (easiest) or by
installing the prerequisites yourself (still not that bad).

### Docker

Assuming Docker is installed:

```bash
docker build -t nickbabcock/register .
docker run -v "$(pwd)/data":/register/data --rm -ti nickbabcock/register
```

The CSV data will be written to the `data` directory under the current working
directory, which the `-v` flag mounts into the container.

### Prerequisites

If you'd rather not use Docker, you'll need:

- A bash shell (a Linux machine; macOS may also work)
- Java
- Python 3

Once these are installed, run the following scripts:

- `setup.sh`:
  - Downloads the Java library used to run the XQuery files into a `saxon` directory
  - Downloads the Federal Register data into the `data` directory
- `run_conversion.sh`:
  - Runs the XQuery transformation (`transform.xql`), which outputs JSON lines
  - Pipes the JSON lines into the Python script (`to_csv.py`), which outputs a CSV file

## Contributing

Only a subset of the fields available in the Federal Register is extracted into
the CSV. If a field you want is missing, please open an issue or create a pull
request.
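Before analyzing the released CSV in Python, the semicolon-delimited `names` and `rin` cells need to be split into lists. The minimal sketch below does this with only the standard library and also counts documents per year and type. It assumes the file has a comma-delimited header row that uses the column names from the sample table (`date`, `type`, `rin`, and so on). The helpers `split_list` and `count_by_year_and_type` are hypothetical and are not part of this repository.

```python
import csv
import io
from collections import Counter
from typing import Iterable


def split_list(cell: str) -> list[str]:
    """Split a semicolon-delimited cell (such as `names` or `rin`) into a list."""
    # Blank cells (a document with no RIN, for example) become an empty list.
    return [part.strip() for part in cell.split(";") if part.strip()]


def count_by_year_and_type(rows: Iterable[dict]) -> Counter:
    """Count documents per (year, type), assuming `date` looks like YYYY-MM-DD."""
    counts: Counter = Counter()
    for row in rows:
        counts[(row["date"][:4], row["type"])] += 1
    return counts


# Three rows copied from the README sample. The subject and sub agency
# columns are left out, as they are in that sample.
SAMPLE = """date,type,agency,names,rin,docket
2013-03-20,notice,Department of transportation,G. Kelly Leone,,2013-06361
2013-07-22,notice,Federal communications commission,Marlene H. Dortch,,2013-17626
2013-09-19,rule,Department of the interior,Stephen Guertin,RIN 1018-AY52,2013-22702
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(count_by_year_and_type(rows))
# Counter({('2013', 'notice'): 2, ('2013', 'rule'): 1})
print([split_list(row["rin"]) for row in rows])
# [[], [], ['RIN 1018-AY52']]
```

To run this against a downloaded release, open the file with `open(path, newline="", encoding="utf-8")` (UTF-8 is an assumption) and pass a `csv.DictReader` over it instead of the in-memory sample.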
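`run_conversion.sh` pipes the JSON lines produced by `transform.xql` into `to_csv.py`. The sketch below shows how that kind of JSON-lines-to-CSV step could work. It is not the repository's actual `to_csv.py`. Both the `COLUMNS` key names and the assumption that `names` and `rin` arrive as JSON arrays are hypothetical.

```python
import csv
import io
import json
import sys
from typing import Iterable, TextIO

# Hypothetical record keys / CSV headers; the real to_csv.py may differ.
COLUMNS = ["date", "type", "agency", "subagency", "subject", "names", "rin", "docket"]


def json_lines_to_csv(lines: Iterable[str], out: TextIO) -> int:
    """Write one CSV row per JSON object in `lines`; return the number of rows."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    written = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines rather than failing on them
        record = json.loads(line)
        row = []
        for key in COLUMNS:
            value = record.get(key)
            if isinstance(value, list):
                # Multi-valued fields become a single semicolon-delimited cell.
                value = ";".join(str(item) for item in value)
            row.append("" if value is None else value)
        writer.writerow(row)
        written += 1
    return written


# Demo on a single in-memory record; in a shell pipeline the call would be
# json_lines_to_csv(sys.stdin, sys.stdout).
demo = io.StringIO(
    '{"date": "2013-09-19", "type": "rule", "agency": "Department of the interior", '
    '"names": ["Stephen Guertin"], "rin": ["RIN 1018-AY52"], "docket": "2013-22702"}\n'
)
json_lines_to_csv(demo, sys.stdout)
```

Writing one row per input line keeps memory use constant regardless of input size, so the converter can sit at the end of a pipe without buffering the whole transformation output.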