{"id":26730919,"url":"https://github.com/wrinklerelease/brfss","last_synced_at":"2025-03-27T23:34:13.793Z","repository":{"id":284392761,"uuid":"954795704","full_name":"WrinkleRelease/brfss","owner":"WrinkleRelease","description":"BRFSS-related materials, instructions and scripts to setting up a Postgres instance for BRFSS data","archived":false,"fork":false,"pushed_at":"2025-03-25T17:06:36.000Z","size":144,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-25T17:33:15.463Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/WrinkleRelease.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-25T16:18:56.000Z","updated_at":"2025-03-25T17:06:40.000Z","dependencies_parsed_at":"2025-03-25T17:46:27.437Z","dependency_job_id":null,"html_url":"https://github.com/WrinkleRelease/brfss","commit_stats":null,"previous_names":["wrinklerelease/brfss"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WrinkleRelease%2Fbrfss","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WrinkleRelease%2Fbrfss/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WrinkleRelease%2Fbrfss/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WrinkleRelease%2Fbrfss/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owne
rs/WrinkleRelease","download_url":"https://codeload.github.com/WrinkleRelease/brfss/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245944020,"owners_count":20697945,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-27T23:33:00.273Z","updated_at":"2025-03-27T23:34:13.740Z","avatar_url":"https://github.com/WrinkleRelease.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"This repository provides instructions for loading the CDC's BRFSS data into a PostgreSQL database and accessing it from RStudio for analysis.\n\nThe schema presented here divides the BRFSS into sixty tables grouped by section name (as indicated in the LLCP codebook). The resulting SQL db is around 530MB, about half the size of the XPT file. This, along with the ability to pull only the table or column you need into R, means a faster, more responsive analysis environment.\n\nI also provide a Markdown-formatted version of the codebook.\n\n\u003cbr\u003e\n\n# Obtain the BRFSS File\n\nThe CDC hosts the data file in either `.xpt` or `.asc` format. Either can be downloaded as a zip archive from [this location](https://www.cdc.gov/brfss/annual_data/annual_2023.html). Scroll down to the section entitled **Data Files** and choose the format you prefer. I worked with the `.xpt` file because the `.asc` file requires a separate file (known as a dictionary file) to correctly assign column names.\n\n\u003cbr\u003e\n\n# Convert XPT to a SQL-Friendly Format\n\nI found two methods:\n\n**1. 
SAS Universal Viewer**\nSAS offers the free [SAS Universal Viewer](https://support.sas.com/downloads/browse.htm?cat=74), which opens `.xpt` files and exports `.csv`. You'll need an account to download the tool. The CSV file it produced was nearly 100MB smaller than the version produced by RStudio.\n\n**2. Using a simple `.r` script in RStudio**\nThe `.r` file I used to convert the XPT file to CSV is included.\n\n\u003cbr\u003e\n\n# SQL Schema \u0026 Splitting the CSV\n\nMy Postgres schema is based on the codebook, which lists the SAS Variable names along with the Section Name. Each Section is its own table, and each column is a SAS Variable in that section.\n\nI was unable to import the entire CSV file into a Postgres instance with an `init.sql` file. Instead, I split the full CSV into smaller files, one per section.\n\n```shell\ncut -d ',' -f [column numbers separated by commas] ~/path/to/master.csv \u003e section_name.csv\n```\nMove all these files into the `container-name/data/` folder.\n\n\u003cbr\u003e\n\n# Setting up a Docker PostgreSQL instance\n\nInstall Docker Desktop and get the latest PostgreSQL image. The `docker-compose.yml` files found in brfss-2022-container and brfss-2023-container are the instructions for starting up your Postgres db. The compose file is structured to give a persistent db even if the container is stopped and restarted.\n\n\u003e [!CAUTION]\n\u003e If you want to keep your Postgres data, spin the container down with `docker-compose down`. Do _not_ use `docker-compose down -v` unless you want to scrub completely and re-initialize.\n\nSince you started the Docker container without an `init.sql`, the Postgres db exists but has no tables or data in it.\n\nI added the tables one at a time in DBeaver, then used psql in a terminal to copy the data over. 
Here are the psql commands:\n\n```shell\n# first: find the name of the running container\ndocker ps\n\n# second: open a psql session inside the container\ndocker exec -it your_container_name psql -U your_user_name -d your_db_name\n\n# third: inside psql, load one CSV into its table (keep the \\copy command on a single line)\n\\copy table_name (\"columns\",\"columns\") from '/docker-entrypoint-initdb.d/file_name.csv' delimiter ',' CSV HEADER\n```\n\nYou can also back up and restore the Postgres db.\n```shell\n# back up the db to a plain SQL file\ndocker exec -t your-container-name pg_dump -U username your-db-name \u003e db_name_backup.sql\n\n# restore the db by feeding the SQL file back through psql\ndocker exec -i your-container-name psql -U username -d your-db-name \u003c db_name_backup.sql\n```\n\n\u003cbr\u003e\n\n# Connecting to the database from RStudio\n\nEstablish the db connection:\n```R\n# install needed packages (only required once)\ninstall.packages(\"DBI\")\ninstall.packages(\"RPostgres\")\n\n# load packages on script execution\nlibrary(DBI)\nlibrary(RPostgres)\n\n# Replace with your actual database credentials\ncon \u003c- dbConnect(RPostgres::Postgres(),\n                 dbname = \"your_database_name\",\n                 host = \"your_host\",      # e.g., \"localhost\" or an IP address\n                 port = 5432,             # default PostgreSQL port\n                 user = \"your_username\",\n                 password = \"your_password\")\n\n```\n\nCall tables or variables as needed:\n```R\n# Read a whole table; replace 'your_table_name' with the actual table name\ndata \u003c- dbReadTable(con, \"your_table_name\")\n\n# Read selected columns only; replace the table name and list the columns you want\nquery \u003c- \"SELECT column1, column2 FROM your_table_name\"\ndata \u003c- dbGetQuery(con, query)\n```\n\nOnce finished, disconnect:\n```R\ndbDisconnect(con)\n```\n\n\u003cbr\u003e\n\n# Supplemental Material\n\nThe following supplemental material is provided for each year.\n\n**Codebook**: The codebook gives the variable name, location, and frequency of values for all reporting areas combined for the landline and cell phone data set. 
The CDC distributes the codebook as an `.html` file, which can be found on their site. Using Python scripts, I've converted the codebook into a `.md` file with section and question headers, which greatly reduces seek time when looking up a variable, its SAS code and answer codes.\n\n**Variable Layout**: Knowing the variable layout helps when building a new schema and creating queries.\n\n\u003cbr\u003e\n\n# Handling Errors\n\n## 2022\n\n## 2023\n\nOddly, some SAS Variables either were in the XPT file but not listed in the Codebook, or were listed in the Codebook but missing from the original XPT file. My solution was to remove from the dataset the columns that didn't appear in the codebook, and to remove from `init.sql` the one codebook variable that wasn't in the dataset. After all, without knowing the question or how the answers were coded, the columns are useless.\n\n| SAS Variable | Column Number | In Codebook | In Dataset | Remediation             |\n|--------------|---------------|-------------|------------|-------------------------|\n| `rcsborg1`   | NA            | Yes         | No         | Removed from `init.sql` |\n| `usemrjn4`   | 215           | No          | Yes        | Removed from csv        |\n| `birthsex`   | 205           | No          | Yes        | Removed from csv        |\n| `celsxbrt`   | 25            | No          | Yes        | Removed from csv        |\n| `rcsgend1`   | 252           | No          | Yes        | Removed from csv        |\n| `rcsxbrth`   | 253           | No          | Yes        | Removed from csv        |\n| `lndsxbrt`   | 19            | No          | Yes        | Removed from csv        |\n| `trnsgndr`   | 208           | No          | Yes        | Removed from csv        |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwrinklerelease%2Fbrfss","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwrinklerelease%2Fbrfss","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwrinklerelease%2Fbrfss/lists"}