{"id":23302238,"url":"https://github.com/gabe-zhang/cf-dataviz","last_synced_at":"2025-04-06T22:20:46.511Z","repository":{"id":195626121,"uuid":"468931509","full_name":"gabe-zhang/cf-dataviz","owner":"gabe-zhang","description":"A visual data exploration of campaign finance data","archived":false,"fork":false,"pushed_at":"2023-03-03T22:58:16.000Z","size":7274,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"gh-pages","last_synced_at":"2025-02-13T04:31:32.618Z","etag":null,"topics":["data-visualization","ggplot2","r"],"latest_commit_sha":null,"homepage":"https://yuanzzhang.github.io/cf-dataviz/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gabe-zhang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2022-03-11T23:43:35.000Z","updated_at":"2023-03-03T23:03:52.000Z","dependencies_parsed_at":"2023-09-19T01:23:54.750Z","dependency_job_id":null,"html_url":"https://github.com/gabe-zhang/cf-dataviz","commit_stats":null,"previous_names":["yuanzzhang/cf-dataviz","gabe-zhang/cf-dataviz"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gabe-zhang%2Fcf-dataviz","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gabe-zhang%2Fcf-dataviz/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gabe-zhang%2Fcf-dataviz/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gabe-zhang%2Fcf-dataviz/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gabe-zhang","download_url":"https:
//codeload.github.com/gabe-zhang/cf-dataviz/tar.gz/refs/heads/gh-pages","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247558553,"owners_count":20958178,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-visualization","ggplot2","r"],"created_at":"2024-12-20T10:19:29.507Z","updated_at":"2025-04-06T22:20:46.490Z","avatar_url":"https://github.com/gabe-zhang.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"Campaign Finance Visualization\n================\n\nThis is a **visual exploration** of candidates’ campaign finance data\nusing several **data visualization** concepts from the book\n[*Fundamentals of Data\nVisualization*](https://clauswilke.com/dataviz/index.html). The dataset\nhas 26,828 rows and 50 columns and combines the candidate\nsummary files from 2008 to 2020 obtained from the U.S. Federal\nElection Commission ([FEC](https://www.fec.gov/)). Each summary file\ncovers a two-year period of candidates’ financial activity.\nFor a more detailed file description, please visit:\n\u003chttps://www.fec.gov/campaign-finance-data/candidate-summary-file-description/\u003e\n\n# Part 1. Figures with stories\n\nIn Part 1, several data representations were created along with\ncorresponding stories. The figures presented conveyed messages extracted\nfrom the dataset and were useful in the exploratory data analysis\nprocess. 
When used properly, data visualization can\nhelp readers understand the dataset by providing intuitive\ninsights.\n\n### 1.1 Bar plot of the top 5 parties\n\n\u003cimg src=\"dataviz_files/figure-gfm/party-barplot-1.png\" width=\"65%\" /\u003e\n\nAmong the 5 most common parties, the number of candidates affiliated\nwith the Republican Party and the Democratic Party significantly\noutweighs (at least 300%) the number of candidates from other political parties.\nThe Republican and Democratic parties dominate the dataset.\n\n### 1.2 Density estimates of individual contribution\n\n\u003cimg src=\"dataviz_files/figure-gfm/individual-contribution-density-1.png\" width=\"65%\" /\u003e\n\nThe density plot shows a bimodal distribution of total contributions\nfrom individuals. A high density of candidates received relatively low\nindividual contributions around the mode, compared with candidates who received\n10 to 15 arcsinh units of individual contributions around the lower\ndensity peak.\n\n### 1.3 Sina plot of total loan from 2020 reports\n\n\u003cimg src=\"dataviz_files/figure-gfm/sina-total-loan-1.png\" width=\"65%\" /\u003e\n\nFor reports in 2020, candidates from popular states tend to have total\nloans between 5 and 15 on the natural-log dollar scale, ignoring candidates whose total\nloan is zero. One data point in Florida is much smaller than the rest and lies\naway from the cluster, so it may be an outlier.\n\n### 1.4 Ending years of reports\n\n\u003cimg src=\"dataviz_files/figure-gfm/timeline-report-1.png\" width=\"65%\" /\u003e\n\nThe line graph shows the number of reports by ending year, which fluctuates between odd\nyears and even years. 
The majority of reports ended in even years (e.g., 2010,\n2020), and a small portion of reports ended in odd years (e.g., 2015).\n\n### 1.5 Geographical distribution of candidate office\n\n\u003cimg src=\"dataviz_files/figure-gfm/choropleth-candidate-office-1.png\" width=\"65%\" /\u003e\n\nCalifornia has the darkest color of any state, so more\ncandidates have their offices in California than in any other state. Many\ncandidates also have their offices in Texas, Florida, or New York. By contrast, there\nare fewer candidate offices in Alaska.\n\n# Part 2. Misleading figures\n\nIn Part 2 of this project, two figures (1.4 \u0026 1.5) from Part 1 were\ntransformed and presented in a way that conveyed different messages.\nWhile these figures were technically correct, they may have been\nmisleading to some viewers who did not examine them carefully. This\nexercise demonstrated the importance of carefully considering how data\nis represented, as not all data representations can clearly convey the\nintended message. By understanding the potential for figures to be\nmisleading, we can better avoid them in the future.\n\n### 2.1 Log transformed ending years of reports\n\n\u003cimg src=\"dataviz_files/figure-gfm/bad-timeline-report-1.png\" width=\"65%\" /\u003e\n\nAt a glance, the number of reports per ending year shows little variation\nfrom year to year. It varies only between 4 and 8 log units. Although the plot\nis technically right, the log transformation dampens the periodic trend in\nthe visual representation, making fluctuations less obvious. Exponentiating\nthe log values shows that the differences are actually\nsubstantial.\n\n### 2.2 Dark colored office distribution\n\n\u003cimg src=\"dataviz_files/figure-gfm/bad-choropleth-candidate-office-1.png\" width=\"65%\" /\u003e\n\nViewers may naturally conclude that California and Texas are both the most popular\nstates for candidate offices (\u0026gt; 2000 counts). The dark colors\nmake them hard to distinguish from one another. 
In fact, only California has more\nthan 2000 offices; Texas does not. They are not in\nthe same category.\n\n\n(All figures were created in R using packages)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgabe-zhang%2Fcf-dataviz","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgabe-zhang%2Fcf-dataviz","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgabe-zhang%2Fcf-dataviz/lists"}