{"id":23586829,"url":"https://github.com/mae776569/weratedogs-wrangling","last_synced_at":"2026-01-25T10:32:27.782Z","repository":{"id":91936717,"uuid":"226647802","full_name":"MAE776569/WeRateDogs-wrangling","owner":"MAE776569","description":"Wrangling WeRateDogs Twitter data to create interesting and trustworthy analyses and visualizations","archived":false,"fork":false,"pushed_at":"2019-12-08T10:52:22.000Z","size":1362,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-05-17T04:07:49.038Z","etag":null,"topics":["data-analysis","data-science","data-visualization","tweets","twitter-api"],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MAE776569.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-12-08T10:08:36.000Z","updated_at":"2021-05-17T12:24:50.000Z","dependencies_parsed_at":null,"dependency_job_id":"5b1c6651-eefd-44e5-9f2e-4bfdb733fa7d","html_url":"https://github.com/MAE776569/WeRateDogs-wrangling","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/MAE776569/WeRateDogs-wrangling","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MAE776569%2FWeRateDogs-wrangling","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MAE776569%2FWeRateDogs-wrangling/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MAE776569%2FWeRateDogs-wrangling/releases"
,"manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MAE776569%2FWeRateDogs-wrangling/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MAE776569","download_url":"https://codeload.github.com/MAE776569/WeRateDogs-wrangling/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MAE776569%2FWeRateDogs-wrangling/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28751816,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-25T10:25:12.305Z","status":"ssl_error","status_checked_at":"2026-01-25T10:25:11.933Z","response_time":113,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","data-science","data-visualization","tweets","twitter-api"],"created_at":"2024-12-27T04:14:38.684Z","updated_at":"2026-01-25T10:32:27.777Z","avatar_url":"https://github.com/MAE776569.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# WeRateDogs Data Wrangling\n\nWrangling WeRateDogs Twitter data to create interesting and trustworthy analyses and visualizations. The Twitter archive is great, but it only contains very basic tweet information. 
Additional gathering, followed by assessing and cleaning, is required.\n\n## Data Gathering\n\nThe data is divided into three pieces:\n\n- Enhanced Twitter Archive\n\nThe WeRateDogs Twitter archive contains basic tweet data for all 5000+ of their tweets, but not everything. The archive does, however, contain each tweet's text, dog name, and dog \"stage\" (i.e. doggo, floofer, pupper, and puppo), which makes this Twitter archive \"enhanced.\" The 5000+ tweets were filtered to keep only tweets with ratings (there are 2356).\n\n- Image Predictions File\n\nEvery image in the WeRateDogs Twitter archive was run through a neural network that can classify breeds of dogs. The result is a table of image predictions (the top three only) alongside each tweet ID, image URL, and the image number that corresponded to the most confident prediction (numbered 1 to 4, since tweets can have up to four images).\n\n- Additional Data via the Twitter API\n\nBack to the basic-ness of Twitter archives: retweet count and favorite count are two of the notable column omissions. Fortunately, this additional data can be gathered from Twitter's API.\n\n## Assessing Data\n\nAfter gathering, each of the above pieces of data was assessed visually and programmatically for quality and tidiness issues.\n\n## Cleaning Data\n\nEach of the issues documented while assessing was cleaned.\n\nThe cleaned data was then stored in CSV format, with the main file named `twitter_archive_master.csv`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmae776569%2Fweratedogs-wrangling","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmae776569%2Fweratedogs-wrangling","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmae776569%2Fweratedogs-wrangling/lists"}