{"id":18351803,"url":"https://github.com/codeslash21/wrangle-twitter-archive","last_synced_at":"2025-04-10T00:37:09.981Z","repository":{"id":234739964,"uuid":"768140957","full_name":"codeslash21/wrangle-twitter-archive","owner":"codeslash21","description":"Wrangle Twitter Archive WeRateDog. WeRateDog has 8M followers and they rate the dogs with funny comments and unique rating system. Also use dog-breed classifier to predict dog's breed in the tweets.","archived":false,"fork":false,"pushed_at":"2024-03-06T14:49:57.000Z","size":3356,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-15T15:51:19.734Z","etag":null,"topics":["data-analysis","data-wrangling","neural-networkt","twitter-api","twitter-archive"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/codeslash21.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-03-06T14:44:54.000Z","updated_at":"2024-03-06T14:51:43.000Z","dependencies_parsed_at":"2024-04-20T16:23:07.678Z","dependency_job_id":"f95c7896-94fe-4eea-9008-1b261ff273f9","html_url":"https://github.com/codeslash21/wrangle-twitter-archive","commit_stats":null,"previous_names":["codeslash21/wrangle-twitter-archive"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codeslash21%2Fwrangle-twitter-archive","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codeslash21%2Fwrangle-twitter-archive/tags","release
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codeslash21%2Fwrangle-twitter-archive/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codeslash21%2Fwrangle-twitter-archive/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/codeslash21","download_url":"https://codeload.github.com/codeslash21/wrangle-twitter-archive/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248138007,"owners_count":21053775,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","data-wrangling","neural-networkt","twitter-api","twitter-archive"],"created_at":"2024-11-05T21:32:56.927Z","updated_at":"2025-04-10T00:37:09.961Z","avatar_url":"https://github.com/codeslash21.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Wrangle Twitter Archive\n\n## Table of contents:\n\n- \u003ca href=\"#intro\"\u003eIntroduction \u003c/a\u003e\n- \u003ca href=\"#data\"\u003eDataset \u003c/a\u003e\n- \u003ca href=\"#software\"\u003eWhat software do I need? \u003c/a\u003e\n- \u003ca href=\"#steps\"\u003eProject Steps \u003c/a\u003e\n\n\u003cdiv id=\"intro\"\u003e\n         \n\u003c/div\u003e\n## Introduction:\n\nIn this project we wrangle the tweet archive of Twitter user @dog_rates, also known as \u003ca href=\"https://en.wikipedia.org/wiki/WeRateDogs\"\u003e WeRateDogs\u003c/a\u003e. WeRateDogs is a Twitter account that rates people's dogs with a humorous comment about the dog. 
They rate the dogs almost always with a denominator of 10. But numerators?? Most of them are greater than 10. But WHY??? WeRateDogs believes every dog is beautiful and almost all dogs deserve 10 and sometimes more than that. WeRateDogs has over 8 million followers and has received international media coverage. Our goal is to wrangle WeRateDogs Twitter data to create interesting and trustworthy analyses and visualizations.\n\n\u003cdiv id=\"data\"\u003e\n  \n## Dataset:\n\nThe dataset consists of three parts. \n- **Enhanced twitter archive:** \nThe WeRateDogs Twitter archive contains basic tweet data for all 5000+ of their tweets, but not everything. One column the archive does contain, though, is each tweet's text, which I used to extract rating, dog name, and dog \"stage\" (i.e. doggo, floofer, pupper, and puppo) to make this Twitter archive \"enhanced.\" Of the 5000+ tweets, I have filtered for tweets with ratings only (there are 2356). This data is stored in the `twitter_archive_enhanced.csv` file.\n\n- **Additional Data via the Twitter API:**\nBack to the basic-ness of Twitter archives: retweet count and favorite count are two of the notable column omissions. Fortunately, this additional data can be gathered by anyone from Twitter's API. Well, \"anyone\" who has access to data for the 3000 most recent tweets, at least. Because we have the WeRateDogs Twitter archive, and specifically the tweet IDs within it, we can gather this data for all 5000+. We query Twitter's API to gather this valuable data and store it in the `tweet_json.txt` file.\n\n- **Image Predictions File:**\nOne more cool thing: I ran every image in the WeRateDogs Twitter archive through a neural network that can classify breeds of dogs*. The results: a table full of image predictions (the top three only) alongside each tweet ID, image URL, and the image number that corresponded to the most confident prediction (numbered 1 to 4 since tweets can have up to four images). 
We store this prediction data in the `image_predictions.tsv` file.\n\n\u003cdiv id=\"software\"\u003e\n  \n## What Software Do I Need?\nYou can do this project in a Jupyter notebook using Python 3.x, but you have to install the following Python packages to wrangle the dataset and query the Twitter API (`json` ships with the Python standard library and needs no installation).\n\n\u003e - pandas\n\u003e - NumPy\n\u003e - requests\n\u003e - tweepy\n\u003e - json\n\u003e - sqlalchemy\n  \n\u003cdiv id=\"steps\"\u003e\n  \n## Project Steps:\nThe data wrangling process consists of three steps:\n\n- **Gather Data:** Gather the dataset for wrangling.\n- **Assess Data:** Note the quality and tidiness issues in the dataset.\n- **Clean Data:** Fix the issues documented during the assessment to make the dataset ready for analysis.\n\nIt's recommended to store the clean data after wrangling for future analysis. Here we store the clean data in a flat file `twitter_archive_master.csv` and a SQLite database `twitter.db`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodeslash21%2Fwrangle-twitter-archive","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcodeslash21%2Fwrangle-twitter-archive","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodeslash21%2Fwrangle-twitter-archive/lists"}