{"id":24010846,"url":"https://github.com/shrawans007/google_cyclistic_2023","last_synced_at":"2025-02-25T13:48:34.265Z","repository":{"id":265419610,"uuid":"895954897","full_name":"shrawans007/google_cyclistic_2023","owner":"shrawans007","description":"Google Data Analytics Capstone Case Study (SQL and Tableau)","archived":false,"fork":false,"pushed_at":"2025-01-07T10:19:15.000Z","size":150,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-07T11:32:43.440Z","etag":null,"topics":["big-query","bigquery","coursera-assignment","cyclistic","cyclistic-bike-share-analysis-case-study","cyclistic-bikshare","data-analysis","data-analysis-project","data-analytics","data-cleaning","data-combination","data-exploration","data-science","google-data-analytics","sql","tableau","tableau-dashboard","tableau-public"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shrawans007.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-29T08:58:12.000Z","updated_at":"2025-01-07T10:21:32.000Z","dependencies_parsed_at":null,"dependency_job_id":"d1787913-8c30-447e-8943-9e3b03dfc54a","html_url":"https://github.com/shrawans007/google_cyclistic_2023","commit_stats":null,"previous_names":["shrawans007/google-cyclistic-2023","shrawans007/google_cyclistic_2023"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shrawans007%2Fgoogle_cyclistic_2023","tags_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shrawans007%2Fgoogle_cyclistic_2023/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shrawans007%2Fgoogle_cyclistic_2023/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shrawans007%2Fgoogle_cyclistic_2023/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shrawans007","download_url":"https://codeload.github.com/shrawans007/google_cyclistic_2023/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240680827,"owners_count":19840314,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big-query","bigquery","coursera-assignment","cyclistic","cyclistic-bike-share-analysis-case-study","cyclistic-bikshare","data-analysis","data-analysis-project","data-analytics","data-cleaning","data-combination","data-exploration","data-science","google-data-analytics","sql","tableau","tableau-dashboard","tableau-public"],"created_at":"2025-01-08T04:42:32.724Z","updated_at":"2025-02-25T13:48:33.625Z","avatar_url":"https://github.com/shrawans007.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Google Data Analytics Capstone: Casestudy Cyclistic 2023 (SQL and Tableau)\n\nCourse: [Google Data Analytics Capstone: Complete a Case\nStudy](https://www.coursera.org/learn/google-data-analytics-capstone/)\n\n![cyclistic_case_study](https://github.com/user-attachments/assets/b77dbfdb-2daa-4ae9-9183-7fd7b9b8c798)\n\n\n## Introduction\n\nWelcome to the 
Cyclistic bike-share analysis case study! In this case\nstudy, I worked for a fictional company, Cyclistic, along with some key\nteam members. In order to answer the business questions, I have followed\nthe steps of the data analysis process: **Ask, Prepare, Process,\nAnalyze, Share,** and **Act.**\n\n## Scenario\n\nI take the role of a junior data analyst working on the marketing\nanalyst team at Cyclistic, a bike-share company in Chicago. The director\nof marketing, Ms. Lily Moreno, believes the company’s future success\ndepends on maximizing the number of annual memberships.\n\nTherefore, my team wants to understand how casual riders and annual\nmembers use Cyclistic bikes differently. From these insights, we will\ndesign a new marketing strategy to convert casual riders into annual\nmembers. But first, Cyclistic executives must approve our team's\nrecommendations, so they must be backed up with compelling data insights\nand professional data visualizations.\n\n### About the company\n\nIn 2016, Cyclistic launched a successful bike-share offering. Since\nthen, the program has grown to a fleet of 5,824 bicycles that are\ngeotracked and locked into a network of 692 stations across Chicago. The\nbikes can be unlocked from one station and returned to any other station\nin the system anytime.\n\nUntil now, Cyclistic’s marketing strategy relied on building general\nawareness and appealing to broad consumer segments. One approach that\nhelped make these things possible was the flexibility of its pricing\nplans: single-ride passes, full-day passes, and annual memberships.\nCustomers who purchase single-ride or full-day passes are referred to as\ncasual riders. Customers who purchase annual memberships are Cyclistic\nmembers.\n\nCyclistic’s finance analysts have concluded that annual members are much\nmore profitable than casual riders. 
Although the pricing flexibility\nhelps Cyclistic attract more customers, Moreno believes that maximizing\nthe number of annual members will be key to future growth. Rather than\ncreating a marketing campaign that targets all-new customers, Moreno\nbelieves there is a solid opportunity to convert casual riders into\nmembers. Moreno has set a clear goal: Design marketing strategies aimed\nat converting casual riders into annual members.\n\n## Ask\n\n### Business Task\n\nHelp design marketing strategies to convert casual riders to members.\n\n### Analysis Questions\n\nThree questions will guide the future marketing program:\n\n1.  How do annual members and casual riders use Cyclistic bikes\n    differently?\n\n2.  Why would casual riders buy Cyclistic annual memberships?\n\n3.  How can Cyclistic use digital media to influence casual riders to\n    become members?\n\nMoreno has assigned my team the first question to answer: How do annual\nmembers and casual riders use Cyclistic bikes differently?\n\n## Prepare\n\n### Data Source\n\nI used Cyclistic’s historical trip data to analyze and identify trends\nfrom Jan 2023 to Dec 2023, which can be downloaded from\n[divvy_tripdata](https://divvy-tripdata.s3.amazonaws.com/index.html).\nThe data has been made available by Motivate International Inc. under\nthis [license](https://divvybikes.com/data-license-agreement).\n\nThis is public data that can be used to explore how different customer\ntypes are using Cyclistic bikes. But note that data-privacy issues\nprohibit using riders’ personally identifiable information. 
This\nmeans that we won’t be able to connect pass purchases to credit card\nnumbers to determine if casual riders live in the Cyclistic service area\nor if they have purchased multiple single passes.\n\n### Data Organization\n\nThere are 12 files following the naming convention YYYYMM-divvy-tripdata,\nand each file includes information for one month, such as the ride id, bike\ntype, start time, end time, start station, end station, start location,\nend location, and whether the rider is a member or not. The\ncorresponding column names are ride_id, rideable_type, started_at,\nended_at, start_station_name, start_station_id, end_station_name,\nend_station_id, start_lat, start_lng, end_lat, end_lng and\nmember_casual.\n\n## Process\n\nBigQuery is used to combine the various datasets into one dataset and\nclean it. Reason: a Microsoft Excel worksheet can hold at most 1,048,576\nrows, so it cannot manage this amount of data. Because\nthe Cyclistic dataset has more than 6 million rows, it is essential to\nuse a platform like BigQuery that supports huge volumes of data.\n\n### Combining the Data\n\nSQL Query: [Data\nCombining](https://console.cloud.google.com/bigquery?ws=!1m7!1m6!12m5!1m3!1slofty-fort-427707-j0!2sus-central1!3s426c52ea-3526-4025-9f5a-d6ea218b237a!2e1)\n\nThe 12 monthly CSV files, from January 2023 through December 2023, are\nuploaded as tables in the dataset: 202301_tripdata,\n202302_tripdata, 202303_tripdata and so on. Another table named\n\"2023_combined_data\" is created, containing 6,048,834 rows of data for\nthe entire year.\n\n### Data Exploration\n\nSQL Query: [Data\nExploration](https://console.cloud.google.com/bigquery?ws=!1m7!1m6!12m5!1m3!1slofty-fort-427707-j0!2sus-central1!3s25e59f32-557b-4747-b68c-bf1157875e73!2e1)\n\nBefore cleaning the data, I familiarized myself with it to\nfind inconsistencies.\n\nObservations:\n\n1.  The table below shows all the column names and their data types. 
The\n    ride_id column is our primary key.\n\n![cyclistic_2023_schema](https://github.com/user-attachments/assets/aa285e4a-4e54-4558-b5ca-1e148ccc98ae)\n\n2.  The following table shows the number of null values in each column.\n\n![cyclistic_2023_nulls](https://github.com/user-attachments/assets/34603cbc-b16f-489c-92d4-d84952253458)\n\n**Note**: After checking the above results, I found that the NULL\ncounts of the columns start_station_name, start_station_id, end_station_name,\nend_station_id, end_lat and end_lng do not match the NULL count\n(328957) of the remaining columns (please refer to the image). This\nmay be due to missing information in the same row, i.e. the station's name\nand id for the same station, and the latitude and longitude for the same\nending station.\n\n3.  Missing values/nulls related to start_station_name and\n    start_station_id.\n\n![cyclistic_2023_nulls_start_station](https://github.com/user-attachments/assets/4d879618-bb19-4e3e-bda1-b76a62b79833)\n\nSo, the 875848 rows that have both start_station_name and start_station_id\nmissing need to be removed.\n\n4.  Missing values/nulls related to end_station_name and end_station_id.\n\n![cyclistic_2023_nulls_end_station](https://github.com/user-attachments/assets/48f78fd3-05a3-4ed3-8e9b-bb366ca88f0d)\n\nAgain, the 929343 rows that have both end_station_name and end_station_id\nmissing need to be removed.\n\n5.  Missing values/nulls related to end_lat and end_lng.\n\n![cyclistic_2023_nulls_end_locations](https://github.com/user-attachments/assets/2e937351-6ef1-4592-98a4-a7e4a5ede3d4)\n\nHere, the 6990 rows that have both end_lat and end_lng missing need to be\nremoved.\n\n6.  As ride_id has no null values, let's use it to check for duplicates.\n   \n\n![cyclistic_2023_ride_id_duplicates](https://github.com/user-attachments/assets/31c3f353-ca96-453b-8d22-253e94ad597e)\n\n\nThere are no duplicate rows in the data.\n\n7.  All ride_id values have a length of 16, so there is no need to clean them.\n\n8.  
There are 3 unique types of bikes (rideable_type) in our data.\n\n![cyclitic_2023_bike_types](https://github.com/user-attachments/assets/65c46a43-7e1d-42b5-8fc9-a407845d209c)\n\n9.  The started_at and ended_at columns show the start and end time of the trip in\n    YYYY-MM-DD hh:mm:ss UTC format. A new column, ride_length, can be\n    created to find the total trip duration. There are 6419 trips that\n    last longer than a day and 161112 trips that last less than a\n    minute or have an end time earlier than the start time, so they need\n    to be removed. Two other columns, day_of_week and month, can also be\n    helpful in analyzing trips at different times of the year.\n\n10. The member_casual column has 2 unique values: member or casual rider.\n\n![cyclitic_2023_rider_types](https://github.com/user-attachments/assets/9ded1081-d48f-48cc-b9de-5f673e3caa41)\n\n11. Columns that need to be removed are start_station_id and\n    end_station_id as they do not add value to analysis of our current\n    problem. Longitude and latitude location columns may not be used in\n    analysis but can be used to visualize a map.\n\n### Data Cleaning\n\nSQL Query: [Data\nCleaning](https://console.cloud.google.com/bigquery?ws=!1m7!1m6!12m5!1m3!1slofty-fort-427707-j0!2sus-central1!3s7e21c2c1-264c-4f46-9a26-ef2e79d55725!2e1)\n\n1.  All the rows having missing values are deleted.\n\n2.  3 more columns are added: ride_length for the duration of the trip, day_of_week and\n    month.\n\n3.  Trips with a duration of less than a minute or longer than a day are\n    excluded.\n\n4.  
A total of 1,811,546 rows are removed in this step.\n\n## Analyze and Share\n\nSQL Query: [Data\nAnalysis](https://console.cloud.google.com/bigquery?ws=!1m7!1m6!12m5!1m3!1slofty-fort-427707-j0!2sus-central1!3s9cc6fa9e-dbd7-4e77-88ea-ac1854422015!2e1)\n\nData Visualization:\n[Tableau](https://public.tableau.com/app/profile/shrawan.singh6518/viz/GoogleCaseStudyCyclistic_17317739104460/Story1)\n\nThe data is stored appropriately and is now prepared for analysis. I\nqueried multiple relevant tables for the analysis and visualized them in\nTableau. The analysis question is: How do annual members and casual\nriders use Cyclistic bikes differently?\n\nFirst of all, member and casual riders are compared by the type of bikes\nthey are using.\n\n![cyclistic_2023_tableau_01](https://github.com/user-attachments/assets/a73691bf-8495-430a-a23d-63cb6d306ac3)\n\nMembers make up 64.53% of the total, while casual riders make up the\nremaining 35.47%. Each bike type chart shows the percentage of the total.\nThe most used bike is the classic bike, followed by the electric bike. Docked\nbikes are used the least, and only by casual riders.\n\nNext, the number of trips distributed by month, day of the week and\nhour of the day is examined.\n\n![cyclistic_2023_tableau_02](https://github.com/user-attachments/assets/e6858a76-8e58-4a99-8531-616678ef7490)\n\n**Months**: When it comes to monthly trips, both casual riders and members\nexhibit comparable behavior, with **more trips in the spring and summer\nand fewer in the winter.** The gap between casual riders and members is narrowest\nin July.\n\n**Days of Week**: When the days of the week are compared, it is discovered\nthat **casual riders make more journeys on the weekends while members show\na decline over the weekend in contrast to the other days of the week.**\n\n**Hours of the Day**: Members show 2 peaks throughout the day in\nterms of number of trips. 
**One is early in the morning at around 6 am to\n8 am and the other is in the evening at around 4 pm to 8 pm**, while the number of\ntrips for casual riders increases consistently over the day until evening\nand then decreases afterwards.\n\nWe can infer from the previous observations that **members may be using\nbikes for commuting to and from work on weekdays while casual\nriders are using bikes throughout the day, more frequently over the\nweekends for leisure purposes.** Both are most active in summer and\nspring.\n\nRide durations of the trips are compared to find the differences in the\nbehavior of casual and member riders.\n\n![cyclistic_2023_tableau_03](https://github.com/user-attachments/assets/b1bf9aff-8467-4557-9f50-1087721560bd)\n\nTake note that casual riders tend to cycle longer than members do on\naverage. The length of the average journey for members doesn't change\nthroughout the year, week, or day. **However, there are variations in how\nlong casual riders cycle. In the spring and summer, on weekends, and\nfrom 10 am to 2 pm during the day, they ride for longer.\nBetween five and eight in the morning, they have brief trips.**\n\nThese findings lead to the conclusion **that casual riders travel\nlonger (approximately 2x more) but less frequently than members. They\nmake longer journeys on weekends and during the day outside of commuting\nhours and in the spring and summer seasons, so they might be doing so for\nrecreation purposes.**\n\nTo further understand the differences in casual and member riders,\nlocations of starting and ending stations can be analysed. 
Stations with\nthe most trips are selected using filters to draw the following\nconclusions.\n\n![cyclistic_2023_tableau_04](https://github.com/user-attachments/assets/0e4def75-25ee-4278-8da1-a5ecfa2e2e02)\n\nCasual riders have frequently started their trips from the stations in\n**the vicinity of museums, parks, beach, harbor points and aquarium while\nmembers have begun their journeys from stations close to universities,\nresidential areas, restaurants, hospitals, grocery stores, theatre,\nschools, banks, factories, train stations, parks and plazas.**\n\n![cyclistic_2023_tableau_05](https://github.com/user-attachments/assets/7088cb8d-3cdb-49cf-b7c1-c4ea7e3fd357)\n\nA similar trend can be observed in ending station locations. Casual riders\nend their journeys **near parks, museums and other recreational sites\nwhereas members end their trips close to universities, residential and\ncommercial areas.** This supports the conclusion that casual riders use bikes for\nleisure activities while members rely extensively on them for their daily\ncommute.\n\n## Summary\n![cyclistic_2023_summary](https://github.com/user-attachments/assets/427d6247-8e1d-4184-af45-82c1058a40aa)\n\n## Act \nAfter identifying the differences between casual and member\nriders, marketing strategies to target casual riders can be developed to\npersuade them to become members.\n\n**Recommendations**: \n1. Marketing campaigns might be conducted in spring and\nsummer at tourist/recreational locations popular among casual riders.\n\n2. Casual riders are most active on weekends and during the summer and\nspring, thus they may be offered seasonal or weekend-only memberships.\n\n3. Casual riders use their bikes for longer durations than members. 
Offering\ndiscounts for longer rides may incentivize casual riders and entice\nmembers to ride for longer periods of time.\n\n(Reference/Inspired by: https://github.com/SomiaNasir/Google-Data-Analytics-Capstone-Cyclistic-Case-Study)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshrawans007%2Fgoogle_cyclistic_2023","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshrawans007%2Fgoogle_cyclistic_2023","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshrawans007%2Fgoogle_cyclistic_2023/lists"}