{"id":15060667,"url":"https://github.com/darshanjanidev/google_capstone_project_bike_trip_data_case-study","last_synced_at":"2026-01-02T06:04:26.990Z","repository":{"id":242061284,"uuid":"808522960","full_name":"darshanjanidev/Google_Capstone_Project_bike_trip_data_case-study","owner":"darshanjanidev","description":null,"archived":false,"fork":false,"pushed_at":"2024-06-01T07:58:58.000Z","size":42,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-21T22:09:57.627Z","etag":null,"topics":["bigquery","capstone-project","data-analysis","data-visualization","database","divvy-bikes","documentation","mysql","sql","ssms","tableau"],"latest_commit_sha":null,"homepage":"","language":"TSQL","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/darshanjanidev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-31T08:45:22.000Z","updated_at":"2024-06-01T07:59:02.000Z","dependencies_parsed_at":"2024-11-21T04:02:38.113Z","dependency_job_id":"9bf3d0e0-036c-43f6-82d3-a4a9e6dac0c8","html_url":"https://github.com/darshanjanidev/Google_Capstone_Project_bike_trip_data_case-study","commit_stats":{"total_commits":6,"total_committers":1,"mean_commits":6.0,"dds":0.0,"last_synced_commit":"df9e3e4ac3938118558cf8368a26b265906fba12"},"previous_names":["darshanjanidev/google_capstone_project_bike_trip_data_case-study"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/darshanjanidev%2FGoogle_Capstone_Projec
t_bike_trip_data_case-study","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/darshanjanidev%2FGoogle_Capstone_Project_bike_trip_data_case-study/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/darshanjanidev%2FGoogle_Capstone_Project_bike_trip_data_case-study/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/darshanjanidev%2FGoogle_Capstone_Project_bike_trip_data_case-study/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/darshanjanidev","download_url":"https://codeload.github.com/darshanjanidev/Google_Capstone_Project_bike_trip_data_case-study/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243695589,"owners_count":20332629,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bigquery","capstone-project","data-analysis","data-visualization","database","divvy-bikes","documentation","mysql","sql","ssms","tableau"],"created_at":"2024-09-24T23:02:29.710Z","updated_at":"2026-01-02T06:04:26.914Z","avatar_url":"https://github.com/darshanjanidev.png","language":"TSQL","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Capstone_Project_bike_trip_data_case-study\nCourse: [Google Data Analytics Capstone: Complete a Case Study](https://www.coursera.org/learn/google-data-analytics-capstone)\n## Introduction\nIn this case study, I will perform many real-world tasks of a junior data analyst at a fictional company, Cyclistic. 
To answer the key business questions, I will follow the steps of the data analysis process: [Ask](https://github.com/darshanjanidev/Google_Capstone_Project_bike_trip_data_case-study/blob/main/README.md#ask), [Prepare](https://github.com/darshanjanidev/Google_Capstone_Project_bike_trip_data_case-study/blob/main/README.md#prepare), [Process](https://github.com/darshanjanidev/Google_Capstone_Project_bike_trip_data_case-study/blob/main/README.md#process), [Analyze](https://github.com/darshanjanidev/Google_Capstone_Project_bike_trip_data_case-study/blob/main/README.md#analyze-and-share), [Share](https://github.com/darshanjanidev/Google_Capstone_Project_bike_trip_data_case-study/blob/main/README.md#analyze-and-share), and [Act](https://github.com/darshanjanidev/Google_Capstone_Project_bike_trip_data_case-study#act).\n  \nData Visualizations: [Tableau](https://public.tableau.com/app/profile/darshan.jani/viz/Case_Study-Bike-Trip-data/BikeTypes#1)  \n## Background\n### Cyclistic\nCyclistic is a bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use the bikes to commute to work each day.   \n  \nUntil now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.  
\n  \nCyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno (the director of marketing and my manager) believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.  \n\nMoreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. To do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.  \n\n### Scenario\nI am playing the role of a junior data analyst on the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, my team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, my team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve our recommendations, so they must be backed up with compelling data insights and professional data visualizations.\n\n## Ask\n### Business Task\nDevise marketing strategies to convert casual riders to members.\n### Analysis Questions\nThree questions will guide the future marketing program:  \n1. How do annual members and casual riders use Cyclistic bikes differently?  \n2. 
Why would casual riders buy Cyclistic annual memberships?  \n3. How can Cyclistic use digital media to influence casual riders to become members?  \n\nMoreno has assigned me the first question to answer: How do annual members and casual riders use Cyclistic bikes differently?\n## Prepare\n### Data Source\nI will use Cyclistic’s historical trip data from May 2023 to April 2024 to analyze and identify trends. The data can be downloaded from [divvy_tripdata](https://divvy-tripdata.s3.amazonaws.com/index.html) and has been made available by Motivate International Inc. under this [license](https://www.divvybikes.com/data-license-agreement).  \n  \nThis public data can be used to explore how different customer types use Cyclistic bikes. Note, however, that data-privacy rules prohibit using riders’ personally identifiable information. This means we cannot connect pass purchases to credit card numbers to determine whether casual riders live in the Cyclistic service area or have purchased multiple single passes.\n\nThis step covers the data source used for the analysis and how the data is organized.\n\n**Data Information**: The data source consists of 12 files that follow the naming convention *\"YYYYMM-divvy-tripdata\"*. Each file holds one month of trips, with fields such as ride ID, bike type, start time, end time, start station, end station, start location, end location, and member status. 
The corresponding column names are:\n- ride_id\n- rideable_type\n- started_at\n- ended_at\n- start_station_name\n- start_station_id\n- end_station_name\n- end_station_id\n- start_lat\n- start_lng\n- end_lat\n- end_lng\n- member_casual\n  \n\n## Process\n\n**Tool**: SQL Server Management Studio 20 is used to combine the 12 monthly files into one dataset.\n\n#### Data Combination\n\nThe 12 CSV files were imported as separate tables into the case_study database. The following SQL script combines them into a single new table, dbo.[2023_24_tripdata_all_tripdata], referred to below as ***\"all_tripdata\"***:\n\n```\nUSE case_study;\n\n-- Step 1: Create the table if it does not exist\nIF NOT EXISTS (\n    SELECT * \n    FROM sys.tables \n    WHERE name = '2023_24_tripdata_all_tripdata' AND schema_id = SCHEMA_ID('dbo')\n)\nBEGIN\n    CREATE TABLE dbo.[2023_24_tripdata_all_tripdata] (\n        ride_id NVARCHAR(50),\n        rideable_type NVARCHAR(50),\n        started_at DATETIME2(7),\n        ended_at DATETIME2(7),\n        start_station_name NTEXT,\n        start_station_id NTEXT,\n        end_station_name NTEXT,\n        end_station_id NTEXT,\n        start_lat FLOAT,\n        start_lng FLOAT,\n        end_lat FLOAT,\n        end_lng FLOAT,\n        member_casual NVARCHAR(50)\n    );\nEND;\n\n-- Step 2: Populate the table with data\n-- SELECT * relies on every monthly table having these 13 columns in this order\nINSERT INTO dbo.[2023_24_tripdata_all_tripdata] (\n    ride_id, \n    rideable_type, \n    started_at, \n    ended_at, \n    start_station_name, \n    start_station_id, \n    end_station_name, \n    end_station_id, \n    start_lat, \n    start_lng, \n    end_lat, \n    end_lng, \n    member_casual\n)\nSELECT * FROM [case_study].[dbo].[202305-divvy-tripdata]\nUNION ALL\nSELECT * FROM [case_study].[dbo].[202306-divvy-tripdata]\nUNION ALL\nSELECT * FROM [case_study].[dbo].[202307-divvy-tripdata]\nUNION ALL\nSELECT * FROM [case_study].[dbo].[202308-divvy-tripdata]\nUNION ALL\nSELECT 
* FROM [case_study].[dbo].[202309-divvy-tripdata]\nUNION ALL\nSELECT * FROM [case_study].[dbo].[202310-divvy-tripdata]\nUNION ALL\nSELECT * FROM [case_study].[dbo].[202311-divvy-tripdata]\nUNION ALL\nSELECT * FROM [case_study].[dbo].[202312-divvy-tripdata]\nUNION ALL\nSELECT * FROM [case_study].[dbo].[202401-divvy-tripdata]\nUNION ALL\nSELECT * FROM [case_study].[dbo].[202402-divvy-tripdata]\nUNION ALL\nSELECT * FROM [case_study].[dbo].[202403-divvy-tripdata]\nUNION ALL\nSELECT * FROM [case_study].[dbo].[202404-divvy-tripdata];\n\n```\n\nNext, the following query counts the total rows. The new ***\"all_tripdata\"*** table holds 5,738,612 rows covering the full 12-month period:\n\n```\nSELECT COUNT(*) AS total_rows\nFROM [case_study].[dbo].[2023_24_tripdata_all_tripdata];\n```\nTo get a better understanding of the data, the following query shows the first 10 rows:\n\n```\nUSE case_study;\n\nSELECT TOP 10 *\nFROM dbo.[2023_24_tripdata_all_tripdata];\n\n```\n#### Data Exploration\n\nThe first step of data exploration is to check each column's data type for inconsistencies. Each trip is identified by ride_id, which serves as the primary key:\n\n```\nUSE case_study;\n\nSELECT COLUMN_NAME, DATA_TYPE\nFROM INFORMATION_SCHEMA.COLUMNS\nWHERE TABLE_SCHEMA = 'dbo' \nAND TABLE_NAME = '2023_24_tripdata_all_tripdata';\n\n```\nTo ensure the data is clean, we next check every column for null values. 
The following query counts the ***null*** values in each column; the NTEXT station columns are converted to nvarchar(max) first because COUNT does not accept ntext:\n\n```\nUSE case_study;\n\nSELECT \n    COUNT(*) - COUNT(ride_id) AS ride_id,\n    COUNT(*) - COUNT(rideable_type) AS rideable_type,\n    COUNT(*) - COUNT(started_at) AS started_at,\n    COUNT(*) - COUNT(ended_at) AS ended_at,\n    COUNT(*) - COUNT(CONVERT(nvarchar(max), start_station_name)) AS start_station_name,\n    COUNT(*) - COUNT(CONVERT(nvarchar(max), start_station_id)) AS start_station_id,\n    COUNT(*) - COUNT(CONVERT(nvarchar(max), end_station_name)) AS end_station_name,\n    COUNT(*) - COUNT(CONVERT(nvarchar(max), end_station_id)) AS end_station_id,\n    COUNT(*) - COUNT(start_lat) AS start_lat,\n    COUNT(*) - COUNT(start_lng) AS start_lng,\n    COUNT(*) - COUNT(end_lat) AS end_lat,\n    COUNT(*) - COUNT(end_lng) AS end_lng,\n    COUNT(*) - COUNT(member_casual) AS member_casual\nFROM dbo.[2023_24_tripdata_all_tripdata];\n\n```\n\nAfter checking for null values, we also check whether the dataset contains any duplicate rides. 
The following query shows that there are no duplicate ride IDs:\n\n```\nUSE case_study;\n\nSELECT COUNT(ride_id) - COUNT(DISTINCT ride_id) AS duplicate_rows\nFROM dbo.[2023_24_tripdata_all_tripdata];\n\n```\n\nThe following query counts the trips for each bike type in the rideable_type column (electric_bike, classic_bike, docked_bike):\n\n```\nUSE case_study;\n\nSELECT rideable_type, COUNT(rideable_type) AS trip_count\nFROM dbo.[2023_24_tripdata_all_tripdata]\nGROUP BY rideable_type;\n\n```\nThe following query counts the trips for each rider type in the member_casual column (member, casual):\n\n```\nUSE case_study;\n\nSELECT member_casual, COUNT(*) AS count_member_type\nFROM dbo.[2023_24_tripdata_all_tripdata]\nGROUP BY member_casual;\n```\nThe trip start and end times are stored in the columns *\"started_at\"* and *\"ended_at\"* in the format YYYY-MM-DD hh:mm:ss UTC. Adding a new column, ***\"ride_length\"***, lets us compute each trip's duration. Trips longer than one day (7,343 rows) and trips shorter than a minute or with end times earlier than start times (139,873 rows) must be excluded.\n\nNew columns such as ***\"day_of_week\"*** and ***\"month\"*** make it possible to analyze trips at different times of the week and year.\n\nTo improve data integrity, the 885,429 rows with a missing \"start_station_name\" or \"start_station_id\" value should be removed. 
Similarly, the 885,429 rows with a missing ***\"end_station_name\"*** or ***\"end_station_id\"*** value and the 7,613 rows with a missing ***\"end_lat\"*** or ***\"end_lng\"*** value should also be removed.\n\n```\nUSE case_study;\n\n-- Preview the raw start and end timestamps\nSELECT TOP 20 started_at, ended_at\nFROM dbo.[2023_24_tripdata_all_tripdata];\n\n\nUSE case_study;\n\nSELECT COUNT(*) AS longer_than_1_day\nFROM dbo.[2023_24_tripdata_all_tripdata]\nWHERE DATEDIFF(MINUTE, started_at, ended_at) \u003e= 1440;  -- 1440 minutes = 1 day; returned 7,343 rows\n\n\nUSE case_study;\n\n-- Also catches trips whose end time is earlier than their start time\nSELECT COUNT(*) AS less_than_a_minute\nFROM dbo.[2023_24_tripdata_all_tripdata]\nWHERE DATEDIFF(SECOND, started_at, ended_at) \u003c= 60;  -- 60 seconds = 1 minute; returned 139,873 rows\n\n```\n\nIn the dataset, 885,429 rows have a missing ***\"start_station_name\"*** or ***\"start_station_id\"*** value.\n\n```\nUSE case_study;\n\n-- List the distinct start station names (ntext must be converted before DISTINCT)\nSELECT DISTINCT \n    CONVERT(nvarchar(max), start_station_name) AS start_station_name\nFROM dbo.[2023_24_tripdata_all_tripdata]\nORDER BY start_station_name;\n\n\nUSE case_study;\n\nSELECT COUNT(ride_id) AS start_station_null\nFROM dbo.[2023_24_tripdata_all_tripdata]\nWHERE start_station_name IS NULL OR start_station_id IS NULL;\n\n```\n\nThere are also 885,429 rows with a missing ***\"end_station_name\"*** or ***\"end_station_id\"*** value.\n\n```\nUSE case_study;\n\nSELECT DISTINCT \n    CONVERT(nvarchar(max), end_station_name) AS end_station_name\nFROM dbo.[2023_24_tripdata_all_tripdata]\nORDER BY end_station_name;\n\n\nUSE case_study;\n\nSELECT COUNT(ride_id) AS end_station_null\nFROM dbo.[2023_24_tripdata_all_tripdata]\nWHERE end_station_name IS NULL OR end_station_id IS NULL;\n\n```\n\nIn the dataset, 7,613 rows have a missing ***\"end_lat\"*** or ***\"end_lng\"*** value.\n\n```\nUSE case_study;\n\nSELECT COUNT(ride_id) AS end_loc_null\nFROM dbo.[2023_24_tripdata_all_tripdata]\nWHERE end_lat IS NULL OR end_lng IS 
NULL;\n\n```\n\n#### Data Cleaning\n\nIn this step, a new table is created for the cleaned data, which makes the analysis easier. The following steps are implemented:\n\n - First, rows with missing start or end station names, or missing end coordinates, are removed from the dataset.\n - Three new columns are added: ***\"ride_length\"*** for the trip duration in minutes, ***\"day_of_week\"*** for the day of the week, and ***\"month\"*** for the month.\n - Trips shorter than a minute or longer than a day are excluded. Altogether, this process removes 4,224,259 rows.\n\n\nCreate a new table called ***\"alldata_cleaned\"*** (dbo.[2023_24_tripdata_alldata_cleaned]) with the following code:\n\n```\nUSE case_study;\n\n-- Create the new table (ride_id is NOT NULL so it can become the primary key)\nCREATE TABLE dbo.[2023_24_tripdata_alldata_cleaned] (\n    ride_id nvarchar(50) NOT NULL,\n    rideable_type nvarchar(50),\n    started_at datetime2(7),\n    ended_at datetime2(7),\n    ride_length float,\n    day_of_week nvarchar(3),\n    month nvarchar(3),\n    start_station_name ntext,\n    end_station_name ntext,\n    start_lat float,\n    start_lng float,\n    end_lat float,\n    end_lng float,\n    member_casual nvarchar(50)\n);\n\n-- Insert data into the new table\nINSERT INTO dbo.[2023_24_tripdata_alldata_cleaned] (\n    ride_id, rideable_type, started_at, ended_at, \n    ride_length, day_of_week, month,\n    start_station_name, end_station_name, \n    start_lat, start_lng, end_lat, end_lng, member_casual\n)\nSELECT \n    a.ride_id, a.rideable_type, a.started_at, a.ended_at, \n    DATEDIFF(MINUTE, a.started_at, a.ended_at) AS ride_length,\n    -- DATEPART(WEEKDAY) assumes SET DATEFIRST 7 (the us_english default), where Sunday = 1\n    CASE DATEPART(WEEKDAY, a.started_at) \n      WHEN 1 THEN 'Sun'\n      WHEN 2 THEN 'Mon'\n      WHEN 3 THEN 'Tue'\n      WHEN 4 THEN 'Wed'\n      WHEN 5 THEN 'Thu'\n      WHEN 6 THEN 'Fri'\n      WHEN 7 THEN 'Sat'\n    END AS day_of_week,\n    CASE DATEPART(MONTH, a.started_at)\n      WHEN 1 THEN 'Jan'\n      WHEN 2 THEN 'Feb'\n      WHEN 3 THEN 'Mar'\n      WHEN 4 THEN 'Apr'\n      WHEN 5 THEN 'May'\n      WHEN 6 THEN 'Jun'\n      WHEN 7 THEN 'Jul'\n      WHEN 8 THEN 'Aug'\n      WHEN 9 THEN 'Sep'\n      WHEN 10 THEN 'Oct'\n      WHEN 11 THEN 'Nov'\n      WHEN 12 THEN 'Dec'\n    END AS month,\n    a.start_station_name, a.end_station_name, \n    a.start_lat, a.start_lng, a.end_lat, a.end_lng, a.member_casual\nFROM dbo.[2023_24_tripdata_all_tripdata] a\nWHERE \n    a.start_station_name IS NOT NULL AND\n    a.end_station_name IS NOT NULL AND\n    a.end_lat IS NOT NULL AND\n    a.end_lng IS NOT NULL AND\n    DATEDIFF(SECOND, a.started_at, a.ended_at) \u003e 60 AND     -- drop trips of a minute or less (and negative durations)\n    DATEDIFF(MINUTE, a.started_at, a.ended_at) \u003c 1440;     -- drop trips of a day or more\n\n```\n\nSet ***\"ride_id\"*** as the primary key for the new table and count the remaining rows:\n\n```\nUSE case_study;\n\nALTER TABLE dbo.[2023_24_tripdata_alldata_cleaned]\nADD PRIMARY KEY (ride_id);\n\nUSE case_study;\n\nSELECT COUNT(ride_id) AS no_of_rows\nFROM dbo.[2023_24_tripdata_alldata_cleaned];\n\n```\n\n  \n## Analyze and Share\n\n#### Data Analysis\n\nIn this step, the result of each query below is saved as a separate .csv file for visual analysis.\n\n```\n-- bike type used by riders\nUSE case_study;\n\nSELECT member_casual, rideable_type, COUNT(*) AS total_trips\nFROM dbo.[2023_24_tripdata_alldata_cleaned]\nGROUP BY member_casual, rideable_type\nORDER BY member_casual, total_trips;\n\n-- No. of trips per month \n\nUSE case_study;\n\nSELECT month, member_casual, COUNT(ride_id) AS total_trips\nFROM dbo.[2023_24_tripdata_alldata_cleaned]\nGROUP BY month, member_casual\nORDER BY member_casual;\n\n-- no. of trips per day of week\n\nUSE case_study;\n\nSELECT day_of_week, member_casual, COUNT(ride_id) AS total_trips\nFROM dbo.[2023_24_tripdata_alldata_cleaned]\nGROUP BY day_of_week, member_casual\nORDER BY member_casual;\n\n-- no. 
of trips per hour\n\nUSE case_study;\n\nSELECT \n    FORMAT(started_at, 'hh tt') AS hour_of_day_am_pm, \n    member_casual, \n    COUNT(ride_id) AS total_trips\nFROM \n    dbo.[2023_24_tripdata_alldata_cleaned]\nGROUP BY \n    FORMAT(started_at, 'hh tt'), \n    member_casual\nORDER BY \n    member_casual, \n    MIN(DATEPART(HOUR, started_at));  -- chronological; sorting the 'hh tt' text would put 01 PM before 02 AM\n\n\n-- average ride_length per month\n\nUSE case_study;\n\nSELECT \n    FORMAT(started_at, 'MMM') AS month, \n    member_casual, \n    AVG(ride_length) AS avg_ride_duration  -- ride_length is float, so the average is not truncated to whole minutes\nFROM \n    dbo.[2023_24_tripdata_alldata_cleaned]\nGROUP BY \n    FORMAT(started_at, 'MMM'), \n    member_casual\nORDER BY \n    month, \n    member_casual;\n\n\n-- average ride_length per day of week\n\nUSE case_study;\n\nSELECT \n    DATENAME(WEEKDAY, started_at) AS day_of_week, \n    member_casual, \n    AVG(ride_length) AS avg_ride_duration\nFROM \n    dbo.[2023_24_tripdata_alldata_cleaned]\nGROUP BY \n    DATENAME(WEEKDAY, started_at), \n    member_casual;\n\n-- average ride_length per hour\n\nUSE case_study;\n\nSELECT \n    FORMAT(started_at, 'hh tt') AS hour_of_day_am_pm, \n    member_casual, \n    AVG(ride_length) AS avg_ride_duration\nFROM \n    dbo.[2023_24_tripdata_alldata_cleaned]\nGROUP BY \n    FORMAT(started_at, 'hh tt'), \n    member_casual\nORDER BY \n    member_casual, \n    MIN(DATEPART(HOUR, started_at));\n\n\n-- starting station locations\n\nUSE case_study;\n\nSELECT \n    start_station_name, \n    member_casual,\n    AVG(start_lat) AS avg_start_lat,\n    AVG(start_lng) AS avg_start_lng,\n    COUNT(ride_id) AS total_trips\nFROM \n    (\n        SELECT \n            CAST(start_station_name AS nvarchar(50)) AS start_station_name,\n            member_casual,\n            CAST(start_lat AS FLOAT) AS start_lat,\n            CAST(start_lng AS FLOAT) AS start_lng,\n            ride_id\n        FROM \n            dbo.[2023_24_tripdata_alldata_cleaned]\n    ) AS subquery\nGROUP BY \n    start_station_name, \n    
member_casual;\n\n\n-- ending station locations\n\nUSE case_study;\n\nSELECT \n    end_station_name, \n    member_casual,\n    AVG(end_lat) AS avg_end_lat,\n    AVG(end_lng) AS avg_end_lng,\n    COUNT(ride_id) AS total_trips\nFROM \n    (\n        SELECT \n            CAST(end_station_name AS nvarchar(50)) AS end_station_name,\n            member_casual,\n            CAST(end_lat AS FLOAT) AS end_lat,\n            CAST(end_lng AS FLOAT) AS end_lng,\n            ride_id\n        FROM \n            dbo.[2023_24_tripdata_alldata_cleaned]\n    ) AS subquery\nGROUP BY \n    end_station_name, \n    member_casual;\n```\n\n\n\n### Data Visualization: [Tableau](https://public.tableau.com/app/profile/darshan.jani/viz/Case_Study-Bike-Trip-data/BikeTypes#1)  \n\nThe data is now clean and ready for analysis. I queried the relevant tables and visualized the results in Tableau.  \nThe analysis question is: How do annual members and casual riders use Cyclistic bikes differently?  \n\nFirst, members and casual riders are compared by the type of bike they use.  \n\n![Bike Types](https://github.com/darshanjanidev/Google_Capstone_Project_bike_trip_data_case-study/assets/88767282/5a7efe9f-359a-4938-b544-3b5133bcb572)\n  \nMembers account for 64.71% of all trips, and casual riders for the remaining 35.29%. Each bike-type chart shows that type's share of the total. The classic bike is the most used, followed by the electric bike. Docked bikes are the least used and are ridden only by casual riders. \n  \nNext, the number of trips is broken down by month, day of the week, and hour of the day.  \n  \n![Total Trips per Month_day_Hour](https://github.com/darshanjanidev/Google_Capstone_Project_bike_trip_data_case-study/assets/88767282/0f184e16-3343-4aaf-b024-5b6606f75411)\n  \n__Months:__ Casual riders and members show similar monthly patterns, with more trips in spring and summer and fewer in winter. 
The gap between casual riders and members is smallest in July.   \n__Days of Week:__ Casual riders make more trips on weekends, while member trips decline on weekends compared with weekdays.  \n__Hours of the Day:__ Member trips show two daily peaks: one in the early morning, around 6 am to 8 am, and one in the evening, around 4 pm to 8 pm. Casual trips, by contrast, rise steadily through the day until the evening and then decline.  \n  \nFrom these observations, we can infer that members may be using bikes to commute to and from work on weekdays, while casual riders ride throughout the day, and more often on weekends, for leisure. Both groups are most active in spring and summer.  \n  \nNext, trip durations are compared to find further differences in the behavior of casual riders and members.  \n  \n![Average Ride Duration per Month_Day_Hour](https://github.com/darshanjanidev/Google_Capstone_Project_bike_trip_data_case-study/assets/88767282/9ea2ac32-7b1e-4f02-b3bd-3790f9f67110)\n  \nOn average, casual riders ride longer than members. The average trip length for members stays roughly constant throughout the year, week, and day, whereas casual riders' trip lengths vary: they ride longer in spring and summer, on weekends, and between 10 am and 2 pm, and their trips are shortest between 5 am and 8 am.\n  \nThese findings suggest that casual riders ride longer (approximately twice as long) but less frequently than members. Because their longer trips happen on weekends, outside commuting hours, and in spring and summer, they may be riding for recreation.    
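\n  \nThe roughly 2x difference in ride length can be checked directly against the cleaned table. The query below is a minimal sketch, not one of the saved reports above: it assumes ride_length holds each trip's duration in minutes, as computed in the cleaning step, and the CTE name rider_summary and the aliases avg_ride_length_min, ratio_to_member_avg and pct_of_trips are hypothetical names chosen for illustration.\n\n```sql\n-- Hypothetical sketch: trip counts and average ride length per rider type\nUSE case_study;\n\nWITH rider_summary AS (\n    SELECT\n        member_casual,\n        COUNT(ride_id) AS total_trips,\n        AVG(ride_length) AS avg_ride_length_min  -- ride_length is in minutes\n    FROM dbo.[2023_24_tripdata_alldata_cleaned]\n    GROUP BY member_casual\n)\nSELECT\n    member_casual,\n    total_trips,\n    avg_ride_length_min,\n    -- each group's average divided by the member average (1.0 on the member row)\n    avg_ride_length_min\n        / NULLIF(MAX(CASE WHEN member_casual = 'member' THEN avg_ride_length_min END) OVER (), 0)\n        AS ratio_to_member_avg,\n    -- share of all cleaned trips, in percent\n    100.0 * total_trips / SUM(total_trips) OVER () AS pct_of_trips\nFROM rider_summary;\n```\n\nA ratio_to_member_avg value close to 2 on the casual row would support the approximately 2x figure. Computing the member average with a window function keeps the comparison in one query instead of dividing the two averages by hand.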
\n  \nTo further understand the differences between casual riders and members, the locations of starting and ending stations can be analyzed. Filters were used to focus on the stations with the most trips, leading to the following conclusions.  \n  \n![Total Trips Start Location](https://github.com/darshanjanidev/Google_Capstone_Project_bike_trip_data_case-study/assets/88767282/485c519e-b327-4d3c-a356-f4737a7ab9b8)\n  \nCasual riders frequently started their trips from stations near museums, parks, beaches, harbor points, and the aquarium, while members began their trips from stations close to universities, residential areas, restaurants, hospitals, grocery stores, theaters, schools, banks, factories, train stations, parks, and plazas.  \n\n![Total Trips End Location](https://github.com/darshanjanidev/Google_Capstone_Project_bike_trip_data_case-study/assets/88767282/3475c1ac-98e6-4069-b038-83c0fd57620d)\n\n  \nA similar trend can be observed in ending station locations. Casual riders end their trips near parks, museums, and other recreational sites, whereas members end their trips close to universities and residential and commercial areas. This suggests that casual riders use bikes mainly for leisure, while members rely on them heavily for their daily commute.  \n  \nSummary:\n  \n|Casual|Member|\n|------|------|\n|Ride throughout the day, and more often on weekends in spring and summer, for leisure activities.|Ride mostly on weekdays during commuting hours (around 6 to 8 am and 4 to 8 pm), especially in spring and summer.|\n|Ride about twice as long as members, but less frequently.|Ride more frequently, but trips are shorter (approximately half the duration of casual riders' trips).|\n|Start and end their trips near parks, museums, along the coast, and at other recreational sites.|Start and end their trips close to universities and residential and commercial areas.|  \n  \n## Act\n\n### Recommendations Based on Insights\n\n1. 
**Bike Type Usage**:\n   - **Classic and Electric Bikes**: Both annual members and casual riders predominantly use classic bikes, followed by electric bikes. Docked bikes are used only by casual riders, although they are the least used bike type overall.\n   - **Recommendation**: Promote the benefits and convenience of annual membership, emphasizing the ease of access to classic and electric bikes. Highlight exclusive offers or additional perks for docked bikes within memberships.\n\n2. **Monthly Trends**:\n   - **Peak Months**: Both user types peak in spring and summer, with July showing the smallest gap between casual and member rides.\n   - **Recommendation**: Launch membership promotions during spring and summer, leveraging the high usage periods. Offer seasonal discounts or incentives for casual riders to switch to annual memberships.\n\n3. **Weekly Patterns**:\n   - **Weekend Rides**: Casual riders prefer weekends, whereas members ride less on weekends.\n   - **Recommendation**: Create weekend-specific membership benefits, such as weekend ride discounts or family packages, to attract casual riders who primarily use bikes on weekends.\n\n4. **Daily Ride Patterns**:\n   - **Commuter Peaks**: Members have two peaks corresponding to typical commuting times, while casual riders show a gradual increase in trips throughout the day.\n   - **Recommendation**: Position annual membership as a cost-effective and convenient commuting option. Offer incentives like commuter packages or partnerships with local businesses for discounts during commuting hours.\n\n### Digital Media Strategies\n\n1. **Targeted Advertising**:\n   - Use data-driven insights to create targeted ad campaigns that highlight the convenience and cost savings of annual memberships.\n   - Focus on digital channels like social media, email newsletters, and mobile apps to reach casual riders effectively.\n\n2. 
**Engaging Content**:\n   - Develop engaging content that showcases the benefits of being a member, such as ease of access, availability of different bike types, and exclusive member perks.\n   - Use testimonials and stories from current members to build trust and credibility.\n\n3. **Promotional Campaigns**:\n   - Run limited-time promotions during peak usage months and weekends to entice casual riders to upgrade to annual memberships.\n   - Offer referral bonuses for existing members who bring in new annual members from the pool of casual riders.\n\n4. **Mobile App Enhancements**:\n   - Improve the mobile app experience by adding features like ride tracking, personalized recommendations, and seamless membership upgrade options.\n   - Push notifications about exclusive member benefits, upcoming promotions, and personalized offers based on casual riders' usage patterns.\n\n### Conclusion\n\nBy understanding the distinct usage patterns of annual members and casual riders, Cyclistic can design effective marketing strategies to convert casual riders into annual members. Leveraging digital media, targeted advertising, engaging content, and well-timed promotions can maximize the impact and drive membership growth, ensuring future success for Cyclistic.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdarshanjanidev%2Fgoogle_capstone_project_bike_trip_data_case-study","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdarshanjanidev%2Fgoogle_capstone_project_bike_trip_data_case-study","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdarshanjanidev%2Fgoogle_capstone_project_bike_trip_data_case-study/lists"}