{"id":27120238,"url":"https://github.com/kskmemory/exploratory-vs-confirmatory-data-analysis-using-python","last_synced_at":"2025-04-07T09:53:00.795Z","repository":{"id":283508010,"uuid":"952002496","full_name":"kskmemory/Exploratory-vs-Confirmatory-data-analysis-using-Python","owner":"kskmemory","description":"Exploratory vs Confirmatory data analysis using Python","archived":false,"fork":false,"pushed_at":"2025-03-20T15:26:51.000Z","size":2113,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-20T16:34:18.342Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kskmemory.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-20T15:21:22.000Z","updated_at":"2025-03-20T15:26:54.000Z","dependencies_parsed_at":"2025-03-20T16:44:53.828Z","dependency_job_id":null,"html_url":"https://github.com/kskmemory/Exploratory-vs-Confirmatory-data-analysis-using-Python","commit_stats":null,"previous_names":["kskmemory/exploratory-vs-confirmatory-data-analysis-using-python"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kskmemory%2FExploratory-vs-Confirmatory-data-analysis-using-Python","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kskmemory%2FExploratory-vs-Confirmatory-data-analysis-using-Python/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/kskmemory%2FExploratory-vs-Confirmatory-data-analysis-using-Python/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kskmemory%2FExploratory-vs-Confirmatory-data-analysis-using-Python/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kskmemory","download_url":"https://codeload.github.com/kskmemory/Exploratory-vs-Confirmatory-data-analysis-using-Python/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247631452,"owners_count":20970040,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-04-07T09:52:57.997Z","updated_at":"2025-04-07T09:53:00.656Z","avatar_url":"https://github.com/kskmemory.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Exploratory-vs-Confirmatory-data-analysis-using-Python\nExploratory vs Confirmatory data analysis using Python\n\nKey Takeaways\n\n# Task 1 - Introduction\n\n● EDA or Exploratory Data Analysis is one of the data analysis methods where we use different statistical summaries and graphical representations to perform initial investigations on the data to discover interesting patterns, spot anomalies, and overall, for a better understanding of our data.\n\n● EDA is used to see how our data can be useful.\n\n# Task 2 - Exploratory Data Analysis - Where to start?\n\n● The first step of Data Exploration is to check what kinds of data types we are working with.\n\n● Create a road map for your data exploration based on the different data types you have in 
your dataset.\n\n● Having a list of the different information types (Time, Place, Product, Sales, etc.) in your dataset always helps.\n\n# Task 3 - Data Exploration - Time and Customer Information Aspect\n\n● If you have a datetime column in your data frame, make sure it has the datetime64 data type.\n\n● To start your data exploration, always check the time span of your data.\n\n● If you have a datetime column in your data frame, you can explore your data at different granularity levels (Year, Month, Day, Hour, Minute and Second). For example, you can aggregate the profit gained by Year, Month and Day.\n\n● Data aggregation is one of the key skills required for data exploration.\n\n● Line charts are the most common visualization technique used when working with time series data.\n\n# Task 4 - Data Exploration - Geo Information\n\n● Choropleth maps are the most common visualization technique used for exploring Geo Data.\n\n# Task 5 - Exploratory Data Analysis - Hierarchical Information about the products\n\n● Sunburst diagrams and treemap diagrams are the two most common data visualization techniques used to explore hierarchical data.\n\n● Exploring hierarchical data can be very insightful. Try to find hierarchical information in your data.\n\n● Time information is also hierarchical information. 
You can use treemap and sunburst diagrams to explore your data at different hierarchical levels (granularity levels) such as year, month, day, hour, minute and even second.\n\n# Task 6 - Data Exploration - Distributional analysis of sales information columns\n\n● You can apply distributional analysis to any numerical column in your data.\n\n● You can use statistical summaries to see if there are any outliers in your column.\n\n● Histograms and box plots are two visualization techniques used for distributional analysis.\n\n● Always pay attention to the skewness of your histogram.\n\n● A right-skewed histogram has a long tail on the right side, which suggests there may be outliers at the high end of your data range.\n\n● A left-skewed histogram has a long tail on the left side, which suggests there may be outliers at the low end of your data range.\n\n# Task 7 - What is Confirmatory Data Analysis (CDA)? \n\n● Confirmatory Data Analysis is the process of using statistical summaries and graphical representations to evaluate the validity of an assumption about the data at hand.\n\n● CDA is one of the popular data analysis methods: you make some assumptions about your data and then validate them.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkskmemory%2Fexploratory-vs-confirmatory-data-analysis-using-python","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkskmemory%2Fexploratory-vs-confirmatory-data-analysis-using-python","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkskmemory%2Fexploratory-vs-confirmatory-data-analysis-using-python/lists"}