{"id":16133494,"url":"https://github.com/gabboraron/notes_from_data_visualization-kaggle_course","last_synced_at":"2025-04-06T15:27:29.260Z","repository":{"id":174540173,"uuid":"652282002","full_name":"gabboraron/Notes_from_Data_Visualization-Kaggle_course","owner":"gabboraron","description":"Make great data visualizations. A great way to see the power of coding!","archived":false,"fork":false,"pushed_at":"2023-06-11T23:28:51.000Z","size":110,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-02-12T21:33:38.756Z","etag":null,"topics":["data-visualization","kaggle-courses"],"latest_commit_sha":null,"homepage":"https://www.kaggle.com/learn/data-visualization","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gabboraron.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-06-11T16:54:22.000Z","updated_at":"2023-06-15T12:23:34.000Z","dependencies_parsed_at":null,"dependency_job_id":"a2183503-12c8-4c7f-baab-9b07db8263f0","html_url":"https://github.com/gabboraron/Notes_from_Data_Visualization-Kaggle_course","commit_stats":null,"previous_names":["gabboraron/kaggle-data_visualization","gabboraron/notes_from_data_visualization-kaggle_course"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gabboraron%2FNotes_from_Data_Visualization-Kaggle_course","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gabboraron%2FNotes_from_Data_Visuali
zation-Kaggle_course/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gabboraron%2FNotes_from_Data_Visualization-Kaggle_course/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gabboraron%2FNotes_from_Data_Visualization-Kaggle_course/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gabboraron","download_url":"https://codeload.github.com/gabboraron/Notes_from_Data_Visualization-Kaggle_course/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247502355,"owners_count":20949244,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-visualization","kaggle-courses"],"created_at":"2024-10-09T22:44:51.305Z","updated_at":"2025-04-06T15:27:29.238Z","avatar_url":"https://github.com/gabboraron.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ccenter\u003e \u003cimg src=\"https://storage.googleapis.com/kaggle-media/learn/images/54BoIBW.png\"\u003e\u003c/img\u003e\u003c/center\u003e\n\n# Kaggle - Data Visualization, by [Alexis Cook](https://www.kaggle.com/alexisbcook)\n\n*These are only my notes and cheat sheet from the course; go to the [Kaggle course page](https://www.kaggle.com/learn/data-visualization) for more. I highly recommend my [biostatistics course notes \u003e\u003eonly in Hungarian\u003c\u003c](https://github.com/gabboraron/biostatisztika_es_alkalmazasai), which are related to this topic. 
For more info: https://seaborn.pydata.org/index.html*\n\nFun fact:\n- [Los Angeles open data](https://data.lacity.org/)\n- [fivethirtyeight.com - The Ultimate Halloween Candy Power Ranking](https://fivethirtyeight.com/videos/the-ultimate-halloween-candy-power-ranking/)\n\n\n```Python\n# Line chart with one line per column of spotify_data\nsns.lineplot(data=spotify_data)\n```\n\n\n```Python\n# Set the width and height of the figure\nplt.figure(figsize=(10,6))\n\n# Add title\nplt.title(\"Average Arrival Delay for Spirit Airlines Flights, by Month\")\n\n# Bar chart showing average arrival delay for Spirit Airlines flights by month\nsns.barplot(x=flight_data.index, y=flight_data['NK'])\n\n# Add label for vertical axis\nplt.ylabel(\"Arrival delay (in minutes)\")\n```\n\n```Python\n# Lines below will give you a hint or solution code\n#step_3.a.hint()\nplt.figure(figsize=(8, 6))\n# Bar chart showing average score for racing games by platform\nsns.barplot(x=ign_data['Racing'], y=ign_data.index)\n# Leave the horizontal axis unlabeled\nplt.xlabel(\"\")\n# Add title\nplt.title(\"Average Score for Racing Games, by Platform\")\n```\n\n```Python\n# Set the width and height of the figure\nplt.figure(figsize=(10,10))\n# Heatmap showing average game score by platform and genre\nsns.heatmap(ign_data, annot=True)\n# Add label for horizontal axis\nplt.xlabel(\"Genre\")\n# Add title\nplt.title(\"Average Game Score, by Platform and Genre\")\n```\n\n```Python\nsns.scatterplot(x=insurance_data['bmi'], y=insurance_data['charges'])\n```\n\nTo double-check the strength of this relationship, you might like to add a regression line, or the line that best fits the data. 
We do this by changing the command to `sns.regplot`.\n\n```Python\nsns.regplot(x=insurance_data['bmi'], y=insurance_data['charges'])\n```\n\nFor instance, to understand how smoking affects the relationship between BMI and insurance costs, we can color-code the points by 'smoker', and plot the other two columns ('bmi', 'charges') on the axes.\n\n```Python\nsns.scatterplot(x=insurance_data['bmi'], y=insurance_data['charges'], hue=insurance_data['smoker'])\n```\n\nFinally, there's one more plot that you'll learn about, which might look slightly different from how you're used to seeing scatter plots. Usually, we use scatter plots to highlight the relationship between two continuous variables (like \"bmi\" and \"charges\"). However, we can adapt the design of the scatter plot to feature a categorical variable (like \"smoker\") on one of the main axes. We'll refer to this plot type as a categorical scatter plot, and we build it with the `sns.swarmplot` command.\n\n```Python\nsns.swarmplot(x=insurance_data['smoker'],\n              y=insurance_data['charges'])\n```\n\n```Python\n# Color-coded scatter plot w/ regression lines\nsns.lmplot(x=\"pricepercent\", y=\"winpercent\", hue=\"chocolate\", data=candy_data)\n```\n\n```Python\n# Histogram\nsns.histplot(iris_data['Petal Length (cm)'])\n```\n\n```Python\n# KDE plot, you can think of it as a smoothed histogram.\nsns.kdeplot(data=iris_data['Petal Length (cm)'], fill=True)\n```\n\nNote that in the joint plot below, in addition to the 2D KDE plot in the center,\n- the curve at the top of the figure is a KDE plot for the data on the x-axis (in this case, `iris_data['Petal Length (cm)']`), and\n- the curve on the right of the figure is a KDE plot for the data on the y-axis (in this case, `iris_data['Sepal Width (cm)']`).\n\n```Python\n# 2D KDE plot\nsns.jointplot(x=iris_data['Petal Length (cm)'], y=iris_data['Sepal Width (cm)'], 
kind=\"kde\")\n```\n\n![plot type](https://storage.googleapis.com/kaggle-media/learn/images/LPWH19I.png)\n\nTrends - A trend is defined as a pattern of change.\n- `sns.lineplot` - Line charts are best to show trends over a period of time, and multiple lines can be used to show trends in more than one group.\n\nRelationship - There are many different chart types that you can use to understand relationships between variables in your data.\n- `sns.barplot` - Bar charts are useful for comparing quantities corresponding to different groups.\n- `sns.heatmap` - Heatmaps can be used to find color-coded patterns in tables of numbers.\n- `sns.scatterplot` - Scatter plots show the relationship between two continuous variables; if color-coded, we can also show the relationship with a third categorical variable.\n- `sns.regplot` - Including a regression line in the scatter plot makes it easier to see any linear relationship between two variables.\n- `sns.lmplot` - This command is useful for drawing multiple regression lines, if the scatter plot contains multiple, color-coded groups.\n- `sns.swarmplot` - Categorical scatter plots show the relationship between a continuous variable and a categorical variable.\n\nDistribution - We visualize distributions to show the possible values that we can expect to see in a variable, along with how likely they are.\n- `sns.histplot` - Histograms show the distribution of a single numerical variable.\n- `sns.kdeplot` - KDE plots (or 2D KDE plots) show an estimated, smooth distribution of a single numerical variable (or two numerical variables).\n- `sns.jointplot` - This command is useful for simultaneously displaying a 2D KDE plot with the corresponding KDE plots for each individual variable.\n\n\n```Python\n# Change the style of the figure\nsns.set_style(\"dark\")\n\n# Line chart\nplt.figure(figsize=(12,6))\nsns.lineplot(data=spotify_data)\n\n# Mark the exercise complete after the code cell is run\nstep_1.check()\n```\n\n\u003e Seaborn has five 
different themes: (1)\"darkgrid\", (2)\"whitegrid\", (3)\"dark\", (4)\"white\", and (5)\"ticks\", and you need only use a command similar to the one in the code cell above (with the chosen theme filled in) to change it.\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgabboraron%2Fnotes_from_data_visualization-kaggle_course","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgabboraron%2Fnotes_from_data_visualization-kaggle_course","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgabboraron%2Fnotes_from_data_visualization-kaggle_course/lists"}