https://github.com/lisa-ho/small-data-projects
Repository of small data analysis and visualisation projects to try out libraries and create new types of visualisations. Mostly using Python.
- Host: GitHub
- URL: https://github.com/lisa-ho/small-data-projects
- Owner: Lisa-Ho
- Created: 2021-11-18T15:52:36.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2025-01-03T09:37:10.000Z (26 days ago)
- Last Synced: 2025-01-16T14:26:16.138Z (13 days ago)
- Topics: analysis, api, dataviz, exploration, maps, pandas, python, webscraping
- Language: Jupyter Notebook
- Homepage:
- Size: 32.5 MB
- Stars: 77
- Watchers: 5
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Small data analysis and viz projects
Repository for small-ish analysis and data viz projects.
### 01/2025 Grid map of Europe
Bauhaus-inspired tile grid map of Europe. Each country is coloured by the first letter of its ISO name.
[Full code](https://github.com/Lisa-Ho/small-data-projects/blob/main/2025/2501-eurogrid-names/europe-grid-iso-names.ipynb)
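The colouring rule can be sketched in a few lines of plain Python. This is only an illustration, not the notebook's code: the short country list, the palette values and the function name `colour_by_first_letter` are all assumptions.

```python
# Hypothetical sample of countries; the real map covers all of Europe.
COUNTRIES = ["Austria", "Belgium", "Bulgaria", "Denmark", "Finland", "France"]

# Assumed Bauhaus-style palette; these hex values are not the author's.
PALETTE = ["#d62828", "#f6bd16", "#1d4e89", "#111111"]


def colour_by_first_letter(names, palette):
    """Map each name to a colour chosen by its first letter.

    Letters get palette colours in alphabetical order, cycling through
    the palette when there are more letters than colours.
    """
    letters = sorted({name[0].upper() for name in names})
    letter_colour = {
        letter: palette[i % len(palette)] for i, letter in enumerate(letters)
    }
    return {name: letter_colour[name[0].upper()] for name in names}


colours = colour_by_first_letter(COUNTRIES, PALETTE)
for name, colour in colours.items():
    print(f"{name:10s} {colour}")
```

Countries that share a first letter (here Belgium and Bulgaria) end up with the same colour, which is what gives the tiles their grouping.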
### 09/2024 Developer survey
Analysed data from the Stack Overflow Developer Survey provided by #TidyTuesday and turned it into a [little article](https://inside-numbers.quarto.pub/developer-survey-ai-usage-and-sentiment/) using Quarto. Really late to the party, but love how smooth it is to go from a Jupyter notebook to a static site in Quarto.
[Full code](https://github.com/Lisa-Ho/small-data-projects/blob/main/2024/2409-developer-survey/developer-survey-exploration.ipynb)
### 09/2024 Power Rangers
Been a while since I did a #TidyTuesday challenge. Week 35 was all about the Power Rangers Franchise. I didn't even know it was still going, which is incredible!
So I made a stripplot exploring average ratings of episodes + seasons. Looks like the latest ones (2019, 2020) were quite good :D
[Full code](https://github.com/Lisa-Ho/small-data-projects/blob/main/2024/2408-power-rangers/power-rangers-episodes.ipynb)
### 07/2024 Global bike ownership
Came across an [interesting paper](https://www.sciencedirect.com/science/article/abs/pii/S2214140515006787) from 2015 about tracking global bicycle ownership. Couldn't find any updates since then, so decided it was worth mapping even if slightly outdated.
Good chance to play around with [pypalettes](https://github.com/JosephBARBIERDARNAL/pypalettes) - a great new Python library that makes finding a nice colour map a breeze!
[Full code](https://github.com/Lisa-Ho/small-data-projects/blob/main/2024/2407-bike-ownership-rates/bike-owernship-world.ipynb)
### 06/2024 Songs about cycling
I've always wanted to explore the Spotify API. So I combined it with my love of cycling and looked at songs about bikes. The list of songs comes from [Wikipedia](https://en.wikipedia.org/wiki/List_of_songs_about_bicycles), which links to a more definitive list of 2,300 songs from [Bike&Chain](https://drive.google.com/file/d/15euF9Sz1qbj0Cnhty6tJSrDUwadhBD82/view?usp=share_link). Track features (valence, energy, popularity) come from the Spotify API.
Played around with custom colour maps and gradients for the visualisations. Not sure gradient bar charts are best practice, but def fun! :D
[Full code](https://github.com/Lisa-Ho/small-data-projects/blob/main/2024/2405-bicycle-songs/bicycle-songs.ipynb)
### 03/2024 Recycling rates London
`pyWaffle` has been updated since I first used it, and you can now display a waffle chart in specific axes - so I wanted to try it out and made a little tile grid map of London displaying the proportion of household waste sent for recycling in 2022/23.
[Full code](https://github.com/Lisa-Ho/small-data-projects/blob/main/2024/2403-recycling-rates/london-recycling-rates.ipynb)
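The idea behind each waffle tile can be sketched without any plotting library. The snippet below is an assumption-laden illustration rather than the notebook's `pyWaffle` code: the function names are hypothetical and the rate is a placeholder, not a real borough figure.

```python
def waffle_cells(share, rows=10, cols=10):
    """Return a rows x cols grid of booleans; True cells represent `share`.

    `share` is a proportion between 0 and 1. Cells are filled row by row,
    starting from the top-left.
    """
    if not 0 <= share <= 1:
        raise ValueError("share must be between 0 and 1")
    filled = round(share * rows * cols)
    return [[(r * cols + c) < filled for c in range(cols)] for r in range(rows)]


def render(grid):
    """Draw the grid as text, one character per cell."""
    return "\n".join("".join("#" if cell else "." for cell in row) for row in grid)


# Placeholder value for illustration only, not a real recycling rate.
example_rate = 0.3
print(render(waffle_cells(example_rate)))
```

In a tile grid map, one such grid would be drawn per borough, each in its own axes positioned according to the borough's place in the layout.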
### 02/2024 Du Bois Visualization Challenge: 2024
Contributions to the [Du Bois Visualization Challenge 2024](https://github.com/ajstarks/dubois-data-portraits/tree/master/challenge/2024).
The goal is to celebrate the data visualization legacy of W.E.B Du Bois by recreating the visualizations from the 1900 Paris Exposition using modern tools - in my case Python.
[Full code](https://github.com/Lisa-Ho/small-data-projects/blob/main/2024/2402-dubois-challenge/dubois-challenge.ipynb)
### 01/2024 Highest Paid Athletes
Chart for #MakeoverMonday looking at the World's Highest Paid Athletes. Used this simple data set to explore `plottable` - an awesome Python library for creating stunning tables.
[Full code](https://github.com/Lisa-Ho/small-data-projects/blob/main/2024/2401-highest-paid-athletes/highest-paid-athletes.ipynb)
### 01/2024 Birthdays of Canadian NHL Players
Chart for #TidyTuesday week 2 exploring birth dates of NHL players. Settled on a dotplot/stripplot and spent some time figuring out how to create a broken y-axis.
[Full code](https://github.com/Lisa-Ho/small-data-projects/blob/main/2024/2401-canadian-nhl-players/canadian-nhl-players.ipynb)
### 12/2023 What C-3PO says
Reused data from Star Wars scripts I cleaned before to try bigram analysis (i.e. co-occurrence) of words said by C-3PO. The network graph shows the most common words that occurred together.
[Full code](https://github.com/Lisa-Ho/small-data-projects/blob/main/2023/2312-starwars-what-XY-says/starwars-what-XY-says.ipynb)
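A bigram count of this kind can be sketched with the standard library alone. This is a minimal illustration, not the notebook's implementation: the function name `bigram_counts`, the tokenising regex and the sample lines are all assumptions (the lines are made up, not quoted from the scripts).

```python
import re
from collections import Counter


def bigram_counts(lines):
    """Count pairs of adjacent words within each line of dialogue."""
    counts = Counter()
    for line in lines:
        # Lowercase and keep only letters and apostrophes as word characters.
        words = re.findall(r"[a-z']+", line.lower())
        counts.update(zip(words, words[1:]))
    return counts


# Made-up lines for illustration; not taken from the scripts.
sample_lines = [
    "Oh dear, oh dear.",
    "Oh my, we are doomed.",
    "We are doomed, oh dear!",
]

for (first, second), n in bigram_counts(sample_lines).most_common(3):
    print(first, second, n)
```

For a network graph, each bigram would become an edge between two word nodes, with the count as the edge weight.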
### 10/2023 US grant opportunities
Contribution to #TidyTuesday exploring grant opportunities in the US. Found a great tutorial on how to create streamgraphs in Python with the exact type of curve smoothing I wanted. Can't wait to use it for making a bump chart.
[Full code](https://github.com/Lisa-Ho/small-data-projects/blob/main/2023/2310-us-grants/us-grant-opportunities.ipynb)
### 08/2023 Star Wars Scripts - Each line of Anakin episode 1
Found data for all six Star Wars scripts in [this GitHub repository](https://github.com/jcwieme/data-scripts-star-wars) by Jean Wieme. He kindly made this data available for others to experiment with and use in data viz. So I did. Wanted to plot each line of Anakin and who he speaks to for the first three episodes, but the addressee is only available for the first one. So I focused on getting the chart right for just one episode.
[Full code](https://github.com/Lisa-Ho/small-data-projects/blob/main/2023/2308-star-wars-scripts/star-wars-scripts.ipynb)
### 08/2023 Fix your bike voucher scheme
Came across the evaluation report for the Fix Your Bike voucher scheme that was run by the UK Department for Transport in 2020-2021 in an effort to increase active travel. The evaluation shows that reported levels of cycling have increased for people who have used their vouchers and got their bikes fixed. Read the full report [here](https://www.gov.uk/government/publications/fix-your-bike-voucher-scheme-evaluation)
Rough sketch for the chart made in python, final design tweaks in Figma. [Code here](https://github.com/Lisa-Ho/small-data-projects/blob/main/2023/2308-fix-it-boris/fix-it-boris.ipynb)
### 07/2023 Gelaterias of Italy
Map for #MapPromptMonday Desserts showing Ice cream shops in Italy.
[Full code](https://github.com/Lisa-Ho/small-data-projects/blob/main/2023/2307-gelaterias/gelaterias-map.ipynb)
### 06/2023 Trees of London
Map for #MapPromptMonday Plants showing total trees and main type per 4 sq km.
[Full code](https://github.com/Lisa-Ho/small-data-projects/blob/main/2023/2306-trees-london/trees-london.ipynb)
### 05/2023 Google Search Autocomplete
Fun playing around with data from Google Search Autocomplete on "Why do cyclists ...". Managed to create a more complex layout in matplotlib using paths and Bézier curves. Quite happy with the end result.
[Full code](https://github.com/Lisa-Ho/small-data-projects/blob/main/2023/2305-google-search-autocomplete/google-search-autocomplete.ipynb)
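For anyone curious how a curved connector between two points works, here is a small sketch of a cubic Bézier curve in plain Python. It is not the notebook's layout code: the function names and the choice of control points are assumptions. In matplotlib, the same four points could be passed to `matplotlib.path.Path` with `CURVE4` codes instead of sampling the curve by hand.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1 - t
    # Bernstein weights for the four control points.
    w0, w1, w2, w3 = u**3, 3 * u**2 * t, 3 * u * t**2, t**3
    x = w0 * p0[0] + w1 * p1[0] + w2 * p2[0] + w3 * p3[0]
    y = w0 * p0[1] + w1 * p1[1] + w2 * p2[1] + w3 * p3[1]
    return (x, y)


def connector(start, end, steps=20):
    """Sample an S-shaped curve between two points.

    Both control points sit at the horizontal midpoint, so the curve
    leaves `start` and arrives at `end` horizontally.
    """
    mid_x = (start[0] + end[0]) / 2
    c1 = (mid_x, start[1])
    c2 = (mid_x, end[1])
    return [cubic_bezier(start, c1, c2, end, i / steps) for i in range(steps + 1)]


points = connector((0, 0), (4, 2))
print(points[0], points[len(points) // 2], points[-1])
```

The sampled points can be fed straight into a line plot; more steps give a smoother curve.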
### 04/2023 Artificial grass
A few weeks ago, I read an article about an increase in number of households replacing their lawns with artificial grass. I was wondering how sellers are promoting artificial grass and manually scraped some websites. Good project to get back to some NLP and text analysis. And I've always wanted to create a histogram that shows text instead of bars.
[Full code](https://github.com/Lisa-Ho/small-data-projects/blob/main/2023/2304-artificial-grass/artificial-grass-marketing.ipynb)
### 02/2023 Car ownership vs cycling rates in London
For #MapPromptMonday bivariate, created this quick map of cycling rates versus car ownership. Definitely want to dig a bit deeper into car ownership rates at some point. Also created a helper script to speed up bivariate colourmap creation in the future.
[Full code](https://github.com/Lisa-Ho/small-data-projects/blob/main/2023/2302-car-ownership/cycling-vs-carowners-london-bivariate.ipynb)
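The core of a bivariate colour map is a lookup: classify each variable into a few classes, then pick a colour from a two-dimensional palette. The sketch below shows that idea under stated assumptions; it is not the helper script from the repository, and the palette values, class breaks and function names are all illustrative.

```python
# Assumed 3x3 palette: rows are classes of variable A (low to high),
# columns are classes of variable B (low to high). Hex values are illustrative.
PALETTE = [
    ["#e8e8e8", "#ace4e4", "#5ac8c8"],
    ["#dfb0d6", "#a5add3", "#5698b9"],
    ["#be64ac", "#8c62aa", "#3b4994"],
]


def classify(value, breaks):
    """Return the index of the class that `value` falls into.

    `breaks` are ascending upper bounds for every class except the last.
    """
    for i, upper in enumerate(breaks):
        if value <= upper:
            return i
    return len(breaks)


def bivariate_colour(a, b, breaks_a, breaks_b, palette=PALETTE):
    """Look up the colour for a pair of values."""
    return palette[classify(a, breaks_a)][classify(b, breaks_b)]


# Placeholder breaks and values, e.g. two percentages per area.
print(bivariate_colour(10, 80, breaks_a=[20, 40], breaks_b=[30, 60]))
```

Applied per borough or ward, this yields one colour per area that encodes both variables at once.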
### 01/2023 Ridgemapp (WIP)
After falling in love with the ridge-map library, I'm currently working on a Streamlit app that allows users to create their own ridge map designs. Watch this space ...
### 01/2023 Cultural venues in London
Follow-up to maps I created as part of the #30DayMapChallenge 2022. Had fun playing around with fishnet grids in Python and created some maps that display aggregate data per square km (here: cultural venues in London). Definitely something I'll come back to at some point.
[Full code](https://github.com/Lisa-Ho/small-data-projects/blob/main/2023/2301-culture-venues-london/cultural-venues-london-maps.ipynb)
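Aggregating points to a fishnet grid boils down to assigning each point to a square cell and counting. The following is a minimal sketch of that step, assuming coordinates in a projected system measured in metres; the function name and the sample coordinates are made up, and the notebook itself may well use geospatial libraries instead.

```python
from collections import Counter
from math import floor


def fishnet_counts(points, cell_size=1000.0):
    """Count points per square grid cell.

    `points` are (x, y) pairs in metres, so a `cell_size` of 1000 gives
    1 km squares. Each cell is identified by the (column, row) index of
    its lower-left corner.
    """
    return Counter(
        (floor(x / cell_size), floor(y / cell_size)) for x, y in points
    )


# Made-up coordinates for illustration only.
venues = [
    (530_100, 180_200),
    (530_900, 180_750),
    (531_050, 180_300),
    (529_999, 179_999),
]

for cell, n in sorted(fishnet_counts(venues).items()):
    print(cell, n)
```

Multiplying a cell index by `cell_size` gives back the cell's lower-left corner, which is enough to draw the squares on a map.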
### 10/2022 Cycling in London
Continued playing around with cycling data for London. Updated the tilegrid map and created a few more charts. Definitely getting better at customising charts in matplotlib and using different chart types. Also turned the tile grid map into a Streamlit app (live app available [here](https://liloho-london-cycling-rates-app-kgvppz.streamlit.app/)).
[Full code](https://github.com/Lisa-Ho/small-data-projects/blob/main/2022/2210-london-cycling/london_cycling_rates_exploration.ipynb)
### 08/2022 Exploring OSMnx for mapping in python
Discovered [OSMnx](https://osmnx.readthedocs.io/en/stable/), a Python library to easily extract geospatial data from OpenStreetMap. Explored a few different ways to extract and display data. Definitely quite a powerful tool.
[Full code](https://github.com/Lisa-Ho/small-data-projects/blob/main/2022/2208-OSMnx/Exploring-osmnx.ipynb)
### 07/2022 Gender pay gap (WIP)
Triggered by #TidyTuesday, got my hands on some gender pay gap data again. Worked up a couple of charts but wasn't too convinced where this was going. Wanted to focus on Tech vs charities, but haven't fully managed to do that yet. Was focused on trying out some new chart types, including beeswarm and ridge line plots. Might pick it up another time. Data comes from the [TidyTuesday repository](https://github.com/rfordatascience/tidytuesday/tree/master/data/2022/2022-06-28).
[Full code](https://github.com/Lisa-Ho/small-data-projects/blob/main/2022/2207-gender-pay-gap/GenderPayGap.ipynb)
### 04/2022 Cycling rates in London
Been looking at London cycling rates and found rates of regular cyclists are higher in inner + SW London. Latest data ending Nov 2020 shows increases in some boroughs since the pandemic. Data sourced from the [Active Lives Survey](https://www.gov.uk/government/statistical-data-sets/walking-and-cycling-statistics-cw). The map layout is based on the squared map from the [London Data Store](https://data.london.gov.uk/dataset/excel-mapping-template-for-london-boroughs-and-wards). Check out [my Twitter thread](https://twitter.com/LisaHornung_/status/1514551012694102018?s=20&t=kroA3czupkueRsOOz1DlTw) to see how it's put together.
[Full code](https://github.com/Lisa-Ho/small-data-projects/blob/main/2022/2204-prop-cycling-London-borough/cycling-rates-london-grid-tile-map.ipynb)
### 04/2022 Google searches of 'bike' across Europe
Another project using Google Search API to look at Google searches of 'bike' across Europe before & since the pandemic. Not much change in Scandinavia but higher interest in some countries suggesting a new normal? 🚲📈
Heavily inspired by Google Trends [The New Normal](http://thenewnormal.is).
[Full code](https://github.com/Lisa-Ho/small-data-projects/blob/main/2022/2204-bike-europe-google-trends/bike-searches-pandemic-europe.ipynb)
### 04/2022 Sourdough Google searches
First time using a new library ([pytrends](https://pypi.org/project/pytrends/)) to pull data from the Google Search API. Following a [blog post I wrote](https://inside-numbers.com/kneading-to-relax-exploring-lockdown-baking-trends) last year on sourdough baking, was curious whether interest in sourdough only peaked during the pandemic or has remained high.
[Full code](https://github.com/Lisa-Ho/small-data-projects/blob/main/2022/2204-sourdough-google-trends/sourdough-google-trends-2019-2022.ipynb)
### 03/2022 Star Wars characters
Been playing around with some Star Wars data using [SWAPI](https://t.co/KSn5X00PmE) and found a great template for making nice looking tables in [matplotlib](https://matplotlib.org/matplotblog/posts/how-to-create-custom-tables/).
[Full code](https://github.com/Lisa-Ho/small-data-projects/blob/main/2022/2203-starwars-table/notebooks/most-popular-characters-table-viz.ipynb)
### 12/2021 UK Charities and their activities
Analysis using a new dataset that classifies and tags all active and inactive charities in the UK according to their activity/sector. This analysis explores how the number of charities in specific activities has changed, whether specific sectors were more "trendy" at some point and whether others have died out. First time doing a streamgraph in Python. Data available [here](https://charityclassification.org.uk/data/data-downloads/).
[Full code](https://github.com/Lisa-Ho/small-data-projects/blob/main/2021/2112-charity-class/notebooks/Charity-classification-analysis.ipynb)