Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/hevalhazalkurt/exploring_the_data_of_lego_history
A data exploration project on LEGO history in Python with pandas, matplotlib etc. (WIP)
https://github.com/hevalhazalkurt/exploring_the_data_of_lego_history
data data-analysis data-science data-visualization datascience datasets lego lego-history matplotlib pandas python python3
Last synced: 14 days ago
JSON representation
A data exploration project on LEGO history in Python with pandas, matplotlib etc. (WIP)
- Host: GitHub
- URL: https://github.com/hevalhazalkurt/exploring_the_data_of_lego_history
- Owner: hevalhazalkurt
- Created: 2019-12-19T11:05:03.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2019-12-23T19:52:54.000Z (about 5 years ago)
- Last Synced: 2024-11-20T05:29:38.184Z (3 months ago)
- Topics: data, data-analysis, data-science, data-visualization, datascience, datasets, lego, lego-history, matplotlib, pandas, python, python3
- Language: Python
- Size: 5.24 MB
- Stars: 4
- Watchers: 2
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Exploring The Data of LEGO History
A data exploration project on LEGO history in Python with pandas, matplotlib etc. (WIP)
***Data source : [Rebrickable](https://rebrickable.com/downloads/)***
---
## Dataset Informations
| Dataset | Description | Columns |
|--|--|--|
| `colors.csv` | This file contains information on LEGO colors, including a unique ID for each color, its name, and approximate RGB value, and whether it's transparent | `id` Unique ID for this color.
`name` The human-readable name of the color.
`rgb` The approximate RGB color.
`is_trans` Whether or not the given color is transparent/translucent.|
| `inventories.csv` | This table contains information on inventories, including a unique ID, it's version and the set number. | `id` Unique ID for this inventory entry.
`version` Version number.
`set_num` Set number (form `sets.csv`). |
| `inventory_parts.csv` | This table contains information part inventories, including a unique ID number, the part number, the color of the part, how many are included and whether it's a spare. | `inventory_id` Unique ID for the inventory this part is appearing in. This is the same as the id value in `inventories.csv`.
`part_num` Unique ID for the part.
`color_id` Unique ID for the color, as per `colors.csv`.
`quantity` The number of copies of this part included in the set!
`is_spare` Whether or not this is a spare part. Spare parts are additional parts not needed to finish the set. |
| `inventory_sets.csv` | This file contains information on what inventory is included in which sets, including the inventory ID, the set number and the quantity of that inventory that are included. | `inventory_id` Unique inventory ID from `inventories.csv`.
`set_num` Unique set ID from `sets.csv`.
`quantity` The quantity of the inventory included. |
| `part_categories.csv` | This dataset includes information on the part category (what type of part it is) and a unique ID for that part category. | `id` Unique ID for the part category.
`name` The category of stuff the part is in. |
| `part_relationships.csv` | This dataset includes information on the different relationships of parts. | `rel_type` Relationship type of the part.
`child_part_num` Child part category ID.
`parent_part_num` Parent category unique ID |
| `parts.csv` | This dataset includes information on lego parts, including a unique ID number, the name of the part, and what part category it's from. | `part_num` Unique ID for the part.
`name` Name of the part.
`part_cat_id` Part category unique ID (from `part_categories.csv`). |
| `sets.csv` | This file contains information on LEGO sets, including a unique ID number, the name of the set, the year it was released, its theme and how many parts it includes. | `set_num` Unique set ID.
`name` The name of the set.
`year` Year the set was published.
`theme_id` Unique ID for the theme used for the set (from `themes.csv`).
`num_parts` The number of parts included in the set. |
| `themes.csv` | This file includes information on lego themes. Each theme is given a unique ID number, a name, and (if it's part of a bigger theme) which theme it's part of. | `id` Theme unique ID.
`name` Name of the theme.
`parent_id` Unique ID for the larger theme, if there is one. |---
## Basic Understanding of Datasets
The schema of database :
![](visuals/database_schema_v2.png)
---
## Exploring Colors
You can find source code of this section on `lego_colors.py` file.
**First rows of `colors.csv`**
```
id name rgb is_trans
0 -1 [Unknown] 0033B2 f
1 0 Black 05131D f
2 1 Blue 0055BF f
3 2 Green 237841 f
4 3 Dark Turquoise 008F9B f
```
| Data | Result | Detail |
|--|--|--|
| Colors | 179 | The number of colors available |
| Non-transparent | 151 | Number of non-transparent colors. |
| Transparent | 28 | Number of transparent colors. |
![](visuals/is_trans.png)
![](visuals/main_colors.png)
![](visuals/lego_colors.png)
---
## Exploring Sets
You can find source code of this section on `lego_sets.py` file.
**First rows of `sets.csv`**
```
set_num name year theme_id num_parts
0 001-1 Gears 1965 1 43
1 0011-2 Town Mini-Figures 1978 84 12
2 0011-3 Castle 2 for 1 Bonus Offer 1987 199 2
3 0012-1 Space Mini-Figures 1979 143 12
4 0013-1 Space Mini-Figures 1979 143 12
```![](visuals/sets_by_year.png)
![](visuals/parts_by_year.png)
---
## Exploring Themes
You can find source code of this section on `lego_themes.py` file.
**First rows of `sets.csv`**
```
set_num name year theme_id num_parts
0 001-1 Gears 1965 1 43
1 0011-2 Town Mini-Figures 1978 84 12
2 0011-3 Castle 2 for 1 Bonus Offer 1987 199 2
3 0012-1 Space Mini-Figures 1979 143 12
4 0013-1 Space Mini-Figures 1979 143 12
```
**First rows of `themes.csv`**
```
id name parent_id
0 1 Technic NaN
1 2 Arctic Technic 1.0
2 3 Competition 1.0
3 4 Expert Builder 1.0
4 5 Model 1.0
```![](visuals/toptenthemes.png)