Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/datamgmt/music-chart-downloader
Download Music Chart Data
https://github.com/datamgmt/music-chart-downloader
beautifulsoup csv json music python
Last synced: 13 days ago
JSON representation
Download Music Chart Data
- Host: GitHub
- URL: https://github.com/datamgmt/music-chart-downloader
- Owner: datamgmt
- Created: 2024-05-02T09:14:04.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-08-29T07:14:47.000Z (5 months ago)
- Last Synced: 2024-12-06T19:55:35.717Z (about 2 months ago)
- Topics: beautifulsoup, csv, json, music, python
- Language: Python
- Homepage:
- Size: 377 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Music Chart Downloader
Music Chart Downloader (mcd.py) by David M Walker
(c) 2024 Data Management & Warehousing
Extract chart data from chart websites and store in files
Currently supports OfficialCharts.com Singles & Albums charts
## Overview
* This utility downloads the charts in html format from OfficialCharts.com (https://www.officialcharts.com)
* The script can download the UK Singles Chart or the UK Album Chart.
* The download can be for a single week, a range of weeks or all historical charts.
* It then writes out the data as either a json file or a csv file for use elsewhere.## Process
* Download all the html files for the data range requested from the chart website
* html files that have already been downloaded will not be re-downloaded
* Parse the files for the chart entries
* Write the files out in either a json or csv file
* Charts can be written out as one file per week or one file for all data## Running the script
Download all UK Singles charts and create a single CSV file with all the data
```
./mcd.py
```Download the 2023 UK Album charts and write one JSON file per week
```
./mcd.py --chart uk-albums --startdate 20230101 --enddate 20231231 --output_type json --output_set weekly
```Help and options
```
usage: mcdc.py [-h] [--chart {uk-singles,uk-albums}] [--startdate STARTDATE] [--enddate ENDDATE] [--datadir DATADIR]
[--output_type [{csv,json} ...]] [--output_set [{weekly,all} ...]]The Music Chart Data Collector
optional arguments:
-h, --help show this help message and exit
--chart {uk-singles,uk-albums}
Which music chart to download, (Default: uk-singles)
--startdate STARTDATE
The first chart to download in YYYYMMDD format (Default: 19521114)
--enddate ENDDATE The last chart to download in YYYYMMDD format (Default: 20240509)
--datadir DATADIR Location of datafiles used in processing (Default: ./data)
--output_type [{csv,json} ...]
Output file formats required (Default: ['csv'])
--output_set [{weekly,all} ...]
Weekly charts and/or one large file (Default: ['all'])(c)2024 Data Management & Warehousing
```## Directory Structure
```
.
├── README.md # This file
├── data # Data Firectories
│ ├── csv # CSV Output files
│ ├── html # Downloaded HTML Files
│ └── json # JSON Output files
├── mcdc.py # The script
└── samples # sample html files to understand the structure
├── uk-singles-chart.html
└── uk-singles-chart.png```
## Available data field
* chart_date e.g. 2024-04-26
* chart_position e.g. 1
* chart_artist e.g. AVA MAX
* chart_title e.g. SWEET BUT PSYCHO
* chart_movement e.g. 1
* chart_peak e.g. 1
* chart_weeks e.g. 12## CSV Format example
```
chart_date,chart_position,chart_artist,chart_title,chart_movement,chart_peak,chart_weeks
19521114,1,AL MARTINO,HERE IN MY HEART,New,1,1
19521114,2,JO STAFFORD,YOU BELONG TO ME,New,2,1
19521114,3,NAT 'KING' COLE,SOMEWHERE ALONG THE WAY,New,3,1
19521114,4,BING CROSBY,THE ISLE OF INNISFREE,New,4,1
19521114,5,GUY MITCHELL,FEET UP (PAT HIM ON THE PO-PO),New,5,1
19521114,6,ROSEMARY CLOONEY,HALF AS MUCH,New,6,1
```## JSON Format example
```
[
{
"chart_date": "20221230",
"chart_movement": "3",
"chart_position": "1",
"chart_artist": "MICHAEL BUBLE",
"chart_title": "CHRISTMAS",
"chart_peak": "1",
"chart_weeks": "105"
},
{
"chart_date": "20221230",
"chart_movement": "1",
"chart_position": "2",
"chart_artist": "TAYLOR SWIFT",
"chart_title": "MIDNIGHTS",
"chart_peak": "1",
"chart_weeks": "10"
}
]
```## Performance & capacity
* Downloading and processing one year of data takes around 20 seconds on good machine and fast network (see Build environment below)
* If the files have already been downloaded the processing time is around 3 seconds for the same set of files
* This is based on 100 entries per week and 52 entries
* Historical charts had significantly fewer entries and so are faster
* Processing a large data range to an 'all' file also consumes more memory
* All of the 2023 singles charts create a file of around 250Kb
* The earliest charts are in the 1950s so there are around ~4000 files to process all data for a single chart## To Do List
Potential enhancements for an indeterminate future update
* Add US Billboard Charts (e.g. URL https://www.billboard.com/charts/hot-100/2024-04-13/)
* XML Support## Build environment & maintenance
* This utility was written on an Apple M1 Ultra based Mac running Sonoma 14.4.1 and Python version 3.9.6..
* Written and tested in May 2024 - note that data formats of the source may vary over time. causing the programme to fail.
* This code is unmaintained.## Other code that has similar functionality
* https://pypi.org/project/uk-charts-api
* https://medium.com/@caineosborne/analysing-uk-chart-history-1956-to-2017-6fec0ecc991b
* https://www.markhneedham.com/blog/2020/01/04/quick-graph-uk-official-charts/## Acknowldgements
* https://github.com/Prokopp for unwittingly being a sounding board