{"id":13653854,"url":"https://github.com/dbeley/rymscraper","last_synced_at":"2025-04-23T06:32:06.779Z","repository":{"id":44902300,"uuid":"180775710","full_name":"dbeley/rymscraper","owner":"dbeley","description":"Python library to extract data from rateyourmusic.com.","archived":true,"fork":false,"pushed_at":"2024-06-26T21:44:56.000Z","size":212,"stargazers_count":168,"open_issues_count":11,"forks_count":25,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-11-10T04:36:34.834Z","etag":null,"topics":["python","rateyourmusic","scraper","web-scraping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dbeley.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-04-11T11:14:03.000Z","updated_at":"2024-10-17T10:58:14.000Z","dependencies_parsed_at":"2024-01-14T14:29:34.528Z","dependency_job_id":"f71fc6f2-73b2-4405-90c2-236f43205d54","html_url":"https://github.com/dbeley/rymscraper","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dbeley%2Frymscraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dbeley%2Frymscraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dbeley%2Frymscraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dbeley%2Frymscraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dbeley","download_u
rl":"https://codeload.github.com/dbeley/rymscraper/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250385303,"owners_count":21421893,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["python","rateyourmusic","scraper","web-scraping"],"created_at":"2024-08-02T02:01:19.058Z","updated_at":"2025-04-23T06:32:01.743Z","avatar_url":"https://github.com/dbeley.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# rymscraper\n\n\u003e :warning: With the recent addition of Cloudflare protection to rateyourmusic, **rymscraper** is not properly working anymore.\n\n![Build Status](https://github.com/dbeley/rymscraper/workflows/CI/badge.svg)\n[![Codacy Badge](https://api.codacy.com/project/badge/Grade/8601652424ab44698fd00f6a46a2140e)](https://www.codacy.com/app/dbeley/rymscraper?utm_source=github.com\u0026amp;utm_medium=referral\u0026amp;utm_content=dbeley/rymscraper\u0026amp;utm_campaign=Badge_Grade)\n\n`rymscraper` is an **unofficial** Python API to extract data from [rateyourmusic.com](https://rateyourmusic.com) (👍 consider [supporting them](https://rateyourmusic.com/subscribe)!).\n\n\u003e :warning: **An excessive usage of `rymscraper` can make your IP address banned by rateyourmusic for a few days.**\n\n\n\n## Requirements\n\n- beautifulsoup4\n- lxml\n- requests\n- pandas\n- selenium with geckodriver\n- tqdm\n\n## Installation\n\nClassic installation\n\n```\npython setup.py install\n```\n\nInstallation in a virtualenv with pipenv\n\n```\npipenv install '-e .'\n```\n\n## 
Example\n\nThe library returns its data as Python dicts, which can be easily converted to CSV or JSON.\n\n```python\n\u003e\u003e\u003e import pandas as pd\n\u003e\u003e\u003e from rymscraper import rymscraper, RymUrl\n\n\u003e\u003e\u003e network = rymscraper.RymNetwork()\n```\n\n### Artist\n\n```python\n\u003e\u003e\u003e artist_infos = network.get_artist_infos(name=\"Daft Punk\")\n\u003e\u003e\u003e # or network.get_artist_infos(url=\"https://rateyourmusic.com/artist/daft-punk\")\n\u003e\u003e\u003e import json\n\u003e\u003e\u003e json.dumps(artist_infos, indent=2, ensure_ascii=False)\n```\n\n```\n{\n    \"Name\": \"Daft Punk\",\n    \"Formed\": \"1993, Paris, Île-de-France, France\",\n    \"Disbanded\": \"22 February 2021\",\n    \"Members\": \"Thomas Bangalter (programming, synthesizer, keyboards, drum machine, guitar, bass, vocals, vocoder, talk box), Guy-Manuel de Homem-Christo (programming, synthesizer, keyboards, drums, drum machine, guitar)\",\n    \"Related Artists\": \"Darlin'\",\n    \"Notes\": \"See also: Discovered: A Collection of Daft Funk Samples\",\n    \"Also Known As\": \"Draft Ponk\",\n    \"Genres\": \"French House, Film Score, Disco, Electronic, Synthpop, Electroclash\"\n}\n```\n\n```python\n\u003e\u003e\u003e # you can easily convert all returned values to a pandas dataframe\n\u003e\u003e\u003e df = pd.DataFrame([artist_infos])\n\u003e\u003e\u003e df[['Name', 'Formed', 'Disbanded']]\n```\n\n```\n     Name                              Formed         Disbanded\nDaft Punk  1993, Paris, Île-de-France, France  22 February 2021\n```\n\nYou can also extract several artists at once:\n\n```python\n# several artists\n\u003e\u003e\u003e list_artists_infos = network.get_artists_infos(names=[\"Air\", \"M83\"])\n\u003e\u003e\u003e # or network.get_artists_infos(urls=[\"https://rateyourmusic.com/artist/air\", \"https://rateyourmusic.com/artist/m83\"])\n\u003e\u003e\u003e df = pd.DataFrame(list_artists_infos)\n```\n\n### 
Album\n\n```python\n\u003e\u003e\u003e # name field should use the format Artist - Album name (not ideal but it works for now)\n\u003e\u003e\u003e album_infos = network.get_album_infos(name=\"XTC - Black Sea\")\n\u003e\u003e\u003e # or network.get_album_infos(url=\"https://rateyourmusic.com/release/album/xtc/black-sea/\")\n\u003e\u003e\u003e df = pd.DataFrame([album_infos])\n```\n\nYou can also extract several albums at once:\n\n```python\n# several albums\n\u003e\u003e\u003e list_album_infos = network.get_albums_infos(names=[\"Ride - Nowhere\", \"Electrelane - Axes\"])\n\u003e\u003e\u003e # or network.get_albums_infos(urls=[\"https://rateyourmusic.com/release/album/ride/nowhere/\", \"https://rateyourmusic.com/release/album/electrelane/axes/\"])\n\u003e\u003e\u003e df = pd.DataFrame(list_album_infos)\n```\n\n#### Album Timeline\n\nNumber of ratings per day:\n\n```python\n\u003e\u003e\u003e import datetime  # needed for strptime below\n\u003e\u003e\u003e album_timeline = network.get_album_timeline(url=\"https://rateyourmusic.com/release/album/feu-chatterton/palais-dargile/\")\n\u003e\u003e\u003e df = pd.DataFrame(album_timeline)\n\u003e\u003e\u003e df[\"Date\"] = df[\"Date\"].apply(lambda x: datetime.datetime.strptime(x, \"%d %b %Y\"))\n\u003e\u003e\u003e df[\"Date\"].groupby(df[\"Date\"].dt.to_period(\"D\")).count().plot(kind=\"bar\")\n```\n\n![timeline_plot](https://github.com/dbeley/rymscraper/blob/master/docs/timeline.png?raw=true)\n\n### Chart\n\n```python\n\u003e\u003e\u003e # (slow for very long charts)\n\u003e\u003e\u003e rym_url = RymUrl.RymUrl() # default: top of all-time. 
See examples/get_chart.py source code for more options.\n\u003e\u003e\u003e chart_infos = network.get_chart_infos(url=rym_url, max_page=3)\n\u003e\u003e\u003e df = pd.DataFrame(chart_infos)\n\u003e\u003e\u003e df[['Rank', 'Artist', 'Album', 'RYM Rating', 'Ratings']]\n```\n\n```\nRank                         Artist                                              Album RYM Rating Ratings\n   1                      Radiohead                                        OK Computer       4.23   67360\n   2                     Pink Floyd                                 Wish You Were Here       4.29   46534\n   3                   King Crimson                   In the Court of the Crimson King       4.30   42784\n   4                      Radiohead                                              Kid A       4.21   55999\n   5            My Bloody Valentine                                           Loveless       4.24   47394\n   6                 Kendrick Lamar                                To Pimp a Butterfly       4.27   41040\n   7                     Pink Floyd                          The Dark Side of the Moon       4.20   55535\n   8                    The Beatles                                         Abbey Road       4.25   42739\n   9  The Velvet Underground \u0026 Nico                      The Velvet Underground \u0026 Nico       4.24   44002\n  10                    David Bowie  The Rise and Fall of Ziggy Stardust and the Sp...       
4.26   37963\n```\n\n### Discography\n\n```python\n\u003e\u003e\u003e discography_infos = network.get_discography_infos(name=\"Aufgang\", complementary_infos=True)\n\u003e\u003e\u003e # or network.get_discography_infos(url=\"https://rateyourmusic.com/artist/aufgang\")\n\u003e\u003e\u003e df = pd.DataFrame.from_records(discography_infos)\n```\n\n```python\n\u003e\u003e\u003e # don't forget to close and quit the browser (prevent memory leaks)\n\u003e\u003e\u003e network.browser.close()\n\u003e\u003e\u003e network.browser.quit()\n```\n\n## Example Scripts\n\nSome scripts are included in the examples folder.\n\n- get_artist_infos.py : extract information about one or several artists, by name or URL, into a CSV file.\n- get_chart.py : extract information about the albums appearing in a chart, by name, year or URL, into a CSV file.\n- get_discography.py : extract the discography of one or several artists, by name or URL, into a CSV file.\n- get_album_infos.py : extract information about one or several albums, by name or URL, into a CSV file.\n- get_album_timeline.py : extract the timeline of an album into a JSON file.\n\n### Usage\n\n```\npython get_artist_infos.py -a \"u2,xtc,brad mehldau\"\npython get_artist_infos.py --file_artist artist_list.txt\n\npython get_chart.py -g rock\npython get_chart.py -g ambient -y 2010s -c France --everything\n\npython get_discography.py -a magma\npython get_discography.py -a \"the new pornographers, ween, stereolab\" --complementary_infos --separate_export\n\npython get_album_infos.py -a \"ride - nowhere\"\npython get_album_infos.py --file_url urls_list.txt --no_headless\n\npython get_album_timeline.py -a \"ride - nowhere\"\npython get_album_timeline.py -u \"https://rateyourmusic.com/release/album/feu-chatterton/palais-dargile/\"\n```\n\n### Help\n\n```\npython get_artist_infos.py -h\n```\n\n```\nusage: get_artist_infos.py [-h] [--debug] [-u URL] [--file_url FILE_URL]\n                           [--file_artist FILE_ARTIST] [-a ARTIST] [-s]\n                     
      [--no_headless]\n\nScraper rateyourmusic (artist version).\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --debug               Display debugging information.\n  -u URL, --url URL     URLs of the artists to extract (separated by comma).\n  --file_url FILE_URL   File containing the URLs to extract (one by line).\n  --file_artist FILE_ARTIST\n                        File containing the artists to extract (one by line).\n  -a ARTIST, --artist ARTIST\n                        Artists to extract (separated by comma).\n  -s, --separate_export\n                        Also export the artists in separate files.\n  --no_headless         Launch selenium in foreground (background by default).\n```\n\n```\npython get_chart.py -h\n```\n\n```\nusage: get_chart.py [-h] [--debug] [-u URL] [-g GENRE] [-y YEAR] [-c COUNTRY]\n                    [-p PAGE] [-e] [--no_headless]\n\nScraper rateyourmusic (chart version).\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --debug               Display debugging information.\n  -u URL, --url URL     Chart URL to parse.\n  -g GENRE, --genre GENRE\n                        Chart Option : Genre (use + if you need a space).\n  -y YEAR, --year YEAR  Chart Option : Year.\n  -c COUNTRY, --country COUNTRY\n                        Chart Option : Country.\n  -p PAGE, --page PAGE  Number of page to extract. 
If not set, every pages\n                        will be extracted.\n  -e, --everything      Chart Option : Extract Everything / All Releases\n                        (otherwise only albums).\n  --no_headless         Launch selenium in foreground (background by default).\n```\n\n```\npython get_discography.py -h\n```\n\n```\nusage: get_discography.py [-h] [--debug] [-u URL] [--file_url FILE_URL]\n                          [--file_artist FILE_ARTIST] [-a ARTIST] [-s] [-c]\n                          [--no_headless]\n\nScraper rateyourmusic (discography version).\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --debug               Display debugging information.\n  -u URL, --url URL     URLs to extract (separated by comma).\n  --file_url FILE_URL   File containing the URLs to extract (one by line).\n  --file_artist FILE_ARTIST\n                        File containing the artists to extract (one by line).\n  -a ARTIST, --artist ARTIST\n                        Artists to extract (separated by comma).\n  -s, --separate_export\n                        Also export the artists in separate files.\n  -c, --complementary_infos\n                        Extract complementary informations for each releases\n                        (slower, more requests on rym).\n  --no_headless         Launch selenium in foreground (background by default).\n```\n\n```\npython get_album_infos.py -h\n```\n\n```\nusage: get_album_infos.py [-h] [--debug] [-u URL] [--file_url FILE_URL]\n                          [--file_album_name FILE_ALBUM_NAME] [-a ALBUM_NAME]\n                          [-s] [--no_headless]\n\nScraper rateyourmusic (album version).\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --debug               Display debugging information.\n  -u URL, --url URL     URL to extract (separated by comma).\n  --file_url FILE_URL   File containing the URLs to extract (one by line).\n  --file_album_name FILE_ALBUM_NAME\n         
               File containing the name of the albums to extract (one\n                        by line, format Artist - Album).\n  -a ALBUM_NAME, --album_name ALBUM_NAME\n                        Albums to extract (separated by comma, format Artist -\n                        Album).\n  -s, --separate_export\n                        Also export the artists in separate files.\n  --no_headless         Launch selenium in foreground (background by default).\n```\n\n```\npython get_album_timeline.py -h\n```\n\n```\nusage: get_album_timeline.py [-h] [--debug] [-u URL] [-a ALBUM_NAME]\n                             [--no_headless]\n\nScraper rateyourmusic (album timeline version).\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --debug               Display debugging information.\n  -u URL, --url URL     URL to extract.\n  -a ALBUM_NAME, --album_name ALBUM_NAME\n                        Album to extract (format Artist - Album).\n  --no_headless         Launch selenium in foreground (background by default).\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdbeley%2Frymscraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdbeley%2Frymscraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdbeley%2Frymscraper/lists"}