https://github.com/arthurdjn/scrape-lyricwiki
Scrape music data from LyricsWiki (https://lyrics.fandom.com). Artists, Albums, Songs can be extracted.
- Host: GitHub
- URL: https://github.com/arthurdjn/scrape-lyricwiki
- Owner: arthurdjn
- License: mit
- Created: 2020-05-21T16:11:23.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-10-26T21:35:12.000Z (about 2 years ago)
- Last Synced: 2025-03-25T23:36:56.248Z (8 months ago)
- Topics: beautifulsoup, lyrics, lyrics-api, lyrics-scraping, lyricsfandom, lyricwiki, music, scraper
- Language: Jupyter Notebook
- Size: 582 KB
- Stars: 8
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
[Documentation](https://lyricsfandom.readthedocs.io/en/latest/index.html)
[PyPI package](https://pypi.org/project/lyricsfandom/)

> **Note**
> The LyricWiki website has been shut down (see [issue #1](https://github.com/arthurdjn/scrape-lyricwiki/issues/1) and the [Wikipedia article](https://en.wikipedia.org/wiki/LyricWiki)). As of 2023, the package no longer works.
# lyricsfandom
Scrape music data from LyricsWiki (https://lyrics.fandom.com). Artists, Albums, Songs can be extracted.
*This package was built as part of a deep learning project on music generation with the GPT-2 model.*
# Installation
Install the *lyricsfandom* package from *PyPI*:
```
pip install lyricsfandom
```
Or from *GitHub*:
```
git clone https://github.com/arthurdjn/scrape-lyricwiki
cd scrape-lyricwiki
pip install .
```
# Getting Started
## LyricsFandom API
You can search for ``Artist``, ``Album`` or ``Song`` from the API:
```python
from lyricsfandom import LyricWiki
# Connect to the API
wiki = LyricWiki()
# Search for an artist. `LyricsFandom` is not case sensitive.
artist = wiki.search_artist('london grammar')
# Search for an album
album = wiki.search_album('london grammar', 'if you wait')
# ...Or a song
song = wiki.search_song('london grammar', 'wicked game')
# And retrieve its lyrics
lyrics = song.get_lyrics()
print(lyrics)
```
```
The world was on fire and no-one can save me but you
Strange what desire can make foolish people do
I'd never dreamed that I'd meet somebody like you
I'd never dreamed that I'd lose somebody like you
No, I don't wanna fall in love
No, I don't wanna fall in love
With you
What a wicked thing to do
To make me dream of you
What a wicked thing to say
To make me feel this way
[...]
```
## Structure
The package is divided as follows:
* ArtistMeta
* AlbumMeta, inherits from ArtistMeta
* SongMeta, inherits from AlbumMeta
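
The inheritance chain above can be pictured with a minimal sketch. Only the three class names come from the package; the constructor arguments and the attributes (`artist_name`, `album_name`, `song_name`) are hypothetical, chosen for illustration rather than taken from the package's source.

```python
class ArtistMeta:
    """Holds the information that identifies an artist (hypothetical fields)."""

    def __init__(self, artist_name):
        self.artist_name = artist_name


class AlbumMeta(ArtistMeta):
    """Adds album information on top of the artist information."""

    def __init__(self, artist_name, album_name):
        super().__init__(artist_name)
        self.album_name = album_name


class SongMeta(AlbumMeta):
    """Adds song information on top of the album and artist information."""

    def __init__(self, artist_name, album_name, song_name):
        super().__init__(artist_name, album_name)
        self.song_name = song_name


# A song carries its album and artist information through inheritance.
song = SongMeta('London Grammar', 'If You Wait', 'Strong')
print(song.artist_name, song.album_name, song.song_name)
```

With this layout, a song object always knows which album and artist it belongs to, which fits the way parent objects can be retrieved from children.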
## Retrieve data
Once you have one of these objects, you can also access data directly through its methods:
```python
artist = wiki.search_artist('london grammar')
albums = artist.get_albums()
songs = artist.get_songs()
# The same works from an album
album = wiki.search_album('london grammar', 'if you wait')
songs = album.get_songs()
```
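
Since each song exposes `get_lyrics()`, a list of songs can be turned into a small text corpus. The helper below is a sketch, not part of the package: `collect_lyrics` and `FakeSong` are hypothetical names, and the sketch assumes that a song without lyrics returns `None` or an empty string. `FakeSong` stands in for real song objects so that the example runs without network access.

```python
def collect_lyrics(songs):
    """Return the lyrics of every song that has some, in the original order."""
    corpus = []
    for song in songs:
        lyrics = song.get_lyrics()
        if lyrics:  # skip None and empty strings (assumed "no lyrics" values)
            corpus.append(lyrics)
    return corpus


class FakeSong:
    """Stand-in for a song object; it is not part of lyricsfandom."""

    def __init__(self, lyrics):
        self._lyrics = lyrics

    def get_lyrics(self):
        return self._lyrics


corpus = collect_lyrics([FakeSong('first verse'), FakeSong(''), FakeSong('second verse')])
print(len(corpus))
```

With the real package, the argument would be the result of `album.get_songs()` or `artist.get_songs()`, assuming these return an iterable of song objects.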
In addition, you can retrieve parent objects from children:
```python
artist = wiki.search_artist('london grammar')
song = artist.search_song('strong')
# Access the parent objects
album = song.get_album()
artist = song.get_artist()
```
You can also scrape descriptions, links, and other details:
```python
artist = wiki.search_artist('london grammar')
info = artist.get_info()  # description of the artist (band members, genres, labels, etc.)
links = artist.get_links()  # links to where the artist's music can be bought
```
## Save and export
You can export data in JSON format, optionally encoded to ASCII.
```python
artist = wiki.search_artist('london grammar')
artist_data = artist.to_json(encode='ascii')
# The same works for Album and Song
```
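
The effect of ASCII encoding can be illustrated with Python's standard `json` module. This is only a sketch and does not call `lyricsfandom`: the `artist_data` dictionary and its keys are hypothetical stand-ins for exported data, and the package may encode its output differently.

```python
import json

# Hypothetical stand-in for exported artist data; the keys are illustrative only.
artist_data = {'artist': 'Sigur Rós', 'albums': ['Ágætis byrjun']}

# ensure_ascii=True escapes every non-ASCII character as a \uXXXX sequence.
ascii_text = json.dumps(artist_data, ensure_ascii=True, indent=2)

# ensure_ascii=False keeps accented characters as they are.
utf8_text = json.dumps(artist_data, ensure_ascii=False, indent=2)

print(ascii_text)
print(utf8_text)
```

Both strings decode back to the same data with `json.loads`. The ASCII form is the safer choice when the output has to pass through tools that mishandle Unicode.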
# Efficiency
This package can open many connections while scraping data.
The following is a small comparison of different packages, based on scraping 10 songs from an album.
*pylyrics3* retrieves data the fastest, but it only returns lyrics in JSON format, not as objects.
*lyricsfandom* achieves similar speed, while *lyricsmaster* is 10 times slower.
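
A comparison of this kind can be reproduced with a small timing harness. The sketch below is not the method used for the comparison above. `time_call` and `fake_scrape` are hypothetical names, and `fake_scrape` is a placeholder workload standing in for a real scraping call.

```python
import time


def time_call(func, repeats=3):
    """Return the best wall-clock time, in seconds, over `repeats` calls."""
    best = float('inf')
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def fake_scrape():
    """Placeholder workload; replace it with a real scraping call."""
    time.sleep(0.01)


elapsed = time_call(fake_scrape)
print(f'{elapsed:.3f} s')
```

Taking the best of several runs reduces the influence of transient network or system noise, although scraping times still depend heavily on the remote server.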
