{"id":13412246,"url":"https://github.com/AlbertSuarez/azlyrics-scraper","last_synced_at":"2025-03-14T18:30:58.484Z","repository":{"id":112051961,"uuid":"195517300","full_name":"AlbertSuarez/azlyrics-scraper","owner":"AlbertSuarez","description":"🎵 AZLyrics scraper for getting song lyrics publishing to Box","archived":false,"fork":false,"pushed_at":"2020-01-31T10:21:12.000Z","size":34,"stargazers_count":18,"open_issues_count":2,"forks_count":7,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-07-31T20:50:04.558Z","etag":null,"topics":["azlyrics","box","cloud-storage","dataset","scraper","songs"],"latest_commit_sha":null,"homepage":"https://app.box.com/s/vats4n6slxtknuaxz58mxlo6ry8v04pd?sortColumn=name\u0026sortDirection=ASC","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AlbertSuarez.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-07-06T08:38:37.000Z","updated_at":"2023-12-21T21:29:12.000Z","dependencies_parsed_at":null,"dependency_job_id":"ba747aa2-9dd7-473f-9d75-c510176daa30","html_url":"https://github.com/AlbertSuarez/azlyrics-scraper","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlbertSuarez%2Fazlyrics-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlbertSuarez%2Fazlyrics-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlbertSuarez%2Fazlyrics-scraper/releases","manifests_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlbertSuarez%2Fazlyrics-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AlbertSuarez","download_url":"https://codeload.github.com/AlbertSuarez/azlyrics-scraper/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243625064,"owners_count":20321226,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["azlyrics","box","cloud-storage","dataset","scraper","songs"],"created_at":"2024-07-30T20:01:22.558Z","updated_at":"2025-03-14T18:30:58.477Z","avatar_url":"https://github.com/AlbertSuarez.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# AZLyrics scraper\n\n[![HitCount](http://hits.dwyl.io/AlbertSuarez/azlyrics-scraper.svg)](http://hits.dwyl.io/AlbertSuarez/azlyrics-scraper)\n[![GitHub stars](https://img.shields.io/github/stars/AlbertSuarez/azlyrics-scraper.svg)](https://GitHub.com/AlbertSuarez/azlyrics-scraper/stargazers/)\n[![GitHub forks](https://img.shields.io/github/forks/AlbertSuarez/azlyrics-scraper.svg)](https://GitHub.com/AlbertSuarez/azlyrics-scraper/network/)\n[![GitHub repo size in bytes](https://img.shields.io/github/repo-size/AlbertSuarez/azlyrics-scraper.svg)](https://github.com/AlbertSuarez/azlyrics-scraper)\n[![GitHub contributors](https://img.shields.io/github/contributors/AlbertSuarez/azlyrics-scraper.svg)](https://GitHub.com/AlbertSuarez/azlyrics-scraper/graphs/contributors/)\n[![GitHub 
license](https://img.shields.io/github/license/AlbertSuarez/azlyrics-scraper.svg)](https://github.com/AlbertSuarez/azlyrics-scraper/blob/master/LICENSE)\n\n[Box folder URL](https://app.box.com/s/vats4n6slxtknuaxz58mxlo6ry8v04pd) | [Static repo website](https://asuarez.dev/azlyrics-scraper/) | [Kaggle dataset](https://www.kaggle.com/albertsuarez/azlyrics)\n\n🎵 AZLyrics scraper for getting all the song lyrics and publishing to Box.\n\n## Python requirements\n\nThis project uses Python 3. All requirements are specified in the `requirements.lock` file.\n\n1. [Requests](https://2.python-requests.org/en/master/): used for retrieving the HTML content of a website.\n2. [BeautifulSoup](https://pypi.org/project/beautifulsoup4/): used for parsing HTML content.\n3. [Tor](https://2019.www.torproject.org/docs/debian.html.en): used for making requests anonymous using other IPs.\n4. [Stem](https://stem.torproject.org/): used for authenticating every request with a different IP.\n5. [Fake User-Agent](https://pypi.org/project/fake-useragent/): used for sending a random User-Agent with every request.\n6. [Unidecode](https://pypi.org/project/Unidecode/): used for cleaning strings from weird characters.\n7. [Box SDK](https://github.com/box/box-python-sdk): used for uploading/downloading files to/from Box Cloud Storage.\n\n## Recommendations\n\nUsage of [virtualenv](https://realpython.com/blog/python/python-virtual-environments-a-primer/) is recommended for package library / runtime isolation.\n\n## Usage\n\nTo run this script, please execute the following from the root directory:\n\n1. Set up a virtual environment\n\n2. Install dependencies\n\n  ```bash\n  pip3 install -r requirements.lock\n  ```\n\n3. Move [JWT configuration](#jwt-configuration) file from Box API\n\n4. Install [Tor browser](https://2019.www.torproject.org/docs/debian.html.en)\n\n5. 
Configure Tor IP renewal by editing the `/etc/tor/torrc` file\n\n   ```\n   ControlPort 9051\n   CookieAuthentication 1\n   ```\n\n6. Restart the Tor service\n\n  ```bash\n  sudo service tor restart\n  ```\n\n7. Run the script\n\n  ```bash\n  python3 -m src\n  ```\n\n## JWT configuration\n\nTo use the Box Cloud Storage API securely, this project authenticates with JWT. After following the [tutorial](https://developer.box.com/docs/construct-jwt-claim-manually), you obtain a configuration file, which must be placed in the `data` folder and named `jwt_config.json`, as specified in the `__init__.py` configuration file:\n\n```python\n# Box integration\nBOX_CONFIG_FILE_PATH = 'data/jwt_config.json'\n```\n\n## Authors\n\n- [Albert Suàrez](https://github.com/AlbertSuarez)\n\n## License\n\nMIT © AZLyrics scraper\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAlbertSuarez%2Fazlyrics-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FAlbertSuarez%2Fazlyrics-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAlbertSuarez%2Fazlyrics-scraper/lists"}