{"id":23200509,"url":"https://github.com/bayunova28/spotify_lyrics","last_synced_at":"2025-04-05T09:24:58.886Z","repository":{"id":157263958,"uuid":"583869538","full_name":"Bayunova28/Spotify_Lyrics","owner":"Bayunova28","description":"This repository contains my personal project to generate mapreduce using apache hadoop ","archived":false,"fork":false,"pushed_at":"2022-12-31T13:02:01.000Z","size":20696,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-17T20:42:20.075Z","etag":null,"topics":["apache-derby","apache-hadoop","apache-hive","hadoop-mapreduce","mapreduce-python","spotify"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Bayunova28.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-12-31T08:34:33.000Z","updated_at":"2022-12-31T09:41:34.000Z","dependencies_parsed_at":null,"dependency_job_id":"d9a44641-28b3-466c-bed5-a2401d056e42","html_url":"https://github.com/Bayunova28/Spotify_Lyrics","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bayunova28%2FSpotify_Lyrics","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bayunova28%2FSpotify_Lyrics/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bayunova28%2FSpotify_Lyrics/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bayunova28%2F
Spotify_Lyrics/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Bayunova28","download_url":"https://codeload.github.com/Bayunova28/Spotify_Lyrics/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247314016,"owners_count":20918741,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-derby","apache-hadoop","apache-hive","hadoop-mapreduce","mapreduce-python","spotify"],"created_at":"2024-12-18T15:11:35.058Z","updated_at":"2025-04-05T09:24:58.855Z","avatar_url":"https://github.com/Bayunova28.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Spotify Lyrics\n\u003cimg src=\"https://github.com/Bayunova28/Spotify_Lyrics/blob/master/cover.png\" height=\"450\" width=\"1100\"\u003e\n\n## Background \n\u003cp align=\"justify\"\u003eSpotify is an audio streaming and media services provider founded on 23 April 2006 by Daniel Ek and Martin Lorentzon. It is one of the largest music streaming \nservice providers, with over 456 million monthly active users, including 195 million paying subscribers, as of September 2022. Spotify is listed (through a Luxembourg \nCity-domiciled holding company, Spotify Technology S.A.) on the New York Stock Exchange in the form of American depositary receipts. Spotify offers digital copyright \nrestricted recorded music and podcasts, including more than 82 million songs, from record labels and media companies. 
As a freemium service, basic features are free with \nadvertisements and limited control, while additional features, such as offline listening and commercial-free listening, are offered via paid subscriptions. Users can \nsearch for music based on artist, album, or genre, and can create, edit, and share playlists. Spotify is available in most of Europe, as well as Africa, the Americas, \nAsia and Oceania, with a total availability in 184 markets. The service is available on most devices including Windows, macOS, and Linux computers, iOS and Android \nsmartphones and tablets, smart home devices such as the Amazon Echo and Google Nest lines of products and digital media players like Roku.\u003c/p\u003e\n\n## Requirement \n* \u003cb\u003e[Apache Hadoop](https://archive.apache.org/dist/hadoop/common/)\u003c/b\u003e\n* \u003cb\u003e[Apache Derby](https://db.apache.org/derby/derby_downloads.html)\u003c/b\u003e\n* \u003cb\u003e[Apache Hive](https://hive.apache.org/downloads.html)\u003c/b\u003e\n\n#### Mapper.py\n```py\n# import python library\nimport sys\n\n# input comes from STDIN (standard input)\nfor line in sys.stdin:\n    # remove leading and trailing whitespace\n    line = line.strip()\n    # split the line into words\n    words = line.split()\n    # increase counters\n    for word in words:\n        # write the results to STDOUT (standard output)\n        # tab-delimited; the trivial word count is 1\n        print('%s\\t%s' % (word, 1))\n```  \n#### Reducer.py\n```py\n# import python library\nfrom operator import itemgetter\nimport sys\n\n# set parameter from words dataset\ncurrent_word = None\ncurrent_count = 0\nword = None\n\n# input comes from STDIN\nfor line in sys.stdin:\n    # remove leading and trailing whitespace\n    line = line.strip()\n\n    # parse the input we got from mapper.py\n    word, count = line.split('\\t', 1)\n\n    # convert count (currently a string) to int\n    try:\n        count = int(count)\n    except ValueError:\n        # count was 
not a number, so silently\n        # ignore/discard this line\n        continue\n\n    # this IF-switch only works because Hadoop sorts map output\n    # by key (here: word) before it is passed to the reducer\n    if current_word == word:\n        current_count += count\n    else:\n        if current_word:\n            # write result to STDOUT\n            print('%s\\t%s' % (current_word, current_count))\n        current_count = count\n        current_word = word\n\n# do not forget to output the last word (skipped when the input was empty)\nif current_word:\n    print('%s\\t%s' % (current_word, current_count))\n```\n\n#### Run mapper and reducer program  \n```sh\n# capture the output of the streaming job, then print it\nEXEC=$(hadoop jar c:\\hadoop-2.8.0\\share\\hadoop\\tools\\lib\\hadoop-streaming-*.jar -file \"D:\\mapper.py\" -mapper \"python D:\\mapper.py\" -file \"D:\\reducer.py\" -reducer \"python D:\\reducer.py\" -input spotify/samples.txt -output spotify/output/)\necho \"$EXEC\"\n```\n#### Hadoop Web UI \n\u003cimg src=\"https://github.com/Bayunova28/Spotify_Lyrics/blob/master/hadoop-web-ui.jpg\" height=\"550\" width=\"1100\"\u003e\n\n## Acknowledgement\n* \u003cb\u003eData Source : [Spotify Million Song Dataset](https://www.kaggle.com/datasets/notshrirang/spotify-million-song-dataset)\u003c/b\u003e\n* \u003cb\u003eMapReduce Tutorial : [Michael G. Noll](https://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/)\u003c/b\u003e\n* \u003cb\u003eHadoop Tutorial : [Edureka](https://www.youtube.com/watch?v=g7Qpnmi0Q-s)\u003c/b\u003e\n* \u003cb\u003eHive Tutorial : [Simplilearn](https://www.youtube.com/watch?v=rr17cbPGWGA)\u003c/b\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbayunova28%2Fspotify_lyrics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbayunova28%2Fspotify_lyrics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbayunova28%2Fspotify_lyrics/lists"}