{"id":20056099,"url":"https://github.com/vinicius999/spotify-api","last_synced_at":"2026-06-07T02:31:33.337Z","repository":{"id":151636358,"uuid":"622031670","full_name":"Vinicius999/Spotify-API","owner":"Vinicius999","description":"A simple ETL that extracts, processes, and loads data from the public Spotify API into a PostgreSQL database using Python.","archived":false,"fork":false,"pushed_at":"2023-06-24T14:32:59.000Z","size":11196,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-12T21:42:33.874Z","etag":null,"topics":["api","bulk-inserts","etl","postgresql","psycopg2","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Vinicius999.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-31T23:52:30.000Z","updated_at":"2023-06-22T00:04:57.000Z","dependencies_parsed_at":"2024-11-13T12:52:17.517Z","dependency_job_id":"ced50ce1-7637-4e77-809f-560b06816ee4","html_url":"https://github.com/Vinicius999/Spotify-API","commit_stats":null,"previous_names":["vinicius999/spotify-api"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Vinicius999%2FSpotify-API","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Vinicius999%2FSpotify-API/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Vinicius999%2FSpotify-API/releases","manifests_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Vinicius999%2FSpotify-API/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Vinicius999","download_url":"https://codeload.github.com/Vinicius999/Spotify-API/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241484231,"owners_count":19970237,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api","bulk-inserts","etl","postgresql","psycopg2","python"],"created_at":"2024-11-13T12:51:43.542Z","updated_at":"2025-03-02T09:26:31.385Z","avatar_url":"https://github.com/Vinicius999.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003e ETL Spotify API \u003c/h1\u003e\n\n\u003cp align=\"center\"\u003eThis repository contains the first challenge of the individual mentorship in the \u003ca href=\"https://desenvolve.grupoboticario.com.br/\"\u003ePrograma Desenvolve 2023 - Trilha Dados\u003c/a\u003e (Desenvolve 2023 Program, Data Track). The challenge is to build a simple \u003cstrong\u003eETL\u003c/strong\u003e pipeline that \u003cstrong\u003eextracts, processes, and loads\u003c/strong\u003e data from the public Spotify API into a PostgreSQL database using Python. 
\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://github.com/Vinicius999/Spotify-API#executar-projeto\"\u003eRun the project\u003c/a\u003e |\n    \u003ca href=\"https://github.com/Vinicius999/Spotify-API#desafio\"\u003eChallenge\u003c/a\u003e |\n    \u003ca href=\"https://github.com/Vinicius999/Spotify-API#tecnologias\"\u003eTechnologies\u003c/a\u003e |\n    \u003ca href=\"https://github.com/Vinicius999/Spotify-API#dados\"\u003eData\u003c/a\u003e |\n    \u003ca href=\"https://github.com/Vinicius999/Spotify-API#projeto-etl\"\u003eETL Project\u003c/a\u003e\n\u003c/p\u003e\n\n## Run the project\n\nThe libraries required to run this project are listed in [`requirements.txt`](https://github.com/Vinicius999/Desafio-01-Mentor-Spotify-API/blob/main/requirements.txt). To install them, open a terminal, navigate to the project folder, and run:\n\n```\npip install -r requirements.txt\n```\n\nThen run [`main.py`](https://github.com/Vinicius999/Desafio-01-Mentor-Spotify-API/blob/main/main.py) with:\n\n```\npython3 main.py\n```\n\n## Challenge\n\n1. Use Python to read data from the [Spotify API](https://developer.spotify.com/documentation/web-api/tutorials/getting-started) and find episodes in which the word \"Python\" appears.\n\n2. Store the following information in a PostgreSQL database:\n    - episode id;\n    - description;\n    - link;\n    - list of images.\n\n3. 
Download the episode images from the list of images stored in the database and save them to a folder inside the project.\n\n#### Evaluation criteria:\n\n- Code standardization;\n- Indentation;\n- Version control;\n- Object-oriented programming\n\n## Technologies\n\n\u003cp style='margin: 16px 4px 32px;'\u003e\n    \u003ca href=\"https://www.python.org/\" target=\"_blank\" rel=\"noopener noreferrer\"\u003e\n        \u003cimg src=\"https://cdn.jsdelivr.net/gh/devicons/devicon/icons/python/python-original.svg\" alt=\"Vini-python\" width=\"40\" height=\"40\" /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://pandas.pydata.org/\" target=\"_blank\" rel=\"noreferrer\"\u003e\n        \u003cimg src=\"https://cdn.jsdelivr.net/gh/devicons/devicon/icons/pandas/pandas-original-wordmark.svg\" alt=\"Vini-pandas\" width=\"40\" height=\"40\" /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://www.postgresql.org/\" target=\"_blank\" rel=\"noreferrer\"\u003e\n        \u003cimg src=\"https://cdn.jsdelivr.net/gh/devicons/devicon/icons/postgresql/postgresql-original.svg\" alt=\"Vini-postgresql\" height=\"40\" width=\"40\" /\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\n## Data\n\n- Data source: [Spotify API](https://developer.spotify.com/)\n\n- Documentation: [Spotify Web API documentation](https://developer.spotify.com/documentation/web-api)\n\n## ETL Project\n\n### PART 1 - Use Python to read data from the [Spotify API](https://developer.spotify.com/documentation/web-api/tutorials/getting-started) and find episodes in which the word \"Python\" appears.\n\n##### EXTRACT\n\nTo connect to and read data from the Spotify API, the `Spotipy` class was developed. It contains the authentication and episode-search methods and is built on the [spotipy](https://spotipy.readthedocs.io/en/2.22.1/) library:\n\n```python\nimport spotipy\nfrom spotipy.oauth2 import SpotifyClientCredentials\n\n\nclass Spotipy:\n    def __init__(self, CLIENT_ID, CLIENT_SECRET):\n        self.CLIENT_ID = CLIENT_ID\n        self.CLIENT_SECRET = CLIENT_SECRET\n\n    def 
authentication(self):\n        # Client Credentials flow: app-level access, no user login required\n        auth_manager = SpotifyClientCredentials(\n            client_id=self.CLIENT_ID,\n            client_secret=self.CLIENT_SECRET\n        )\n\n        self.sp = spotipy.Spotify(auth_manager=auth_manager)\n        return self.sp\n\n    def get_all_episodes_with_python(self, sp):\n        self.episodes = []\n        self.offset = 0\n        self.limit = 50  # maximum page size accepted by the search endpoint\n        self.market = 'BR'\n\n        while True:\n            self.results = sp.search(q='Python', type='episode', limit=self.limit, offset=self.offset, market=self.market)\n            page = self.results['episodes']\n            self.episodes += page['items']\n            # Stop on an empty page or when the API reports there is no next page\n            if not page['items'] or page['next'] is None:\n                break\n            self.offset += self.limit\n        return self.episodes\n```\n\n### PART 2 - Store the following information in a PostgreSQL database: episode id, description, link, list of images\n\n##### TRANSFORM\n\nThe data was transformed using Python's built-in **lists** and **tuples**. These types were chosen for performance, both when manipulating the data and when inserting it into the database with a **BULK INSERT**: each tuple maps to one table row, and a list of tuples is the input that `psycopg2.extras.execute_values` expects.\n\n```python\n# One row per episode: (id, description, Spotify link, API href)\nepisode_list = [\n    (ep['id'], ep['description'], ep['external_urls']['spotify'], ep['href'])\n    for ep in episodes\n]\n\n# One row per image: (episode id, 1-based position, height, width, url)\nimage_list = [\n    (ep['id'], j, im['height'], im['width'], im['url'])\n    for ep in episodes\n    for j, im in enumerate(ep['images'], start=1)\n]\n```\n\n##### LOAD\n\nThe database functions use the [psycopg2](https://pypi.org/project/psycopg2/) library. 
All of them, from connecting to the database to creating the tables, inserting data, and querying it, live in the `Database` class:\n\n```python\nimport psycopg2\nimport psycopg2.extras  # execute_values lives here; 'import psycopg2' alone does not load it\n\n\nclass Database:\n    def __init__(self, HOST, DATABASE, USER, PASSWORD):\n        print('Connecting to spotifydb...')\n        self.HOST = HOST\n        self.DATABASE = DATABASE\n        self.USER = USER\n        self.PASSWORD = PASSWORD\n\n    def connect_db(self):\n        self.conn = psycopg2.connect(\n            host=self.HOST,\n            database=self.DATABASE,\n            user=self.USER,\n            password=self.PASSWORD\n        )\n        return self.conn\n\n    def create_db(self, sql):\n        # Runs DDL such as CREATE TABLE and commits it\n        self.conn = self.connect_db()\n        self.cur = self.conn.cursor()\n        self.cur.execute(sql)\n        self.conn.commit()\n        self.cur.close()\n        self.conn.close()\n\n    def insert_db(self, sql):\n        # Returns 1 if the statement fails (after rolling back), None otherwise\n        self.conn = self.connect_db()\n        self.cur = self.conn.cursor()\n        try:\n            self.cur.execute(sql)\n            self.conn.commit()\n        except (Exception, psycopg2.DatabaseError) as error:\n            print(f\"Error: {error}\")\n            self.conn.rollback()\n            return 1\n        finally:\n            # Always release the cursor and the connection\n            self.cur.close()\n            self.conn.close()\n\n    def bulk_insert_db(self, sql, data):\n        # sql must contain a single %s placeholder, e.g. 'INSERT INTO t (a, b) VALUES %s';\n        # execute_values expands it into multi-row VALUES lists (100 rows per statement by default)\n        self.conn = self.connect_db()\n        self.cur = self.conn.cursor()\n        psycopg2.extras.execute_values(self.cur, sql, data)\n        self.conn.commit()\n        self.cur.close()\n        self.conn.close()\n\n    def select_db(self, sql):\n        self.conn = self.connect_db()\n        self.cur = self.conn.cursor()\n        self.cur.execute(sql)\n        # fetchall() already returns a list of tuples, one per row\n        self.records = self.cur.fetchall()\n        self.cur.close()\n        self.conn.close()\n        return self.records\n```\n\n### PART 3 - Download the episode images from the list of images stored in the database and save them to a directory inside the project\n\nThe images are downloaded with the [requests](https://pypi.org/project/requests/) library, while the [os](https://docs.python.org/3/library/os.html) module is used to create the folders and build the file paths; the files themselves are written with the built-in `open`:\n\n```python\nimport os\n\nimport requests\n\n# df_images: pandas DataFrame of the image records queried from the database\n# (one row per image, with at least the 'id' and 'url' columns)\nfor i, row in df_images.iterrows():\n    episode_dir = os.path.join('images', row['id'])\n    # makedirs also creates the parent 'images' folder; exist_ok skips existing folders\n    os.makedirs(episode_dir, exist_ok=True)\n\n    response = requests.get(row['url'], timeout=30)\n    response.raise_for_status()  # fail on HTTP errors instead of saving an error page\n\n    image_path = os.path.join(episode_dir, f\"{row['id']}_{i}.jpg\")\n    with open(image_path, 'wb') as f:\n        f.write(response.content)\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvinicius999%2Fspotify-api","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvinicius999%2Fspotify-api","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvinicius999%2Fspotify-api/lists"}
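The `get_all_episodes_with_python` method in the README pages through the search results by advancing `offset` in steps of `limit`. The sketch below pulls that loop out of the network call so it can be tested on its own. `fetch_all` and `search_page` are hypothetical names. `search_page(limit, offset)` is assumed to return a dict shaped like a Spotify paging object: an `items` list plus a `next` URL that is `None` on the last page. `max_offset=1000` is an assumed cap on the search offset, so check the current Web API reference before relying on it.

```python
from typing import Callable, Dict, List


def fetch_all(search_page: Callable[[int, int], Dict],
              limit: int = 50, max_offset: int = 1000) -> List:
    """Collect every item from an offset-paginated search (sketch).

    search_page(limit, offset) is a hypothetical callable returning a dict
    shaped like a Spotify paging object: {'items': [...], 'next': url or None}.
    max_offset is an assumed upper bound on the offset the API accepts.
    """
    items = []
    offset = 0
    while offset <= max_offset:
        page = search_page(limit, offset)
        items.extend(page['items'])
        # Stop on an empty page or when the API reports there is no next page
        if not page['items'] or page.get('next') is None:
            break
        offset += limit
    return items
```

With spotipy, `search_page` could be `lambda limit, offset: sp.search(q='Python', type='episode', limit=limit, offset=offset, market='BR')['episodes']`, where `sp` is the client returned by `authentication()`. Checking `next` as well as emptiness avoids requesting a final empty page.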
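The PART 3 download loop reads image rows from a DataFrame and calls `requests` directly. The variant below works on the image tuples built in the TRANSFORM step, `(episode id, position, height, width, url)`, and takes the download function as a parameter, so it runs without network access. `download_images` and `fetch` are hypothetical names. Using the stored 1-based position in the file name, instead of the DataFrame index, is a design choice of this sketch and not the author's code.

```python
import os
from typing import Callable, Iterable, List, Tuple

ImageRow = Tuple[str, int, int, int, str]


def download_images(rows: Iterable[ImageRow], fetch: Callable[[str], bytes],
                    base_dir: str = 'images') -> List[str]:
    """Write each image to base_dir/<episode id>/<episode id>_<position>.jpg.

    rows follow the image tuple layout from the TRANSFORM step:
    (episode_id, position, height, width, url). fetch(url) is a
    hypothetical callable that returns the image bytes.
    """
    written = []
    for episode_id, position, _height, _width, url in rows:
        episode_dir = os.path.join(base_dir, episode_id)
        # Creates base_dir too if needed; existing folders are left as they are
        os.makedirs(episode_dir, exist_ok=True)
        path = os.path.join(episode_dir, f'{episode_id}_{position}.jpg')
        with open(path, 'wb') as f:
            f.write(fetch(url))
        written.append(path)
    return written
```

In the project, `fetch` could wrap `requests.get(url, timeout=30)`, call `raise_for_status()` on the response, and return its `.content`. `rows` could be the records returned by `Database.select_db` for the images table, provided the query selects the columns in that order.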