{"id":26262460,"url":"https://github.com/danielbmeireles/azspark","last_synced_at":"2026-04-17T10:02:35.180Z","repository":{"id":238521723,"uuid":"195417951","full_name":"danielbmeireles/azspark","owner":"danielbmeireles","description":"Tech Challenge - DTB Hub","archived":false,"fork":false,"pushed_at":"2019-10-12T08:22:00.000Z","size":25,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-05-06T16:03:37.835Z","etag":null,"topics":["ansible","azure","powershell"],"latest_commit_sha":null,"homepage":null,"language":"PowerShell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/danielbmeireles.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-07-05T13:55:05.000Z","updated_at":"2024-05-06T16:03:40.445Z","dependencies_parsed_at":"2024-05-06T16:03:40.328Z","dependency_job_id":"44b58da6-df89-46d0-9b72-24380d7706ff","html_url":"https://github.com/danielbmeireles/azspark","commit_stats":null,"previous_names":["danielbmeireles/azspark"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danielbmeireles%2Fazspark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danielbmeireles%2Fazspark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danielbmeireles%2Fazspark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danielbmeireles%2Fazspark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/owners/danielbmeireles","download_url":"https://codeload.github.com/danielbmeireles/azspark/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243500748,"owners_count":20300775,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ansible","azure","powershell"],"created_at":"2025-03-14T00:19:14.681Z","updated_at":"2025-12-29T11:21:21.073Z","avatar_url":"https://github.com/danielbmeireles.png","language":"PowerShell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🚚 azspark\n\n## 👩🏿‍💻 Tech Challenge - DTB Hub\n\nThis document describes the implementation process of a Spark cluster with 1 (one) master node and 2 (two) slave nodes.\n\nThe main documentation used to understand the basics of the software installation, configuration and administration can be found at:\n\n- Apache Spark Home Page: \u003chttps://spark.apache.org\u003e\n- Installation Process: \u003chttps://spark.apache.org/docs/2.4.0/#launching-on-a-cluster\u003e\n\nThe infrastructure used to deploy the solution was entirely based on Microsoft Azure. A free / trial account was used to implement the network, virtual machines, persistence (storage) and all other computational resources used in this laboratory.\n\n## ☁ How to implement the Azure Virtual Infrastructure\n\n1. From the Azure Portal, start a Cloud Shell terminal (the system will automatically create a Resource Group and a persistence layer in order to launch the application).\n\n2. 
Upload the file \"deploySparkInfrastructure.ps1\" using the upload option in the terminal's menu bar.\n\n3. Move the file from your home directory to the \"clouddrive\" directory and execute the script:\n\n~~~~powershell\nPS Azure:\\\u003e cd $HOME\nPS /home/username\u003e mv ./deploySparkInfrastructure.ps1 ./clouddrive\nPS /home/username\u003e cd ./clouddrive/\nPS /home/username/clouddrive\u003e ./deploySparkInfrastructure.ps1\n~~~~\n\n## 💡 How to implement the Spark cluster\n\nAn Ansible Playbook is available to deploy the Spark cluster. However, some pre-deployment steps are necessary in order to install and configure Ansible itself.\n\n1. Using the Cloud Shell, collect the public IPs associated with each Virtual Machine deployed:\n\n~~~~powershell\nPS Azure:\\\u003e Get-AzPublicIpAddress -ResourceGroupName \"sparkResourceGroup\" | Select \"Name\", \"IpAddress\"\n\nName                IpAddress\n----                ---------\nsparkNameDNS-master XXX.XXX.XXX.XXX\nsparkNameDNS-slave1 YYY.YYY.YYY.YYY\nsparkNameDNS-slave2 ZZZ.ZZZ.ZZZ.ZZZ\n~~~~\n\n2. Log in to each Virtual Machine and create an SSH key without a passphrase:\n\n~~~~bash\nsparkadmin@master:~$ ssh-keygen -t rsa -b 2048\n~~~~\n\n~~~~bash\nsparkadmin@slave1:~$ ssh-keygen -t rsa -b 2048\n~~~~\n\n~~~~bash\nsparkadmin@slave2:~$ ssh-keygen -t rsa -b 2048\n~~~~\n\n3. Append the public key of each Virtual Machine to the ~/.ssh/authorized_keys file on every node. This way, Ansible will be able to execute the playbook without asking for a password.\n\nThe next steps are performed only on the master node:\n\n4. Install Ansible:\n\n~~~~bash\nsparkadmin@master:~$ sudo apt-get install ansible -y\n~~~~\n\n5. Copy the \"hosts\" file to /etc/ansible/. Also, copy the ansible.cfg file to the same directory, or just edit the existing file and set the \"host_key_checking\" option to \"false\".\n\n6. 
Deploy the Spark cluster by running the \"deploySparkCluster.yml\" playbook:\n\n~~~~bash\nsparkadmin@master:~$ ansible-playbook deploySparkCluster.yml\n~~~~\n\n7. From any browser, access the Spark Dashboard via ```http://\u003cmaster_public_ip\u003e:8080```. Each worker's dashboard can be accessed via ```http://\u003cslave_public_ip\u003e:8081```.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanielbmeireles%2Fazspark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdanielbmeireles%2Fazspark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanielbmeireles%2Fazspark/lists"}