{"id":22154129,"url":"https://github.com/nathadriele/airflow-tableau-ec2-maintenance","last_synced_at":"2025-10-25T22:50:54.757Z","repository":{"id":251304629,"uuid":"837008496","full_name":"nathadriele/airflow-tableau-ec2-maintenance","owner":"nathadriele","description":"This project automates weekly maintenance for a Tableau server on an EC2 instance using Apache Airflow, ensuring optimal performance and reliability. The DAG performs disk cleanup and sends notifications with the results via AWS SNS.","archived":false,"fork":false,"pushed_at":"2024-08-02T04:04:46.000Z","size":29,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-29T19:11:51.150Z","etag":null,"topics":["apache-airflow","automation-scripts","aws-ec2","aws-sns","cleanup-script","data-engineering","monitoring-scripts","notifications","python","tableau-server"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nathadriele.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-02T02:55:36.000Z","updated_at":"2024-08-02T14:23:52.000Z","dependencies_parsed_at":"2024-08-02T04:41:38.843Z","dependency_job_id":"6b7d9d57-0a57-4b82-8865-7b97f8db01c5","html_url":"https://github.com/nathadriele/airflow-tableau-ec2-maintenance","commit_stats":null,"previous_names":["nathadriele/airflow-tableau-ec2-maintenance"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nathadr
iele%2Fairflow-tableau-ec2-maintenance","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nathadriele%2Fairflow-tableau-ec2-maintenance/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nathadriele%2Fairflow-tableau-ec2-maintenance/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nathadriele%2Fairflow-tableau-ec2-maintenance/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nathadriele","download_url":"https://codeload.github.com/nathadriele/airflow-tableau-ec2-maintenance/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245284736,"owners_count":20590307,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-airflow","automation-scripts","aws-ec2","aws-sns","cleanup-script","data-engineering","monitoring-scripts","notifications","python","tableau-server"],"created_at":"2024-12-02T01:41:04.294Z","updated_at":"2025-10-25T22:50:49.733Z","avatar_url":"https://github.com/nathadriele.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Apache Airflow Tableau EC2 Maintenance\n\n![image](https://github.com/user-attachments/assets/f5cdecfb-5913-4a4c-95b3-27171ba12be0)\n\n### Overview\n\nThis project automates weekly maintenance tasks for a Tableau server hosted on an EC2 instance using Apache Airflow. 
The maintenance workflow includes cleaning up temporary files, checking disk usage before and after the cleanup, and sending notifications with the results via AWS SNS.\n\n### Prerequisites\n\n- **Apache Airflow**: Ensure Airflow is installed and configured.\n- **AWS Credentials**: Configure AWS access and secret keys that are allowed to publish to SNS.\n- **SSH Access**: Proper SSH access to the EC2 instance running the Tableau server.\n- **Airflow Variables**: Set the following Airflow variables:\n    - `TABLEAU_SERVER_INSTANCE_IP`: IP address of the Tableau server EC2 instance.\n    - `SNS_TOPIC_ARN_TSM`: ARN of the SNS topic to send notifications.\n- **Airflow Connection**: Create an SSH connection with the ID `tableau_ssh_conn` for the EC2 instance.\n\n### Installation\n\n1. Clone the repository:\n\n```bash\ngit clone https://github.com/nathadriele/airflow-tableau-ec2-maintenance.git\ncd airflow-tableau-ec2-maintenance\n```\n\n2. Install required dependencies:\nEnsure you have the necessary Python packages installed, either through Airflow's requirements or manually:\n\n```bash\npip install apache-airflow\npip install apache-airflow-providers-ssh\npip install apache-airflow-providers-amazon\n```\n\n3. Set Airflow Variables:\nUse the Airflow UI or CLI to set the required variables (Airflow 2 CLI syntax shown):\n\n```bash\nairflow variables set TABLEAU_SERVER_INSTANCE_IP \"your_instance_ip\"\nairflow variables set SNS_TOPIC_ARN_TSM \"your_sns_topic_arn\"\n```\n\n### Usage\n\n1. Deploy the DAG:\nPlace the DAG file in your Airflow DAGs directory:\n\n```bash\ncp tsm_cleanup_tableau_dag.py /path/to/your/airflow/dags/\n```\n\n2. Start Airflow:\nEnsure the scheduler and the webserver are both running (each command runs in the foreground, so use separate terminals):\n\n```bash\nairflow scheduler\nairflow webserver\n```\n\n3. Trigger the DAG:\n\nThe DAG is scheduled to run weekly on Mondays at 6 AM (for example). 
You can also trigger it manually via the Airflow UI.\n\n### Code Explanation\n\n#### Configuration and Constants\n\nThese constants configure the IP address, SSH connection, cleanup command, disk usage command, and SNS ARN for the DAG.\n\n```py\nfrom airflow.models import Variable\n\n# Values read from Airflow Variables are set per environment.\nINSTANCE_IP = Variable.get(\"TABLEAU_SERVER_INSTANCE_IP\")\nSSH_CONN_ID = 'tableau_ssh_conn'\nCLEANUP_COMMAND = \"tsm maintenance cleanup\"\nDISK_USAGE_COMMAND = \"df /dev/nvme0n1p1 | tail -1 | awk '{print $5}'\"\nSNS_ARN = Variable.get('SNS_TOPIC_ARN_TSM')\n```\n\n- `INSTANCE_IP`: Holds the IP address of the Tableau server, used for connecting via SSH.\n- `SSH_CONN_ID`: Contains the ID of the SSH connection configured in Airflow.\n- `CLEANUP_COMMAND`: The command that runs Tableau's maintenance cleanup.\n- `DISK_USAGE_COMMAND`: Prints the used-space percentage (the fifth column of `df` output) for the `/dev/nvme0n1p1` partition.\n- `SNS_ARN`: ARN of the SNS topic for sending notifications.\n\n#### Default Arguments\n\nDefines the default arguments for the DAG tasks, such as the owner, retry behavior, and dependencies.\n\n#### Setting Dependencies\n\n```py\ncheck_disk_before_cleanup \u003e\u003e tsm_cleanup_task \u003e\u003e check_disk_after_cleanup\ntsm_cleanup_task \u003e\u003e send_sns_failure_task\n[check_disk_before_cleanup, tsm_cleanup_task, check_disk_after_cleanup] \u003e\u003e send_results_task\n```\n\nDefines the order in which tasks run: the disk check, the cleanup, and the second disk check execute in sequence. `send_sns_failure_task` sits downstream of `tsm_cleanup_task` so that it can report a failed cleanup, and `send_results_task` waits for both disk checks and the cleanup.\n\n#### Python Function `send_sns_message`\n\nThis function pulls the disk usage results from before and after the cleanup from XCom, decodes them, and sends a notification via SNS.\n\n#### DAG Definition\n\nDefines the DAG, including its schedule, description, and tags. It sets up the tasks and their dependencies.\n\n#### Tasks\n\n1. `Check Disk Usage Before Cleanup`\n2. `Execute TSM Maintenance Cleanup`\n3. `Check Disk Usage After Cleanup`\n4. `Send Failure Notification`\n5. 
`Send Results`\n\n### SNS Workflow in the Code\n\n**Before Cleanup:**\n\n- **Task**: `check_disk_before_cleanup` executes an SSH command to check disk usage before the cleanup.\n- **Result**: The result is stored in XCom and used later for comparison.\n\n**Cleanup Execution:**\n\n- **Task**: `tsm_cleanup_task` executes the cleanup command on the Tableau server.\n- **Result**: The cleanup is performed to optimize the server's performance.\n\n**After Cleanup:**\n\n- **Task**: `check_disk_after_cleanup` executes an SSH command to check disk usage after the cleanup.\n- **Result**: The result is compared with the disk usage before the cleanup.\n\n**Sending Results:**\n\n- **Task**: `send_results_task` calls the `send_sns_message` function, which prepares and sends a message to the SNS topic with the cleanup results.\n- **Objective**: Inform administrators about the success or failure of the cleanup task and its impact on disk usage.\n\n**Failure Notification:**\n\n- **Task**: `send_sns_failure_task` is triggered if the cleanup task fails, sending a failure notification to the SNS topic.\n\n### Result\n\nAfter successful execution, the DAG will:\n\n1. Check disk usage before the cleanup.\n2. Perform disk cleanup on the Tableau server.\n3. Check disk usage again after the cleanup.\n4. Send an SNS notification with the results.\n5. Log any errors encountered during the process.\n\n### Contribution to Data Engineering\n\nThis automation keeps Tableau server maintenance on a regular schedule, which supports performance and reliability. 
By integrating Airflow and AWS SNS, the workflow enhances operational efficiency and provides timely notifications, contributing to proactive monitoring and management of data infrastructure.\n\n### Additional Information\n- `Logging`: Logs are available in the Airflow UI for each task, providing detailed insights into the execution.\n- `Customization`: Modify the schedule or commands as needed to fit your specific maintenance requirements.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnathadriele%2Fairflow-tableau-ec2-maintenance","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnathadriele%2Fairflow-tableau-ec2-maintenance","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnathadriele%2Fairflow-tableau-ec2-maintenance/lists"}