{"id":28646733,"url":"https://github.com/harisekhon/devops-python-tools","last_synced_at":"2025-06-13T02:06:48.272Z","repository":{"id":38391406,"uuid":"45049026","full_name":"HariSekhon/DevOps-Python-tools","owner":"HariSekhon","description":"80+ DevOps \u0026 Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters \u0026 Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.","archived":false,"fork":false,"pushed_at":"2025-04-25T17:42:23.000Z","size":3262,"stargazers_count":794,"open_issues_count":39,"forks_count":347,"subscribers_count":41,"default_branch":"master","last_synced_at":"2025-04-25T18:42:27.313Z","etag":null,"topics":["avro","aws","cloudformation","devops","docker","dockerhub","elasticsearch","gcf","gcp","hadoop","hbase","hdfs","json","linux","parquet","pyspark","python","solr","spark","travis-ci"],"latest_commit_sha":null,"homepage":"https://www.linkedin.com/in/HariSekhon","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HariSekhon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2015-10-27T15:06:56.000Z","updated_at":"2025-04-25T17:42:28.000Z","dependencies_parsed_at":"2023-01-31T15:31:10.496Z","dependency_job_id":"c5e2ef09-16cc-426d-9232-1a4813167e6b","html_url":"https://github.com/HariSekhon/DevOps-Python-tools","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Har
iSekhon/DevOps-Python-tools","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HariSekhon%2FDevOps-Python-tools","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HariSekhon%2FDevOps-Python-tools/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HariSekhon%2FDevOps-Python-tools/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HariSekhon%2FDevOps-Python-tools/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HariSekhon","download_url":"https://codeload.github.com/HariSekhon/DevOps-Python-tools/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HariSekhon%2FDevOps-Python-tools/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259565562,"owners_count":22877347,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["avro","aws","cloudformation","devops","docker","dockerhub","elasticsearch","gcf","gcp","hadoop","hbase","hdfs","json","linux","parquet","pyspark","python","solr","spark","travis-ci"],"created_at":"2025-06-13T02:06:47.244Z","updated_at":"2025-06-13T02:06:48.235Z","avatar_url":"https://github.com/HariSekhon.png","language":"Python","readme":"# Hari Sekhon - DevOps Python Tools\n\n[![GitHub stars](https://img.shields.io/github/stars/harisekhon/devops-python-tools?logo=github)](https://github.com/HariSekhon/DevOps-Python-tools/stargazers)\n[![GitHub 
forks](https://img.shields.io/github/forks/harisekhon/devops-python-tools?logo=github)](https://github.com/HariSekhon/DevOps-Python-tools/network)\n[![LineCount](https://sloc.xyz/github/HariSekhon/DevOps-Python-tools/?badge-bg-color=2081C2)](https://github.com/boyter/scc/)\n[![Cocomo](https://sloc.xyz/github/HariSekhon/DevOps-Python-tools/?badge-bg-color=2081C2\u0026category=cocomo)](https://github.com/boyter/scc/)\n[![License](https://img.shields.io/github/license/HariSekhon/DevOps-Python-tools)](https://github.com/HariSekhon/DevOps-Python-tools/blob/master/LICENSE)\n[![My LinkedIn](https://img.shields.io/badge/LinkedIn%20Profile-HariSekhon-blue?logo=data:image/svg%2bxml;base64,PHN2ZyByb2xlPSJpbWciIGZpbGw9IiNmZmZmZmYiIHZpZXdCb3g9IjAgMCAyNCAyNCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj48dGl0bGU+TGlua2VkSW48L3RpdGxlPjxwYXRoIGQ9Ik0yMC40NDcgMjAuNDUyaC0zLjU1NHYtNS41NjljMC0xLjMyOC0uMDI3LTMuMDM3LTEuODUyLTMuMDM3LTEuODUzIDAtMi4xMzYgMS40NDUtMi4xMzYgMi45Mzl2NS42NjdIOS4zNTFWOWgzLjQxNHYxLjU2MWguMDQ2Yy40NzctLjkgMS42MzctMS44NSAzLjM3LTEuODUgMy42MDEgMCA0LjI2NyAyLjM3IDQuMjY3IDUuNDU1djYuMjg2ek01LjMzNyA3LjQzM2MtMS4xNDQgMC0yLjA2My0uOTI2LTIuMDYzLTIuMDY1IDAtMS4xMzguOTItMi4wNjMgMi4wNjMtMi4wNjMgMS4xNCAwIDIuMDY0LjkyNSAyLjA2NCAyLjA2MyAwIDEuMTM5LS45MjUgMi4wNjUtMi4wNjQgMi4wNjV6bTEuNzgyIDEzLjAxOUgzLjU1NVY5aDMuNTY0djExLjQ1MnpNMjIuMjI1IDBIMS43NzFDLjc5MiAwIDAgLjc3NCAwIDEuNzI5djIwLjU0MkMwIDIzLjIyNy43OTIgMjQgMS43NzEgMjRoMjAuNDUxQzIzLjIgMjQgMjQgMjMuMjI3IDI0IDIyLjI3MVYxLjcyOUMyNCAuNzc0IDIzLjIgMCAyMi4yMjIgMGguMDAzeiIvPjwvc3ZnPgo=)](https://www.linkedin.com/in/HariSekhon/)\n[![GitHub Last Commit](https://img.shields.io/github/last-commit/HariSekhon/DevOps-Python-tools?logo=github)](https://github.com/HariSekhon/DevOps-Python-tools/commits/master)\n\u003c!-- doesn't include /tests?/ or comments\n[![Lines of 
Code](https://sonarcloud.io/api/project_badges/measure?project=HariSekhon_DevOps-Python-tools\u0026metric=ncloc)](https://sonarcloud.io/dashboard?id=HariSekhon_DevOps-Python-tools)\n--\u003e\n\n\u003c!-- site broken\n[![PyUp](https://pyup.io/repos/github/HariSekhon/DevOps-Python-tools/shield.svg)](https://pyup.io/account/repos/github/HariSekhon/DevOps-Python-tools/)\n[![Python 3](https://pyup.io/repos/github/HariSekhon/DevOps-Python-tools/python-3-shield.svg)](https://pyup.io/repos/github/HariSekhon/DevOps-Python-tools/)\n--\u003e\n\n[![Codacy](https://app.codacy.com/project/badge/Grade/40a82d53f3394f4b99aa6eccb08e3c8d)](https://www.codacy.com/gh/HariSekhon/DevOps-Python-tools/dashboard)\n[![CodeFactor](https://www.codefactor.io/repository/github/harisekhon/DevOps-Python-tools/badge)](https://www.codefactor.io/repository/github/harisekhon/DevOps-Python-tools)\n[![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=HariSekhon_DevOps-Python-tools\u0026metric=alert_status)](https://sonarcloud.io/dashboard?id=HariSekhon_DevOps-Python-tools)\n[![Maintainability Rating](https://sonarcloud.io/api/project_badges/measure?project=HariSekhon_DevOps-Python-tools\u0026metric=sqale_rating)](https://sonarcloud.io/dashboard?id=HariSekhon_DevOps-Python-tools)\n[![Reliability Rating](https://sonarcloud.io/api/project_badges/measure?project=HariSekhon_DevOps-Python-tools\u0026metric=reliability_rating)](https://sonarcloud.io/dashboard?id=HariSekhon_DevOps-Python-tools)\n[![Security 
Rating](https://sonarcloud.io/api/project_badges/measure?project=HariSekhon_DevOps-Python-tools\u0026metric=security_rating)](https://sonarcloud.io/dashboard?id=HariSekhon_DevOps-Python-tools)\n[![Vulnerabilities](https://sonarcloud.io/api/project_badges/measure?project=HariSekhon_DevOps-Python-tools\u0026metric=vulnerabilities)](https://sonarcloud.io/summary/new_code?id=HariSekhon_DevOps-Python-tools)\n\n[![Linux](https://img.shields.io/badge/OS-Linux-blue?logo=linux)](https://github.com/HariSekhon/DevOps-Python-tools)\n[![Mac](https://img.shields.io/badge/OS-Mac-blue?logo=apple)](https://github.com/HariSekhon/DevOps-Python-tools)\n[![Docker](https://img.shields.io/badge/container-Docker-blue?logo=docker\u0026logoColor=white)](https://hub.docker.com/r/harisekhon/github/)\n[![Dockerfile](https://img.shields.io/badge/repo-Dockerfiles-blue?logo=docker\u0026logoColor=white)](https://github.com/HariSekhon/Dockerfiles)\n[![DockerHub Pulls](https://img.shields.io/docker/pulls/harisekhon/pytools?label=DockerHub%20pulls\u0026logo=docker\u0026logoColor=white)](https://hub.docker.com/r/harisekhon/pytools)\n[![DockerHub Build Automated](https://img.shields.io/docker/automated/harisekhon/pytools?logo=docker\u0026logoColor=white)](https://hub.docker.com/r/harisekhon/pytools/)\n[![StarTrack](https://img.shields.io/badge/Star-Track-blue?logo=github)](https://seladb.github.io/StarTrack-js/#/preload?r=HariSekhon,Nagios-Plugins\u0026r=HariSekhon,Dockerfiles\u0026r=HariSekhon,DevOps-Python-tools\u0026r=HariSekhon,DevOps-Perl-tools\u0026r=HariSekhon,DevOps-Bash-tools\u0026r=HariSekhon,HAProxy-configs\u0026r=HariSekhon,SQL-scripts)\n[![StarCharts](https://img.shields.io/badge/Star-Charts-blue?logo=github)](https://github.com/HariSekhon/DevOps-Bash-tools/blob/master/STARCHARTS.md)\n\u003c!-- these badges don't work any more\n[![Docker Build 
Status](https://img.shields.io/docker/build/harisekhon/pytools?logo=docker\u0026logoColor=white)](https://hub.docker.com/r/harisekhon/pytools/builds)\n[![MicroBadger](https://images.microbadger.com/badges/image/harisekhon/pytools.svg)](http://microbadger.com/#/images/harisekhon/pytools)\n--\u003e\n\n[![CI Builds Overview](https://img.shields.io/badge/CI%20Builds-Overview%20Page-blue?logo=circleci)](https://harisekhon.github.io/CI-CD/)\n[![Jenkins](https://img.shields.io/badge/Jenkins-ready-blue?logo=jenkins\u0026logoColor=white)](https://github.com/HariSekhon/DevOps-Python-tools/blob/master/Jenkinsfile)\n[![Concourse](https://img.shields.io/badge/Concourse-ready-blue?logo=concourse)](https://github.com/HariSekhon/DevOps-Python-tools/blob/master/cicd/.concourse.yml)\n[![GoCD](https://img.shields.io/badge/GoCD-ready-blue?logo=go)](https://github.com/HariSekhon/DevOps-Python-tools/blob/master/cicd/.gocd.yml)\n[![TeamCity](https://img.shields.io/badge/TeamCity-ready-blue?logo=teamcity)](https://github.com/HariSekhon/TeamCity-CI)\n\n[![CircleCI](https://circleci.com/gh/HariSekhon/DevOps-Python-tools.svg?style=svg)](https://circleci.com/gh/HariSekhon/DevOps-Python-tools)\n[![BuildKite](https://img.shields.io/buildkite/8377537d0d9dddf4bf32826a6bf1c4e9ab88bc265007e1882c/master?label=BuildKite\u0026logo=buildkite)](https://buildkite.com/hari-sekhon/devops-python-tools)\n[![AppVeyor](https://img.shields.io/appveyor/build/harisekhon/DevOps-Python-tools/master?logo=appveyor\u0026label=AppVeyor)](https://ci.appveyor.com/project/HariSekhon/DevOps-Python-tools/branch/master)\n[![Drone](https://img.shields.io/drone/build/HariSekhon/DevOps-Python-tools/master?logo=drone\u0026label=Drone)](https://cloud.drone.io/HariSekhon/DevOps-Python-tools)\n[![Codefresh](https://g.codefresh.io/api/badges/pipeline/harisekhon/GitHub%2FDevOps-Python-tools?branch=master\u0026key=eyJhbGciOiJIUzI1NiJ9.NWU1MmM5OGNiM2FiOWUzM2Y3ZDZmYjM3.O69674cW7vYom3v5JOGKXDbYgCVIJU9EWhXUMHl3zwA\u0026type=cf-1)](https://
g.codefresh.io/pipelines/edit/new/builds?id=5e58e2e6353f5d1ada385bf2\u0026pipeline=DevOps-Python-tools\u0026projects=GitHub\u0026projectId=5e52ca8ea284e00f882ea992\u0026context=github\u0026filter=page:1;pageSize:10;timeFrameStart:week)\n[![Cirrus CI](https://img.shields.io/cirrus/github/HariSekhon/DevOps-Python-tools/master?logo=Cirrus%20CI\u0026label=Cirrus%20CI)](https://cirrus-ci.com/github/HariSekhon/DevOps-Python-tools)\n[![Semaphore](https://harisekhon.semaphoreci.com/badges/DevOps-Python-tools.svg)](https://harisekhon.semaphoreci.com/projects/DevOps-Python-tools)\n[![Buddy](https://img.shields.io/badge/Buddy-ready-1A86FD?logo=buddy)](https://github.com/HariSekhon/DevOps-Python-tools/blob/master/buddy.yml)\n[![Shippable](https://img.shields.io/badge/Shippable-legacy-lightgrey?logo=jfrog\u0026label=Shippable)](https://github.com/HariSekhon/DevOps-Python-tools/blob/master/shippable.yml)\n[![Travis CI](https://img.shields.io/badge/TravisCI-ready-blue?logo=travis\u0026label=Travis%20CI)](https://github.com/HariSekhon/DevOps-Python-tools/blob/master/.travis.yml)\n\n[![Azure DevOps Pipeline](https://dev.azure.com/harisekhon/GitHub/_apis/build/status/HariSekhon.DevOps-Python-tools?branchName=master)](https://dev.azure.com/harisekhon/GitHub/_build/latest?definitionId=8\u0026branchName=master)\n[![GitLab Pipeline](https://img.shields.io/badge/GitLab%20CI-legacy-lightgrey?logo=gitlab)](https://gitlab.com/HariSekhon/DevOps-Python-tools/pipelines)\n[![BitBucket Pipeline](https://img.shields.io/badge/Bitbucket%20CI-legacy-lightgrey?logo=bitbucket)](https://bitbucket.org/harisekhon/devops-python-tools/addon/pipelines/home#!/)\n[![AWS CodeBuild](https://img.shields.io/badge/AWS%20CodeBuild-ready-blue?logo=amazon%20aws)](https://github.com/HariSekhon/DevOps-Python-tools/blob/master/cicd/buildspec.yml)\n[![GCP Cloud 
Build](https://img.shields.io/badge/GCP%20Cloud%20Build-ready-blue?logo=google%20cloud\u0026logoColor=white)](https://github.com/HariSekhon/DevOps-Python-tools/blob/master/cicd/cloudbuild.yaml)\n\n[![Repo on GitHub](https://img.shields.io/badge/repo-GitHub-2088FF?logo=github)](https://github.com/HariSekhon/DevOps-Python-tools)\n[![Repo on GitLab](https://img.shields.io/badge/repo-GitLab-FCA121?logo=gitlab)](https://gitlab.com/HariSekhon/DevOps-Python-tools)\n[![Repo on Azure DevOps](https://img.shields.io/badge/repo-Azure%20DevOps-0078D7?logo=azure%20devops)](https://dev.azure.com/harisekhon/GitHub/_git/DevOps-Python-tools)\n[![Repo on BitBucket](https://img.shields.io/badge/repo-BitBucket-0052CC?logo=bitbucket)](https://bitbucket.org/HariSekhon/DevOps-Python-tools)\n\n[![ShellCheck](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/shellcheck.yaml/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/shellcheck.yaml)\n[![JSON](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/json.yaml/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/json.yaml)\n[![YAML](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/yaml.yaml/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/yaml.yaml)\n[![XML](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/xml.yaml/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/xml.yaml)\n[![Markdown](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/markdown.yaml/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/markdown.yaml)\n[![Validation](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/validate.yaml/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/validate.yaml)\n[![Kics](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/kics.yaml/badge.svg)](https:
//github.com/HariSekhon/DevOps-Python-tools/actions/workflows/kics.yaml)\n[![Grype](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/grype.yaml/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/grype.yaml)\n[![Semgrep](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/semgrep.yaml/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/semgrep.yaml)\n[![Semgrep Cloud](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/semgrep-cloud.yaml/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/semgrep-cloud.yaml)\n[![Trivy](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/trivy.yaml/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/trivy.yaml)\n\n[![Docker Build (Alpine)](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/docker_pytools_alpine.yaml/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/docker_pytools_alpine.yaml)\n[![Docker Build (Debian)](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/docker_pytools_debian.yaml/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/docker_pytools_debian.yaml)\n[![Docker Build (Fedora)](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/docker_pytools_fedora.yaml/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/docker_pytools_fedora.yaml)\n[![Docker Build (Ubuntu)](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/docker_pytools_ubuntu.yaml/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/docker_pytools_ubuntu.yaml)\n\n[![GitHub Actions 
Ubuntu](https://github.com/HariSekhon/DevOps-Python-tools/workflows/GitHub%20Actions%20Ubuntu/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions?query=workflow%3A%22GitHub+Actions+Ubuntu%22)\n[![Mac](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/mac.yaml/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/mac.yaml)\n[![Mac 11](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/mac_11.yaml/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/mac_11.yaml)\n[![Mac 12](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/mac_12.yaml/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions/workflows/mac_12.yaml)\n[![Ubuntu](https://github.com/HariSekhon/DevOps-Python-tools/workflows/Ubuntu/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions?query=workflow%3A%22Ubuntu%22)\n[![Ubuntu 20.04](https://github.com/HariSekhon/DevOps-Python-tools/workflows/Ubuntu%2020.04/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions?query=workflow%3A%22Ubuntu+20.04%22)\n[![Ubuntu 22.04](https://github.com/HariSekhon/DevOps-Python-tools/workflows/Ubuntu%2022.04/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions?query=workflow%3A%22Ubuntu+22.04%22)\n[![Debian](https://github.com/HariSekhon/DevOps-Python-tools/workflows/Debian/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions?query=workflow%3A%22Debian%22)\n[![Debian 10](https://github.com/HariSekhon/DevOps-Python-tools/workflows/Debian%2010/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions?query=workflow%3A%22Debian+10%22)\n[![Debian 11](https://github.com/HariSekhon/DevOps-Python-tools/workflows/Debian%2011/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions?query=workflow%3A%22Debian+11%22)\n[![Debian 
12](https://github.com/HariSekhon/DevOps-Python-tools/workflows/Debian%2012/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions?query=workflow%3A%22Debian+12%22)\n[![Fedora](https://github.com/HariSekhon/DevOps-Python-tools/workflows/Fedora/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions?query=workflow%3A%22Fedora%22)\n[![Alpine](https://github.com/HariSekhon/DevOps-Python-tools/workflows/Alpine/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions?query=workflow%3A%22Alpine%22)\n[![Alpine 3](https://github.com/HariSekhon/DevOps-Python-tools/workflows/Alpine%203/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions?query=workflow%3A%22Alpine+3%22)\n\n[![Python versions](https://img.shields.io/badge/Python-2.7+-3776AB?logo=python\u0026logoColor=white)](https://github.com/HariSekhon/DevOps-Python-tools)\n[![Python 3.7](https://github.com/HariSekhon/DevOps-Python-tools/workflows/Python%203.7/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions?query=workflow%3A%22Python+3.7%22)\n[![Python 3.8](https://github.com/HariSekhon/DevOps-Python-tools/workflows/Python%203.8/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions?query=workflow%3A%22Python+3.8%22)\n[![Python 3.9](https://github.com/HariSekhon/DevOps-Python-tools/workflows/Python%203.9/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions?query=workflow%3A%22Python+3.9%22)\n[![Python 3.10](https://github.com/HariSekhon/DevOps-Python-tools/workflows/Python%203.10/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions?query=workflow%3A%22Python+3.10%22)\n[![Python 3.11](https://github.com/HariSekhon/DevOps-Python-tools/workflows/Python%203.11/badge.svg)](https://github.com/HariSekhon/DevOps-Python-tools/actions?query=workflow%3A%22Python+3.11%22)\n\n[git.io/pytools](https://git.io/pytools)\n\n## AWS, Docker, Spark, Hadoop, HBase, Hive, Impala, Python \u0026 Linux Tools\n\nDevOps, 
Cloud, Big Data, NoSQL, Python \u0026 Linux tools. All programs have `--help`.\n\nHari Sekhon\n\nCloud \u0026 Big Data Contractor, United Kingdom\n\n[![My LinkedIn](https://img.shields.io/badge/LinkedIn%20Profile-HariSekhon-blue?logo=data:image/svg%2bxml;base64,PHN2ZyByb2xlPSJpbWciIGZpbGw9IiNmZmZmZmYiIHZpZXdCb3g9IjAgMCAyNCAyNCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj48dGl0bGU+TGlua2VkSW48L3RpdGxlPjxwYXRoIGQ9Ik0yMC40NDcgMjAuNDUyaC0zLjU1NHYtNS41NjljMC0xLjMyOC0uMDI3LTMuMDM3LTEuODUyLTMuMDM3LTEuODUzIDAtMi4xMzYgMS40NDUtMi4xMzYgMi45Mzl2NS42NjdIOS4zNTFWOWgzLjQxNHYxLjU2MWguMDQ2Yy40NzctLjkgMS42MzctMS44NSAzLjM3LTEuODUgMy42MDEgMCA0LjI2NyAyLjM3IDQuMjY3IDUuNDU1djYuMjg2ek01LjMzNyA3LjQzM2MtMS4xNDQgMC0yLjA2My0uOTI2LTIuMDYzLTIuMDY1IDAtMS4xMzguOTItMi4wNjMgMi4wNjMtMi4wNjMgMS4xNCAwIDIuMDY0LjkyNSAyLjA2NCAyLjA2MyAwIDEuMTM5LS45MjUgMi4wNjUtMi4wNjQgMi4wNjV6bTEuNzgyIDEzLjAxOUgzLjU1NVY5aDMuNTY0djExLjQ1MnpNMjIuMjI1IDBIMS43NzFDLjc5MiAwIDAgLjc3NCAwIDEuNzI5djIwLjU0MkMwIDIzLjIyNy43OTIgMjQgMS43NzEgMjRoMjAuNDUxQzIzLjIgMjQgMjQgMjMuMjI3IDI0IDIyLjI3MVYxLjcyOUMyNCAuNzc0IDIzLjIgMCAyMi4yMjIgMGguMDAzeiIvPjwvc3ZnPgo=)](https://www.linkedin.com/in/HariSekhon/)\n\u003cbr\u003e*(you're welcome to connect with me on LinkedIn)*\n\n**Make sure you run `make update` if updating and not just `git pull` as you will often need the latest library submodule and possibly new upstream libraries**\n\n## Quick Start\n\n### Ready to run Docker image\n\nAll programs and their pre-compiled dependencies can be found ready to run on [DockerHub](https://hub.docker.com/r/harisekhon/pytools/).\n\nList all programs:\n\n```shell\ndocker run harisekhon/pytools\n```\n\nRun any given program:\n\n```shell\ndocker run harisekhon/pytools \u003cprogram\u003e \u003cargs\u003e\n```\n\n### Automated Build from source\n\ninstalls git, make, pulls the repo and build the dependencies:\n\n```shell\ncurl -L https://git.io/python-bootstrap | sh\n```\n\nor manually:\n\n```shell\ngit clone https://github.com/HariSekhon/DevOps-Python-tools 
pytools\ncd pytools\nmake\n```\n\nTo only install pip dependencies for a single script, you can just type make and the filename with a `.pyc` extension\ninstead of `.py`:\n\n```shell\nmake anonymize.pyc\n```\n\nMake sure to read [Detailed Build Instructions](https://github.com/HariSekhon/DevOps-Python-tools#detailed-build-instructions) further down for more information.\n\nSome Hadoop tools will require Jython, see [Jython for Hadoop Utils](https://github.com/HariSekhon/DevOps-Python-tools#jython-for-hadoop-utils) for details.\n\n### Usage\n\nAll programs come with a `--help` switch which includes a program description and the list of command line options.\n\nEnvironment variables are supported for convenience and also to hide credentials from being exposed in the process list\neg. `$PASSWORD`, `$TRAVIS_TOKEN`. These are indicated in the `--help` descriptions in brackets next to each option and\noften have more specific overrides with higher precedence eg. `$AMBARI_HOST`, `$HBASE_HOST` take priority over `$HOST`.\n\n### DevOps Python Tools - Inventory\n\n- Linux:\n  - `anonymize.py` - anonymizes your configs / logs from files or stdin (for pasting to Apache Jira tickets or mailing\n    lists)\n    - anonymizations include these and more:\n      - hostnames / domains / FQDNs\n      - email addresses\n      - IP + MAC addresses\n      - AWS Access Keys, Secret Keys, ARNs, STS tokens\n      - Kerberos principals\n      - LDAP sensitive fields (eg. CN, DN, OU, UID, sAMAccountName, member, memberOf...)\n      - Cisco \u0026 Juniper ScreenOS configuration passwords, shared keys and SNMP strings\n    - `anonymize_custom.conf` - put regex of your Name/Company/Project/Database/Tables to anonymize to `\u003ccustom\u003e`\n    - placeholder tokens indicate what was stripped out (eg. 
`\u003cfqdn\u003e`, `\u003cpassword\u003e`, `\u003ccustom\u003e`)\n    - `--ip-prefix` leaves the last IP octet intact to aid cluster debugging, so you can still see differentiated\n      nodes communicating with each other to compare configs and log communications\n    - `--hash-hostnames` - hashes hostnames to look like Docker temporary container ID hostnames so that vendor support\n      teams can differentiate hosts in clusters\n    - `anonymize_parallel.sh` - splits files into multiple parts and runs `anonymize.py` on each part in parallel\n      before re-joining back into a file of the same name with a `.anonymized` suffix. Preserves the order of\n      evaluation, which is important for the anonymization rules, as well as file content order. On servers this\n      parallelization can result in a 30x speed-up for large log files\n  - `find_duplicate_files.py` - finds duplicate files in one or more directory trees via multiple methods including file\n    basename, size, MD5 comparison of same-sized files, or bespoke regex capture of partial file basename\n  - `find_active_server.py` - finds the fastest-responding healthy server or active master in high availability\n    deployments, useful for scripting against clustered technologies (eg. Elasticsearch, Hadoop, HBase, Cassandra etc).\n    Multi-threaded for speed and highly configurable - socket, http, https, ping, url and/or regex content match. See\n    further down for more details and sub-programs that simplify usage for many of the most common cluster technologies\n  - `welcome.py` - cool spinning welcome message greeting your username and showing last login time and user to put in\n    your shell's `.profile` (there is also a perl version in my [DevOps Perl Tools](https://github.com/harisekhon/perl-tools) repo)\n- [Amazon Web Services](https://aws.amazon.com/):\n  - `aws_users_access_key_age.py` - lists all users' access keys, status, date of creation and age in days. 
Optionally\n    filters for active keys and keys older than N days (for key rotation governance)\n  - `aws_users_unused_access_keys.py` - lists users' access keys that haven't been used in the last N days or that have\n    never been used (these should generally be removed/disabled). Optionally filters for only active keys\n  - `aws_users_last_used.py` - lists all users and their days since last use across both passwords and access keys.\n    Optionally filters for users not used in the last N days to find old accounts to remove\n  - `aws_users_pw_last_used.py` - lists all users and dates since their passwords were last used. Optionally filters for\n    users with passwords not used in the last N days\n- [Google Cloud Platform](https://cloud.google.com/):\n  - [GCF](https://cloud.google.com/functions) - Google Cloud Functions written in Python:\n    - [gcp_cloud_function_sql_export/](https://github.com/HariSekhon/DevOps-Python-tools/tree/master/gcp_cloud_function_sql_export) - runs [Cloud SQL](https://cloud.google.com/sql) export backups to\n      [GCS](https://cloud.google.com/storage), subscribing to a [PubSub](https://cloud.google.com/pubsub) topic that is\n      triggered by [Cloud Scheduler](https://cloud.google.com/scheduler)\n      - see the [DevOps Bash tools](https://github.com/HariSekhon/DevOps-Bash-tools/) repo for several related GCP SQL scripts to set up service account permissions and\n        [Cloud Scheduler](https://cloud.google.com/scheduler) jobs\n    - [gcp_cloud_function_ifconfig/](https://github.com/HariSekhon/DevOps-Python-tools/tree/master/gcp_cloud_function_ifconfig) - debug your cloud function public networking by determining its public IP\n      address - use this to test your VPC connector public routing, comparison with firewall rules etc.\n    - [gcp_cloud_function_proxy/](https://github.com/HariSekhon/DevOps-Python-tools/tree/master/gcp_cloud_function_proxy) - debug your cloud function networking by querying a given URL to check its\n      
accessibility, returning the HTTP status code and content. Use this to validate access through firewall rules via\n      VPC connector routing\n  - `gcp_service_account_credential_keys.py` - lists all GCP service account credential keys for a given project with\n    their age and expiry details, optionally filtering for non-expiring keys, already expired keys, or keys expiring\n    within N days\n- [Docker](https://www.docker.com/):\n  - `docker_registry_show_tags.py` / `dockerhub_show_tags.py` / `quay_show_tags.py` - shows tags for docker repos in a\n    docker registry or on [DockerHub](https://hub.docker.com/u/harisekhon/) or [Quay.io](https://quay.io/) - Docker CLI doesn't support this yet but it's a very\n    useful thing to be able to see live on the command line or use in shell scripts (use `-q`/`--quiet` to return only\n    the tags for easy shell scripting). You can use this to pre-download all tags of a docker image before running tests\n    across versions in a simple bash for loop, eg. `docker_pull_all_tags.sh`\n  - `dockerhub_search.py` - search DockerHub with a configurable number of returned results (older official\n    `docker search` was limited to only 25 results), using `--verbose` will also show you how many results were returned\n    to the terminal and how many DockerHub has in total (use `-q / --quiet` to return only the image names for easy\n    shell scripting). This can be used to download all of my DockerHub images in a simple bash for loop eg.\n    `docker_pull_all_images.sh` and can be chained with `dockerhub_show_tags.py` to download all tagged versions for all\n    docker images eg. 
`docker_pull_all_images_all_tags.sh`\n  - `dockerfiles_check_git*.py` - checks Git tags \u0026 branches align with the containing Dockerfile's `ARG *_VERSION`\n- [Spark](https://spark.apache.org/) \u0026 Data Format Converters:\n  - `spark_avro_to_parquet.py` - PySpark Avro =\u003e Parquet converter\n  - `spark_parquet_to_avro.py` - PySpark Parquet =\u003e Avro converter\n  - `spark_csv_to_avro.py` - PySpark CSV =\u003e Avro converter, supports both inferred and explicit schemas\n  - `spark_csv_to_parquet.py` - PySpark CSV =\u003e Parquet converter, supports both inferred and explicit schemas\n  - `spark_json_to_avro.py` - PySpark JSON =\u003e Avro converter\n  - `spark_json_to_parquet.py` - PySpark JSON =\u003e Parquet converter\n  - `xml_to_json.py` - XML to JSON converter\n  - `json_to_xml.py` - JSON to XML converter\n  - `json_to_yaml.py` - JSON to YAML converter\n  - `json_docs_to_bulk_multiline.py` - converts json files to bulk multi-record one-line-per-json-document format for\n    pre-processing and loading to big data systems like [Hadoop](http://hadoop.apache.org/) and\n    [MongoDB](https://www.mongodb.com/), can recurse directory trees, and mix json-doc-per-file / bulk-multiline-json /\n    directories / standard input, combines all json documents and outputs bulk-one-json-document-per-line to standard\n    output for convenient command line chaining and redirection, optionally continues on error, collects broken records\n    to standard error for logging and later reprocessing for bulk batch jobs, and even supports single-quoted json\n    (not technically valid json but used by MongoDB), including embedded double quotes in 'single quoted json'\n  - `yaml_to_json.py` - YAML to JSON converter (because some APIs like GitLab CI Validation API require JSON)\n  - see also `validate_*.py` further down for all these formats and more\n- [Hadoop](http://hadoop.apache.org/) ecosystem \u0026 NoSQL:\n  - [Ambari](https://hortonworks.com/apache/ambari/):\n    - 
`ambari_blueprints.py` - Blueprint cluster templating and deployment tool using the Ambari API\n      - list blueprints\n      - fetch all blueprints or a specific blueprint to local json files\n      - blueprint an existing cluster\n      - create a new cluster using a blueprint\n      - sorts and prettifies the resulting JSON template for deterministic config and line-by-line diff necessary for\n        proper revision control\n      - optionally strips out the excessive and overly specific configs to create generic, more reusable templates\n      - see the `ambari_blueprints/` directory for a variety of Ambari blueprint templates generated by and deployable\n        using this tool\n    - `ambari_ams_*.sh` - query the Ambari Metrics Collector API for a given metric, list all metrics or hosts\n    - `ambari_cancel_all_requests.sh` - cancel all ongoing operations using the Ambari API\n    - `ambari_trigger_service_checks.py` - trigger service checks using the Ambari API\n  - [Hadoop](http://hadoop.apache.org/) HDFS:\n    - `hdfs_find_replication_factor_1.py` - finds HDFS files with replication factor 1, optionally resetting them to\n      replication factor 3 to avoid missing block alerts during datanode maintenance windows\n    - `hdfs_time_block_reads.jy` - HDFS per-block read timing debugger with datanode and rack locations for a given file\n      or directory tree. Reports the slowest Hadoop datanodes in descending order at the end. Helps find cluster data\n      layer bottlenecks such as slow datanodes, faulty hardware or misconfigured top-of-rack switch ports.\n    - `hdfs_files_native_checksums.jy` - fetches native HDFS checksums for quicker file comparisons (about 100x faster\n      than doing `hdfs dfs -cat | md5sum`)\n    - `hdfs_files_stats.jy` - fetches HDFS file stats. 
Useful to generate a list of all files in a directory tree\n      showing block size, replication factor, underfilled blocks and small files\n  - [Hive](https://hive.apache.org/) / [Impala](https://impala.apache.org/):\n    - `hive_schemas_csv.py` / `impala_schemas_csv.py` - dumps all databases, tables, columns and types out in CSV format\n      to standard output\n\n    The following programs can all optionally filter by database / table name regex:\n\n    - `hive_foreach_table.py` / `impala_foreach_table.py` - execute any query or statement against every Hive / Impala\n      table\n    - `hive_tables_row_counts.py` / `impala_tables_row_counts.py` - outputs table row counts. Useful for reconciliation\n      between cluster migrations\n    - `hive_tables_column_counts.py` / `impala_tables_column_counts.py` - outputs table column counts. Useful for\n      finding unusually wide tables\n    - `hive_tables_row_column_counts.py` / `impala_tables_row_column_counts.py` - outputs table row and column counts.\n      Useful for finding unusually big tables\n    - `hive_tables_row_counts_any_nulls.py` / `impala_tables_row_counts_any_nulls.py` - outputs table row counts where\n      any field is NULL. Useful for reconciliation between cluster migrations or catching data quality problems or\n      subtle ETL bugs\n    - `hive_tables_null_columns.py` / `impala_tables_null_columns.py` - outputs table columns containing only NULLs.\n      Useful for catching data quality problems or subtle ETL bugs\n    - `hive_tables_null_rows.py` / `impala_tables_null_rows.py` - outputs table row counts where all fields contain\n      NULLs. 
Useful for catching data quality problems or subtle ETL bugs\n    - `hive_tables_metadata.py` / `impala_tables_metadata.py` - outputs for each table the matching regex metadata DDL\n      property from describe table\n    - `hive_tables_locations.py` / `impala_tables_locations.py` - outputs for each table its data location\n  - [HBase](https://hbase.apache.org/):\n    - `hbase_generate_data.py` - inserts randomly generated data into a given [HBase](https://hbase.apache.org/) table,\n      with optional skew support and configurable skew percentage. Useful for testing region splitting, balancing, CI\n      tests etc. Outputs stats for number of rows written, time taken, rows per sec and volume per sec written.\n    - `hbase_show_table_region_ranges.py` - dumps HBase table region ranges information, useful when pre-splitting\n      tables\n    - `hbase_table_region_row_distribution.py` - calculates the distribution of rows across regions in an HBase table,\n      giving per region row counts and % of total rows for the table as well as median and quartile row counts per\n      region\n    - `hbase_table_row_key_distribution.py` - calculates the distribution of row keys by configurable prefix length in\n      an HBase table, giving per prefix row counts and % of total rows for the table as well as median and quartile row\n      counts per prefix\n    - `hbase_compact_tables.py` - compacts HBase tables (for off-peak compactions). Defaults to finding and iterating\n      on all tables or takes an optional regex and compacts only matching tables.\n    - `hbase_flush_tables.py` - flushes HBase tables. 
Defaults to finding and iterating on all tables or takes an\n      optional regex and flushes only matching tables.\n    - `hbase_regions_by_*size.py` - queries the given RegionServers' JMX to list the topN regions by storeFileSize or\n      memStoreSize, ascending or descending\n    - `hbase_region_requests.py` - calculates requests per second per region across all given RegionServers or average\n      since RegionServer startup, configurable intervals and count, can filter to any combination of reads / writes /\n      total requests per second. Useful for watching more granular region stats to detect region hotspotting\n    - `hbase_regionserver_requests.py` - calculates requests per second per RegionServer across all given RegionServers\n      or average since RegionServer(s) startup(s), configurable interval and count, can filter to any combination of\n      read, write, total, rpcScan, rpcMutate, rpcMulti, rpcGet, blocked per second. Useful for watching more granular\n      RegionServer stats to detect RegionServer hotspotting\n    - `hbase_regions_least_used.py` - finds topN biggest/smallest regions across given RegionServers that have received\n      the least requests (requests below a given threshold)\n  - [OpenTSDB](http://opentsdb.net/):\n    - `opentsdb_import_metric_distribution.py` - calculates metric distribution in bulk import file(s) to find data skew\n      and help avoid HBase region hotspotting\n    - `opentsdb_list_metrics*.sh` - lists OpenTSDB metric names, tagk or tagv via the OpenTSDB API or directly from\n      HBase tables, optionally with their created date, sorted ascending\n  - [Pig](https://pig.apache.org/):\n    - `pig-text-to-elasticsearch.pig` - bulk index unstructured files in [Hadoop](http://hadoop.apache.org/) to\n      [Elasticsearch](https://www.elastic.co/products/elasticsearch)\n    - `pig-text-to-solr.pig` - bulk index unstructured files in [Hadoop](http://hadoop.apache.org/) to\n      [Solr](http://lucene.apache.org/solr/) / [SolrCloud 
clusters](https://wiki.apache.org/solr/SolrCloud)\n    - `pig_udfs.jy` - Pig Jython UDFs for Hadoop\n- `find_active_server.py` - returns first available healthy server or active master in high availability deployments,\n  useful for chaining with single argument tools. Configurable tests include socket, http, https, ping, url and/or regex\n  content match, multi-threaded for speed. Designed to extend tools that only accept a single `--host` option but for\n  which the technology has later added multi-master support or active-standby masters (eg. Hadoop, HBase) or where you\n  want to query cluster wide information available from any online peer (eg. Elasticsearch)\n  - The following are simplified specialisations of the above program, just pass host arguments, all the details have\n    been baked in, no switches required\n    - `find_active_hadoop_namenode.py` - returns active [Hadoop](http://hadoop.apache.org/) Namenode in HDFS HA\n    - `find_active_hadoop_resource_manager.py` - returns active [Hadoop](http://hadoop.apache.org/) Resource Manager in Yarn HA\n    - `find_active_hbase_master.py` - returns active [HBase](https://hbase.apache.org/) Master in HBase HA\n    - `find_active_hbase_thrift.py` - returns first available [HBase](https://hbase.apache.org/) Thrift Server (run\n      multiple of these for load balancing)\n    - `find_active_hbase_stargate.py` - returns first available [HBase](https://hbase.apache.org/) Stargate rest server\n      (run multiple of these for load balancing)\n    - `find_active_apache_drill.py` - returns first available [Apache Drill](https://drill.apache.org/) node\n    - `find_active_cassandra.py` - returns first available [Apache Cassandra](https://cassandra.apache.org/) node\n    - `find_active_impala*.py` - returns first available [Impala](https://impala.apache.org/) node of either Impalad,\n      Catalog or Statestore\n    - `find_active_presto_coordinator.py` - returns first available [Presto](https://prestodb.io/) 
Coordinator\n    - `find_active_kubernetes_api.py` - returns first available [Kubernetes](https://kubernetes.io/) API server\n    - `find_active_oozie.py` - returns first active [Oozie](http://oozie.apache.org/) server\n    - `find_active_solrcloud.py` - returns first available [Solr](http://lucene.apache.org/solr/) / [SolrCloud](https://wiki.apache.org/solr/SolrCloud) node\n    - `find_active_elasticsearch.py` - returns first available [Elasticsearch](https://www.elastic.co/products/elasticsearch) node\n    - see also: [Advanced HAProxy configurations](https://github.com/HariSekhon/HAProxy-configs) which are part of the\n      [Advanced Nagios Plugins Collection](https://github.com/HariSekhon/Nagios-Plugins)\n- [Travis CI](https://travis-ci.org/):\n  - `travis_last_log.py` - fetches [Travis CI](https://travis-ci.org/) latest running / completed / failed build log for a given repo -\n    useful for quickly getting the log of the last failed build when CCMenu or BuildNotify applets turn red\n  - `travis_debug_session.py` - launches a [Travis CI](https://travis-ci.org/) interactive debug build session via the Travis API, tracks\n    session creation and drops the user straight into the SSH shell on the remote Travis build - a very convenient\n    one-shot debug launcher for Travis CI\n- `selenium_hub_browser_test.py` - checks [Selenium Grid Hub / Selenoid](https://www.selenium.dev/documentation/en/grid/) is working by calling browsers such as\n  Chrome and Firefox to fetch a given URL and content/regex match the result\n- Data Validation (useful in CI):\n  - `validate_*.py` - validate files, directory trees and/or standard input streams\n    - supports the following file formats:\n      - Avro\n      - CSV\n      - INI / Java Properties (also detects duplicate sections and duplicate keys within sections)\n      - JSON (both normal and json-doc-per-line bulk / big data format as found in MongoDB and Hadoop json data files)\n      - LDAP LDIF\n      - Parquet\n      - XML\n      - 
YAML\n    - directories are recursed, testing any files with relevant matching extensions (`.avro`, `.csv`, `.json`, `.parquet`,\n      `.ini`/`.properties`, `.ldif`, `.xml`, `.yml`/`.yaml`)\n    - used for Continuous Integration tests of various adjacent Spark data converters as well as configuration files for\n      things like Presto, Ambari, Apache Drill etc. found in my [DockerHub](https://hub.docker.com/u/harisekhon/) images'\n      [Dockerfiles master repo](https://github.com/HariSekhon/Dockerfiles), which contains docker builds and configurations for many open source Big Data \u0026\n      Linux technologies\n\n### Detailed Build Instructions\n\n#### Python VirtualEnv localized installs\n\nThe automated build will use 'sudo' to install required Python PyPI libraries to the system unless running as root or it\ndetects being inside a VirtualEnv. If you want to install some of the common Python libraries using your OS packages\ninstead of installing from PyPI then follow the Manual Build section below.\n\n### Manual Setup\n\nEnter the pytools directory and run `git submodule init` and `git submodule update` to fetch my library repo:\n\n```shell\ngit clone https://github.com/HariSekhon/DevOps-Python-tools pytools\ncd pytools\ngit submodule init\ngit submodule update\nsudo pip install -r requirements.txt\n```\n\n### Offline Setup\n\nDownload the DevOps Python Tools and Pylib git repos as zip files:\n\n\u003chttps://github.com/HariSekhon/DevOps-Python-tools/archive/master.zip\u003e\n\n\u003chttps://github.com/HariSekhon/pylib/archive/master.zip\u003e\n\nUnzip both and move Pylib to the `pylib` folder under DevOps Python Tools.\n\n```shell\nunzip devops-python-tools-master.zip\nunzip pylib-master.zip\n\nmv -v devops-python-tools-master pytools\nmv -v pylib-master pylib\nmv -vf pylib pytools/\n```\n\nProceed to install PyPI modules for whichever programs you want to use, using your usual procedure - usually an internal\nmirror or proxy server to PyPI, or rpms / debs (some 
libraries are packaged by Linux distributions).\n\nAll PyPI modules are listed in the `requirements.txt` and `pylib/requirements.txt` files.\n\nInternal Mirror example ([JFrog Artifactory](https://jfrog.com/artifactory/) or similar):\n\n```shell\nsudo pip install --index-url https://host.domain.com/api/pypi/repo/simple --trusted-host host.domain.com -r requirements.txt\n```\n\nProxy example:\n\n```shell\nsudo pip install --proxy hari:mypassword@proxy-host:8080 -r requirements.txt\n```\n\n#### Mac OS X\n\nThe automated build also works on Mac OS X but you'll need to install [Apple Xcode](https://developer.apple.com/download/) (on recent Macs just typing\n`git` is enough to trigger the Xcode install).\n\nI also recommend you get [HomeBrew](https://brew.sh/) to install other useful tools and libraries you may need, like OpenSSL for\ndevelopment headers and tools such as wget (these are installed automatically if Homebrew is detected on Mac OS X):\n\n```shell\nbash-tools/install/install_homebrew.sh\n```\n\n```shell\nbrew install openssl wget\n```\n\nIf failing to build an OpenSSL lib dependency, just prefix the build command like so:\n\n```shell\nsudo OPENSSL_INCLUDE=/usr/local/opt/openssl/include OPENSSL_LIB=/usr/local/opt/openssl/lib ...\n```\n\nYou may get errors trying to install to Python library paths even as root on newer versions of Mac; sometimes this is\ncaused by pip 10 vs pip 9, and downgrading will work around it:\n\n```shell\nsudo pip install --upgrade pip==9.0.1\nmake\nsudo pip install --upgrade pip\nmake\n```\n\n### Jython for Hadoop Utils\n\nThe 3 Hadoop utility programs listed below require Jython (as well as Hadoop being installed and correctly configured):\n\n```shell\nhdfs_time_block_reads.jy\nhdfs_files_native_checksums.jy\nhdfs_files_stats.jy\n```\n\nRun like so:\n\n```shell\njython -J-cp $(hadoop classpath) hdfs_time_block_reads.jy --help\n```\n\nThe `-J-cp $(hadoop classpath)` part dynamically inserts the current Hadoop java classpath required to use the 
Hadoop\nAPIs.\n\nSee below for the procedure to install Jython if you don't already have it.\n\n#### Automated Jython Install\n\nThis will download and install Jython to `/opt/jython-2.7.0`:\n\n```shell\nmake jython\n```\n\n#### Manual Jython Install\n\nJython is a simple download-and-unpack and can be fetched from \u003chttp://www.jython.org/downloads.html\u003e\n\nThen add the Jython install bin directory to the $PATH or specify the full path to the `jython` binary, e.g.:\n\n```shell\n/opt/jython-2.7.0/bin/jython hdfs_time_block_reads.jy ...\n```\n\n### Configuration for Strict Domain / FQDN validation\n\nStrict validation of hosts / domains / FQDNs uses TLDs populated from the official IANA list, and is done via my\n[PyLib](https://github.com/HariSekhon/pylib) library submodule - see there for details on configuring this to permit custom TLDs like `.local`,\n`.intranet`, `.vm`, `.cloud` etc. (all already included in there because they're common across companies' internal\nenvironments).\n\n### Python SSL certificate verification problems\n\nIf you end up with an error like:\n\n```shell\n./dockerhub_show_tags.py centos ubuntu\n[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:765)\n```\n\nIt can be caused by an issue with the underlying Python + libraries due to changes in OpenSSL and certificates. 
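Before changing any packages, it can help to confirm which OpenSSL build and default CA bundle your Python is actually using - a minimal diagnostic sketch using only the standard library:

```python
import ssl

# Show the OpenSSL version this Python was compiled against and the
# default locations it searches for trusted CA certificates
paths = ssl.get_default_verify_paths()
print("OpenSSL:", ssl.OPENSSL_VERSION)
print("cafile: ", paths.cafile)   # may be None if no default CA bundle was found
print("capath: ", paths.capath)
```

If the reported `cafile` is missing or points at a stale bundle, that is consistent with the certificate problem described above.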
One\nquick fix is to do the following:\n\n```shell\nsudo pip uninstall -y certifi \u0026\u0026\nsudo pip install certifi==2015.04.28\n```\n\n### Updating\n\nRun:\n\n```shell\nmake update\n```\n\nThis will `git pull` and then `git submodule update`, which is necessary to pick up corresponding library updates.\n\nIf you update often and want to just quickly `git pull` + submodule update but skip rebuilding all those dependencies each\ntime then run `make update-no-recompile` (will miss new library dependencies - do a full `make update` if you encounter\nissues).\n\n### Testing\n\n[Continuous Integration](https://travis-ci.org/HariSekhon/devops-python-tools) is run on this repo with tests for success and failure scenarios:\n\n- unit tests for the custom supporting [python library](https://github.com/HariSekhon/pylib)\n- integration tests of the top level programs using the libraries for things like option parsing\n- [functional tests](https://github.com/HariSekhon/DevOps-Python-tools/tree/master/tests) for the top level programs using local test data and [Docker containers](https://hub.docker.com/u/harisekhon/)\n\nTo trigger all tests run:\n\n```shell\nmake test\n```\n\nwhich will start with the underlying libraries, then move on to top level integration tests and functional tests using\ndocker containers if docker is available.\n\n### Contributions\n\nPatches, improvements and even general feedback are welcome in the form of GitHub pull requests and issue tickets.\n\nYou might also be interested in the following really nice Jupyter notebook for HDFS space analysis created by another\nHortonworks engineer, Jonas Straub:\n\n\u003chttps://github.com/mr-jstraub/HDFSQuota/blob/master/HDFSQuota.ipynb\u003e\n\n## Star History\n\n[![Star History 
Chart](https://api.star-history.com/svg?repos=HariSekhon/DevOps-Python-tools\u0026type=Date)](https://star-history.com/#HariSekhon/DevOps-Python-tools\u0026Date)\n\n[git.io/python-tools](https://git.io/python-tools)\n\n[git.io/pytools](https://git.io/pytools)\n\n## More Core Repos\n\n\u003c!-- OTHER_REPOS_START --\u003e\n\n### Knowledge\n\n[![Readme Card](https://github-readme-stats.vercel.app/api/pin/?username=HariSekhon\u0026repo=Knowledge-Base\u0026theme=ambient_gradient\u0026description_lines_count=3)](https://github.com/HariSekhon/Knowledge-Base)\n[![Readme Card](https://github-readme-stats.vercel.app/api/pin/?username=HariSekhon\u0026repo=Diagrams-as-Code\u0026theme=ambient_gradient\u0026description_lines_count=3)](https://github.com/HariSekhon/Diagrams-as-Code)\n\n\u003c!--\n\nNot supported in GitHub Markdown:\n\n\u003ciframe src=\"https://raw.githubusercontent.com/HariSekhon/HariSekhon/main/knowledge.md\" width=\"100%\" height=\"500px\"\u003e\u003c/iframe\u003e\n\nDoes nothing:\n\n\u003cembed src=\"https://raw.githubusercontent.com/HariSekhon/HariSekhon/main/knowledge.md\" width=\"100%\" height=\"500px\" /\u003e\n\n--\u003e\n\n### DevOps Code\n\n[![Readme Card](https://github-readme-stats.vercel.app/api/pin/?username=HariSekhon\u0026repo=DevOps-Bash-tools\u0026theme=ambient_gradient\u0026description_lines_count=3)](https://github.com/HariSekhon/DevOps-Bash-tools)\n[![Readme Card](https://github-readme-stats.vercel.app/api/pin/?username=HariSekhon\u0026repo=DevOps-Python-tools\u0026theme=ambient_gradient\u0026description_lines_count=3)](https://github.com/HariSekhon/DevOps-Python-tools)\n[![Readme Card](https://github-readme-stats.vercel.app/api/pin/?username=HariSekhon\u0026repo=DevOps-Perl-tools\u0026theme=ambient_gradient\u0026description_lines_count=3)](https://github.com/HariSekhon/DevOps-Perl-tools)\n[![Readme 
Card](https://github-readme-stats.vercel.app/api/pin/?username=HariSekhon\u0026repo=DevOps-Golang-tools\u0026theme=ambient_gradient\u0026description_lines_count=3)](https://github.com/HariSekhon/DevOps-Golang-tools)\n\n\u003c!--\n[![Gist Card](https://github-readme-stats.vercel.app/api/gist?id=f8f551332440f1ca8897ff010e363e03)](https://gist.github.com/HariSekhon/f8f551332440f1ca8897ff010e363e03)\n--\u003e\n\n### Containerization\n\n[![Readme Card](https://github-readme-stats.vercel.app/api/pin/?username=HariSekhon\u0026repo=Kubernetes-configs\u0026theme=ambient_gradient\u0026description_lines_count=3)](https://github.com/HariSekhon/Kubernetes-configs)\n[![Readme Card](https://github-readme-stats.vercel.app/api/pin/?username=HariSekhon\u0026repo=Dockerfiles\u0026theme=ambient_gradient\u0026description_lines_count=3)](https://github.com/HariSekhon/Dockerfiles)\n\n### CI/CD\n\n[![Readme Card](https://github-readme-stats.vercel.app/api/pin/?username=HariSekhon\u0026repo=GitHub-Actions\u0026theme=ambient_gradient\u0026description_lines_count=3)](https://github.com/HariSekhon/GitHub-Actions)\n[![Readme Card](https://github-readme-stats.vercel.app/api/pin/?username=HariSekhon\u0026repo=Jenkins\u0026theme=ambient_gradient\u0026description_lines_count=3)](https://github.com/HariSekhon/Jenkins)\n\n### DBA - SQL\n\n[![Readme Card](https://github-readme-stats.vercel.app/api/pin/?username=HariSekhon\u0026repo=SQL-scripts\u0026theme=ambient_gradient\u0026description_lines_count=3)](https://github.com/HariSekhon/SQL-scripts)\n\n### DevOps Reloaded\n\n[![Readme Card](https://github-readme-stats.vercel.app/api/pin/?username=HariSekhon\u0026repo=Nagios-Plugins\u0026theme=ambient_gradient\u0026description_lines_count=3)](https://github.com/HariSekhon/Nagios-Plugins)\n[![Readme 
Card](https://github-readme-stats.vercel.app/api/pin/?username=HariSekhon\u0026repo=HAProxy-configs\u0026theme=ambient_gradient\u0026description_lines_count=3)](https://github.com/HariSekhon/HAProxy-configs)\n[![Readme Card](https://github-readme-stats.vercel.app/api/pin/?username=HariSekhon\u0026repo=Terraform\u0026theme=ambient_gradient\u0026description_lines_count=3)](https://github.com/HariSekhon/Terraform)\n[![Readme Card](https://github-readme-stats.vercel.app/api/pin/?username=HariSekhon\u0026repo=Packer-templates\u0026theme=ambient_gradient\u0026description_lines_count=3)](https://github.com/HariSekhon/Packer-templates)\n[![Readme Card](https://github-readme-stats.vercel.app/api/pin/?username=HariSekhon\u0026repo=Nagios-Plugin-Kafka\u0026theme=ambient_gradient\u0026description_lines_count=3)](https://github.com/HariSekhon/Nagios-Plugin-Kafka)\n\n### Templates\n\n[![Readme Card](https://github-readme-stats.vercel.app/api/pin/?username=HariSekhon\u0026repo=Templates\u0026theme=ambient_gradient\u0026description_lines_count=3)](https://github.com/HariSekhon/Templates)\n[![Readme Card](https://github-readme-stats.vercel.app/api/pin/?username=HariSekhon\u0026repo=Template-repo\u0026theme=ambient_gradient\u0026description_lines_count=3)](https://github.com/HariSekhon/Template-repo)\n\n### Misc\n\n[![Readme Card](https://github-readme-stats.vercel.app/api/pin/?username=HariSekhon\u0026repo=Spotify-tools\u0026theme=ambient_gradient\u0026description_lines_count=3)](https://github.com/HariSekhon/Spotify-tools)\n[![Readme Card](https://github-readme-stats.vercel.app/api/pin/?username=HariSekhon\u0026repo=Spotify-playlists\u0026theme=ambient_gradient\u0026description_lines_count=3)](https://github.com/HariSekhon/Spotify-playlists)\n\nThe rest of my original source repos are\n[here](https://github.com/HariSekhon?tab=repositories\u0026q=\u0026type=source\u0026language=\u0026sort=stargazers).\n\nPre-built Docker images are available on my 
[DockerHub](https://hub.docker.com/u/harisekhon/).\n\n\u003c!-- 1x1 pixel counter to record hits --\u003e\n![](https://hit.yhype.me/github/profile?user_id=2211051)\n\n\u003c!-- OTHER_REPOS_END --\u003e\n