{"id":14065535,"url":"https://github.com/doitintl/cloud-catalog","last_synced_at":"2025-04-30T08:33:36.744Z","repository":{"id":38316682,"uuid":"452364408","full_name":"doitintl/cloud-catalog","owner":"doitintl","description":"Extract categories and services (as unified JSON) for major public cloud services.","archived":false,"fork":false,"pushed_at":"2024-08-25T09:22:30.000Z","size":382,"stargazers_count":11,"open_issues_count":6,"forks_count":1,"subscribers_count":32,"default_branch":"master","last_synced_at":"2024-08-25T10:32:26.369Z","etag":null,"topics":["aws","azure","gcp","google-cloud"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/doitintl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-01-26T17:02:59.000Z","updated_at":"2024-08-20T08:36:27.000Z","dependencies_parsed_at":"2023-11-12T10:24:25.191Z","dependency_job_id":"27605b48-bcdf-4e07-8aa9-7587563e4fb5","html_url":"https://github.com/doitintl/cloud-catalog","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/doitintl%2Fcloud-catalog","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/doitintl%2Fcloud-catalog/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/doitintl%2Fcloud-catalog/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/doitintl%2Fcloud-catalog/manifests","owner_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/owners/doitintl","download_url":"https://codeload.github.com/doitintl/cloud-catalog/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224203934,"owners_count":17273019,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","azure","gcp","google-cloud"],"created_at":"2024-08-13T07:04:32.857Z","updated_at":"2024-11-12T02:22:53.535Z","avatar_url":"https://github.com/doitintl.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Public Cloud Services\n\nUnfortunately, no cloud vendor provides a friendly API for listing all of its public cloud services and categories as they appear on the [AWS Products](https://aws.amazon.com/products), [GCP Products](https://cloud.google.com/products) and [Azure Services](https://azure.microsoft.com/en-us/services/) pages.\n\nThe idea is to have a unified `JSON` schema for all cloud services.\n\n```json\n{\n  \"$schema\": \"http://json-schema.org/draft-04/schema#\",\n  \"type\": \"array\",\n  \"items\": [\n    {\n      \"type\": \"object\",\n      \"properties\": {\n        \"id\": {\n          \"type\": \"string\"\n        },\n        \"name\": {\n          \"type\": \"string\"\n        },\n        \"summary\": {\n          \"type\": \"string\"\n        },\n        \"url\": {\n          \"type\": \"string\"\n        },\n        \"categories\": {\n          \"type\": \"array\",\n          \"items\": [\n            {\n              \"type\": \"object\",\n              \"properties\": {\n                \"id\": {\n                  
\"type\": \"string\"\n                },\n                \"name\": {\n                  \"type\": \"string\"\n                }\n              },\n              \"required\": [\n                \"id\",\n                \"name\"\n              ]\n            }\n          ]\n        },\n        \"tags\": {\n          \"type\": \"array\",\n          \"items\": [\n            {\n              \"type\": \"string\"\n            }\n          ]\n        }\n      },\n      \"required\": [\n        \"id\",\n        \"name\",\n        \"summary\",\n        \"url\",\n        \"categories\",\n        \"tags\"\n      ]\n    }\n  ]\n}\n```\n\n## Scraping AWS Cloud Services\n\nThe AWS Products page uses the **undocumented** `https://aws.amazon.com/api/dirs/items/search` endpoint to fetch paged JSON records for the available cloud products.\n\n```sh\n# download the AWS services JSON and generate data/aws.json\npip install -r requirements.txt\npython discovery/aws.py \u003e data/aws.json\n```\n\n## Scraping GCP Cloud Services\n\nThe GCP Products page is rendered on the server side and all data is embedded into the web page.\n\n```sh\n# scrape the GCP Products page to get all services and generate data/gcp.json\npip install -r requirements.txt\npython discovery/gcp.py \u003e data/gcp.json\n```\n\n## Scraping Azure Cloud Services\n\nThe [Azure Services](https://azure.microsoft.com/en-us/products/) page is rendered on the server side and all data is embedded into the web page.\n\n```sh\n# scrape the Azure Services page to get all services and generate data/azure.json\npip install -r requirements.txt\npython discovery/azure.py \u003e data/azure.json\n```\n\n## Microsoft365 Services\n\nEdit the `ms365.json` file. 
Use data from the [Microsoft 365 enterprise plan comparison page](https://www.microsoft.com/en-us/microsoft-365/compare-microsoft-365-enterprise-plans).\n\n## Scraping Google Workspace Services (GSuite)\n\nThe [Google Workspace features](https://workspace.google.com/features/) page lists all Google Workspace services.\n\n```sh\n# scrape the Google Workspace page to get all services and generate data/gsuite.json\npip install -r requirements.txt\npython discovery/gsuite.py \u003e data/gsuite.json\n```\n\n## CMP Services\n\nEdit the `cmp.json` file, using the CMP UI and documentation as the source.\n\n## Credits\n\nEdit the `credits.json` file.\n\n## Update/merge all tags\n\nRun the `tags.sh` script to regenerate the `tags.json` file, which contains the platform, category and service tags for all services.\n\n## Public static location\n\nUpload all generated `json` files to the public [cloud_tags](https://console.cloud.google.com/storage/browser/cloud_tags;tab=objects?forceOnBucketsSortingFiltering=false\u0026project=zenrouter) Cloud Storage bucket.\n\n## Focus Areas update process    \nFocus Areas support specific services and categories based on this repo.    
\nTo update the service/category mappings for Focus Areas, follow the process below, then update the zenrouter-infra repo with the output.\n\n### Adding support of a Product to a Focus Area\n\n#### Editing the ProductToFocusArea mapping file\n- Clone this repo and create a feature branch\n- Edit [ProductToFocusArea.tsv](data/focus_areas/ProductToFocusArea.tsv), adding the entry\n  - The file has the following columns:\n    - `product` - the name of the product, taken from the `name` attribute in the cloud catalog files\n    - `platform, p_group, focus_area` - these values must match one of the Focus Areas defined in [FocusAreas.tsv](data/focus_areas/FocusAreas.tsv)\n    - `support_level` - must be PRIMARY or SECONDARY, mapped to the ZenRouter skills tier\n    - `status` - only entries with status `VERIFIED` are processed by the build process\n    - `meets_volume_criteria` and `support_level_desc` - can be ignored and set to any value; they were used only for the initial FA mapping\n- Here is an example of adding AWS Amplify to the AWS DevOps Focus Area as a SECONDARY skill\n```shell\nproduct     platform            p_group         focus_area  support_level   status        meets_volume_criteria  support_level_desc\nAWS Amplify AWS                 Infrastructure  DevOps      SECONDARY       VERIFIED      N/A                    N/A\n```\n\n### Adding support of a Category to a Focus Area\n\n#### Editing the CategoryToFocusArea mapping file\n- Clone this repo and create a feature branch\n- Edit [CategoryToFocusArea.tsv](data/focus_areas/CategoryToFocusArea.tsv), adding the entry\n  - The file has the following columns:\n    - `platform, p_group, focus_area` - these values must match one of the Focus Areas defined in [FocusAreas.tsv](data/focus_areas/FocusAreas.tsv)\n    - `platform_tag` - this is the actual category tag\n    - `support_level` - must be PRIMARY or SECONDARY, mapped to the ZenRouter skills tier\n- Here is an example of adding 
aws/category/migration to the AWS Databases Focus Area as a SECONDARY skill\n```shell\nplatform\tp_group\tfocus_area\tcategory_tag\tsupport_level\nAWS\tData\tDatabases\taws/category/migration\tSECONDARY\n```\n\n#### Build the Focus Area mapping JSON file\n- Once the desired Product and/or Category changes have been completed, generate the Focus Areas JSON mapping file\n```shell\n# Python 3.12 breaks PySpark because distutils was removed, so Python 3.11 is required\nbrew install python@3.11\nbrew install java\nmkdir -p build\npython3.11 -m venv build/venv\nsource build/venv/bin/activate\ncd focus_areas/\npython -m pip install -r requirements.txt\npython ./build_focus_areas.py\n# return to the repository root so the data/ paths below resolve\ncd ..\n# verify the product was added to the focus area in data/focus_areas/all.json, then commit/push\ngit add data/focus_areas/\ngit commit -m \"Added xyz Product to abc Focus Area\"\ngit push\n```\n- Open a PR into master and add PLs as reviewers    \n\n#### Deploying to BigQuery\nOnce the PR is merged into master, deploy the changes into BigQuery    \n```shell\n# Valid Application Default Credentials (ADC) or a configured service account are required to deploy\ngcloud auth application-default login\ngit checkout master\ngit pull\nmkdir -p build\npython3 -m venv build/venv\nsource build/venv/bin/activate\ncd focus_areas/\npython -m pip install -r requirements.txt\npython ./deploy_to_bq.py --build --deploy --project doit-zendesk-analysis\n```\n\n#### Deploying the changes in ZenRouter\nOnce the PR is merged into master, deploy the changes into [ZenRouter Infra](https://github.com/doitintl/zenrouter-infra)    \n- Generate the HCL for `datastore.tf` and copy it to your clipboard    \n```shell\ngit checkout master\ngit pull\nmkdir -p build\npython3 -m venv build/venv\nsource build/venv/bin/activate\ncd focus_areas/\npython -m pip install -r requirements.txt\npython ./generate_hcl.py | pbcopy\n```\n- Create a feature branch in [ZenRouter Infra](https://github.com/doitintl/zenrouter-infra), then clone the repo and check out the branch    \n- Edit `datastore.tf`, replacing 
the `default` attribute of `variable \"focus_areas\"`    \n```shell\nvariable \"focus_areas\" {\n  type = map(object({\n    id               = string\n    name             = string\n    practice_area    = string\n    primary_skills   = list(string)\n    secondary_skills = list(string)\n  }))\n  default =  \n    \u003cPaste the output from generate_hcl.py here\u003e\n}\n# initialize terraform\nterraform init\n# validate the file has no syntax issues\nterraform validate\n# format the file before commit\nterraform fmt datastore.tf\n```\n- Commit, push, and submit a PR in [ZenRouter Infra](https://github.com/doitintl/zenrouter-infra)\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdoitintl%2Fcloud-catalog","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdoitintl%2Fcloud-catalog","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdoitintl%2Fcloud-catalog/lists"}