# Public Cloud Services

Unfortunately, none of the major cloud vendors provide a friendly API for listing all of their public cloud services and categories as they appear on the [AWS Products](https://aws.amazon.com/products), [GCP Products](https://cloud.google.com/products) and [Azure Services](https://azure.microsoft.com/en-us/services/) pages.

The idea is to have a unified `JSON` schema for all cloud services:

```json
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "id": { "type": "string" },
      "name": { "type": "string" },
      "summary": { "type": "string" },
      "url": { "type": "string" },
      "categories": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "id": { "type": "string" },
            "name": { "type": "string" }
          },
          "required": ["id", "name"]
        }
      },
      "tags": {
        "type": "array",
        "items": { "type": "string" }
      }
    },
    "required": ["id", "name", "summary", "url", "categories", "tags"]
  }
}
```
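
For illustration, a single record that validates against this schema might look like the following; the values and the `id`/tag naming conventions here are assumptions for the sketch, so check the generated `data/*.json` files for the real ones:

```json
[
  {
    "id": "amazon-s3",
    "name": "Amazon S3",
    "summary": "Object storage built to store and retrieve any amount of data.",
    "url": "https://aws.amazon.com/s3/",
    "categories": [
      { "id": "storage", "name": "Storage" }
    ],
    "tags": ["aws", "aws/category/storage", "aws/service/amazon-s3"]
  }
]
```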

## Scraping AWS Cloud Services

The AWS Products page uses the **undocumented** `https://aws.amazon.com/api/dirs/items/search` endpoint to fetch paged JSON records for the available cloud products.

```sh
# download AWS service JSON file and generate data/aws.json
pip install -r requirements.txt
python discovery/aws.py > data/aws.json
```
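
Since the endpoint is undocumented, it can also be probed directly. A minimal sketch with `curl` follows; the query parameter names are assumptions based on observed browser requests and may change without notice:

```sh
# fetch the first page of AWS product records (parameter names are assumptions)
curl -s 'https://aws.amazon.com/api/dirs/items/search?item.directoryId=aws-products&item.locale=en_US&size=100&page=0' \
  | python -m json.tool
```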

## Scraping GCP Cloud Services

The GCP Products page is rendered on the server side, and all data is embedded in the web page.

```sh
# scrape the GCP Products page to get all services and generate data/gcp.json
pip install -r requirements.txt
python discovery/gcp.py > data/gcp.json
```

## Scraping Azure Cloud Services

The [Azure Services](https://azure.microsoft.com/en-us/products/) page is rendered on the server side, and all data is embedded in the web page.

```sh
# scrape the Azure Services page to get all services and generate data/azure.json
pip install -r requirements.txt
python discovery/azure.py > data/azure.json
```

## Microsoft 365 Services

Edit the `ms365.json` file, using data from the [Compare Microsoft 365 Enterprise Plans](https://www.microsoft.com/en-us/microsoft-365/compare-microsoft-365-enterprise-plans) page.

## Scraping Google Workspace Services (GSuite)

The [Google Workspace features page](https://workspace.google.com/features/) lists all Google Workspace services.

```sh
# scrape the Google Workspace features page to get all services and generate data/gsuite.json
pip install -r requirements.txt
python discovery/gsuite.py > data/gsuite.json
```

## CMP Services

Edit the `cmp.json` file. Use the CMP UI and documentation.

## Credits

Edit the `credits.json` file.

## Update/merge all tags

Run the `tags.sh` script to regenerate the `tags.json` file, which contains all platform, category, and service tags from all services.
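
Assuming the script lives at the repository root, this is a one-liner:

```sh
# rebuild tags.json from the generated data files
./tags.sh
```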

## Public static location

Upload all generated `json` files to the public [cloud_tags](https://console.cloud.google.com/storage/browser/cloud_tags;tab=objects?forceOnBucketsSortingFiltering=false&project=zenrouter) Cloud Storage bucket.
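
One way to do this is with `gsutil`, assuming you have write access to the bucket and that the generated files live under `data/`:

```sh
# copy all generated JSON files to the public bucket in parallel
gsutil -m cp data/*.json gs://cloud_tags/
```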

## Focus Areas update process
Focus Areas support specific services and categories based on the data in this repo.
Service/category mappings to Focus Areas are updated using the process below, and the output is then applied to the zenrouter-infra repo.

### Adding support of a Product to a Focus Area

#### Editing the ProductToFocusArea mapping file
- Create a feature branch and clone this repo
- Edit [ProductToFocusArea.tsv](data/focus_areas/ProductToFocusArea.tsv), adding the new entry
- The file has the following columns:
  - `product` - the name of the product, taken from the `name` attribute in the cloud catalog files
  - `platform`, `p_group`, `focus_area` - these values must match one of the Focus Areas defined in [FocusAreas.tsv](data/focus_areas/FocusAreas.tsv)
  - `support_level` - must be `PRIMARY` or `SECONDARY`, mapped to the ZenRouter skills tier
  - `status` - only entries with status `VERIFIED` are processed by the build
  - `meets_volume_criteria` and `support_level_desc` - can be set to any value; they were only used for the initial FA mapping
- Here is an example of adding AWS Amplify to the AWS DevOps Focus Area as a `SECONDARY` skill (columns are tab-separated, shown aligned for readability):
```tsv
product      platform  p_group         focus_area  support_level  status    meets_volume_criteria  support_level_desc
AWS Amplify  AWS       Infrastructure  DevOps      SECONDARY      VERIFIED  N/A                    N/A
```
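
Because the file is tab-separated, appending a row from the shell makes the delimiters explicit. A minimal sketch, run from the repository root, mirroring the example above:

```sh
# append the AWS Amplify example row with explicit tab delimiters
printf 'AWS Amplify\tAWS\tInfrastructure\tDevOps\tSECONDARY\tVERIFIED\tN/A\tN/A\n' \
  >> data/focus_areas/ProductToFocusArea.tsv
```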

### Adding support of a Category to a Focus Area

#### Editing the CategoryToFocusArea mapping file
- Create a feature branch and clone this repo
- Edit [CategoryToFocusArea.tsv](data/focus_areas/CategoryToFocusArea.tsv), adding the new entry
- The file has the following columns:
  - `platform`, `p_group`, `focus_area` - these values must match one of the Focus Areas defined in [FocusAreas.tsv](data/focus_areas/FocusAreas.tsv)
  - `category_tag` - the actual category tag
  - `support_level` - must be `PRIMARY` or `SECONDARY`, mapped to the ZenRouter skills tier
- Here is an example of adding `aws/category/migration` to the AWS Databases Focus Area as a `SECONDARY` skill (columns are tab-separated, shown aligned for readability):
```tsv
platform  p_group  focus_area  category_tag            support_level
AWS       Data     Databases   aws/category/migration  SECONDARY
```
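
Before committing, it is worth confirming that the referenced Focus Area exists. A sketch using bash ANSI-C quoting for the tabs, under the assumption that `FocusAreas.tsv` leads with the `platform`, `p_group`, and `focus_area` columns:

```sh
# prints the matching Focus Area row; no output means the triple is undefined
grep $'^AWS\tData\tDatabases' data/focus_areas/FocusAreas.tsv
```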

#### Build the Focus Area mapping json file
- Once the desired Product and/or Category changes are complete, generate the focus areas JSON mapping file:
```shell
# Python 3.12 removed distutils, which breaks PySpark, so Python 3.11 is required
brew install [email protected]
brew install java
mkdir -p build
python3.11 -m venv build/venv
source build/venv/bin/activate
cd focus_areas/
python -m pip install -r requirements.txt
python ./build_focus_areas.py
# verify the product was added to the focus area in data/focus_areas/all.json, then commit/push
cd ..  # return to the repo root so the data/focus_areas/ path resolves
git add data/focus_areas/
git commit -m "Added xyz Product to abc Focus Area"
git push
```
- Open a PR into master and add PLs as reviewers

#### Deploying to BigQuery
Once merged into master, deploy the changes into BigQuery:
```shell
# Valid Application Default Credentials (ADC) or a configured service account are required to deploy
gcloud auth application-default login
git checkout master
git pull
mkdir -p build
python3 -m venv build/venv
source build/venv/bin/activate
cd focus_areas/
python -m pip install -r requirements.txt
python ./deploy_to_bq.py --build --deploy --project doit-zendesk-analysis
```
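
To sanity-check the deployment, you can list the datasets in the target project with the `bq` CLI (the dataset and table names created by the deploy script are not specified here, so this only confirms project access):

```sh
# list datasets in the target project as a quick post-deploy check
bq ls --project_id doit-zendesk-analysis
```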

#### Deploying the changes in ZenRouter
Once merged into master, deploy the changes into [ZenRouter Infra](https://github.com/doitintl/zenrouter-infra)
- Generate the HCL for `datastore.tf` and copy it to your clipboard:
```shell
git checkout master
git pull
mkdir -p build
python3 -m venv build/venv
source build/venv/bin/activate
cd focus_areas/
python -m pip install -r requirements.txt
# pbcopy is macOS-only; on Linux, pipe to xclip -selection clipboard instead
python ./generate_hcl.py | pbcopy
```
- Create a feature branch in [ZenRouter Infra](https://github.com/doitintl/zenrouter-infra), clone the repo, and check out the branch
- Edit `datastore.tf`, replacing the `default` attribute of `variable "focus_areas"` with the generated HCL:
```hcl
variable "focus_areas" {
  type = map(object({
    id               = string
    name             = string
    practice_area    = string
    primary_skills   = list(string)
    secondary_skills = list(string)
  }))
  # paste the output of generate_hcl.py after "default ="
  default =

}
```
- Initialize Terraform, then validate and format the edited file:
```shell
# initialize terraform
terraform init
# validate the file has no syntax issues
terraform validate
# format the file before committing
terraform fmt datastore.tf
```
- Commit, push, and open a PR in [ZenRouter Infra](https://github.com/doitintl/zenrouter-infra)