https://github.com/mtulio/cloud-costs-parser
PROPOSAL: The Cloud Costs processor (Import/Export to common DB)
https://github.com/mtulio/cloud-costs-parser
Last synced: 4 months ago
JSON representation
PROPOSAL: The Cloud Costs processor (Import/Export to common DB)
- Host: GitHub
- URL: https://github.com/mtulio/cloud-costs-parser
- Owner: mtulio
- Created: 2020-10-01T03:17:51.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-10-01T04:26:54.000Z (over 5 years ago)
- Last Synced: 2025-02-24T03:17:44.707Z (over 1 year ago)
- Homepage:
- Size: 6.84 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# cloud-costs-parser (State:Proposal)
The Cloud Costs processor, Importer/Exporter, to a database unified.
## Description
Cloud Costs parser import CSV file costs data from Cloud Provider (initially AWS and Azure) and export to a common database - preferably NoSQL.
## Command (SPEC suggested)
### Parser `parser`
Command to start the parser.
#### Subcommand
- `--vendor`
The Cloud provider to create metadata field `vendor`
- `--in`
The value of CSV file path. The source driver could be discovered the source (s3, gzip, csv, blob, etc)
Example of AWS CUR v2: `--in s3://my-bucker-reported-by-cur/path/to/monthlyReport/20201001-20201101`
Will try to read all CSV files inside the path. See `--latest` option to save extra processing.
- `--latest`
Read latest data. Latest by default will be D-1 (last day). See `--latest-days N`
This option could save extra operation on the destination storage (less read/comparation/upserts)
- `--latest-days N`
Overrides `--latest` to define how many days to parse the data. By default it will get N days from now until the end of file.
If the monthly processor range is older than current month, one option could be created consider N days from last day of month. (v2)
This option could save extra operation on the destination storage (less read/comparation/upserts)
- `--override-char --override-from CHAR_ORIGIN --override-to CHAR_DEST`
Enable the override character on field name, usefull when destination storage does not support special characters on field name, or you just want to have a better view without nesting first level of dictionaries on NoSQL databases (it will save complexity on queries);
Example on AWS CUR v2: `--override-char --override-from / --override-to _`
The field `identity/LineItemId` on CSV will be saved as `identity_LineItemId`
- `--key-fields`
Fields sepparated by commad that will be used to UID (`_key`) of cost item.
For example on AWS CUR the following fields could be used to identify the cost item on whole file: `identity_LineItemId,identity_TimeInterval`
- `--transform-field-data`
Default: `false`
In AWS vendor, from the `identity/TimeInterval` (current day interval of cost item) field will be splited into `identity/TimeStart` and `identity/TimeEnd`.
This option could be very helpfull on the queries on the database.
- `--filter-keys KEYS`
Default: `None`
The keys sepparated by commad to be filtered on destination database - could save space, but could limit the insights.
- `--out`
The value of stdout storage, could be MySQL, PGSQL, ArangoDB, S3.
Example:
`--out arangodb://server:5432/database/collection?username=x,password=y`
`--out psql://server:5432/database/table?username=x,password=y`
`--out aztables://stgAccount/TableName?storageKey=x`
- `--out-builk-size N`
Default: 100
By default the write to output storage will be done by bulk inserts from 100, to change the size just set a new option.
- `--out-indexes FIELDS`
The fields name to be createed a index on destination storage (if applicable)
- `--out-meta`
Display the metadata fields.
The fields inserted by parser:
`_key` : the unique cost item identifier. Could be the PK of relational DBs.
`_vendor`: the vendor from the source file
`_datetime`: datetime of the importer
### Fields `fields`
Extract the fields from the source file and show the output (without parsing).
Subcommands allowed for this command:
- `--in`
The output will be the fields in input file and metadata fields.