{"id":13821189,"url":"https://github.com/SundaySky/cost-anomaly-detector","last_synced_at":"2025-05-16T12:32:54.508Z","repository":{"id":119007181,"uuid":"105120184","full_name":"SundaySky/cost-anomaly-detector","owner":"SundaySky","description":null,"archived":false,"fork":false,"pushed_at":"2020-01-21T08:57:16.000Z","size":63,"stargazers_count":13,"open_issues_count":1,"forks_count":5,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-11-19T21:36:01.651Z","etag":null,"topics":["aws","cost-optimization","cost-saving","detect-anomalies","redshift"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SundaySky.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2017-09-28T08:06:07.000Z","updated_at":"2023-01-18T21:38:20.000Z","dependencies_parsed_at":null,"dependency_job_id":"bb631cdc-8b0f-49ba-836e-fd558633cf5e","html_url":"https://github.com/SundaySky/cost-anomaly-detector","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SundaySky%2Fcost-anomaly-detector","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SundaySky%2Fcost-anomaly-detector/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SundaySky%2Fcost-anomaly-detector/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SundaySky%2Fcost-anomaly-detector/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SundaySky"
,"download_url":"https://codeload.github.com/SundaySky/cost-anomaly-detector/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254530611,"owners_count":22086644,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","cost-optimization","cost-saving","detect-anomalies","redshift"],"created_at":"2024-08-04T08:01:17.127Z","updated_at":"2025-05-16T12:32:49.499Z","avatar_url":"https://github.com/SundaySky.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# AWS Cost Anomaly Detector (CAD)\n\n## License\nThis program is licensed under the GNU General Public License v3  \nhttps://choosealicense.com/licenses/gpl-3.0/\n\n## Background\nAWS’s pay-for-what-you-use policy is one of its great advantages, but it can also be dangerous: a bug or a traffic spike might cause unexpected charges.  
\nThe “AWS Cost Anomaly Detector” (CAD) keeps track of your AWS account billing and notifies you whenever you pay more than expected.\n\nThe anomaly detector has 2 main functions:\n* Writing the billing data to Redshift\n* Finding billing anomalies\n\n\n### General Flow\n* Whenever a new Cost and Usage Report (CUR) is uploaded to your billing bucket, a Lambda function is triggered\n* The Lambda function issues a command to the anomaly detector instance\n* The instance creates a Redshift table and writes the CUR data to it\n* After the data is written, the instance runs the anomaly detection algorithm and writes the results to Redshift\n\n\n### Infrastructure Overview\n* Billing S3 bucket\n\t* Holds the CURs\n* Redshift database\n\t* Holds the awsbilling_anomalies table\n* A Lambda function\n\t* Triggered by a new CUR in the billing bucket; initiates the job on the instance\n* An Auto Scaling Group\n\t* The anomaly detector instance is started from an Auto Scaling group and fetches all of its configuration on startup. This makes the service redundant and easily maintainable.  \n\t  We recommend keeping only one instance up at a time; otherwise each run is randomly assigned to one of the active instances.\n* An anomaly detector instance (t2.micro)\n\t* Writes the data and detects the anomalies (the process takes too long to run in a Lambda function)\n\n#### How much should it cost?\n**TL;DR** - *$20~$200 per month (depending on whether you already have a Redshift DB)*\n  \nThe CAD infrastructure requires only minor usage of a few AWS services: the Lambda function, S3 storage and EC2 instance should sum up to approximately $10-$20 per month.  
\nAnother cost to consider is database storage. The CURs are written to S3 as gzipped CSV files; the CAD creates a table for each month and writes the uncompressed data to it.\nTo estimate the required storage, go to your billing bucket and follow these steps:  \n* Find the report directory for the last day of a month; it contains one or more .csv.gz files\n* Calculate the aggregated size of the .csv.gz files for that day\n* Each 120-130MB of compressed files takes about 1.2GB-1.5GB in your DB\n* CURs are month-to-date, so the size of the last day's report is the size of the entire month\n\nTo sum up:  \nIf you already have a Redshift cluster, the added cost is minor: CUR sizes are small relative to the storage of even the smallest Redshift node offered by AWS, so adding the CAD tables probably requires no additional cost.\nOtherwise, setting up a Redshift DB is more costly. The smallest Redshift node type, dc2.large, has 160GB of storage, which should be enough even for quite large accounts; it costs about $180 per month on-demand, and far less with Reserved Instances (RI).\n\n\n### Algorithm\nTo detect anomalies, the algorithm compares a service's cost on a specific day to its past cost and determines whether that day's cost is unusually high.  \n\nAll the constant parameters used by the algorithm are configured in the CAD_conf.yml file and can be easily modified.  \nWe tested and fine-tuned them on real traffic, so we generally recommend not changing them. However, if you receive false positives you may raise them, and if you miss anomalies you may lower them.
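\nThe comparison described above boils down to a three-threshold test, detailed in the Algorithm Flow below. The following is a minimal Python sketch of that test, not the CAD source code: the function name `is_anomaly` is hypothetical, the default values mirror the CAD_conf.yml defaults listed in this README, and using the population standard deviation (`statistics.pstdev`) is an assumption, since the README does not say which estimator the detector uses.
```python
from statistics import mean, pstdev


def is_anomaly(daily_cost, history, threshold_relative=1.25,
               threshold_std=3.5, threshold_absolute=10):
    '''Hypothetical sketch of the CAD three-threshold test.

    daily_cost: cost of the checked day; history: costs of the previous days.
    '''
    avg = mean(history)
    std = pstdev(history)  # assumption: population standard deviation
    # At least threshold_relative times the average of the previous days.
    crosses_relative = daily_cost >= threshold_relative * avg
    # At least threshold_std standard deviations above the average.
    crosses_std = daily_cost >= avg + threshold_std * std
    # Strictly higher than the absolute dollar threshold.
    crosses_absolute = daily_cost > threshold_absolute
    # Report an anomaly only when all three thresholds are crossed.
    return crosses_relative and crosses_std and crosses_absolute


print(is_anomaly(40, [20] * 14))  # True
print(is_anomaly(8, [1] * 14))    # False: below the $10 absolute threshold
```
With these defaults, a $40 day after a steady $20/day history is reported, while an $8 day after a $1/day history is not, because it stays under the $10 absolute threshold even though it crosses the other two.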
\n\n#### Algorithm Flow\n* The anomaly detector queries the billing data from the last 14 days (by default)\n* The anomaly detector reads the queries section of the CAD_conf.yml file and runs the following steps for each query specified:\n\t* Calculates the cost of the resources specified in the query for each day in range (*excluding costs of reserved resources*)\n\t* Calculates the average daily cost and the standard deviation\n\t* Reports an anomaly only if the day's cost crosses *all 3 thresholds*:\n\t\t* *relative threshold*: it is at least 1.25 times the average cost of the previous days (by default)\n\t\t* *standard deviation threshold*: it is at least 3.5 standard deviations higher than the average cost of the previous days (by default)\n\t\t* *absolute threshold*: it is higher than $10 (by default)\n\nWe found that we get the best results by using all 3 thresholds together, each for its own reason:\n* *Relative threshold* filters out insignificant anomalies.\n* *Standard deviation threshold* filters out services whose daily usage normally fluctuates (high days/low days).\n* *Absolute threshold* prevents notifications about inexpensive anomalies, which would not lead to any action.\n\n\n## Setup \u0026 Deployment\nTo start using the anomaly detector, you only need to go through a few simple steps:\n\n* Configure the CUR to be written to an S3 bucket, including the Redshift COPY commands  \n\t(if you already have the report, make sure the required options are enabled)\n\t* In the AWS console go to:  \n\t\tMy Account --\u003e Reports --\u003e Create New Report\n\t* Mark the following:\n\t\t* Time unit: hour\n\t\t* Include: Resource IDs\n\t\t* Enable support for Redshift\n\t* Choose an S3 bucket and prefix (note them for later)\n* Create the Redshift results table\n\t* Run the following SQL command on your Redshift cluster:  \n\t```sql\n  CREATE TABLE IF NOT EXISTS \"public\".\"awsbilling_anomalies\" (\n    \"anomaly_date\"     DATE          ENCODE lzo,\n    \"service\"          VARCHAR(512)  ENCODE lzo,\n    \"isanomaly\"        INTEGER       ENCODE lzo,\n    \"daily_cost\"       NUMERIC(10,2) ENCODE lzo,\n    \"total_daily_cost\" NUMERIC(10,2) ENCODE lzo,\n    \"mean_cost\"        NUMERIC(10,2) ENCODE lzo,\n    \"std_cost\"         NUMERIC(10,2) ENCODE lzo,\n    \"score\"            NUMERIC(10,2) ENCODE lzo\n  );\n  ```\n* Upload the Lambda code to S3\n  * *deployment/CUR_to_Redshift_lambda.zip*\n* Edit the configuration file and upload it to S3 (parameters detailed in the section below)\n  * *deployment/CAD_conf.yml*\n* Run the CloudFormation template (parameters detailed in the section below)\n  * *deployment/anomaly_detector.yml*\n* (optional) If you left the redshift_role parameter in the conf file empty:\n  * Add the instance role created by the CloudFormation stack to your Redshift cluster:\n    * In the AWS console go to:  \n      Redshift --\u003e Clusters --\u003e your cluster --\u003e Manage IAM roles\n    * Add the anomalyDetectorInstanceRole\n    * Edit your CAD_conf.yml file and set the redshift_role value to the role's ARN (found in the IAM console --\u003e Roles --\u003e anomalyDetectorInstanceRole)\n* Set a trigger for the Lambda function:  \n  * In the AWS console go to:  \n    Lambda --\u003e CUR_Write_Trigger --\u003e Triggers --\u003e Add trigger\n  * Choose S3\n    * *Bucket*: choose your CUR bucket\n    * *Prefix*: enter your reports prefix\n    * *Suffix*: enter `RedshiftCommands.sql`\n\n\n### CloudFormation\nThe CloudFormation template is written in YAML.\nGenerally, there is no reason to open or change it; just enter the parameter values in the AWS console.\n\n#### Parameters:\n* **AutoScalingGroupAvailabilityZone**\n  * *Usage*: The availability zone(s) in which the anomaly detector instance is started; can be one or more.\n  * *Example*: us-east-1d\n* **ConfigurationFilePath**\n  * *Usage*: S3 path of the configuration file, which is downloaded on instance startup and used by the scripts.\n  * *Example*: my-bucket/directory/CAD_conf.yml\n* **gitBranch**\n  * *Usage*: The branch of SundaySky's anomaly detector repo that is 
pulled on instance startup.\n  * *Default*: master. Unless you want to create and use your own branch, there is no need to change it.\n  * *Example*: master, my-branch\n* **InstanceImageId**\n  * *Usage*: The image used for the anomaly detector instance; we recommend using the latest Amazon Linux AMI.\n  * *Example*: ami-a4c7edb2\n* **InstanceKeyPair**\n  * *Usage*: The key pair used to log in to the instance\n  * *Example*: my-key-pair\n* **InstanceSecurityGroup**\n  * *Usage*: Security groups provided to the instance and the Lambda function\n    * *Required*: access from the Lambda function (same VPC and security groups) and access to the Redshift database (we recommend the same VPC as the DB)\n    * *Recommended*: SSH access to the instance for you\n  * *Example*: sg-abcd, vpc-security-group\n* **InstanceSubnets**\n  * *Usage*: Subnets available for the instance.\n    * *Required*: at least one per availability zone provided above\n  * *Example*: subnet-abcd, subnet-vpc-private\n* **InstanceType**\n  * *Usage*: The anomaly detector instance type (default and recommended: t2.micro; there is no need for an expensive instance)\n  * *Example*: t2.micro, c4.large\n* **LambdaCodeBucket**\n  * *Usage*: The bucket where you uploaded the Lambda code\n  * *Example*: my-bucket\n* **LambdaCodeKey**\n  * *Usage*: The key of the zip file\n  * *Example*: directory/CUR_to_Redshift_lambda.zip\n\n\n### Configuration File\nThe configuration file is written in YAML; an example CAD_conf.yml file is provided in the 'deployment' directory.  \nEdit the file according to your usage and upload it to S3.\n\nThe parameters in the file are divided into 3 sections:\n* *DB params* - Should be changed according to your database.\n* *Algorithm params* - Thresholds for anomalies. Generally, these should not be modified at all.\n* *Queries* - We provide some examples; add or remove queries to get the data most relevant to your account and use cases. 
(This part has its own section below.)\n\n#### DB params\n* **redshift_user** (*String*)\n  * *Usage*: The database user used to run the SQL commands. The user must have SELECT, CREATE and UPDATE privileges (we recommend GRANT as well).\n* **redshift_password** (*String*)\n  * *Usage*: Password for the database user.\n* **redshift_db_name** (*String*)\n  * *Usage*: The Redshift database name used for the connection (as shown in the AWS Redshift console)\n  * *Example*: dbname\n* **redshift_hostname** (*String*)\n  * *Usage*: Endpoint used to connect to the database (as shown in the AWS Redshift console)\n  * *Example*: dbname.abcd.region.redshift.amazonaws.com\n* **redshift_role** (*String*)\n  * *Usage*: Full ARN of the role currently used by the Redshift cluster. The role must have read access to the CUR S3 bucket.  \n  We recommend using the ARN of the anomalyDetectorInstanceRole created by the CloudFormation stack.  \n  *If you do, leave this param empty, run the CloudFormation stack, and then follow the optional step described in the Setup section*.\n  * *Example*: arn:aws:iam::12345:role/Redshift_role\n* **redshift_table_permitted_users** (*String, Optional*)\n  * *Usage*: Gives users read permission to the billing tables. 
The string can contain the name of a single user or several user names separated by commas\n  * *Example*: admin,quicksight,monitoring,jhon\n* **s3_aws_region** (*String*)\n  * *Usage*: The AWS region of the bucket containing the CURs\n  * *Example*: us-east-1\n\n#### Algorithm params\n* **threshold_relative** (*float*)\n  * *Usage*: explained in the Algorithm section above\n  * *Default*: 1.25\n* **threshold_std** (*float*)\n  * *Usage*: explained in the Algorithm section above\n  * *Default*: 3.5\n* **threshold_absolute** (*int*)\n  * *Usage*: explained in the Algorithm section above\n  * *Default*: 10\n* **history_period_days** (*int*)\n  * *Usage*: explained in the Algorithm section above\n  * *Default*: 14\n* **aws_account** (*String or List of Strings, Optional*)\n  * *Usage*: If your CUR contains data for more than one AWS account, you can specify which accounts to run the algorithm for. Input an account ID string or a list of account IDs\n  * *Default*: All accounts that appear in the CUR\n  * *Example*: 123456789012\n* **aws_query_regions** (*List of Strings*)\n  * *Usage*: All the AWS regions in which you have resources. Its usage is explained in the Queries section below.\n* **log_folder** (*String*)\n  * *Usage*: Folder for the CUR_writer and anomaly detector logs\n  * *Default*: /sundaysky/logs/anomaly_detector/\n\n#### Queries\nThe real power of the anomaly detector comes from the ability to easily analyze many different combinations of data relevant to your specific account and use cases.  \nTo do so, define the queries relevant to you in the queries section of the CAD_conf.yml file.  \n\nThe anomaly detector can divide your data in 4 ways:\n* *By AWS Service*: such as ec2, s3, lambda, etc.\n* *By AWS Operation*: such as RunInstances, LoadBalancing, PutObject, GetObject, etc.\n* *By AWS Region*: such as us-east-1, eu-west-2, etc.\n* *By Resource Tags*: user-defined tags. 
Smart use of tags can give you exceptional value!\n\n\n##### Write your own queries\nThe queries are written in easily readable YAML. A few example queries are already included in the example CAD_conf.yml file.  \nExample queries:\n```yaml\nqueries:\n  ec2:\n    service: AmazonEC2\n    region: all\n  ec2_instances:\n    service: AmazonEC2\n    operation: RunInstances*\n    region: us-east-1, us-west-1\n```\n\nLet's go over some important things to know:\n* *Service/Operation*: You can run the SQL queries at the bottom of this README on your billing table to find the right values for these fields in your account\n* *Region*: Can receive 3 types of values:\n\t* single region (example: us-east-1)\n\t* multiple regions (example: us-east-1,us-west-1)\n\t* all regions: all the regions you use, as defined in the 'aws_query_regions' parameter in the conf file.  \n\t  If you input more than one region, the query is replicated and run separately, once for each region and once for all the regions combined (general); you will see all of them in the results table.\n* *Tags*: Just use your tag key as the key and its value as the value.
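\nThe region expansion described above can be sketched in Python. This is a hypothetical illustration, not part of the CAD code: the helper `expand_query_regions` is made up for this example, it assumes the region value is a plain string as in the YAML examples, the result names follow the _region/_general suffixes used in the results table, and giving a single-region query a region suffix is an assumption.
```python
def expand_query_regions(name, region, aws_query_regions):
    '''Hypothetical sketch: split one query into per-region runs.

    Returns (result_name, regions) pairs; regions is None when the
    query has no region key.
    '''
    if region is None:
        # No region key: a single run over all data, under the query name.
        return [(name, None)]
    if region.strip() == 'all':
        # 'all' expands to the aws_query_regions conf parameter.
        regions = list(aws_query_regions)
    else:
        # Comma-separated list such as 'us-east-1, us-west-1'.
        regions = [r.strip() for r in region.split(',') if r.strip()]
    # One run per region, suffixed with the region name.
    runs = [(name + '_' + r, [r]) for r in regions]
    if len(regions) > 1:
        # Plus one combined run over all the listed regions.
        runs.append((name + '_general', regions))
    return runs


for run in expand_query_regions('ec2_instances', 'us-east-1, us-west-1', []):
    print(run)
# ('ec2_instances_us-east-1', ['us-east-1'])
# ('ec2_instances_us-west-1', ['us-west-1'])
# ('ec2_instances_general', ['us-east-1', 'us-west-1'])
```
In this sketch the combined _general run is added only when more than one region is listed, so a single-region query yields exactly one result per day.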
\n\n**Note**: *each key can take a list value, matching any one of the values in the list*\n\n###### Example\nI would like to find anomalies in my account's EC2 costs:\n```yaml\nqueries:\n  ec2:\n    service: AmazonEC2\n```\nIn addition, I would like to keep track of my web servers' usage, so I add a query that checks all instances whose 'component' tag equals 'web':\n```yaml\nqueries:\n  ec2:\n    service: AmazonEC2\n  ec2_web_instances:\n    service: AmazonEC2\n    operation: RunInstances*\n    component: web\n```\nIf, in addition to my general web usage, I would like to find out whether anomalies occur in any specific region, I add the region key:\n```yaml\nqueries:\n  ec2:\n    service: AmazonEC2\n  ec2_web_instances:\n    service: AmazonEC2\n    operation: RunInstances*\n    component: web\n    region: all\n```\n\nTo find anomalies in my entire service, I want to combine the cost of instances with either the 'web' or the 'worker' component:\n```yaml\nqueries:\n  ec2:\n    service: AmazonEC2\n  ec2_web_instances:\n    service: AmazonEC2\n    operation: RunInstances*\n    component:\n      - web\n      - worker\n```\n\n### Usage\nAfter the setup is done, you should have a Lambda function that is triggered whenever new billing data appears in the bucket, and an instance that writes that data to Redshift and runs the algorithm right away.  \nAccording to AWS, the CUR data for the current day and the previous day might be partial and inaccurate. Unfortunately, for that reason the **algorithm runs for the day before yesterday**.  \nThat means that (by default) every day you will see the results for 2 days ago added to the table.  \n\n#### Anomaly detector data\nThe anomaly detector creates 2 new tables for each month, called *awsbillingYYYYMM* and *awsbillingYYYYMM_tagmapping*. 
(example: awsbilling201710, awsbilling201710_tagmapping)  \nThe first table contains all the monthly billing data; it can also be queried manually to drill down into the data (see the 'Useful SQL queries' section below for examples).  \nThe second maps the tag values in the CUR to the names you use, to enable queries by tags; you probably shouldn't use or change that table.  \n\nIn addition, the anomaly detector writes all your query results by date to the *awsbilling_anomalies* table you created during setup, as described below.\n\n#### The results table\nYou can query the results table directly, for example:\n```sql\nSELECT * FROM awsbilling_anomalies WHERE anomaly_date=DATE 'today'-2;\n```\nThe main columns of the table are:\n* *anomaly_date*: The date the query was *made for* (not necessarily the day it was run)\n* *service*: The query name (given in the conf file), with an additional _region (region name) or _general suffix if the query was region-specific\n* *isanomaly*: 0 or 1; 1 if all 3 thresholds were breached\n* *daily_cost*: Total cost of the resources included in the query on the *anomaly_date* day\n* *mean_cost*: Average daily cost of the resources included in the query over the history period\n* *std_cost*: Standard deviation of the daily costs over the history period\n* *score*: A measure of how abnormal the result is.  \n  If *daily_cost* is higher than *mean_cost*, the score is the difference divided by the standard deviation\n\n#### Alerting\nWe recommend using a system that queries the results table and sends a notification whenever a result with isanomaly=1 appears:\n```sql\nSELECT * FROM awsbilling_anomalies WHERE anomaly_date=DATE 'today'-2 AND isanomaly=1;\n```\nYou would want to receive an alert whenever the query above returns results.\n\n#### Manual Usage\nSometimes you might want to run the anomaly detector manually for a specific date or with different parameters.  
\nYou can log in to the instance and run the script; any parameter can be passed as a keyword argument.  \nThe default date is the day before yesterday. To run for a specific date, pass the date in 'YYYY-MM-DD' format:\n```\npython /sundaysky/cost_anomaly_detector/anomaly_detector.py date=2017-10-15\n```\nBy default the script runs with the parameters provided in the conf file, but you can override any of them by using their names as keywords:\n```\npython /sundaysky/cost_anomaly_detector/anomaly_detector.py date=2017-10-15 threshold_std=3\n```\n\n#### Useful SQL queries\n**Get relative date results**  \nYesterday:\n```sql\nSELECT * FROM awsbilling_anomalies WHERE anomaly_date=DATE 'yesterday';\n```\n2 days ago:\n```sql\nSELECT * FROM awsbilling_anomalies WHERE anomaly_date=DATE 'today'-2;\n```\n\n**Get results by date**  \n*On a date*:\n```sql\nSELECT * FROM awsbilling_anomalies WHERE anomaly_date=DATE '2017-10-15';\n```\n*Since a date*:\n```sql\nSELECT * FROM awsbilling_anomalies WHERE anomaly_date\u003e=DATE '2017-10-15';\n```\n\n**Get anomalies**  \nReplace the closing semicolon of one of the queries above with:\n```sql\nAND isanomaly=1;\n```\n\n##### General Data\n**Get your billing tables**  \n```sql\nSELECT DISTINCT tablename FROM PG_TABLE_DEF WHERE tablename ilike 'awsbilling%';\n```\n\n**Get AWS Service names**  \nThis query returns the names of all AWS services you pay for. These are the names to use in your queries' 'service' parameter:\n```sql\nSELECT lineitem_productcode as service FROM awsbilling201710 GROUP BY lineitem_productcode;\n```\n\n**Get AWS Operation names**  \nThis query returns the names of all AWS operations you pay for. 
These are the names to use in your queries' 'operation' parameter:\n```sql\nSELECT lineitem_operation as operation FROM awsbilling201710 GROUP BY lineitem_operation;\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSundaySky%2Fcost-anomaly-detector","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSundaySky%2Fcost-anomaly-detector","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSundaySky%2Fcost-anomaly-detector/lists"}