{"id":20874478,"url":"https://github.com/imagingdatacommons/realtimecosts","last_synced_at":"2025-09-20T23:52:03.181Z","repository":{"id":137619263,"uuid":"369868182","full_name":"ImagingDataCommons/RealtimeCosts","owner":"ImagingDataCommons","description":"System for estimating realtime burn rates","archived":false,"fork":false,"pushed_at":"2021-12-21T00:59:20.000Z","size":182,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-01-19T09:09:44.483Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ImagingDataCommons.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-05-22T17:27:38.000Z","updated_at":"2021-11-22T06:44:40.000Z","dependencies_parsed_at":null,"dependency_job_id":"dc07cdcc-f718-4794-8d0f-4004a1c808d1","html_url":"https://github.com/ImagingDataCommons/RealtimeCosts","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ImagingDataCommons%2FRealtimeCosts","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ImagingDataCommons%2FRealtimeCosts/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ImagingDataCommons%2FRealtimeCosts/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ImagingDataCommons%2FRealtimeCosts/manifests","owner_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/owners/ImagingDataCommons","download_url":"https://codeload.github.com/ImagingDataCommons/RealtimeCosts/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243248251,"owners_count":20260752,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-18T06:32:56.283Z","updated_at":"2025-09-20T23:51:58.121Z","avatar_url":"https://github.com/ImagingDataCommons.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RealtimeCosts\n\nThis system was set up so that IDC can provide sponsored GCP projects, each with a specified total budget, to researchers.\nAll projects are attached to the same single billing account, so this system monitors the spending of each project\nindividually to make sure each one stays within the set spending limit. Each project is set up with a\nseparate per-project budget, but each budget sends a pub/sub message to a single monitoring cloud function.\n\nThe routine ProjectCostControlWithEstimates.py is intended to monitor costs in a GCP project \"A\" by being installed\nin a separate project \"B\". A budget alert is established in the billing account attached to project \"A\", and this\nalert publishes to a Pub/Sub topic set up in project \"B\". Messages are sent about every twenty minutes to the topic.\n\nIn project \"B\", the function receives the message, which contains the current spending for the project, and uses\nthis trigger to determine if the project needs to be shut down in any fashion. 
Using example code snippets,\nprovided by Google, it shuts down all VMs and all AppEngine services if the budget has been reached. If the\nspending exceeds some multiplier (e.g. 1.2 times the budget), it pulls the billing account off the project, which\nhalts all spending, at the risk of losing resources if the account is not restored in a timely fashion.\n\nNote that to be able to perform these actions, the service account in project \"B\" running the function needs\nto have \"Editor\" and \"Project Billing Manager\" roles in project \"A\". Editor is certainly too broad a role,\nand could be reduced to a custom role with some analysis [TODO].\n\nIn addition to writing to logs in project \"B\", each run reports spending to a log in project \"A\" that can be\nmonitored by members of project \"A\". Set an advanced filter in Stackdriver in project \"A\" to see these messages:\n\n```\nlogName=\"projects/project-a-id/logs/cost_estimation_log\"\n```\n\nNote that the log messages issued in project \"B\" are set up with the intention of having project monitoring\nin project \"B\" trigger alerts with emails or text messages to monitor the system.\n\nThis function was developed before Google allowed custom ranges to be specified for budgets, and assumes the\nbudget alert is monthly, but that the budget set specifies the \"all-time\" spending for the project. Thus, it\nkeeps state in a bucket in project \"B\" that you specify. Note that each project has a separate file, since the function\nis being triggered by each separate project, and we want to avoid race conditions. (Note the function usually takes\n10-20 seconds to run for each project it processes). With enough projects to monitor, it might fall behind (?). 
This\nhas not been explored.\n\nIn addition to monitoring the overall spending reported, the function checks if reported egress out of the cloud\nexceeds a specified fraction of the budget, since this has been an issue seen with some sponsored projects created\nfor new cloud users. In this case, it just issues a log message alert, but does nothing else.\n\nNote that monitoring egress spending requires the billing account to have a BigQuery table set up to receive the\ncurrent spending data from Google. Google does not currently have an API to query about amounts spent,\nso the BigQuery export is the only way to get the current fine-grained spending data needed to estimate egress\ncharges.\n\nAnother thing the system checks is the number of running CPUs. While Google does allow you to reduce quotas for\nVM counts in projects, this can take a while and needs human intervention by Google support. Thus, this function\nprovides another way to cap VM usage. It counts the number of CPUs running in the project, and if that exceeds\na configurable value, the function will shut down VMs in a LIFO fashion to get back under the limit.\n\nFinally, one of the big shortcomings of the above system is that the published spending numbers can lag actual\nexpense by 12 hours (give or take). So shutting a project down based on stale numbers may be too late to avoid\nunexpected charges. Thus, this function also estimates the current burn rate for running VMs and persistent storage\neach time the function runs (e.g. every 20 minutes). The system estimates the total spending up to the current\ninstant, compares that to what has been reported in the Google BQ table that records charges, and boosts the\namount spent by the difference between the two. Note that at the moment, this is only done for VMs and persistent\ndisks. 
AppEngine, and all sorts of other services, are not estimated yet.\n\nTo achieve this, a set of BigQuery tables needs to be created in project \"B\" that quantify the costs associated\nwith running VMs. The seed for these tables is the SKU export table that Google will create if this export is\nspecified for the billing account. With that table in hand, running the script \"extract-sku-tables-from-desktop.sh\"\npointed at that table will generate all the specified SKU tables needed by the cloud function to do this estimation.\nNote this routine uses the standard method used by the ISB-CGC ETL system, where config files to run the script\nlive in a cloud configuration bucket. Usually those scripts run on a VM, but the \"-from-desktop\" bit means this\nscript is set up to run on your local laptop, usually after running \"gcloud auth application-default login\"\nto get the script to run locally using personal credentials. (Remember to run \"gcloud auth application-default revoke\"\nwhen done to get back to normal service account-driven script execution.)\n\nQuantifying this spending exactly is a work in progress, and currently involves educated guesswork as to how\na VM tag maps to specific VM and RAM SKUs. E.g., it appears that \"c2-standard\" might map to \"Compute optimized Core\"\nand \"Compute optimized Ram\". But only a small set of the search space has been confirmed; the rest is educated\nguessing.\n\nThe problem is even more complex for OS licensing, especially for Windows machines. The system is currently\nconfigured to try to handle the various cases (e.g. OS licensing costs based on GPU count, system RAM, etc.).\nAll pull requests to improve the mapping tables are welcome.\n\nFinally, these price estimates do not include tiered pricing and sustained use discounts (or external IP charges, yet).\nIt is intended to be conservative. 
But these enhancements would be useful.\n\nNote that the cloud function can be triggered from an HTTP entry point if you just want to run the realtime\nestimates. To track a budget and (maybe) shut down resources, the Pub/Sub entry is the one to use.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimagingdatacommons%2Frealtimecosts","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fimagingdatacommons%2Frealtimecosts","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimagingdatacommons%2Frealtimecosts/lists"}