https://github.com/mindaugasnakrosis/azure-costs-analyzer
Read-only Azure cost & FinOps audit, delivered as a Claude Code skill. Snapshots a tenant via az CLI, evaluates against Microsoft + FinOps Foundation rules, produces a written analysis suitable for a PE operating partner.
azure azure-finops claude-code claude-skill cloud-cost cost-optimization finops pe-operations
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/mindaugasnakrosis/azure-costs-analyzer
- Owner: mindaugasnakrosis
- License: mit
- Created: 2026-04-29T15:44:25.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2026-04-29T16:13:12.000Z (about 2 months ago)
- Last Synced: 2026-04-29T18:14:53.025Z (about 2 months ago)
- Topics: azure, azure-finops, claude-code, claude-skill, cloud-cost, cost-optimization, finops, pe-operations
- Language: Python
- Size: 138 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: docs/contributing-a-rule.md
- License: LICENSE
README
# azure-investigator
**Read-only Azure cost & FinOps audit, delivered as a Claude Code skill.** Snapshots an Azure tenant via the `az` CLI, evaluates against published Microsoft + FinOps Foundation rules, and produces a written analysis (`report.md` + `findings.yaml`) suitable for forwarding to a portfolio CTO.
> Built for the **20-minute Azure cost review** of a freshly-acquired portfolio company — every finding grounded in a citable authority (Azure Well-Architected Framework, Cloud Adoption Framework, Advisor recommendation reference, FinOps Foundation, Retail Prices API).
---
## Who this is for
- **PE operating partners and portco CTOs** running cost audits post-acquisition or pre-investment, who need a forward-able artefact in week 1, not week 4.
- **FinOps practitioners** who want a starting point for a structured Azure cost review with citation-grounded thresholds rather than vibes.
- **DevOps / platform engineers** who want a read-only inventory of cost waste in a tenant they've inherited.
- **Claude Code users** who want a real-world example of a skill with a proper persona, knowledge corpus, and architectural firewall.
If you need a remediation tool, this isn't it. The skill is **read-only by architectural guarantee** (33 forbidden write verbs, enforced at the subprocess boundary, 23 unit tests). It produces investigations, not actions.
---
## What it produces
A `report.md` that opens with a headline GBP savings range, the top-3 quick wins, and the top-3 strategic recommendations, followed by severity-grouped findings, plus a flat `findings.yaml` for any downstream consumer. See [`docs/example-report.md`](docs/example-report.md) for a full sanitised sample.
```
# Azure cost review — 2026-04-29T14-41-29Z
- Total estimated monthly savings: £97 – £316 / month
- Findings by severity: Critical 0 · High 0 · Medium 12 · Low 136 · Info 60
## Top 3 quick wins
1. Orphaned managed disk: ...-containerRootVolume — £31–£38/mo (severity Medium, confidence High)
2. Orphaned managed disk: ...-osDisk — £4/mo (severity Medium, confidence High)
3. Unattached Standard public IP: pip-test-natgw-01 — £2–£3/mo (severity Medium, confidence High)
```
Every finding cites the `knowledge/*.md` document grounding it. Savings figures are GBP retail-rate ceilings — reservations and negotiated discounts are explicitly not netted out. Severity (Critical → Info) and confidence (High / Medium / Low) are separate axes, both reported.
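To illustrate why severity and confidence are kept as separate axes, here is a minimal sketch of a finding record. Only the `SavingsRange` name comes from this README; the field names, the enum values' ordering, and the `quick_wins` ranking are assumptions for illustration, not the schema the tool actually emits.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    INFO = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


class Confidence(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2


@dataclass(frozen=True)
class SavingsRange:
    low_gbp: float   # hypothetical field: monthly lower bound, retail rate
    high_gbp: float  # hypothetical field: monthly upper bound, retail rate


@dataclass(frozen=True)
class Finding:
    title: str
    severity: Severity      # how much it matters if true
    confidence: Confidence  # how sure the rule is that it is true
    savings: SavingsRange
    knowledge_ref: str      # knowledge/*.md document grounding the finding


def quick_wins(findings, limit=3):
    """Assumed ranking: High-confidence findings, largest upper bound first."""
    confident = [f for f in findings if f.confidence is Confidence.HIGH]
    return sorted(confident, key=lambda f: f.savings.high_gbp, reverse=True)[:limit]
```

Because the two axes are independent fields, a reviewer can filter on confidence without losing the severity grouping used in the report.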
---
## Architectural guarantees
- **Read-only is absolute.** The core `azcli.py` wrapper refuses 33 write verbs (`update`, `delete`, `create`, `set`, `add`, `remove`, `assign`, `start`, `stop`, `restart`, `deallocate`, `tag update`, `policy assignment`, …) at the subprocess boundary. Verified by 23 dedicated unit tests. Safe to run against production without a change-management window.
- **Knowledge corpus is hardcoded, versioned, citable.** Each `knowledge/*.md` ships with frontmatter (canonical URL, retrieval date, content SHA-256, `cited_by` list) and quotes verbatim the source text behind the rule it grounds. The analyser refuses to run a rule whose declared `knowledge_refs` are absent. No live web fetches at runtime.
- **Two skills, one shared core.** `azure-cost-investigator` (FinOps persona) ships in v1; `azure-security-investigator` (security persona, same architecture) is a reserved namespace today and ships in v2. Skills never import each other.
- **GBP currency.** Retail Prices API queried with `currencyCode=GBP`; reports format figures as £.
- **Subscription scope.** Iterates every subscription the signed-in identity can access. `--subscription` and `--exclude` flags narrow scope.
- **Severity ≠ confidence.** A Medium-severity orphan disk (deterministic) and a Medium-severity oversized VM (CPU-only heuristic) are reported with different confidence levels so a reviewer knows where to push back.
See [`docs/architecture.md`](docs/architecture.md) for the full rationale.
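The read-only guarantee above can be pictured as a check that runs before any `az` subprocess is spawned. The following is a minimal sketch under stated assumptions: the function and exception names are hypothetical, the verb list contains only the verbs named in this README (the real wrapper refuses 33), and the actual matching logic in `azcli.py` may differ.

```python
import subprocess

# Only the verbs listed in this README; the real list is longer.
FORBIDDEN_VERBS = frozenset({
    "update", "delete", "create", "set", "add", "remove", "assign",
    "start", "stop", "restart", "deallocate",
})
FORBIDDEN_PHRASES = (("tag", "update"), ("policy", "assignment"))


class WriteVerbRefused(RuntimeError):
    """Hypothetical exception raised before any subprocess is started."""


def command_words(args):
    """Return the command words that precede the first option flag."""
    words = []
    for arg in args:
        if arg.startswith("-"):
            break
        words.append(arg)
    return words


def assert_read_only(args):
    """Raise WriteVerbRefused if the az command contains a forbidden verb."""
    words = command_words(args)
    for word in words:
        if word in FORBIDDEN_VERBS:
            raise WriteVerbRefused(f"refusing write verb {word!r} in: az {' '.join(args)}")
    for phrase in FORBIDDEN_PHRASES:
        n = len(phrase)
        if any(tuple(words[i:i + n]) == phrase for i in range(len(words) - n + 1)):
            raise WriteVerbRefused(f"refusing {' '.join(phrase)!r} in: az {' '.join(args)}")


def run_az(args):
    """Run `az` only after the read-only check has passed."""
    assert_read_only(args)
    return subprocess.run(["az", *args], capture_output=True, text=True, check=False)
```

In this sketch `run_az(["vm", "list", "-o", "json"])` would proceed, while `run_az(["disk", "delete", "--name", "d1"])` raises before `az` is ever invoked. Checking at the single point where subprocesses are created is what makes the guarantee independent of any individual rule or collector.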
---
## Layout
```
packages/
azure-investigator-core/ # shared: auth, az wrapper, snapshot, pricing, schema, knowledge loader
azure-cost-investigator/ # cost / FinOps skill (v1) — 11 rules, 11-doc knowledge corpus
azure-security-investigator/ # reserved namespace; ships in v2 (stub today)
scripts/
install_skill.sh # symlinks each SKILL.md into ~/.claude/skills/
refresh_knowledge.py # maintainer-only: re-fetch knowledge sources, surface drift
docs/
architecture.md # one-page: why two skills + one core
contributing-a-rule.md # how to author a new cost rule
example-report.md # sanitised sample report.md output
```
---
## Requirements
- **Python 3.11+** (3.12 also tested in CI).
- **[`uv`](https://docs.astral.sh/uv/)** for the Python toolchain. Install via `curl -LsSf https://astral.sh/uv/install.sh | sh` or your platform's package manager.
- **[Azure CLI (`az`)](https://learn.microsoft.com/cli/azure/install-azure-cli)** logged into the tenant you want to analyse.
- **[Claude Code](https://claude.com/code)** if you want to use the skill experience (the CLI works without it).
- *(optional)* The `reservation` extension if your tenant has reservations: `az extension add --name reservation`.
### Required Azure permissions
The signed-in `az` identity needs **read access** to whatever you want analysed. The minimum set:
| Scope | Built-in role | What it enables |
|---|---|---|
| Subscription | **Reader** | All resource-graph collectors (vms, disks, public_ips, nics, snapshots, app_service_plans, app_services, sql, storage_accounts, resources, tags, advisor) |
| Subscription | **Monitoring Reader** *(Reader is usually enough)* | `vm_metrics`, `consumption` |
| Reservation order / billing scope | **Reservations Reader** | `reservations` collector + utilisation merge |
If a role is missing, the corresponding collector emits a structured error in `manifest.yaml` and the rules that depend on it downgrade to Info findings; the run never aborts. You can re-run after granting the role.
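The downgrade behaviour can be sketched as a thin wrapper around each rule. The manifest shape (a mapping from collector name to a `status` field), the function name, and the finding dictionary are all assumptions made for this example; the real `manifest.yaml` layout is not shown in this README.

```python
def evaluate_rule(rule_name, required_collectors, collector_status, run_rule):
    """Run a rule, or emit a single Info finding if its inputs are unavailable.

    collector_status is assumed to look like:
        {"disks": {"status": "ok"}, "reservations": {"status": "error"}}
    """
    failed = [
        name for name in required_collectors
        if collector_status.get(name, {}).get("status") != "ok"
    ]
    if failed:
        # Report the gap instead of aborting or producing findings from partial data.
        return [{
            "rule": rule_name,
            "severity": "Info",
            "title": f"{rule_name} skipped: collector(s) unavailable: {', '.join(failed)}",
        }]
    return run_rule()
```

The design choice this illustrates is that a missing permission becomes a visible line in the report rather than a silent omission or a crashed run.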
---
## Quickstart
```bash
git clone https://github.com/mindaugasnakrosis/azure-costs-analyzer.git
cd azure-costs-analyzer
uv sync --all-packages
bash scripts/install_skill.sh # only needed if you want it as a Claude Code skill
```
Then, against your tenant:
```bash
az login
uv run azure-investigator init # writes ~/.config/azure-investigator/config.yaml
uv run azure-investigator doctor # verifies environment + corpus
uv run azure-investigator pull # snapshots every accessible subscription (5–15 min)
uv run azure-cost-investigator analyse latest
```
Outputs land next to the snapshot manifest:
| OS | Snapshot root |
|---|---|
| Linux | `~/.local/share/azure-investigator/snapshots//` |
| macOS | `~/Library/Application Support/azure-investigator/snapshots//` |
| Windows | `%LOCALAPPDATA%\azure-investigator\snapshots\\` |
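As a rough illustration of the per-OS locations in the table, the lookup could be written as below. The function name and signature are hypothetical, and the real tool may honour additional overrides (for example environment variables) that this README does not describe.

```python
from pathlib import PurePosixPath, PureWindowsPath


def snapshot_root(platform, home, local_app_data=None):
    """Return the snapshot root for a given sys.platform value.

    `home` is the user's home directory; `local_app_data` is the value of
    %LOCALAPPDATA% and is only used on Windows.
    """
    if platform == "win32":
        return PureWindowsPath(local_app_data) / "azure-investigator" / "snapshots"
    if platform == "darwin":
        return (PurePosixPath(home) / "Library" / "Application Support"
                / "azure-investigator" / "snapshots")
    # Everything else is treated as Linux in this sketch.
    return PurePosixPath(home) / ".local" / "share" / "azure-investigator" / "snapshots"
```

Pure path classes are used so the example gives the same answer regardless of the OS it runs on.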
Per snapshot:
```
manifest.yaml per-collector status, identity, subscriptions
subscriptions// raw `az` payloads (one JSON per collector family)
pricing/ snapshot-time price cache (reproducible reports)
report.md ← user-facing artefact
findings.yaml ← machine-readable findings
```
---
## Using as a Claude Code skill
After `bash scripts/install_skill.sh`, restart Claude Code (or run `/skills`). Then trigger the skill with a natural-language prompt — Claude reads `SKILL.md`, decides this skill matches, and drives the CLIs for you.
Example prompts that should trigger the skill:
- *"Run an Azure cost review on my tenant. Use the latest snapshot."*
- *"Where is money being wasted in this Azure subscription? Walk me through the top 3 quick wins."*
- *"Do a 20-minute FinOps assessment of the production subscription. Cite the knowledge documents you're relying on."*
The skill will (in order): check `azure-investigator doctor` → decide whether to `pull` or reuse `latest` → run `azure-cost-investigator analyse` → narrate `report.md` to you, lifting the verbatim assumptions from each `SavingsRange` and citing the `knowledge/*.md` grounding each finding.
If you want to drive the engine directly without going through Claude Code, just use the CLI verbs above — the skill is optional.
---
## CLI surface
```bash
# core
azure-investigator init
azure-investigator doctor
azure-investigator pull [--subscription ...] [--exclude ...] [--collector ...]
azure-investigator snapshot ls
azure-investigator snapshot show
azure-investigator schema [finding|snapshot]
# cost skill
azure-cost-investigator analyse [--rule ...] [--exclude-rule ...] [--no-show]
azure-cost-investigator report [--format md|json] [--output PATH]
azure-cost-investigator knowledge list
azure-cost-investigator knowledge show
azure-cost-investigator schema [finding|report]
# security skill — v2 stub
azure-security-investigator analyse # prints "v2 — not implemented", exits 2
```
No mutating verbs. No `apply`, `remediate`, `fix`, `delete`. The naming is part of the read-only contract.
---
## Authorities the cost skill grounds itself in
| Authority | Used by |
|---|---|
| [Azure Well-Architected Framework — Cost Optimization pillar](https://learn.microsoft.com/en-us/azure/well-architected/cost-optimization/principles) | Strategic narrative, dev-vs-prod SKU mismatches |
| [Microsoft Cloud Adoption Framework — resource tagging](https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/ready/azure-best-practices/resource-tagging) | Untagged-resources rule, governance findings |
| [Azure Advisor — cost recommendation reference](https://learn.microsoft.com/en-us/azure/advisor/advisor-reference-cost-recommendations) | Mirroring the canonical taxonomy of cost findings |
| [Azure Advisor — VM / VMSS shutdown + resize logic](https://learn.microsoft.com/en-us/azure/advisor/advisor-cost-recommendations) | Verbatim P95 CPU + outbound thresholds for idle / oversized VMs |
| [FinOps Foundation framework](https://www.finops.org/framework/) | Inform / Optimize / Operate phases; reservation utilisation threshold |
| [Azure Retail Prices REST API](https://learn.microsoft.com/en-us/rest/api/cost-management/retail-prices/azure-retail-prices) | All GBP savings figures |
The full corpus is 11 in-repo `.md` files at `packages/azure-cost-investigator/src/azure_cost_investigator/knowledge/`. List with `azure-cost-investigator knowledge list`; read individual docs with `azure-cost-investigator knowledge show `.
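For readers unfamiliar with the Azure Retail Prices API listed in the table above, this sketch builds a GBP query URL for one VM SKU in one region. The endpoint, the `currencyCode` parameter, and the `armSkuName`, `armRegionName`, and `priceType` filter fields are part of the public API; the helper name and the choice of filter are assumptions and not necessarily what the tool's `PricingClient` sends.

```python
from urllib.parse import quote, urlencode

RETAIL_PRICES_ENDPOINT = "https://prices.azure.com/api/retail/prices"


def retail_prices_url(arm_sku_name, arm_region_name, currency="GBP"):
    """Build a Retail Prices query for pay-as-you-go prices of one SKU."""
    odata_filter = (
        f"armSkuName eq '{arm_sku_name}' "
        f"and armRegionName eq '{arm_region_name}' "
        "and priceType eq 'Consumption'"
    )
    query = urlencode(
        {"currencyCode": f"'{currency}'", "$filter": odata_filter},
        quote_via=quote,  # encode spaces as %20 rather than '+'
    )
    return f"{RETAIL_PRICES_ENDPOINT}?{query}"
```

The API needs no authentication; the JSON response carries an `Items` list (each with a `retailPrice`) and a `NextPageLink` for pagination. Caching those responses at snapshot time is what lets a report be reproduced later even if prices change.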
---
## Cost rules implemented in v1
| Rule | Severity (typical) | Confidence | Authority |
|---|---|---|---|
| `orphaned_disks` | Medium | High | Microsoft "unattached disks" + Advisor |
| `unattached_public_ips` | Medium / High | High | Standard SKU billing + Basic SKU retirement |
| `stopped_not_deallocated_vms` | Critical | High | Advisor + VM lifecycle |
| `idle_vms` | Medium | Medium | Advisor P95 CPU < 3% (verbatim) |
| `oversized_vms` | Medium | Low | Advisor user-facing target P95 ≤ 40% |
| `unused_app_service_plans` | High | High | App Service plan billing model + Advisor |
| `old_snapshots` | Medium | High | Cool tier 90-day minimum + Advisor |
| `underused_reservations` | Medium | Medium | FinOps Foundation 80% / 30-day |
| `dev_skus_in_prod` | Medium | Medium | WAF Principle 2 + CAF tagging |
| `untagged_costly_resources` | Low | High | CAF tagging schema + FinOps Inform phase |
| `legacy_storage_redundancy` | Low / Medium | Medium | Storage redundancy + WAF cost pillar |
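The two CPU thresholds in the table (`idle_vms` at P95 CPU < 3%, `oversized_vms` against a P95 ≤ 40% target) can be sketched as follows. This is an interpretation, not the rules' actual code: the nearest-rank percentile, the function names, and the idea of projecting P95 onto a smaller SKU by scaling with the vCPU ratio are all assumptions.

```python
IDLE_P95_CPU_PCT = 3.0        # idle_vms threshold from the table above
TARGET_P95_CPU_PCT = 40.0     # oversized_vms target from the table above


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of CPU percentages."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def classify_vm(cpu_samples, current_vcpus, smaller_vcpus):
    """Return 'idle', 'oversized-candidate', or 'ok' from CPU samples alone."""
    observed = p95(cpu_samples)
    if observed < IDLE_P95_CPU_PCT:
        return "idle"
    # Assumption: load scales inversely with vCPU count on the smaller SKU.
    projected = observed * current_vcpus / smaller_vcpus
    if projected <= TARGET_P95_CPU_PCT:
        return "oversized-candidate"
    return "ok"
```

A CPU-only check like this ignores memory, disk, and network pressure, which is consistent with `oversized_vms` being reported at Low confidence.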
To add a twelfth, see [`docs/contributing-a-rule.md`](docs/contributing-a-rule.md). The discipline is *knowledge document first, then code, then tests* — and the analyser refuses to run a rule whose `knowledge_refs` are missing.
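The refusal to run a rule with missing `knowledge_refs` amounts to a set-difference check before evaluation. In this sketch the `KnowledgeRefMissing` name is taken from the Troubleshooting section; the helper names, the arguments, and the message format are hypothetical.

```python
from pathlib import Path


class KnowledgeRefMissing(RuntimeError):
    """Raised when a rule cites a knowledge document that is not in the corpus."""


def corpus_docs(knowledge_dir):
    """List the Markdown documents present in a knowledge directory."""
    return {path.name for path in Path(knowledge_dir).glob("*.md")}


def ensure_knowledge_refs(rule_name, knowledge_refs, available_docs):
    """Refuse to proceed unless every cited document is available."""
    missing = sorted(set(knowledge_refs) - set(available_docs))
    if missing:
        raise KnowledgeRefMissing(
            f"{rule_name}: missing knowledge document(s): {', '.join(missing)}"
        )
```

Running this check up front means a rule can never emit a finding whose citation points at a document the reader cannot open.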
---
## Troubleshooting
**`az login active` fails in `doctor`.** Run `az login` and confirm you can run a read like `az account show -o table`. The skill never runs `az login` for you.
**`disks` collector errors with `the following arguments are required: --resource-group`.** Some Azure CLI builds reject `az disk list` without `-g`. The collector falls back to per-resource-group enumeration; if every per-RG call fails the `orphaned_disks` rule downgrades to an Info finding instead of running blind. Updating the Azure CLI usually clears it.
**`reservations` collector returns "all 'utilisation unknown'".** The `reservation` extension may be missing or out of date. Run `az extension add --name reservation` (or `az extension update --name reservation`) and re-pull. The next snapshot will merge `avgUtilizationPercentage` from `az consumption reservation summary list` onto each reservation record.
**`consumption` collector times out.** Cost Management can be slow on large subscriptions. The collector uses a 600s per-call timeout. If it still times out, narrow the pull with `--collector` to skip `consumption` for the first run; you can re-pull later for that one collector.
**`analyse` errors with `KnowledgeRefMissing`.** A rule cites a knowledge document that's not in the corpus. Either the file was renamed (update the rule's `KNOWLEDGE_REFS`) or you ran from a partial install (`uv sync --all-packages` from the repo root re-establishes the corpus).
**`bash scripts/install_skill.sh` reports `warning: ... exists and is not a symlink`.** A previous install left a real file at `~/.claude/skills//SKILL.md`. The script backs it up with a timestamped suffix and then symlinks; the warning is informational, not an error.
---
## Tests
```bash
uv run pytest # 128 passed, 1 skipped (~1s)
# optional: run the smoke test against a real snapshot
AZURE_INVESTIGATOR_SMOKE_SNAPSHOT=/path/to/snapshot \
uv run pytest packages/azure-cost-investigator/tests/test_real_snapshot_smoke.py
```
The smoke test runs every rule against a real on-disk snapshot. It silently skips without the env var so CI / fresh checkouts stay green without a tenant.
Lint + format:
```bash
uv run ruff check .
uv run ruff format --check .
```
CI runs all three on push and on PR against `main`, on Python 3.11 and 3.12.
---
## Roadmap
- **`azure-security-investigator` (v2).** Same core, same architectural firewall, security persona. Knowledge corpus will quote Microsoft Cloud Security Benchmark and CIS Microsoft Azure Foundations Benchmark verbatim.
- **Outbound-network metric collection** for VMs. Would lift `idle_vms` confidence from Medium to High by completing Microsoft's full Advisor shutdown criterion.
- **`PricingClient` wired into rule output** for per-finding Retail Prices API lookups (currently uses packaged GBP/instance bands as ceilings).
- **FinOps Foundation `/framework/phases/`** verbatim quotes (currently a `TODO: refetch` block in `knowledge/finops-framework.md`).
- **Resource-graph-based pull mode** as a faster alternative for whole-tenant snapshots (currently per-RG enumeration for disks).
If you have an authority, a metric, or a rule you'd want grounded — open an issue. The pattern of "verbatim quote → citing rule → testable threshold" is reusable for anything with published thresholds.
---
## Contributing
See [`docs/contributing-a-rule.md`](docs/contributing-a-rule.md) for adding a new cost rule. See [`CONTRIBUTING.md`](CONTRIBUTING.md) for the broader contribution flow (issues, PRs, code review).
If you find a security issue (especially anything that could let the read-only firewall be bypassed), see [`SECURITY.md`](SECURITY.md) for responsible-disclosure instructions.
---
## License
MIT — see [`LICENSE`](LICENSE).