dasl-/gcp-waste

gcp-waste

GCP Idle Resource Finder — identify underutilized Google Cloud resources to reduce cloud spending.

Scans Compute Engine VMs, Persistent Disks, Bigtable clusters, and Cloud Storage buckets, querying metrics from Cloud Monitoring to determine idleness based on configurable criteria.

Installation

python3 -m venv .venv
source .venv/bin/activate
pip install -e .

For development:

pip install -e ".[dev]"

Authentication

Authenticate with Google Cloud before running:

gcloud auth application-default login

Required IAM Permissions

  • monitoring.viewer — metrics access
  • compute.viewer — VM and disk listing
  • bigtable.viewer — Bigtable listing
  • storage.viewer — bucket listing

Missing permissions are detected and reported with remediation hints.

Usage

# Scan a single project
gcp-waste scan -p my-project

# Scan multiple projects matching a regex
gcp-waste scan -p "myorg-.*-dev"
gcp-waste scan -p "^prod-"

# Filter by resource type
gcp-waste scan -p my-project -t compute

# Custom config, JSON output, sorted by name
gcp-waste scan -p my-project -c config.yaml -o json -s name

# Interactive HTML report
gcp-waste scan -p my-project -o html > report.html

# Hide low-cost resources
gcp-waste scan -p my-project --min-cost 100

# Multiple output formats to files (table always shown on stdout)
gcp-waste scan -p my-project -o csv,html --output-path report

# High concurrency with quota project to avoid rate limits
gcp-waste scan -p ".*-dev" -j 16 --quota-project my-project

CLI Options

| Flag | Short | Default | Description |
|------|-------|---------|-------------|
| --project | -p | required | GCP project ID or regex pattern |
| --type | -t | all | Resource type: all, compute, persistent_disk, bigtable, storage |
| --config | -c | built-in defaults | Path to config YAML |
| --output | -o | table | Output format: table, json, csv, html (comma-separated for multiple, requires --output-path) |
| --output-path | | | Base file path for output files (extension added per format) |
| --sort | -s | cost | Sort by: cost, name, type, project, location, created |
| --min-age | | | Only scan resources older than N days |
| --idle-days | | | Require idleness for N consecutive days |
| --min-cost | | | Hide resources with estimated yearly cost below this amount (dollars) |
| --concurrency | -j | 4 | Max parallel workers for API calls |
| --quota-project | | | GCP project for API quota (avoids default 180 req/min limit) |
| --pricing-backend | | lookup | Pricing backend: lookup, bigquery, or custom dotted.module.ClassName |
| --bigquery-billing-table | | | Fully-qualified BigQuery table for billing export (required for bigquery backend) |
| --html-readme-uri | | | URI to link as README in the HTML output title |
| --verbose | -v | false | Verbose output |

Configuration

Copy the example config and customize:

cp config.example.yaml config.yaml

Idleness Criteria

Each resource type has configurable criteria that determine whether a resource is idle:

Compute VMs:

  • low_cpu — average CPU utilization below threshold (default: 5%)
  • low_network — average network throughput (sent + received) below threshold (default: 1000 bytes/sec)
  • low_egress — average egress (sent only) throughput below threshold (default: 1000 bytes/sec)
  • low_memory — average memory usage below threshold (default: 10%, requires Ops Agent)

VMs that have been running for fewer than min_age_days days are skipped because they have not yet accumulated enough metric data.

Persistent Disks:

  • low_disk_read — average read throughput below threshold (default: 1000 bytes/sec). No data (e.g. unattached disks) is treated as idle.

Bigtable:

  • low_read_bytes — average read throughput below threshold (default: 1000 bytes/sec)

Storage:

  • low_read_bytes — average egress throughput below threshold (default: 1000 bytes/sec)
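
Each criterion above boils down to comparing an average against a threshold. The following is a minimal sketch of that comparison, not the project's actual implementation: the function name below_threshold and the no_data_is_idle parameter are hypothetical, and the sketch assumes the metric samples have already been fetched from Cloud Monitoring.

```python
from statistics import fmean
from typing import Sequence


def below_threshold(
    samples: Sequence[float],
    threshold: float,
    no_data_is_idle: bool = False,
) -> bool:
    """Return True when the average of `samples` is below `threshold`.

    `no_data_is_idle` (hypothetical name) models the documented disk
    behavior, where a missing time series counts as idle.
    """
    if not samples:
        return no_data_is_idle
    return fmean(samples) < threshold


# CPU utilization as a fraction, checked against the 5% default.
cpu_idle = below_threshold([0.01, 0.03], threshold=0.05)

# An unattached disk reports no read samples and is treated as idle.
disk_idle = below_threshold([], threshold=1000, no_data_is_idle=True)
```

Whether the other resource types treat missing data as idle is not stated here, so the sketch defaults to not idle.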

Criteria Modes

Control how criteria combine to determine idleness:

  • "all" — all criteria must match (AND)
  • "any" — any criterion can match (OR)
  • "all(low_cpu, low_network)" — all of the listed criteria must match; unlisted criteria are skipped
  • "any(low_cpu, low_network)" — any of the listed criteria can match; unlisted are skipped
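
To make the four mode strings concrete, here is a rough sketch of how such a string could be parsed and applied to per-criterion results. The names parse_mode and is_idle are hypothetical and the real code in criteria/base.py may differ. The sketch also assumes that an empty set of criteria means "not idle".

```python
import re
from typing import Optional


def parse_mode(mode: str) -> tuple[str, Optional[list[str]]]:
    """Split a mode string into (combinator, listed criteria or None)."""
    match = re.fullmatch(r"\s*(all|any)\s*(?:\(([^)]*)\))?\s*", mode)
    if match is None:
        raise ValueError(f"unrecognized criteria mode: {mode!r}")
    combinator, names = match.group(1), match.group(2)
    if names is None:
        return combinator, None
    return combinator, [n.strip() for n in names.split(",") if n.strip()]


def is_idle(mode: str, results: dict[str, bool]) -> bool:
    """Combine per-criterion results according to the mode string."""
    combinator, selected = parse_mode(mode)
    if selected is None:
        considered = list(results.values())
    else:
        # Unlisted criteria are skipped entirely.
        considered = [results[name] for name in selected if name in results]
    if not considered:
        return False  # assumption: nothing to evaluate means not idle
    return all(considered) if combinator == "all" else any(considered)


results = {"low_cpu": True, "low_network": False, "low_memory": True}
print(is_idle("all", results))                       # False
print(is_idle("any", results))                       # True
print(is_idle("all(low_cpu, low_memory)", results))  # True
```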

Blocklist

Exclude known-good resources from scan results using exact names or glob patterns:

blocklist:
  my-project:
    compute:
      - "prod-web-*"
      - "critical-db-01"
    storage:
      - "backup-*"
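
Glob matching of this kind can be sketched with Python's standard fnmatch module. This is an illustration only: the helper is_blocklisted is a hypothetical name, and case-sensitive matching is an assumption rather than documented behavior.

```python
from fnmatch import fnmatchcase

# Mirrors the YAML structure above: project -> resource type -> patterns.
BLOCKLIST = {
    "my-project": {
        "compute": ["prod-web-*", "critical-db-01"],
        "storage": ["backup-*"],
    }
}


def is_blocklisted(blocklist: dict, project: str, resource_type: str, name: str) -> bool:
    """Return True if `name` matches any pattern listed for its project and type."""
    patterns = blocklist.get(project, {}).get(resource_type, [])
    # An exact name is a glob without wildcards, so one check covers both cases.
    return any(fnmatchcase(name, pattern) for pattern in patterns)


print(is_blocklisted(BLOCKLIST, "my-project", "compute", "prod-web-3"))      # True
print(is_blocklisted(BLOCKLIST, "my-project", "compute", "critical-db-02"))  # False
```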

Other Config Options

# Exclude projects matching these regex patterns
exclude_projects:
  - ".*-sandbox"
  - "test-.*"

# Hide resources with estimated yearly cost below this amount
min_yearly_cost: 50.0

See config.example.yaml for full documentation of all options.
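
As a rough illustration of how a --project regex and exclude_projects could interact, the sketch below filters a list of project IDs. The function select_projects is hypothetical. The sketch assumes unanchored matching with re.search, which is consistent with a pattern such as "^prod-" but is not confirmed by this README; with anchored matching the results would differ.

```python
import re
from typing import Iterable, Sequence


def select_projects(
    all_projects: Iterable[str],
    pattern: str,
    exclude_patterns: Sequence[str] = (),
) -> list[str]:
    """Keep projects that match `pattern` and match none of `exclude_patterns`."""
    include = re.compile(pattern)
    excludes = [re.compile(p) for p in exclude_patterns]
    selected = []
    for project in all_projects:
        if not include.search(project):  # assumption: unanchored search
            continue
        if any(rx.search(project) for rx in excludes):
            continue
        selected.append(project)
    return selected


projects = ["myorg-api-dev", "myorg-api-prod", "myorg-tmp-sandbox", "test-runner"]
print(select_projects(projects, "myorg-.*", [".*-sandbox", "test-.*"]))
# ['myorg-api-dev', 'myorg-api-prod']
```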

BigQuery Pricing

The default lookup pricing backend uses hardcoded rate tables for cost estimates. For actual costs based on your billing data, use the bigquery backend with a detailed usage cost billing export table.

Setup

  1. Enable billing export to BigQuery with Detailed usage cost data enabled.
  2. Note the fully-qualified table name (format: project.dataset.gcp_billing_export_resource_v1_XXXXXX_YYYYYY_ZZZZZZ).
  3. Install the BigQuery dependency: pip install -e ".[bigquery]"

Usage

# Via CLI flag
gcp-waste scan -p my-project --pricing-backend bigquery \
  --bigquery-billing-table "my-project.my_dataset.gcp_billing_export_resource_v1_AAAAAA_BBBBBB_CCCCCC"

# Or set in config.yaml to avoid repeating:
#   bigquery_billing_table: "my-project.my_dataset.gcp_billing_export_resource_v1_AAAAAA_BBBBBB_CCCCCC"
gcp-waste scan -p my-project --pricing-backend bigquery

The backend queries a 26-day window (30 days ago to 4 days ago, excluding recent unsettled data) and annualizes the costs. Resources not found in the billing export fall back to lookup table estimates.
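
The window and annualization arithmetic can be sketched as follows. The function names are hypothetical, and scaling by 365 days per year is an assumption; the README does not state the exact factor or whether the window endpoints are inclusive.

```python
from datetime import date, timedelta


def billing_window(today: date) -> tuple[date, date]:
    """Return (start, end) of the window: 30 days ago to 4 days ago."""
    return today - timedelta(days=30), today - timedelta(days=4)


def annualize(window_cost: float, window_days: int = 26) -> float:
    """Scale a cost observed over `window_days` to a yearly estimate."""
    return window_cost * 365 / window_days  # assumption: 365-day year


start, end = billing_window(date(2024, 3, 31))
print(start, end, (end - start).days)  # 2024-03-01 2024-03-27 26
print(annualize(26.0))                 # 365.0
```

Under these assumptions, a resource that cost $26 over the window is estimated at $365 per year.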

HTML Output

The -o html format produces a self-contained HTML file with an interactive table (powered by Tabulator). Features:

  • Sortable columns — click column headers
  • Filter bar — regex filtering on project/name/location/reasons, type dropdown, min cost, date range
  • Shareable URLs — filter/sort state encoded in the URL hash fragment
  • Live cost total — updates as you filter
  • Clickable links — resource names link to GCP Console
  • Diff/compare — compare two reports to see what changed (see below)

gcp-waste scan -p "myorg-.*" -o html > report.html
gcp-waste scan -p my-project -o html --html-readme-uri="https://wiki/runbook" > report.html

Comparing Reports

The HTML output includes a built-in diff feature for comparing two reports side-by-side. This is useful for tracking changes over time — e.g., which idle resources were cleaned up, which are new, and whether costs shifted.

Triggering a comparison:

  • Menu dropdown — click the hamburger menu (☰) in the top-right. If the report is served from a web server, sibling .html files in the same directory are auto-discovered in a dropdown.
  • Browse button — pick any local HTML report file from disk.
  • Shareable URL — append #compare=old_report.html to the URL to load a comparison automatically.

Visual markers:

| Marker | Meaning |
|--------|---------|
| Green left border | Added: resource is in the new report but not the old |
| Pink left border (strikethrough, faded) | Removed: resource was in the old report but not the new |
| Yellow left border | Cost changed: resource exists in both, cost differs by >25% |

The summary bar updates to show a cost breakdown: total, added, removed, and changed amounts. The URL hash updates with the comparison state so the exact diff view can be shared as a link.
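
The classification behind these markers can be sketched in a few lines. The real comparison runs in the report's JavaScript, so this Python version only illustrates the logic. The function classify is a hypothetical name, and measuring the 25% difference relative to the old cost is an assumption.

```python
def classify(
    old: dict[str, float],
    new: dict[str, float],
    threshold: float = 0.25,
) -> dict[str, str]:
    """Label each resource as added, removed, changed, or unchanged."""
    markers = {}
    for key in sorted(old.keys() | new.keys()):
        if key not in old:
            markers[key] = "added"
        elif key not in new:
            markers[key] = "removed"
        else:
            old_cost, new_cost = old[key], new[key]
            if old_cost == 0:
                changed = new_cost != 0
            else:
                # assumption: relative difference is measured against the old cost
                changed = abs(new_cost - old_cost) / old_cost > threshold
            markers[key] = "changed" if changed else "unchanged"
    return markers


old_report = {"vm-a": 100.0, "vm-b": 200.0, "disk-c": 50.0}
new_report = {"vm-a": 130.0, "vm-b": 210.0, "bucket-d": 10.0}
print(classify(old_report, new_report))
# {'bucket-d': 'added', 'disk-c': 'removed', 'vm-a': 'changed', 'vm-b': 'unchanged'}
```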

Scaling to Many Projects

Rate Limits

The Cloud Monitoring API has a default quota of 180 requests/min/user when using Application Default Credentials. When scanning many projects concurrently, use --quota-project to route API quota through your own project (which typically has a much higher limit):

gcp-waste scan -p ".*" -j 16 --quota-project my-project

File Descriptor Limits

High concurrency across many projects opens many gRPC connections simultaneously. On macOS the default file descriptor limit (256) may be too low, causing Too many open files errors. Raise it before running:

ulimit -n 2048 && gcp-waste scan -p ".*" -j 16 --quota-project my-project

To make this permanent, add ulimit -n 2048 to your ~/.zshrc or ~/.bashrc.
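
If you launch scans from a Python wrapper script instead of a shell, the same limit can be inspected and raised with the standard resource module (Unix only). This is a sketch under that assumption: raise_fd_limit is a hypothetical helper and is not part of gcp-waste, and the change applies only to the current process and its children.

```python
import resource


def raise_fd_limit(target: int = 2048) -> int:
    """Raise the soft open-file limit toward `target`; return the resulting soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # the soft limit cannot exceed the hard limit
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already high enough, leave it alone
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target


new_soft = raise_fd_limit(2048)
print("soft file descriptor limit:", new_soft)
```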

Development

# Run tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=waste

Project Structure

src/waste/
  cli.py              # CLI entry point (Typer)
  config.py            # YAML config loading (Pydantic)
  models.py            # IdleResource, ScanResult dataclasses
  output.py            # Table/JSON/CSV formatters (Rich)
  html_template.py     # Interactive HTML output (Tabulator JS)
  monitoring.py        # Cloud Monitoring API wrapper
  pricing.py           # Cost estimation (lookup tables)
  bigquery_pricing.py  # Cost estimation (BigQuery billing export)
  checkers/            # Resource type scanners
    base.py            # Abstract base checker
    registry.py        # Checker registry
    compute.py         # Compute Engine VMs
    persistent_disk.py # Persistent Disks
    bigtable.py        # Bigtable clusters
    storage.py         # Cloud Storage buckets
  criteria/            # Composable idleness criteria
    base.py            # Criterion and CriteriaGroup
    cpu.py, egress.py, network.py, memory.py, disk.py, requests.py, access.py
  vendor/              # Vendored JS/CSS for HTML output
  utils/
    permissions.py     # Permission checking with remediation hints
