
Cloud Resource Allocation: Stop Over-Provisioning, Start Measuring

Most cloud bills are 30-50% waste. Here's how to actually right-size workloads without breaking them — and why autoscaling is not the answer most of the time.

John Lane 2023-02-06 5 min read

Every cloud bill we audit has the same shape: 30 to 50 percent of spend is on resources that are either idle, oversized, or provisioned for a peak that happens twice a year. Customers hear this number and assume we're exaggerating. We're not. After looking at several hundred environments, it's almost universal.

This is not a moral failing. It's the predictable result of how cloud procurement actually happens — someone needs something to work by Friday, nobody wants to get paged at 3 am, and the cost of "one size bigger" feels cheaper than the cost of being wrong. Multiply that decision across a year and you've built a portfolio of expensive guesses.

Here's how we actually fix it.

1. Measure Before You Touch Anything

The single biggest mistake is resizing instances based on vendor recommendations from AWS Compute Optimizer or Azure Advisor without looking at the underlying data. Those tools default to lookback windows of one to two weeks, which is often too short to catch monthly batch jobs or quarterly reporting workloads. They also tend to report on CPU and memory in isolation, ignoring IOPS and network throughput.

Before you change anything, collect at least 30 days of:

  • CPU utilization at 1-minute resolution (not 5-minute averages)
  • Memory working set, not allocated memory
  • Disk IOPS and throughput at p95 and p99
  • Network bytes in and out, both sustained and burst

If you can't see that data today, fix your telemetry before you fix your bill. Making decisions on bad data is worse than making no decisions.
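
If you're on AWS and have nothing better in place, CloudWatch can give you a first pass. The sketch below pulls per-minute CPU data for one instance and summarizes it. The instance ID is a placeholder, it assumes detailed monitoring is enabled (basic monitoring only emits 5-minute datapoints), and it looks back only 14 days because CloudWatch retains 1-minute datapoints for roughly 15 days; a 30-day baseline means exporting as you go or using a metrics pipeline that keeps full resolution longer.

```python
"""Rough sketch: pull per-minute CPU data for one EC2 instance and summarize it.

Assumptions: boto3 credentials are configured and the instance has detailed
monitoring enabled. The instance ID is a placeholder.
"""
from datetime import datetime, timedelta, timezone

import boto3

INSTANCE_ID = "i-0123456789abcdef0"  # placeholder, not a real instance

cw = boto3.client("cloudwatch")
end = datetime.now(timezone.utc)
start = end - timedelta(days=14)  # 1-minute granularity is only kept ~15 days

resp = cw.get_metric_data(
    MetricDataQueries=[{
        "Id": "cpu_max",
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/EC2",
                "MetricName": "CPUUtilization",
                "Dimensions": [{"Name": "InstanceId", "Value": INSTANCE_ID}],
            },
            "Period": 60,       # 1-minute resolution, not 5-minute averages
            "Stat": "Maximum",  # per-minute peak, which is what sizing cares about
        },
    }],
    StartTime=start,
    EndTime=end,
)

# Summarize the series client-side: these are the numbers a right-sizing
# decision should rest on, not a single 30-day average.
values = sorted(resp["MetricDataResults"][0]["Values"])
p95 = values[int(len(values) * 0.95)]
p99 = values[int(len(values) * 0.99)]
print(f"samples={len(values)}  p95={p95:.1f}%  p99={p99:.1f}%  max={values[-1]:.1f}%")
```

Memory working set and disk percentiles need an agent (CloudWatch Agent, a node exporter, or whatever you already run); EC2 does not report them natively.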

2. Right-Size the Instance Family, Not Just the Size

This is where most cost optimization projects leave money on the table. Moving from an m5.2xlarge to an m5.xlarge is the obvious move. Moving from m5 to m7g (ARM/Graviton) on AWS saves another 20 percent on top of that, and for most generic workloads the migration is zero code. Azure has similar generation jumps. GCP's Tau T2A instances behave the same way.

The hard part is knowing which workloads tolerate ARM. Most .NET 8 apps do. Most Java, Python, Node, and Go apps do. Legacy .NET Framework does not. Anything with native compiled dependencies needs a build pipeline change. Test before you cut over; don't trust the benchmarks.
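
Before the cutover, a ten-line smoke test run on an ARM instance or an arm64 container build catches most of the obvious breakage. The module names below are illustrative assumptions; substitute your own list of native-extension dependencies.

```python
"""Minimal ARM smoke test to run on a candidate Graviton/Ampere instance or
arm64 container build before shifting real traffic. The module list is an
illustrative assumption; swap in your own dependencies."""
import importlib
import platform

assert platform.machine() in ("aarch64", "arm64"), "not actually running on ARM"

# Pure-Python code rarely breaks on ARM; packages with compiled extensions are
# where migrations fail, usually at import time or on first use.
for mod in ("numpy", "psycopg2", "cryptography", "lxml"):
    importlib.import_module(mod)

print("native dependencies import cleanly on", platform.machine())
```

Run it in the same image your build pipeline produces for arm64, then follow with your normal integration suite under production-shaped load; a clean import is necessary, not sufficient.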

3. Reserved Capacity for the Floor, On-Demand for the Peak

The rule we give customers: whatever capacity your workload uses 24/7, cover it with a 3-year reserved instance or savings plan. Everything above that floor, pay on-demand. This typically cuts compute spend by 40 to 60 percent on the committed portion with zero performance impact.

The common mistake is committing to too much. Companies look at their peak usage, sign a 3-year commitment for that level, and then discover they only hit peak for 2 hours a day. Now they're paying reserved pricing for on-demand workloads. The math stops working.

Buy reservations for your p10, not your p90. Let the spiky stuff be on-demand.
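
Concretely, sizing the floor is just a percentile over hourly usage. The sketch below assumes you have already exported something like concurrent vCPUs per hour from Cost Explorer, the CUR, or your own telemetry; the numbers are made up to show the shape of the decision.

```python
"""Sketch: size a reservation from hourly usage history (illustrative numbers)."""

def committed_floor(hourly_usage: list[float], percentile: float = 0.10) -> float:
    """Usage level to cover with reservations: the p10 of hourly usage,
    i.e. capacity you actually exceed in roughly 90% of hours."""
    ranked = sorted(hourly_usage)
    return ranked[int(len(ranked) * percentile)]

# ~90 days of hourly vCPU counts for a workload that idles around 40 vCPUs
# overnight and peaks at 200 during business hours (made-up numbers).
usage = [40] * 1500 + [120] * 550 + [200] * 110

print(committed_floor(usage))        # 40  -> reserve this
print(committed_floor(usage, 0.90))  # 120 -> committing here means paying
                                     #        reserved rates for ~80 idle vCPUs
                                     #        through most overnight hours
```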

4. Stop Autoscaling for Problems Autoscaling Can't Fix

Autoscaling is the reflexive answer to any cost question in cloud consulting, and it's wrong about half the time. Autoscaling helps when your load varies dramatically and your application startup time is short enough to respond to that variance. It does not help when:

  • Your workload is steady-state but oversized (you need right-sizing, not autoscaling)
  • Your bottleneck is the database, not the app tier (autoscaling the app tier adds more database connections to a database that's already at 80% CPU)
  • Your cold-start time is longer than your traffic spike (you'll never scale fast enough)
  • Your workload runs in containers with 30-second startup but a 10-second traffic spike

We've seen teams spend months building elaborate autoscaling rules when the honest answer was "run three fixed nodes." Don't build complexity you don't need.
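
A useful discipline is to write the decision down as a check before anyone writes scaling policies. The thresholds below are illustrative assumptions, not universal constants; the point is that two numbers, how much load varies and how fast you can add serving capacity, answer most of the question.

```python
"""Sketch of an autoscaling sanity check. Thresholds are illustrative assumptions."""

def autoscaling_worth_it(
    p10_load: float,           # e.g. requests/sec in a quiet hour
    p99_load: float,           # requests/sec in a busy hour
    spike_duration_s: float,   # how long the bursts actually last
    time_to_serving_s: float,  # cold start + health checks + warmup for a new node
) -> bool:
    # Load that barely varies is a right-sizing problem, not a scaling problem.
    steady_state = p99_load < 2 * p10_load
    # If new capacity arrives after the spike is over, it never helps.
    too_slow = time_to_serving_s >= spike_duration_s
    return not steady_state and not too_slow

# Containers with 30-second startup facing 10-second spikes: run fixed capacity.
print(autoscaling_worth_it(50, 400, spike_duration_s=10, time_to_serving_s=30))    # False
# Diurnal traffic that ramps over 20 minutes: autoscaling is a reasonable fit.
print(autoscaling_worth_it(50, 400, spike_duration_s=1200, time_to_serving_s=45))  # True
```

Nothing in that check covers the database-bottleneck case; that one you catch by looking at where the saturation actually is before scaling anything.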

5. Kill the Zombies

Every environment has dead stuff: unattached EBS volumes, old snapshots, idle load balancers, NAT gateways from a VPC that was deleted years ago but left a dangling reference, test environments for a customer that churned 18 months ago, container images in ECR that nobody has pulled since 2022.

Our standard cleanup sweep finds:

  • 5 to 15 percent of EBS spend on unattached volumes
  • 10 to 30 percent of S3 spend in buckets nobody remembers owning
  • At least one $200/month NAT gateway serving zero traffic
  • Log retention set to "forever" on data that's useful for 30 days

None of this is glamorous. It's the equivalent of cleaning out your garage. It pays for itself in the first month and then keeps paying every month.
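
The first pass doesn't need tooling beyond a read-only script. Here's a sketch for the most common finding, unattached EBS volumes; the $0.08/GiB-month figure assumes gp3 pricing in a typical US region, so adjust it for your volume types and region.

```python
"""Sketch: report unattached EBS volumes. Read-only; it deletes nothing.
The $0.08/GiB-month estimate assumes gp3 pricing in a typical US region."""
import boto3

ec2 = boto3.client("ec2")
paginator = ec2.get_paginator("describe_volumes")

total_gib = 0
# Status "available" means the volume is not attached to any instance.
for page in paginator.paginate(Filters=[{"Name": "status", "Values": ["available"]}]):
    for vol in page["Volumes"]:
        total_gib += vol["Size"]
        print(f'{vol["VolumeId"]}  {vol["Size"]:>5} GiB  {vol["VolumeType"]}  '
              f'created {vol["CreateTime"]:%Y-%m-%d}')

print(f"unattached total: {total_gib} GiB, roughly ${total_gib * 0.08:.0f}/month")
```

The same pattern extends to old snapshots (describe_snapshots with OwnerIds=["self"]), idle load balancers, and images nobody pulls. The point is a recurring report, not a one-off cleanup.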

6. Build Cost Visibility Into the Workflow

The final step is cultural. If engineers don't see the cost of the resources they're creating, they'll create expensive ones. The fix is to surface cost at the point of decision:

  • Tag every resource with a team, environment, and project at creation time, enforced by policy (AWS SCPs, Azure Policy, GCP Organization Policy)
  • Give each team a monthly cost report for resources they own
  • Set up anomaly detection so a 3x spike gets a Slack alert the same day, not 30 days later on a finance report
  • Include cost in code review for infrastructure-as-code changes

The goal isn't to make engineers into accountants. It's to make cost a visible property of their decisions, the same way latency and error rate already are. Teams that see cost data make better decisions within weeks.
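
Tag enforcement at creation time is policy work (SCPs, Azure Policy, GCP Organization Policy), but auditing the backlog is scriptable. The sketch below uses the AWS Resource Groups Tagging API to flag resources missing the tag keys your cost reports depend on; the required keys are assumptions, and resources that have never been tagged at all may not appear in this API, which is exactly why creation-time enforcement matters.

```python
"""Sketch: flag AWS resources missing the tag keys cost allocation depends on.
The REQUIRED set is an assumption; match it to your actual tagging policy."""
import boto3

REQUIRED = {"team", "environment", "project"}

tagging = boto3.client("resourcegroupstaggingapi")
paginator = tagging.get_paginator("get_resources")

# This API covers tagged or previously tagged resources; never-tagged resources
# can be invisible here, so it complements policy enforcement, not replaces it.
for page in paginator.paginate():
    for res in page["ResourceTagMappingList"]:
        present = {t["Key"].lower() for t in res.get("Tags", [])}
        missing = sorted(REQUIRED - present)
        if missing:
            print(f'{res["ResourceARN"]}  missing: {", ".join(missing)}')
```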

What This Actually Looks Like in Practice

For a typical mid-market customer running 200-500 VMs across a mix of production, dev, and test:

  • Month 1: Telemetry collection, tagging cleanup, zombie sweep. Usually cuts 10-15% immediately.
  • Month 2: Right-sizing recommendations, reserved instance purchase for the committed floor. Another 20-30%.
  • Month 3: Instance family migration where feasible, storage tier optimization, reserved capacity tuning. Another 10-15%.

The total is usually a 40-55% reduction in cloud spend with no performance regression. None of it is magic. All of it is measurement followed by small, evidence-based decisions.

Three Takeaways

  1. Most cloud waste is unmeasured, not unfixable. The problem is almost never "we can't reduce this." It's "we can't see it clearly enough to know what to reduce."
  2. Reserve the floor, burst the peak. One rule, huge savings, no performance impact for steady-state workloads.
  3. Cost needs to be a first-class metric next to latency and errors. If engineers only see it on a quarterly finance review, the damage is already done.

