Serverless Computing: Eight Lessons from Real Workloads

Eight things we learned running serverless in production — the good, the expensive, and the surprising.

John Lane 2023-07-15 6 min read

Serverless has moved past its hype cycle and into the boring part of its life, which is when the real lessons show up. We have built serverless systems, migrated customers off serverless systems, and kept plenty of customers happily running serverless at modest scale. Here are eight lessons — some counterintuitive — from that work. If you are evaluating serverless for a new project or re-evaluating an existing one, these are the things we wish more teams knew before they committed.

1. Cost Crosses Over Sooner Than You Think

Serverless is advertised as "pay for what you use," which sounds like it should always be the cheapest option for low to moderate traffic. It is not. The cost per request on Lambda, Cloud Functions, or Cloud Run is substantially higher than the cost per request on a well-utilized VM or container, so the break-even point where a reserved instance becomes cheaper arrives at lower traffic than most people assume.

Our rough heuristic: if a function runs more than 10 to 15 percent of the wall clock (meaning the equivalent of roughly 2 to 4 hours per day of sustained activity), a small reserved VM or container will usually be cheaper. For steady-state services that run all day, serverless can be 3 to 5x the cost of an equivalent container.
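
To make the arithmetic concrete, here is a back-of-envelope sketch in Python. The rates are ballpark published prices (Lambda x86 at $0.0000166667 per GB-second and $0.20 per million requests, a small VM at roughly $30/month); plug in your region's actual numbers and your own traffic profile, since the crossover moves with memory size and duration.

```python
# Back-of-envelope monthly cost: Lambda vs. a small always-on VM.
# Rates are ballpark published prices; substitute your region's own.

LAMBDA_GB_SECOND = 0.0000166667    # USD per GB-second (x86)
LAMBDA_PER_REQUEST = 0.20 / 1e6    # USD per request
VM_MONTHLY = 30.0                  # small instance, illustrative

def lambda_monthly_cost(requests: float, duration_ms: float, memory_gb: float) -> float:
    compute = requests * (duration_ms / 1000) * memory_gb * LAMBDA_GB_SECOND
    return compute + requests * LAMBDA_PER_REQUEST

# A 1 GB function averaging 200 ms per request:
for requests in (1e5, 1e6, 1e7, 5e7):
    cost = lambda_monthly_cost(requests, duration_ms=200, memory_gb=1.0)
    print(f"{requests:>12,.0f} req/mo -> ${cost:9.2f}   (VM: ${VM_MONTHLY:.2f})")
```

For this particular profile the crossover lands just under ten million requests a month; change the memory size or the average duration and it moves substantially, which is exactly why the math is worth running on your own numbers.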

When it still wins on cost

Bursty, unpredictable, or truly sporadic workloads — webhook handlers, scheduled jobs that run a few times a day, low-traffic APIs. The cheapest infrastructure is the one that charges you nothing when there is no work. Just do the math before you assume it applies to your workload.

2. Cold Starts Are Better, Not Gone

Cold start performance has improved dramatically on AWS Lambda (SnapStart, provisioned concurrency, ARM Graviton), and Cloud Run with startup CPU boost handles most web traffic fine. But a cold Node.js function loading a heavy framework still takes 1 to 3 seconds to respond, and a cold Java function loading Spring Boot can take 5 to 15 seconds. "Cold start" is not a single number — it is a function of runtime, dependency footprint, and configuration.

The strategy that works: measure your actual cold start distribution in production, not in a benchmark. Use provisioned concurrency or min-instances for latency-sensitive paths. Keep dependency trees small. Avoid heavy frameworks in runtimes that are sensitive to startup cost.
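
A minimal sketch of what "measure in production" looks like for a Python Lambda (the pattern, not a library): module scope runs once per execution environment, so a flag set there separates cold from warm invocations in your logs.

```python
import json
import time

# Module scope runs once per execution environment, so this flag is
# True only for the first (cold) invocation each instance serves.
_COLD = True

def do_work(event):
    return {"ok": True}   # stand-in for the real business logic

def handler(event, context):
    global _COLD
    cold, _COLD = _COLD, False

    start = time.time()
    result = do_work(event)

    # One structured line per invocation; aggregate these in your log
    # pipeline to see the real cold-start distribution, not a benchmark.
    print(json.dumps({
        "cold_start": cold,
        "duration_ms": round((time.time() - start) * 1000, 1),
        "request_id": context.aws_request_id,
    }))
    return result
```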

3. Observability Is Harder, Not Easier

The pitch for serverless included "no infrastructure to monitor." The reality is that you still need observability, and the distributed, ephemeral nature of serverless makes it harder, not easier. A request that flows through API Gateway, three Lambda functions, a Step Function, SQS, and finally DynamoDB is seven places to correlate logs, seven places where a trace can break, and seven places where a cost optimization decision needs to be made.

What actually works

Structured logging with correlation IDs from the entry point, distributed tracing with OpenTelemetry or X-Ray, and a commitment to treating the serverless stack as a distributed system that needs first-class observability, not as a simplified one that does not. Teams that skip this step spend twice as long debugging production and cannot confidently ship.
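
As a sketch of the entry-point half of that (Python, with an assumed x-correlation-id header and an API Gateway v2 event shape): reuse the caller's correlation ID if one arrives, mint one if not, and carry it in every log line and every downstream call.

```python
import json
import logging
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger(__name__)

def log_event(cid: str, msg: str, **fields):
    # One JSON line per event, always carrying the correlation ID.
    log.info(json.dumps({"correlation_id": cid, "msg": msg, **fields}))

def handler(event, context):
    # Reuse the caller's ID if present; mint one at the entry point if not.
    headers = event.get("headers") or {}
    cid = headers.get("x-correlation-id", str(uuid.uuid4()))

    log_event(cid, "request.received", path=event.get("rawPath"))
    # ...do the work, forwarding cid on every downstream call
    # (HTTP headers, SQS message attributes) so traces stay joined...
    log_event(cid, "request.completed", status=200)
    return {"statusCode": 200, "headers": {"x-correlation-id": cid}, "body": "ok"}
```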

4. Local Development Experience Matters More Than You Think

Local development for serverless is still worse than for containers. LocalStack, SAM Local, and the various framework emulators help, but none of them are perfect replicas of production behavior. Teams that try to develop entirely locally hit surprises in production; teams that try to develop entirely against deployed cloud resources pay in iteration speed and in cloud bills.

The middle path that works: run the business logic as plain functions that can be tested locally, keep the cloud-specific glue thin and testable against real cloud resources in a dev environment, and invest in CI that deploys to ephemeral per-branch environments for integration testing. This is more work than container-based development, not less.
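
In code, the split looks something like this (a hypothetical two-file sketch; the names are ours): the logic file imports nothing cloud-specific and is unit-tested locally, while the handler file stays thin enough that only the deployed integration tests need to exercise it.

```python
# logic.py -- pure business logic, no cloud imports, fully testable locally.
def price_with_tax(amount_cents: int, tax_rate: float) -> int:
    return round(amount_cents * (1 + tax_rate))

# handler.py -- thin Lambda-specific glue; keep it boring, and test it
# against real cloud resources in a dev or per-branch environment.
import json
# from logic import price_with_tax

def handler(event, context):
    body = json.loads(event["body"])
    total = price_with_tax(body["amount_cents"], body["tax_rate"])
    return {"statusCode": 200, "body": json.dumps({"total_cents": total})}
```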

5. Function-per-Endpoint Is Usually a Mistake

The "canonical" serverless pattern is one function per endpoint. For small APIs with a handful of routes, this is fine. Past a dozen or two endpoints, it creates an operational mess: dozens of functions to deploy, dozens of IAM roles to manage, dozens of cold start profiles, and dozens of places where dependency updates need to land.

The monolith-in-a-function pattern

Run your whole web app as a single function, using a framework that handles routing internally (Express, FastAPI, Spring, etc.), and deploy it on Lambda or Cloud Run. You lose the theoretical benefit of per-endpoint scaling, which almost never matters in practice, and you gain operational simplicity that absolutely does. This is how we run most of our customer serverless APIs.
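
In Python, one common way to get this shape on Lambda is Mangum, an ASGI-to-Lambda adapter (on Cloud Run you would skip the adapter and serve the same app with uvicorn). A minimal sketch:

```python
from fastapi import FastAPI
from mangum import Mangum   # ASGI-to-Lambda adapter

app = FastAPI()

@app.get("/health")
def health():
    return {"status": "ok"}

@app.get("/users/{user_id}")
def get_user(user_id: int):
    return {"user_id": user_id}

# Single Lambda entry point: API Gateway events in, HTTP responses out,
# with FastAPI handling all routing internally. One function to deploy,
# one IAM role, one cold-start profile.
handler = Mangum(app)
```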

6. The Database Has to Match the Compute

Traditional databases do not love serverless traffic patterns. A Lambda function that connects to Postgres on every invocation will exhaust the connection pool under moderate load. RDS Proxy helps, but adds latency and cost. Aurora Serverless v2 is better but still has scale-up latency. The serverless-native options (DynamoDB, Cloud Firestore, Cosmos DB) handle the traffic pattern well but come with data modeling constraints that bite hard if you did not know about them going in.

The lesson: pick your database intentionally. If you are committing to serverless compute, commit to a database that matches. Retrofitting Postgres into a heavy Lambda workload can be done, but it will not be the thing that makes your architecture shine.
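
The standard first mitigation, for what it is worth, is to open the connection at module scope so warm invocations reuse it (a sketch with psycopg2; the DATABASE_URL variable and the orders table are assumptions). This caps connections at the number of concurrent instances rather than the number of invocations, which helps but does not survive a spike that fans out to hundreds of instances; that is where RDS Proxy or a serverless-native store comes in.

```python
import os
import psycopg2

# Created once per execution environment and reused across warm
# invocations, instead of reconnecting on every request.
_conn = psycopg2.connect(os.environ["DATABASE_URL"])

def handler(event, context):
    global _conn
    if _conn.closed:   # reconnect if the server idled us out
        _conn = psycopg2.connect(os.environ["DATABASE_URL"])
    with _conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM orders")   # illustrative query
        (count,) = cur.fetchone()
    return {"statusCode": 200, "body": str(count)}
```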

7. Vendor Lock-In Is Real, Even When You Think You Avoided It

Every serverless framework promises portability. None of them actually deliver it in practice. Lambda's event shapes, IAM integration, VPC model, and service integrations are subtly different from Cloud Functions, which are different again from Azure Functions. A "cloud-agnostic" serverless codebase usually works well on the provider it was built for and needs meaningful rework to run anywhere else.

This is not necessarily a problem — most customers are not moving between clouds — but it is worth being honest about. If your procurement story depends on multi-cloud portability, serverless is one of the harder ways to deliver on it.

8. It Shines for Exactly the Workloads You Would Expect

After all the caveats: serverless is excellent for the workloads it was designed for. Webhook receivers, scheduled jobs, event processing from queues, file processing triggered by uploads, internal tools with sporadic usage, and glue code that ties SaaS products together. For these, serverless is genuinely the right answer more often than not. We still deploy new serverless functions every week for customers doing exactly this kind of work.
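
For a feel of the shape, here is a minimal upload-triggered file processor (Python; the processing itself is stubbed out): S3 fires an event on object creation, the function handles the batch, and nothing runs, or bills, in between.

```python
import json
import urllib.parse
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # One invocation per batch of S3 "object created" records.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        obj = s3.get_object(Bucket=bucket, Key=key)
        # Stand-in for real processing (thumbnailing, parsing, etc.).
        print(json.dumps({"processed": key, "bytes": obj["ContentLength"]}))
```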

Where it does not shine

Steady-state APIs with predictable traffic, latency-critical paths, workloads with heavy shared state, and anything that needs fine-grained control over the runtime environment. For those, containers or VMs remain the better choice.

What We Actually Build

The honest breakdown of what our customers run on serverless versus containers looks something like this: event-driven glue, scheduled batch, and sporadic APIs are serverless. Authenticated web APIs serving real traffic are containers on ECS, Cloud Run, or AKS. Background workers processing queues are containers (or Lambda if the queue is bursty and the work is quick). The hybrid is almost always cheaper and simpler than either extreme.

Three Takeaways

  1. The serverless cost crossover comes sooner than you think. Run the math on your actual traffic before you commit; steady-state workloads almost always belong on containers.
  2. Operational discipline still matters. Observability, CI, and local dev are not easier on serverless; plan for them explicitly.
  3. Match the database to the compute. Pointing serverless functions at a traditional connection-based database is a well-known failure mode with well-known mitigations, but none of them are free.
