System Design • 5 min read

The Real Cost of Microservices


André Ferreira

Senior Full-Stack Software Engineer • Mar 2026

Microservices do scale. The part that gets skipped in the pitch is whether your problem is scale, or whether you are buying someone else's story.

People will tell you they fix scaling, let teams move independently, and speed up deploys. Sometimes that is true. They also drag in infra work, distributed debugging, and on-call shapes that nobody put in the original estimate. I have watched teams burn months on plumbing while product velocity flatlined.

So start with the boring question: do you need this yet?

When the answer is no

Plenty of startups and small companies reach for microservices because the industry talks about them, not because their actual pain is the kind distributed systems are built for.

If the product still changes every week, if getting everyone aligned is still cheap, and if you are not sure the business will look the same next year, you are probably not in the chapter where microservices pay rent. A monolith you can reason about beats a constellation you cannot trace. Vertical scaling often lasts longer than the hot takes suggest.

The real trade-offs

| Aspect | Monolith | Microservices |
| --- | --- | --- |
| Deployment | One deployable, one pipeline | Many images, orchestration, coordination |
| Debugging | One stack, logs in one place | Traces and log pipelines across services |
| Data consistency | ACID in one database | Sagas, eventual consistency, compensating flows |
| Operational scaling | Start by scaling the box; it can go further than folklore claims | Mostly horizontal; more machines, more moving parts |
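The "sagas, eventual consistency, compensating flows" row is the one that bites. A minimal sketch of the idea, with entirely hypothetical step names: each step pairs an action with an undo, and a failure walks back everything that already committed.

```python
# Minimal saga sketch: each step pairs an action with a compensating
# undo. If a later step fails, completed steps are rolled back in
# reverse order. All step names here are hypothetical.

def run_saga(steps):
    """steps: list of (action, compensation) pairs of callables."""
    done = []
    try:
        for action, compensation in steps:
            action()
            done.append(compensation)
    except Exception:
        for compensation in reversed(done):  # undo newest first
            compensation()
        raise

log = []

def ship():
    raise RuntimeError("shipping service is down")

steps = [
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (lambda: log.append("charge card"), lambda: log.append("refund card")),
    (ship, lambda: log.append("cancel shipment")),
]

try:
    run_saga(steps)
except RuntimeError:
    log.append("saga rolled back")
```

Every one of those compensations is code you write, test, and debug — code a single ACID transaction gave you for free.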

That table is the polite version. The rest of the bill looks like this: you end up caring about Redis, Kafka, Jaeger, Prometheus, or whatever your shop standardizes on. If you want real ownership per service, you need real on-call per boundary. Deploys stop being "restart the app" and start being canaries, feature flags, and coordination. And the classic afternoon killer: service A sent a shape service B did not expect. How long until you find it?
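That "unexpected shape" failure is cheap to guard against at the edge. A hedged sketch, assuming plain JSON payloads and hypothetical field names: validate on ingest and fail loudly, instead of three services later.

```python
# Validate an incoming payload at the service boundary instead of
# letting a surprising shape propagate downstream. Field names are
# hypothetical, for illustration only.

REQUIRED = {"order_id": str, "amount_cents": int}

def validate(payload: dict) -> dict:
    for field, expected in REQUIRED.items():
        if field not in payload:
            raise ValueError(f"missing field: {field}")
        if not isinstance(payload[field], expected):
            raise ValueError(
                f"{field}: expected {expected.__name__}, "
                f"got {type(payload[field]).__name__}"
            )
    return payload

validate({"order_id": "A-17", "amount_cents": 499})  # passes

try:
    validate({"order_id": "A-17", "amount_cents": "4.99"})  # wrong type
except ValueError as err:
    caught = str(err)
```

In practice you would reach for a schema library rather than hand-rolled checks, but the principle is the same: the boundary is where the afternoon gets saved.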

When it actually matters

Microservices earn their keep when the problem really has seams: clear boundaries in load, latency, team ownership, or law. Not when leadership wants the diagram to look serious.

If your graphs show one slice of the work (CPU, disk, queues) maxing out while the rest of the system yawns, and you honestly need to scale that slice alone, splitting can make sense. It does not print free throughput. Every synchronous hop adds network, serialization, and new ways to fail. Chain A calls B calls C, and useful work per dollar often drops until you redesign how work moves.
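The back-of-envelope math is worth doing before the split: per-hop success rates multiply and per-hop latencies add. The numbers below are illustrative, not a benchmark.

```python
# Back-of-envelope for synchronous call chains: reliability
# compounds multiplicatively, latency additively. Both numbers
# here are made up for illustration.

per_hop_success = 0.999    # 99.9% success per call
per_hop_latency_ms = 5.0   # network + serialization per hop

chain = {}
for hops in (1, 3, 6):
    chain[hops] = {
        "success": per_hop_success ** hops,
        "added_ms": per_hop_latency_ms * hops,
    }
    print(f"{hops} hops: {chain[hops]['success']:.4%} success, "
          f"+{chain[hops]['added_ms']:.0f} ms")
```

Six hops at 99.9% each is already under 99.5% end to end, plus tens of milliseconds you did not have before. The chain is only as honest as its weakest link.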

Tail latency is the same story. Naive splits add hops; p99 usually gets uglier. The win is when you can pull heavy or flaky work off what the user waits on: async handoffs, cache, a separately scaled edge. If the hot path still fans out across half the fleet in lockstep, you made latency worse.
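The async handoff pattern is simple enough to sketch in a few lines. This uses an in-process queue as a stand-in for a real broker; the handler and job names are hypothetical.

```python
# Sketch of pulling heavy work off the path the user waits on:
# the handler enqueues and returns immediately, and a background
# worker drains the queue. queue.Queue stands in for a real broker.

import queue
import threading

jobs: "queue.Queue" = queue.Queue()
processed = []

def handle_request(payload: str) -> str:
    jobs.put(payload)   # hand off the heavy part
    return "accepted"   # user-facing path stays fast

def worker():
    while True:
        item = jobs.get()
        if item is None:  # shutdown sentinel
            break
        processed.append(f"done:{item}")  # the slow work lives here

t = threading.Thread(target=worker)
t.start()

replies = [handle_request(p) for p in ("resize", "transcode")]
jobs.put(None)
t.join()
```

The user sees "accepted" in microseconds either way; the point is that p99 on the request path no longer depends on the slow work finishing.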

Sometimes more than one team or product line truly cannot ship on one release train, and that coupling costs real money. That is Conway's law doing its job: boundaries should follow how you deliver, not how you wish the benchmarks looked.

And sometimes the law matters literally: data or processing has to sit in different trust or legal zones, and one fat binary cannot comply.

If none of that matches your week, calling the system "distributed" is still premature optimization.

When to migrate (if you must)

Stick with a monolith until the pain is measured and keeps showing up, not until the architecture review feels dull.

Deployments have to be slow or risky enough that they actually block product, and you can point to how often work stalls, how big the blast radius is, or how often one change forces a coordinated release. If you can already ship small and often without drama, you do not need microservices to fix cadence.

Latency and throughput need to show up in traces and load tests, not in hallway opinions. The database or a hot module has to be the real constraint, and you should have tried the cheap stuff first: indexes, caching, bigger iron, fixing the queries. Extracting a service is what is left when those stop being honest options.
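"Try the cheap stuff first" often starts with a cache in front of the hot call. A minimal in-process sketch, where `expensive_lookup` is a stand-in for a slow query or downstream call:

```python
# Before extracting a service around a hot read path, try caching it.
# expensive_lookup is a hypothetical stand-in for a slow query.

from functools import lru_cache

calls = 0

@lru_cache(maxsize=1024)
def expensive_lookup(key: str) -> str:
    global calls
    calls += 1  # counts only real (uncached) hits
    return f"value-for-{key}"

for _ in range(1000):
    expensive_lookup("hot-key")
```

After the loop, `calls` is 1: nine hundred ninety-nine requests never touched the slow path. If that is the shape of your bottleneck, extracting a service was never the fix.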

Organizational coupling is the same signal again, just louder: a team cannot ship without waiting on someone else's schedule, and you can name what that costs. At that point splitting ownership can beat another process workshop.

Most mature systems I have seen look like a monolith with a few peeled-off edges: async jobs, analytics, integrations that fail in their own way. One boundary at a time, when the data says so. Not because the slide deck needs more boxes.

The checklist

Before you commit, be honest:

  • Do we have throughput or latency numbers that show the monolith is the bottleneck?
  • Do we have multiple teams that actually need independent deploys, not just independent Jira boards?
  • Do we have budget and people who know how to run this in production?
  • Does the team understand what they give up when consistency stops being "one transaction"?
  • Do we already run logs, traces, and metrics in prod and use them?

If most of those are "no," stay on the monolith until something changes.


Simplicity beats sophistication until the problems you have are the sophisticated kind.
