Engineering · Jun 18, 2026

How we cut API latency by 60% in three weeks

A look at the cold-read bottleneck that was quietly slowing every request, and the caching layer we built to fix it.

Lena Park, Staff Engineer
7 Min Read
[Image: Server racks in a data center]

Our monolith was fast enough, until it wasn't. As traffic grew, p95 response times crept past a second and our dashboards lit up during every morning spike. The culprit was not the code we expected.

Finding the bottleneck

Tracing a single slow request end to end revealed that nearly every call hit the database for data that almost never changed. We were paying for cold reads on settings, feature flags, and plan limits thousands of times a minute.

“The fastest query is the one you never make. Caching is not an optimization, it is a design decision.”

The fix

We introduced a small read-through cache in front of the hot tables with a per-key TTL and explicit invalidation on writes. The rollout was gradual, table by table, so we could measure the impact at each step.
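To make the shape of that wrapper concrete, here is a minimal sketch of a read-through cache with a per-key TTL and explicit invalidation. It is an illustration under stated assumptions, not our production code: the class name `ReadThroughCache`, the `loader` callback, and the `clock` parameter are all hypothetical, and the sketch is in-process and not thread-safe.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Optional, Tuple, TypeVar

V = TypeVar("V")


class ReadThroughCache(Generic[V]):
    """Illustrative read-through cache: per-key TTL plus explicit invalidation."""

    def __init__(
        self,
        loader: Callable[[Hashable], V],
        default_ttl: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader            # hypothetical callback that reads from the database
        self._default_ttl = default_ttl  # seconds; used when get() is not given a ttl
        self._clock = clock              # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[V, float]] = {}  # key -> (value, expires_at)

    def get(self, key: Hashable, ttl: Optional[float] = None) -> V:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]              # fresh hit: no database call
        value = self._loader(key)        # miss or expired: read through to the source
        lifetime = self._default_ttl if ttl is None else ttl
        self._entries[key] = (value, now + lifetime)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Call on every write so the next read reloads the fresh value."""
        self._entries.pop(key, None)


# Usage sketch with a dict standing in for the hot table (assumption).
fake_db = {"plan_limits:free": 100}
cache = ReadThroughCache(lambda key: fake_db[key], default_ttl=300.0)


def update_plan_limit(key: str, new_limit: int) -> None:
    fake_db[key] = new_limit   # write to the source of truth first
    cache.invalidate(key)      # then drop the cached copy


cache.get("plan_limits:free")              # first read loads from fake_db
update_plan_limit("plan_limits:free", 250)
cache.get("plan_limits:free")              # reloads because the write invalidated the key
```

The TTL bounds how long a stale value can survive if an invalidation is ever missed, while invalidating on writes keeps the common case fresh without waiting for expiry.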

  • p95 latency dropped from 980 ms to 390 ms, a reduction of about 60%
  • Database load fell by roughly half at peak
  • No change to application code beyond the cache wrapper
Tags: Engineering, Performance, Databases