How we cut API latency by 60% in three weeks
A look at the cold-read bottleneck that was quietly slowing every request, and the caching layer we built to fix it.
Our monolith was fast enough, until it wasn't. As traffic grew, p95 response times crept past a second and our dashboards lit up during every morning spike. The culprit was not the code we expected.
Finding the bottleneck
Tracing a single slow request end to end revealed that nearly every call hit the database for data that almost never changed. We were paying for cold reads on settings, feature flags, and plan limits thousands of times a minute.
“The fastest query is the one you never make. Caching is not an optimization, it is a design decision.”
The fix
We put a small read-through cache in front of the hot tables. Each key has its own TTL, and every write explicitly invalidates the entry it touches. The rollout was gradual, table by table, so we could measure the impact at each step.
- p95 latency dropped from 980 ms to 390 ms, a reduction of about 60%
- Database load fell by roughly half at peak
- No change to application code beyond the cache wrapper
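To make the idea concrete, here is a minimal sketch of a read-through cache with a per-key TTL and explicit invalidation. It is an illustration only, not our production wrapper: the class name `ReadThroughCache`, the `loader` callback, the 300-second default TTL, and the `plan_limits:free` key are all assumptions made for the example, and this in-process version is not thread-safe.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class ReadThroughCache:
    """Hypothetical read-through cache: per-key TTL, explicit invalidation."""

    def __init__(
        self,
        loader: Callable[[Hashable], Any],
        default_ttl: float = 300.0,  # assumed default, in seconds
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader            # fetches the value from the database
        self._default_ttl = default_ttl
        self._clock = clock              # injectable so expiry can be tested
        self._ttls: Dict[Hashable, float] = {}
        self._entries: Dict[Hashable, Tuple[Any, float]] = {}

    def set_ttl(self, key: Hashable, ttl: float) -> None:
        """Override the TTL for one key; applies the next time it is loaded."""
        self._ttls[key] = ttl

    def get(self, key: Hashable) -> Any:
        """Return the cached value, loading it on a miss or after expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]              # fresh hit: no database read
        value = self._loader(key)        # miss or expired: read through
        ttl = self._ttls.get(key, self._default_ttl)
        self._entries[key] = (value, now + ttl)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop a key; call this from the write path after updating the row."""
        self._entries.pop(key, None)


# Usage with a dict standing in for the database.
fake_db = {"plan_limits:free": 100}
db_reads = []


def load_from_db(key):
    db_reads.append(key)                 # record each real database read
    return fake_db[key]


cache = ReadThroughCache(load_from_db)
cache.get("plan_limits:free")            # first call reads from the database
cache.get("plan_limits:free")            # second call is served from the cache

fake_db["plan_limits:free"] = 200        # a write changes the underlying row...
cache.invalidate("plan_limits:free")     # ...so the write path drops the entry
cache.get("plan_limits:free")            # next read reloads the new value
```

The TTL acts as a safety net for writes that bypass the invalidation path, while explicit invalidation keeps the common case fresh without waiting for expiry. A shared cache across processes would need an external store instead of a local dictionary.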