Caching Strategies: When to Cache, Where and For How Long

In 2023, an electronics e-commerce site faced a problem: a catalog page with 10,000 products took 4.5 seconds to load. Every request hit the database, joined 6 tables, and sorted the results. At 1,000 concurrent users the database fell over and response time grew to 15 seconds. Adding Redis caching with a 5-minute TTL brought page load down to 80 milliseconds, database load dropped from 800 queries per second to 12, and conversion grew 23%.

Caching solves three main problems: it reduces response time, decreases load on the database and external APIs, and cuts infrastructure costs. But caching done wrong creates new problems: users see stale data, memory fills with garbage, and invalidation becomes a nightmare.

When to Cache: Data Access Patterns

The first question isn't "how to cache" but "should you cache at all". Caching makes sense when data is read many times but changes rarely. The classic example is an e-commerce product catalog: the same product is viewed by hundreds of users per hour, while its price and availability update once every few minutes or hours.

This is measured by the read/write ratio. If data is read 10 times more often than it is written, caching will give a noticeable effect; at a 100:1 ratio the effect is dramatic. But if data is read and written roughly equally often, the cache becomes a source of problems: constant invalidation, inconsistency risk, and extra complexity without real benefit.

Take a WordPress blog. Articles are read thousands of times per day, but new ones are published once every few days. Comments appear dozens of times per day under popular posts. Here it's obvious what to cache: the articles themselves (TTL of several hours or days), the article list on the main page (TTL 5-10 minutes), comment counts (TTL 1-2 minutes). But there is no point caching the new-comment submission form: every request is unique.

The access pattern matters. If 80% of requests go to 20% of the data (the Pareto principle), that hot data should always sit in a fast cache. Statistics from a real e-commerce project: 15% of products (bestsellers, new arrivals, items on sale) generate 78% of views. Those 1,500 products out of 10,000 should stay in memory permanently.

Change predictability is also critical. Currency rates update every few minutes during business hours, so they can be cached with a 2-3 minute TTL. A weather forecast updates once per hour, so a TTL of 50-55 minutes is safe. Stock prices change every second: caching them for minutes is meaningless, you need milliseconds or a WebSocket with real-time updates.

Data size affects the decision too. A small JSON object of 2-5 KB is almost always worth caching. But if you're going to cache a query result that returns 50 MB of data per user, calculate the memory cost: at 10,000 active users that's 500 GB just for the cache. It's cheaper to optimize the query or split the data into parts.

Multi-Level Caching: From Browser to Database

A modern web application uses caching at several levels simultaneously. Each level solves its own tasks and has its own limitations.

Browser cache is the first line of defense. Static resources such as CSS, JavaScript, images, and fonts should be cached in the browser aggressively. The header Cache-Control: public, max-age=31536000, immutable tells the browser to keep the file for a year and never check its freshness. The key point is file versioning through a hash in the name: styles.a3f2c1b.css. When the file changes, the hash changes and the browser loads the new version automatically.
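For illustration, here is a minimal Express sketch of that policy. It assumes the build step already emits hashed filenames like styles.a3f2c1b.css, and the dist/assets path is just a placeholder:

const express = require('express');
const app = express();

// Hashed assets (styles.a3f2c1b.css): safe to cache for a year and mark immutable,
// because any content change produces a new filename and therefore a new URL.
app.use('/assets', express.static('dist/assets', {
  maxAge: '365d',    // results in Cache-Control: public, max-age=31536000
  immutable: true    // adds the immutable directive
}));

app.listen(3000);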

Statistics show that a proper browser cache setup reduces requests to the server by 60-80% for repeat visits. A site that makes 120 requests and loads in 3.2 seconds on the first visit makes 25 requests and loads in 0.4 seconds on the second. Users are happy, the server is unloaded.

HTML pages are harder to cache in the browser: dynamic content changes, and pages are personalized for each user. Here you usually use Cache-Control: no-cache, which forces the browser to check freshness on each request through an ETag or Last-Modified header. If the content is unchanged, the server responds 304 Not Modified without a body, saving traffic and time.
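Continuing the Express sketch above, the conditional-request flow for an HTML page might look roughly like this; renderPage is a hypothetical renderer, and the automatic ETag/304 handling relies on Express's default behavior for res.send:

app.get('/articles/:slug', async (req, res) => {
  const html = await renderPage(req.params.slug); // hypothetical renderer

  // Tell the browser to revalidate on every request instead of trusting a TTL.
  res.set('Cache-Control', 'no-cache');

  // Express computes an ETag for the body; if it matches the browser's
  // If-None-Match header, the response is 304 Not Modified with no body.
  res.send(html);
});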

CDN cache is the second level. A Content Delivery Network such as Cloudflare, Fastly, or AWS CloudFront caches content on servers close to users worldwide. A user in Tokyo gets data from a Tokyo datacenter in 15 milliseconds instead of 180 milliseconds from Frankfurt. Multiply that by thousands of requests and the time savings are huge.

A CDN is ideal for static assets but can cache dynamic content too. An e-commerce main page that updates once every 5 minutes can be cached in the CDN with s-maxage=300. An API endpoint returning the category list (which changes rarely) can use s-maxage=3600. The important thing to understand is that the CDN cache is shared by all users in a region, so personalized content must not be cached there.
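A small, hedged example of separating the browser lifetime (max-age) from the CDN lifetime (s-maxage) for such an endpoint; the route and query are illustrative, and db.query is an assumed helper:

// Category list changes rarely: the CDN may keep it for an hour (s-maxage),
// while individual browsers revalidate after a minute (max-age).
app.get('/api/categories', async (req, res) => {
  const categories = await db.query('SELECT id, name FROM categories');
  res.set('Cache-Control', 'public, max-age=60, s-maxage=3600');
  res.json(categories);
});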

A practical case: a news media site. Articles are cached in the CDN for 10 minutes. When a new article is published, an API call invalidates the cache for that specific URL. Result: 95% of requests are handled by the CDN and only 5% reach the origin server. The infrastructure handles a 20x traffic spike without scaling the origin.

Application-level cache is the third level, on the application side. Here you use in-memory solutions like Redis or Memcached. Redis is especially popular thanks to its rich set of data structures, persistence, and clustering. Access speed is sub-millisecond, with 10,000-100,000 operations per second on a single instance.

What should go in Redis? Results of complex database queries, user sessions, responses from external API calls, partially rendered HTML fragments, rate-limiting counters, real-time counters and statistics. The key to efficiency is caching exactly what is expensive to recalculate or load.

An example from practice: a social network caches the user feed in Redis. Generating a feed requires database queries for the user's friends, their posts, likes, and comments: 15-20 queries in total. From the database it takes 200-400 milliseconds; from Redis, 2-5 milliseconds. The feed updates when friends publish posts, so a TTL isn't needed: explicit invalidation is used instead.

Database query cache is the fourth level, inside the database itself. MySQL (before version 8.0) could cache query results automatically. But this only makes sense for truly identical queries: SELECT * FROM products WHERE category_id = 5 gets cached, while SELECT * FROM products WHERE category_id = 6 is a different cache key.

The problem is that any change to a table invalidates the entire query cache for that table. In high-load systems with frequent writes, the query cache can hurt performance because of constant invalidation; this is why MySQL removed its built-in query cache altogether in version 8.0, and PostgreSQL never had one, caching data pages rather than query results. The modern recommendation is to skip database-level result caching and use Redis at the application level.

Local in-process cache is the fifth level, right in the application process's memory. Libraries like node-cache for Node.js or cachetools for Python keep data in the process's RAM. Access speed is measured in nanoseconds; it can't get faster. But the size is limited by process memory and the data isn't shared between application instances.

It's used for tiny data that is needed very often and practically never changes: application configuration, country and city reference lists, code-to-value mappings. It is loaded at startup and used for the entire lifetime of the process. Example: a web server with 10 worker processes keeps a 200-country reference list in each process's memory: 200 × 10 = 2,000 records at ~1 KB each is about 2 MB. Negligible memory, instant access.
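A minimal sketch with the node-cache library mentioned above; the countries query and the warm-up wiring are assumptions for illustration, and db.query is the same assumed helper as elsewhere:

const NodeCache = require('node-cache');

// stdTTL: 0 means entries never expire: the reference data lives
// for the whole lifetime of the process.
const localCache = new NodeCache({ stdTTL: 0 });

// Called once at startup.
async function warmReferenceData() {
  const countries = await db.query('SELECT code, name FROM countries');
  localCache.set('countries', countries);
}

// In-process lookup: no network hop, no serialization.
function getCountries() {
  return localCache.get('countries');
}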

TTL Strategies: How Long to Cache

Time To Live is the lifetime of a record in the cache before automatic deletion. Too short a TTL and the cache works inefficiently, with constant trips to the source data. Too long and users see stale data.

Static content gets the maximum TTL. CSS, JS, and images with a version hash in the filename can be cached for a year: max-age=31536000. The file at that URL will never change; a new version gets a new URL. Fonts, icons, and logos can similarly be cached for a year or more.

Relatively stable content gets a TTL from hours to days. Blog articles can be cached for 24 hours; a once-daily refresh is not critical. Product pages in a B2B shop where prices change rarely: 6-12 hours. Informational pages like "About" or "Shipping": a week, easily.

A case from practice: a news site caches articles with a gradation by publication time. Fresh articles (up to 2 hours old) get a TTL of 2 minutes, since the news may be updated or expanded. Articles 2-24 hours old: TTL 15 minutes. Articles older than a day: TTL 6 hours. Archive articles older than a month: TTL 24 hours. This balances freshness and load.
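A small helper expressing that gradation might look like this; the thresholds mirror the example above, and the function itself is just a sketch:

// Pick a cache TTL (in seconds) based on article age, following the gradation above.
function articleTtl(publishedAt) {
  const ageHours = (Date.now() - publishedAt.getTime()) / 3600000;

  if (ageHours < 2) return 2 * 60;          // fresh news: 2 minutes
  if (ageHours < 24) return 15 * 60;        // same-day articles: 15 minutes
  if (ageHours < 24 * 30) return 6 * 3600;  // older than a day: 6 hours
  return 24 * 3600;                         // archive: 24 hours
}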

Moderately dynamic content gets a TTL from minutes to an hour. The product list in an e-commerce category changes when new products are added or stock runs out; a TTL of 5-10 minutes is acceptable for most shops. The main page with "Hits", "New", and "Sale" blocks: 3-5 minutes.

Highly dynamic content gets a TTL of seconds, or no TTL at all. A shopping cart updates on every user action, so it can't be cached with a TTL, only with explicit invalidation. Online chat messages must appear instantly. Stock quotes become stale in seconds.

Adaptive TTL is a smarter strategy that accounts for the real change frequency. If data was last updated an hour ago, the next update is probably not imminent, so the TTL can grow. If data has been updating every 5 minutes for the last hour, the TTL should shrink.

A simple adaptive TTL formula: TTL = (current_time - last_change_time) × coefficient. The coefficient is usually 0.5-2.0 depending on how critical freshness is. Data unchanged for a day gets a TTL of several hours; data updated 2 minutes ago gets a TTL of 1-4 minutes.

The implementation requires storing the last change time alongside the data. On a cache read, check whether the TTL has expired; on a miss, load from the source together with the timestamp, calculate the new TTL, and save to the cache. The extra complexity pays off with less stale data at the same hit rate.
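A rough cache-aside sketch of adaptive TTL, assuming an ioredis-style client (to match the redis.get/redis.setex calls used elsewhere in this article) and a loader that returns the data together with its last-change timestamp; the 0.5 coefficient and the clamping bounds are illustrative:

async function getWithAdaptiveTtl(cacheKey, loadFromSource) {
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  // loadFromSource must return the data plus its last-change timestamp.
  const { data, updatedAt } = await loadFromSource();

  // TTL grows with the time since the last change (coefficient 0.5 here),
  // clamped between 1 minute and 6 hours.
  const secondsSinceChange = (Date.now() - updatedAt.getTime()) / 1000;
  const ttl = Math.min(Math.max(Math.round(secondsSinceChange * 0.5), 60), 6 * 3600);

  await redis.setex(cacheKey, ttl, JSON.stringify(data));
  return data;
}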

Caching Patterns: Cache-Aside, Write-Through, Write-Behind, Read-Through

Cache-Aside (Lazy Loading) is the most common pattern. The application first tries to read from the cache. If the data isn't there (cache miss), it loads it from the source and puts it in the cache for next time. If the data is there (cache hit), it returns it from the cache.

The advantages are simplicity and fault tolerance: if the cache goes down, the application keeps working, just slower, because every request goes to the database. The disadvantages are that the first request after startup or after TTL expiration is slow, plus the possible thundering-herd problem, when multiple requests simultaneously discover the cache is empty and all hit the database.

Thundering-herd protection uses locks, or a "first request loads, the others wait" mechanism. In Redis this is done through SETNX (set if not exists) with a short TTL. The first request successfully sets the loading flag, runs the database query, puts the result in the cache, and deletes the flag. Subsequent requests see the flag and wait a short time or retry.
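A sketch of that lock, assuming an ioredis-style client where SET with the NX and EX options implements the SETNX-plus-TTL idea atomically; the key names, the 10-second flag TTL, and the retry loop are illustrative, and db.query is the same assumed helper as in the example below:

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function getProductWithLock(productId) {
  const cacheKey = `product:${productId}`;
  const lockKey = `lock:${cacheKey}`;

  for (let attempt = 0; attempt < 10; attempt++) {
    const cached = await redis.get(cacheKey);
    if (cached) return JSON.parse(cached);

    // Only one request acquires the 10-second loading flag (SET with NX + EX).
    const acquired = await redis.set(lockKey, '1', 'EX', 10, 'NX');
    if (acquired) {
      const product = await db.query('SELECT * FROM products WHERE id = ?', [productId]);
      await redis.setex(cacheKey, 600, JSON.stringify(product));
      await redis.del(lockKey);
      return product;
    }

    // Someone else is loading: wait briefly, then re-check the cache.
    await sleep(100);
  }

  // Gave up waiting: fall back to the database directly.
  return db.query('SELECT * FROM products WHERE id = ?', [productId]);
}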

Node.js code for the cache-aside pattern looks roughly like this:

async function getProduct(productId) {
  const cacheKey = `product:${productId}`;

  // Try reading from cache
  let product = await redis.get(cacheKey);

  if (product) {
    return JSON.parse(product); // Cache hit
  }

  // Cache miss - load from database
  product = await db.query('SELECT * FROM products WHERE id = ?', [productId]);

  // Put in cache for 10 minutes
  await redis.setex(cacheKey, 600, JSON.stringify(product));

  return product;
}

Write-Through: on a write, the data is saved to the cache and the database synchronously. The write request waits until the data is written to both. This guarantees the cache always contains current data, but every write is slower because of the double operation.

It's used when it is critically important that the cache never contains stale data, and when writes are relatively rare compared to reads. Example: a configuration management system where changes are rare but must apply instantly everywhere.

The disadvantage is obvious: if the cache goes down, writes start failing too. It requires either high cache availability (Redis Cluster) or fallback logic that, when the cache is unavailable, still writes to the database and just logs the cache error for later investigation.
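A hedged sketch of write-through with that fallback; the ioredis-style client and db.query are the same assumptions as in the other examples, and updating a single price field is just for illustration:

async function updateProductPrice(productId, price) {
  // The database remains the source of truth.
  await db.query('UPDATE products SET price = ? WHERE id = ?', [price, productId]);

  // Write through to the cache so readers see the new value immediately.
  try {
    const product = await db.query('SELECT * FROM products WHERE id = ?', [productId]);
    await redis.setex(`product:${productId}`, 600, JSON.stringify(product));
  } catch (err) {
    // Fallback: the database write already succeeded, so log the cache
    // failure for later investigation instead of failing the request.
    console.error('write-through cache update failed', err);
  }
}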

Write-Behind (Write-Back): data is written to the cache immediately and to the database asynchronously, with a delay. The write returns quickly, and the actual database save happens later in batches. This gives maximum write speed and allows grouping many small writes into large batches.

The risk is that cached data can be lost if the cache crashes before it is written to the database. So write-behind is used only for non-critical data, or paired with a persistent cache such as Redis with AOF or RDB snapshots.

The classic example is counters and metrics. A product view counter increments in Redis immediately and syncs to the database once per minute, in one batch for all products. If Redis crashes, you lose a minute of counters, which is usually acceptable.
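A sketch of that counter pattern: increments go to a Redis hash immediately, and a timer flushes them to the database once a minute. The key names and flush interval are illustrative, and the ioredis-style client and db.query are assumptions:

// Fast path: bump the counter in Redis only.
async function trackProductView(productId) {
  await redis.hincrby('pending:views', String(productId), 1);
}

// Slow path: once a minute, flush all pending counters to the database in one batch.
// NB: a production version would swap the hash atomically (e.g. via RENAME)
// to avoid losing increments that arrive between HGETALL and DEL.
setInterval(async () => {
  const pending = await redis.hgetall('pending:views');
  if (Object.keys(pending).length === 0) return;
  await redis.del('pending:views');

  for (const [productId, count] of Object.entries(pending)) {
    await db.query('UPDATE products SET views = views + ? WHERE id = ?', [Number(count), productId]);
  }
}, 60 * 1000);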

Read-Through is similar to cache-aside, except the logic for loading data from the source is encapsulated inside the caching layer itself. The application simply asks the cache for data, and the cache decides whether to load it from the database. This requires the cache to know how to load the data, usually through a callback or configuration.

The advantage is that the caching logic is centralized and the application doesn't contain if/else branches for cache hit/miss. The disadvantages are less flexibility and extra cache configuration complexity. In practice it is used less often than cache-aside.

Cache Invalidation: One of the Two Hardest Problems in Programming

Phil Karlton said: "There are only two hard things in Computer Science: cache invalidation and naming things." Invalidation is genuinely hard because you need to understand when data became stale and remove it from every cache level.

TTL-based invalidation is the simplest approach. Data lives for a fixed time and is deleted automatically. It requires no additional logic, but stale data may be shown until the TTL expires. For many scenarios this is acceptable.

Event-based invalidation: when data changes, the related cache entries are deleted explicitly. Updated a product's price? Delete the product page cache and the product list cache for its category. Published a new article? Delete the main page cache and the article list cache.

The complexity is that you have to track all the dependencies. A product change can affect dozens of cached pages: the product card, category lists, search results, related products, recommendations. Forget to delete one cache and users see inconsistent data.

A tag-based approach helps. Each cache record is assigned tags: product:123, category:5, brand:apple. When the product changes, you invalidate all records tagged product:123; when the category changes, all records tagged category:5. Redis doesn't support tags natively, but they can be emulated with sets: store a set of keys for each tag, and on invalidation read the set and delete all the keys.
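A sketch of that set-based emulation, assuming an ioredis-style client; the tag:* naming scheme is just a convention chosen for the example:

// When caching, register the key under each of its tags.
async function cacheWithTags(key, value, ttl, tags) {
  await redis.setex(key, ttl, JSON.stringify(value));
  for (const tag of tags) {
    await redis.sadd(`tag:${tag}`, key);
  }
}

// Invalidate everything associated with a tag, e.g. invalidateTag('product:123').
async function invalidateTag(tag) {
  const keys = await redis.smembers(`tag:${tag}`);
  if (keys.length > 0) {
    await redis.del(...keys);
  }
  await redis.del(`tag:${tag}`);
}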

Version-based invalidation: the cache key includes a data version, product:123:v5 instead of just product:123. When the product changes, the version increases to v6 and the old cache is simply ignored. New requests create cache entries with the new version; the old records die by TTL.

The advantage is that you don't need to delete the old cache explicitly, and there is no race condition where one process deletes the cache while another simultaneously writes stale data back. The disadvantage is that memory is occupied by both the old and new versions until the TTL expires.
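One possible place to keep the version counter is Redis itself; this is a sketch of that variant, not the only option:

// Reads resolve the current version into the key; writes only bump the version.
async function productCacheKey(productId) {
  const version = (await redis.get(`product:${productId}:version`)) || '0';
  return `product:${productId}:v${version}`;
}

async function onProductChanged(productId) {
  // Old keys like product:123:v5 are never read again and expire by TTL.
  await redis.incr(`product:${productId}:version`);
}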

Dependency tracking: when a cache record is created, all the data sources that went into it are recorded. For example, a product page depends on the product itself, its category, its brand, related products, and reviews. When any dependency changes, the cache is invalidated.

This is powerful but complex to implement. You need a separate dependency table, a mechanism for tracking database changes (triggers or an event log), and a process that reads the changes and invalidates the related caches. In very dynamic systems the overhead can be significant.

The cache stampede problem is a special case of invalidation. When a popular cache entry expires, multiple simultaneous requests discover its absence and all go to the database at once. The database gets a sharp load spike and may fall over.

The solution is probabilistic early expiration. A few seconds before the TTL expires, start updating the cache asynchronously with a small probability. The probability grows as expiration approaches. As a result, the cache refreshes smoothly before it expires and the stampede never happens.

The formula: if (expiration_time - current_time) < (TTL × random(0, 1) × beta_coefficient), update asynchronously. Beta is usually 1-3. At beta=1 the update starts at a random point within the TTL, on average at the halfway mark; a larger beta means the refresh starts earlier.
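A sketch of that check in the article's simplified form. It assumes the cached value is stored together with its expiry timestamp, and that refresh() reloads the data and repopulates the cache; the ioredis-style client and the beta value are assumptions:

const BETA = 1.5;

// Returns the cached value and, with growing probability near expiry,
// kicks off an asynchronous refresh so the key is never fetched "cold".
async function getWithEarlyRefresh(key, ttlSeconds, refresh) {
  const raw = await redis.get(key);
  if (!raw) {
    return refresh(); // true miss: refresh() loads the data and repopulates the cache
  }

  const { value, expiresAt } = JSON.parse(raw);
  const remainingSeconds = (expiresAt - Date.now()) / 1000;

  // Simplified rule from above: refresh early when remaining < TTL * random(0,1) * beta.
  if (remainingSeconds < ttlSeconds * Math.random() * BETA) {
    refresh().catch(() => {}); // fire-and-forget; this request still returns the cached value
  }
  return value;
}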

Practical Cases and Metrics

E-commerce: product catalog. A PostgreSQL database with 50,000 products. The catalog loaded in 2.3 seconds because the query joined products, categories, brands, warehouse stock, and prices. Three-level caching was implemented:

  1. Redis caches individual products for 10 minutes
  2. Redis caches product lists by category for 5 minutes
  3. CDN caches HTML category pages for 2 minutes

Result: 92% of requests are served from cache. Load time dropped to 180 milliseconds. Database load fell from 450 queries per second to 35. About $2,000 per month was saved on a database hardware upgrade. Conversion grew 18% thanks to the speed.

Invalidation: when a product changes through the admin panel, an event is sent that deletes the product's cache in Redis and makes a PURGE request to the CDN for that product's pages. When only the price or stock changes, only the product cache is updated; category pages refresh by TTL within 2-5 minutes, an acceptable delay.

SaaS dashboard: metrics and charts. The dashboard shows charts for the last 7, 30, and 90 days. Each chart is a complex query aggregating thousands of records, taking 1.5-4 seconds depending on the period.

Solution: query results are cached in Redis for 5 minutes. The first user to open a dashboard pays 4 seconds; everyone after gets the result in 50 milliseconds. At 100 users opening the dashboard per hour, the savings are (100 × 4 seconds) - (1 × 4 seconds) = 396 seconds of database computation per hour, or 9,504 seconds (2.64 hours) per day.

Additionally, pre-warming was implemented: a cron task every 4 minutes requests the popular dashboards before their TTL expires. For VIP clients the dashboard is always hot in the cache; they never wait.

Mobile app API. The app requests /api/feed, which returns a personalized post feed. Each user sees a unique feed based on their subscriptions, so a CDN cache doesn't fit. The database query took 250-600 milliseconds.

The personalized feed is cached in Redis under the key feed:user:{userId} for 3 minutes. When a new post is published, the author is added to a queue and a background worker asynchronously invalidates the feed cache of all of that author's subscribers. For a popular blogger with 50,000 subscribers this takes 5-10 seconds, but since it's asynchronous, nobody waits.

Result: API response time dropped to 80-120 milliseconds (Redis plus JSON serialization). Users who check the feed rarely (once per day) still get it fast thanks to the cache. Active users checking every 10 minutes get refreshes by TTL and see new posts with at most a 3-minute delay.

Monitoring cache metrics. It is critically important to monitor cache effectiveness, otherwise you won't know whether it works at all:

Hit rate is the percentage of requests that find data in the cache. Formula: hits / (hits + misses) × 100%. A good hit rate is 80-95% depending on the scenario. Below 70%, the cache is working poorly: the TTL is too short or the data is too unique. A 99% hit rate can also be bad; possibly the TTL is too long and users are seeing stale data.
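The server-wide numbers can be read straight from Redis: INFO stats exposes cumulative keyspace_hits and keyspace_misses counters. A small sketch, assuming an ioredis-style client:

// Server-wide hit rate from Redis INFO (counters are cumulative since the last restart).
async function redisHitRate() {
  const stats = await redis.info('stats');
  const hits = Number(stats.match(/keyspace_hits:(\d+)/)[1]);
  const misses = Number(stats.match(/keyspace_misses:(\d+)/)[1]);
  return hits + misses === 0 ? 0 : (hits / (hits + misses)) * 100;
}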

Latency is the cache access time. Redis should respond in 1-5 milliseconds at p99 (the 99th percentile). If it grows to 50-100 milliseconds, suspect network problems, Redis overload, or values in the cache that are too large. Investigate with the Redis SLOWLOG command.

Memory usage. Redis should have 20-30% memory headroom for normal operation. If memory is close to the limit, Redis starts evicting old records and the hit rate drops. Either add memory or reconsider what is cached and for how long.

Eviction rate is how many records Redis deletes because of memory shortage. A good value is close to zero. A high eviction rate means there is more data than memory: add memory or shorten TTLs so old data is deleted sooner.

Miss latency is the time to handle a cache miss (when you have to go to the database). If miss latency is 500 milliseconds and hit latency is 2 milliseconds, then even at a 90% hit rate the average latency is 10% × 500 + 90% × 2 = 51.8 milliseconds. If miss latency grows to 2 seconds because of database problems, the average becomes 201.8 milliseconds and users notice the degradation.

Common Mistakes and How to Avoid Them

Caching personal data in a shared cache. A developer caches the user profile under the key user:profile, forgetting that the userId should be part of the key. Result: all users see the profile of whoever logged in first after the cache expired. A serious data leak.

Solution: always include the userId or another user identifier in the cache key for personal data: user:profile:{userId}. Code review should catch such bugs at the PR stage.

Caching errors. On a database error, the application caches the error: null or an empty array with a 5-minute TTL. For the next 5 minutes every request gets an empty result even though the database has already recovered. Users think the data disappeared.

Solution: cache only successful responses. On an error, don't write to the cache at all; return the error to the application. Or cache it with a very short TTL (5-10 seconds) so the system recovers quickly after the failure.

Unbounded cache growth. The application caches search results under keys based on the query string: search:{query}. Each unique query creates a new record. Over time, millions of unique queries fill Redis memory completely.

Solution: use an eviction policy in Redis; maxmemory-policy allkeys-lru removes the least recently used keys when memory runs out. Or limit what is cached, for example only queries made 3 or more times.

Dogpiling / thundering herd. A popular cache entry expires, 1,000 simultaneous requests discover its absence, and all of them go to the database. The database collapses under the load.

Solution: use locks via Redis SETNX as described above, or probabilistic early expiration so the cache refreshes before it expires. Or the stale-while-revalidate pattern: return the old, expired cache entry to the user immediately and launch the update asynchronously in the background.

Inconsistency between cache levels. You updated the data in the database and invalidated the Redis cache, but forgot about the CDN. The CDN keeps serving the old version for another 10 minutes.

Solution: centralized invalidation logic that cleans all cache levels. When data changes, an event is sent; the event handler invalidates Redis and makes a PURGE request to the CDN. Or use versions/ETags so the CDN itself can tell that the data is stale.

Conclusions

Caching isn't an optional optimization but a critical part of any high-load application architecture. Done right, it gives a 10-100x performance improvement, reduces infrastructure costs, and improves user experience. Done wrong, it creates data leaks, shows stale information, and complicates debugging.

Key principles: cache data with a high read/write ratio (10:1 and above), use multi-level caching from the browser to the database, choose TTLs based on how critical freshness is, monitor hit rate and latency, and think through invalidation at the design stage, not after the fact.

Start a caching effort with measurements. Profile the application, find slow queries that run often, calculate the read/write ratio. Begin with the most obvious candidates: static data, results of expensive queries, external APIs. Roll it out gradually, monitor the metrics, iterate.

Remember that a cache is a compromise between performance and consistency. Absolute consistency with a cache doesn't exist; there is always a window between the source update and the cache update. This has to be accepted, and the system designed with eventual consistency in mind where that's acceptable.