The Optimal Caching Problem

From NovaOrdis Knowledge Base
Jump to navigation Jump to search

External

Internal

Overview

The optimal caching problem can be solved with a greedy algorithm. A cache is a small fast memory.

The Problem

The problem consists in processing a sequence of "page requests". When a page that does not exist in cache is requested, the page must be brought from the big slow memory into the fast memory, and that constitutes a cache miss, or a "fault". As long as there is sufficient space left in cache, a fault triggers a simple page load. When the cache fills up, a fault causes an existing page to be evicted. The question is what page should be evicted from the cache to minimize the number of fault on the long term.

The Bélády Theorem

The theorem says that a natural greedy algorithm ("furthest-in-the-future") is an optimal algorithm for the optimal caching problem, minimizing the caching misses. "Furthest in the future" algorithm evicts the element that will be requested furthest in the future. This algorithm is not actionable because we - of course - do not know the future, so the algorithm is not implementable. However, this is a useful result to know because serves as a guideline for practical algorithm.

The Least Recently Used (LRU) eviction algorithm looks in the past and assumes that what was requested recently will also be requested soon, and what has not been requested for a long time, will continue to not be requested. LRU is assume to approximate "furthest-in-the-future". In consequence, we want recent elements to stay in the cache, and we evict the element that was requested least recently. If this behavior predicts the data behavior, it is says that the data has "locality of reference".

Also see:

LRU Cache

The theorem is also useful because it can used as an idealized benchmark for caching algorithms. Once the caching application starts to collect data and we have hindsight, data can be analyzed and we can tell how well we actually did versus how well we did if we had known the future. If the current caching algorithm does well, it's a good choice, otherwise we could work on devising a different caching algorithm.