Least Frequently Used Cache

Posted on January 27, 2026

Update 2026-01-27:

There was a mistake in the complexity analysis and some dead links; both have been fixed. This implementation is naive and has a complexity of O(n). Thank you to my good friend Simon for reading and pointing them out!

Back with another data structures post! This time, the Least Frequently Used cache, also in C++23. In the last post we walked through the Least Recently Used Cache, which uses a policy that evicts keys that haven’t been used in the longest amount of time. The Least Frequently Used Cache uses a different policy: evict the key that has been accessed the least.

Getting Started

Let’s set up a simple main function to demonstrate how we expect to use this cache:

#include <cassert>
#include <cstdint>
#include <optional>
#include <print>
#include <string>

int main() {
  LFU<std::string, uint64_t> freq_cache(3);

  freq_cache.set("foo", 11);
  freq_cache.set("bar", 22);
  freq_cache.set("baz", 33);

  // Simulate a client accessing the "foo" key three times...
  freq_cache.get("foo");
  std::optional<uint64_t> foo = freq_cache.get("foo");
  foo = freq_cache.get("foo");

  assert(foo);

  // And access the "bar" key once
  std::optional<uint64_t> bar = freq_cache.get("bar");

  assert(bar);

  // Leaving the "baz" key accessed 0 times, we add "qux"
  freq_cache.set("qux", 44);

  std::optional<uint64_t> baz = freq_cache.get("baz");

  // "baz" should have been evicted
  assert(baz.has_value() == false);

  return 0;
}

This won’t compile yet, so we’ll now define the LFU type, a constructor, and some accessor functions to get started:

// .. new imports
#include <map>
#include <utility>
#include <vector>

template<typename K, typename V>
struct LFU {
    size_t capacity_;
    std::map<K, V> cache;
    std::vector<std::pair<K, uint64_t>> keys;

    LFU(size_t capacity) : capacity_(capacity), cache(), keys() {}

    size_t size() const {
        return cache.size();
    }

    size_t capacity() const {
        return capacity_;
    }
};

As with the Least Recently Used Cache, we use a std::map for our keys and values. And as before, we’ll need an ancillary data structure to keep track of the eviction policy; in this case we’re going to use a std::vector.

Heaps

In order to know how frequently a key has been used, meaning read or updated, we need to keep a counter for that key. You might think to use a std::map to keep this information. If you did, though, when it came time to evict a key you’d have to search the whole map to find the key with the lowest counter value.
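
To make that concrete, here’s a small, self-contained sketch of what eviction would look like with a plain map of counters; it isn’t part of the cache, and the keys and counts are made up purely for illustration. Finding the least-used key means scanning every entry:

#include <algorithm>
#include <cstdint>
#include <map>
#include <print>
#include <string>

int main() {
  // Hypothetical per-key access counts.
  std::map<std::string, uint64_t> counts{{"foo", 3}, {"bar", 1}, {"baz", 0}};

  // O(n): every entry has to be inspected to find the lowest counter.
  auto least = std::min_element(
      counts.begin(), counts.end(),
      [](const auto& a, const auto& b) { return a.second < b.second; });

  std::println("evict: {} ({} accesses)", least->first, least->second);
}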

Instead, this cache is going to use a heap. This structure keeps our keys partially ordered so that the least frequently used key is always at the front, where we can read it in O(1) time.

C++23 doesn’t have a standalone heap container in its standard library. It does, however, have the make_heap function, which can rearrange a container of orderable elements into a heap. By default it builds a max heap; provide a comparison function and you can get a different ordering, such as a min heap. For the LFU we want a min heap on the access count, so the least-used key ends up at the front.

In our case, constructing an LFU starts with an empty vector, which is trivially already a valid heap, so there’s nothing for make_heap to do up front. Instead we’ll use the std::push_heap function, which inserts a single new element into an existing heap. Inserting elements this way keeps the heap property intact: the right element will always be where we want it.

This is why our definition uses a std::vector<std::pair<K, uint64_t>> for the keys field instead of a std::map: the heap functions need a random-access sequence they can rearrange in place.

The trade-off with this structure is that updating a key’s counter requires searching through the vector, an O(n) operation in the worst case.
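
Before wiring this into the cache, here’s a tiny standalone sketch of the heap mechanics we’ll lean on. The variable names mirror the cache’s keys field, but nothing here is part of the cache yet; the greater-than comparison gives us a min heap, so the key with the smallest count sits at the front:

#include <algorithm>
#include <cstdint>
#include <print>
#include <string>
#include <utility>
#include <vector>

int main() {
  // Smallest access count at the front of the heap.
  auto by_count = [](const auto& a, const auto& b) { return a.second > b.second; };

  std::vector<std::pair<std::string, uint64_t>> keys;

  // push_heap assumes everything before the newly appended element is
  // already a heap and sifts the new element into place in O(log n).
  keys.emplace_back("foo", 3);
  std::push_heap(keys.begin(), keys.end(), by_count);
  keys.emplace_back("bar", 1);
  std::push_heap(keys.begin(), keys.end(), by_count);
  keys.emplace_back("baz", 0);
  std::push_heap(keys.begin(), keys.end(), by_count);

  std::println("least used: {}", keys.front().first); // prints "baz"

  // pop_heap moves the front element to the back in O(log n) so it can
  // be removed with pop_back -- the same dance eviction will use.
  std::pop_heap(keys.begin(), keys.end(), by_count);
  keys.pop_back();

  std::println("next least used: {}", keys.front().first); // prints "bar"
}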

Implementing the Cache

Let’s go ahead and implement the get function we called back in main:

// .. new imports
#include <algorithm>

// Updating LFU...
template <typename K, typename V> struct LFU {
    // ...

    std::optional<V> get(K key) {
        if (cache.contains(key)) {
            // Adding asserts here to maintain our invariants while
            // we're developing this structure.
            //
            // Here we're saying that if the key exists in the cache,
            // it also exists in the heap
            assert(std::find_if(keys.begin(), keys.end(),
                                [&key](const auto& entry) { return entry.first == key; }) != keys.end());

            V value = cache.at(key);

            update_count(key);

            return value;
        }

        return {};
    }

private:
  void update_count(K key) {
    auto entry = std::find_if(keys.begin(), keys.end(),
                              [&key](const auto& e) { return e.first == key; });

    // This is the slightly inefficient part: updating our
    // count in the heap requires a linear search through the
    // elements
    if (entry != keys.end()) {
        entry->second++;

        // Bumping the count can break the heap property, so rebuild the
        // heap so the least-used key is back at the front. Like the
        // search above, this is O(n).
        std::make_heap(keys.begin(), keys.end(),
                       [](const auto& a, const auto& b) { return a.second > b.second; });
    }

    // Elided here... but if the assertions are optimized out
    // of a production build, what should this code do here if
    // the key isn't found in the heap?
  }
};

So every time we successfully hit the cache for a key, we count that as “accessed”, increment the counter, and rebuild the heap so the least-used key stays at the front.

We also need to update the counter when setting an existing key, as well as check for eviction when inserting a new one:

template <typename K, typename V> struct LFU {
    // ...

  void set(K key, V value) {
    // Order by access count so the least frequently used key sits at
    // the front of the heap (the same ordering update_count restores).
    auto by_count = [](const auto& a, const auto& b) { return a.second > b.second; };

    if (cache.contains(key)) {
        cache[key] = value;
        update_count(key);
        return;
    }

    if (size() == capacity_) {
        // Just more assertions to make sure things are in the
        // right state.
        assert(!keys.empty());

        // The eviction key is always at the front of the heap, so
        // finding it is O(1); pop_heap then moves it to the back in
        // O(log n) so we can remove it.
        std::pop_heap(keys.begin(), keys.end(), by_count);
        K evict_key = keys.back().first;
        keys.pop_back();

        cache.erase(evict_key);
    }

    cache[key] = value;
    keys.emplace_back(key, 0);

    // Pushing the new key onto the heap is an O(log n) operation that
    // maintains the heap property
    std::push_heap(keys.begin(), keys.end(), by_count);
  }

  // ...
};

Conclusion

For this implementation, the get and set operations both have a worst-case time complexity of O(n), due to the linear search (and re-heapify) in update_count.

We could make an O(1) implementation by spending more memory on several doubly-linked lists, one per frequency, but I wanted to explore std::push_heap/std::make_heap. If you need to constrain the amount of memory used, this implementation might still be useful to you.
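
As a rough, hypothetical sketch of that alternative (none of this is from the implementation above, the names are made up, and it assumes hashable keys), the usual O(1) design keeps per-frequency buckets of keys alongside the values, so eviction only ever looks at the lowest-frequency bucket:

#include <cstdint>
#include <list>
#include <unordered_map>

template <typename K, typename V>
struct ConstantTimeLFU {
    std::unordered_map<K, V> values;                              // key -> value
    std::unordered_map<K, uint64_t> counts;                       // key -> access count
    std::unordered_map<uint64_t, std::list<K>> buckets;           // count -> keys with that count,
                                                                  // kept in recency order
    std::unordered_map<K, typename std::list<K>::iterator> where; // key -> its node in its bucket
    uint64_t min_count = 0;                                       // lowest count present, i.e. the
                                                                  // next eviction bucket

    // On every get/set: bump counts[key], splice the key's node from
    // buckets[old] to buckets[old + 1], and update min_count -- each a
    // constant-time operation.
};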

This implementation also doesn’t consider how to “break ties” when two elements have the same frequency value. Often LFU implementations will use the LRU policy to break such ties. You could try implementing that yourself using this implementation as a base.
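
A minimal sketch of one way to break ties in the heap-based version, assuming you add a last-used tick to each entry (Entry and evict_before are hypothetical names, not part of the code above): compare by count first and fall back to recency.

#include <cstdint>
#include <tuple>

// Hypothetical heap entry with an extra monotonically increasing tick
// that gets refreshed on every access.
struct Entry {
    uint64_t count;
    uint64_t last_used;
};

// Min-heap comparator: the fewest accesses win the eviction spot, and
// among equally frequent entries, the least recently used one does.
bool evict_before(const Entry& a, const Entry& b) {
    return std::tie(a.count, a.last_used) > std::tie(b.count, b.last_used);
}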

In practice, LFU caches are good when your traffic patterns require consistent access to popular elements.

However, the trade-off with LFUs is that they can be more complex to implement than LRU caches, especially if you want to match the time performance of an LRU.

Happy hacking out there.