To Cache or not to Cache?

Cache can be your best friend or your worst nightmare. What do I mean by that?

There are several options today for how you choose to implement a cache.

  • Server Side (think Fusion IO cards or similar)
  • Storage Read Cache
  • Storage Write Cache

All types of caching are sensitive to sizing. Once the cache is full, you still need to go to disk.
Hitachi found this with their old enterprise replication product, TrueCopy. As replication requirements got larger, deployments were prone to cache punctures: during a network outage, or if the replication link was not sized correctly, the cache holding the in-flight replication data would fill up.
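That failure mode, a full cache sending every miss back to disk, can be sketched with a toy LRU cache. This is a simplified illustration, not how any particular array works; the capacities and access pattern are made-up numbers:

```python
from collections import OrderedDict
import random

class LRUCache:
    """Toy LRU read cache that tracks hits and misses."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)   # refresh recency on a hit
            self.hits += 1
        else:
            self.misses += 1                 # miss: this read goes to disk
            self.blocks[block] = True
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict least-recently-used

def hit_rate(cache_size, working_set, n_reads=10_000, seed=1):
    """Fraction of reads served from cache for a uniform random workload."""
    random.seed(seed)
    cache = LRUCache(cache_size)
    for _ in range(n_reads):
        cache.read(random.randrange(working_set))
    return cache.hits / n_reads

# Working set fits in the cache: almost every read is a hit.
print(hit_rate(cache_size=1000, working_set=500))
# Working set ten times the cache: most reads fall through to disk.
print(hit_rate(cache_size=1000, working_set=10_000))
```

The point is the cliff: nothing gradual happens between the two cases, you simply stop fitting and the disks take the load.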

Server Side:

In a shared storage environment you will have many hosts connecting to one or more storage controllers. These hosts may boot from the storage controller, or they may have internal disk but mount LUNs or volumes from the shared storage.

With Server Side cache, you typically install either PCIe DRAM-based cards or some SSDs within each physical host.

Depending on what you choose to implement, these will be read- and/or write-capable, and they can be tuned quite well based on your requirements.

Server Side cache has some challenges, however.
Imagine you have a large VM farm with multiple varied workloads, which is typical of most enterprises.
If you don't add the same amount of cache to each server, then in the event of a host failure the workloads pinned to the accelerated (cached) hosts may not get the required level of service when they are moved to a non-accelerated host (by DRS, for example).

Server Side cache is good if you want to pin specific workloads, such as VDI or a specific database, to specific hosts.

Storage Read Cache:

At NetApp we sold a lot of this because, until NetApp had Hybrid Aggregates, it was the best way to accelerate VM workloads. If you incorporate data reduction techniques like deduplication, you can use a relatively small amount of cache to accelerate a lot of servers.

The problem is, it's a read cache, and it was also quite small.

With that said, NetApp's Flash Cache was great, as it could be tweaked for things like metadata lookups. That would let you accelerate tasks you wouldn't normally associate with cache, such as indexes over large-block video files (it's the index that gets cached, not the video files).

NetApp had a great product called Flash Accel that would interact between the Server Side and the Storage Side caches to determine the best places to accelerate but this was sadly pulled off the market.

The downside of most read caches like Flash Cache is that they use volatile, non-persistent memory to store the reads, so if you happen to restart your controller you have to re-load the cache. If time is of the essence, this could be a big problem.

Read caches are also almost always best used for small-block random IO.
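A quick way to see why the hit ratio matters so much is to blend the latencies: every hit is served fast, every miss pays the full disk penalty. The figures below are illustrative round numbers I'm assuming for a flash read cache versus a random read on a 7.2K RPM HDD, not vendor specs:

```python
def effective_latency_us(hit_ratio, cache_us=100, disk_us=8000):
    """Average read latency in microseconds, blending hits and misses.

    cache_us and disk_us are assumed example values: ~0.1 ms for a
    flash read cache, ~8 ms for a random read on a spinning disk.
    """
    return hit_ratio * cache_us + (1 - hit_ratio) * disk_us

# Even at a 90% hit ratio, the average is dominated by the 10% of
# reads that miss and wait on the disk.
print(effective_latency_us(0.90))
print(effective_latency_us(0.99))
```

Large-block sequential streams that bypass the cache see the full disk latency regardless, which is why read caches earn their keep on small-block random work.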

Storage Write Cache:

This is where it gets interesting: a write cache is almost always also used as a read cache, and it most certainly must be non-volatile to survive a power outage, so it will likely be an SSD.

Write caches will also typically be used for small-block random over-writes (blocks that have recently been written to HDD), so not all write IO will be accelerated.

Write caches are also typically larger and a lot more flexible than read caches, but they still suffer from size limits.

The problem with arrays, like some of the hybrids out there, that rely on a write cache for acceleration is that once the cache is full you're back to disk, and if you have a tendency to use slow disk like SATA, you go from hero to zero very quickly.

If you have a non-uniform IO size that doesn't fit nicely into the stripe size, you can rapidly eat up cache and be down to disk before you know it.

What's best:

Caching was introduced to fix a physics problem: disk.

If you don't have disk, but instead use a form of non-volatile persistent storage like SSD, you are less likely to need a cache; after all, SSDs are what storage vendors typically use as the cache anyway.

A lot comes down to the storage operating environment and how it is implemented, as some are more efficient than others.

So, think about what and where you need to accelerate, or look to an All Flash Array like Pure Storage, where you don't have to think as much about how you architect your data storage needs.