Storage concepts

Storage Architecture Concepts.

RAID Architectures

RAID – Redundant Array of Independent Disks

Drives of varying sizes can typically be used in a RAID configuration; however, every drive will be sized down to the smallest drive format / geometry in the group.
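As a rough illustration of that sizing rule, here is a minimal sketch. The function name and the example drive sizes are made up for illustration, and the figure is raw capacity before any mirroring or parity overhead.

```python
def usable_raw_capacity_tb(drive_sizes_tb):
    """Raw capacity the group can address, before mirroring or parity overhead."""
    # Every member is treated as if it were the size of the smallest drive.
    return min(drive_sizes_tb) * len(drive_sizes_tb)


sizes = [4, 4, 8]  # hypothetical mix of 4 TB and 8 TB drives
addressable = usable_raw_capacity_tb(sizes)
print(addressable)               # 12
print(sum(sizes) - addressable)  # 4 (TB left unused on the larger drive)
```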

If you want to understand RAID concepts in more detail, you can check Wikipedia and various other online places rather than have me repeat it all here. Modern all-flash storage arrays, which are less concerned with performance, will primarily go with dual or triple parity RAID for HA / resilience.

RAID 0

Good for performance, but no fault tolerance – striping
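A minimal sketch of how striping spreads consecutive blocks across drives, assuming a stripe unit of one block (real arrays use larger, configurable stripe units). The function name is hypothetical.

```python
def raid0_locate(logical_block, num_drives):
    """Map a logical block number to (drive index, block offset on that drive)."""
    # Consecutive blocks rotate round-robin across the drives.
    return logical_block % num_drives, logical_block // num_drives


for block in range(6):
    print(block, raid0_locate(block, num_drives=3))
```

Because neighbouring blocks land on different drives, large reads and writes can be serviced by all drives in parallel, but losing any one drive loses part of every stripe.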

RAID 1

Good for read performance – mirroring

RAID 2

Forget RAID 2 – I have never in 20+ years used this

RAID 3

Forget RAID 3 – I have never in 20+ years used this

RAID 4

Forget RAID 4 – I have never in 20+ years used this

RAID 5

Good for performance. Block-level striping with a single parity block per stripe, distributed across all the drives in the group. A minimum of 3 drives is required (2 data and 1 parity), but typically around 5-7 drives would be used. It only tolerates 1 failed drive, so you don’t want to build your stripe too wide. Typically used with performance-based 10K HDDs.
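The single-parity idea can be sketched with XOR: the parity block is the XOR of the data blocks in a stripe, so any one missing block can be rebuilt from the survivors. This is a simplified illustration with made-up block contents. It handles one stripe only and does not model how parity rotates across drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 3+1 group: three data blocks and one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the drive holding data[1], then rebuild it from the rest.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])  # True
```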

RAID 6

Good for resilience. Similar to RAID 5, but with a second parity block per stripe, so it tolerates 2 failed drives. Write performance is traded off for redundancy, as every write also has to update both parity blocks, so read performance could be 3x better than write performance. Often used for very wide stripes such as 18+2 or even wider when you are trying to maintain performance with data resilience.
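To put rough numbers on the write trade-off, the sketch below uses the commonly quoted rule-of-thumb write penalties for small random writes (back-end I/Os per front-end write). These figures and the function name are assumptions for illustration, not measurements from any particular array, and caching or full-stripe writes will change the picture.

```python
# Rule-of-thumb back-end I/Os generated by one small front-end write.
WRITE_PENALTY = {"RAID 0": 1, "RAID 1": 2, "RAID 5": 4, "RAID 6": 6}


def effective_iops(raw_iops, write_fraction, level):
    """Estimate front-end IOPS for a given back-end IOPS budget and write mix."""
    penalty = WRITE_PENALTY[level]
    return raw_iops / ((1 - write_fraction) + write_fraction * penalty)


for level in ("RAID 5", "RAID 6"):
    # Hypothetical group delivering 1000 back-end IOPS with a 50% write mix.
    print(level, round(effective_iops(1000, 0.5, level)))
```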

Non RAID Architectures

JBOD

Just a Bunch Of Disks.

SPAN / BIG

Similar to RAID, but just a concatenation of drives presented as a single volume. Drives can be of varying sizes, which is not good from a performance standpoint.
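A minimal sketch of concatenation, assuming drive sizes are given in blocks (the function name and sizes are hypothetical). Unlike striping, consecutive blocks stay on one drive until it is full, which is why a span gets no parallelism benefit.

```python
def span_locate(logical_block, drive_sizes):
    """Return (drive index, offset on that drive) for a block in a spanned volume."""
    if logical_block < 0:
        raise ValueError("block number must be non-negative")
    # Walk the drives in order, subtracting each drive's size until the block fits.
    for index, size in enumerate(drive_sizes):
        if logical_block < size:
            return index, logical_block
        logical_block -= size
    raise ValueError("block is beyond the end of the volume")


sizes = [100, 250, 50]  # three drives of different sizes, in blocks
print(span_locate(99, sizes))   # (0, 99)
print(span_locate(100, sizes))  # (1, 0)
print(span_locate(350, sizes))  # (2, 0)
```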

MAID

Massive Array of Idle Drives, typically used for Write Once, Read Occasionally. Not a performance tier.

ERASURE CODING

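Erasure Coding and RAID can sometimes be confused. You can use Erasure Coding in a similar way to RAID; however, the data is split into fragments that are encoded together with additional parity fragments and distributed across drives or nodes. For a comparable level of protection, Erasure Coding will typically use less capacity than mirroring or keeping full copies of the data.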

Erasure Coding is also used for things like distributed applications and Object storage providing redundancy at scale.
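The capacity argument can be sketched with simple arithmetic. The 8+2 layout and the 3-copy comparison below are illustrative assumptions, as are the function names. Both schemes in the example survive the loss of any 2 fragments or copies.

```python
def raw_needed_replication(usable_tb, copies):
    """Raw capacity needed when every piece of data is stored `copies` times."""
    return usable_tb * copies


def raw_needed_erasure(usable_tb, data_fragments, parity_fragments):
    """Raw capacity needed for a data+parity erasure-coded layout."""
    return usable_tb * (data_fragments + parity_fragments) / data_fragments


print(raw_needed_replication(100, copies=3))  # 300
print(raw_needed_erasure(100, 8, 2))          # 125.0
```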

Data Architecture concepts

There are two main Data Architecture Concepts that I generally refer to.

DATA CENTRIC

Data centric refers to an architecture where data is the primary and permanent asset, and applications come and go. In a data centric architecture, the data model precedes the implementation of any given application and will be around and valid long after that application is gone.

DATA DRIVEN

When a company employs a “data driven” approach, it means it makes strategic decisions based on data analysis and interpretation. A data driven approach enables companies to examine and organise their data with the goal of better serving their customers and consumers.