ZebEC

Bridging erasure coded storage to hot data.

The ZebEC erasure code library is based on the Mojette transform, a discrete and exact version of the Radon transform. The Mojette transform is by nature a non-systematic code, and its parity chunks (m) are larger than the corresponding systematic data chunks (k) by a factor of (1 + ε), where ε > 0, so each parity chunk contains more information than a data chunk.
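
To make the (1 + ε) size relation concrete, here is a minimal sketch of a textbook Dirac Mojette projection. It is not ZebEC code and does not reflect the ZebEC API: the function name mojette_projection, the row/column layout, and the use of plain integer addition (a real implementation may well use XOR or another operation) are all assumptions made for illustration. The data is a grid of k rows with a fixed number of symbols each, and a projection along direction (p, q) sums the symbols that fall into the same bin.

```python
from math import gcd


def mojette_projection(grid, p, q=1):
    """Dirac Mojette projection of a rectangular grid along direction (p, q).

    grid is a list of rows, each holding the same number of integers.
    Hypothetical helper for illustration only; not part of the ZebEC API.
    """
    if gcd(p, q) != 1:
        raise ValueError("p and q must be coprime")
    rows, cols = len(grid), len(grid[0])
    # Standard Mojette bin count for a rectangle of `cols` x `rows` symbols.
    n_bins = abs(p) * (rows - 1) + abs(q) * (cols - 1) + 1
    # The symbol at (row r, column c) lands in bin p*r - q*c.
    # Shift by the smallest corner value so that bin indices start at 0.
    corners = [p * r - q * c for r in (0, rows - 1) for c in (0, cols - 1)]
    offset = -min(corners)
    bins = [0] * n_bins
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            # Assumption: symbols are combined with integer addition.
            bins[p * r - q * c + offset] += value
    return bins


data = [[1, 2, 3],
        [4, 5, 6]]  # k = 2 data rows of 3 symbols each
for p in (0, 1, -1):
    projection = mojette_projection(data, p)
    # Print the direction, the bins, and the size relative to one data row.
    print(p, projection, len(projection) / len(data[0]))
```

In this sketch a projection with q = 1 has |p|(rows - 1) + cols bins, so for p = ±1 it holds 4 bins against 3 symbols in a data row. That gives ε = |p|(rows - 1) / cols = 1/3 for this toy grid; with long rows and few of them, ε becomes small.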

As an example of a systematic code, Reed-Solomon (RS) codes run with optimal performance when no erasure has to be repaired, but suffer severely when data must be reconstructed after an erasure. As a consequence of this unpredictable performance, erasure coding has until now been suitable only for cold data storage and applications where performance is less important.

ZebEC is also a systematic erasure code library compatible with today’s Reed-Solomon code installations leveraging SIMD instructions. The implementation of the Mojette transform in ZebEC has been optimized for both encoding and decoding to handle I/O-intensive applications, often referred to as hot data.

ZebEC is designed to excel at random access patterns with small block sizes, which are the typical data block sizes for hot data in general-purpose file systems such as XFS, ZFS, Btrfs and ext4. These are also the payload sizes of next-generation object storage that can be used for hot data as well as cold data.

In contrast to Reed-Solomon erasure code implementations that demand much of the CPU, ZebEC is designed to perform well even on CPUs without advanced acceleration features. It delivers excellent results on less powerful CPUs while taking full advantage of modern CPU features when present.

ZebEC is portable between hardware platforms, which means it can be used in all architectural layers, such as data centers, client applications and edge devices.

The ZebEC erasure code is rateless. This means that the redundancy level can be set freely to suit a specific use case, and can be increased or reduced without noticeable performance impact when tiering data from hot to cold storage or vice versa.
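
As a rough illustration of what a redundancy level means in practice, the sketch below computes the chunk count, the number of tolerated erasures and the nominal storage overhead for a k+m configuration. The helper name redundancy_profile is hypothetical and not part of ZebEC. The figures assume that any k of the k+m chunks are enough to rebuild the data, and they ignore the extra (1 + ε) size of Mojette parity chunks, so real overhead would be slightly higher.

```python
from fractions import Fraction


def redundancy_profile(k, m):
    """Nominal properties of a k+m erasure code configuration.

    Hypothetical helper for illustration only. Assumes any k of the k+m
    chunks can rebuild the data and ignores the (1 + epsilon) parity size.
    """
    if k < 1 or m < 0:
        raise ValueError("need k >= 1 data chunks and m >= 0 parity chunks")
    return {
        "chunks": k + m,                   # total chunks stored
        "max_erasures": m,                 # chunks that may be lost
        "overhead": Fraction(k + m, k),    # stored size / original size
    }


for k, m in ((4, 2), (20, 4)):
    profile = redundancy_profile(k, m)
    print(f"{k}+{m}:", profile["chunks"], profile["max_erasures"],
          float(profile["overhead"]))
```

Under these assumptions a 4+2 layout stores 1.5 times the original data and a 20+4 layout stores 1.2 times, which is the kind of trade-off that changes when data is tiered between hot and cold storage.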

Throughput 4+2 (1 erasure)

Throughput 20+4 (1 erasure)

Test machine configuration: Intel® Xeon® Processor E5-1620 v3, 3.50 GHz with 16GB RAM