

Boosting disk mass without steroids

Our lab has reviewed the new JBOD from WD.

What is good about big JBODs?

The brand-new Western Digital Ultrastar Data102 (102 disks, populated with 12 TB HDDs) provides a lot of storage capacity. When developing the JBOD, WD built on previous experience with the 60-disk Ultrastar storage platform.
Among such giants, the Ultrastar Data102 turned out to be exceptionally well balanced in terms of size and performance.
So why does one need such large disk pools when hyperconverged systems are becoming increasingly popular globally?
Tasks where storage size substantially exceeds computing capacity may turn out to be extremely costly for the client. Here are just a few example scenarios:
1. Replication factors of 2 and 3, which are used to build scale-out systems holding several petabytes of data, are a rather expensive solution. Shared-disk solutions have a much better cost structure in that case.
2. Intense sequential read/write operations force the cluster node to go beyond its local storage, which leads to problems such as long-tail latency. In this case, one must be extremely careful when building the network.
3. Distributed systems are ideally suited to tasks of the “many applications work with many files” kind, while they are rather mediocre at writing to and reading from a tightly coupled cluster, especially with the N-to-1 write pattern.
4. For tasks like “double the video archive depth”, it is far cheaper to procure a large JBOD than to double the number of servers in the cluster.
5. Using external data storage systems based on JBODs, we can provide our priority applications with the storage size and performance they need by allocating specific disks, cache, and ports, while retaining the required degree of flexibility and scalability.
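To put rough numbers on the cost argument in item 1, here is a minimal Python sketch that compares the raw capacity needed for the same usable capacity under replication and under parity RAID. The 2 PB usable target and the 8+2 RAID 6 group layout are illustrative assumptions, not figures from this review.

```python
def raw_capacity_replication(usable_tb: float, replication_factor: int) -> float:
    """Raw capacity needed when every block is stored `replication_factor` times."""
    return usable_tb * replication_factor


def raw_capacity_parity(usable_tb: float, data_disks: int, parity_disks: int) -> float:
    """Raw capacity needed for parity RAID groups of data_disks + parity_disks."""
    return usable_tb * (data_disks + parity_disks) / data_disks


if __name__ == "__main__":
    usable_tb = 2000.0  # hypothetical target: 2 PB usable
    for rf in (2, 3):
        print(f"RF{rf}: {raw_capacity_replication(usable_tb, rf):.0f} TB raw")
    # Assumed 8+2 RAID 6 layout, chosen only for illustration
    print(f"RAID 6 (8+2): {raw_capacity_parity(usable_tb, 8, 2):.0f} TB raw")
```

Under these assumptions, replication multiplies the disk count by two or three, while the parity layout adds a quarter on top of the usable capacity.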
The Western Digital engineers who developed the Ultrastar Data102 storage platform clearly understood all the pitfalls of utilizing hard disk drives in an enclosure. The Ultrastar Data102 demonstrates good vibration and cooling levels, while its power consumption is fully in line with real data storage requirements.

What is good about Western Digital Ultrastar Data102 storage platform?

We are well aware that the scalability of modular systems is limited by controller capabilities and that the network always causes delays. However, such systems are less costly in terms of IOps, GBps, and storage TBs.

There are two reasons why RAIDIX engineers came to love the Ultrastar Data102:
1. The Ultrastar Data102 not only fits more than 1 PB of data in a 4U enclosure, but it is also very fast. It stands on an equal footing with most all-flash solutions: 4U, 1 PB, and 23 GBps are good indicators for a disk array.
2. The Ultrastar Data102 is easy to operate: it does not require tools such as a screwdriver.
Our QA team hates screwdrivers so much that screwdrivers have started haunting their dreams. When they learned that Western Digital was creating a 102-disk monster, they imagined having to handle 408 tiny screws, and our nearby shop completely ran out of strong alcohol.
In fact, all their fears were in vain. Western Digital took care of this and found a new disk locking method that makes maintenance simpler. Disks are attached to the chassis with anchor grips — without any bolts or levelling screws. All disks are mechanically isolated with elastic screws on the back panel. The new servo drive firmware and accelerometers perfectly compensate for vibration.
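The headline numbers in this section follow from simple arithmetic, sketched below in Python. The disk count, disk size, and enclosure height come from the review; the figure of four screws per disk is only inferred from the 408 screws our QA team feared, and capacities are in decimal TB.

```python
DISKS = 102          # drive bays in the enclosure (from the review)
DISK_TB = 12         # capacity per HDD in TB (from the review)
ENCLOSURE_U = 4      # enclosure height in rack units (from the review)
SCREWS_PER_DISK = 4  # inferred: 408 feared screws / 102 disks


def raw_capacity_tb(disks: int = DISKS, disk_tb: int = DISK_TB) -> int:
    """Raw capacity of a fully populated enclosure, in TB."""
    return disks * disk_tb


def capacity_per_rack_unit_tb(disks: int = DISKS, disk_tb: int = DISK_TB,
                              height_u: int = ENCLOSURE_U) -> float:
    """Raw capacity density, in TB per rack unit."""
    return raw_capacity_tb(disks, disk_tb) / height_u


def screws_avoided(disks: int = DISKS, per_disk: int = SCREWS_PER_DISK) -> int:
    """Screws a conventional screw-mounted design would have needed."""
    return disks * per_disk


if __name__ == "__main__":
    print(raw_capacity_tb(), "TB raw")
    print(capacity_per_rack_unit_tb(), "TB per rack unit")
    print(screws_avoided(), "screws avoided")
```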

Ultrastar Data102 unveiled

The box contains the basket frame stuffed with disks. The basic set is 24 disks, while the solution is scaled with sets of 12 disks. This ensures correct cooling and allows mitigating vibration in the most efficient way.

Incidentally, it was the development of two supporting technologies, IsoVibe and ArcticFlow, that made the new Ultrastar Data102 possible.
IsoVibe consists of the following components:
1. Specialized disk firmware, which runs servo drives via sensors and reduces vibration levels predictively.
2. Vibration-isolated connectors on the server back panel (see Figure 1).
3. Specialized disk locking that does not require any screws.

Figure 1. Vibration-isolated connectors

Temperature is the second largest factor that kills hard drives. At an average temperature above 55°C, the MTBF will be half the estimated value.
Poor cooling mainly affects multi-disk servers and large disk shelves. Often, the temperature in the back disk rows is 20°C higher than in the disks located closer to the cold aisle.
ArcticFlow® is Western Digital’s patented shelf cooling technology (see Figure 2 below). It creates additional air ducts inside the chassis, so that cool air reaches the back disk rows directly from the cold aisle, bypassing the front rows.

Figure 2. ArcticFlow flow chart

There is a separate cold air flow for cooling input/output modules and power units.
This results in a perfect thermal map of the operating shelf (see Figure 3). The temperature differential between the front and back disk rows is only 10°C. The hottest disk is at 49°C, with the temperature in the cold aisle being 35°C. Cooling each disk requires 1.6 W, half as much as in a similar chassis. The fans are quieter and vibration is lower, while the disks last longer and work faster.

Figure 3. Ultrastar Data 102 thermal map

Given the power budget of 12 W per disk in the Ultrastar Data102 storage platform, it is possible to create a hybrid configuration: out of 102 disks, 24 can be SAS SSDs. They can be installed and used in hybrid mode, and it is also possible to set up SAS zoning and assign them to a host that requires all-flash access.
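As a rough sanity check on the power figures, the Python sketch below multiplies the per-disk budget by the number of bays and compares it with the rating of a single power supply. It assumes that one 1600 W supply must be able to carry the whole disk load on its own, and it ignores the IO modules and fans, so treat it as an estimate rather than a vendor specification.

```python
def disk_power_budget_w(disks: int, watts_per_disk: float) -> float:
    """Total power budget for all drive bays, in watts."""
    return disks * watts_per_disk


def headroom_w(psu_rating_w: float, load_w: float) -> float:
    """Power left on one supply after the disk budget is subtracted."""
    return psu_rating_w - load_w


if __name__ == "__main__":
    load = disk_power_budget_w(disks=102, watts_per_disk=12)
    # Assumption: a single 1600 W supply carries the full disk load
    print(f"Disk budget: {load:.0f} W, headroom: {headroom_w(1600, load):.0f} W")
```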
The Ultrastar Data102 storage platform comes with rackmount rails. Installing a JBOD of this size takes some physically fit engineers. The challenges they would face are:
• The fully assembled shelf weighs 120 kg. Without the disks, it weighs 32 kg.
• The rack must be deep: at least 1,200 mm.
• Plus, there are SAS and power cables.
JBOD locking and cabling are designed to enable non-disruptive maintenance. The vertical installation of the input/output module (IOM) is also noteworthy.

Let’s have a look at the system. The front panel looks simple and neat (see Figure 4).

Figure 4. Ultrastar Data 102. Front view

One of the key features of the Ultrastar Data102 is that the IO modules are installed on top (see Figures 5, 6, and 7).

Figure 5. Ultrastar Data 102

Figure 6. Ultrastar Data 102. Top view

Figure 7. Ultrastar Data 102. Top view without drives

In the back, the Ultrastar Data102 has six 12 Gb SAS ports for each IO module (see Figure 8). Hence, backend throughput totals 28,800 MBps.
The ports can be used both for connecting to hosts and partially for cascading. There are two power supply ports (80+ Platinum rated 1600W CRPS).
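The 28,800 MBps figure can be reproduced with a short Python sketch. It assumes that each external port is a four-lane wide port and that each 12 Gb lane uses 8b/10b encoding, which is how SAS-3 links are usually described; neither detail is stated in this review.

```python
def lane_mb_per_s(line_rate_gbps: int, payload_bits: int = 8,
                  coded_bits: int = 10) -> float:
    """Payload throughput of one SAS lane in MB/s (8b/10b encoding assumed)."""
    bits_per_s = line_rate_gbps * 1_000_000_000
    payload_bytes_per_s = bits_per_s * payload_bits / coded_bits / 8
    return payload_bytes_per_s / 1_000_000


def iom_throughput_mb_per_s(ports: int = 6, lanes_per_port: int = 4,
                            line_rate_gbps: int = 12) -> float:
    """Aggregate throughput of all ports on one IO module, in MB/s."""
    return ports * lanes_per_port * lane_mb_per_s(line_rate_gbps)


if __name__ == "__main__":
    print(f"Per lane: {lane_mb_per_s(12):.0f} MB/s")
    print(f"Per IO module: {iom_throughput_mb_per_s():.0f} MB/s")
```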

Figure 8. Ultrastar Data 102. Back view


As mentioned above, the Ultrastar Data102 is not only huge but also fast! Western Digital obtained the following test results with 12 and 6 servers connected, respectively:

12 servers
Sequential load

Read = 24.2GB/s max @ 1MB (237 MB/s per HDD max)
Write = 23.9GB/s max @ 1MB (234 MB/s per HDD max)

Random load
Read = 4KB with the queue depth of 128: >26k IOps
Write = 4KB with the queue depth of 1–128: >45k IOps

6 servers
Sequential load

Read = 22.7GB/s max @ 1MB (223 MB/s per HDD max)
Write = 22.0GB/s max @ 1MB (216 MB/s per HDD max)

Random load
Read = 4KB with the queue depth of 128: >26k IOps
Write = 4KB with the queue depth of 1–128: >45k IOps

Figure 9. Parallel load from 12 servers

Figure 10. Parallel load from 6 servers
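The per-HDD figures quoted with the sequential results are simply the aggregate throughput divided across all 102 disks. A minimal Python sketch of that check, assuming decimal units (1 GB = 1000 MB):

```python
DISKS = 102  # fully populated enclosure

# Aggregate sequential throughput in GB/s, as reported above
SEQUENTIAL_GB_PER_S = {
    ("12 servers", "read"): 24.2,
    ("12 servers", "write"): 23.9,
    ("6 servers", "read"): 22.7,
    ("6 servers", "write"): 22.0,
}


def per_disk_mb_per_s(aggregate_gb_per_s: float, disks: int = DISKS) -> float:
    """Average per-disk throughput in MB/s for a given aggregate in GB/s."""
    return aggregate_gb_per_s * 1000 / disks


if __name__ == "__main__":
    for (servers, op), gb_per_s in SEQUENTIAL_GB_PER_S.items():
        print(f"{servers}, {op}: {per_disk_mb_per_s(gb_per_s):.0f} MB/s per HDD")
```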

Software Management

There are two options for software control of the Ultrastar Data102 storage platform:
1. Via SES
2. Via Redfish
SES (SCSI Enclosure Services) is a standard for managing and monitoring enclosure environmental factors such as temperature, power, and voltage.
Redfish allows the user to locate components by turning on LEDs, obtain information about their “health”, and update the firmware.
Please note that the Ultrastar Data102 storage platform supports T10 Power Disable (Pin 3) for cutting power to and resetting individual disks. This feature is useful when a single disk hangs the entire SAS bus.
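To illustrate the kind of “health” information Redfish exposes, here is a small Python sketch that filters Redfish-style resources by their standard `Status.Health` property (`OK`, `Warning`, or `Critical`). The sample resources and their names are made up for illustration; the actual resource layout of the Ultrastar Data102 is not described in this review, and fetching the JSON over HTTPS is left out.

```python
from typing import Dict, Iterable, List


def unhealthy(resources: Iterable[Dict]) -> List[str]:
    """Return names of resources whose Status.Health is Warning or Critical."""
    flagged = []
    for res in resources:
        health = res.get("Status", {}).get("Health")
        if health in ("Warning", "Critical"):
            flagged.append(res.get("Name", "<unnamed>"))
    return flagged


if __name__ == "__main__":
    # Hypothetical, hand-written sample; not captured from a real enclosure
    sample = [
        {"Name": "Drive Bay 1", "Status": {"State": "Enabled", "Health": "OK"}},
        {"Name": "Drive Bay 2", "Status": {"State": "Enabled", "Health": "Critical"}},
        {"Name": "Fan 3", "Status": {"State": "Enabled", "Health": "Warning"}},
    ]
    print(unhealthy(sample))
```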

Typical configurations

To use the Ultrastar Data102 capabilities in the most efficient way, one needs RAID controllers or software. That is where RAIDIX software is most helpful.
Creating a fault-tolerant data storage system takes two storage nodes and one or more baskets with SAS disks. However, if we do not plan to protect against node failure or to replicate data, we can connect just one server to the basket and use SATA disks.
Two-controller configuration
Virtually any x86 server platform can act as the controller in a RAIDIX-based data storage system, including Supermicro, AIC, Dell, Lenovo, HPE, and many others. We are constantly working on new hardware certification and porting our code to various architectures (for example, Elbrus and OpenPower).
For instance, let’s take a Supermicro 6029P-TRT platform and try to achieve the maximum throughput and compute density. For server sizing purposes, let’s start from the PCI-E bus, on which we will install the backend and frontend controllers.
We will also need controllers to connect the Ultrastar Data102 storage platform: at least two AVAGO 9300-8e controllers. Alternatively, this may be a pair of 9400-8e controllers or a single 9405W-16e controller, although the latter needs a full-fledged x16 slot.
The next component is a slot for the synchronization channel: InfiniBand or SAS. (For tasks where throughput and latency are not critical, it is possible to get by with synchronization via the basket, without a dedicated slot.)
Of course, we would also need at least a pair of slots for host interfaces.
To summarize, each controller must have at least five x8 slots (without room for further scaling). To build inexpensive systems with performance of 3-4 Gbps per node, we can get by with only two slots.
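The five-slot figure is just the sum of the components listed above. The Python sketch below writes that slot plan down and checks it against a candidate server; the slot counts passed to `fits` in the example are hypothetical and do not describe any specific platform.

```python
from typing import Dict

# Slot plan per controller, as described in the text above
SLOT_PLAN: Dict[str, int] = {
    "backend SAS HBAs": 2,         # e.g. two 9300-8e or two 9400-8e
    "synchronization channel": 1,  # InfiniBand or SAS
    "host interfaces": 2,          # at least a pair
}


def slots_required(plan: Dict[str, int]) -> int:
    """Total number of x8 slots the plan needs."""
    return sum(plan.values())


def fits(available_x8_slots: int, plan: Dict[str, int]) -> bool:
    """True if a server with this many x8 slots can host the whole plan."""
    return available_x8_slots >= slots_required(plan)


if __name__ == "__main__":
    print("Slots required:", slots_required(SLOT_PLAN))
    # Hypothetical servers with 5 and 3 free x8 slots
    print("5-slot server fits:", fits(5, SLOT_PLAN))
    print("3-slot server fits:", fits(3, SLOT_PLAN))
```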

Controller configuration options

The configurations below are sized for maximum performance with all available options in use. For a specific task, the specifications can be substantially reduced.
The 1U configuration works for cases where high storage density is crucial but full controller fault tolerance is not required. The 2U configuration provides the highest level of fault tolerance and keeps workflows and business processes intact.
Both types of system are based on Supermicro servers, some of the most popular industry-standard hardware.

Supermicro 6029P-TRT

The controllers run on two 2U 6029P-TRT servers. These servers are not that rich in PCI-E slots, but they are equipped with a standard motherboard without risers. The boards will carry Micron NVDIMM modules to protect the cache against power outages.
Let’s use the Broadcom 9400-8e for disk connection. Dirty cache segments will be synchronized via 100 Gb InfiniBand.

Here is an approximate flow scheme:

Figure 11. Configuration based on Supermicro 6029P-TRT

We developed the following system configuration:

Supermicro 2029BT-DNR

If we want to leave more free space in the server room, we can use a Supermicro Twin server (for example, the 2029BT-DNR) as the storage controller. Such systems have three PCI-E slots and one IOM each. The required InfiniBand option is available among the IOMs.

Here is an approximate flow scheme:

Figure 12. Configuration based on Supermicro 2029BT-DNR

Configuration for this is listed below:

1U platform

There are often use cases where high storage density is crucial while full controller fault tolerance is not required. In this case, we take a 1U server as the basis and connect a sufficient number of disk platforms to it.

Scale-Out system

Our last exercise is to build a scale-out system based on HyperFS, using two types of controllers: one for data storage and the other for metadata storage.
1. The Supermicro 6029P-TRT will act as the storage controller.
2. To store metadata, let’s use several SSDs in the Ultrastar Data102 storage platform. We will combine them into a RAID and provide access to the MDC via a SAN. It is possible to connect up to four JBODs to one storage system in cascade.

Overall, 8 PB of data is placed in one deep storage rack with one common namespace.
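A rough Python sizing sketch shows what 8 PB implies in terms of enclosures. It assumes that the 8 PB is raw capacity in decimal units, that every JBOD is fully populated with 12 TB disks, and that the cascade limit of four JBODs per storage system applies; RAID overhead and the SSDs set aside for metadata are ignored.

```python
import math

DISKS_PER_JBOD = 102      # from the review
DISK_TB = 12              # from the review
JBOD_HEIGHT_U = 4         # from the review
MAX_JBODS_PER_SYSTEM = 4  # cascade limit stated above


def jbods_needed(target_tb: int) -> int:
    """Number of fully populated JBODs needed to reach the raw target."""
    return math.ceil(target_tb / (DISKS_PER_JBOD * DISK_TB))


def storage_systems_needed(jbods: int) -> int:
    """Storage systems needed if each one cascades at most four JBODs."""
    return math.ceil(jbods / MAX_JBODS_PER_SYSTEM)


def jbod_rack_units(jbods: int) -> int:
    """Rack units taken up by the JBODs alone."""
    return jbods * JBOD_HEIGHT_U


if __name__ == "__main__":
    n = jbods_needed(8000)  # 8 PB expressed in TB
    print(n, "JBODs,", storage_systems_needed(n), "storage systems,",
          jbod_rack_units(n), "U for the shelves")
```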

Here is an approximate connection flow scheme:

Figure 13. Scale-out system configuration


Handling big data volumes, especially with write-intensive patterns, is an extremely complicated task for data storage systems. The classical solution would be to procure a shared-nothing scale-out system. The new Western Digital Ultrastar Data102 storage platform, in combination with RAIDIX software, allows the creation of a data storage system with a capacity of several PB and performance of a few dozen GBps at a lower cost than scale-out systems. We recommend having a look at this solution.

Posted by: Serge Plat on Jul 2, 2018 at 9:17:56 am SAN, NAS, Scale-out
