In general, disks and disk arrays deliver their best performance in a single-host connection scenario. Most operating systems use exclusive file systems, meaning a file system can be owned by only one operating system at a time. The operating system and application software therefore optimize reads and writes for the characteristics of the disk storage system, aiming to reduce physical seek times and disk mechanical response times. Because the operating system handles the data requests of every program process, the requests that reach the disk or disk array are already merged and ordered, which yields the best performance the storage system can achieve in this setup.
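The benefit of this operating-system-level ordering can be illustrated with a toy calculation. The sketch below (hypothetical cylinder numbers, not from any real trace) compares total head travel when requests are serviced in arrival order versus in one elevator-style sorted sweep:

```python
# Sketch: how ordering requests by position (elevator/SCAN style)
# reduces total head travel. Cylinder numbers are hypothetical.
def seek_distance(requests, start=0):
    """Total head travel when servicing requests in the given order."""
    pos, total = start, 0
    for r in requests:
        total += abs(r - pos)
        pos = r
    return total

fifo = [98, 183, 37, 122, 14, 124, 65, 67]  # arrival order
scan = sorted(fifo)                          # one upward sweep

print(seek_distance(fifo))  # 693 cylinders of back-and-forth travel
print(seek_distance(scan))  # 183 cylinders: a single monotonic pass
```

The same principle underlies real I/O schedulers: sorting and merging nearby requests before they reach the mechanical disk cuts seek overhead dramatically.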
For disk arrays, although a RAID controller sits between the operating system and the individual disk drives, current RAID controllers primarily manage the disks and perform fault-tolerance (parity) operations. They do not merge, reorder, or otherwise optimize data requests, because they are designed on the assumption that requests arrive from a single host already optimized and sorted by the operating system. The controller's cache provides only direct and computational buffering, without queuing requests for optimization, so once the cache fills, throughput immediately falls back to the actual speed of the underlying disk operations.
The RAID controller's primary function is to build one or more large fault-tolerant logical disks from multiple physical disks and to improve overall read and write speed by exploiting the cache on each disk. The controller's read cache markedly improves the array's read performance when the same data is read repeatedly within a short time. The actual maximum read and write speed of the whole array is limited by the lowest of: host channel bandwidth, the controller CPU's parity-calculation and system-control capability (the RAID engine), disk channel bandwidth, and the combined actual performance of all the disks. In addition, a mismatch between the access pattern the operating system optimizes for and the RAID layout, such as an I/O request block size that does not align with the RAID segment size, can significantly degrade the array's performance.
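A quick way to see the alignment effect: the sketch below (assuming an illustrative 64 KiB segment size, not the figure for any particular controller) counts how many stripe segments a request touches. Any partially covered segment forces the controller to read back the old data and parity, merge, recompute, and rewrite:

```python
# Sketch: why I/O block size should align with the RAID segment size.
# The 64 KiB segment size is an assumption for illustration.
SEGMENT = 64 * 1024  # bytes per stripe segment

def segments_touched(offset, length, segment=SEGMENT):
    """Number of stripe segments a request of `length` bytes at `offset` spans."""
    first = offset // segment
    last = (offset + length - 1) // segment
    return last - first + 1

# Aligned 64 KiB write: exactly one segment, written in full.
print(segments_touched(0, 64 * 1024))         # 1
# Same size, shifted by 4 KiB: it now spans two segments, and each
# partially covered segment triggers a read-modify-write with parity update.
print(segments_touched(4 * 1024, 64 * 1024))  # 2
```

This is why tuning the file system or application I/O size to a multiple of the stripe segment size matters for write-heavy workloads.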
Performance Variations of Traditional Disk Array Storage Systems Under Multiple-Host Access
In multiple-host access scenarios, disk array performance declines compared with a single-host connection. Small-scale disk array storage systems, which typically have a single controller or a redundant controller pair and a limited number of connected disks, suffer from the unordered data flows arriving from the various hosts. These flows increase disk seek times, add per-segment header and trailer overhead, and fragment data so that partial segments must be read back, merged, recomputed for parity, and rewritten. Consequently, storage performance decreases as more hosts are connected.
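The effect of unordered multi-host traffic can be sketched numerically. Below, two hosts each issue a perfectly sequential stream (hypothetical block numbers), but interleaved at the array the combined flow forces the head to ping-pong between two distant regions:

```python
# Sketch: two sequential streams from different hosts, interleaved at
# the array, look random to the disk. Block numbers are hypothetical.
import itertools

host_a = list(range(1000, 1008))  # sequential blocks from host A
host_b = list(range(5000, 5008))  # sequential blocks from host B
mixed = list(itertools.chain.from_iterable(zip(host_a, host_b)))

def travel(reqs):
    """Total head travel across a request sequence, starting at the first block."""
    pos, total = reqs[0], 0
    for r in reqs[1:]:
        total += abs(r - pos)
        pos = r
    return total

print(travel(host_a) + travel(host_b))  # 14: two smooth sequential sweeps
print(travel(mixed))                    # 59993: constant long seeks between regions
```

Even though each host's workload is ideal in isolation, the merged stream multiplies head travel by orders of magnitude, which is exactly the degradation described above.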
In large-scale disk array storage systems, the performance degradation differs from that of small-scale arrays. These systems use a bus or cross-point switching structure to connect multiple storage subsystems (disk arrays), and place large-capacity caches and host connection modules (similar to channel hubs or switches) within the bus or switching fabric to serve more hosts. Their performance depends largely on the cache in transaction-processing applications, but the cache is of limited help with multimedia data. Although the internal disk array subsystems operate relatively independently, a single logical unit can be built only within a single disk subsystem, so the performance of any one logical unit remains low.
In conclusion, small-scale disk arrays lose performance to unordered data flows, while large-scale disk arrays with multiple independent subsystems can support more hosts but still fall short for multimedia data applications. By contrast, NAS storage systems, which are built on traditional RAID technology and share storage with external users over Ethernet via the NFS and CIFS protocols, degrade less in multiple-host environments. A NAS system carries data over multiple parallel TCP/IP transfers, reaching a maximum shared speed of around 60 MB/s for a single NAS storage system. Because data arriving over Ethernet is managed and reordered by the operating system or data management software in the thin server before being written, the disk system itself sees little performance degradation. This makes NAS storage well suited to applications that require data sharing.
Post time: Jul-17-2023