Huawei Announces New AI Storage Products in the Era of Large Models

[China, Shenzhen, July 14, 2023] Today, Huawei unveiled its new AI storage solution for the era of large-scale models, providing optimal storage solutions for basic model training, industry-specific model training, and inference in segmented scenarios, thus unleashing new AI capabilities.

In the development and implementation of large-scale model applications, enterprises face four major challenges:

First, data preparation is time-consuming: data sources are scattered and aggregation is slow, with preprocessing of hundreds of terabytes of data taking about 10 days. Second, for multi-modal large models with massive text and image datasets, the loading speed for huge numbers of small files is below 100 MB/s, making training-set loading inefficient. Third, frequent parameter adjustments for large models, combined with unstable training platforms, interrupt training roughly every two days, requiring the checkpoint mechanism to resume training; recovery takes more than a day. Finally, the implementation threshold for large models is high: system setup is complex, resource scheduling is difficult, and GPU resource utilization is often below 40%.
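The checkpoint mechanism mentioned in the third challenge can be illustrated with a minimal, generic sketch (this is a toy illustration in plain Python, not Huawei's or any framework's actual implementation): training state is persisted periodically so that, after an interruption, the run resumes from the last checkpoint rather than from step zero.

```python
import os
import pickle
import tempfile

# Toy checkpoint file path (illustrative only).
CKPT = os.path.join(tempfile.gettempdir(), "train_ckpt.pkl")

def save_checkpoint(step, state, path=CKPT):
    # Write atomically: dump to a temp file, then rename over the target,
    # so a crash mid-write never leaves a corrupt checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump({"step": step, "state": state}, f)
    os.replace(tmp, path)

def load_checkpoint(path=CKPT):
    # Return (step, state); start fresh if no checkpoint exists yet.
    if not os.path.exists(path):
        return 0, {"loss": None}
    with open(path, "rb") as f:
        ckpt = pickle.load(f)
    return ckpt["step"], ckpt["state"]

def train(total_steps, ckpt_every=100, fail_at=None):
    # Resume from the last checkpoint (or step 0 on a fresh run).
    step, state = load_checkpoint()
    while step < total_steps:
        step += 1
        state["loss"] = 1.0 / step  # stand-in for a real training step
        if step % ckpt_every == 0:
            save_checkpoint(step, state)
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated interruption")
    return step, state

# Simulate a crash at step 250; the last checkpoint was taken at step 200,
# so the rerun resumes from there instead of restarting from zero.
try:
    train(1000, fail_at=250)
except RuntimeError:
    pass
resumed_step, _ = load_checkpoint()   # 200
final_step, _ = train(1000)           # resumes and finishes at 1000
os.remove(CKPT)
```

The checkpoint interval trades recovery time against I/O cost: with slow storage, checkpoints must be infrequent and a failure loses hours of work, which is why the recovery times cited above can exceed a day.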

Huawei is aligning with the trend of AI development in the era of large-scale models, offering solutions tailored to different industries and scenarios. It introduces the OceanStor A310 Deep Learning Data Lake Storage and the FusionCube A3000 Training/Inference Super-Converged Appliance. OceanStor A310 Deep Learning Data Lake Storage targets both basic and industry-level large-model data lake scenarios, providing end-to-end AI data management from data aggregation and preprocessing through model training and inference applications. In a single 5U chassis, the OceanStor A310 delivers industry-leading bandwidth of 400 GB/s and up to 12 million IOPS, scales linearly to 4,096 nodes, and enables seamless cross-protocol communication. Its Global File System (GFS) provides intelligent data weaving across regions, streamlining data aggregation. Near-storage computing performs preprocessing close to the data, reducing data movement and improving preprocessing efficiency by 30%.

The FusionCube A3000 Training/Inference Super-Converged Appliance is designed for industry-level large-model training and inference scenarios involving models with billions of parameters. It integrates OceanStor A300 high-performance storage nodes, training/inference nodes, switching equipment, AI platform software, and management and operations software, giving large-model partners one-stop delivery with a plug-and-play deployment experience; the system can be up and running within 2 hours. Both the training/inference nodes and the storage nodes can be scaled out independently to match models of various sizes. Meanwhile, FusionCube A3000 uses high-performance containers to let multiple model training and inference tasks share GPUs, raising resource utilization from 40% to over 70%. FusionCube A3000 supports two flexible business models: the Huawei Ascend one-stop solution, and a third-party-partner one-stop solution with open computing, networking, and AI platform software.
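The utilization gain from GPU sharing can be illustrated with a deterministic toy simulation (this is a hypothetical model, not Huawei's scheduler or container technology): a single training or inference task alternates between compute bursts and idle phases such as data loading, so a dedicated GPU sits idle much of the time, while staggering several tasks on one GPU fills those gaps.

```python
# Toy model: each task repeats a cycle of `busy` compute slots
# followed by `idle` slots (e.g. waiting on data loading).
def utilization(tasks_per_gpu, busy=2, idle=3, slots=60):
    period = busy + idle
    gpu_busy = 0
    for t in range(slots):
        # The GPU is busy in a slot if any co-located task,
        # each staggered by `busy` slots, is computing.
        if any((t - offset) % period < busy
               for offset in range(0, tasks_per_gpu * busy, busy)):
            gpu_busy += 1
    return gpu_busy / slots

print(utilization(1))  # dedicated GPU: busy 2 of every 5 slots -> 0.4
print(utilization(2))  # two staggered tasks fill most of the idle gaps
```

With the illustrative parameters above, a dedicated GPU is busy 40% of the time, matching the utilization figure cited earlier, while co-locating a second staggered task roughly doubles utilization; real schedulers must additionally handle memory isolation and contention, which is why the achievable gain (40% to over 70%) is smaller than this idealized model suggests.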

Huawei’s President of the Data Storage Product Line, Zhou Yuefeng, stated, “In the era of large-scale models, data determines the height of AI intelligence. As the carrier of data, data storage becomes the key foundational infrastructure for AI large-scale models. Huawei Data Storage will continue to innovate, providing diversified solutions and products for the era of AI large models, collaborating with partners to drive AI empowerment across a wide range of industries.”


Post time: Aug-01-2023