  • 2022-10-13 14:45:48

With AMD Eroding the Server Market, Can Intel's Next-Generation Processors Reverse the Decline?

With Arm eyeing the market and RISC-V newcomers on the rise, the competition has entered a white-hot stage. Since the introduction of the Zen CPU architecture, AMD's overall market share has gradually caught up. While Intel and AMD trade blows in desktop and mobile CPUs, AMD has maintained steady growth in server processor market share, with more and more cloud service providers and data centers switching to the "AMD Yes" camp and setting a new market-share record of 16% in the third quarter of this year. Although Intel still holds more than 70% of the market, that advantage looks difficult to maintain under pressure from all sides.

After launching its third-generation Xeon Scalable processors in the first half of this year, Intel recently revealed more information about its next-generation server processor, Sapphire Rapids. Given the impressive gains the Intel 7 process delivered in the 12th-generation Core desktop processors, will the next-generation Xeon also be a blockbuster when it launches next year?

Considering that Sapphire Rapids will not ship until next year and a new generation of AMD EPYC processors is due soon, Intel has not released much information about general-purpose compute performance. But from the modular chip design details Intel disclosed at Innovation 2021 and the Linley Fall Processor Conference, it is already clear that Sapphire Rapids is no ordinary generational update.

Like the 12th-generation Core (Alder Lake) processors, Sapphire Rapids introduces PCIe 5.0 support and further improves the processor's DDIO and QoS capabilities. In addition, CXL 1.1 and the new UPI 2.0 are also supported.

Memory is also the component most prone to bottlenecks in server and data center applications. As the chip diagrams released by Intel show, Sapphire Rapids integrates four memory controllers supporting 8-channel DDR5 memory. Intel also supports its Optane persistent memory, the Optane 300 series, as an additional memory tier that can likewise serve storage roles. Intel will additionally launch a version with HBM, offering far higher memory bandwidth than 8-channel DDR5. The HBM version provides two modes: HBM Flat mode, which supports HBM-only or hybrid HBM+DDR5 operation, and HBM cache mode, which treats HBM like an L4 cache, using it as a cache in front of the DDR5 DRAM.
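For a back-of-the-envelope sense of scale (DDR5-4800 and 64-bit channels are assumptions here for illustration, since final supported speeds were unconfirmed at the time), eight DDR5 channels work out to roughly 307 GB/s of theoretical peak bandwidth:

```cpp
#include <cstdio>

// Back-of-the-envelope peak-bandwidth estimate for an 8-channel DDR5 setup.
// DDR5-4800 and a 64-bit (8-byte) channel width are illustrative assumptions.
int main() {
    const double channels           = 8;
    const double transfers_per_s    = 4800e6;  // DDR5-4800: 4800 MT/s
    const double bytes_per_transfer = 8;       // 64-bit channel
    const double peak_gb_per_s = channels * transfers_per_s * bytes_per_transfer / 1e9;
    std::printf("Theoretical peak: %.0f GB/s\n", peak_gb_per_s);  // ~307 GB/s
    return 0;
}
```

A single HBM2e stack alone can exceed 400 GB/s, which is why the HBM variant is attractive for bandwidth-bound workloads.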

AI acceleration in general-purpose computing

As artificial intelligence takes up a growing share of server workloads, AI compute capability has become an unavoidable spec for every server processor, and an inseparable topic whenever Intel promotes Sapphire Rapids. In the third-generation Xeon Scalable processors, Intel built deep-learning acceleration on top of the AVX-512 vector extensions to support inference and training on INT8 and BF16 data. In Sapphire Rapids, Intel adds two new acceleration engines: AMX (Advanced Matrix Extensions) and DSA (Data Streaming Accelerator).
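As a concrete reference point for the existing DL Boost path, AVX-512 VNNI exposes fused INT8 dot-product instructions through compiler intrinsics. A minimal sketch (the data values are illustrative; build with -mavx512vnni on a supporting CPU):

```cpp
#include <immintrin.h>
#include <cstdint>
#include <cstdio>

// AVX-512 VNNI: one instruction multiplies 64 pairs of 8-bit values and
// accumulates them into 16 x 32-bit lanes (u8 x s8 -> s32 here).
int main() {
    alignas(64) uint8_t a[64];
    alignas(64) int8_t  b[64];
    for (int i = 0; i < 64; ++i) { a[i] = 1; b[i] = 2; }

    __m512i va  = _mm512_load_si512(a);
    __m512i vb  = _mm512_load_si512(b);
    __m512i acc = _mm512_setzero_si512();

    // vpdpbusd: per 32-bit lane, the sum of 4 u8*s8 products is added in.
    acc = _mm512_dpbusd_epi32(acc, va, vb);

    alignas(64) int32_t out[16];
    _mm512_store_si512(out, acc);
    std::printf("lane0 = %d\n", out[0]);  // 4 * (1*2) = 8
    return 0;
}
```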

AMX is a new instruction-set extension Intel prepared specifically for tensor operations, built around tile-based compute. The extension consists of two parts: tiles and accelerators. The tile component consists of 8 two-dimensional register files supporting basic data operators such as load, store, and zeroing. Each register file can be up to 1 KB in size, but designers can also configure them smaller to suit their own algorithms. In addition, Intel confirmed that Linux 5.16 will officially add support for AMX.
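A sketch of what that Linux 5.16 support looks like from user space, based on the kernel's documented arch_prctl permission interface and Intel's published 64-byte tile-configuration layout (the helper name init_amx is ours; build with -mamx-tile):

```cpp
#include <immintrin.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cstdint>

// Values documented for the Linux 5.16+ dynamic-XSTATE interface.
#define ARCH_REQ_XCOMP_PERM 0x1023
#define XFEATURE_XTILEDATA  18

// The 64-byte tile configuration block defined by the AMX architecture:
// palette 1 provides 8 tiles (tmm0-tmm7), each up to 16 rows x 64 bytes (1 KB).
struct alignas(64) TileConfig {
    uint8_t  palette_id;   // must be 1
    uint8_t  start_row;    // must be 0
    uint8_t  reserved[14];
    uint16_t colsb[16];    // bytes per row, per tile
    uint8_t  rows[16];     // rows, per tile
};

bool init_amx() {
    // Ask the kernel for permission to use the AMX tile-data state.
    if (syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA) != 0)
        return false;

    TileConfig cfg{};
    cfg.palette_id = 1;
    for (int t = 0; t < 3; ++t) {  // configure tmm0-tmm2 at full size
        cfg.rows[t]  = 16;
        cfg.colsb[t] = 64;
    }
    _tile_loadconfig(&cfg);  // ldtilecfg
    return true;
}
```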

So far Intel has only announced the TMUL accelerator (Tile Matrix Multiply Unit), but AMX is an architecture that can keep scaling: new accelerators can be added in the future, or the existing TMUL can be improved for higher performance, expressing more work per instruction and per micro-op and saving the power spent on fetch, decode, and out-of-order machinery. In Intel's test, the same Sapphire Rapids processor ran 7.8 times faster using AMX instructions than using AVX-512 VNNI.
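Continuing the init_amx sketch above, a minimal TMUL-style tile multiply with the GCC/Clang AMX intrinsics might look like this (the all-ones data and trivial VNNI packing are illustrative; build with -mamx-tile -mamx-int8):

```cpp
#include <immintrin.h>
#include <cstdint>
#include <cstdio>

bool init_amx();  // from the permission/configuration sketch above

// C(16x16, int32) += A(16x64, int8) x B(64x16, int8).
// TDPBSSD consumes B in the "VNNI" layout: 16 rows of 64 bytes, each row
// packing 4 consecutive K-values per output column.
int main() {
    if (!init_amx()) return 1;

    alignas(64) int8_t  A[16 * 64];
    alignas(64) int8_t  B[16 * 64];        // pre-packed VNNI layout
    alignas(64) int32_t C[16 * 16] = {};
    for (int i = 0; i < 16 * 64; ++i) { A[i] = 1; B[i] = 1; }

    _tile_loadd(1, A, 64);   // tmm1 <- A (stride = 64 bytes per row)
    _tile_loadd(2, B, 64);   // tmm2 <- B
    _tile_loadd(0, C, 64);   // tmm0 <- C (16 rows x 64 bytes of int32)
    _tile_dpbssd(0, 1, 2);   // tmm0 += tmm1 x tmm2 (the TMUL step)
    _tile_stored(0, C, 64);
    _tile_release();         // free the tile state

    std::printf("C[0][0] = %d\n", C[0]);  // 64 x (1*1) = 64
    return 0;
}
```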

When using AVX-512 in the past, it was common to see clock frequencies drop once power consumption climbed, and many worry the new AMX will behave the same way. Intel has confirmed that with fast, automatic, fine-grained power control, AMX will not exhibit such frequency jitter.

In storage-, connectivity-, and processing-intensive high-performance applications, people are always looking to free up processor cores to improve overall performance. To that end, Intel introduced the DSA data-streaming accelerator, which moves data between the CPU caches, DDR memory, and I/O-attached devices. The goal is to deliver higher overall system performance for data movement and transformation operations while freeing CPU cycles for higher-level work. According to Intel's figures for an Open vSwitch virtual-switch workload, CPU usage drops by 39% while data-movement performance improves by 2.5 times.
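Applications typically reach DSA through the kernel's idxd driver or a wrapper such as Intel's open-source DML library; the sketch below follows DML's documented mem_move example (dml::hardware requires a configured DSA device, and dml::software is the CPU fallback):

```cpp
#include <dml/dml.hpp>   // Intel DML: https://github.com/intel/DML
#include <cstdint>
#include <cstdio>
#include <numeric>
#include <vector>

// Offload a memory copy to DSA via DML's high-level C++ API.
int main() {
    constexpr auto size = 1024u;
    std::vector<std::uint8_t> src(size);
    std::vector<std::uint8_t> dst(size, 0u);
    std::iota(src.begin(), src.end(), 0u);

    // dml::hardware targets a DSA engine; swap in dml::software to run
    // the same operation on the CPU if no device is configured.
    auto result = dml::execute<dml::hardware>(
        dml::mem_move, dml::make_view(src), dml::make_view(dst));

    if (result.status == dml::status_code::ok)
        std::printf("copy completed, dst[100] = %u\n", dst[100]);
    return 0;
}
```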

Can the strongest CPU replace the GPU?

As we all know, today's server market is no longer dominated by CPUs alone. Whether for speech recognition or image processing, GPU-powered AI compute permeates every scenario, and the GPU is the primary hardware for any deep-learning practitioner. In Intel's description, Sapphire Rapids delivers AI performance up to 30 times that of the previous-generation Ice Lake chips. Can such a large boost replace the GPU?

The comparison Intel provided for Sapphire Rapids is NVIDIA's A30 GPU. In ResNet-50 v1.5 image-classification inference, a single A30 outputs 15,411 images per second, while two Sapphire Rapids processors output 24,000 images per second. That figure looks like a big advantage, even approaching the A100's 29,855 images per second, but the test used a high-end Sapphire Rapids model (40+ cores) whose power consumption and price far exceed the A30's; per socket, the throughput of about 12,000 images per second still trails a single A30.

Therefore, at this stage, servers running heavy AI workloads will not migrate away from existing GPU or ASIC architectures. But Sapphire Rapids is not a special-purpose AI product; the reason x86 CPUs strive to improve AI performance is the increasingly common light-AI scenarios that mix general-purpose computing with artificial intelligence.

Summary

Taking back market share from AMD will not be easy, and Intel no longer faces the near-duopoly of the beginning of this century. Sapphire Rapids can be called Intel's first server product since its changes of leadership, process, and architecture. If it is not a blockbuster, many customers may be lured away by AMD's more cost-effective Zen 4D and Zen 5 parts in the future. Facing challenges from Arm, RISC-V, GPUs, and AI ASICs, if Intel wants to maintain x86's dominance, it must also accelerate development of its own Xe server GPUs and continue to expand the x86 AI-accelerator ecosystem.