What Hot Chips 2020 can bring us

What inspiration can Hot Chips 2020 bring us?

Hot Chips is regarded as the flagship event for high-performance chips and is held every August. Unlike conferences that focus on academic research frontiers, Hot Chips is an industry event that showcases the latest products, and products still under development, from the major processor design companies. IBM, Intel, AMD, Arm, and others are frequent presenters. Hot Chips lets practitioners follow the development trends of the industry.

Due to the impact of the pandemic, Hot Chips 2020 was held online instead. The originally expensive registration fee was reduced to $100. The organizers also kindly provided recordings, so the talks could be watched without staying up late. Apart from the lack of face-to-face communication with industry leaders, the experience of attending was very good. At the time of writing, the Hot Chips replay channel is still open.

Hot Chips 2020 was organized into eight sessions: server processors, mobile processors, edge computing and sensing, GPU and gaming architectures, FPGA and reconfigurable architectures, networking and distributed systems, machine learning training, and machine learning inference. The session planning is still fairly traditional.

Hot Chips has no proceedings, only slides. A total of 25 talks were given over the two days, 23 from industry and only 2 from academia (from Harvard University and ETH Zurich, respectively), and both were related to AI accelerators. The conference also accepted 11 posters, mostly academic work. There were also two keynotes: "No Transistor Left Behind" by Intel chief architect Raja M. Koduri, and "AI Research at Scale: Opportunities on the Road Ahead" by DeepMind distinguished engineer Dan Belov.

I paid most attention to the three processor-related sessions: server processors, mobile processors, and edge computing and sensing.

Server processors

Session 1 had four talks, covering Intel's Ice Lake-SP architecture, IBM's POWER10 and z15 processors, and Marvell's ThunderX3 processor.

Ice Lake-SP is built on a 10nm+ process with Sunny Cove cores. Compared with Cascade Lake, Ice Lake's IPC is 18% higher, mainly because of large increases in core resources such as the out-of-order window, physical registers, and L1 cache. Most of the talk focused on Ice Lake-SP's system architecture. The main improvements include: a scalable power-control and system-management mechanism that reduces the complexity of SoC management and improves response time; a reworked cache hierarchy that lowers cache latency and raises communication bandwidth; and more effective DVFS through improvements in circuits, algorithms, and architecture. Intel has cut the voltage and frequency transition time of the processor core to zero, which reflects the strength of Intel's circuit optimization.
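To make the DVFS and IPC points concrete, here is a minimal Python sketch using the textbook CMOS dynamic-power model P = αCV²f and the first-order throughput model IPC × f. The activity factor, capacitance, voltages, and frequencies are hypothetical values chosen for illustration, not figures from Intel's talk.

```python
def dynamic_power(activity: float, capacitance_f: float, voltage_v: float, freq_hz: float) -> float:
    """Textbook CMOS dynamic-power estimate in watts: P = alpha * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz


def throughput(ipc: float, freq_hz: float) -> float:
    """First-order single-thread throughput in instructions per second: IPC * f."""
    return ipc * freq_hz


# Hypothetical operating points (illustrative only, not Intel's numbers).
nominal = dynamic_power(0.2, 1e-9, 1.0, 3.0e9)
scaled = dynamic_power(0.2, 1e-9, 0.8, 2.4e9)  # 20% lower voltage and frequency
power_ratio = scaled / nominal

# An 18% IPC uplift at the same clock gives 18% more throughput.
speedup = throughput(1.18, 3.0e9) / throughput(1.0, 3.0e9)

print(f"power ratio after DVFS step: {power_ratio:.3f}")  # 0.512
print(f"speedup from +18% IPC:       {speedup:.2f}")      # 1.18
```

Because voltage enters squared and frequency linearly, lowering both by 20% cuts dynamic power to roughly half, which is why fast, fine-grained DVFS pays off.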

POWER10 is IBM's next-generation architecture, expected to be officially released at the end of the year on a 7nm process. Each chip integrates 16 cores, and each core supports 8 threads. Compared with the previous generation, the performance gains again come mainly from resource expansion; some structures (the TLB, L2 cache, etc.) are even four times larger. IBM spent nearly half of the talk on its PowerAXON interface and Open Memory Interface (OMI), both characterized by high bandwidth and broad compatibility; each can reach 1 TB/s. PowerAXON can link up to 16 chips into a cluster, connect to clusters of ASIC accelerators, or connect to clusters of blade servers; OMI can attach SCM, DRAM, and GDDR DIMMs. This interconnect compatibility gives POWER10-based systems strong scalability.
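A quick back-of-the-envelope sketch in Python of what these POWER10 figures imply. It assumes every chip exposes all 16 cores, that a cluster uses the full 16 chips, and that 1 TB/s is shared evenly by all threads on one chip; all three are simplifying assumptions, not IBM's stated configuration.

```python
CORES_PER_CHIP = 16        # cores per chip, as stated in the talk summary
THREADS_PER_CORE = 8       # 8 hardware threads per core
CHIPS_PER_CLUSTER = 16     # assumption: a fully populated 16-chip cluster
CHIP_BANDWIDTH_BPS = 1e12  # assumption: 1 TB/s available to one chip

threads_per_chip = CORES_PER_CHIP * THREADS_PER_CORE
threads_per_cluster = threads_per_chip * CHIPS_PER_CLUSTER
# Naive even split of one chip's bandwidth across its hardware threads.
bandwidth_per_thread_gbps = CHIP_BANDWIDTH_BPS / threads_per_chip / 1e9

print(f"hardware threads per chip:    {threads_per_chip}")
print(f"hardware threads per cluster: {threads_per_cluster}")
print(f"bandwidth per thread:         {bandwidth_per_thread_gbps:.1f} GB/s")
```

Under this naive split, even 1 TB/s leaves each of 128 threads only about 8 GB/s, which suggests why IBM devoted so much of the talk to interconnect bandwidth.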

ThunderX3 is Marvell's next-generation Arm server processor. It is expected to have 60 to 96 cores, each supporting 4-way SMT. The cores implement Armv8.3 plus some v8.4/v8.5 features. Compared with ThunderX2, ThunderX3 improves both resources and algorithms. The talk broke down how much each optimization contributes to performance; the more important ones include execution resources, ROB size, the branch-prediction algorithm, and cache capacity. ThunderX3's on-chip interconnect uses a three-ring structure, with 5 nodes on each half ring and 4 cores attached to each node.
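Why split the interconnect into several rings? A rough intuition is that on a bidirectional ring the average hop count grows linearly with the number of nodes. The Python sketch below computes it, under our own assumption that two 5-node halves form one 10-node ring; ThunderX3's actual topology and routing may differ.

```python
def avg_ring_hops(num_nodes: int) -> float:
    """Average shortest-path hop count between distinct nodes on a bidirectional ring."""
    total = 0
    for offset in range(1, num_nodes):
        # Traffic can travel either way round, so take the shorter direction.
        total += min(offset, num_nodes - offset)
    return total / (num_nodes - 1)


# Assumption: two 5-node half rings joined into one 10-node ring.
print(f"10-node ring: {avg_ring_hops(10):.2f} hops on average")  # 2.78
print(f"20-node ring: {avg_ring_hops(20):.2f} hops on average")  # 5.26
```

Doubling the ring from 10 to 20 nodes roughly doubles the average distance in hops, so spreading cores over several shorter rings keeps on-chip latency down.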

z15 is IBM's latest mainframe architecture. For technical details, please refer to IBM's ISSCC 2020 paper and presentation, and also to my "ISSCC 2020 processor tour" written at the beginning of the year.

Server processors are designed for performance. To achieve high performance, designers must raise the throughput of the processor pipeline as well as the throughput of memory access and interconnect. The most effective way to do this is to integrate more resources, making processors larger and larger as manufacturing technology advances.
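One way to see why memory-access throughput, and not just core count, matters is Little's law: sustained bandwidth equals the number of in-flight requests times the request size, divided by latency. The minimal Python sketch below applies it to cache-line misses; the miss counts, 64-byte line size, and 100 ns latency are illustrative assumptions, not numbers from any of the talks.

```python
def sustained_bandwidth_gbps(outstanding_misses: int, line_bytes: int, latency_ns: float) -> float:
    """Little's law: throughput = concurrency / latency.

    Bytes per nanosecond equals (decimal) gigabytes per second.
    """
    return outstanding_misses * line_bytes / latency_ns


# Hypothetical core: 64-byte cache lines, 100 ns average memory latency.
for misses in (10, 20, 40):
    print(misses, "outstanding misses ->", sustained_bandwidth_gbps(misses, 64, 100.0), "GB/s")
```

At a fixed latency, doubling the number of in-flight misses doubles achievable bandwidth, which is why larger miss buffers, more cores, and faster interconnects all raise memory throughput.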

Mobile processors

Session 2 had two talks: AMD's Ryzen 4000 and Intel's Tiger Lake. Both processors target desktops and notebooks. Compared with server processors, mobile processors need to integrate graphics capabilities and more peripheral interfaces.

Both talks felt more like product launches, with few technical details. Only a small portion of each talk covered the microarchitecture of the processor core; most of the features presented were at the SoC system-architecture level, such as the interconnect structure, supported peripheral interfaces, and power-management methods.

Edge computing

Session 3 had three talks: Alibaba T-Head's XuanTie 910, Arm's Cortex-M55 and Ethos-U55, and a Bayesian inference accelerator designed at Harvard University.

XuanTie 910 is the only processor from China selected for Hot Chips. I recommend the paper "Xuantie-910: A Commercial Multi-Core 12-Stage Pipeline Out-of-Order 64-bit High Performance RISC-V Processor with Vector Extension", published by T-Head at ISCA 2020.

Arm's Cortex-M55 and Ethos-U55 are a processor core and an AI accelerator for the AI endpoint market. Arm hopes that combining these two IPs will speed up the development of SoCs for specific AI application scenarios. The talk focused mainly on Ethos-U55 and system integration.

The Bayesian inference accelerator talk is entirely about accelerator design and is beyond the scope of this article.

Keynote speech

The keynote on the first day was given by Intel chief architect Raja M. Koduri, titled "No Transistor Left Behind".

The talk first looked ahead, noting that the number of intelligent connected devices will reach 100 billion and data volume will reach 175 ZB. "Super intelligence" will be needed to handle this massive volume of data and workloads.

The talk then argued that processor design still has enormous room to serve business needs. On the one hand, software can be optimized to fully exploit the capabilities of the hardware; on the other hand, manufacturing processes (such as 3D stacking) still leave large room for hardware design, increasing chip integration and reducing power consumption.

The talk then presented Intel's vision for the future. Intel believes that general-purpose processors will no longer dominate; instead, a variety of computing resources (CPUs, GPUs, AI accelerators, FPGAs, etc.) will need to work together, accessed through interfaces built on layered abstractions. The scope of hardware will include not only processors and special-purpose chips but also drivers, operating systems, virtual machines, and middleware, while the scope of software covers the rich variety of business and application software. Intel will build a complete system spanning process and packaging, compute architecture, memory, interconnect, security, and software. If this vision succeeds, Intel will transform from a chip maker into a computing infrastructure provider (the so-called "new infrastructure").
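The layered-abstraction idea can be sketched in a few lines of Python: software calls one device-independent interface, and a runtime layer decides which compute resource runs each operation. Everything here (`Backend`, `Runtime`, the two backends, and the `relu`/`double` ops) is hypothetical and only illustrates the concept; it is not Intel's software stack.

```python
from typing import Protocol


class Backend(Protocol):
    """Hypothetical device-independent interface seen by the software layer."""
    name: str

    def supports(self, op: str) -> bool: ...
    def run(self, op: str, data: list[float]) -> list[float]: ...


def _reference(op: str, data: list[float]) -> list[float]:
    # Shared semantics so every backend computes the same result.
    if op == "relu":
        return [max(0.0, x) for x in data]
    if op == "double":
        return [2.0 * x for x in data]
    raise ValueError(f"unknown op: {op}")


class CpuBackend:
    name = "cpu"

    def supports(self, op: str) -> bool:
        return op in ("relu", "double")  # general-purpose: handles every op

    def run(self, op: str, data: list[float]) -> list[float]:
        return _reference(op, data)


class AiAcceleratorBackend:
    name = "ai-accel"

    def supports(self, op: str) -> bool:
        return op == "relu"  # specialized: only a narrow set of ops

    def run(self, op: str, data: list[float]) -> list[float]:
        return _reference(op, data)  # stand-in for offloading to real hardware


class Runtime:
    """Dispatch layer: try backends in order, specialized devices first."""

    def __init__(self, backends: list[Backend]) -> None:
        self.backends = backends

    def execute(self, op: str, data: list[float]) -> tuple[str, list[float]]:
        for backend in self.backends:
            if backend.supports(op):
                return backend.name, backend.run(op, data)
        raise RuntimeError(f"no backend supports {op}")


rt = Runtime([AiAcceleratorBackend(), CpuBackend()])
print(rt.execute("relu", [-1.0, 2.0]))    # ('ai-accel', [0.0, 2.0])
print(rt.execute("double", [-1.0, 2.0]))  # ('cpu', [-2.0, 4.0])
```

The backend order encodes the policy: specialized hardware first, with the general-purpose CPU as the fallback that can run anything.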

The most important lesson from the keynote is that the development of processors and software is driven by real business needs, and processor design should be driven by software. Once a processor design is fixed, software can act as the glue that adapts the processor to the business. During processor design, however, the relationship between software and hardware should not be reversed.

This year I had the chance to follow the processor designs presented by industry at three conferences, ISSCC, ISCA, and Hot Chips, and I have some personal, admittedly one-sided and indirect, observations.

System architecture vs. microarchitecture

In the processor field, many concepts need to be clarified. The first is the distinction between microarchitecture and system architecture. Although both are architecture design, they are very different. Microarchitecture refers to the design of the processor's internal pipeline, that is, the "fetch - decode - issue - execute - write back - retire" process. System architecture has a much larger scope. It originally referred to the architecture of the whole computer system, including the CPU, buses, peripherals, and so on. As chip integration has increased, a single chip can now implement a complete system, the system on chip (SoC). The system architecture of an SoC includes the memory system, interconnect, accelerators, system management, and so on. The focus and coverage of system architecture and microarchitecture are therefore very different.
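To illustrate what a pipeline means in practice, here is a minimal Python sketch of an ideal in-order pipeline with six stages: fetch, decode, issue, execute, write back, and retire. It assumes one instruction enters per cycle with no stalls, hazards, or out-of-order execution, so it is far simpler than the cores discussed here.

```python
STAGES = ["fetch", "decode", "issue", "execute", "writeback", "retire"]


def pipeline_schedule(num_instrs: int) -> list[list[str]]:
    """Return, for each cycle, which instruction occupies each stage of an ideal pipeline."""
    total_cycles = len(STAGES) + num_instrs - 1
    table = []
    for cycle in range(total_cycles):
        row = []
        for stage_idx in range(len(STAGES)):
            instr = cycle - stage_idx  # instruction i reaches stage s at cycle i + s
            row.append(f"I{instr}" if 0 <= instr < num_instrs else "--")
        table.append(row)
    return table


print("cycle", " ".join(f"{name[:3]:>3}" for name in STAGES))
for cycle, row in enumerate(pipeline_schedule(3)):
    print(f"{cycle:>5}", " ".join(f"{cell:>3}" for cell in row))
```

With k stages and n instructions, the ideal pipeline finishes in k + n - 1 cycles instead of k × n, which is the throughput gain that pipelining buys.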

Before 2010 was the golden age of microarchitecture, when new microarchitecture ideas kept emerging. Over the past 10 years, however, processor microarchitecture design has gradually converged on CPUs built around superscalar, out-of-order execution, with typical features such as multi-level caches, branch prediction, and simultaneous multithreading. Not only high-performance CPUs but also processors for embedded and edge computing are starting to adopt these features. The processor microarchitectures presented at Hot Chips 2020 follow this pattern: they are very similar to one another, and even the key scheduling and prediction algorithms are the same. The architectural parameters of competing products are also very close.
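Branch prediction is one of the features that has converged across designs. As a textbook illustration only (production predictors are far more elaborate), here is a Python sketch of a bimodal predictor built from 2-bit saturating counters. The table size, branch address, and loop pattern are hypothetical.

```python
class TwoBitPredictor:
    """Bimodal predictor: a table of 2-bit saturating counters indexed by branch address."""

    def __init__(self, table_bits: int = 10) -> None:
        self.mask = (1 << table_bits) - 1
        # Counter values 0-1 predict not-taken, 2-3 predict taken; start weakly not-taken.
        self.counters = [1] * (1 << table_bits)

    def predict(self, pc: int) -> bool:
        return self.counters[pc & self.mask] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = pc & self.mask
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


# A loop branch taken 9 times and then not taken, repeated 3 times.
pred = TwoBitPredictor()
outcomes = ([True] * 9 + [False]) * 3
correct = 0
for taken in outcomes:
    correct += int(pred.predict(0x400) == taken)
    pred.update(0x400, taken)
print(f"{correct}/{len(outcomes)} correct")  # 26/30 correct
```

Once warmed up, the 2-bit counter mispredicts only the loop exit; a 1-bit scheme would also mispredict the first iteration of every new loop run.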

This situation is the inevitable result of processors pursuing generality. If we place integrated circuits on an axis from general-purpose to application-specific, the processors discussed here sit at the most general-purpose end. With the help of software, a processor can do most of the world's work. To meet the demand for generality, the processor's microarchitecture must support all kinds of programs, and in doing so it gives up the opportunity to optimize for the characteristics of particular applications. As Intel put it in its talk, Ice Lake is a processor well balanced across all types of workloads and applications.

After 2010, driven by the two trends of "multi-core/many-core" and "heterogeneity", system architecture research became a hot spot. The system architectures of different processors still differ noticeably.

For server processors, the main design goal is to push for maximum performance. As the room for microarchitecture optimization shrinks, thread-level parallelism (TLP) must be fully exploited to gain more performance. Server processors therefore mainly integrate many homogeneous cores, raise interconnect bandwidth with ring or mesh interconnects, and increase effective memory bandwidth by optimizing the cache-coherence protocol. Integrating accelerators for specific workloads into server processors is also a clear trend.
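Amdahl's law is the standard first-order model of how far thread-level parallelism can take a many-core server. The Python sketch below assumes a hypothetical workload that is 95% parallelizable; real server workloads vary widely.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


# Hypothetical workload that is 95% parallelizable.
for n in (1, 8, 32, 96):
    print(f"{n:>3} cores -> {amdahl_speedup(0.95, n):.2f}x")
```

Even at 95% parallel, 96 cores yield under 17× speedup, and the ceiling is 20× no matter how many cores are added; that is one reason interconnect, memory bandwidth, and accelerators receive as much attention as core count.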

For mobile processors, the design goal is to provide the most energy-efficient solution across many different scenarios. Common architectures

Copyright © 2011 JIN SHI