AMD has done well in the processor market in recent years. In particular, its cost-effective dual-, triple- and quad-core processors caught consumers' attention and earned the company good money. With the introduction of Intel's Core i series, however, AMD's processors no longer hold any performance advantage. Under Intel's "Tick-Tock" strategy in particular, AMD has never held the initiative in desktop processors, and its "Tian Ji's horse racing" strategy of pitting its own strengths against a rival's weaknesses is gradually losing its effect. Faced with Intel's technical strength and dominant market share, AMD is struggling. Its only hope is to use technological innovation once again to break free of the pressure Intel has put on it. At the end of August this year, AMD disclosed a great deal of information about its next-generation Bulldozer processor. Clearly, AMD is pinning all its hopes on Bulldozer.
Fully modular core design
Bulldozer is the new generation of processor architecture that AMD is introducing after K10. It is the most fundamental change to AMD's processor architecture since K7, and the changes in core architecture and functionality are greater than those made for K10. Compared with AMD's current mainstream architecture, Bulldozer's biggest highlight is its modular design, which gives AMD more flexibility in controlling costs. AMD's current core layout is fixed, whether the product is a quad-core, triple-core or dual-core part: every chip uses the same quad-core (or larger) physical die, and cores are selectively disabled according to the pricing strategy for each market segment. This makes it hard for AMD to control product cost and compete on price, and developing products with more cores on the current architecture means redesigning the die layout almost from scratch.
The modular design introduced with Bulldozer solves exactly this problem. In Bulldozer, every two cores form a self-contained unit called a "core module", with two physical cores integrated into each module. A dual-core processor needs just one module, and even an eight-core processor needs only four, which makes production and cost control more flexible. The two cores in a module each have their own L1 cache but share the L2 cache and the prefetch and decode units. All core modules share 8MB of L3 cache and the northbridge module.
So how should the core count of an AMD processor be defined? Does a future quad-core processor on the new architecture have four such modules or four computing cores? AMD's answer: "Every Bulldozer module that has dual integer cores is treated as a separate unit." AMD is clearly sidestepping a simple core count here and emphasizing instead that each pair of cores forms an organic whole. A Bulldozer-based processor can therefore be described either as a four-core or eight-core part or as a two-module or four-module part. One consequence is that AMD may no longer offer processors with an odd number of cores.
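The counting rule described above can be sketched in a few lines of Python. This is only an illustration: `CORES_PER_MODULE` and both function names are hypothetical, and the two-cores-per-module figure is taken from this article rather than from any AMD specification.

```python
CORES_PER_MODULE = 2  # per the article: two integer cores in each Bulldozer module


def cores_from_modules(modules: int) -> int:
    """Core count a chip with the given number of modules would be described with."""
    if modules < 1:
        raise ValueError("a processor needs at least one module")
    return modules * CORES_PER_MODULE


def modules_from_cores(cores: int) -> int:
    """Module count for a given core count; odd counts have no whole-module layout."""
    if cores < CORES_PER_MODULE or cores % CORES_PER_MODULE != 0:
        raise ValueError(f"{cores} cores cannot be built from whole modules")
    return cores // CORES_PER_MODULE


for modules in (1, 2, 3, 4):
    print(f"{modules} module(s) -> {cores_from_modules(modules)} cores")
```

Under this simplified rule a request for three or five cores raises an error, which mirrors the article's point that odd core counts do not fit a design built from whole modules.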
Efficient cluster multithreading architecture
Another new element of the Bulldozer architecture is cluster-based multithreading. A Bulldozer core module is a processing unit that can run two threads at the same time: its two cores execute two threads without interfering with each other. This is somewhat similar to the Hyper-Threading technology in Intel's processors.
Although dual-core designs, simultaneous multithreading and Bulldozer all execute threads in parallel, they partition the core quite differently. Simultaneous multithreading (SMT) runs multiple threads at the same time within a single processing core. This differs from chip multiprocessing (CMP), which increases the processing capacity of the system by integrating multiple processing cores on one chip. Most mainstream processors now use CMP, and the Hyper-Threading technology in processors such as the Pentium 4 and Core i7 is a form of simultaneous multithreading. Bulldozer, by contrast, is based on cluster-based multithreading (CMT), also known as clustered multithreading.
Intel's Hyper-Threading works by duplicating the processor's architectural state. No additional hardware execution units are added to the core to handle the second thread; only the storage that holds per-thread state is increased. When an execution unit is idle, work from the other thread is sent to it, which raises processor utilization. This design has drawbacks. For example, a single instruction window handles the scheduling, execution and retirement of both threads, which is not efficient. It is like a production line with only one dispatcher: one person struggles to manage two jobs at once. The pipeline sometimes stalls as a result, and processor performance drops noticeably when that happens.
Compared with traditional Hyper-Threading or dual-core designs, the idea behind Bulldozer's clustered architecture is to make a dual-core module more efficient at multithreaded work. Bulldozer adds extra execution units to each module, so each module can split a large task into several parallel tasks. In terms of the production-line analogy, these lines can be combined as needed without affecting the performance of the line as a whole. CMT is therefore more effective than traditional multithreading schemes. According to AMD, a single Bulldozer module can achieve a gain of about 80% in multithreaded performance, and it does not appear to use more transistors than Intel's Hyper-Threading technology, which is an encouraging result.
According to the roadmap, the Bulldozer architecture will come in four-, six- and eight-core versions, with the quad-core version delivering roughly 10% to 35% higher integer performance than a Phenom II X4 at a similar clock frequency. Note that CMT is not unique to AMD. The Niagara and Niagara 2 (UltraSPARC T1 and UltraSPARC T2) server processors from Sun, now part of Oracle, are examples; Niagara 2 in particular uses a thread design similar to Bulldozer's.
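As a back-of-the-envelope reading of AMD's figure, the sketch below assumes that the second thread on a module adds about 80% on top of what one thread achieves alone, and that throughput scales linearly across modules. Both are assumptions made here for illustration, not AMD statements, and `module_throughput` and `chip_throughput` are hypothetical helpers.

```python
SECOND_THREAD_GAIN = 0.80  # AMD's quoted figure, read here as the gain from a module's second thread


def module_throughput(threads: int, gain: float = SECOND_THREAD_GAIN) -> float:
    """Throughput of one module, relative to a single thread running alone on it."""
    if threads not in (0, 1, 2):
        raise ValueError("a module runs at most two threads")
    if threads == 0:
        return 0.0
    return 1.0 if threads == 1 else 1.0 + gain


def chip_throughput(modules: int, threads: int) -> float:
    """Spread threads across modules first, then double up (an assumed policy)."""
    if threads > 2 * modules:
        raise ValueError("more threads than hardware thread contexts")
    doubled = max(0, threads - modules)       # modules carrying two threads
    single = min(threads, modules) - doubled  # modules carrying one thread
    return doubled * module_throughput(2) + single * module_throughput(1)


for threads in (4, 6, 8):
    print(f"4 modules, {threads} threads -> {chip_throughput(4, threads):.1f}")
```

With these assumptions a four-module chip running eight threads comes out at 7.2 single-thread equivalents rather than 8, which is the trade-off a shared-resource module makes compared with fully independent cores.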
Stronger floating-point and integer computing performance
One major reason Bulldozer's clustered multithreading architecture can achieve such high performance is that AMD has added execution resources: the two threads in each module have independent integer units, and only the floating-point unit is shared. This is what is meant by having two integer "clusters". Separating the integer work of the two threads improves the processor's performance in everyday applications more effectively. With Hyper-Threading on Intel's Nehalem architecture, two hardware threads share three groups of execution units, so conflicts are inevitable. In Bulldozer, each of the two hardware threads has exclusive use of four integer units, so the performance gain should be more pronounced. According to the execution-unit block diagram AMD has published, these four units (two of each kind) are integer execution units, which handle integer operations, and load/store units, which handle address calculation. (K7/K8/K10 have only three such units, which corresponds to their ability to process three macro-ops per cycle.)
Note that the dual-cluster design also gives AMD more flexibility in product design, since future processors can be segmented by the number of clusters. The Bobcat processor, for example, can be thought of as a Bulldozer with one integer cluster cut out. In addition, each Bulldozer module has two 128-bit FMAC (multiply-accumulate) pipelines. This meets the needs of the AVX instruction set extension newly added in Bulldozer, which includes a large number of 128-bit multimedia instructions.
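The difference between a shared pool of execution units and dedicated per-thread clusters can be illustrated with a toy model. The unit counts (three shared, four per thread) follow the article's description. The demand traces, the function names and the "thread A goes first" arbitration are invented for illustration, and operations that cannot issue are simply dropped rather than carried over, so this is not a model of either real pipeline.

```python
def issue_shared(demand_a, demand_b, units):
    """Ops issued per cycle when two threads draw from one shared pool of units."""
    issued = []
    for a, b in zip(demand_a, demand_b):
        got_a = min(a, units)          # thread A is served first (arbitrary policy)
        got_b = min(b, units - got_a)  # thread B gets whatever is left
        issued.append((got_a, got_b))
    return issued


def issue_dedicated(demand_a, demand_b, units_per_thread):
    """Ops issued per cycle when each thread owns its own cluster of units."""
    return [(min(a, units_per_thread), min(b, units_per_thread))
            for a, b in zip(demand_a, demand_b)]


def total(issued):
    """Total ops issued by both threads over all cycles."""
    return sum(a + b for a, b in issued)


# Made-up per-cycle demand: integer ops each thread would like to issue.
thread_a = [2, 3, 1, 2]
thread_b = [2, 1, 3, 2]

shared = issue_shared(thread_a, thread_b, units=3)
dedicated = issue_dedicated(thread_a, thread_b, units_per_thread=4)

print("demanded :", sum(thread_a) + sum(thread_b))
print("shared   :", total(shared))
print("dedicated:", total(dedicated))
```

In this trace the shared pool issues 12 of the 16 requested operations, while the dedicated clusters issue all 16, because neither thread ever has to wait for units the other is using.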
A "two-step jump" in process technology
As core counts double, processors place higher demands on the manufacturing process. In recent years semiconductor manufacturers have therefore adopted a variety of new technologies, under the banner of "More Moore", to keep Moore's Law going. Intel has consistently been ahead of AMD in process technology: Intel's Core processors have been on a 32nm process for almost half a year, while AMD's products are still at 45nm. Bulldozer will change this, since it will use a 32nm process. Trial production on the 32nm process is expected to begin in the third quarter of 2010 and to provide AMD with production capacity in 2011. The 28nm process is due to start in the fourth quarter of 2010, while the ultra-low-power 28nm process is scheduled for the first quarter of 2011 and will use HKMG technology. This means AMD will complete a "two-step jump" in its manufacturing process in 2010, moving from 45nm to 32nm and then quickly on to 28nm, and making up the time it has lost to Intel.
AMD has already shown a wafer of its 28nm products, and the new process appears to be very close to final availability. If the two-step jump really can be achieved, the process problems that have long troubled AMD may ease considerably, and AMD will be able to leave its current predicament behind and enter a new phase of development.
The interface is upgraded again
The Bulldozer architecture will use the new AM3+ interface with 941 pins (AM3 has 938 pins; AM2+/AM2 have 940), which differs from the current 938-pin Socket AM3. Its advantage is support for DDR3-1866 memory and other advanced features. Note that AM3+ will be AMD's last generation of pin grid array (PGA) packaging. After that AMD will move to a land grid array (LGA), and the Fusion processors will arrive with a new LGA interface, AF1, reportedly with 1591 contacts. As for compatibility, AMD says that processors for the new AM3+ interface cannot be used in current AM3 motherboards, but today's AM3 processors can be used in future AM3+ motherboards. AMD did initially consider letting Bulldozer use the AM3 interface, but realized it had to choose: keep AM3 and give up some of the new architecture's features, or upgrade the interface for better performance and functionality. In the end AMD chose the latter for the sake of long-term benefits.
The Bulldozer processor will be used in servers first. The first chips are expected to be server processors code-named "Interlagos", with between 12 and 16 cores, due for release next year. For the desktop market, Bulldozer will come in 4-, 6- and 8-core versions with 8MB of L3 cache and support for DDR3-1866. Zambezi, the first desktop processor based on the Bulldozer architecture, will be the core of Scorpius, AMD's next-generation high-end desktop platform.
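The one-way compatibility described above can be written down as a small lookup table. This is a sketch of the article's statement only: the table contents come from the paragraph above, not from AMD documentation, and `SUPPORTED_CPUS` and `cpu_fits` are hypothetical names.

```python
# Which processor interfaces each motherboard socket accepts, per the article.
SUPPORTED_CPUS = {
    "AM3": {"AM3"},           # current boards: AM3 processors only
    "AM3+": {"AM3", "AM3+"},  # future boards: also accept today's AM3 processors
}


def cpu_fits(cpu_interface: str, board_socket: str) -> bool:
    """Return True if a processor with this interface can be used in this socket."""
    if board_socket not in SUPPORTED_CPUS:
        raise ValueError(f"unknown socket: {board_socket}")
    return cpu_interface in SUPPORTED_CPUS[board_socket]


print(cpu_fits("AM3", "AM3+"))   # today's processor in a future board
print(cpu_fits("AM3+", "AM3"))   # Bulldozer-generation processor in a current board
```

The first call returns `True` and the second `False`, matching the upgrade path the article describes: the motherboard can be replaced first, but not the processor.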
Conclusion:
Carrying the weight of AMD's future, can Bulldozer pull Intel down from its throne? Only time will tell. Because the architecture itself is optimized for server applications, Bulldozer may well send a strong shock wave through that market, but in mainstream desktops it still has a long way to go before it can beat Intel's Core i series. If the Bulldozer core is combined with a GPU to form a new generation of Fusion processors, however, its competitiveness should improve further, because Intel's disadvantage in GPUs is far greater than AMD's disadvantage in CPUs. In addition, Bulldozer's weakness in floating-point processing can be offset by a powerful GPU. In short, the move to the Bulldozer architecture signals AMD's shift toward diversification and marks the transition from the multi-core era to system-level integration.