AMD Zen: Ryzen Revisited (Post-CES)

It has been almost three weeks. I intended to write a follow-up article about AMD's Zen architecture once CES 2017 had finished, and here it is. My original article has been read over 2,000 times (600+ times within the first 24 hours), which is pretty amazing for a website that's just starting up. This article will be slightly more in-depth, as I want not only to discuss Zen-related news but also to address a few concerns about the previous article.

Let's Address the Concerns

The largest concern about the article seems to have been the use of a Cinebench R15 score from a highly overclocked FX-8350 to obtain theoretical results for an AMD Ryzen processor. I want to make it absolutely clear that the reason for using such a result was to derive a performance-per-clock metric. It was never meant to imply that Zen will overclock to this level, nor was it meant as a reliable estimate of Zen's performance in this benchmark. Cinebench scales very well with core count and clock frequency, which is why it was used to derive this metric.

This brings me to the next concern: the supposedly leaked Cinebench R15 result from which this metric was obtained. Some readers have suggested that it might actually be a result produced by the Sandy Bridge-based Intel Xeon E5-2660. You can check the Cinebench R15 scores for that Xeon over at HWBOT, and the numbers certainly fit the claim. Whether or not the leaked result is legitimate, I knew beforehand that it could be a false result, but I decided to use it for the article with an open mind. The main reason it was used is that it fit very well with the FX-8350 result mentioned above, irrespective of system cooling. Again, it was used to produce a performance-per-clock metric, and nothing more.

The final concern was about the calculations I used to estimate Ryzen's theoretical power draw. The graph in the previous article has now been fixed. Thank you for bringing that to my attention, and I apologize for the confusion.

With that out of the way, we can move on to the good stuff.

AMD at CES 2017: Overview

AMD was present at this year's CES event, so let's go through what was (and wasn't) shown.

On the processor side of things, there were more octa-core Ryzen demos on display; in fact, the exact same systems from AMD's New Horizon live event in December returned. Sixteen AM4-based motherboards were also announced and displayed in nicely presented systems, all of which were powered on and free to explore. I won't go into detail on every board individually, but I will summarize what was shown a little later. Things were just as exciting on the graphics front, as GCN's latest iteration, dubbed Vega, was a major player at CES.

Now, on to what wasn't shown. I'd be lying if I said I wasn't disappointed that there were no overclocking demonstrations for Ryzen. Then again, there were no XFR demonstrations either. Perhaps this is something AMD wants to keep up its sleeve until a little later in the quarter.

One quick note applies to all of the demonstrations below, which is why I'm putting it here: all Ryzen-based systems used for the demonstrations were running on validation-stage motherboards with reference designs. Retail motherboards should be considerably more refined.

AMD at CES 2017: Ryzen and Battlefield 1

CES 2017 saw the return of the two demonstration systems that AMD used back in December at its New Horizon event. As before, both were equipped with NVIDIA TITAN X graphics cards based on the Pascal architecture; one system sported an octa-core Ryzen processor clocked at 3.40 GHz, while the other used an octa-core Intel Core i7-6900K. Ryzen's turbo features were disabled, while Turbo Boost on Intel's processor was enabled, just as it would be out of the box. The graphical settings were turned all the way up to ultra, at a resolution of 3840 × 2160 (4K/2160p). This is identical to the setup used in December.

The decision to use a top-of-the-line graphics card here is an obvious one. No other graphics card on the market, save for the GTX 1080 (and the 1080 Ti when released), can push 60 frames/sec comfortably and consistently in this game at these settings. Using the TITAN therefore ensures that any bottleneck comes from the processor rather than the graphics card. Battlefield 1 is a good game for demonstrating the multi-threaded performance of high-core-count chips such as these, because its Frostbite 3 engine can use all of their cores very efficiently.

The resulting performance of both systems was what you would expect: both were pushing up to the 60 frames/sec limit set on the configurations. It's easy to nitpick here, but while both systems fluctuated in frame rate as the game world changed, the Ryzen-based system did seem to offer a negligible 2 frames/sec more on average than the Intel system.

Differences in the game world could easily account for this, as there was no identical, repeatable benchmark run, so I'm going to gloss over it. It's also hard to tell how much the i7-6900K's additional 100 MHz from Turbo Boost really helped with a frame limiter in place. Either way, from what was shown, neither system was a poor performer here.

AMD at CES 2017: Ryzen and Live-Streaming Dota 2

Another fairly impressive demonstration at CES focused on live-streaming Dota 2 gameplay to Twitch. As with the Battlefield 1 demonstration, both systems used the exact same components at the highest possible settings, with no optimizations for either system. The stream was encoded on the processor rather than the graphics card, because processor encoding gives better results with less loss of quality; the trade-off is that it is far more computationally demanding. Even so, neither processor should have choked here, and neither did.

What I can say, however, is that the Ryzen processor showed noticeably fewer dropped frames and virtually no delay compared to the i7-6900K system. AMD attributes this to Zen's "Infinity Fabric," its marketing term for the high-performance interconnect between the processor's cores. This interconnect allows the processor to send and receive data between its cores at a higher rate (100 GiB/s) than both AMD's and Intel's current implementations.

Furthermore, this same interconnect will make use of some open-source standards and will be present in all Zen-based processors, APUs and GCN-based graphics cards, including Summit Ridge, Raven Ridge and Vega. This brings me neatly to the next topic: Vega!

AMD at CES 2017: Vega and Doom [2016] (Featuring Ryzen)

In addition to AMD's brand-new processor architecture, the company's latest graphics architecture was also on show at CES this year. The demonstration paired an octa-core Ryzen processor with an unknown model of Vega graphics card, running Bethesda's latest Doom game, released last year. As with the Battlefield 1 demonstration, the graphical settings were set to ultra at a resolution of 3840 × 2160 (4K/2160p). Frame rates were comfortably above 60 frames/sec; I recorded an average of 70 frames/sec and a maximum of 82 frames/sec.

On the architectural side, Vega is a significant step up from Polaris and prior generations of GCN. There's an improved and scalable memory architecture, next-generation compute units (NCUs) with support for packed 8-bit and 16-bit operations, a new pixel engine, and a redefined geometry pipeline.

The aim with Vega is to execute more operations per clock cycle: up to four times as many as Polaris, in fact, when working with 8-bit data. It achieves this by packing narrower operations into each 32-bit lane of a compute unit. A Polaris CU occupies a full 32-bit lane for every operation, whether the data is 8, 16 or 32 bits wide. Vega's more sophisticated NCUs can instead group multiple narrow operations together; per AMD, each NCU can perform 128 32-bit, 256 16-bit, or 512 8-bit operations per clock. That is, one 32-bit lane can now execute two 16-bit operations, or four 8-bit operations, in the same amount of time.
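To make the packing idea concrete, here is a software illustration of packed math, sometimes called SIMD within a register. It is a sketch of the concept only, not a model of how Vega's hardware implements it: it treats a 32-bit word as two 16-bit or four 8-bit unsigned integer lanes and adds every lane with a single 32-bit addition. The helper names (pack, unpack, packed_add) are hypothetical.

```python
WORD_BITS = 32  # width of one lane/register in this illustration (an assumption)


def lane_masks(lane_bits: int, word_bits: int = WORD_BITS) -> tuple[int, int]:
    """Return (low_mask, high_mask): all bits of each lane except its top bit, and the top bits."""
    low = (1 << (lane_bits - 1)) - 1
    low_mask = 0
    for i in range(word_bits // lane_bits):
        low_mask |= low << (i * lane_bits)
    high_mask = ((1 << word_bits) - 1) ^ low_mask
    return low_mask, high_mask


def pack(values: list[int], lane_bits: int) -> int:
    """Pack unsigned values into one word, lane 0 in the lowest bits."""
    lane_max = (1 << lane_bits) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & lane_max) << (i * lane_bits)
    return word


def unpack(word: int, lane_bits: int, word_bits: int = WORD_BITS) -> list[int]:
    """Split a word back into its unsigned lanes."""
    lane_max = (1 << lane_bits) - 1
    return [(word >> (i * lane_bits)) & lane_max for i in range(word_bits // lane_bits)]


def packed_add(a: int, b: int, lane_bits: int) -> int:
    """Add every lane of a and b with one 32-bit addition; each lane wraps on its own.

    Adding only the low bits of each lane stops carries from crossing lane
    boundaries; the top bit of each lane is then fixed up with XOR.
    """
    low_mask, high_mask = lane_masks(lane_bits)
    return ((a & low_mask) + (b & low_mask)) ^ ((a ^ b) & high_mask)


for lane_bits in (32, 16, 8):
    print(f"{lane_bits}-bit data: {WORD_BITS // lane_bits} op(s) per 32-bit lane")

print(unpack(packed_add(pack([1000, 65535], 16), pack([234, 1], 16), 16), 16))  # [1234, 0]
print(unpack(packed_add(pack([10, 20, 30, 250], 8), pack([1, 2, 3, 10], 8), 8), 8))  # [11, 22, 33, 4]
```

Note how the 65535 + 1 and 250 + 10 lanes wrap around without disturbing their neighbours; that lane isolation is what lets one wide operation stand in for several narrow ones.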

AMD at CES 2017: The Motherboards

Sixteen AM4-based motherboards were shown off at this year's CES, which shows genuine interest in the platform from AMD's motherboard partners. Of the five vendors behind this total, ASUS had the fewest on show, with only a single board; in contrast, ASRock, Gigabyte and MSI had four apiece. ASUS's small showing at the event should matter less once Ryzen hits retail, as AMD expects around fifty different motherboard models to be available in time for the release. Here are the sixteen boards that were displayed at the event:

  • ASRock A320M Pro4
  • ASRock AB350 Gaming K4
  • ASRock X370 Gaming K4
  • ASRock X370 Taichi
  • ASUS B350M-C
  • Biostar X350GT3
  • Biostar X350GT5
  • Biostar X370GT7
  • Gigabyte GA-A320M-HD3
  • Gigabyte GA-AB350-Gaming 3
  • Gigabyte GA-AX370-Gaming 5
  • Gigabyte GA-AX370-Gaming K5
  • MSI A320M Pro-VD
  • MSI B350 Tomahawk
  • MSI B350M Mortar
  • MSI X370 Xpower Gaming Titanium

AMD at CES 2017: The Chipsets and Chipset I/O

AMD also provided details on the chipsets that will accompany the AM4 socket to make up the consolidated platform it has been touting for a while now. There are six chipsets, spread across four tiers: small-form-factor, essential, mainstream, and enthusiast.

Small-form-factor chipsets will be supplied with motherboards using smaller standards such as mini-ITX and microATX. This tier comprises three chipsets: A300, B300 and X300. Only the X300 chipset will support overclocking and multiple graphics cards. All small-form-factor chipsets omit their own I/O features and will instead offer only the I/O provided by the processor. This means that systems with these chipsets will be limited to two SATA-III ports and four USB 3.1 (5 Gb/s) ports.

Each of the remaining tiers covers a single chipset. The A320 chipset will be used in low-end systems and makes up the essential tier. As with the A300 and B300 chipsets, the A320 chipset lacks overclocking and dual graphics card support. AMD is positioning this chipset for use with dual- and quad-core Bristol Ridge APUs; Zen-based Ryzen processors are not intended for it. Next up is the B350 chipset, covering the mainstream tier; this will arguably be the most popular choice, but ultimately that will depend on the pricing of motherboards equipped with the final chipset, the X370. The X370 represents the top-of-the-line enthusiast tier and will be exclusive to Zen-based processors. Both the B350 and X370 chipsets support overclocking, but the latter will be required if you plan on having more than one graphics card in your system.

The two SATAe connections can be configured into four SATA-III ports (two per connection), or two additional PCIe 3.0 lanes. If configured as two PCIe lanes, they can be paired with two other general-purpose PCIe lanes to form a single PCIe 3.0 ×4 connection.

Here is a comparison table of all six chipsets:

Tier | Chipset | I/O | PCIe Lanes | Multi-GPU Support | SATA RAID | Overclocking
Enthusiast | X370 | 2 × USB 3.1 (10 Gb/s), 6 × USB 3.1 (5 Gb/s), 6 × USB 2.0, 4 × SATA-III, 2 × SATAe/PCIe 3.0 | 8 × PCIe 2.0 | Yes | 0 / 1 / 10 | Enabled
Mainstream | B350 | 2 × USB 3.1 (10 Gb/s), 2 × USB 3.1 (5 Gb/s), 6 × USB 2.0, 2 × SATA-III, 2 × SATAe/PCIe 3.0 | 6 × PCIe 2.0 | No | 0 / 1 / 10 | Enabled
Essential | A320 | 1 × USB 3.1 (10 Gb/s), 2 × USB 3.1 (5 Gb/s), 6 × USB 2.0, 2 × SATA-III, 2 × SATAe/PCIe 3.0 | 4 × PCIe 2.0 | No | 0 / 1 / 10 | Disabled
Small-Form-Factor | X300 | Processor I/O only | N/A | Yes | 0 / 1 | Enabled
Small-Form-Factor | B300 | Processor I/O only | N/A | No | 0 / 1 | Disabled
Small-Form-Factor | A300 | Processor I/O only | N/A | No | 0 / 1 | Disabled
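If you're shortlisting motherboards, the chipset comparison can be encoded as data and filtered by the features you need. This is a minimal sketch: the Chipset class and the shortlist function are hypothetical helpers, and the values are transcribed from the comparison table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chipset:
    name: str
    tier: str
    multi_gpu: bool
    raid_levels: tuple[int, ...]
    overclocking: bool


# Values transcribed from the AM4 chipset comparison table.
CHIPSETS = [
    Chipset("X370", "Enthusiast", True, (0, 1, 10), True),
    Chipset("B350", "Mainstream", False, (0, 1, 10), True),
    Chipset("A320", "Essential", False, (0, 1, 10), False),
    Chipset("X300", "Small-Form-Factor", True, (0, 1), True),
    Chipset("B300", "Small-Form-Factor", False, (0, 1), False),
    Chipset("A300", "Small-Form-Factor", False, (0, 1), False),
]


def shortlist(need_overclocking: bool = False, need_multi_gpu: bool = False) -> list[str]:
    """Names of the chipsets that provide every requested feature."""
    return [
        c.name
        for c in CHIPSETS
        if (c.overclocking or not need_overclocking)
        and (c.multi_gpu or not need_multi_gpu)
    ]


print(shortlist(need_overclocking=True))  # ['X370', 'B350', 'X300']
print(shortlist(need_multi_gpu=True))     # ['X370', 'X300']
```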

AMD at CES 2017: Processor I/O

AMD also showcased exactly what Bristol Ridge and Summit Ridge will offer in terms of I/O. For anyone confused by the I/O configurations: essentially, AMD is offering dedicated PCIe 3.0 lanes for NVMe connections, but these lanes can also double as SATA-III connections or general-purpose PCIe lanes.

This means that a top-end AM4-based system will have 28 PCIe lanes in total: 20 provided by the processor (16 for graphics plus the four flexible lanes) and 8 by the chipset. (Note that this doesn't include the four PCIe 3.0 lanes used for communication between the processor and the chipset.) The 16 graphics lanes provided by the processor are in line with Intel's mainstream Skylake processors (LGA1151 [H4]), while the 8 lanes provided by the chipset match what Intel's enthusiast platform (LGA2011-3 [R3]) offers. It's a strange mash-up of numbers, but AMD appears to be attacking Intel with a platform that sits between its mainstream and enthusiast tiers.
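The lane count is simple addition, but it's easy to double-count the processor-to-chipset link. A minimal sketch of the arithmetic, with constants transcribed from the processor and chipset figures; the function name is hypothetical, and the second example just applies the same arithmetic to the Bristol Ridge and A320 figures.

```python
# Constants for a top-end system (Summit Ridge processor on an X370 motherboard),
# transcribed from the processor and chipset I/O figures.
CPU_GRAPHICS_LANES = 16   # 16 x PCIe 3.0 from the processor
CPU_FLEXIBLE_LANES = 4    # PCIe 3.0 lanes shared between NVMe, SATA-III and general-purpose use
CHIPSET_LANES = 8         # 8 x PCIe 2.0 general-purpose lanes from the X370
CPU_TO_CHIPSET_LINK = 4   # PCIe 3.0 x4 link to the chipset; not available to devices


def usable_lanes(cpu_graphics: int, cpu_flexible: int, chipset: int) -> tuple[int, int]:
    """Return (processor lanes, total usable lanes); the chipset link is excluded."""
    cpu_total = cpu_graphics + cpu_flexible
    return cpu_total, cpu_total + chipset


cpu_total, system_total = usable_lanes(CPU_GRAPHICS_LANES, CPU_FLEXIBLE_LANES, CHIPSET_LANES)
print(f"Processor: {cpu_total} lanes, system: {system_total} lanes "
      f"(plus a x{CPU_TO_CHIPSET_LINK} chipset link)")

# Same arithmetic for a Bristol Ridge APU (8 graphics lanes) on an A320 board (4 lanes).
print(usable_lanes(8, 4, 4))  # (12, 16)
```

Keep in mind that the chipset's lanes are PCIe 2.0, so the 28-lane total mixes two PCIe generations.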

Tier | Processor | Cores / Threads | Cache | I/O | PCIe Lanes
Enthusiast | Ryzen 3/5/7 (Summit Ridge) | 4C / 4T | 2 MiB L2 + 8 MiB L3 | Common I/O* | 16 × PCIe 3.0
Enthusiast | Ryzen 3/5/7 (Summit Ridge) | 4C / 8T | 2 MiB L2 + 8 MiB L3 | Common I/O* | 16 × PCIe 3.0
Enthusiast | Ryzen 3/5/7 (Summit Ridge) | 6C / 12T | 3 MiB L2 + 16 MiB L3 | Common I/O* | 16 × PCIe 3.0
Enthusiast | Ryzen 3/5/7 (Summit Ridge) | 8C / 16T | 4 MiB L2 + 16 MiB L3 | Common I/O* | 16 × PCIe 3.0
Mainstream | A8/A10/A12, Athlon X4 (Bristol Ridge) | 4C / 4T | 2 MiB L2 | Common I/O* | 8 × PCIe 3.0
Essential | A6 (Bristol Ridge) | 2C / 2T | 1 MiB L2 | Common I/O* | 8 × PCIe 3.0

* Common I/O (all processors): 4 × USB 3.1 (5 Gb/s), plus one of: 2 × SATA-III + 2 × PCIe 3.0; 2 × SATA-III + NVMe (via 2 × PCIe 3.0); or NVMe (via 4 × PCIe 3.0).

A Note on Processor Coolers

Can you use your current processor cooler with AM4 motherboards? That depends on which cooler you have. AMD is working with its partners to make as many existing coolers as possible compatible with the new platform. That said, the company did unveil several models that will be compatible upon Ryzen's release:

  • AMD Wraith (of course!)
  • Corsair H60
  • Corsair H100i
  • Corsair H110i
  • EK Water Blocks (EKWB) will have AM4 products available for customized watercooling solutions
  • Noctua NH-D15

The State of Overclocking and Multi-GPU Systems

While Intel has done its best to create market segmentation with special K-suffix models that denote unlocked multipliers for overclocking, AMD has apparently stuck to the strategy it used with its FX processors, as I predicted in the previous article. That's right: every single Ryzen processor features an unlocked multiplier for very easy overclocking. This is excellent news.

However, as the chipset table above shows, you'll need to make sure you buy a motherboard with an overclocking-capable chipset; namely, X300, B350 or X370. This actually makes more sense than Intel's approach, because you won't need to worry about buying a specific processor, and the additional performance will always be there if you ever need it. As I stated earlier, I expect the B350 chipset to be the most prevalent consumer choice, if not the X370, and you're less likely to upgrade your motherboard than your processor, given that AMD's future processors are expected to share the same socket. Therefore, I have no problem with AMD's decisions here.

Those of you who want more than one graphics card in your system will need an X300- or X370-equipped motherboard. It also appears that AMD has looked at NVIDIA's recent approach to multi-GPU configurations and decided not to go down the same route. In an interview with PC World (27:34), AMD's Raja Koduri explains that the PC platform is all about choice and flexibility; some consumers absolutely want the best of the best, and for this reason, AMD will continue to support 3-way CrossFireX, in addition to 2-way SLI, on the AM4 platform.

AMD vs. Intel: Instructions-per-Clock Comparison

One of the long-standing questions on people's minds has been where the Zen architecture will sit, in terms of IPC, relative to past and present architectures from both AMD and Intel. Suffice it to say, I'm fairly confident in my calculations, and I believe my estimate of Zen's relative instruction throughput is fairly accurate.

All data below was calculated using Cinebench R10, R11.5 and R15 single-thread benchmark scores in conjunction with UserBenchmark single-core integer and floating-point scores. Core clock frequencies were set to 3.00 GHz, and Hyper-Threading was left untouched on the Intel processors. Please note that the following results represent a best-case scenario that applies to most typical workloads. Workloads that depend on specific instruction set extensions, most notably AVX2, will show considerably lower IPC for Zen relative to the Intel options.

Also worth mentioning are AMD's Husky (Llano) and K10 architectures, and Intel's Penryn architecture. In the table below, they would sit at 122.9%, 123.4% and 123.7%, respectively. They were omitted because I'm less confident in those figures, but those are the figures I have if you're curious.

Architecture | Baseline(s) | Relative IPC (vs. AMD Bulldozer) | Increase (vs. Previous Architecture)
AMD Bulldozer | FX-8150 | 100.0% | N/A
AMD Piledriver | A10-6790K, FX-8350 | 108.6% | +8.6%
AMD Steamroller | A10-7870K | 115.0% | +5.9%
AMD Excavator | Athlon X4 845 | 126.4% | +9.9%
Intel Nehalem | i7-975 XE | 137.8% | N/A
Intel Westmere | i7-980X XE | 137.8% | +0.0%
Intel Sandy Bridge | i7-2600K, i7-3930K | 164.1% | +19.1%
Intel Ivy Bridge | i7-3770K | 170.2% | +3.7%
Intel Haswell | i7-4770K | 186.4% | +9.5%
Intel Broadwell | i7-5775C | 195.7% | +5.0%
AMD Zen | Ryzen SR7 8C/16T | 195.9% | ~+55.0% (vs. Excavator)
Intel Skylake | i7-6700K | 200.4% | +2.4%
Intel Kaby Lake | i7-7700K | 200.4% | +0.0%
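The generational column is derived from the relative IPC column: each architecture's figure divided by its same-vendor predecessor's figure, minus one. Here is a short sketch of that arithmetic, assuming (my reading of the table) that Zen's predecessor is Excavator and that Nehalem and Bulldozer have no listed predecessor. The same data also lets you read any two rows against each other, such as Zen versus Skylake.

```python
# Relative IPC (AMD Bulldozer = 100.0%), transcribed from the IPC table.
RELATIVE_IPC = {
    "AMD Bulldozer": 100.0, "AMD Piledriver": 108.6, "AMD Steamroller": 115.0,
    "AMD Excavator": 126.4, "AMD Zen": 195.9,
    "Intel Nehalem": 137.8, "Intel Westmere": 137.8, "Intel Sandy Bridge": 164.1,
    "Intel Ivy Bridge": 170.2, "Intel Haswell": 186.4, "Intel Broadwell": 195.7,
    "Intel Skylake": 200.4, "Intel Kaby Lake": 200.4,
}

# Each architecture's same-vendor predecessor within the table (Zen follows Excavator).
PREDECESSOR = {
    "AMD Piledriver": "AMD Bulldozer", "AMD Steamroller": "AMD Piledriver",
    "AMD Excavator": "AMD Steamroller", "AMD Zen": "AMD Excavator",
    "Intel Westmere": "Intel Nehalem", "Intel Sandy Bridge": "Intel Westmere",
    "Intel Ivy Bridge": "Intel Sandy Bridge", "Intel Haswell": "Intel Ivy Bridge",
    "Intel Broadwell": "Intel Haswell", "Intel Skylake": "Intel Broadwell",
    "Intel Kaby Lake": "Intel Skylake",
}


def relative_pct(arch: str, reference: str) -> float:
    """IPC of arch as a percentage of reference, to one decimal place."""
    return round(RELATIVE_IPC[arch] / RELATIVE_IPC[reference] * 100, 1)


def generational_uplift_pct(arch: str) -> float:
    """Percentage IPC gain over the same vendor's previous architecture."""
    return round((RELATIVE_IPC[arch] / RELATIVE_IPC[PREDECESSOR[arch]] - 1) * 100, 1)


for arch, previous in PREDECESSOR.items():
    print(f"{arch}: +{generational_uplift_pct(arch):.1f}% vs. {previous}")

print(relative_pct("AMD Zen", "Intel Skylake"))  # 97.8
```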

Reevaluating AMD Ryzen's Performance with Cinebench R15

Cinebench R15 Multi-Threaded Comparison

Using the table above, we can revisit the Cinebench R15 performance comparison from the previous article. This time, we're working with trusted real-world figures obtained from the very same benchmark.

That said, the outcome depends entirely on how you interpret the IPC table above; you can draw several conclusions about Zen's performance depending on which baseline you use. The Cinebench R15 multi-threaded results for all tested processors are provided in the accompanying chart for comparison. I'll also keep the calculations compact, as I'm aware this article is already getting pretty long.

Baseline 1: AMD FX-8150 — 95.9% IPC Improvement Over Bulldozer (Including SMT)

/* (8C/16T @ 3.90 GHz) */
552 cb × 1.959 = 1081 cb

Baseline 2: AMD FX-8350 — Canard PC's 35% Thread Improvement (1 × Piledriver Module vs. 1 × Zen Core)

/* (8C/16T @ 4.10 GHz) */
640 cb ÷ 4 modules = 160 cb per Piledriver module
160 cb × 1.35 = 216 cb per Zen core
216 cb × 8 = 1728 cb

Baseline 3: AMD FX-8350 — Canard PC's 60.4% Performance Improvement (Computational Workloads Graph)

/* (8C/16T @ 4.10 GHz) */
640 cb ÷ 4 modules = 160 cb per Piledriver module
160 cb × 1.604 = 257 cb per Zen core
257 cb × 8 = 2056 cb

Baseline 4: AMD Athlon X4 845 — 55% IPC Improvement Over Excavator (Including SMT)

/* (8C/16T @ 3.60 GHz) */
314 cb × 2 = 628 cb
628 cb × 1.55 = 973 cb

Baseline 5: AMD A12-9800 — 55% IPC Improvement Over Excavator (Including SMT)

/* (8C/16T @ 4.10 GHz) */
334 cb × 2 = 668 cb
668 cb × 1.55 = 1035 cb

Baseline 6: Mysterious Cinebench R15 Multi-Thread Result of 1188

/* (8C/16T @ 3.15 GHz) */
1188 cb

Note that baseline 6, the mysterious Cinebench score of 1188, is assumed to have been achieved without AMD Turbo Core or XFR active. This is because, if legitimate, it's believed to come from an A0-stepping part, and all known A0-stepping parts have had their turbo states disabled. Now, let's normalize these results at 3.30 GHz, 3.50 GHz, and 3.70 GHz, which represent Ryzen's likely all-core turbo frequencies.
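Before normalizing, the six raw projections can be reproduced with a few lines of Python. This sketch mirrors the arithmetic above, including rounding the per-core figure before multiplying by eight in baselines 2 and 3; the helper names are hypothetical.

```python
def apply_uplift(score_cb: float, uplift: float) -> int:
    """Scale a Cinebench R15 score by an IPC/throughput factor, rounded to whole points."""
    return round(score_cb * uplift)


def from_piledriver_modules(total_cb: int, modules: int, uplift: float, zen_cores: int = 8) -> int:
    """Per-module score -> per-Zen-core score (rounded, as above) -> eight cores."""
    per_zen_core = apply_uplift(total_cb / modules, uplift)
    return per_zen_core * zen_cores


projections = {
    1: apply_uplift(552, 1.959),                # FX-8150, 95.9% IPC over Bulldozer
    2: from_piledriver_modules(640, 4, 1.35),   # FX-8350, Canard PC's 35% per-thread gain
    3: from_piledriver_modules(640, 4, 1.604),  # FX-8350, Canard PC's 60.4% gain
    4: apply_uplift(314 * 2, 1.55),             # Athlon X4 845 doubled to 8 cores, +55%
    5: apply_uplift(334 * 2, 1.55),             # A12-9800 doubled to 8 cores, +55%
    6: 1188,                                    # leaked result, used as-is
}

for baseline, score in projections.items():
    print(f"Baseline {baseline}: {score} cb")
```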

Normalizing the Scores: Baseline 1

1081 cb × (3300 MHz ÷ 3900 MHz) =  915 cb /* (8C/16T @ 3.30 GHz) */
1081 cb × (3500 MHz ÷ 3900 MHz) =  970 cb /* (8C/16T @ 3.50 GHz) */
1081 cb × (3700 MHz ÷ 3900 MHz) = 1026 cb /* (8C/16T @ 3.70 GHz) */

Normalizing the Scores: Baseline 2

1728 cb × (3300 MHz ÷ 4100 MHz) = 1391 cb /* (8C/16T @ 3.30 GHz) */
1728 cb × (3500 MHz ÷ 4100 MHz) = 1475 cb /* (8C/16T @ 3.50 GHz) */
1728 cb × (3700 MHz ÷ 4100 MHz) = 1559 cb /* (8C/16T @ 3.70 GHz) */

Normalizing the Scores: Baseline 3

2056 cb × (3300 MHz ÷ 4100 MHz) = 1655 cb /* (8C/16T @ 3.30 GHz) */
2056 cb × (3500 MHz ÷ 4100 MHz) = 1755 cb /* (8C/16T @ 3.50 GHz) */
2056 cb × (3700 MHz ÷ 4100 MHz) = 1855 cb /* (8C/16T @ 3.70 GHz) */

Normalizing the Scores: Baseline 4

973 cb × (3300 MHz ÷ 3600 MHz) =  892 cb /* (8C/16T @ 3.30 GHz) */
973 cb × (3500 MHz ÷ 3600 MHz) =  946 cb /* (8C/16T @ 3.50 GHz) */
973 cb × (3700 MHz ÷ 3600 MHz) = 1000 cb /* (8C/16T @ 3.70 GHz) */

Normalizing the Scores: Baseline 5

1035 cb × (3300 MHz ÷ 4100 MHz) = 833 cb /* (8C/16T @ 3.30 GHz) */
1035 cb × (3500 MHz ÷ 4100 MHz) = 884 cb /* (8C/16T @ 3.50 GHz) */
1035 cb × (3700 MHz ÷ 4100 MHz) = 934 cb /* (8C/16T @ 3.70 GHz) */

Normalizing the Scores: Baseline 6

1188 cb × (3300 MHz ÷ 3150 MHz) = 1245 cb /* (8C/16T @ 3.30 GHz) */
1188 cb × (3500 MHz ÷ 3150 MHz) = 1320 cb /* (8C/16T @ 3.50 GHz) */
1188 cb × (3700 MHz ÷ 3150 MHz) = 1395 cb /* (8C/16T @ 3.70 GHz) */
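Each normalization is a linear rescale: score × (target clock ÷ baseline clock). This sketch regenerates all eighteen figures above, assuming, as the calculations do, that Cinebench R15 scales perfectly linearly with clock frequency. Real-world scaling can deviate from linear, so treat the results as rough estimates.

```python
# Projected 8C/16T scores (cb) and the clock each projection corresponds to,
# transcribed from the baseline calculations.
PROJECTED_CB = {1: 1081, 2: 1728, 3: 2056, 4: 973, 5: 1035, 6: 1188}
BASELINE_MHZ = {1: 3900, 2: 4100, 3: 4100, 4: 3600, 5: 4100, 6: 3150}
TARGET_MHZ = (3300, 3500, 3700)  # likely all-core turbo frequencies


def normalize(score_cb: int, from_mhz: int, to_mhz: int) -> int:
    """Rescale a score to another clock, assuming perfectly linear clock scaling."""
    return round(score_cb * to_mhz / from_mhz)


normalized = {
    baseline: [normalize(score, BASELINE_MHZ[baseline], mhz) for mhz in TARGET_MHZ]
    for baseline, score in PROJECTED_CB.items()
}

for baseline, scores in normalized.items():
    cells = ", ".join(f"{s} cb @ {mhz / 1000:.2f} GHz" for s, mhz in zip(scores, TARGET_MHZ))
    print(f"Baseline {baseline}: {cells}")
```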

Cinebench R15 Conclusions v2

Regardless of whether that Cinebench leak is legitimate, there's a clear pattern here. Ryzen is shaping up to sit between Ivy Bridge and Haswell at worst, and between Haswell and Broadwell at best. This is pretty much where I expect it to sit, depending on AVX2 usage. Cinebench has a shady history with Intel's compiler, but R15 is considered the most balanced version of the benchmark yet. Intel's compiler is also the most favorable for AMD when it's used properly, which is why it remains the most common compiler despite the controversy years ago over its CPU dispatcher favoring Intel processors. The issue isn't the compiler, but rather software developers; therefore, stick to benchmarks that are well understood, widely used with growing result databases, and as transparent as possible.

Cinebench R15 also doesn't take advantage of any instruction set extension beyond SSE2; there's no FMA3, FMA4, AVX or AVX2 dependency. There's therefore no reason for Zen to underperform here, and no reason for "Intel has the advantage because of AVX2" remarks.

To wrap up the benchmarking: despite the widespread distaste for the supposed Cinebench leak, it actually gives better end results for Zen than three of the baselines above (1, 4 and 5), which are based on validated real-world scores. Baseline 2 looks to be the most accurate, given how Zen performs in other workloads, but only time will tell. Lastly, a final word on baseline 3: though exciting, it's merely an anomaly. Consider where the deca-core i7-6950X Extreme Edition sits in that chart; Zen won't reach it out of the box. That's a given.

Of course, I hope for better end results like everyone else, but even at this level, I'm content.

When Is AMD Launching Ryzen?

That's a good question. Earlier claims of a CES launch were very much incorrect, as I thought they would be, and the rumored January 17 launch proved equally wrong.

Now there's another date to add to the speculation list, although this one has some foundation. The annual Game Developers Conference (GDC) week starts on February 27 in San Francisco, California, and AMD is expected to debut Ryzen at the event on February 28. This makes some sense, as AMD is very active in the gaming industry; in fact, you could say that gaming is AMD's most successful market. Backing this up are all of the demonstrations shown at both the New Horizon event in December and at this year's CES.