My Conclusions on AMD's Zen Architecture, Based on Canard PC's Review

I've been reading the AMD Zen architecture thread over at SemiAccurate for the past eight or so months. While I generally enjoy reading other people's opinions on the topic, the sources that users bring up in the thread often seem to leave out important information. I would happily engage with the community, but forum registrations are disabled.

Because of this, and because so many people are sharing conclusions drawn from issue 31 of Canard PC's printed publication, I felt the need to do the same.

Notes Concerning the Tested AM4 Platform and Ryzen Sample

According to the publication, some important caveats need to be taken into consideration with these results:

  • The sample used to ascertain these performance metrics was an engineering sample with stepping A0, on a platform with unfinalized features. The sample part number is 2D3151A2M88E4.
  • The A0 stepping contains two IPC-inhibiting bugs: one relating to the micro-op cache, and one relating to the SMT implementation.
  • Despite these bugs, both features were left enabled, which will presumably hinder performance by an unknown percentage.
  • The AM4 platform currently has trouble detecting SSDs, and it also failed to detect an NVIDIA GeForce GTX 1080.
  • The Ryzen sample's memory controller is unstable at data rates above DDR4-2400, even though this specific sample supports DDR4-2666.

AMD AM4 (Ryzen SR7-x) vs. AMD AM3+ (FX-8370) and Intel LGA2011-3 (i7-5960X/i7-6900K/i7-6850K)

The following tables compare this specific AMD Ryzen sample with AMD's previous high-performance architecture (Piledriver) and with Intel's Haswell-E and Broadwell-E architectures.

Canard PC's AMD Ryzen SR7-x Sample
Cores and Threads
  • 8 cores (16 threads)
Core Clock Frequency
  • 3.15 GHz
Turbo Core Clock Frequencies
  • 3.50 GHz (1 – 3 cores)
  • 3.40 GHz (4 – 7 cores)
  • 3.30 GHz (8 cores)
Low-Power Clock Frequency
  • 550 MHz
Cache Configuration
  • 8 × 64 kiB L1 instruction
  • 8 × 32 kiB L1 data
  • 8 × 512 kiB L2
  • 2 × 8 MiB L3
System Memory Configuration
  • Up to dual-channel DDR4-2400
  • Up to 38.4 GB/s
Introductory Price
  • ~ $499 USD
AMD FX-8370
Cores and Threads
  • 8 cores (8 threads)
Core Clock Frequency
  • 4.00 GHz
Turbo Core Clock Frequencies
  • 4.30 GHz (1 – 4 cores)
  • 4.10 GHz (5 – 8 cores)
Low-Power Clock Frequency
  • 1.40 GHz
Cache Configuration
  • 4 × 64 kiB L1 instruction
  • 8 × 16 kiB L1 data
  • 4 × 2 MiB L2
  • 8 MiB L3
System Memory Configuration
  • Up to dual-channel DDR3-1866
  • Up to 29.856 GB/s
Introductory Price
  • $199 USD
Intel Core i7-6900K
Cores and Threads
  • 8 cores (16 threads)
Core Clock Frequency
  • 3.20 GHz
Turbo Core Clock Frequencies
  • 3.70 GHz (1 – 2 cores)
  • 3.50 GHz (3 – 8 cores)
Low-Power Clock Frequency
  • 1.20 GHz
Cache Configuration
  • 8 × 32 kiB L1 instruction
  • 8 × 32 kiB L1 data
  • 8 × 256 kiB L2
  • 20 MiB L3
System Memory Configuration
  • Up to quad-channel DDR4-2400
  • Up to 76.8 GB/s
Introductory Price
  • $1,089 USD
Intel Core i7-5960X Extreme Edition
Cores and Threads
  • 8 cores (16 threads)
Core Clock Frequency
  • 3.00 GHz
Turbo Core Clock Frequencies
  • 3.50 GHz (1 – 2 cores)
  • 3.30 GHz (3 – 8 cores)
Low-Power Clock Frequency
  • 1.20 GHz
Cache Configuration
  • 8 × 32 kiB L1 instruction
  • 8 × 32 kiB L1 data
  • 8 × 256 kiB L2
  • 20 MiB L3
System Memory Configuration
  • Up to quad-channel DDR4-2133
  • Up to 68.256 GB/s
Introductory Price
  • $999 USD
Intel Core i7-6850K
Cores and Threads
  • 6 cores (12 threads)
Core Clock Frequency
  • 3.60 GHz
Turbo Core Clock Frequencies
  • 3.80 GHz (1 – 2 cores)
  • 3.70 GHz (3 – 6 cores)
Low-Power Clock Frequency
  • 1.20 GHz
Cache Configuration
  • 6 × 32 kiB L1 instruction
  • 6 × 32 kiB L1 data
  • 6 × 256 kiB L2
  • 15 MiB L3
System Memory Configuration
  • Up to quad-channel DDR4-2400
  • Up to 76.8 GB/s
Introductory Price
  • $617 USD
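The peak bandwidth figures in these tables follow from a simple formula: data rate (MT/s) × 8 bytes per 64-bit channel × number of channels. The short Python sketch below reproduces them. The result is in decimal gigabytes per second (10^9 bytes), and it uses the nominal 1866 and 2133 MT/s data rates, as the tables do; the function name is my own, purely for illustration.

```python
def peak_bandwidth_gbs(data_rate_mts: float, channels: int, bytes_per_channel: int = 8) -> float:
    """Peak theoretical DRAM bandwidth in decimal GB/s (10^9 bytes per second)."""
    # Each transfer moves 8 bytes across every 64-bit channel.
    return data_rate_mts * 1e6 * bytes_per_channel * channels / 1e9


# (data rate in MT/s, number of channels), as listed in the tables above
configs = {
    "Ryzen SR7-x sample, dual-channel DDR4-2400": (2400, 2),
    "FX-8370, dual-channel DDR3-1866": (1866, 2),
    "i7-6900K / i7-6850K, quad-channel DDR4-2400": (2400, 4),
    "i7-5960X, quad-channel DDR4-2133": (2133, 4),
}

for name, (rate, channels) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(rate, channels):.3f} GB/s")
```

These are theoretical peaks; sustained bandwidth in real workloads is always somewhat lower.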

The Testing Conditions

The article explains exactly what the conditions were for testing:

Canard PC's Test Configuration
Processor
  • AMD Ryzen SR7-x ES
  • 2D3151A2M88E4
System Memory Configuration
  • 16 GiB dual-channel DDR4-2400
  • Up to 38.4 GB/s
Storage
  • 3 TB Seagate Barracuda 7200.14
  • 7200 rpm
Graphics Card
  • AMD Radeon Fury X
  • 4096:256:64 configuration (stream processors : TMUs : ROPs)
  • 1050 MHz core clock frequency
  • 1000 MT/s effective HBM data rate (500 MHz)
  • 512 GB/s memory bandwidth
Performance Baseline
  • Intel Core i5-6600K
Canard PC's Software Suite
Computational Performance
  • HandBrake encoding H.264 @ 1080p
  • HandBrake encoding H.265 @ 4K
  • wPrime
  • POV-Ray 3.7
  • Blender 3D
  • 3ds Max 2015 / Mental Ray
  • Corona Benchmark
Video Game Performance
  • Far Cry 4
  • GRID: AutoSport
  • Battlefield 4
  • Arma III
  • X³: Terran Conflict
  • The Witcher 3: Wild Hunt
  • Anno 2070
Power Consumption
  • Measured with a current clamp on the ATX12V (CPU power) connector, with the system at full load.
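Two of the figures above are derived rather than read off directly. A current clamp measures amps, so the wattage on the 12 V connector follows from P = V × I, and the Fury X's 512 GB/s comes from its 4096-bit HBM bus at 1000 MT/s. A minimal Python sketch, assuming a perfectly steady 12 V rail (real rails droop slightly under load); the helper names are hypothetical:

```python
RAIL_VOLTS = 12.0  # assumption: the 12 V rail holds exactly 12 V under load


def watts_from_clamp(amps: float, volts: float = RAIL_VOLTS) -> float:
    """Power delivered through the connector, P = V x I."""
    return volts * amps


def amps_for_watts(watts: float, volts: float = RAIL_VOLTS) -> float:
    """Clamp reading (I = P / V) that corresponds to a given power draw."""
    return watts / volts


def hbm_bandwidth_gbs(bus_width_bits: int, data_rate_mts: float) -> float:
    """Peak memory bandwidth in decimal GB/s: bus width x data rate / 8 bits per byte."""
    return bus_width_bits * data_rate_mts * 1e6 / 8 / 1e9


print(f"{amps_for_watts(93):.2f} A on the 12 V connector = {watts_from_clamp(7.75):.0f} W")
print(f"Radeon Fury X peak bandwidth: {hbm_bandwidth_gbs(4096, 1000):.0f} GB/s")
```

For example, the 93 W reported for the Ryzen sample corresponds to a clamp reading of 7.75 A under that assumption.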

Performance Evaluation: Computational Workloads

At first glance, this is by far Ryzen's best showing, and it puts AMD back within touching distance of Intel, something that hasn't happened in quite a while. The Ryzen sample trails the i7-6900K by 14.6%, a gap that includes all of its handicaps (frequency, cache and the A0 bugs). If we raise Ryzen's clock frequency to the i7-6900K's all-core turbo frequency, that gap shrinks by almost half, to 8%.
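The clock adjustment above can be sketched in a few lines of Python. It assumes performance scales linearly with core frequency, which is optimistic (memory and cache latencies don't scale with the core clock), and it uses the two chips' all-core turbo frequencies of 3.30 GHz and 3.50 GHz. The function name is hypothetical.

```python
def clock_normalised_gap(rival_lead: float, own_mhz: float, rival_mhz: float) -> float:
    """Rival's remaining lead once our chip is clocked up to the rival's frequency.

    Assumes performance scales linearly with core clock (an optimistic simplification).
    rival_lead is a fraction, e.g. 0.146 for a 14.6% lead at stock clocks.
    """
    speedup = rival_mhz / own_mhz  # our gain from the higher clock
    return (1 + rival_lead) / speedup - 1


# i7-6900K leads by 14.6% at stock; all-core turbos are 3.50 GHz (Intel) and 3.30 GHz (Ryzen)
gap = clock_normalised_gap(0.146, own_mhz=3300, rival_mhz=3500)
print(f"Remaining i7-6900K lead: {gap:.1%}")  # about 8.1%
```

The result, roughly 8.1%, is the "8%" quoted above.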

Given how close the six-core i7-6800K, running at an almost identical all-core turbo frequency, comes to the eight-core Ryzen, it appears that the bugs in AMD's A0 silicon are holding performance back somewhat.

AMD's new offering handily beats the Piledriver-based FX-8370, by 60.4%. Let's assume that Zen can overclock to the FX-8370's all-core turbo frequency of 4.10 GHz (also shared by the FX-8350). In a clock-for-clock comparison, Zen then outpaces Piledriver by a colossal 99.2%, and it quietly becomes the fastest chip on the graph, with an 8.4% lead over the i7-6900K!

For the more pessimistic among you, you can underclock Piledriver to Ryzen's all-core turbo of 3.30 GHz and obtain the same result.
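The same linear-scaling assumption produces the clock-for-clock figures against the FX-8370 and the i7-6900K. It also shows why underclocking Piledriver to 3.30 GHz gives the same result as overclocking Ryzen to 4.10 GHz: both simply multiply the ratio by 4100/3300. A rough Python sketch with a hypothetical helper name; because the 60.4% input is itself rounded, it lands at about 99.3% rather than exactly 99.2%.

```python
def clock_for_clock_lead(lead: float, own_mhz: float, target_mhz: float) -> float:
    """Our lead over a rival if we ran at target_mhz instead of own_mhz.

    Assumes linear scaling with frequency. lead is a fraction; negative means we trail.
    """
    return (1 + lead) * (target_mhz / own_mhz) - 1


RYZEN_MHZ, FX_MHZ = 3300, 4100  # all-core turbo frequencies

# Ryzen overclocked to 4.10 GHz vs. a stock FX-8370 (60.4% lead at stock)
vs_fx8370 = clock_for_clock_lead(0.604, RYZEN_MHZ, FX_MHZ)

# The pessimist's version: the FX-8370 underclocked to 3.30 GHz instead
vs_fx8370_underclocked = (1 + 0.604) / (RYZEN_MHZ / FX_MHZ) - 1

# Ryzen at 4.10 GHz vs. a stock i7-6900K (which leads Ryzen by 14.6% at stock)
vs_6900k = clock_for_clock_lead(1 / 1.146 - 1, RYZEN_MHZ, FX_MHZ)

print(f"vs FX-8370: {vs_fx8370:.1%} (underclocked FX: {vs_fx8370_underclocked:.1%})")
print(f"vs i7-6900K: {vs_6900k:.1%}")
```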

We can cross-check this performance-per-clock improvement against the recently leaked Cinebench R15 multi-threaded score of 1188, assuming it's a legitimate result from an octa-core AMD Ryzen sample running at 3.30 GHz on all cores. To do so, let's compare it with the highest Cinebench R15 multi-threaded score for AMD's FX-8350:

# FX-8350 baseline: highest Cinebench R15 multi-threaded score
pd_score = 1196   # cb
pd_mhz = 7649     # MHz

# FX-8350 performance-per-clock
pd_ppc = pd_score / pd_mhz        # ≈ 0.156 cb/MHz

# Zen performance-per-clock, using the 1.992x clock-for-clock gain over Piledriver
zn_gain = 1.992
zn_ppc = pd_ppc * zn_gain         # ≈ 0.311 cb/MHz

# Projected Zen Cinebench R15 scores: performance-per-clock x clock frequency,
# assuming the score scales linearly with clock
for zn_mhz in (3300, 4300, 5000, 7649):
    print(f"Zen @ {zn_mhz} MHz: {zn_ppc * zn_mhz:.0f} cb")   # 1028, 1339, 1557, 2382 cb

# Per-clock gain over the FX-8350 implied by the leaked 1188 cb @ 3300 MHz
leak_gain = (1188 / 3300) / pd_ppc   # ≈ 2.30
print(f"Gain implied by the leak: {leak_gain:.2f}x")

Performance Evaluation: Video Games

Now for a complete contrast. In games, the octa-core Ryzen chip suffers badly from its low clock frequencies. This generally applies to all high-core-count processors, including Intel's Haswell-E and Broadwell-E models, as shown by the i7-6700K and i7-4790K taking the top two positions in the graph, each sustaining speeds north of 4.00 GHz.

With that said, the i7-6900K manages a 10.4% lead over Ryzen, although I'm fairly sure this comes down simply to Broadwell-E's higher clock frequencies and its 4 MiB of extra L3 cache. The results form a steep slope in the graph, which indicates that in these games SMT (including Intel's Hyper-Threading technology) is almost entirely useless, and cores beyond the fourth go largely unused. Instead, core frequency matters most.

This is reflected further in the games included in this analysis. Out of the seven games tested, only Battlefield 4 and The Witcher 3: Wild Hunt are optimized to take advantage of more than four cores. In short, this game selection was geared too heavily towards quad-core processors to draw any real conclusions about Ryzen's (or Broadwell-E's) multi-threaded performance.

I do want to stress, however, that these results don't show any weakness in Zen's design whatsoever. Skylake's Core i5 processors are more than capable chips for playing even the most demanding games as of 2016, so anything beyond that is a bonus.

Also worth mentioning is how well the top-end Bristol Ridge APU performs compared with the FX-8370.

Lastly, for reference, Ryzen is a clear 32.2% ahead of its predecessor, even with its severe clock deficit. Correct for that deficit and the lead grows to 64.3%, which might even make Ryzen a challenger here for the i7-6700K. In fact, with the Ryzen processor clocked at 4.00 GHz to match the i7-6700K (with all cores active), Zen is just a smudge over 0.2% behind the Skylake processor in IPC (albeit with twice the amount of L3 cache).

Power Consumption Evaluation

The Samsung/GlobalFoundries 14 nm LPP process really shines through in this graph. At full load, the Ryzen sample draws just 93 watts. Intel's i7-6900K draws 3.2% more at 96 watts, but this is most likely down to its higher frequency and larger cache. This is in complete contrast to AMD's older Piledriver chip, which draws 118 watts, 26.9% more than Ryzen (albeit on a much older 32 nm process).
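The percentages above are all relative to Ryzen's 93 W, the same baseline used for the FX-8370 comparison. A minimal Python sketch (the helper name is hypothetical):

```python
RYZEN_WATTS = 93  # Ryzen sample at full load, the common baseline


def extra_power_vs_ryzen(watts: float, baseline: float = RYZEN_WATTS) -> float:
    """Additional power draw relative to the Ryzen sample, as a fraction."""
    return watts / baseline - 1


for chip, watts in {"Core i7-6900K": 96, "FX-8370": 118}.items():
    print(f"{chip}: {watts} W, {extra_power_vs_ryzen(watts):+.1%} vs. Ryzen")
```

Note that this compares raw power draw only; performance-per-watt also depends on how much work each chip completes under that load.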

What I find particularly interesting is that a Ryzen chip clocked to FX-8370 frequencies can consume almost the same amount of power, while its performance-per-watt climbs by 133.8%.

The Percentage Error of Canard PC's Article

There is an error within the article that I wanted to point out.

  • It mentions a 35% performance gain over the FX-8370 in the computational workloads and then states that this is in line with what AMD has been touting, but that is incorrect. AMD has consistently quoted a 40% IPC increase over Excavator, and that figure specifically compares a Zen core to an Excavator core (not a module, as some have incorrectly stated).
  • Speaking of that, I would actually like to know where that 35% comes from. I haven't been able to reproduce it.

On a side note regarding the gaming software suite, I would have liked to see some more recent titles in the mix, specifically titles that can take advantage of enthusiast processors. Two examples that come to mind are Grand Theft Auto V and Battlefield 1, especially since the former is present in the software suite used to gather Kaby Lake performance metrics.

My Final Conclusions of AMD Ryzen — Performance

Given the severe handicap placed on the Ryzen chip featured here, I feel that it performs very well. Performance-per-clock is dangerously close to Broadwell and Skylake, and performance-per-watt is also in line with Intel's offerings. Where Zen will fall flat is in software that can take advantage of the AVX2 instruction set extensions. This is a hardware limitation: each Zen core's floating-point/SIMD units are only 128 bits wide, so 256-bit AVX2 instructions have to be split into two 128-bit operations. This works, but it isn't as efficient as the native 256-bit AVX2 execution Intel has had since Haswell. (For reference, when AVX2 instructions can be used, Haswell can be up to twice as fast as Ivy Bridge.)
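To make the 128-bit versus 256-bit distinction concrete, here is a deliberately simplified Python model of one 256-bit AVX2 addition on eight 32-bit floats. It is a toy of my own, not a description of either chip's real pipeline: it only counts how many operations each approach issues, and the function names are hypothetical.

```python
from typing import List, Tuple

LANES_256 = 8  # eight 32-bit floats fill a 256-bit register
LANES_128 = 4  # a 128-bit datapath handles four of them at a time


def add_native_256(a: List[float], b: List[float]) -> Tuple[List[float], int]:
    """Model a core with 256-bit execution units: one operation covers all eight lanes."""
    return [x + y for x, y in zip(a, b)], 1


def add_split_128(a: List[float], b: List[float]) -> Tuple[List[float], int]:
    """Model a core with 128-bit units: the 256-bit add is cracked into two 128-bit halves."""
    result: List[float] = []
    ops = 0
    for start in range(0, LANES_256, LANES_128):
        half = slice(start, start + LANES_128)
        result.extend(x + y for x, y in zip(a[half], b[half]))
        ops += 1
    return result, ops


a = [float(i) for i in range(LANES_256)]
b = [10.0] * LANES_256
native, native_ops = add_native_256(a, b)
split, split_ops = add_split_128(a, b)
print(native == split, native_ops, split_ops)  # True 1 2
```

Both paths produce identical results; the cost of the split path is twice as many operations per 256-bit instruction, which matters mostly in code dominated by 256-bit math. Real throughput also depends on how many execution pipes each core has, which this toy ignores.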

With that being said, Intel has somewhat shot itself in the foot with these instructions, as it has consistently declined to enable them on its lower-end parts, and adoption has therefore been very slow. (Product segmentation via instruction set extension support is something Intel is known for.) The list of software supporting AVX2 instructions is small even three years after their introduction, and the number of programs relying on them is smaller still (I guess Intel's product segmentation has its upsides). Video games, for one, generally don't use these instructions.

So, what kind of performance can you expect from Zen? It depends entirely on the workload, of course, but the general consensus is that it should land between the i7-6850K and i7-5960X in most scenarios, and between the i7-5960X and i7-6900K in best-case scenarios. Not too shabby at all.

My Final Conclusions of AMD Ryzen — Frequency Scaling

A new feature of Zen that I'm particularly looking forward to is XFR, but I will state that 5.00 GHz all-core overclocks on air cooling are out of the question. From what I've seen, Zen clocks really well, better than I originally anticipated, but it's still by no means a high-frequency design. As mentioned previously, reaching FX-8370 clock frequencies of around 4.10 to 4.30 GHz should be possible with a decent aftermarket watercooling setup.

Now, where does that leave quad-core SKUs in terms of stock core clocks and turbo clocks? It's an interesting question. We know from the New Horizon event that octa-core SKUs will be debuting at 3.40 GHz, but there was no mention of quad-core models.

Depending on how yields go, we could see quad-core variants feature identical clock frequencies to their octa-core siblings. If this is the case, Intel will definitely have the edge on AMD. It seems more likely, however, that, as with Intel's architectures, Zen quad-core SKUs can (and will) run at much higher frequencies, at the very least for single-threaded turbo. This is essential for AMD to climb the proverbial ladder and reach the quad-core Core i7 single-thread performance that video games love so much.

My Final Conclusions of AMD Ryzen — Bug Fixing

AMD has already publicly revealed, at the New Horizon live event, that stock clock frequencies for production Summit Ridge SKUs will start at 3.40 GHz. That represents a 7.9% increase over the sample used here to gather the results. Zen's memory controller will also support DDR4-3200 modules, giving the final product 33% more memory bandwidth than the sample here could take advantage of.
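Both percentages are simple ratios, and the same arithmetic gives the peak dual-channel bandwidth at DDR4-3200. A quick Python check, assuming the final parts keep the dual-channel configuration (the helper name is hypothetical):

```python
def pct_increase(new: float, old: float) -> float:
    """Relative increase of new over old, as a fraction."""
    return new / old - 1


clock_gain = pct_increase(3.40, 3.15)      # production base clock vs. the A0 sample
bandwidth_gain = pct_increase(3200, 2400)  # DDR4-3200 vs. DDR4-2400, same channel count

# Peak dual-channel bandwidth at DDR4-3200: MT/s x 8 bytes x 2 channels, in decimal GB/s
ddr4_3200_dual_gbs = 3200 * 8 * 2 / 1000

print(f"Base clock: {clock_gain:+.1%}")            # +7.9%
print(f"Memory bandwidth: {bandwidth_gain:+.1%}")  # +33.3%
print(f"DDR4-3200 dual-channel peak: {ddr4_3200_dual_gbs:.1f} GB/s")  # 51.2 GB/s
```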

Furthermore, the bugs present in stepping A0 will also be fixed by the time Zen officially arrives. Given that Intel's SMT implementation provides roughly a 20–30% increase in multi-threaded performance (depending on the task, of course), it's reasonable to expect AMD's implementation to deliver a similar improvement. It's fair to say that the sample used here isn't delivering the full benefit of the technology. As the introduction to the article states, the biggest performance increases will come from higher clock frequencies, which is also true of Intel's architectures.

My Final Conclusions of AMD Ryzen — Pricing

I couldn't write this article without a word on pricing. While I don't expect AMD to be selling octa-core Ryzen products at the prices we've become accustomed to over the past four years, I fully believe that all Zen-based products will undercut Intel pricing to some degree. Expect to pay:

  • $150–300 — quad-core Raven Ridge APUs (Ryzen RR3).
  • $200–350 — quad-core Summit Ridge CPUs (Ryzen SR3).
  • $400–500 — hexa-core Summit Ridge CPUs (Ryzen SR5).
  • $500–650 — octa-core Summit Ridge CPUs (Ryzen SR7).
  • $650–800 — octa-core Summit Ridge CPUs with the highest stock core frequencies, and also a watercooling kit (Ryzen SR9).

The AM4 platform can support processors of up to 150 W, so it will be very interesting to see how the fastest SKUs clock. As with the previous FX generation, all Summit Ridge processors (those lacking integrated graphics) are expected to feature unlocked multipliers, which would make a Black Edition moniker pretty much useless for separating the fastest SKUs from the rest of the product stack. It's for this reason that I anticipate an SR9 model range, an opportunity I feel Intel squandered on its enthusiast platforms (X58 [LGA1366], X79 [LGA2011] and X99 [LGA2011-3]). An Intel Core i9 would have rounded off the model range rather nicely, and you would have instantly known that it was the top-of-the-line model range.

What to Watch Out for at CES 2017

Like many others, I'm looking forward to seeing what AMD has in store for us at CES 2017. The event is held annually at the Las Vegas Convention Center in Las Vegas, Nevada. The 2017 event runs from Thursday, January 5 to Sunday, January 8, and it's where companies typically show off their upcoming products. AMD has historically not been very active at CES, having attended in only two previous years: 2012, with the first-generation Piledriver APUs (Trinity), and 2014, with the company's first-generation Steamroller APUs (Kaveri).

We've been seeing a lot of multi-threaded workload baselines for Zen's performance, but no real mention of any single-threaded workloads. The gaming benchmarks provided by Canard PC's review are the closest we've gotten to such a thing, and, being perfectly honest, the octa-core processors are entirely the wrong tier of processor to be showcasing the true single-thread capabilities of Zen, simply due to their lower clock frequencies.

What do I expect at CES 2017? There are several rumors of a Ryzen chip successfully being overclocked to 5.00 GHz on a single core, while the rest are inactive. That sounds great, but is this something we might see at the event? Hopefully, yes! If not at 5.00 GHz, it would be nice to see Zen overclocked to some degree. We should be nearing final silicon as we get closer to Zen's eventual retail release (likely March), and I'm sure AMD won't miss out on the chance to show us all how well the architecture overclocks, especially if the rumors hold any sort of validity.

Watch out for AMD at CES. I'll most likely write a follow-up article and provide my up-to-date thoughts on Ryzen.