Our first question to Clark was about Zen's operation cache, Ryzen's micro-op cache. The uOp cache was a major discussion piece during the pre-launch press briefing, so getting more detail on its role in Zen was the first objective:

Michael Clark: “One of the hardest problems of trying to build a high-frequency x86 processor is that the instructions are a variable length. That means to try to get a lot of them to dispatch in a wide form, it’s a serial process. To do that, generally we’ve had to build deep pipelines, very power hungry to do that. We actually call it an op-cache because it stores in a more dense format than the past. What it does is, having seen [the instructions] once, we store them in this op-cache with those boundaries removed. When you find the first one, you find all its neighbors with it. We can actually put them in that cache 8 at a time so we can pull 8 out per cycle, and we can actually cut two stages off that pipeline of trying to figure out the instructions. It gives us that double-whammy of a power savings and a huge performance uplift.”

Sam Naffziger: “x86 decode, the variable length instructions, are very complex - requires a ton of logic. I mean, guys make their career doing this sort of thing. So you pump all these x86 instructions in there, burns a lot of power to decode them all, and in our prior designs every time you encounter that code loop you have to go do it again. You have this expensive logic block chunking away. Now we just stuff those micro-ops into the op-cache, all the decoding done, and the hit-rate there is really high, so that means we’re only doing that heavy-weight decode 10% of the time. The other thing we did is the write-back L1 cache. We aren’t consistently pushing the data through to the L2. There are some simplifications if you do that, but we added the complexity of a write-back so now we keep stuff way more local. We’re not moving data around, because that wastes power.

“One of the things that I highlighted earlier today is the effort the team put in to squeeze down the overhead power. In a CPU core, these things running over 4GHz, very hard to get the clocks out to all those billions of transistors with picosecond accuracy. Takes a lot of wires, a lot of big drivers to do that. We invest a ton of engineering to optimize that down and cut 40% out of that clock network. We’d worked really hard cutting power out in prior generations, but we got 40% more this time. We also optimized the sequential elements that move the data in between the logic. They’re kind of like the glue that holds the logic together. We optimized the crap out of those things, made them really small and power efficient, and the net is that when you look at the power breakdown for the core - most processors, you have clock power, sequential power, the little bit that’s the logic gates doing actual work.”

Naffziger went on to tie power consumption to efficiency and the execution of operations:

Naffziger: “What we did on this core is we grew that logic gate percentage by 35%, so now it’s bigger than the other two overhead pieces. Those are a couple of the things: Efficient microarchitecture, allocating more power to useful work, and a bunch of other things that got all that IPC enhancement. We talked 52%+ IPC. A rule of thumb with experienced processor architects is that you pretty much pay 1% power for 1% IPC. You push your designers, you’re gonna grow power as you push more instructions through the pipe. You’re doing more work, you’re switching more gates, eating more instructions, running that decoder - burns power. What we did here, we burn no additional power for all that increased IPC.”

The rest of the video explains L1 write-through versus write-back cache (if curious about that distinction) and AMD’s shadow tags. In the above video you can see the Chief Architect for Zen talk about his work on the fifth installment of AMD CPUs. While cleaning our rendering machine storage, we discovered the video never made it to publication, so we polished off the intro and re-rendered it. This predates our camera upgrade, so you’ll have to bear with the 1080p and choppier shots, but the content is excellent. One peculiar thing is that there’s no mention of Zen 4 anywhere.
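As a rough illustration of the op-cache idea discussed in this interview, the toy model below decodes variable-length instructions serially, stores the decoded results in groups of up to 8 keyed by their start address, and skips the decoder on every later pass through the same loop. Everything here is an assumption made for illustration: the one-byte length prefix is a made-up encoding (not x86), and `decode_block`, `OpCache`, and `run_loop` are hypothetical names, not anything from AMD's design.

```python
from dataclasses import dataclass, field


def decode_block(code: bytes, start: int, max_ops: int = 8):
    """Serially decode up to max_ops instructions beginning at `start`.

    Toy encoding (an assumption, not x86): the first byte of each
    instruction is its total length, so the next boundary is only known
    after the current instruction has been examined.
    """
    ops, pc = [], start
    while pc < len(code) and len(ops) < max_ops:
        length = code[pc]
        if length == 0:
            raise ValueError("malformed instruction: zero length")
        # (address, payload) stands in for a decoded micro-op.
        ops.append((pc, code[pc + 1 : pc + length]))
        pc += length
    return ops, pc


@dataclass
class OpCache:
    """Maps a start address to an already-decoded group of micro-ops."""
    entries: dict = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def fetch(self, code: bytes, pc: int):
        if pc in self.entries:
            self.hits += 1  # boundaries already known, decoder is skipped
            return self.entries[pc]
        self.misses += 1  # first sighting: pay for the serial decode once
        self.entries[pc] = decode_block(code, pc)
        return self.entries[pc]


def run_loop(code: bytes, iterations: int) -> OpCache:
    """Execute the same code region repeatedly, as a hot loop would."""
    cache = OpCache()
    for _ in range(iterations):
        pc = 0
        while pc < len(code):
            _ops, pc = cache.fetch(code, pc)
    return cache


# 20 instructions with lengths cycling 1, 2, 3, 4 (50 bytes in total).
lengths = [1 + (i % 4) for i in range(20)]
code = b"".join(bytes([n]) + bytes(n - 1) for n in lengths)

cache = run_loop(code, iterations=10)
print(cache.hits, cache.misses)  # prints "27 3"
```

Only the first pass through the loop reaches the decoder; the nine later passes are served entirely from the cache. The 10% miss rate in this run is an artifact of choosing 10 iterations and should not be read as a model of the hit rate Naffziger quotes.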