7nm Radeon VII graphics card FP64

5/31/2023

That said, they did mention potentially adding more INT hardware, either to make it more flexible or faster, not sure.

As to double-wide SIMD: I think they'd have to liquid cool the 2250MHz version, though.

A 2250MHz version would have 80 TF of FP16, and at least 160 INT8 and 320 INT4. this running at least and potentially since 7nm is very mature at this point.

I mean, seriously, people: AMD is not launching a card with a mere 13.5 TF and calling it a compute killer card, with twice the silicon and lower TF/s than MI60. That'd be nuts.

Unfortunately, while it'd be great for scientific workloads, getting native FP64 fully utilized would take a bit of work in normal operations (not all instructions are identical, so there could be just one FP32 instruction taking up a full FP64 SIMD).

FP32 output would be 26.972 TFLOPs using 2xFP32 (identical instructions).

750MHz is probably the base clock. It'd be more like a CPU in that way.

And there might be a reason for such low clocks: what if CDNA moves to a double-wide FP64 SIMD? That'd make it 13.48 TFLOPs of FP64! That's much more impressive, and can enable 2xFP32 and 4xFP16, along with dot-product 8xINT8 and 16xINT4 for inferencing.

Or each of the 8 128-bit HBM2 channels (1024-bit total) per shader engine (2 arrays each) can provide the necessary bandwidth to SoC. Arcturus doesn't have pixel engines, so instead of routing through ROPs, L2 is simply connected to each module's channels directly (in slices, as usual). Rather than L2 providing high bandwidth to ROPs for pixel output, it's backing compute ops directly to memory.
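To make the packed-math speculation easier to follow, here is a minimal Python sketch of how the 2xFP32, 4xFP16, 8xINT8 and 16xINT4 multipliers would scale a native FP64 rate. The `PACKING` table and `peak_rates` helper are hypothetical names for illustration, and the 13.486 input is simply the quoted 26.972 TFLOPs FP32 figure divided by two. These are ideal peaks that assume every lane is filled with identical instructions, which is exactly the caveat raised above.

```python
# Hypothetical packing factors over native FP64 on the speculated
# double-wide SIMD, taken from the multipliers quoted in the post.
PACKING = {"FP64": 1, "FP32": 2, "FP16": 4, "INT8": 8, "INT4": 16}


def peak_rates(fp64_tflops):
    """Scale a native FP64 rate by each packing factor (ideal peak only)."""
    return {fmt: fp64_tflops * factor for fmt, factor in PACKING.items()}


# Assumed input: the post's 26.972 TFLOPs FP32 figure divided by 2.
rates = peak_rates(13.486)

if __name__ == "__main__":
    for fmt, value in rates.items():
        print(f"{fmt}: {value:.3f} T(FL)OPs peak")
```

A real workload would land below these numbers whenever a wide lane carries a single narrower operation.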
Generally, SoC bandwidth should equal or exceed total memory bandwidth, unless SoC is simply providing Infinity Architecture links (up to 8 at 25GB/s*2 per link) in this GPU and memory has direct linkage to arrays via the core interconnect and L2 (Vega's HBM2 PHYs are usually linked to GPU through SoC, but this can change).

1200MHz HBM2 is 1.228TB/s total using a 4096-bit, 4-module HBM2 setup.
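The bandwidth figures above can be checked with a quick back-of-the-envelope calculation. This is a rough Python sketch: the helper name `hbm2_bandwidth_gbs` is made up for illustration, and it assumes HBM2's double data rate (two transfers per clock) and decimal GB.

```python
def hbm2_bandwidth_gbs(clock_mhz, bus_width_bits, transfers_per_clock=2):
    """Peak bandwidth in GB/s; HBM2 is double data rate (2 transfers/clock)."""
    bits_per_second = clock_mhz * 1e6 * transfers_per_clock * bus_width_bits
    return bits_per_second / 8 / 1e9


# 4 modules x 1024 bits each = 4096-bit bus at 1200MHz, as in the post.
total_gbs = hbm2_bandwidth_gbs(1200, 4096)
per_module_gbs = hbm2_bandwidth_gbs(1200, 1024)

# Up to 8 Infinity Architecture links at 25GB/s in each direction.
infinity_links_gbs = 8 * 25 * 2

if __name__ == "__main__":
    print(f"HBM2 total:      {total_gbs:.1f} GB/s")
    print(f"HBM2 per module: {per_module_gbs:.1f} GB/s")
    print(f"Infinity links:  {infinity_links_gbs} GB/s aggregate")
```

With these inputs the memory side works out to roughly 1.23TB/s against 400GB/s of aggregate link bandwidth, which fits the idea that memory would need a direct path to the arrays if SoC only carried the links.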