Computer Connections

pointertovoid

Using the technologies already described, a Pci-e number-cruncher card can make money until supercomputers reuse its development effort. Here's a competitor to Nvidia's Hopper (30TFlops on 64b MAdd) and Amd's Instinct MI210 (22.3TFlops as a Pci-e card), much easier to program efficiently.

64b 1.6GHz scalar cores with FP64 MAdd compute everything. No Sse, Avx nor HT.

  • Mmx-style parallelism is welcome for vector operations: 60TFlops FP32, 120TFlops FP16, plus Int32, Int16, Int8 (see the throughput sketch after this list).
  • A core uses the same power and area to multiply 2*2 matrices of 32b numbers for 120TOps, 4*4 of 16b for 480TOps, 8*8 of 8b, and to accumulate the columns, including as complex numbers.
  • That's enough complexity for the instruction set, the sequencer, the compiler and the programmer! Consider also faster 32b cores taking a few cycles per 64b operation.
  • Arm, i64, a64, Mips, Risc, Sparc... make little difference.
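
A minimal Python sketch of the throughput arithmetic, assuming 9216 cores at 1.6GHz (the figures given further down) and 1 FP64 MAdd per core per cycle; the packing factors for FP32 and FP16 are my reading of the Mmx-style parallelism above:

    # Consistency check of the quoted peak rates (my arithmetic, not from the post).
    cores, clock_hz = 9216, 1.6e9
    madds = cores * clock_hz                      # 1 FP64 MAdd per core per cycle
    print(madds * 1 * 2 / 1e12)                   # FP64: ~29.5 TFlops ("30TFlops")
    print(madds * 2 * 2 / 1e12)                   # FP32, 2 MAdds packed in 64b: ~59 TFlops ("60")
    print(madds * 4 * 2 / 1e12)                   # FP16, 4 MAdds packed in 64b: ~118 TFlops ("120")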

32 such cores, on as many chips, each access 1/32 of a 2GB Dram chip in one compute package, as already described, so every core gets 64MB. The MI210 and the Hopper offer only 8MB/core at identical Flops. Scaled to the Flops of a 3GHz Avx256 quad-core, 64MB per core amounts to 2GB, but 8MB per core makes only 1/4 GB behind one Pci-e connector.
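
How I read that scaling, as a sketch; the 96GFlops peak for the 3GHz Avx256 quad-core is my own assumption (4 cores * 4 FP64 lanes * 2 Flops * 3GHz):

    # Memory-per-Flops comparison (my arithmetic, quad-core peak assumed as above).
    core_gflops = 1.6 * 2                   # one 1.6GHz core, 1 FP64 MAdd/cycle
    quad_gflops = 4 * 4 * 2 * 3.0           # assumed 3GHz Avx256 quad-core peak
    scale = quad_gflops / core_gflops       # ~30 such cores match the quad-core
    print(scale * 64 / 1024)                # 64MB/core scales to ~1.9GB ("2GB")
    print(scale * 8 / 1024)                 # 8MB/core scales to ~0.23GB ("1/4 GB")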

The Dram provides a throughput of 3 reads + 1 write per MAdd for easy programming, plus the already described data shuffling for Fft and databases. The Hopper's Dram reads or writes 0.1 FP64 per MAdd and the MI210's 0.05. The cores have registers and L1 but no L2, L3... thanks to snappy Dram. 64 cores at 0.8GHz would cut the latency to 1/4.
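
In bytes, that figure implies, as a sketch (my arithmetic, 8B per FP64 word; the wide Dram interface that makes it feasible was argued in the earlier posts):

    # Dram bandwidth implied by 3 reads + 1 write per MAdd (my arithmetic).
    clock_hz, word_bytes = 1.6e9, 8
    per_core = clock_hz * (3 + 1) * word_bytes / 1e9   # ~51 GB/s per core
    print(per_core, per_core * 32 / 1000)              # ~51 GB/s, ~1.6 TB/s per 2GB Dram chip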

The cores communicate over full switch matrices in a compute package. The Dram can integrate several >32*32 matrices, or the package can carry switch chips identical to the ones packaged for the boards. For instance, 8 parallel matrices at 16GT/s transfer one 64b datum every core cycle between every core pair simultaneously. Alas, the boards carry fewer lanes, so the compute packages communicate with the outer world over their internal matrices for flexible bandwidth allocation, as in a fat-tree network.
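
A quick link-budget check for the in-package matrices (my arithmetic, ignoring line-coding and protocol overhead):

    # 8 parallel matrices, one 16GT/s lane each between a core pair (my reading).
    lanes, gt_per_s, clock_hz = 8, 16e9, 1.6e9
    bits_per_cycle = lanes * gt_per_s / clock_hz   # 80 raw bits per 1.6GHz core cycle
    print(bits_per_cycle)                          # > 64, so one 64b word per cycle fits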

The matrices, possibly Asics, communicate over Pci-e so they can also serve between recycled Xeons. That's 2GB/s per lane with Pci-e 4.0, or better with Pci-e 5.0 if possible. A >48*48 full switch matrix takes little silicon in a Bga400+ and can use smaller packages where fewer lanes suffice. Each matrix connects all compute packages on one module; two matrices route x6 parallel lanes, which carries only 1 word per compute package per cycle, or if spread evenly, 1 word per core in 34 cycles.
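
The module-level figures as I compute them, a sketch assuming 2GB/s per Pci-e 4.0 lane and direction, and 16 compute packages per module:

    # Bandwidth per compute package on one module (my arithmetic).
    lane_gbs, lanes_per_package = 2.0, 6        # x6 lanes through the two matrices
    word_gbs = 1.6 * 8                          # 1 word per 1.6GHz cycle = 12.8 GB/s
    cycles_per_word = word_gbs / (lane_gbs * lanes_per_package)
    print(cycles_per_word)                      # ~1.07 cycles per word per package
    print(cycles_per_word * 32)                 # ~34 cycles per word per core, as quoted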

The address is sent first, and as soon as they know it, the matrices forward the data. Error detection happens later. The smallest message, about 64b, is barely bigger than an address. The number of lanes used, sometimes over indirect paths, depends on the message size. Add operation modes to the Pci-e standard.
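
Purely illustrative: one possible wire layout for such a cut-through message; the field names and widths (32b address, 64b payload, 16b trailing CRC) are my guesses, not from the post:

    # Hypothetical minimal message for cut-through routing: address first so the
    # matrices can forward immediately, CRC last so errors are detected afterwards.
    import struct

    def pack_message(dest_addr, payload64, crc16):
        return struct.pack(">IQH", dest_addr, payload64, crc16)

    msg = pack_message(0x00012345, 0xDEADBEEFCAFEBABE, 0x0000)
    print(len(msg), "bytes")   # 14 bytes with the assumed field widths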

The connector carries x32 lanes; So-dimm connectors are denser, and at least Amphenol sells vertical ones. 64GB/s provide 1 word per compute package every 3.2 cycles, or 1 word per core every 100 cycles on average. Each compute package passes x2 lanes through the connector: the matrices first spread big messages among the compute packages.
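
Again as a sketch (my arithmetic, with 16 compute packages per module so that 18 * 16 * 32 = 9216 cores):

    # Connector bandwidth per package and per core (my arithmetic).
    connector_gbs, packages = 32 * 2.0, 16        # x32 lanes at 2GB/s, 16 packages/module
    per_package_gbs = connector_gbs / packages    # 4 GB/s per compute package
    word_gbs = 1.6 * 8                            # 12.8 GB/s for 1 word per cycle
    print(word_gbs / per_package_gbs)             # 3.2 cycles per word per package
    print(word_gbs / per_package_gbs * 32)        # ~102 cycles per word per core ("mean 100")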

[Image: PcieCruncherModule.png]

18 modules fit on a double-slot Pci-e card to pack 9216 cores, 30TFlops and 0.56TB. The centrifugal blower easily cools 2W per compute package, but 576W is a lot for one card. At the card's center, the blower would be quieter and ease routing, but it would inject 300W of heat into the tower.
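
The card totals check out like this (my arithmetic, again with 16 compute packages per module):

    # Card totals (my arithmetic from the figures above).
    modules, packages_per_module, cores_per_package = 18, 16, 32
    packages = modules * packages_per_module       # 288 compute packages
    cores = packages * cores_per_package           # 9216 cores
    print(cores * 1.6 * 2 / 1000)                  # ~29.5 TFlops FP64
    print(packages * 2 / 1024)                     # ~0.56 TB of Dram at 2GB per package
    print(packages * 2)                            # 576 W at 2W per package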

[Image: PcieCruncherFront.png]

Each matrix connects the 18 modules; 16 matrices route x32 parallel lanes.

[Image: PcieCruncherBack.png]

Procuring standard Ddr5 2GB chips would cost 1.5kUSD, going by module prices on eBay, while the competing cards sell for 10kUSD. Stacked Dram doubles the capacity but quadruples the cost, and 1GB Dram costs half as much: that makes a product line. The tiny core chips are cheap and easily made. The matrices should add little and may well exist already.

Marc Schaefer, aka Enthalpy aka Pointertovoid
