The GPU is the chip on the card, not the video card itself. Each video card is a bus-master device, so its GPU can access main memory directly while running its own code, just as the CPU does. Each display driver gets a slice of CPU time to refill its GPU's instruction and data pipelines, so the GPU can keep running concurrently with the other GPUs and the CPU.
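To make the pipeline-refill idea concrete, here is a toy model in Python. It is an analogy only: the names (`driver`, `gpu`, `run_frame`, `COMMAND_BUFFER_SIZE`) are hypothetical, and this is not how a real display driver or GPU API works. A bounded queue plays the role of the GPU's command buffer. The CPU-side "driver" refills it, while a separate thread standing in for the GPU drains it concurrently.

```python
import queue
import threading

# Toy model (hypothetical, not a real driver API): a bounded queue stands in
# for the GPU's command buffer. The "driver" runs on the CPU and refills it;
# the "gpu" thread drains it independently.
COMMAND_BUFFER_SIZE = 4  # hypothetical buffer capacity
STOP = None  # sentinel telling the GPU thread to finish


def driver(buf: queue.Queue, commands: list) -> None:
    """CPU side: push commands; blocks only while the buffer is full."""
    for cmd in commands:
        buf.put(cmd)
    buf.put(STOP)


def gpu(buf: queue.Queue, executed: list) -> None:
    """GPU side: execute commands as they arrive, in FIFO order."""
    while True:
        cmd = buf.get()
        if cmd is STOP:
            break
        executed.append(f"done:{cmd}")


def run_frame(commands: list) -> list:
    buf = queue.Queue(maxsize=COMMAND_BUFFER_SIZE)
    executed = []
    gpu_thread = threading.Thread(target=gpu, args=(buf, executed))
    gpu_thread.start()
    # In a real system the CPU would interleave other work with these refills.
    driver(buf, commands)
    gpu_thread.join()
    return executed


if __name__ == "__main__":
    print(run_frame(["clear", "draw_tri", "draw_tri", "present"]))
```

The bounded buffer is the point of the sketch: the "GPU" only stalls when the driver has not refilled it in time, and the "CPU" only stalls when the buffer is already full.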
Later NT-based versions of Windows have the equivalent of a driver for each CPU, core, or hyperthread built into the OS. Windows 9x and earlier do not, so they can only use one CPU core.
ATI didn't fully overcome the single-core CPU bottleneck; the display drivers still run on it. But by building powerful multi-GPU, multi-core subsystems with plenty of private, higher-performance RAM, they were able to offload most of the complex graphics rendering from the single, shared CPU.