The CS-3 system is built to scale, within a single node domain, to models with 24 trillion parameters. That is, Cerebras claims you can run the same code, with no hand-written distributed training code, all the way up to 24-trillion-parameter models.