AI Hardware
The chips, data centres, and infrastructure powering the AI revolution.
AMD Instinct MI300X
The MI300X is AMD's challenge to NVIDIA's data centre GPU dominance. Released in December 2023, it carries 192GB of HBM3 memory — 2.4× the H100's 80GB — on a single package, enabling large model inference without splitting a model across multiple GPUs. Its memory bandwidth of 5.3 TB/s exceeds the H100's by 58%. On inference workloads where memory capacity is the bottleneck, the MI300X is competitive with or better than the H100. The limiting factor is software: NVIDIA's CUDA ecosystem has a decade-long head start, and AMD's ROCm platform, while improving, requires more engineering effort to achieve comparable performance. Microsoft deployed MI300X accelerators in Azure in 2024. AICI tracks the MI300X as the most credible hardware threat to NVIDIA's near-monopoly in AI chips.
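As a rough illustration of why capacity matters for single-accelerator inference, the sketch below estimates how many accelerators are needed just to hold a model's weights; the 70B example, FP16 precision, and the helper names are illustrative assumptions, not vendor figures.

```python
import math

def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    # Approximate weight footprint in GB (treating 1 GB as 1e9 bytes).
    return params_billion * bytes_per_param

def gpus_needed(params_billion: float, bytes_per_param: float, hbm_gb: float) -> int:
    # Minimum accelerator count to hold the weights alone (no KV cache or activations).
    return math.ceil(weights_gb(params_billion, bytes_per_param) / hbm_gb)

# A 70B-parameter model in FP16 (2 bytes per parameter) needs ~140GB of weights:
print(gpus_needed(70, 2.0, 192))  # MI300X, 192GB HBM3 -> 1 accelerator
print(gpus_needed(70, 2.0, 80))   # H100, 80GB HBM3    -> 2 accelerators
```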
AWS Trainium2
Amazon's Trainium2 is a custom AI training chip designed for AWS, announced in November 2023. Amazon claims it delivers up to 4× the performance and 2× the energy efficiency of its first-generation Trainium chip. Trainium2 is available in EC2 Trn2 instances and in UltraServer configurations that pool up to 64 chips. Like Google's TPUs, Trainium is accessible only as a cloud service and is not sold as standalone hardware. Amazon's motivation mirrors Google's: reducing dependence on NVIDIA for the training workloads that underpin AWS AI services. The Neuron SDK provides the software layer. Trainium2 is optimised for the largest model training runs and has been used internally to train Amazon Titan models.
Apple M4
The Apple M4, released in May 2024, is the fourth generation of Apple's M-series chip built around a unified memory architecture, fabricated on TSMC's 3nm N3E process. Its 16-core Neural Engine delivers 38 TOPS (trillion operations per second). What distinguishes the M4 for AI workloads is its memory architecture: the CPU, GPU, and Neural Engine all access the same physical memory pool with no data copying. In the 14-inch MacBook Pro configuration with 32GB unified memory, the M4 can run heavily quantised 70B-parameter language models locally — something that typically requires a discrete GPU workstation on other platforms. AICI regards local inference capability as significant for privacy-sensitive AI use cases: models running on-device process no data through cloud infrastructure.
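A minimal footprint sketch, assuming the 70B model size and 32GB configuration cited above and standard weight-only quantisation widths; real headroom is smaller once the operating system, KV cache, and other applications are accounted for.

```python
# Weight footprint of a 70B-parameter model at common quantisation widths,
# compared against a 32GB unified-memory budget. Illustrative arithmetic only.
PARAMS_BILLION = 70
UNIFIED_MEMORY_GB = 32

for label, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4), ("3-bit", 3)]:
    gb = PARAMS_BILLION * bits / 8            # bytes per parameter = bits / 8
    verdict = "fits" if gb < UNIFIED_MEMORY_GB else "does not fit"
    print(f"{label}: ~{gb:.0f}GB of weights, {verdict} in {UNIFIED_MEMORY_GB}GB")
```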
Google TPU v5e
Google's Tensor Processing Units are application-specific integrated circuits (ASICs) designed from the ground up for matrix multiplication workloads in neural networks. The TPU v5e, announced in August 2023, is Google's efficiency-focused variant — designed for large-scale training and inference at lower cost per operation than its high-performance sibling, the v5p. TPUs do not compete on raw peak FLOPS with NVIDIA GPUs; they compete on total cost of ownership for specific workloads. Google trains Gemini on TPUs. The existence of a competitive internal accelerator is why Google is less dependent on NVIDIA than Microsoft or Meta — a structural advantage in the AI infrastructure arms race. TPUs are available via Google Cloud but not sold as hardware.
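As a back-of-envelope indication of why matrix multiplication dominates these workloads, a dense transformer's forward pass costs roughly two FLOPs per parameter per generated token (one multiply and one add per weight), and nearly all of that work is matmul; the sketch below applies that rule of thumb to an illustrative 70B-parameter model.

```python
# Rule-of-thumb arithmetic: ~2 FLOPs per parameter per generated token,
# almost entirely spent in matrix multiplications.
params = 70e9                       # illustrative 70B-parameter model
flops_per_token = 2 * params        # ~1.4e11 FLOPs per token
print(f"~{flops_per_token:.1e} FLOPs per token, dominated by matmuls")
```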
Intel Gaudi 3
Intel's Gaudi 3 accelerator, announced in April 2024, is Intel's most competitive AI chip to date. Intel claims 1.84× the AI compute of the H100 SXM5 on specific workloads, and the chip carries 128GB of HBM2e memory with 3.7 TB/s of bandwidth. Gaudi 3 is notable for its built-in 24-port 200Gbps Ethernet fabric — it handles inter-chip communication via standard networking rather than proprietary interconnects, which simplifies cluster construction. The competitive question is not peak specs but real-world performance on production workloads, where CUDA optimisation continues to give NVIDIA a material advantage. Intel has struggled to convert technical specifications into market share in AI accelerators.
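A quick aggregate-bandwidth calculation from the port count and speed cited above; it is illustrative only and ignores protocol overhead and real-world utilisation.

```python
# Aggregate scale-out bandwidth of the on-chip Ethernet fabric.
ports = 24
gbps_per_port = 200
total_gbps = ports * gbps_per_port      # 4,800 Gb/s per accelerator
total_gb_per_s = total_gbps / 8         # ~600 GB/s, before protocol overhead
print(total_gbps, total_gb_per_s)
```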
NVIDIA B200 SXM
The B200 is NVIDIA's Blackwell architecture GPU, announced in March 2024. It represents a generational leap: 20 petaFLOPS of FP4 tensor compute (a new precision format designed for inference), 192GB HBM3e memory, and 8 TB/s memory bandwidth. The Blackwell chip is fabricated at TSMC on the 4NP process and contains 208 billion transistors — the largest chip NVIDIA has built. Each B200 package joins two reticle-limited dies with a 10 TB/s chip-to-chip link; the GB200 "super chip" pairs two B200 GPUs with a Grace CPU over NVLink-C2C. NVIDIA's GB200 NVL72 rack — 72 B200 GPUs connected via NVLink — is designed to operate as a single large inference engine, capable of serving a 1.8 trillion parameter model. Demand for B200s drove NVIDIA's market capitalisation above $3 trillion in 2024.
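A rough sketch of why the NVL72 rack can hold a model of that size entirely in GPU memory, assuming the 192GB-per-GPU figure above and 4-bit (FP4) weights; it ignores KV cache, activations, and NVIDIA's actual partitioning.

```python
# Aggregate HBM of an NVL72 rack versus the weight footprint of a
# 1.8 trillion parameter model stored in FP4. Illustrative arithmetic only.
gpus = 72
hbm_per_gpu_gb = 192
rack_hbm_tb = gpus * hbm_per_gpu_gb / 1000        # ~13.8 TB of HBM3e

params_trillion = 1.8
bytes_per_param = 0.5                             # FP4 = 4 bits = 0.5 bytes
weights_tb = params_trillion * bytes_per_param    # ~0.9 TB of weights

print(f"rack HBM ~{rack_hbm_tb:.1f} TB, FP4 weights ~{weights_tb:.1f} TB")
```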
NVIDIA H100 SXM5
The H100 is the chip that defined the AI infrastructure buildout of 2023–2024. Based on the Hopper architecture (80 billion transistors, TSMC 4N process), the H100 SXM5 delivers 3,958 TFLOPS of sparse FP8 tensor compute (1,979 TFLOPS at FP16 with sparsity) and 3.35 TB/s of HBM3 memory bandwidth. Its Transformer Engine — hardware and software that dynamically switches between FP8 and FP16 precision for transformer layers — made it the GPU of choice for training and serving large language models. A single H100 SXM5 costs approximately $30,000–$40,000. The H100 became a geopolitically significant object: the US government restricted its export to China in October 2022 and tightened restrictions in 2023, making NVIDIA GPU access a proxy for national AI capability. Data centres acquiring H100s in 2023 and 2024 spent billions of dollars — Microsoft, Google, Meta, and Amazon each deployed tens of thousands of units.
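The fleet economics behind the "billions of dollars" figure follow directly from the unit price range above; the 50,000-unit deployment size in the sketch below is an illustrative assumption, not a reported figure.

```python
# Illustrative fleet cost: tens of thousands of H100s at $30,000-$40,000 each.
units = 50_000
price_low, price_high = 30_000, 40_000
print(f"${units * price_low / 1e9:.1f}B to ${units * price_high / 1e9:.1f}B")
```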
NVIDIA H200 SXM
The H200 is NVIDIA's incremental upgrade to the H100, announced in November 2023. The compute specifications are identical — it uses the same Hopper GPU die — but the memory system is substantially upgraded: 141GB of HBM3e versus the H100's 80GB, with memory bandwidth increasing from 3.35 TB/s to 4.8 TB/s. For large model inference, where memory capacity and bandwidth are often the bottleneck, this is a meaningful improvement. The H200 can serve larger models or larger batch sizes than the H100 without model parallelism. It became the primary GPU for hyperscale inference workloads in 2024.
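A simple bandwidth-bound estimate of single-stream decode throughput, assuming an illustrative 70B model with 1-byte (FP8) weights and that every generated token reads all weights once; batching, KV-cache traffic, and kernel efficiency change the real numbers.

```python
# Upper-bound decode throughput when memory bandwidth is the limit:
# tokens/s <= bandwidth / bytes of weights read per token.
def decode_tokens_per_s(params_billion: float, bytes_per_param: float,
                        bandwidth_tb_s: float) -> float:
    weight_gb = params_billion * bytes_per_param       # weights read per token, in GB
    return bandwidth_tb_s * 1000 / weight_gb           # TB/s -> GB/s

# Illustrative 70B model at FP8 (1 byte per parameter):
print(f"H100 (3.35 TB/s): ~{decode_tokens_per_s(70, 1.0, 3.35):.0f} tokens/s")
print(f"H200 (4.80 TB/s): ~{decode_tokens_per_s(70, 1.0, 4.80):.0f} tokens/s")
```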