AI Hardware

The chips, datacenters, and infrastructure powering the AI revolution.

Chip AMD

AMD Instinct MI300X

The MI300X is AMD's challenge to NVIDIA's data centre GPU dominance. Released in December 2023, it carries 192GB of HBM3 memory — 2.4× the H100's 80GB — on a single package, enabling large model inference without multi-GPU memory pooling. Its memory bandwidth of 5.3 TB/s exceeds the H100's by 58%. On inference workloads where memory capacity is the bottleneck, the MI300X is competitive with or better than the H100. The limiting factor is software: NVIDIA's CUDA ecosystem has a decade-long head start, and AMD's ROCm platform, while improving, requires more engineering effort to achieve comparable performance. Microsoft deployed MI300X accelerators in Azure in 2024. AICI tracks the MI300X as the most credible hardware threat to NVIDIA's near-monopoly in AI accelerators.
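
To ground the capacity and bandwidth comparison, a back-of-envelope sizing sketch in Python (published HBM figures only; the throughput formula is a crude memory-bound approximation that ignores KV-cache traffic, batching, and kernel overheads):

# Rough single-GPU inference sizing: do a model's weights fit in HBM,
# and what decode throughput does memory bandwidth permit?
GPUS = {
    "MI300X": {"hbm_gb": 192, "bw_tbps": 5.3},
    "H100 SXM5": {"hbm_gb": 80, "bw_tbps": 3.35},
}

def fits_and_tokens_per_s(params_b, bytes_per_param, gpu):
    """Return (weights fit in HBM?, upper-bound decode tokens/s at batch 1)."""
    weight_gb = params_b * bytes_per_param          # e.g. 70B params * 2 bytes = 140 GB
    fits = weight_gb <= gpu["hbm_gb"]
    # Each generated token must stream all weights from HBM once.
    tok_per_s = (gpu["bw_tbps"] * 1000) / weight_gb if fits else 0.0
    return fits, tok_per_s

for name, gpu in GPUS.items():
    fits, tps = fits_and_tokens_per_s(70, 2.0, gpu)  # 70B model held in FP16
    if fits:
        print(f"{name}: 70B FP16 fits, ~{tps:.0f} tokens/s upper bound")
    else:
        print(f"{name}: 70B FP16 does not fit in HBM on a single GPU")

On this estimate a 70B-parameter model held in FP16 streams entirely from a single MI300X, while on an H100 it must be split across GPUs or quantised.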

Chip Amazon Web Services

AWS Trainium2

Amazon's Trainium2, announced in November 2023, is a custom AI training chip designed for AWS. Amazon claims it delivers up to 4× the performance and 2× the energy efficiency of the first-generation Trainium chip. Trainium2 is available in EC2 Trn2 instances and in UltraServer configurations that pool up to 64 chips. Like Google's TPUs, Trainium is accessible only as a cloud service and is not sold as standalone hardware. Amazon's motivation mirrors Google's: reducing dependence on NVIDIA for the training workloads that underpin AWS AI services. The Neuron SDK provides the software layer. Trainium2 is optimised for the largest model training runs and has been used internally to train Amazon's Titan models.

Chip Apple

Apple M4

The Apple M4, released in May 2024, is the fourth generation of Apple's M-series unified-memory chip, built on TSMC's 3nm N3E process. Its 16-core Neural Engine delivers 38 TOPS (trillion operations per second). What distinguishes the M4 for AI workloads is its memory architecture: the CPU, GPU, and Neural Engine all access the same physical memory pool with no data copying. In the 14-inch MacBook Pro configuration with 32GB unified memory, the M4 can run heavily quantised 70B-parameter language models locally — something that typically requires a discrete GPU workstation on other platforms. AICI regards local inference capability as significant for privacy-sensitive AI use cases: models running on-device process no data through cloud infrastructure.

LocalBuild Community / Various

Consumer LLM Workstation

The emergence of capable open-weight models (Llama 3.3 70B, Mistral Large, Phi-4) has made local AI inference practical on consumer hardware for the first time. A workstation with a high-VRAM discrete GPU — the NVIDIA RTX 4090 (24GB VRAM) or RTX 3090 (24GB VRAM) — can run quantised 13–70B parameter models at useful speeds using tools such as llama.cpp, Ollama, or LM Studio. The Apple M-series unified memory architecture enables even larger models: an Apple Mac Studio with 192GB unified memory can run heavily quantised 405B-parameter models locally. AICI regards local inference as a significant development for AI governance: it enables private, auditable AI use cases that do not expose data to cloud providers, reduces inference costs for organisations with existing hardware, and expands AI capability to jurisdictions or contexts where data sovereignty makes cloud AI impractical.
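
As an illustration of how light the tooling is, a minimal Python sketch that queries a locally running Ollama server through its default REST endpoint (the model tag and prompt are placeholders; the port and payload fields assume Ollama's documented defaults, and the model must already have been pulled):

import json
import urllib.request

# Minimal local-inference call against a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"

payload = {
    "model": "llama3.3:70b",   # placeholder: any locally pulled model tag
    "prompt": "Summarise the trade-offs of running LLMs on local hardware.",
    "stream": False,           # return a single JSON object rather than a stream
}

req = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.loads(resp.read().decode("utf-8"))

# The generated text is in the "response" field; no data leaves the machine.
print(body["response"])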

Chip Google

Google TPU v5e

Google's Tensor Processing Units are application-specific integrated circuits (ASICs) designed from the ground up for matrix multiplication workloads in neural networks. The TPU v5e, announced in August 2023, is Google's efficiency-focused variant — designed for large-scale training and inference at lower cost per operation than its high-performance sibling, the v5p. TPUs do not compete on raw peak FLOPS with NVIDIA GPUs; they compete on total cost of ownership for specific workloads. Google trains Gemini on TPUs. The existence of a competitive internal accelerator is why Google is less dependent on NVIDIA than Microsoft or Meta — a structural advantage in the AI infrastructure arms race. TPUs are available via Google Cloud but not sold as hardware.
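
Developers reach TPUs through frameworks rather than bare hardware. A minimal sketch of the matrix-multiplication workload they are built for, written in Python with JAX, which compiles the same code to TPU, GPU, or CPU depending on what is attached (shapes here are arbitrary):

import jax
import jax.numpy as jnp

# TPUs are matrix-multiplication ASICs; JAX compiles ordinary array code
# to them via XLA. The same script runs unchanged on CPU or GPU.
print("devices:", jax.devices())          # lists TPU cores when run on a TPU VM

@jax.jit                                  # JIT-compile through XLA for the backend
def dense_layer(x, w, b):
    # A single dense layer: the matmul is what the TPU's matrix units accelerate.
    return jax.nn.relu(x @ w + b)

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
x = jax.random.normal(k1, (128, 512))     # batch of 128 activation vectors
w = jax.random.normal(k2, (512, 1024))    # weight matrix
b = jax.random.normal(k3, (1024,))

y = dense_layer(x, w, b)
print("output shape:", y.shape)           # (128, 1024)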

Chip Intel

Intel Gaudi 3

Intel's Gaudi 3 accelerator, announced in April 2024, is the company's most competitive AI chip to date. Intel's launch materials claim 1.84× the AI compute and 1.5× the memory bandwidth of the H100 SXM5 on selected workloads, and the chip carries 128GB of HBM2e memory. Gaudi 3 is notable for its built-in 24-port 200Gbps Ethernet fabric — it handles inter-chip communication via standard networking rather than proprietary interconnects, which simplifies cluster construction. The competitive question is not peak specifications but real-world performance on production workloads, where CUDA optimisation continues to give NVIDIA a material advantage. Intel has struggled to convert technical specifications into market share in AI accelerators.

Datacenter Meta

Meta AI Research SuperCluster

Meta's AI Research SuperCluster (RSC), completed in early 2022, was one of the fastest AI supercomputers in the world at the time of completion. It contains 16,000 NVIDIA A100 GPUs, connected via NVIDIA Quantum InfiniBand fabric and a custom storage system capable of 16 TB/s throughput. Meta trained the Llama model family on RSC. The cluster consumed approximately 20MW of power. Meta announced plans to expand to 35,000 H100 GPUs by the end of 2023 and has since announced investments in hundreds of thousands of H100s for Llama 3 and subsequent training runs. AICI tracks the RSC as a reference point for the scale of compute required to train frontier open-weight models — and therefore for the concentration of AI capability among the handful of organisations that can build such infrastructure.

Datacenter Microsoft

Microsoft Azure AI Infrastructure

Microsoft has committed $50 billion in AI infrastructure investment in 2024, with Azure AI infrastructure as the primary beneficiary. Its AI supercomputer — built for OpenAI and for Microsoft's own Copilot services — uses custom-designed clusters of 10,000+ NVIDIA H100 GPUs connected via InfiniBand at 400Gbps. Microsoft has also moved into custom silicon: the Azure Maia 100 AI accelerator, announced in November 2023 and designed with feedback from OpenAI, is intended to reduce reliance on third-party chips for inference workloads. Azure's AI infrastructure is the commercial foundation of the Microsoft-OpenAI partnership: OpenAI's models are trained and served on Azure. The scale of this infrastructure investment is the physical correlate of the $13 billion Microsoft has invested in OpenAI.

Chip NVIDIA

NVIDIA B200 SXM

The B200 is NVIDIA's Blackwell architecture GPU, announced in March 2024. It represents a generational leap: 20 petaFLOPS of FP4 tensor compute (a new precision format designed for inference), 192GB HBM3e memory, and 8 TB/s memory bandwidth. The chip is fabricated at TSMC on the 4NP process and contains 208 billion transistors across two reticle-limited dies joined by a high-bandwidth die-to-die interconnect — the largest chip NVIDIA has built. The GB200 "superchip" pairs two B200 GPUs with a Grace CPU via NVLink-C2C. NVIDIA's GB200 NVL72 rack — 72 B200 GPUs connected via NVLink — is designed to operate as a single large inference engine, capable of serving a 1.8 trillion parameter model. Demand for B200s drove NVIDIA's market capitalisation above $3 trillion in 2024.

Datacenter NVIDIA

NVIDIA DGX B200

The DGX B200 is NVIDIA's turnkey AI server, containing 8 B200 GPUs connected via NVLink 5.0 with a total of 1,440GB of HBM3e memory. It is designed as a self-contained unit for organisations that want to operate frontier AI workloads on-premises rather than in cloud infrastructure. A single DGX B200 is priced at approximately $300,000. The DGX line has historically been the entry point for research institutions building on-premises AI infrastructure — universities, government labs, and enterprises that cannot or will not send proprietary data to cloud providers. The B200 generation's memory capacity means a single DGX B200 can serve models with up to ~700 billion parameters.
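
A back-of-envelope check on that figure, counting weight storage only (a sketch; real deployments also need memory for KV cache, activations, and framework overhead, so the practical ceiling is lower):

# How many parameters fit in a DGX B200's pooled GPU memory, weights only?
TOTAL_HBM_GB = 1440          # 8 x B200, per the DGX B200 configuration above

BYTES_PER_PARAM = {
    "FP16/BF16": 2.0,
    "FP8": 1.0,
    "FP4": 0.5,
}

for fmt, nbytes in BYTES_PER_PARAM.items():
    max_params_b = TOTAL_HBM_GB / nbytes   # GB / (bytes per param) = billions of params
    print(f"{fmt}: ~{max_params_b:.0f}B parameters of weights")

# FP16/BF16: ~720B  -> the "~700 billion parameters" figure quoted above
# FP8:       ~1440B
# FP4:       ~2880B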

Chip NVIDIA

NVIDIA H100 SXM5

The H100 is the chip that defined the AI infrastructure buildout of 2023–2024. Based on the Hopper architecture (80 billion transistors, TSMC 4N process), the H100 SXM5 delivers 1,979 TFLOPS of FP16 tensor compute (3,958 TFLOPS at FP8, both with structured sparsity) and 3.35 TB/s of HBM3 memory bandwidth. Its Transformer Engine — which dynamically mixes FP8 and FP16 precision across transformer layers — made it the GPU of choice for training and serving large language models. A single H100 SXM5 costs approximately $30,000–$40,000. The H100 became a geopolitically significant object: the US government restricted its export to China in October 2022 and tightened restrictions in 2023, making NVIDIA GPU access a proxy for national AI capability. Data centres acquiring H100s in 2023 and 2024 spent billions of dollars — Microsoft, Google, Meta, and Amazon each deployed tens of thousands of units.

Chip NVIDIA

NVIDIA H200 SXM

The H200 is NVIDIA's incremental upgrade to the H100, announced in November 2023. The compute specifications are identical — it uses the same Hopper GPU die — but the memory system is substantially upgraded: 141GB of HBM3e versus the H100's 80GB of HBM3, with memory bandwidth increasing from 3.35 TB/s to 4.8 TB/s. For large model inference, where memory capacity and bandwidth are often the bottleneck, this is a meaningful improvement. The H200 can serve larger models or larger batch sizes than the H100 without model parallelism. It became a primary GPU for hyperscale inference workloads in 2024.
