LocalBuild Community / Various, March 2023

Consumer LLM Workstation

The emergence of capable open-weight models (Llama 3.3 70B, Mistral Large, Phi-4) has made local AI inference practical on consumer hardware for the first time. A workstation with a high-VRAM discrete GPU such as the NVIDIA RTX 4090 or RTX 3090 (24GB VRAM each) can run quantized models in the 13–70B parameter range at useful speeds using tools such as llama.cpp, Ollama, or LM Studio, with the largest of these requiring partial offload of layers to system RAM. Apple's M-series unified memory architecture accommodates even larger models: a Mac Studio with 192GB of unified memory can run heavily quantized 405B-parameter models locally. AICI regards local inference as a significant development for AI governance: it enables private, auditable AI use that does not expose data to cloud providers, reduces inference costs for organisations with existing hardware, and extends AI capability to jurisdictions or contexts where data-sovereignty requirements make cloud AI impractical.
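
The VRAM constraint can be reasoned about with simple arithmetic: a model's weight footprint is roughly parameter count × bits per weight ÷ 8, plus overhead for the KV cache and runtime buffers. The sketch below uses illustrative figures, not benchmarks, to show why a ~14B model at 4-bit quantization fits comfortably in 24GB of VRAM while a 70B model must split layers between GPU and system RAM.

```python
# Rough memory-footprint estimate for quantized LLM weights.
# Illustrative arithmetic only; real runtimes add KV-cache and
# activation overhead that varies with context length and backend.

def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of model weights in gigabytes."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

GPU_VRAM_GB = 24  # RTX 4090 / RTX 3090

for name, params in [("Phi-4 (14B)", 14), ("Llama 3.3 70B", 70), ("405B-class", 405)]:
    for bits in (4, 8):
        gb = weight_footprint_gb(params, bits)
        fits = "fits in 24GB VRAM" if gb <= GPU_VRAM_GB else "needs offload / more memory"
        print(f"{name:>14} @ {bits}-bit: ~{gb:6.1f} GB -> {fits}")

# Note: a 405B model at 4-bit is ~203 GB, exceeding even 192GB of unified
# memory, which is why such models require sub-4-bit quantization to run.
```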

Specifications

Reference build: NVIDIA RTX 4090 (24GB VRAM) or Apple M2 Ultra (192GB unified memory) | Software: llama.cpp / Ollama / LM Studio | Capable models: Llama 3.3 70B, Phi-4, Mistral 7B | Approximate cost: €2,000–€5,000
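
As a usage illustration, here is a minimal local-inference sketch using the llama-cpp-python bindings for llama.cpp. The GGUF model path and quantization level are hypothetical placeholders; any GGUF-format model from the list above would work, and n_gpu_layers controls how many transformer layers are offloaded to the GPU within the available VRAM budget.

```python
# Minimal local-inference sketch with llama-cpp-python (Python bindings
# for llama.cpp). The model path below is a hypothetical placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="models/phi-4-q4_k_m.gguf",  # hypothetical local GGUF file
    n_gpu_layers=-1,   # offload all layers to GPU; lower this if VRAM is tight
    n_ctx=4096,        # context window; larger values increase KV-cache memory
    verbose=False,
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarise the benefits of local inference."}],
    max_tokens=256,
)
print(result["choices"][0]["message"]["content"])
```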