LLM VRAM Calculator

How much GPU memory do you need to run or fine-tune a model? Pick the model size, weight precision, and workload to get a VRAM estimate and a GPU that fits. Everything runs in your browser.

Model size (billion parameters)

Precision (weights)

What are you doing?

Context length (tokens) — affects inference KV cache

Estimated VRAM needed

— GB

Estimate only: actual usage varies with batch size, sequence length, framework, and kernels. Models that use grouped-query or multi-query attention (most 70B+ open models) store fewer key/value heads than query heads, so the KV-cache figure here is a deliberate over-estimate and real inference memory is often lower.

Related guides

LoRA & QLoRA fine-tuning guide — how the modes above actually differ
LLM inference engines compared — once it fits, serve it efficiently
Self-hosting LLMs vs cloud APIs — is buying the GPU worth it?
