Llama.cpp Optimization Cheat Sheet

Run and optimize Llama models locally with llama.cpp. Covers quantization, performance tuning, and GPU acceleration.

Last Updated: December 24, 2025

Installation

Build from source, pre-built binaries

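Build from source with CMake

Clone the ggml-org/llama.cpp repository, configure with cmake -B build, then compile with cmake --build build --config Release. The resulting programs (llama-cli, llama-server, llama-quantize, llama-bench) are placed in build/bin.

Use pre-built binaries

The project's GitHub releases page publishes pre-built archives for common platforms, and package managers such as Homebrew also ship it (brew install llama.cpp). This skips the compiler toolchain entirely.

Install the Python requirements for conversion

The conversion scripts in the repository are Python. Install their dependencies with pip install -r requirements.txt from the repository root, ideally inside a virtual environment.

Verify the install

Run llama-cli --version to confirm the binary starts and to see which build you have. Flags and tool names change between releases, so check --help against any guide you follow.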
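As a rough illustration of the build-from-source route, the sketch below assembles the two CMake invocations as argument lists. The helper name cmake_build_commands and its backend keys are hypothetical; the CMake options shown (GGML_CUDA, GGML_VULKAN) are the ones llama.cpp documents at the time of writing and may change.

```python
import shlex


def cmake_build_commands(backend="cpu", build_dir="build"):
    """Return [configure_cmd, build_cmd] for a llama.cpp checkout.

    The backend names are labels chosen for this sketch, not llama.cpp terms.
    """
    backend_flags = {
        "cpu": [],
        "cuda": ["-DGGML_CUDA=ON"],      # NVIDIA GPUs, needs the CUDA toolkit
        "vulkan": ["-DGGML_VULKAN=ON"],  # cross-vendor GPU backend
    }
    if backend not in backend_flags:
        raise ValueError(f"unknown backend: {backend}")
    configure = ["cmake", "-B", build_dir, *backend_flags[backend]]
    build = ["cmake", "--build", build_dir, "--config", "Release"]
    return [configure, build]


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing is built by accident.
    for cmd in cmake_build_commands("cuda"):
        print(shlex.join(cmd))
```

Passing each list to subprocess.run(cmd, check=True) from inside the checkout would perform the build.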

Model Conversion

Convert to GGUF format

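GGUF is the model format

llama.cpp loads models in GGUF, a single-file binary format that stores the weights together with the tokenizer and model metadata.

Convert Hugging Face checkpoints

Run the repository's convert_hf_to_gguf.py script on a local model directory, for example: python convert_hf_to_gguf.py path/to/model --outfile model-f16.gguf --outtype f16.

Convert at high precision, then quantize

Export to f16 or bf16 first and quantize that file in a separate step. Keeping the high-precision GGUF lets you produce several quantization levels without converting again.

Check for an existing GGUF

Many popular models are already published in GGUF form on Hugging Face, which lets you skip conversion altogether.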
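A quick way to sanity-check a converted file is to read the fixed-size start of its GGUF header. The sketch below assumes GGUF version 2 or later, where the file opens with the 4-byte magic GGUF followed by a little-endian uint32 version, a uint64 tensor count, and a uint64 metadata key/value count. The function name read_gguf_header is made up for this example.

```python
import struct


def read_gguf_header(path):
    """Read the fixed 24-byte GGUF preamble and return it as a dict."""
    with open(path, "rb") as f:
        raw = f.read(24)
    if len(raw) < 24 or raw[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    # "<" = little-endian without padding; I = uint32, Q = uint64
    version, tensor_count, kv_count = struct.unpack("<IQQ", raw[4:24])
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(read_gguf_header(sys.argv[1]))
```

A file that fails this check was most likely truncated during download or is not GGUF at all (for example, a raw safetensors checkpoint).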

Quantization Options

q4_0, q4_K_M, q8_0 explained

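q8_0: 8-bit, close to full precision

Weights are stored in blocks of 32 with one 16-bit scale per block, which works out to 8.5 bits per weight. Quality loss is usually negligible, but the file is roughly twice the size of a 4-bit quant.

q4_0: legacy 4-bit

Uses the same block-of-32 layout with 4-bit values, or 4.5 bits per weight. It is small and fast but loses more quality than the newer k-quants at a similar size.

q4_K_M: 4-bit k-quant, medium

Uses super-blocks of 256 weights and keeps some of the more sensitive tensors at higher precision (Q6_K). It is slightly larger than q4_0 and generally gives better quality, which makes it a common default.

Quantize with llama-quantize

Pass the high-precision file, the output name, and the type, for example: llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M.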
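To compare q4_0, q4_K_M, and q8_0 by size, a back-of-the-envelope estimate is parameters times bits per weight. In the sketch below, 4.5 and 8.5 follow from the block layouts of q4_0 and q8_0 (32 weights plus one 16-bit scale); the 4.85 figure for q4_K_M is an approximation that varies by model, and the function name is hypothetical. Real files are somewhat larger because of metadata and tensors kept at higher precision.

```python
# Approximate bits per weight; the q4_K_M value is an assumed average.
BITS_PER_WEIGHT = {
    "f16": 16.0,
    "q8_0": 8.5,    # 32 int8 values + one fp16 scale = 34 bytes per 32 weights
    "q4_0": 4.5,    # 16 bytes of 4-bit values + one fp16 scale per 32 weights
    "q4_K_M": 4.85,
}


def estimate_size_gib(n_params, quant):
    """Estimate the weight size in GiB for a parameter count and quant type."""
    bits = n_params * BITS_PER_WEIGHT[quant]
    return bits / 8 / 2**30


if __name__ == "__main__":
    n_params = 7e9  # a nominal "7B" model, used only as an example
    for quant in BITS_PER_WEIGHT:
        print(f"{quant:8s} ~{estimate_size_gib(n_params, quant):.1f} GiB")
```

The estimate is mainly useful for deciding whether a given quant can fit in RAM or VRAM before downloading it.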

Command Line Usage

Run models, parameters, options

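Run a model with llama-cli

The minimal invocation names a model and a prompt: llama-cli -m model.gguf -p "Your prompt" -n 128. Here -m is the model path, -p the prompt, and -n the maximum number of tokens to generate.

Control sampling

Common sampling flags are --temp, --top-k, --top-p, and --repeat-penalty. Use -s (--seed) with a fixed value when you need reproducible output.

Serve over HTTP with llama-server

llama-server -m model.gguf --port 8080 starts a local server with an OpenAI-compatible API, so existing client libraries can talk to it.

Check --help for your build

Every tool prints its full option list with --help. Options are added and renamed between releases, so treat the output of your own binary as authoritative.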
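When a model is launched from a script, it helps to build the argument list in one place. The sketch below is one way to do that; the helper llama_cli_args and its defaults are hypothetical, while the flags it emits (-m, -p, -n, -c, -t, -ngl) are standard llama-cli options at the time of writing.

```python
import shlex


def llama_cli_args(model, prompt, n_predict=128, ctx_size=4096,
                   threads=None, gpu_layers=None, binary="llama-cli"):
    """Build an argv list for llama-cli; optional flags are added only if set."""
    args = [binary, "-m", model, "-p", prompt,
            "-n", str(n_predict), "-c", str(ctx_size)]
    if threads is not None:
        args += ["-t", str(threads)]       # CPU threads
    if gpu_layers is not None:
        args += ["-ngl", str(gpu_layers)]  # layers offloaded to the GPU
    return args


if __name__ == "__main__":
    argv = llama_cli_args("model.gguf", "Hello", threads=8, gpu_layers=32)
    # Print a shell-safe version; run it with subprocess.run(argv, check=True).
    print(shlex.join(argv))
```

Passing a list to subprocess.run avoids shell quoting problems with prompts that contain spaces or quotes.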

GPU Acceleration

CUDA, Metal, OpenCL

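CUDA (NVIDIA)

Configure the build with -DGGML_CUDA=ON. This requires the CUDA toolkit to be installed on the build machine.

Metal (Apple Silicon)

Metal support is enabled by default when building on macOS, so no extra flag is needed. It can be turned off with -DGGML_METAL=OFF.

OpenCL and Vulkan

The current OpenCL backend (-DGGML_OPENCL=ON) primarily targets Qualcomm Adreno GPUs. For other non-NVIDIA hardware, the Vulkan backend (-DGGML_VULKAN=ON) is the usual cross-vendor choice.

Offload layers with -ngl

A GPU-enabled build still runs on the CPU unless you offload layers with -ngl N (--n-gpu-layers). Layers that do not fit in VRAM stay on the CPU, so raise N until memory is nearly full.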
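For partial GPU offload, a first guess at how many layers to offload can be derived from the model file size and available VRAM. The heuristic below is an assumption made for this sketch, not something llama.cpp computes: it treats every layer as the same size and holds back a fixed reserve for the KV cache and scratch buffers. Treat the result as a starting point and adjust after watching actual memory use.

```python
def layers_that_fit(model_bytes, n_layers, vram_bytes, reserve_bytes=2**30):
    """Rough estimate of how many layers fit in VRAM.

    Assumes uniform layer size and a fixed reserve (default 1 GiB) for the
    KV cache and compute buffers; both are simplifications.
    """
    per_layer = model_bytes / n_layers
    usable = max(vram_bytes - reserve_bytes, 0)
    return min(n_layers, int(usable // per_layer))


if __name__ == "__main__":
    GIB = 2**30
    # Example numbers only: a 4 GiB model with 32 layers on a 3 GiB GPU.
    print(layers_that_fit(4 * GIB, 32, 3 * GIB))
```

The returned number would be passed as the value of the -ngl (--n-gpu-layers) option.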

Performance Tuning

Batch size, context, threads

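Threads

Set the CPU thread count with -t (--threads). Matching the number of physical cores is usually a good starting point; going beyond that often slows generation down.

Context size

Set the context window with -c (--ctx-size). KV cache memory grows linearly with context length, so request only as much context as the workload needs.

Batch size

-b (--batch-size) and -ub (--ubatch-size) control how many prompt tokens are processed per step. Larger values can speed up prompt processing at the cost of more memory; they have little effect on token-by-token generation.

Measure with llama-bench

llama-bench -m model.gguf reports prompt-processing and token-generation throughput in tokens per second. Change one setting at a time and compare results.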
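The memory cost of a larger context can be estimated from the model's shape. The sketch below uses the standard KV cache formula (keys and values, per layer, per position, per KV head) and assumes an unquantized 16-bit cache. The example shape (32 layers, 32 KV heads, head dimension 128) matches a Llama 2 7B style model and is used only for illustration.

```python
def kv_cache_bytes(n_layers, n_ctx, n_kv_heads, head_dim, bytes_per_elem=2):
    """KV cache size in bytes: 2 tensors (K and V) per layer per position."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem


if __name__ == "__main__":
    for n_ctx in (2048, 4096, 8192):
        size = kv_cache_bytes(n_layers=32, n_ctx=n_ctx, n_kv_heads=32, head_dim=128)
        print(f"ctx={n_ctx:5d}  KV cache ~{size / 2**30:.1f} GiB")
```

Models that use grouped-query attention have fewer KV heads than attention heads, which shrinks this figure proportionally.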

💡 Pro Tip: Master the fundamentals first before moving to advanced techniques. Practice regularly and refer to this cheatsheet for quick reference.
