gemma-4-12B-it-qat-GGUF

gemma-4-12B-it-qat-q4_0-unquantized is a 12-billion-parameter instruction-tuned multimodal model from Google DeepMind and part of the Gemma 4 family. It is optimized using Quantization-Aware Training (QAT) to preserve bfloat16-level quality while significantly reducing memory requirements.

The model uses a unified, encoder-free architecture that projects raw image patches and audio waveforms directly into the LLM's embedding space. It therefore accepts text, image, and audio input, and it offers a 256K-token context window. Its hybrid attention mechanism interleaves local sliding-window attention layers with full global attention layers. The model also provides multilingual support for 140+ languages, native function calling, and a configurable thinking (reasoning) mode. Reported benchmark scores include 77.2% on MMLU Pro, 78.8% on GPQA Diamond, and 77.5% on AIME 2026.

The "q4_0-unquantized" variant refers specifically to the half-precision weights extracted from the QAT pipeline before Q4_0 quantization. This makes it well suited to custom downstream compilation and research rather than direct deployment.
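To illustrate what "interleaving local sliding-window and full global attention" means, here is a toy sketch that builds causal attention masks for both layer types and stacks them. It is not Gemma 4's configuration: the window size and the "every Nth layer is global" pattern (`window`, `global_every`) are hypothetical parameters chosen for the example.

```python
# Illustrative sketch of hybrid local/global causal attention masks.
# Assumptions: the window size and the "every Nth layer is global" pattern
# are hypothetical parameters, not Gemma 4's published configuration.


def causal_mask(seq_len: int) -> list[list[bool]]:
    """Global causal attention: query i may attend to every key j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Local causal attention: query i sees only the last `window` keys."""
    return [[i - window < j <= i for j in range(seq_len)] for i in range(seq_len)]


def layer_masks(num_layers: int, seq_len: int, window: int, global_every: int):
    """Interleave layers: every `global_every`-th layer is global, the others local."""
    masks = []
    for layer in range(num_layers):
        if (layer + 1) % global_every == 0:
            masks.append(("global", causal_mask(seq_len)))
        else:
            masks.append(("local", sliding_window_mask(seq_len, window)))
    return masks


if __name__ == "__main__":
    # Print the last local layer and the global layer of a 4-layer toy stack.
    for kind, mask in layer_masks(num_layers=4, seq_len=6, window=3, global_every=4)[2:]:
        print(kind)
        for row in mask:
            print("".join("x" if visible else "." for visible in row))
```

Each row of a local mask has at most `window` visible keys, so a local layer's KV cache never needs more than `window` entries; only the global layers must keep the full context, which is what makes this kind of hybrid attractive for long context windows.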

Google DeepMind’s Gemma 4 Quantization-Aware Training (QAT) releases simulate lower-precision weights during the training process itself, so the model learns to tolerate the rounding error it will face once quantized. This greatly reduces VRAM requirements and speeds up local inference on consumer hardware and mobile devices, while keeping quality close to that of the uncompressed baseline.
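To make "simulating lower precision during training" concrete, here is a simplified quantize-then-dequantize round trip modeled on llama.cpp's reference Q4_0 format (blocks of 32 weights, one scale per block, 4-bit codes). It is a sketch under stated assumptions: the scale stays a Python float rather than fp16, and it shows only the kind of quantizer a QAT run targets, not Google DeepMind's actual training recipe. In a QAT forward pass the model would see `fake_quantize(w)` instead of `w`; QAT schemes commonly pass gradients straight through the rounding step (a straight-through estimator).

```python
# Simplified Q4_0-style fake quantization (quantize, then dequantize),
# modeled on llama.cpp's reference Q4_0 layout. Assumption: the per-block
# scale is kept as a Python float here; llama.cpp stores it as fp16.

BLOCK_SIZE = 32

# One block = 32 four-bit codes (16 bytes) + one fp16 scale (2 bytes).
BYTES_PER_BLOCK = BLOCK_SIZE // 2 + 2
BITS_PER_WEIGHT = BYTES_PER_BLOCK * 8 / BLOCK_SIZE  # 18 * 8 / 32 = 4.5


def quantize_block(block: list[float]) -> tuple[float, list[int]]:
    """Return (scale, codes) with codes in 0..15 for one block of weights."""
    # Signed value with the largest magnitude; it maps exactly to code 0.
    max_val = max(block, key=abs)
    scale = max_val / -8.0
    inv = 1.0 / scale if scale else 0.0
    # Shift by 8 and round; int() truncates, which equals floor here
    # because x * inv + 8.5 is never negative. Clamp the top code to 15.
    codes = [min(15, int(x * inv + 8.5)) for x in block]
    return scale, codes


def dequantize_block(scale: float, codes: list[int]) -> list[float]:
    """Map 4-bit codes back to floats: (q - 8) * scale."""
    return [(q - 8) * scale for q in codes]


def fake_quantize(weights: list[float]) -> list[float]:
    """Block-wise quantize/dequantize round trip, as seen in a QAT forward pass."""
    out: list[float] = []
    for start in range(0, len(weights), BLOCK_SIZE):
        scale, codes = quantize_block(weights[start:start + BLOCK_SIZE])
        out.extend(dequantize_block(scale, codes))
    return out
```

The block layout also accounts for most of the memory saving: 16 bytes of codes plus a 2-byte scale per 32 weights is 4.5 bits per weight, roughly 28% of the 16 bits per weight used by BF16.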

Model Files

File name                                             Quant type   File size
gemma-4-12B-it-qat-q4_0-unquantized.BF16.gguf         BF16         23.8 GB
gemma-4-12B-it-qat-q4_0-unquantized.F16.gguf          F16          23.8 GB
gemma-4-12B-it-qat-q4_0-unquantized.Q2_K.gguf         Q2_K         4.83 GB
gemma-4-12B-it-qat-q4_0-unquantized.Q3_K_L.gguf       Q3_K_L       6.57 GB
gemma-4-12B-it-qat-q4_0-unquantized.Q3_K_M.gguf       Q3_K_M       6.09 GB
gemma-4-12B-it-qat-q4_0-unquantized.Q3_K_S.gguf       Q3_K_S       5.53 GB
gemma-4-12B-it-qat-q4_0-unquantized.Q4_0.gguf         Q4_0         6.98 GB
gemma-4-12B-it-qat-q4_0-unquantized.Q4_K_M.gguf       Q4_K_M       7.38 GB
gemma-4-12B-it-qat-q4_0-unquantized.Q4_K_S.gguf       Q4_K_S       7.02 GB
gemma-4-12B-it-qat-q4_0-unquantized.Q5_0.gguf         Q5_0         8.34 GB
gemma-4-12B-it-qat-q4_0-unquantized.Q5_K_M.gguf       Q5_K_M       8.55 GB
gemma-4-12B-it-qat-q4_0-unquantized.Q5_K_S.gguf       Q5_K_S       8.34 GB
gemma-4-12B-it-qat-q4_0-unquantized.Q6_K.gguf         Q6_K         9.79 GB
gemma-4-12B-it-qat-q4_0-unquantized.Q8_0.gguf         Q8_0         12.7 GB
gemma-4-12B-it-qat-q4_0-unquantized.mmproj-bf16.gguf  mmproj-bf16  175 MB
gemma-4-12B-it-qat-q4_0-unquantized.mmproj-f16.gguf   mmproj-f16   175 MB
gemma-4-12B-it-qat-q4_0-unquantized.mmproj-q8_0.gguf  mmproj-q8_0  159 MB
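As a rough way to choose among the files above, the sketch below picks the largest quant whose file size, plus a safety margin, fits a given memory budget, and builds its download link. The sizes are copied from the table; the 2 GB default headroom and the `resolve/main` URL form (Hugging Face's usual file-download path on the default branch) are assumptions, and `pick_quant` and `download_url` are hypothetical helper names. File size is only a lower bound on memory use, since the KV cache, the mmproj file, and runtime buffers come on top.

```python
# Hypothetical helper: pick the largest quant whose file fits a memory budget.
# Sizes are copied from the table above (GB). Assumption: a fixed headroom
# covers KV cache, mmproj and runtime buffers; real needs vary with context.

REPO = "prithivMLmods/gemma-4-12B-it-qat-GGUF"
PREFIX = "gemma-4-12B-it-qat-q4_0-unquantized"

FILE_SIZES_GB = {
    "Q2_K": 4.83, "Q3_K_S": 5.53, "Q3_K_M": 6.09, "Q3_K_L": 6.57,
    "Q4_0": 6.98, "Q4_K_S": 7.02, "Q4_K_M": 7.38,
    "Q5_0": 8.34, "Q5_K_S": 8.34, "Q5_K_M": 8.55,
    "Q6_K": 9.79, "Q8_0": 12.7, "F16": 23.8, "BF16": 23.8,
}


def pick_quant(budget_gb: float, headroom_gb: float = 2.0):
    """Return the largest quant whose file plus headroom fits the budget, or None."""
    fitting = [(size, quant) for quant, size in FILE_SIZES_GB.items()
               if size + headroom_gb <= budget_gb]
    # Tuples compare by size first; equal sizes fall back to the name.
    return max(fitting)[1] if fitting else None


def download_url(quant: str) -> str:
    """Build a direct file URL, assuming the standard "resolve/main" path."""
    return f"https://huggingface.co/{REPO}/resolve/main/{PREFIX}.{quant}.gguf"
```

For example, a 12 GB budget selects Q6_K (9.79 GB plus 2 GB headroom), while an 8 GB budget falls back to Q3_K_S.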

llama.cpp

LLM inference in C/C++ — https://github.com/ggml-org/llama.cpp
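A minimal sketch for serving one of these files with llama.cpp's `llama-server`, passing an mmproj file so that image input is available. It only assembles the command line. It assumes a llama.cpp build recent enough to support the gemma4 architecture, and the chosen quant, context size, and port are illustrative values; `server_command` is a hypothetical helper, so check `llama-server --help` for the flags your build accepts.

```python
# Sketch: build a llama-server command line for these GGUF files.
# Assumptions: llama-server is on PATH and the build supports gemma4 and
# --mmproj; the ctx and port defaults are illustrative.
import shlex
from typing import Optional

MODEL = "gemma-4-12B-it-qat-q4_0-unquantized.Q4_K_M.gguf"
MMPROJ = "gemma-4-12B-it-qat-q4_0-unquantized.mmproj-f16.gguf"


def server_command(model: str, mmproj: Optional[str] = None,
                   ctx: int = 8192, port: int = 8080) -> list[str]:
    """Return argv for llama-server; -c sets the context size in tokens."""
    cmd = ["llama-server", "-m", model, "-c", str(ctx), "--port", str(port)]
    if mmproj:
        # The mmproj file holds the multimodal projector; without it the
        # server handles text only.
        cmd += ["--mmproj", mmproj]
    return cmd


if __name__ == "__main__":
    # Print a shell-ready command; pass the list to subprocess.run to launch it.
    print(shlex.join(server_command(MODEL, MMPROJ)))
```

Once the server is running, clients can talk to its OpenAI-compatible `/v1/chat/completions` endpoint.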

Format: GGUF
Model size: 12B params
Architecture: gemma4

Repository: prithivMLmods/gemma-4-12B-it-qat-GGUF