How to use from
Hermes Agent
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf prithivMLmods/Gemma4-BLIP3o-Captioner-8B:
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default prithivMLmods/Gemma4-BLIP3o-Captioner-8B:
Run Hermes
hermes
Quick Links

Gemma4-BLIP3o-Captioner-8B

Gemma4-BLIP3o-Captioner-8B is a fine-tuned image captioning model built on top of Gemma-4-E4B-it, designed to mimic and replicate the BLIP3o (Bootstrapped Language-Image Pretraining) Captioning System through targeted fine-tuning on ~3K samples from the BLIP3o-Pretrain-Long-Caption dataset. The model features a modified chat template with a hardcoded expert system prompt engineered for dense, detail-rich image captioning — covering a wide range of image categories including scenery, natural environments, portraits, objects, and more — with thinking mode disabled by default to prioritize low-latency captioning outputs. It supports sequential video frame captioning, making it suitable for temporally ordered visual description tasks. As a captioning-specialized variant, the model may produce artifacts or degraded outputs when used for general-purpose conversational or instruction-following tasks outside its captioning scope. Ideal for applications requiring structured, verbose, and contextually accurate image descriptions in privacy-focused or local inference environments, leveraging the efficient multimodal backbone of Gemma-4-E4B-it.

Note: This model is experimental and not an all-purpose one.

Model Files

File Name Quant Type File Size File Link
Gemma4-BLIP3o-Captioner-8B.BF16.gguf BF16 15.1 GB Download
Gemma4-BLIP3o-Captioner-8B.F16.gguf F16 15.1 GB Download
Gemma4-BLIP3o-Captioner-8B.Q2_K.gguf Q2_K 4.4 GB Download
Gemma4-BLIP3o-Captioner-8B.Q3_K_L.gguf Q3_K_L 5.02 GB Download
Gemma4-BLIP3o-Captioner-8B.Q3_K_M.gguf Q3_K_M 4.85 GB Download
Gemma4-BLIP3o-Captioner-8B.Q3_K_S.gguf Q3_K_S 4.65 GB Download
Gemma4-BLIP3o-Captioner-8B.Q4_0.gguf Q4_0 5.19 GB Download
Gemma4-BLIP3o-Captioner-8B.Q4_K_M.gguf Q4_K_M 5.34 GB Download
Gemma4-BLIP3o-Captioner-8B.Q4_K_S.gguf Q4_K_S 5.2 GB Download
Gemma4-BLIP3o-Captioner-8B.Q5_0.gguf Q5_0 5.69 GB Download
Gemma4-BLIP3o-Captioner-8B.Q5_K_M.gguf Q5_K_M 5.76 GB Download
Gemma4-BLIP3o-Captioner-8B.Q5_K_S.gguf Q5_K_S 5.69 GB Download
Gemma4-BLIP3o-Captioner-8B.Q6_K.gguf Q6_K 6.22 GB Download
Gemma4-BLIP3o-Captioner-8B.Q8_0.gguf Q8_0 8.01 GB Download
Gemma4-BLIP3o-Captioner-8B.mmproj-bf16.gguf mmproj-bf16 992 MB Download
Gemma4-BLIP3o-Captioner-8B.mmproj-f16.gguf mmproj-f16 992 MB Download
Gemma4-BLIP3o-Captioner-8B.mmproj-q8_0.gguf mmproj-q8_0 560 MB Download

llama.cpp

LLM inference in C/C++ — https://github.com/ggml-org/llama.cpp

Downloads last month
57
GGUF
Model size
8B params
Architecture
gemma4
Hardware compatibility
Log In to add your hardware

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for prithivMLmods/Gemma4-BLIP3o-Captioner-8B

Quantized
(229)
this model

Collection including prithivMLmods/Gemma4-BLIP3o-Captioner-8B