Compare the Best Open-Source Models for Image Understanding
1. Upload Your Images
Upload up to 4 images that you want to analyze.
2. Write Your Prompt
Describe what you want to analyze in your images. Be specific to get the best results.
3. Get Parallel Analysis
All models analyze your images simultaneously. Compare their responses side by side.
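The fan-out in step 3 can be sketched in a few lines. This is a minimal illustration, not the actual implementation: the `analyze` function, the model list, and the response fields are all hypothetical stand-ins for whatever inference backend the comparison tool actually calls.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical subset of the available models (see the list below).
MODELS = [
    "Pixtral-12B",
    "InternVL2.5-8B",
    "Llama-3.2-11B-Vision-Instruct",
    "DeepSeek-Janus-Pro-7B",
]

def analyze(model, images, prompt):
    # Placeholder for a real vision-language inference call.
    # A real version would send the images and prompt to the model
    # and return its generated analysis.
    return {"model": model, "num_images": len(images), "analysis": f"[{model} output]"}

def compare(images, prompt):
    # Enforce the 4-image upload limit from step 1.
    if len(images) > 4:
        raise ValueError("Upload at most 4 images")
    # Submit one request per model so they run simultaneously,
    # then collect the responses for side-by-side comparison.
    with ThreadPoolExecutor(max_workers=len(MODELS)) as pool:
        futures = [pool.submit(analyze, m, images, prompt) for m in MODELS]
        return [f.result() for f in futures]

results = compare(["photo1.jpg", "photo2.jpg"],
                  "Describe the main subject of each image.")
```

Each entry in `results` corresponds to one model's response, which is what makes the side-by-side comparison possible.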
Available Models for Image Understanding

Pixtral-12B
A powerful vision-language model with 12 billion parameters.
48GB VRAM
$0.88–$1.03/hour

InternVL2.5-1B
Efficient vision-language model with 1 billion parameters.
16GB VRAM
$0.28/hour

InternVL2.5-2B
Efficient vision-language model with 2 billion parameters.
16GB VRAM
$0.28/hour

InternVL2.5-4B
Efficient vision-language model with 4 billion parameters.
16GB VRAM
$0.28/hour

InternVL2.5-8B
Efficient vision-language model with 8 billion parameters.
24GB VRAM
$0.43–$0.69/hour

Llama-3.2-11B-Vision-Instruct
A powerful vision-language model with 11 billion parameters.
48GB VRAM
$0.88–$1.03/hour

DeepSeek-Janus-Pro-1B
A powerful vision-language model from DeepSeek with 1 billion parameters.
16GB VRAM
$0.28/hour

DeepSeek-Janus-Pro-7B
A powerful vision-language model from DeepSeek with 7 billion parameters.
24GB VRAM
$0.43–$0.69/hour
What You'll Get
For Each Model:
- Detailed analysis of your images based on your prompt
- Quality score (1-10) evaluating how well the model performed for your specific use case
- Processing time metrics (container warmup and execution time)
- Real-time status updates during processing
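The per-model result described above can be pictured as a simple record. This is only a sketch of what such a result might look like; the field names, the status values, and the example numbers are assumptions for illustration, not the tool's actual response schema.

```python
from dataclasses import dataclass

@dataclass
class ModelResult:
    model: str                # which model produced this result
    analysis: str             # detailed analysis of the images, per your prompt
    quality_score: int        # 1-10 rating for your specific use case
    warmup_seconds: float     # container warmup time
    execution_seconds: float  # model execution time
    status: str               # hypothetical status value, e.g. "running" or "complete"

# Example record for one model (values are illustrative only).
r = ModelResult(
    model="Pixtral-12B",
    analysis="Two cats resting on a grey sofa near a window.",
    quality_score=8,
    warmup_seconds=12.4,
    execution_seconds=3.1,
    status="complete",
)
```

Comparing these records across models is how you would judge which one fits your use case best.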