Instructions to use rp-yu/Dimple-7B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use rp-yu/Dimple-7B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="rp-yu/Dimple-7B", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```
```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("rp-yu/Dimple-7B", trust_remote_code=True, dtype="auto")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use rp-yu/Dimple-7B with vLLM:
Install from pip and serve the model:
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "rp-yu/Dimple-7B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "rp-yu/Dimple-7B",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```
Use Docker:
```shell
docker model run hf.co/rp-yu/Dimple-7B
```
- SGLang
How to use rp-yu/Dimple-7B with SGLang:
Install from pip and serve the model:
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "rp-yu/Dimple-7B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "rp-yu/Dimple-7B",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```
Use Docker images:
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "rp-yu/Dimple-7B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "rp-yu/Dimple-7B",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```
- Docker Model Runner
How to use rp-yu/Dimple-7B with Docker Model Runner:
```shell
docker model run hf.co/rp-yu/Dimple-7B
```
🤗 Model | 💬 Demo: Chat with Dimple | 📑 Paper | ✨ Code
💧 Dimple-7B
Dimple is the first Discrete Diffusion Multimodal Large Language Model (DMLLM), trained with a hybrid paradigm that combines autoregressive and diffusion-based instruction tuning. Its architecture is similar to Qwen and LLaVA, but it introduces an autoregressive-then-diffusion training strategy:
- Stage 1: Autoregressive fine-tuning for alignment and initial instruction tuning.
- Stage 2: Diffusion-based fine-tuning for enhanced instruction-following capabilities.
Trained on the same dataset as LLaVA-NEXT, Dimple-7B surpasses LLaVA-NEXT-7B by 3.9%, demonstrating that diffusion-based multimodal language models can match their autoregressive counterparts under a similar training budget.
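The stage-2 corruption step can be illustrated with a toy sketch (the mask id, ratio, and function names here are illustrative, not Dimple's actual implementation): a random subset of response tokens is replaced by a mask id, and training then supervises only those positions.

```python
import random

MASK_ID = -1  # placeholder mask-token id (illustrative)

def corrupt_for_diffusion(tokens, mask_ratio, seed=0):
    """Stage-2-style corruption (sketch): mask a random subset of the
    response tokens; the model is trained to recover exactly those
    positions, rather than predicting left-to-right."""
    rng = random.Random(seed)
    n_mask = max(1, round(mask_ratio * len(tokens)))
    masked_positions = sorted(rng.sample(range(len(tokens)), n_mask))
    corrupted = [MASK_ID if i in masked_positions else t
                 for i, t in enumerate(tokens)]
    return corrupted, masked_positions

tokens = [11, 22, 33, 44, 55, 66]
corrupted, positions = corrupt_for_diffusion(tokens, mask_ratio=0.5)
print(corrupted, positions)
```

Stage 1 is ordinary next-token fine-tuning on the same data; only stage 2 switches to this masked-recovery objective.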
🔍 Highlights
- Hybrid Training: Combines autoregressive and diffusion training.
- Diffusion Decoding: Supports confident decoding, random decoding, maskgit-style decoding, and entropy-based decoding.
- Controllable Generation: Enables fine-grained control over format, structure, and length via structure priors.
- Autoregressive-like Prefilling: Enhances inference speed using prefilling techniques.
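The decoding strategies above share one loop: repeatedly predict every masked position, then commit those whose confidence clears a threshold, falling back to the single most confident position so each step makes progress. A minimal pure-Python sketch of that confident-decoding loop, with a toy predictor standing in for the model (the predictor and its confidence values are invented for illustration):

```python
MASK = None  # placeholder for a still-masked position

def confident_decode(predict, length, threshold=0.9):
    """Fill a fully masked sequence. `predict(seq)` returns one
    (token, confidence) pair per position. Each step commits every masked
    position whose confidence clears `threshold`; if none does, it commits
    the single most confident one so the loop always terminates."""
    seq = [MASK] * length
    steps = 0
    while MASK in seq:
        preds = predict(seq)
        picks = [i for i, s in enumerate(seq)
                 if s is MASK and preds[i][1] >= threshold]
        if not picks:
            picks = [max((i for i, s in enumerate(seq) if s is MASK),
                         key=lambda i: preds[i][1])]
        for i in picks:
            seq[i] = preds[i][0]
        steps += 1
    return seq, steps

# Toy predictor: fixed base confidences, boosted by unmasked neighbours.
TARGET = ["The", "cat", "sat", "down"]
BASE = [0.99, 0.60, 0.95, 0.75]

def toy_predict(seq):
    out = []
    for i in range(len(seq)):
        boost = 0.2 * sum(1 for j in (i - 1, i + 1)
                          if 0 <= j < len(seq) and seq[j] is not MASK)
        out.append((TARGET[i], BASE[i] + boost))
    return out

print(confident_decode(toy_predict, 4))  # all four tokens resolve in 2 steps
```

Because several positions can be committed in one step, the number of decoding steps can be much smaller than the sequence length, which is the source of the parallel-decoding speedup.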
📊 Evaluation Results
| Benchmark | Dimple-7B (ours) | LLaVA-1.5-7B | LLaVA-NEXT-7B | Eagle-7B | Eagle2-9B | Qwen-VL-7B | Qwen2.5-VL-7B |
|---|---|---|---|---|---|---|---|
| Training Samples | 1.3M | 1.2M | 1.3M | 2.4M | 27.8M | 1.5B | - |
| Training Tokens | 0.8B | - | - | - | - | - | 2.6T |
| Base LLM | Dream (Qwen2.5) | Vicuna | Vicuna-1.5 | Vicuna | Qwen2.5 | Qwen | Qwen2.5 |
| GQA | 59.2 | 62.0 | 64.8 | 64.9 | - | 59.3 | - |
| MMBench (en test) | 74.6 | 64.3 | 68.7 | 68.4 | - | - | 83.5 |
| MME (Perception) | 1514 | 1510 | 1519 | 1528 | - | - | - |
| MME (Cognition) | 432 | - | 332 | - | - | - | - |
| MME (Total) | 1946 | - | 1851 | - | - | - | 2347 |
| POPE | 86.2 | 85.8 | 86.7 | 88.8 | - | - | - |
| MMMU (val) | 45.2 | - | 35.8 | 36.3 | 56.1 | - | 58.6 |
| SQA (img) | 77.1 | 66.8 | 72.8 | 70.0 | - | - | - |
| AI2D | 74.4 | - | 65.4 | - | 83.9 | 62.3 | 83.9 |
| ChartQA | 63.4 | - | 54.9 | 67.7 | 86.4 | 65.7 | 87.3 |
| TextVQA | 61.6 | - | 64.8 | - | 83.0 | - | - |
| OCRBench | 565 | - | 490 | 529 | - | - | - |
| MathVista (mini) | 42.3 | - | 33.0 | - | 63.8 | 37.0 | 68.2 |
| MMVet | 41.2 | 31.1 | 47.3 | - | 62.2 | - | 67.1 |
🛠️ Environment
Make sure your environment includes the following versions:
```
transformers==4.46.2
torch==2.5.1
accelerate==1.6.0
```
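Assuming a pip-based environment, the pinned versions above can be installed in one command:

```shell
pip install transformers==4.46.2 torch==2.5.1 accelerate==1.6.0
```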
🚀 Inference Example
```python
import torch
import requests
from PIL import Image
from transformers import AutoProcessor, AutoModel

model_name = "rp-yu/Dimple-7B"
processor = AutoProcessor.from_pretrained(
    model_name,
    trust_remote_code=True,
)
model = AutoModel.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

image_url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
messages = [
    [{"role": "user", "content": [
        {"type": "image", "image": image_url},
        {"type": "text", "text": "Describe this image."},
    ]}],
]
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, add_vision_id=False
)
images = [
    Image.open(requests.get(image_url, stream=True).raw).convert("RGB")
]

inputs = processor(
    text=text,
    images=images,
    videos=None,
    padding="longest",
    return_tensors="pt",
)
input_ids = inputs.pop("input_ids")
output = model.diffusion_generate(
    input_ids,
    max_new_tokens=64,
    output_history=True,
    return_dict_in_generate=True,
    steps=64,
    temperature=0.2,
    top_p=0.95,
    alg="origin",
    use_cache=True,
    alg_p_threshold=0.95,
    use_original_confidence=True,
    decoding_pipeline="dim",
    **inputs,
)

# Strip the prompt tokens, decode each generation, and cut at the EOS token
generations = [
    processor.tokenizer.decode(g[len(p):].cpu().tolist())
    for p, g in zip(input_ids, output.sequences)
]
for j in range(len(messages)):
    print("output:", j, generations[j].split(processor.tokenizer.eos_token)[0])
# output: 0 In the image, a woman wearing a shirt with a plaid and a dog are sitting together on a beach. The sun appears to be setting in the background, creating a warm and serene atmosphere.
```
📚 Citation
```bibtex
@misc{dimple,
      title={Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding},
      author={Runpeng Yu and Xinyin Ma and Xinchao Wang},
      year={2025},
      eprint={2505.16990},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.16990},
}
```
Model tree for rp-yu/Dimple-7B
- Base model: Dream-org/Dream-v0-Instruct-7B