Instructions to use zai-org/GLM-OCR with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use zai-org/GLM-OCR with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="zai-org/GLM-OCR")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

# A processor (not a bare tokenizer) is needed so the image in the chat is loaded and preprocessed
processor = AutoProcessor.from_pretrained("zai-org/GLM-OCR")
model = AutoModelForImageTextToText.from_pretrained("zai-org/GLM-OCR")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use zai-org/GLM-OCR with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "zai-org/GLM-OCR"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zai-org/GLM-OCR",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
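
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_payload` and `chat` are illustrative and not part of vLLM, and it assumes the server started above is listening on `localhost:8000`.

```python
import json
import urllib.request


def build_payload(model: str, prompt: str, image_url: str) -> dict:
    """Build an OpenAI-style chat completion body with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(base_url: str, payload: dict, timeout: float = 120.0) -> str:
    """POST the payload to /v1/chat/completions and return the first choice's text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call (requires the server to be running):
# payload = build_payload(
#     "zai-org/GLM-OCR",
#     "Describe this image in one sentence.",
#     "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
# )
# print(chat("http://localhost:8000", payload))
```

Pointing `base_url` at `http://localhost:30000` should work the same way against the SGLang server described below, since both expose the same OpenAI-compatible route.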

Use Docker

docker model run hf.co/zai-org/GLM-OCR

SGLang

How to use zai-org/GLM-OCR with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "zai-org/GLM-OCR" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zai-org/GLM-OCR",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "zai-org/GLM-OCR" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use zai-org/GLM-OCR with Docker Model Runner:
```
docker model run hf.co/zai-org/GLM-OCR
```

Add OlmOCRBench evaluation results

#43

by staghado - opened Mar 5

base: refs/heads/main ← from: refs/pr/43


+73

-0

Add OlmOCRBench evaluation results (24d1a23a)

staghado

Mar 5

•

edited Mar 5

This PR ensures your model shows up at https://huggingface.co/datasets/allenai/olmOCR-bench
The evaluation was done through the official SDK.

Update .eval_results/olmocrbench.yaml (0365ee86)

Update .eval_results/olmocrbench.yaml (0b03da6b)

iyuge2

Z.ai org Mar 6

@staghado Thanks for running the evaluation and submitting this PR! We appreciate you taking the time to benchmark the model.

However, the reported metrics look a bit unusual to us. We’re planning to rerun the evaluation on our side using the official SDK to double-check the results. We’ll follow up once we’ve reproduced and verified the numbers.

Thanks again for the effort!

iyuge2

Z.ai org Mar 6

Also, could you confirm the inference setup you used? For example, did you run inference via the MaaS API, or through the SDK provided in our GitHub repo (https://github.com/zai-org/GLM-OCR)? Knowing the exact setup would help us reproduce the evaluation more accurately.

iyuge2

Z.ai org Mar 6

It seems the evaluation was run using the ZAI API for inference. We’ll try reproducing the results with the same setup on our side. Thanks!

staghado

Mar 6

Thanks for looking into this! Here's what I did:

I used the ZAI Python SDK (zai-sdk==0.2.2) with the layout_parsing.create endpoint. The olmOCR-bench PDFs were pre-rendered to PNG at 200 DPI with a max side length of 1540px (aspect ratio preserved; native resolution kept if smaller). Each image was processed 3 times and test pass rates were averaged across repeats. I then ran the official olmocr.bench.benchmark evaluation script with the standard test JSONL files.
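
The sizing and averaging rules described above can be sketched as follows. This is an illustration under assumptions, not the actual extraction script: the function names are hypothetical, the rounding behaviour is a guess, and pass rates are assumed to be averaged per run before averaging across the three repeats.

```python
from statistics import mean


def page_pixels_at_dpi(width_pt: float, height_pt: float, dpi: int = 200) -> tuple[int, int]:
    """Convert a PDF page size in points (1/72 inch) to pixels at the given DPI."""
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)


def fit_max_side(width: int, height: int, max_side: int = 1540) -> tuple[int, int]:
    """Shrink so the longer side is at most max_side, keeping aspect ratio; never upscale."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # native resolution kept if already small enough
    return (
        max(1, round(width * max_side / longest)),
        max(1, round(height * max_side / longest)),
    )


def average_pass_rate(repeats: list[list[bool]]) -> float:
    """Average the per-run pass rate over all repeats of the same test set."""
    return mean(sum(run) / len(run) for run in repeats)


# A US Letter page (612 x 792 pt) rendered at 200 DPI, then capped at 1540 px
width, height = page_pixels_at_dpi(612, 792)
print(fit_max_side(width, height))
```

When every repeat runs the same number of tests, averaging per run and averaging over all individual test outcomes give the same number, so the choice only matters if some runs drop tests.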

For context, I had previously run GLM-OCR standalone using vLLM with just the "Text Recognition:" prompt (no layout detection), which scored 67.5% overall (excl. h&f). The per-category scores largely match between the two setups, except for tables (42.5% → 77.6%) — which makes sense since the API includes layout detection that routes table regions to the "Table Recognition:" prompt. The other categories see only minor differences, confirming that the evaluation is correct.

| Category | vLLM (w/o layout) | ZAI API (with layout) |
|---|---|---|
| arxiv_math | 80.4% | 80.7% |
| multi_column | 79.9% | 76.7% |
| old_scans_math | 74.9% | 68.3% |
| old_scans | 39.9% | 37.6% |
| long_tiny_text | 87.6% | 86.9% |
| table_tests | 42.5% | 77.6% |
| Overall (excl. h&f) | 67.5% | 75.2% |
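
The table gap is attributed above to layout detection routing table regions to a different prompt. A rough sketch of that kind of routing is shown below; only the two prompt strings come from this thread, while the region labels (`"table"`, `"text"`), the `bbox` field, and the function names are hypothetical, and the API's internal routing is not documented here.

```python
TEXT_PROMPT = "Text Recognition:"
TABLE_PROMPT = "Table Recognition:"

BBox = tuple[int, int, int, int]


def prompt_for_region(region_type: str) -> str:
    """Pick the recognition prompt for a layout region (labels are assumed, not official)."""
    return TABLE_PROMPT if region_type == "table" else TEXT_PROMPT


def plan_requests(regions: list[dict]) -> list[tuple[str, BBox]]:
    """Pair each detected region's bounding box with the prompt it would be sent with."""
    return [(prompt_for_region(region["type"]), region["bbox"]) for region in regions]


# Hypothetical layout-detector output for one page
regions = [
    {"type": "text", "bbox": (40, 60, 1100, 400)},
    {"type": "table", "bbox": (40, 420, 1100, 900)},
]
for prompt, bbox in plan_requests(regions):
    print(prompt, bbox)
```

Without a layout step, every page is sent whole with `"Text Recognition:"`, which matches the standalone vLLM setup in the first column.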

The full extraction script is available as a gist.

Hope this helps reproduce!

iyuge2

Z.ai org Mar 9

@staghado Hi, thanks a lot for the detailed explanation and for sharing your setup and results.
We really appreciate you taking the time to run such a thorough evaluation of our model and documenting the pipeline so clearly.
This is very helpful for us as we continue iterating and improving future versions of the model.

staghado

Mar 11

@iyuge2 Hello,
Glad this helps. Please merge this once you have the results/setup, or post your results and I will update the PR.
Also, for the future, I think OlmOCR-bench should be one of the primary benchmarks to report, as it does not suffer from edit-distance biases and is closer to how a human would evaluate OCR (i.e., it tests that various facts check out, such as reading order, table cells, and formulas).
Similarly, comparing against existing SOTA models would also help the community. For instance, LightOnOCR-2-1B was released before GLM-OCR and scores much higher on OlmOCR-bench, but was not compared against.
Thanks

nielsr

Mar 18

Hi @iyuge2, nice to see this discussion thread!

Would you be open to merging the following PRs?

For GLM-5:

For GLM-4.7:

For GLM-4.7 Flash:

https://huggingface.co/zai-org/GLM-4.7-Flash/discussions/69

These are based on the new evaluation results feature on the hub.

Thank you!

Cheers,

The HF team

iyuge2 changed pull request status to merged Apr 14
