Add OlmOCRBench evaluation results
This PR ensures your model shows up at https://huggingface.co/datasets/allenai/olmOCR-bench
The evaluation was done through the official SDK.
@staghado Thanks for running the evaluation and submitting this PR! We appreciate you taking the time to benchmark the model.
However, the reported metrics look a bit unusual to us. We’re planning to rerun the evaluation on our side using the official SDK to double-check the results. We’ll follow up once we’ve reproduced and verified the numbers.
Thanks again for the effort!
Also, could you confirm the inference setup you used? For example, did you run inference via the MaaS API, or through the SDK provided in our GitHub repo (https://github.com/zai-org/GLM-OCR)? Knowing the exact setup would help us reproduce the evaluation more accurately.
It seems the evaluation was run using the ZAI API for inference. We’ll try reproducing the results with the same setup on our side. Thanks!
Thanks for looking into this! Here's what I did:
I used the ZAI Python SDK (zai-sdk==0.2.2) with the layout_parsing.create endpoint. The olmOCR-bench PDFs were pre-rendered to PNG at 200 DPI with a max side length of 1540px (aspect ratio preserved; native resolution kept if smaller). Each image was processed 3 times and test pass rates were averaged across repeats. I then ran the official olmocr.bench.benchmark evaluation script with the standard test JSONL files.
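The resize rule and the averaging over repeats described above can be sketched in a few lines. This is a minimal illustration, not the script from the gist: the function names `target_size` and `mean_pass_rate` are hypothetical, and the actual PDF rendering at 200 DPI is left to whatever PDF library is used.

```python
from statistics import mean

MAX_SIDE = 1540  # maximum side length in pixels, as stated in the post


def target_size(width: int, height: int, max_side: int = MAX_SIDE) -> tuple[int, int]:
    """Downscale so the longer side is at most max_side, preserving aspect ratio.

    Images that already fit are returned unchanged (no upscaling).
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    return round(width * max_side / longest), round(height * max_side / longest)


def mean_pass_rate(repeats: list[list[bool]]) -> float:
    """Average the per-run pass rate over repeated runs of the same tests."""
    return mean(sum(run) / len(run) for run in repeats)


# A US Letter page rendered at 200 DPI is 1700 x 2200 pixels.
print(target_size(1700, 2200))  # (1190, 1540)
print(target_size(1000, 800))   # (1000, 800), kept at native resolution
```

Averaging the pass rate per run, rather than pooling all test outcomes, matches the "averaged across repeats" wording; with an identical test set in every run the two give the same number.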
For context, I had previously run GLM-OCR standalone using vLLM with just the "Text Recognition:" prompt (no layout detection), which scored 67.5% overall (excl. h&f, i.e. without the headers and footers category). The per-category scores largely match between the two setups, except for tables (42.5% → 77.6%), which makes sense since the API includes layout detection that routes table regions to the "Table Recognition:" prompt. The other categories differ by a few points at most (the largest gap is old_scans_math, 74.9% → 68.3%), which supports the evaluation being set up correctly.
| Category | vLLM (w/o layout) | ZAI API (with layout) |
|---|---|---|
| arxiv_math | 80.4% | 80.7% |
| multi_column | 79.9% | 76.7% |
| old_scans_math | 74.9% | 68.3% |
| old_scans | 39.9% | 37.6% |
| long_tiny_text | 87.6% | 86.9% |
| table_tests | 42.5% | 77.6% |
| Overall (excl. h&f) | 67.5% | 75.2% |
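As a quick sanity check on the table, the per-category deltas and the mean of the listed rows can be recomputed. This is only a sketch: the dictionary copies the six category rows above, and treating "Overall" as an unweighted mean of categories is an assumption. Under that assumption the six rows reproduce the 67.5% vLLM figure, while the ZAI API column averages 71.3%, so the reported 75.2% presumably includes a category that is not listed in the table.

```python
# Pass rates (%) copied from the table: (vLLM w/o layout, ZAI API with layout).
scores = {
    "arxiv_math": (80.4, 80.7),
    "multi_column": (79.9, 76.7),
    "old_scans_math": (74.9, 68.3),
    "old_scans": (39.9, 37.6),
    "long_tiny_text": (87.6, 86.9),
    "table_tests": (42.5, 77.6),
}


def macro_average(values):
    """Unweighted mean over categories (assumed definition of 'Overall')."""
    values = list(values)
    return sum(values) / len(values)


vllm_mean = macro_average(vllm for vllm, _ in scores.values())
api_mean = macro_average(api for _, api in scores.values())

# Change in percentage points when layout detection is added.
deltas = {name: round(api - vllm, 1) for name, (vllm, api) in scores.items()}

print(f"vLLM mean over listed rows: {vllm_mean:.1f}%")
print(f"API mean over listed rows:  {api_mean:.1f}%")
for name, delta in deltas.items():
    print(f"{name:>15}: {delta:+.1f}")
```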
The full extraction script is available as a gist.
Hope this helps reproduce!
@staghado Hi, thanks a lot for the detailed explanation and for sharing your setup and results. This is very helpful for us.
We also really appreciate you taking the time to run such a thorough evaluation of our model and documenting the pipeline so clearly.
This is also very helpful for us as we continue iterating and improving future versions of the model.
@iyuge2 Hello,
Glad this helps. Please merge this once you have reproduced the results with the same setup, or post your own results and I will update the PR.
Also, for the future, I think olmOCR-bench should be one of the primary benchmarks to report, as it does not suffer from edit-distance biases and is closer to how a human would evaluate OCR (i.e., it tests that various facts check out, such as reading order, table cells, and formulas).
Similarly, comparing against existing SOTA models would also help the community. For instance, LightOnOCR-2-1B was released before GLM-OCR and scores considerably higher on olmOCR-bench, but was not included in the comparison.
Thanks
Hi @iyuge2, nice to see this discussion thread!
Would you be open to merging the following PRs?
For GLM-5:
- https://huggingface.co/zai-org/GLM-5/discussions/51
- https://huggingface.co/zai-org/GLM-5/discussions/44
For GLM-4.7:
- https://huggingface.co/zai-org/GLM-4.7/discussions/44
- https://huggingface.co/zai-org/GLM-4.7/discussions/43
- https://huggingface.co/zai-org/GLM-4.7/discussions/48
- https://huggingface.co/zai-org/GLM-4.7/discussions/50
For GLM-4.7 Flash:
These are based on the new evaluation results feature on the hub.
Thank you!
Cheers,
The HF team