Instructions to use moonshotai/Kimi-VL-A3B-Thinking with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use moonshotai/Kimi-VL-A3B-Thinking with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("image-text-to-text", model="moonshotai/Kimi-VL-A3B-Thinking", trust_remote_code=True) messages = [ { "role": "user", "content": [ {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"}, {"type": "text", "text": "What animal is on the candy?"} ] }, ] pipe(text=messages)# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("moonshotai/Kimi-VL-A3B-Thinking", trust_remote_code=True, dtype="auto") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use moonshotai/Kimi-VL-A3B-Thinking with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "moonshotai/Kimi-VL-A3B-Thinking" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "moonshotai/Kimi-VL-A3B-Thinking", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "Describe this image in one sentence." }, { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } } ] } ] }'Use Docker
docker model run hf.co/moonshotai/Kimi-VL-A3B-Thinking
- SGLang
How to use moonshotai/Kimi-VL-A3B-Thinking with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "moonshotai/Kimi-VL-A3B-Thinking" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "moonshotai/Kimi-VL-A3B-Thinking", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "Describe this image in one sentence." }, { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } } ] } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "moonshotai/Kimi-VL-A3B-Thinking" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "moonshotai/Kimi-VL-A3B-Thinking", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "Describe this image in one sentence." }, { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } } ] } ] }' - Docker Model Runner
How to use moonshotai/Kimi-VL-A3B-Thinking with Docker Model Runner:
docker model run hf.co/moonshotai/Kimi-VL-A3B-Thinking
gguf?
please
Unable to run even with 4090 and 3090, running out of memory.
Unable to run even with 4090 and 3090, running out of memory.
a 16b model running out of memory on a 24gb 4090? that's interesting, well I run models on the cloud (because I have a 3060 6gb locally) so I can't complain.
It's MoE model. I am unable to even run the sample code provided and both GPU's are close to maxing out in terms of RAM. 4090 has less than 1GB left and it asks for 1 Gb.
Unable to run even with 4090 and 3090, running out of memory.
That's expected... the model files in total already > 24GB... you need quantized version
Unable to run even with 4090 and 3090, running out of memory.
That's expected... the model files in total already > 24GB... you need quantized version
My dual-4090 rig also went poooooff, OOM. Any ideas? xD
Yes, id like GGUF. id like to test out this model
Same if you got it. The safetensors compiling or assembling thing is prone to failure, and I'm not sure if I'm getting it right.
where is the GGUF?