Interesting read
Thanks for the detailed explanation on this repo page. I'm really curious to see if this improves on my MistralTrix hobby project and does even better.
I'll keep an eye out on this one. Thanks for sharing, and exciting times indeed! :)
yeah man, it's no big deal, and I really appreciate you making a GGUF conversion of your model. I don't have a very high-end system, so even my own merges are really more for research purposes. Having a model I can actually run inference with is really nice :)
your model turned out well, too...so just continue with confidence and try to research whatever interests and inspires you most.
I just realised that my MistralTrix model isn't mentioned anywhere on this page lol.
It's old news now so it's fine but that would've been a nice thing to do on your end! :)
this model is actually busted afaik. I have no idea why, but the 9B passthrough models back then were not working with the MoE script. Also, I was just learning how to write model cards. Today I would give you a shoutout just for putting a single line of code into the architecture lol