Just sharing a result of a homelab infrastructure experiment:
I've managed to set up a distributed inference infra at home using a DGX Spark (128GB unified LPDDR5X) and a Linux workstation with an RTX 6000 Pro (96GB GDDR7), connected via 100Gbps RoCEv2. The model I used (https://lnkd.in/gx6J7YuB) is about 140GB, so it could not fit on either GPU alone. Full setup and tutorial soon on devquasar.com
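For anyone curious what a two-node split can look like, here's a rough sketch using vLLM's multi-node serving over Ray. This is my own illustration, not the author's actual setup (the tutorial isn't out yet); the IP and model id are placeholders, and the parallelism choice is an assumption:

```shell
# Hypothetical two-node vLLM launch (a sketch, not the tutorial's exact commands).
# Node A (head), reachable over the 100GbE RoCE link:
ray start --head --port=6379

# Node B (worker), pointing at node A:
ray start --address=<node-a-ip>:6379

# On the head node: split the model's layers across both machines.
# Pipeline parallelism (rather than tensor parallelism) is a reasonable
# guess here because the two GPUs are heterogeneous.
vllm serve <model-id> \
  --pipeline-parallel-size 2 \
  --max-model-len 8192
```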
I cannot test this myself since I'm on AMD: "AssertionError: W8A8Int8LinearMethod on CPU requires that CPU has AMX support" (I assumed it could fall back to some non-optimized kernel, but apparently not).
If anyone with the required resources (Intel Xeon Gen 5/6 + ~768GB-1TB RAM) can help test this, that would be awesome.
If you have hints on how to make this work on an AMD Threadripper 7000 Pro series CPU, please guide me.
Okay, this is insane... WebGPU-accelerated semantic video tracking, powered by DINOv3 and Transformers.js! 🤯 Demo (+ source code): webml-community/DINOv3-video-tracking
This will revolutionize AI-powered video editors, which can now run 100% locally in your browser, no server inference required (costs $0)!
How does it work?
1️⃣ Generate and cache image features for each frame
2️⃣ Create a list of embeddings for the selected patch(es)
3️⃣ Compute the cosine similarity between each patch and the selected patch(es)
4️⃣ Highlight those whose score is above some threshold
... et voilà! 🥳
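The similarity-and-threshold steps above are simple enough to sketch in a few lines of plain JavaScript. This is my own minimal illustration, not the demo's actual code; the function names and the 0.6 threshold are assumptions:

```javascript
// Cosine similarity between two feature vectors (e.g. DINOv3 patch embeddings).
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Keep the patches whose best similarity to any selected patch clears a threshold.
// Selecting across frames just means selectedEmbeddings holds more vectors.
function matchPatches(patchEmbeddings, selectedEmbeddings, threshold = 0.6) {
  return patchEmbeddings
    .map((patch, idx) => ({
      idx,
      score: Math.max(...selectedEmbeddings.map(sel => cosineSimilarity(patch, sel))),
    }))
    .filter(p => p.score >= threshold);
}
```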
You can also make selections across frames to improve temporal consistency! This is super useful if the object changes its appearance slightly throughout the video.
Introducing Voxtral WebGPU: state-of-the-art audio transcription directly in your browser! 🤯
- Transcribe videos, meeting notes, songs and more
- Runs on-device, meaning no data is sent to a server
- Multilingual (8 languages)
- Completely free (forever) & open source
That's right, we're running Mistral's new Voxtral-Mini-3B model 100% locally in-browser on WebGPU, powered by Transformers.js and ONNX Runtime Web!
Has anyone ever backed up a model to a sequential tape drive, or am I the world's first? :D Just played around with my retro PC that has a tape drive. Did it just because I can.
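For the curious, a model-to-tape backup on Linux is just classic streaming tools. A sketch under assumptions: device names vary by system, and the model filename here is hypothetical:

```shell
# /dev/nst0 is the non-rewinding SCSI tape device on Linux (mt is from mt-st).
mt -f /dev/nst0 rewind                  # position at the beginning of the tape
tar -cvf /dev/nst0 model.safetensors    # stream the model file onto tape
mt -f /dev/nst0 rewind
tar -tvf /dev/nst0                      # verify by listing the archive back
```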
NEW: Real-time conversational AI models can now run 100% locally in your browser! 🤯
- Privacy by design (no data leaves your device)
- Completely free... forever
- Zero installation required, just visit a website
- Blazingly fast WebGPU-accelerated inference
For those interested, here's how it works:
- Silero VAD for voice activity detection
- Whisper for speech recognition
- SmolLM2-1.7B for text generation
- Kokoro for text-to-speech
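The wiring between those four components is a simple gated pipeline. Here's a JavaScript skeleton of one conversation turn; the component functions (detectSpeech, transcribe, generate, speak) are hypothetical stand-ins for the real models, not the app's actual API:

```javascript
// One turn of the voice loop: VAD gates the chunk, then ASR -> LLM -> TTS.
// The four functions are injected so real models (Silero VAD, Whisper,
// SmolLM2, Kokoro) or stubs can be swapped in.
async function conversationTurn(audioChunk, { detectSpeech, transcribe, generate, speak }) {
  if (!(await detectSpeech(audioChunk))) return null; // no speech: skip the turn
  const text = await transcribe(audioChunk);          // speech recognition
  const reply = await generate(text);                 // text generation
  return speak(reply);                                // text-to-speech audio
}
```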
Powered by Transformers.js and ONNX Runtime Web! 🤗 I hope you like it!