Just sharing a result of a homelab infrastructure experiment:
I've managed to set up a distributed inference infra at home using a DGX Spark (128GB unified LPDDR5X) and a Linux workstation with an RTX 6000 Pro (96GB GDDR7), connected via 100Gbps RoCEv2. The model I used (https://lnkd.in/gx6J7YuB) is about 140GB, so it could not fit on either GPU alone. Full setup and tutorial soon on devquasar.com
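For anyone curious what a two-node split can look like, here's a rough sketch using vLLM's multi-node serving over Ray. This is my own illustration, not the author's actual setup (the tutorial isn't out yet); the IP and model id are placeholders, and the parallelism choice is an assumption:

```shell
# Hypothetical two-node vLLM launch (a sketch, not the tutorial's exact commands).
# Node A (head), reachable over the 100GbE RoCE link:
ray start --head --port=6379

# Node B (worker), pointing at node A:
ray start --address=<node-a-ip>:6379

# On the head node: split the model's layers across both machines.
# Pipeline parallelism (rather than tensor parallelism) is a reasonable
# guess here because the two GPUs are heterogeneous.
vllm serve <model-id> \
  --pipeline-parallel-size 2 \
  --max-model-len 8192
```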
I cannot test this myself since I'm on AMD: "AssertionError: W8A8Int8LinearMethod on CPU requires that CPU has AMX support" (I assumed it could fall back to some non-optimized kernel, but apparently not).
If anyone with the required resources (Intel Xeon Gen 5/6 + ~768GB-1TB RAM) can help test this, that would be awesome.
If you have hints on how to make this work on an AMD Threadripper 7000 Pro series CPU, please guide me.
Okay, this is insane... WebGPU-accelerated semantic video tracking, powered by DINOv3 and Transformers.js! 🤯 Demo (+ source code): webml-community/DINOv3-video-tracking
This will revolutionize AI-powered video editors, which can now run 100% locally in your browser, no server inference required (costs $0)!
How does it work?
1️⃣ Generate and cache image features for each frame
2️⃣ Create a list of embeddings for the selected patch(es)
3️⃣ Compute the cosine similarity between each patch and the selected patch(es)
4️⃣ Highlight those whose score is above some threshold
... et voilà! 🥳
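The similarity-and-threshold steps above are simple enough to sketch in a few lines of plain JavaScript. This is my own minimal illustration, not the demo's actual code; the function names and the 0.6 threshold are assumptions:

```javascript
// Cosine similarity between two feature vectors (e.g. DINOv3 patch embeddings).
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Keep the patches whose best similarity to any selected patch clears a threshold.
// Selecting across frames just means selectedEmbeddings holds more vectors.
function matchPatches(patchEmbeddings, selectedEmbeddings, threshold = 0.6) {
  return patchEmbeddings
    .map((patch, idx) => ({
      idx,
      score: Math.max(...selectedEmbeddings.map(sel => cosineSimilarity(patch, sel))),
    }))
    .filter(p => p.score >= threshold);
}
```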
You can also make selections across frames to improve temporal consistency! This is super useful if the object changes its appearance slightly throughout the video.
Introducing Voxtral WebGPU: state-of-the-art audio transcription directly in your browser! 🤯
- Transcribe videos, meeting notes, songs and more
- Runs on-device, meaning no data is sent to a server
- Multilingual (8 languages)
- Completely free (forever) & open source
That's right, we're running Mistral's new Voxtral-Mini-3B model 100% locally in-browser on WebGPU, powered by Transformers.js and ONNX Runtime Web!
Has anyone ever backed up a model to a sequential tape drive, or am I the world's first? :D Just played around with my retro PC that has a tape drive. Did it just because I can.
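For the curious, a model-to-tape backup on Linux is just classic streaming tools. A sketch under assumptions: device names vary by system, and the model filename here is hypothetical:

```shell
# /dev/nst0 is the non-rewinding SCSI tape device on Linux (mt is from mt-st).
mt -f /dev/nst0 rewind                  # position at the beginning of the tape
tar -cvf /dev/nst0 model.safetensors    # stream the model file onto tape
mt -f /dev/nst0 rewind
tar -tvf /dev/nst0                      # verify by listing the archive back
```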
NEW: Real-time conversational AI models can now run 100% locally in your browser! 🤯
- Privacy by design (no data leaves your device)
- Completely free... forever
- Zero installation required, just visit a website
- Blazingly fast WebGPU-accelerated inference
For those interested, here's how it works:
- Silero VAD for voice activity detection
- Whisper for speech recognition
- SmolLM2-1.7B for text generation
- Kokoro for text-to-speech
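The wiring between those four components is a simple gated pipeline. Here's a JavaScript skeleton of one conversation turn; the component functions (detectSpeech, transcribe, generate, speak) are hypothetical stand-ins for the real models, not the app's actual API:

```javascript
// One turn of the voice loop: VAD gates the chunk, then ASR -> LLM -> TTS.
// The four functions are injected so real models (Silero VAD, Whisper,
// SmolLM2, Kokoro) or stubs can be swapped in.
async function conversationTurn(audioChunk, { detectSpeech, transcribe, generate, speak }) {
  if (!(await detectSpeech(audioChunk))) return null; // no speech: skip the turn
  const text = await transcribe(audioChunk);          // speech recognition
  const reply = await generate(text);                 // text generation
  return speak(reply);                                // text-to-speech audio
}
```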
Powered by Transformers.js and ONNX Runtime Web! 🤗 I hope you like it!