Oh wow, good
Ujjwal Tyagi
1. Unsloth
GitHub: https://github.com/unslothai/unsloth
- Fastest way to fine-tune LLMs locally
- Optimized for low VRAM (even laptops)
- Plug-and-play with Hugging Face models
2. Axolotl
GitHub: https://github.com/OpenAccess-AI-Collective/axolotl
- Flexible LLM fine-tuning configs
- Supports LoRA, QLoRA, multi-GPU
- Great for custom training pipelines
3. TRL (Transformer Reinforcement Learning)
GitHub: https://github.com/huggingface/trl
- RLHF, DPO, PPO for LLM alignment
- Built on the Hugging Face ecosystem
- Essential for post-training optimization
4. DeepSpeed
GitHub: https://github.com/microsoft/DeepSpeed
- Train massive models efficiently
- Memory + speed optimization
- Industry standard for scaling
5. LLaMA-Factory
GitHub: https://github.com/hiyouga/LLaMA-Factory
- All-in-one fine-tuning UI + CLI
- Supports multiple models (LLaMA, Qwen, etc.)
- Beginner-friendly + powerful
6. PEFT
GitHub: https://github.com/huggingface/peft
- Fine-tune with minimal compute
- LoRA, adapters, prefix tuning
- Best for cost-efficient training
Try it now: FINAL-Bench/Darwin-9B-NEG
4-bit (Q4): FINAL-Bench/Darwin-9B-MFP4
We're thrilled to release Darwin-9B-NEG, a 9B-parameter reasoning model
that embeds an architecturally internalised sense of self-confidence directly
into the transformer, using our proprietary Native Entropy Gating (NEG) technology.
GPQA Diamond (198 PhD-level questions):
▸ Baseline Darwin-9B (no NEG) → 51.01 %
▸ Pure NEG (greedy · 1× cost) → 63.64 % (+12.63 %p)
▸ + Permutation (4× cost) → 76.26 %
▸ + Ensemble Refinement (~20×) → 84.34 %
With only 9 billion parameters and 1× inference cost, Pure NEG jumps
+12.63 %p over the same model without NEG. Going all-in with ensemble
refinement pushes it to 84.34 %, surpassing the published Qwen3.5-9B
leaderboard score (81.7 %) by +2.64 %p.
What makes NEG different from Multi-Turn Iteration (MTI)?
Classical MTI needs 3-8× extra inference passes. NEG instead lives
INSIDE the single decoding loop. Two tiny modules ride with the
transformer: NEG-Head predicts per-token entropy from the last hidden
state, and NEG-Gate conditionally restricts the top-k choice when
confidence is low. The gate activates on only 4.36 % of tokens, so it is
essentially free at inference time.
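To make the gating idea concrete, here is a minimal sketch in plain Python. It is not the Darwin implementation: the post says NEG-Head predicts entropy from the last hidden state, whereas this sketch computes the entropy directly from the logits as a stand-in. The function names, the threshold, and the `restricted_k` value are all hypothetical, chosen only for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def gated_candidates(logits, entropy_threshold=1.0, restricted_k=2):
    """Return (allowed_token_ids, entropy) for one decoding step.

    Assumption: high entropy is treated as low confidence. When the entropy
    exceeds the (hypothetical) threshold, only the top-k tokens stay eligible;
    otherwise the full vocabulary is left untouched.
    """
    h = entropy(softmax(logits))
    if h <= entropy_threshold:
        return list(range(len(logits))), h
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return sorted(ranked[:restricted_k]), h
```

With a peaked distribution such as `[10.0, 0.0, 0.0, 0.0]` the gate stays closed and all four tokens remain eligible. With nearly flat logits such as `[1.0, 0.9, 0.8, 0.7]` the entropy exceeds the threshold and only tokens 0 and 1 survive.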
Key differentiators
• Architecturally internalised: the model file *is* the feature
• 1× inference cost (vs. 3-8× for MTI)
• Drop-in with vLLM / SGLang / TGI / transformers, no extra engine
• +12.63 %p reasoning at zero latency overhead
• Single-file deployment, Apache 2.0 licensed
Lineage
Qwen/Qwen3.5-9B → Darwin-9B-Opus (V7 evolutionary merge) → Darwin-9B-NEG (V8 + NEG training)
#Darwin #NEG #NativeEntropyGating #GPQA #Reasoning #LLM #OpenSource #Apache2
and I didn't type a single line of code. Not even a semicolon.
This coding agent is on steroids. Its comprehension in long back-and-forths is night-and-day better, and that 256K context window swallows the entire project structure whole.
Tell it what you want, and it actually gets the full picture: no confused blank stares from the AI.
And we're not messing around with dinky little code snippets here. It spits out a fully functional project:
app.json, every page's wxml/wxss/js/json, even mock data pre-packed. Import it into WeChat Dev Tools and it runs on the first try.
Only one tiny visual nitpick, zero logic bugs. Point out the flaw, and it fixes it instantly: no new bugs, no passive-aggressive code breaks, no headaches.
The entire vibe: Tell it your idea → Get a complete working project → Mention a tiny flaw → AI polishes it.
No coding, no endless edits, no soul-crushing debugging that makes you want to throw your laptop. Absolute game-changer.
Oh very wonderful work! Nice work guys
I love your diagrams, they're very good for beginners. Nice work!
It all starts with Reinforcement Learning with Verifiable Rewards
- question asked
- model generates reasoning + answer
- answer checked against ground truth
- reward drives RL training
In this setup, the environment is simple: fixed questions and answers, rollout logic, reward(s)
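The checking step above can be sketched as a small reward function. This is only an illustration: the `<answer>...</answer>` tag format and both function names are assumptions, not something the post or the linked course specifies.

```python
import re

def extract_answer(completion):
    """Pull the final answer out of an <answer>...</answer> tag.

    The tag format is an assumption for this sketch; returns None if absent.
    """
    match = re.search(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    return match.group(1).strip() if match else None

def verifiable_reward(completion, ground_truth):
    """Deterministic reward: 1.0 for an exact match with the ground truth, else 0.0."""
    answer = extract_answer(completion)
    if answer is None:
        return 0.0  # no parsable answer, so nothing to verify
    return 1.0 if answer == ground_truth.strip() else 0.0
```

Because the reward is computed by a fixed rule rather than a learned model, the same completion always receives the same score, which is what makes the reward verifiable.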
Consider a more complex tic-tac-toe env
It adds:
- dynamic game generation/handling
- tunable opponent skill
- multi-turn interactions
(envs can also include tools)
---
What happens at training?
We use Group Relative Policy Optimization with a tic-tac-toe env
No critic model needed, the group is the baseline
Simpler than PPO
1. Rollout generation: from the same board, model plays N games via sampling
2. Each game scored with deterministic rewards (win, format, ...)
3. Mean score computed across the group
4. Each rollout's advantage = its score minus the group mean
5. Model updated to favor trajectories above baseline
Repeat
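Steps 2 to 4 above can be sketched in a few lines of Python. The reward weights and function names are hypothetical. The sketch follows the post's description (score minus group mean); common GRPO formulations additionally divide by the group's standard deviation, which is left out here.

```python
def score_rollout(won, well_formatted, win_reward=1.0, format_reward=0.2):
    """Deterministic score for one game; the weights are illustrative assumptions."""
    return (win_reward if won else 0.0) + (format_reward if well_formatted else 0.0)

def group_relative_advantages(scores):
    """Advantage of each rollout = its score minus the mean score of its group.

    The group mean plays the role of the baseline, so no critic model is needed.
    """
    if not scores:
        raise ValueError("a group needs at least one rollout")
    mean = sum(scores) / len(scores)
    return [s - mean for s in scores]
```

For a group with scores `[1.0, 0.0, 0.5, 0.5]` the mean is 0.5, so the advantages are `[0.5, -0.5, 0.0, 0.0]`: the winning rollout is reinforced, the losing one is discouraged, and the two average ones contribute no gradient signal.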
For a deep dive, check out
https://github.com/anakin87/llm-rl-environments-lil-course
a free hands-on course on RL environments for LLMs
Insightful paper. Since you are a researcher, you can apply for a researcher role by emailing careers@shirova.com. We are building our founding team of researchers; Shirova AI is a research lab based in India.
Glad to hear, nice work!
Oh, I can understand. Your research is interesting, nice work! Keep going.
Oh nice! Good work
You're welcome. If you haven't already, you can review my master notes in the dataset repo card, https://huggingface.co/datasets/Ujjwal-Tyagi/ai-ml-foundations-book-collection#my-master-notes-and-main-concept-understanding-after-i-read-those-books
It looks interesting, but is there an implementation plan, or any results from implementing it? Could you please explain in simple terms what it is for and how we can implement it?
Ujjwal-Tyagi/ai-ml-foundations-book-collection