Recent Activity
Articles
mubashir1837
published an article about 19 hours ago
Article
GeneFix-AI: AI-Powered CRISPR-Cas9 System for Real-Time Detection and Correction of Mutations in Non-Human Species
hugging-science
• Smith42
updated a
dataset 1 day ago
Smith42
updated a
dataset 10 days ago
Smith42
published a
dataset 10 days ago
Add README with skymap as hero figure
#2 opened 12 days ago
by
tbussozungri
tom-hehir
updated a
dataset 17 days ago
cgeorgiaw
updated a
dataset 23 days ago
specimba
updated a
model 23 days ago
specimba
published a
model 23 days ago
Address horizontal icon overflow
#15 opened about 1 month ago
by
evijit
philipp-zettl
posted an update about 2 months ago
Post
206
My [ml-intern](smolagents/ml-intern) assisted me yesterday in building a vector DB that runs in your browser:
philipp-zettl/vecdb-wasm
To demonstrate the capabilities of this engine, the intern then went ahead and implemented a fully offline document RAG application:
philipp-zettl/academic-copilot
And that was just one of my interns working yesterday 😄
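For readers unfamiliar with what a vector DB does in a RAG app, here is a minimal sketch of the retrieval step, assuming brute-force cosine similarity over an in-memory store. The names `cosine`, `top_k`, and `store` are hypothetical and are not taken from vecdb-wasm, which may work quite differently.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query, store, k=2):
    """Return the ids of the k stored vectors most similar to `query`."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in store.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical toy embeddings; a real app would get these from an embedding model.
store = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
nearest = top_k([1.0, 0.1], store, k=2)
```

In a RAG application, the text chunks behind the returned ids would then be passed to the language model as context.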
philipp-zettl
posted an update about 2 months ago
Post
2656
I've been cooking something neat over the past weeks 👨‍🍳
We all know that training LLMs requires a lot of resources, especially compute in the form of GPUs, and is super slow and inefficient when done on CPUs.
The big players use giant clusters of Nvidia H100s.
But if I look at the profiles of my fellow home brewers, all we can get our hands on are those pesky consumer RTX cards. If you're lucky, you got yourself a 5080 with 16 GB VRAM or something.
To be frank, I don't have that 1.3k in disposable cash lying around ¯\_(ツ)_/¯
But I can write Rust and I like building ML libraries.
So I asked myself the question(s):
- Can I train SLMs at home on my hardware?
- How hard can it be to build an ML library that can stream data between RAM and VRAM on demand, like llama.cpp's unified memory feature [^1]?
- How hard can it be to implement bf16 support?
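To give a feel for the bf16 question, here is a minimal sketch of the conversion itself, written in Python for readability. It is not the library's code, and the function names are hypothetical. It relies only on the fact that bf16 keeps the sign, the 8 exponent bits, and the top 7 mantissa bits of an IEEE 754 float32, here with round-to-nearest-even.

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    """Convert a float (first narrowed to float32) to its 16-bit bf16 pattern."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0
    if is_nan:
        # Keep NaN a NaN by forcing a mantissa bit that survives truncation.
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Round to nearest, ties to even: bias depends on the lowest kept bit.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bf16_bits_to_f32(b: int) -> float:
    """Expand a bf16 bit pattern back to a float by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]

one = f32_to_bf16_bits(1.0)
approx_pi = bf16_bits_to_f32(f32_to_bf16_bits(3.14159))
```

With only 7 mantissa bits, 3.14159 comes back as 3.140625, which is the precision trade-off bf16 makes in exchange for float32's exponent range.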
The answers are wild, trust me!
Image 1: Metrics from last night's build on my "tiny" RTX 2060 (6 GB VRAM)
Image 2: Metrics from my most recent build on my RTX 4070 Laptop (8 GB VRAM)
The majority of my time went into the shared memory, but it's stable and I'm very excited!
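As a rough illustration of what streaming between RAM and VRAM on demand involves, here is a toy sketch of one possible policy: keep tensors resident until the budget is exceeded, then offload the least recently used ones to the CPU. This is an assumption for illustration only, not how the library described here works. `VramBudget` and its methods are hypothetical names, and sizes are plain integers rather than real allocations.

```python
from collections import OrderedDict

class VramBudget:
    """Toy bookkeeping for which tensors are 'on GPU' under a fixed byte budget."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.on_gpu = OrderedDict()  # name -> size, least recently used first
        self.on_cpu = {}             # name -> size, offloaded tensors

    def used(self) -> int:
        return sum(self.on_gpu.values())

    def touch(self, name: str, size: int) -> list:
        """Make `name` resident; return the names offloaded to make room."""
        if size > self.capacity:
            raise ValueError(f"{name} ({size} B) can never fit in {self.capacity} B")
        if name in self.on_gpu:
            self.on_gpu.move_to_end(name)  # mark as most recently used
            return []
        self.on_cpu.pop(name, None)        # it is coming back from the CPU, if it was there
        evicted = []
        while self.used() + size > self.capacity:
            victim, victim_size = self.on_gpu.popitem(last=False)
            self.on_cpu[victim] = victim_size
            evicted.append(victim)
        self.on_gpu[name] = size
        return evicted

budget = VramBudget(capacity_bytes=10)
budget.touch("weights_0", 4)
budget.touch("weights_1", 4)
budget.touch("weights_0", 4)            # refreshes weights_0
evicted = budget.touch("grads_0", 4)    # forces weights_1 out
```

A real implementation also has to schedule the actual copies and keep them from stalling the forward and backward passes.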
Here are some debug logs, à la "trust me bro":
----
Currently available: 1112735744, attempting to reclaim: 1073741824
--- VRAM STATE [backward pass] ---
Driver Used: 6744 MB / 7805 MB
Data on GPU: 1641 MB
Grads on GPU: 3459 MB
CPU Offloaded: 18230 MB
---------------------------------
Currently available: 1079181312, attempting to reclaim: 1073741824
--- VRAM STATE [backward pass] ---
Driver Used: 6776 MB / 7805 MB
Data on GPU: 1561 MB
Grads on GPU: 3279 MB
CPU Offloaded: 18590 MB
---------------------------------
Final models get exported in safetensors format and are compatible with PyTorch and transformers, for accessibility.
- [^1]: https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#unified-memory
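The numbers in the debug logs above can be cross-checked with a few lines of Python. This is a sketch under one stated assumption: that "MB" in the log means MiB (1024 × 1024 bytes). The `summarize` helper and its key names are hypothetical. `LOG` is copied from the first log entry.

```python
import re

LOG = """\
Currently available: 1112735744, attempting to reclaim: 1073741824
--- VRAM STATE [backward pass] ---
Driver Used: 6744 MB / 7805 MB
Data on GPU: 1641 MB
Grads on GPU: 3459 MB
CPU Offloaded: 18230 MB
"""

MIB = 1024 * 1024  # assumption: the log's "MB" are mebibytes

def summarize(log: str) -> dict:
    """Pull the byte counts and driver usage out of one log entry."""
    avail, reclaim = map(int, re.search(
        r"Currently available: (\d+), attempting to reclaim: (\d+)", log).groups())
    used, total = map(int, re.search(
        r"Driver Used: (\d+) MB / (\d+) MB", log).groups())
    return {
        "available_mib": avail / MIB,
        "reclaim_mib": reclaim / MIB,
        "driver_free_mb": total - used,
    }

summary = summarize(LOG)
```

Under that assumption the entry is self-consistent: the reclaim target is exactly 1 GiB, and the byte count reported as available (about 1061 MiB) matches the gap between the driver's used and total memory (7805 − 6744 = 1061 MB).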
philipp-zettl
posted an update 2 months ago
Post
222
I'm unemployed, I have a gaming GPU, and I just published a German LLM.
qwen3-0.6b-german - fine-tuned Qwen3-0.6B in ~40h on an RTX 4070 Ti, using the exact same instruct datasets as the LLäMmlein paper (ACL 2025).
HellaSwag-DE: 0.3111 → 0.3193 ✅
ARC-DE: 0.2352 → 0.2575 ✅
MMLU-DE: 0.3600 → 0.2475 🔻 (alignment tax - known trade-off)
Instruction fine-tuning trades some factual breadth for better reasoning and format following. The model is more useful, even if not better on every metric.
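To put the three score changes on a common footing, here is a small sketch that computes absolute and relative deltas from the numbers reported above. The `deltas` helper is a hypothetical name, and the scores are copied from the post without re-evaluation.

```python
# Scores as reported in the post: (base model, fine-tuned model).
scores = {
    "HellaSwag-DE": (0.3111, 0.3193),
    "ARC-DE": (0.2352, 0.2575),
    "MMLU-DE": (0.3600, 0.2475),
}

def deltas(before: float, after: float) -> tuple:
    """Return (absolute change, change relative to the base score)."""
    absolute = after - before
    return absolute, absolute / before

for name, (before, after) in scores.items():
    absolute, relative = deltas(before, after)
    print(f"{name}: {absolute:+.4f} absolute, {relative:+.1%} relative")
```

The MMLU-DE drop of roughly 31% relative is much larger than the gains on the other two benchmarks, which is worth keeping in mind when weighing the trade-off.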
Weights, LoRA adapter, full training script and logs all public.
philipp-zettl/qwen3-0.6b-german
It ain't much, but it's honest work.
m1b
authored a
paper 4 months ago
AyushM6
authored a
paper 4 months ago
ArkaMukherjee
authored 2
papers 6 months ago
sfaezella
authored a
paper about 1 year ago
ameerazam08
posted an update over 1 year ago
jms98
authored a
paper over 1 year ago