| Topic | Replies | Views | Activity |
|---|---|---|---|
| Overfitting in BERT IMDB50k | 2 | 1139 | March 6, 2026 |
| LLM Course code errors | 7 | 56 | March 6, 2026 |
| Different output when we inference through packing with flash attention in bf16 | 1 | 9 | March 6, 2026 |
| Why are gradient_checkpointing and training bound? | 2 | 11 | March 2, 2026 |
| Wave Field LLM — O(n log n) attention via wave equation dynamics, within 5% of standard transformer | 2 | 3974 | March 2, 2026 |
| Attentions not returned from transformers ViT model when using output_attentions=True | 5 | 1210 | March 2, 2026 |
| Using hyperparameter-search in Trainer | 102 | 38909 | March 2, 2026 |
| Issue with summarization and translation pipeline | 3 | 18 | March 2, 2026 |
| Is LLaMA rotary embedding implementation correct? | 8 | 9538 | February 26, 2026 |
| Gemma 3 12B: 4-bit Quantization failing/ignored in Transformers v5.1.0 (Gemma3ForConditionalGeneration) | 10 | 94 | February 23, 2026 |
| [Help Needed] Dual-Phase Softmax Steering on Llama-2 Residual Stream Yields Identical POPE Results | 3 | 30 | February 23, 2026 |
| [Research/Discussion] Depth-agnostic stability for residual models (no extra norms, no tuning). Is this useful to you? | 1 | 25 | February 22, 2026 |
| LLaVA Steering: Why does grounding fix hallucinations in captioning but not in Yes/No QA? | 1 | 32 | February 19, 2026 |
| KV Caching problem with gemma 3 | 2 | 45 | February 17, 2026 |
| Num_beam_groups removed in V5? | 1 | 36 | February 14, 2026 |
| [LLaVA-1.5] Implementing Control Barrier Functions (LCBF) via Attention Hooking – Persistent AttributeError: 'LlamaAttention' object has no attribute 'rotary_emb' | 4 | 18 | February 13, 2026 |
| Error while importing "Trainer" | 1 | 83 | February 13, 2026 |
| [LLaVA-1.5] Very low hallucination rate & weak attention correlation in "Attention Gap" experiment – Is my implementation of output_attentions correct? | 4 | 26 | February 12, 2026 |
| Confusion with freezing Whisper's feature encoder | 3 | 24 | February 11, 2026 |
| When using Whisper, pipeline notifies that generation_config default values have been modified, even for base models | 4 | 47 | February 8, 2026 |
| Hyperparameters vs message format prompt tuning | 2 | 29 | February 6, 2026 |
| SFT Conversation llama3-8b-Instruct fails with assistant_only_loss=True | 2 | 92 | February 5, 2026 |
| How to train T5 to distinguish task-relevant tokens from contextual noise? | 1 | 21 | February 5, 2026 |
| Finetuning whisper attention mask not set and cannot be inferred | 5 | 6195 | February 4, 2026 |
| Abnormal generation after multi GPU | 4 | 46 | February 4, 2026 |
| 500 Internal Error - We're working hard to fix this as soon as possible | 46 | 3240 | February 1, 2026 |
| Caching image prototype embeddings for image-guided object detection using OWL-ViT | 3 | 495 | January 31, 2026 |
| [Question] How to specify 'model_type' of 'Qwen/Qwen3-VL-8B-Instruct-GGUF'? | 4 | 81 | January 30, 2026 |
| SAM3Video: CLIPTextModelOutput passed as tensor causes crash with text prompts | 0 | 45 | January 29, 2026 |
| Different lm_head size and vocab_size | 1 | 919 | January 28, 2026 |