Models

9,804

Active filters: dpo, trl

VladShash/deepseek-math-7b-lean-prover-dpo-olmo-3

Text Generation • 7B • Updated 6 days ago • 3.72k • 3

TheBloke/CapybaraHermes-2.5-Mistral-7B-GGUF

7B • Updated Jan 31, 2024 • 2.3k • 126

davidquicast/SmolLM2-FT-DPO-Medicina_es

Text Generation • 0.1B • Updated Jan 9, 2025 • 6 • 1

trentmkelly/gpt-4o-distil-Llama-3.3-70B-Instruct

Text Generation • Updated Feb 2 • 15 • 1

mradermacher/gpt-4o-distil-Llama-3.1-8B-Instruct-PaperWitch-heresy-GGUF

8B • Updated Feb 19 • 764 • 5

lewtun/zephyr-7b-dpo-full

Text Generation • 7B • Updated Jan 5, 2024 • 3

alignment-handbook/zephyr-7b-dpo-full

Text Generation • 7B • Updated Jan 10, 2024 • 18 • 3

alignment-handbook/zephyr-7b-dpo-qlora

Updated Jan 9, 2024 • 7 • 9

amirali1985/gpt-neo-125m_hh_reward

Text Generation • 0.1B • Updated Apr 27, 2024 • 57

lewtun/zephyr-7b-dpo-qlora

Updated Jan 9, 2024 • 38

sambar/zephyr-7b-ipo-lora

Text Generation • Updated Jan 5, 2024 • 1

nlee282/moai-dpo-1.0

Updated Jan 5, 2024 • 6

nikkoyabut/merged_model_dpo

Updated Jan 5, 2024 • 6

sambar/zephyr-7b-ipo-lora-5ep

Text Generation • Updated Jan 6, 2024

alexredna/TinyLlama-1.1B-Chat-v1.0-reasoning-v2-dpo

Text Generation • 1B • Updated Jan 7, 2024 • 8 • 2

AlbelTec/mistral-dpo-old

Updated Jan 7, 2024 • 5

Yaxin1992/mixtral-dpo-1000

Updated Jan 9, 2024 • 5

adhi29/openhermes-mistral-dpo-gptq

Updated Jan 10, 2024

ybelkada/test-tags-model

Text Generation • 1.03M • Updated Jan 9, 2024 • 7

ybelkada/test-tags-model-2

Text Generation • 1.03M • Updated Jan 9, 2024 • 1

justinj92/dpoplatypus-phi2

Text Generation • 3B • Updated Jan 10, 2024

Belred/mistral-dpo

Updated Jan 9, 2024 • 4

lewtun/zephyr-7b-dpo-qlora-8e0975a

Updated Jan 10, 2024 • 8

mecoaoge2/results

Updated Jan 10, 2024 • 3

mecoaoge2/fununun

Updated Jan 10, 2024 • 4

akashkumarbtc/openhermes-mistral-dpo-gptq

Updated Jan 10, 2024

darshan8950/openhermes-mistral-dpo-gptq

Updated Jan 10, 2024

sonu2023/mistral-dpo

Updated Jan 11, 2024

ondevicellm/zephyr-7b-dpo-full

Text Generation • 7B • Updated Jan 12, 2024 • 4

jdang/openhermes-mistral-dpo-gptq

Updated Jan 20, 2024