Active filters: dpo, trl
VladShash/deepseek-math-7b-lean-prover-dpo-olmo-3
Text Generation
• 7B • Updated • 3.72k
• 3
TheBloke/CapybaraHermes-2.5-Mistral-7B-GGUF
7B • Updated • 2.3k
• 126
davidquicast/SmolLM2-FT-DPO-Medicina_es
Text Generation
• 0.1B • Updated • 6
• 1
trentmkelly/gpt-4o-distil-Llama-3.3-70B-Instruct
Text Generation
• Updated • 15
• 1
mradermacher/gpt-4o-distil-Llama-3.1-8B-Instruct-PaperWitch-heresy-GGUF
8B • Updated • 764
• 5
lewtun/zephyr-7b-dpo-full
Text Generation
• 7B • Updated • 3
alignment-handbook/zephyr-7b-dpo-full
Text Generation
• 7B • Updated • 18
• 3
alignment-handbook/zephyr-7b-dpo-qlora
Updated • 7
• 9
amirali1985/gpt-neo-125m_hh_reward
Text Generation
• 0.1B • Updated • 57
lewtun/zephyr-7b-dpo-qlora
sambar/zephyr-7b-ipo-lora
Text Generation
• Updated • 1
nikkoyabut/merged_model_dpo
sambar/zephyr-7b-ipo-lora-5ep
Text Generation
• Updated
alexredna/TinyLlama-1.1B-Chat-v1.0-reasoning-v2-dpo
Text Generation
• 1B • Updated • 8
• 2
Yaxin1992/mixtral-dpo-1000
adhi29/openhermes-mistral-dpo-gptq
Updated
Text Generation
• 1.03M • Updated • 7
ybelkada/test-tags-model-2
Text Generation
• 1.03M • Updated • 1
justinj92/dpoplatypus-phi2
Text Generation
• 3B • Updated
lewtun/zephyr-7b-dpo-qlora-8e0975a
akashkumarbtc/openhermes-mistral-dpo-gptq
Updated
darshan8950/openhermes-mistral-dpo-gptq
Updated
ondevicellm/zephyr-7b-dpo-full
Text Generation
• 7B • Updated • 4
jdang/openhermes-mistral-dpo-gptq
Updated