Abhranil Chandra PRO
abhranil14
AI & ML interests
Reinforcement Learning, Deep Unsupervised Learning, NLP and Bayesian Deep Learning
Organizations
Augmenting Pretrained FMs with Post-Training/RL
- AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO
  Paper • 2502.14669 • Published • 15
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
  Paper • 2503.05592 • Published • 27
- Offline Reinforcement Learning for LLM Multi-Step Reasoning
  Paper • 2412.16145 • Published • 38
- OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
  Paper • 2503.17352 • Published • 24
RL/FM/Agent Data/Benchmark
models 60
abhranil14/L8B_on_MBPP_Code_G27B_IT_H_Paraphrased_subset_W_354_BS_64_lr_2e-5_epoch10_linear_schedule
abhranil14/G2B_on_CODE_MBPP_G_601_subset_wrt_G_601_BS_64_lr_2e-5_epoch10_linear_schedule
abhranil14/G2B_on_CODE_MBPP_H_774_subset_wrt_G_601_BS_64_lr_2e-5_epoch10_linear_schedule
abhranil14/L8B_on_CODE_MBPP_H_774_subset_wrt_G_601_BS_64_lr_2e-5_epoch10_linear_schedule
abhranil14/L8B_on_CODE_MBPP_G_601_subset_wrt_G_601_BS_64_lr_2e-5_epoch10_linear_schedule
abhranil14/L8B_on_CODE_MBPP_G_601_subset_wrt_W_354_BS_64_lr_2e-5_epoch10_linear_schedule
abhranil14/L8B_on_CODE_MBPP_H_774_subset_wrt_W_354_BS_64_lr_2e-5_epoch10_linear_schedule
abhranil14/L8B_on_CODE_MBPP_W_354_subset_wrt_W_354_BS_64_lr_2e-5_epoch10_linear_schedule
abhranil14/G2B_on_CODE_MBPP_H_774_subset_wrt_W_354_BS_64_lr_2e-5_epoch10_linear_schedule
abhranil14/gemma2_2B_FF_gemini_flash_gold_7114_batch256_lr10e-6_warmup0.1_max_tokens_2048
datasets 5
abhranil14/VideoAgent_Data • Preview • 19
abhranil14/syn_qs_and_soln_cleaned_0_and_less20_multiple_soln_per_qs_1937545 • Viewer • 1.94M • 12
abhranil14/syn_qs_and_soln_cleaned_0_and_less20_1_soln_per_qs_131845 • Viewer • 132k • 13
abhranil14/instruct-human-assistant-prompt-clean-105k • Viewer • 105k • 8
abhranil14/first-instruct-human-assistant-prompt-clean-33k • Viewer • 33.1k • 2