liweiqing

lwq

AI & ML interests

None yet

Recent Activity

upvoted a paper 1 day ago

DVAO: Dynamic Variance-adaptive Advantage Optimization for Multi-reward Reinforcement Learning

liked a model 5 months ago

ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-3B-v1

new activity 6 months ago

ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-3B-v1: If possible, I’d really appreciate it if you could share the related technical report.

Organizations

upvoted a paper 1 day ago

DVAO: Dynamic Variance-adaptive Advantage Optimization for Multi-reward Reinforcement Learning

Paper • 2605.25604 • Published 2 days ago • 121

upvoted a paper 9 months ago

PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning

Paper • 2508.21104 • Published Aug 28, 2025 • 37