S2L-PO model weights (ICML 2026). Qwen3-8B/14B trained with small-model explorers.
qishisuren
Recent Activity
Upvoted a paper 1 day ago: Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO
Submitted a paper 1 day ago: Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO
Updated a model 19 days ago: qishisuren/Qwen3-14B-S2L-PO-4Bexplorer