papers - a Pechyony Collection

updated 3 days ago

AI for Auto-Research: Roadmap & User Guide

Paper • 2605.18661 • Published May 18 • 67
StableVLA: Towards Robust Vision-Language-Action Models without Extra Data

Paper • 2605.18287 • Published May 18 • 15
MixSD: Mixed Contextual Self-Distillation for Knowledge Injection

Paper • 2605.16865 • Published May 16 • 9
MolmoPoint: Better Pointing for VLMs with Grounding Tokens

Paper • 2603.28069 • Published Mar 30 • 9
VersaViT: Enhancing MLLM Vision Backbones via Task-Guided Optimization

Paper • 2602.09934 • Published Feb 10 • 1
Flash-WAM: Modality-Aware Distillation for World Action Models

Paper • 2606.05254 • Published 16 days ago • 7
Latent Reasoning with Normalizing Flows

Paper • 2606.06447 • Published 15 days ago • 8
Discrete-WAM: Unified Discrete Vision-Action Token Editing for World-Policy Learning

Paper • 2606.05645 • Published 15 days ago • 2
Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?

Paper • 2606.04811 • Published 15 days ago • 17
Cosmos 3: Omnimodal World Models for Physical AI

Paper • 2606.02800 • Published 18 days ago • 122
Qwen-Image-Flash: Beyond Objective Design

Paper • 2606.03746 • Published 17 days ago • 35
Filter, Then Reweight: Rethinking Optimization Granularity in On-Policy Distillation

Paper • 2606.02684 • Published 18 days ago • 16
GRAIL: Generating Humanoid Loco-Manipulation from 3D Assets and Video Priors

Paper • 2606.05160 • Published 16 days ago • 8
Unlocking Feature Learning in Gated Delta Networks at Scale

Paper • 2606.04048 • Published 17 days ago • 2
SpatialAct: Probing Spatial Reasoning-to-Action Capabilities of VLM Agents in 3D Scenes

Paper • 2605.31148 • Published 21 days ago • 3
On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

Paper • 2606.02437 • Published 18 days ago • 230
PhysBrain 1.0 Technical Report

Paper • 2605.15298 • Published May 14 • 143
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Paper • 2605.30280 • Published 22 days ago • 143
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

Paper • 2605.27365 • Published 24 days ago • 142
RLDX-1 Technical Report

Paper • 2605.03269 • Published May 5 • 125
Robots Need More than VLA and World Models

Paper • 2606.06556 • Published 15 days ago • 29
World Pilot: Steering Vision-Language-Action Models with World-Action Priors

Paper • 2606.12403 • Published 9 days ago • 25
Reason, Then Re-reason: Cross-view Revisiting Improves Spatial Reasoning

Paper • 2606.11683 • Published 9 days ago • 30
On Subquadratic Architectures: From Applications to Principles

Paper • 2606.12364 • Published 8 days ago • 23
Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models

Paper • 2606.11324 • Published 10 days ago • 141
World Model Self-Distillation: Training World Models to Solve General Tasks

Paper • 2606.12072 • Published 9 days ago • 13
MiniMax Sparse Attention

Paper • 2606.13392 • Published 8 days ago • 138
WEAVER, Better, Faster, Longer: An Effective World Model for Robotic Manipulation

Paper • 2606.13672 • Published 8 days ago • 3
Hy-Embodied-0.5-VLA: From Vision-Language-Action Models to a Real-World Robot Learning Stack

Paper • 2606.14409 • Published 7 days ago • 12
μ_0: A Scalable 3D Interaction-Trace World Model

Paper • 2606.13769 • Published 8 days ago • 7
World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible

Paper • 2606.13652 • Published 8 days ago • 11