Papers
arxiv:2602.06028

Context Forcing: Consistent Autoregressive Video Generation with Long Context

Published on Feb 5
ยท Submitted by
ShuoChen
on Feb 6
Authors:
,
,
,

Abstract

Context Forcing addresses student-teacher mismatch in long video generation by using a long-context teacher to guide long-rollout students through a Slow-Fast Memory architecture that extends context length beyond 20 seconds.

Recent approaches to real-time long video generation typically employ streaming tuning strategies, attempting to train a long-context student using a short-context (memoryless) teacher. In these frameworks, the student performs long rollouts but receives supervision from a teacher limited to short 5-second windows. This structural discrepancy creates a critical student-teacher mismatch: the teacher's inability to access long-term history prevents it from guiding the student on global temporal dependencies, effectively capping the student's context length. To resolve this, we propose Context Forcing, a novel framework that trains a long-context student via a long-context teacher. By ensuring the teacher is aware of the full generation history, we eliminate the supervision mismatch, enabling the robust training of models capable of long-term consistency. To make this computationally feasible for extreme durations (e.g., 2 minutes), we introduce a context management system that transforms the linearly growing context into a Slow-Fast Memory architecture, significantly reducing visual redundancy. Extensive results demonstrate that our method enables effective context lengths exceeding 20 seconds -- 2 to 10 times longer than state-of-the-art methods like LongLive and Infinite-RoPE. By leveraging this extended context, Context Forcing preserves superior consistency across long durations, surpassing state-of-the-art baselines on various long video evaluation metrics.

Community

Paper author Paper submitter
Paper author

Paper author

Paper author

Paper author

arXivLens breakdown of this paper ๐Ÿ‘‰ https://arxivlens.com/PaperView/Details/context-forcing-consistent-autoregressive-video-generation-with-long-context-9536-7490abff

  • Executive Summary
  • Detailed Breakdown
  • Practical Applications

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face checkout this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Sign up or log in to comment

Get this paper in your agent:

hf papers read 2602.06028
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2602.06028 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2602.06028 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2602.06028 in a Space README.md to link it from this page.

Collections including this paper 2