Video Creation by Demonstration

More Results

Additional Qualitative Results

Something-Something v2

Epic Kitchens 100


Bottleneck Design Ablation

Qualitative results for the bottleneck ablation. Using no bottleneck ("None") or a temporal normalization bottleneck ("Temp. Norm.") leads to appearance leakage, whereas generation with our appearance bottleneck preserves the input context.

Comparisons with Prior Works

Qualitative comparisons of \(\delta\)-Diffusion against MotionDirector and WALT.

Something-Something v2

For MotionDirector and WALT, ground-truth captions are additionally provided during inference:
Row 1: "pushing a cloth clip from right to left".
Row 2: "moving phone up".

Epic Kitchens 100

For MotionDirector and WALT, ground-truth captions are additionally provided during inference:
Row 1: "put oregano back".
Row 2: "wash knife".


For MotionDirector and WALT, ground-truth captions are additionally provided during inference:
Row 1: "knock blue plastic bottle over".
Row 2: "knock water bottle over".

Failure Cases

We show failure cases in which the demonstration and the context image are mismatched (row 1), the semantics of the action concept are not fully carried out (row 2), and object permanence is not maintained for objects whose appearance changes rapidly (row 3).