{
  "video": "video-6e65bc4d.mp4",
  "description": "This video shows a presentation slide or page from a research paper titled **\"TokenDial: Continuous Attribute Control in Text-to-Video via Spatiotemporal Token Offsets\"**.\n\nThe presentation is authored by researchers including Zhiyuan Lu, Peter Scheldenhreins, Yijun Li, Long Mai, Aniruddha Mahapatra, Csauh Han, Jean Oh, and Jui-Hsien Wang, who are affiliated with Adobe Research and Carnegie Mellon University.\n\nThe main visual content consists of three example images arranged side-by-side, demonstrating the capabilities of the \"TokenDial\" method. Below each image, there is a text prompt or description indicating what is being controlled or generated.\n\nHere is a detailed breakdown of what is shown:\n\n**Overall Context:**\nThe slide introduces a technique, \"TokenDial,\" which allows for continuous control over attributes in text-to-video models by using \"spatiotemporal token offsets.\"\n\n**The Three Examples (From Left to Right):**\n\n1.  **Example 1: Making the cat more kitten-like**\n    *   **Image:** A photo of a cat.\n    *   **Prompt/Control:** \"Make the cat more kitten-like\"\n    *   **Implication:** This demonstrates the model's ability to modify an existing subject (a cat) based on a specific descriptive attribute (\"kitten-like\") in a continuous manner.\n\n2.  **Example 2: Making the person older**\n    *   **Image:** A photo of a person (appears to be a middle-aged or slightly younger individual).\n    *   **Prompt/Control:** \"Make the person older\"\n    *   **Implication:** This shows control over an age-related attribute, allowing the model to age the subject within the video generation.\n\n3.  **Example 3: Making the dog furrier**\n    *   **Image:** A photo of a dark-colored dog (likely a shepherd or similar breed).\n    *   **Prompt/Control:** \"Make the dog furrier\"\n    *   **Implication:** This illustrates attribute manipulation concerning physical texture or appearance (specifically, the density or fluffiness of the fur).\n\n**Concluding Text:**\nThe slide concludes with a summary sentence:\n\"Our TokenDial framework turns pretrained text-to-video models into continuous video editors. Explore our applications below:\"\n\n**In summary, the video is a promotional or introductory segment for a machine learning paper showcasing that their new framework, TokenDial, can act as an editor for text-to-video generation, allowing users to continuously modify attributes such as age, kitten-like appearance, or fur density of subjects in generated video content.**",
  "codec": "av1",
  "transcoded": true,
  "elapsed_s": 15.1
}