Galen Guan

HyperFrames: When AI Agents Write HTML, Video Becomes as Easy as Talking

HeyGen open-sourced HyperFrames in March 2026 — a rendering framework that defines videos as HTML. Less than two months later, it has crossed 12,700 GitHub stars. Its core proposition fits in one sentence: Write HTML. Render video. Built for agents.

This isn't yet another "generate video with code" tool. Its design philosophy is more radical — it bets that AI agents are best at writing HTML, so the video definition format should just be HTML.

Why This Matters

Traditional AI video generation follows two paths:

  1. Generative models (Sora, Kling) — pixels from text, creative but uncontrollable, uneditable, non-reproducible.
  2. Template-driven (Canva API, Remotion) — code-controlled video, precise but requires React/TSX that agents struggle with.

HyperFrames takes a third path: HTML-native + deterministic rendering + agent-first design.

Its core insight: LLMs are trained on mountains of HTML. They already "know" how to write it. No need to teach agents React components, custom DSLs, or complex API call sequences. The agent writes HTML like it would a web page, and the framework turns it into video.

A Video Is Just HTML

Here's the complete data structure of a HyperFrames video:

<div id="root" data-composition-id="intro"
     data-start="0" data-width="1920" data-height="1080">

  <h1 id="title" class="clip"
      data-start="0" data-duration="5" data-track-index="0"
      style="font-size: 72px; color: white;">
    Hello, HyperFrames!
  </h1>

  <audio id="bg-music" data-start="0" data-duration="5"
         data-track-index="1" data-volume="0.5" src="music.wav">
  </audio>
</div>

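<script>
  // Paused timeline, so it can be seeked to an exact time instead of playing in real time.
  const tl = gsap.timeline({ paused: true });
  // Fade the title in from 50px above over the first second.
  tl.from("#title", { opacity: 0, y: -50, duration: 1 }, 0);
  // Create the registry if it is missing, then register under the composition id.
  window.__timelines = window.__timelines || {};
  window.__timelines["intro"] = tl;
</script>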

That's it. data-start sets when a clip begins, data-duration sets how long it lasts, data-track-index sets its layer, and GSAP drives the animation. The file previews in a browser as-is, and npx hyperframes render outputs an MP4.
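To make the timing attributes concrete, here is a minimal sketch of how a renderer might decide which clips are active at a given moment. It is not HyperFrames' actual implementation: the activeClipsAt helper and the plain-object clip records are hypothetical, and treating each clip as active on the half-open interval [start, start + duration) is an assumption.

```javascript
// Hypothetical clip records mirroring the data-* attributes in the example above.
const clips = [
  { id: "title", start: 0, duration: 5, trackIndex: 0 },
  { id: "bg-music", start: 0, duration: 5, trackIndex: 1 },
];

// Return the ids of clips active at `time` (seconds), ordered by track index.
// Assumption: a clip is active on the half-open interval [start, start + duration).
function activeClipsAt(clipList, time) {
  return clipList
    .filter((c) => time >= c.start && time < c.start + c.duration)
    .sort((a, b) => a.trackIndex - b.trackIndex)
    .map((c) => c.id);
}

console.log(activeClipsAt(clips, 2.5)); // both clips are active mid-composition
console.log(activeClipsAt(clips, 5));   // empty: the 5-second clips have ended
```

Because the answer depends only on the attributes and the requested time, the same question asked twice yields the same result.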

The Elegance of AI Agent Integration

This is HyperFrames' true killer feature. Compare the "cognitive burden" for an agent to produce the same effect:

| Task | Remotion (React) | HyperFrames (HTML) |
| --- | --- | --- |
| Define an animated title | TSX component + React state + useCurrentFrame() | An <h1> + data-* attributes |
| Add background music | <Audio> component + import + src path | <audio> tag |
| Animation control | interpolate() + spring() | GSAP timeline (massive training data) |
| Build step | Bundler required (Webpack/Vite) | None, index.html previews directly |
| Render video | npx remotion render | npx hyperframes render |

The critical difference: the agent's ability to generate HTML is already baked in through pre-training. No few-shot examples needed, no fine-tuning, no elaborate system prompt teaching it "how to write Remotion components." You tell the agent "make a 10-second product intro," it outputs HTML, the framework renders it.

The HyperFrames team even built a complete Skills system — installable agent skill documents that teach agents how to write correct compositions:

npx skills add heygen-com/hyperframes

After installation, Claude Code, Cursor, and Codex can create videos via the /hyperframes command. This isn't an agent calling a video API; it is the agent writing the video's source code directly.

The Rendering Pipeline: Determinism Is Key

HyperFrames' rendering engine does one crucial thing: completely eliminates time dependencies.

frame = floor(time × fps)

Every frame is captured independently. The engine launches headless Chrome via Puppeteer, captures frames one by one using the beginFrame API, then encodes to MP4 via FFmpeg. No wall-clock dependencies, no frame drift from animation stuttering. Same input always produces identical output.

This is essential for agent workflows — you can't accept "slightly different results each run" for video output.
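The idea can be illustrated with a small sketch. It does not use Puppeteer or FFmpeg and is not the HyperFrames engine: timeForFrame, frameForTime, titleOpacityAt, and renderFrames are hypothetical names, and the animation is a stand-in pure function of time. The point is only that when every frame is derived from its index rather than from a clock, two runs produce the same frames.

```javascript
// Map a frame index to the timestamp the page would be seeked to.
function timeForFrame(frame, fps) {
  return frame / fps;
}

// The formula from the text: frame = floor(time × fps).
// The small epsilon is an assumption, added to absorb floating-point error.
function frameForTime(time, fps) {
  return Math.floor(time * fps + 1e-9);
}

// Stand-in for a seekable animation: opacity fades in over the first second.
// It is a pure function of time, so it never reads the wall clock.
function titleOpacityAt(time) {
  return Math.min(1, Math.max(0, time));
}

// "Render" by visiting each frame index and sampling the state at its timestamp.
function renderFrames(durationSeconds, fps) {
  const total = Math.round(durationSeconds * fps);
  const frames = [];
  for (let frame = 0; frame < total; frame++) {
    const t = timeForFrame(frame, fps);
    frames.push({ frame, opacity: titleOpacityAt(t) });
  }
  return frames;
}

const firstRun = renderFrames(16, 30);
const secondRun = renderFrames(16, 30);
console.log(firstRun.length);
console.log(JSON.stringify(firstRun) === JSON.stringify(secondRun));
```

A real pipeline would capture a screenshot at each step instead of sampling a number, but the loop has the same shape.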

Running a Demo Locally

I built a 16-second demo video with HyperFrames: 4 scenes, each with GSAP entrance animations.

The entire process used a single HTML file. Going from npx hyperframes init to a finished render took under 3 minutes. The code is one index.html at 200 lines, mostly CSS and GSAP animation.

The core workflow is four steps:

  1. Define scenes: <div class="clip" data-start="0" data-duration="4">
  2. Write styles: standard CSS such as gradients, Flexbox, and fonts
  3. Add animation: gsap.from("#title", { y: 60, opacity: 0 })
  4. Render: npx hyperframes render --output demo.mp4
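Step 1 of the workflow above can be sketched as a small helper that lays scenes end to end and emits one clip element per scene. This is illustrative only: layoutScenes and sceneMarkup are hypothetical helpers, not part of HyperFrames, and four equal 4-second scenes is an assumption consistent with a 16-second video.

```javascript
// Lay scenes end to end: each scene starts where the previous one finished.
function layoutScenes(durations) {
  let cursor = 0;
  return durations.map((duration, index) => {
    const scene = { id: `scene-${index + 1}`, start: cursor, duration };
    cursor += duration;
    return scene;
  });
}

// Emit one clip element per scene, using the attributes shown in step 1.
function sceneMarkup(scene) {
  return `<div id="${scene.id}" class="clip" data-start="${scene.start}" data-duration="${scene.duration}"></div>`;
}

// Assumption: four 4-second scenes, 16 seconds in total.
const scenes = layoutScenes([4, 4, 4, 4]);
console.log(scenes.map(sceneMarkup).join("\n"));
```

Computing data-start from the running total keeps scenes from overlapping when a duration changes.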

HyperFrames vs Remotion

HyperFrames is clearly inspired by Remotion (source code retains attribution comments). Both use headless Chrome for deterministic rendering. The core difference is what the author writes:

| | HyperFrames | Remotion |
| --- | --- | --- |
| Authoring | HTML + CSS + GSAP | React components (TSX) |
| Build step | None; index.html plays as-is | Bundler required |
| Library animations | Frame-accurate seekable | Wall-clock during render |
| HTML/CSS passthrough | Paste and animate | Rewrite as JSX |
| Distributed rendering | Single-machine today | Lambda, production-ready |
| License | Apache 2.0 (fully open) | Source-available, paid tiers |

For agent-driven workflows, HTML-native is a decisive advantage. React components require understanding component lifecycles, hooks, and import paths — common failure modes for agents. HTML's fault tolerance is significantly higher.

The Ecosystem

HyperFrames isn't just a rendering engine — it's building a complete ecosystem:

  • 50+ reusable blocks — social overlays, shader transitions, data visualizations, cinematic effects; install with npx hyperframes add <name>
  • Studio editor — browser-based visual editor with hot reload
  • Player Web Component — embeddable <hyperframes-player> tag
  • Skills system — 6 agent skills covering composition authoring, CLI operations, GSAP animation, Remotion migration, and more
  • MCP integration — exposes tools to agents via Model Context Protocol

Best Use Cases

HyperFrames excels at:

  • Agent-generated videos — GitHub repo demos, data visualizations, product intros
  • Programmatic batch video — same template, different data; deterministic rendering guarantees consistency
  • HTML-to-video — web pages, emails, reports turned into video directly
  • Rapid prototyping — no After Effects needed, usable video in minutes

Less ideal for: large-scale distributed production rendering (single-machine only today), pixel-perfect cinematic post-production.
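The batch case (same template, different data) can be sketched as a function that fills the composition markup from a record. This is a minimal illustration, not a HyperFrames API: compositionFor, escapeHtml, and the product records are all hypothetical, and the markup simply reuses the attributes from the earlier example.

```javascript
// Escape text before placing it into HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Fill one composition template from a data record (hypothetical shape).
function compositionFor(record) {
  const id = escapeHtml(record.id);
  return [
    `<div id="root" data-composition-id="${id}"`,
    `     data-start="0" data-width="1920" data-height="1080">`,
    `  <h1 id="title" class="clip"`,
    `      data-start="0" data-duration="${Number(record.duration)}" data-track-index="0">`,
    `    ${escapeHtml(record.title)}`,
    `  </h1>`,
    `</div>`,
  ].join("\n");
}

// Made-up sample data for illustration.
const products = [
  { id: "intro-widget", title: "Meet the Widget", duration: 5 },
  { id: "intro-gadget", title: "Gadgets & Gizmos", duration: 8 },
];
const pages = products.map(compositionFor);
console.log(pages[1]);
```

Each generated string would be written to its own HTML file and rendered separately; since the template is a pure function of the record, the same data yields the same markup on every run.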

Getting Started

# Fastest: let an agent do it
npx skills add heygen-com/hyperframes
# Then describe the video you want

# Manual
npx hyperframes init my-video
cd my-video
npx hyperframes preview      # browser preview
npx hyperframes render       # render to MP4

Requirements: Node.js >= 22, FFmpeg. Licensed under Apache 2.0 — fully open source, commercially usable.


HyperFrames represents an interesting trend: AI tools are being redesigned around LLM-native capabilities. Instead of teaching agents new APIs, tools are adapting to formats agents already understand. HTML is just the beginning — next could be SQL, SVG, or any structured language LLMs have already "learned."
