HyperFrames: When AI Agents Write HTML, Video Becomes as Easy as Talking
HeyGen open-sourced HyperFrames in March 2026 — a rendering framework that defines videos as HTML. Less than two months later, it has crossed 12,700 GitHub stars. Its core proposition fits in one sentence: Write HTML. Render video. Built for agents.
This isn't yet another "generate video with code" tool. Its design philosophy is more radical — it bets that AI agents are best at writing HTML, so the video definition format should just be HTML.
Why This Matters
Traditional AI video generation follows two paths:
- Generative models (Sora, Kling) — pixels from text, creative but uncontrollable, uneditable, non-reproducible.
- Template-driven (Canva API, Remotion): code-controlled video that is precise, but in Remotion's case requires React/TSX, which agents struggle with.
HyperFrames takes a third path: HTML-native + deterministic rendering + agent-first design.
Its core insight: LLMs are trained on mountains of HTML. They already "know" how to write it. No need to teach agents React components, custom DSLs, or complex API call sequences. The agent writes HTML like it would a web page, and the framework turns it into video.
A Video Is Just HTML
Here's the complete data structure of a HyperFrames video:
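<div id="root" data-composition-id="intro"
     data-start="0" data-width="1920" data-height="1080">
  <!-- Title clip: starts at 0 s, lasts 5 s, on track 0 -->
  <h1 id="title" class="clip"
      data-start="0" data-duration="5" data-track-index="0"
      style="font-size: 72px; color: white;">
    Hello, HyperFrames!
  </h1>
  <!-- Background music on track 1 at half volume -->
  <audio id="bg-music" data-start="0" data-duration="5"
         data-track-index="1" data-volume="0.5" src="music.wav">
  </audio>
</div>
<script>
  // Assumes GSAP is already loaded on the page.
  // Defensive guard: create the timeline registry if it does not exist yet.
  window.__timelines = window.__timelines || {};
  // Paused timeline, so playback is driven by seeking rather than the wall clock.
  const tl = gsap.timeline({ paused: true });
  // Fade and slide the title in during the first second.
  tl.from("#title", { opacity: 0, y: -50, duration: 1 }, 0);
  // Register the timeline under the composition id.
  window.__timelines["intro"] = tl;
</script>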
That's it. data-start controls timing, data-duration controls length, data-track-index controls layering, GSAP controls animation. Browser preview works, npx hyperframes render outputs MP4.
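To make the timing attributes concrete, here is a minimal sketch of how a time could be resolved to the set of active clips. The `clips` records and the `activeClips` helper are hypothetical illustrations that mirror the data-* attributes above, not part of the HyperFrames API, and the half-open interval [start, start + duration) is an assumption.

```javascript
// Hypothetical clip records mirroring the data-* attributes in the composition above.
const clips = [
  { id: "title", start: 0, duration: 5, trackIndex: 0 },
  { id: "bg-music", start: 0, duration: 5, trackIndex: 1 },
];

// Return the ids of clips active at time t (seconds), ordered by track index.
// Assumption: a clip is active on the half-open interval [start, start + duration).
function activeClips(clipList, t) {
  return clipList
    .filter((c) => t >= c.start && t < c.start + c.duration)
    .sort((a, b) => a.trackIndex - b.trackIndex)
    .map((c) => c.id);
}

console.log(activeClips(clips, 2.5)); // both clips are active mid-composition
console.log(activeClips(clips, 5));   // nothing is active once the 5 s have elapsed
```

Because the lookup depends only on `t` and the attributes, the same time always resolves to the same clips, which is the property the renderer relies on.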
The Elegance of AI Agent Integration
This is HyperFrames' true killer feature. Compare the "cognitive burden" for an agent to produce the same effect:
| Task | Remotion (React) | HyperFrames (HTML) |
|---|---|---|
| Define an animated title | TSX component + React state + useCurrentFrame() | An <h1> + data-* attributes |
| Add background music | <Audio> component + import + src path | <audio> tag |
| Animation control | interpolate() + spring() | GSAP timeline (massive training data) |
| Build step | Bundler required (Webpack/Vite) | None, index.html previews directly |
| Render video | npx remotion render | npx hyperframes render |
The critical difference: the agent's ability to generate HTML is already baked in through pre-training. No few-shot examples needed, no fine-tuning, no elaborate system prompt teaching it "how to write Remotion components." You tell the agent "make a 10-second product intro," it outputs HTML, the framework renders it.
The HyperFrames team even built a complete Skills system — installable agent skill documents that teach agents how to write correct compositions:
npx skills add heygen-com/hyperframes
After installation, Claude Code, Cursor, and Codex can create videos via the /hyperframes command. This isn't an agent calling a video API; the agent writes the video's source code directly.
The Rendering Pipeline: Determinism Is Key
HyperFrames' rendering engine does one crucial thing: it eliminates wall-clock time dependencies by mapping time to a discrete frame index:
frame = floor(time × fps)
Every frame is captured independently. The engine launches headless Chrome via Puppeteer, captures frames one by one using the beginFrame API, then encodes to MP4 via FFmpeg. No wall-clock dependencies, no frame drift from animation stuttering. Same input always produces identical output.
This is essential for agent workflows — you can't accept "slightly different results each run" for video output.
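The frame = floor(time × fps) relation above can be sketched in a few lines. This is an illustration of the idea under stated assumptions, not HyperFrames' actual engine: `frameToTime`, `timeToFrame`, `renderFrames`, and `titleOpacityAt` are hypothetical names, and a pure function stands in for a seekable GSAP timeline.

```javascript
// Timestamp (seconds) at which a given frame index is sampled.
function frameToTime(frame, fps) {
  return frame / fps;
}

// The relation from the text: frame = floor(time * fps).
function timeToFrame(time, fps) {
  return Math.floor(time * fps);
}

// Pure stand-in for a seekable animation: opacity ramps linearly 0 -> 1 over the first second.
function titleOpacityAt(t) {
  return Math.min(Math.max(t, 0), 1);
}

// Sample every frame by seeking to its timestamp; no timers or wall clock involved.
function renderFrames(durationSeconds, fps, sampleAt) {
  const totalFrames = Math.round(durationSeconds * fps);
  const frames = [];
  for (let i = 0; i < totalFrames; i++) {
    frames.push(sampleAt(frameToTime(i, fps)));
  }
  return frames;
}

const frames = renderFrames(5, 30, titleOpacityAt);
console.log(frames.length);        // 150
console.log(timeToFrame(1.5, 30)); // 45
```

Since each sample is a function of the frame index alone, rendering twice yields the same sequence regardless of how long any individual frame takes to capture.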
Running a Demo Locally
I built a 16-second demo video with HyperFrames: 4 scenes, each with GSAP entrance animations.
The entire process used a single HTML file. From npx hyperframes init to render completion: under 3 minutes. Code volume: one index.html at 200 lines, mostly CSS and GSAP animation.
The core workflow is four steps:
- Define scenes: <div class="clip" data-start="0" data-duration="4">
- Write styles: standard CSS (gradients, Flexbox, fonts)
- Add animation: gsap.from("#title", { y: 60, opacity: 0 })
- Render: npx hyperframes render --output demo.mp4
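For the scene-definition step, the data-start values of back-to-back scenes follow from their durations. The sketch below assumes four equal 4-second scenes (consistent with the 16-second total, though the exact split is an assumption) and uses a hypothetical `layoutScenes` helper that is not part of HyperFrames.

```javascript
// Compute sequential start times for scenes that play one after another.
// Each result maps onto a clip's data-start and data-duration attributes.
function layoutScenes(durations) {
  let cursor = 0;
  return durations.map((duration, index) => {
    const scene = { index, start: cursor, duration };
    cursor += duration;
    return scene;
  });
}

// Assumed split: four scenes of 4 seconds each.
const scenes = layoutScenes([4, 4, 4, 4]);
for (const s of scenes) {
  console.log(`<div class="clip" data-start="${s.start}" data-duration="${s.duration}">`);
}
```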
HyperFrames vs Remotion
HyperFrames is clearly inspired by Remotion (source code retains attribution comments). Both use headless Chrome for deterministic rendering. The core difference is what the author writes:
| | HyperFrames | Remotion |
|---|---|---|
| Authoring | HTML + CSS + GSAP | React components (TSX) |
| Build step | None; index.html plays as-is | Bundler required |
| Library animations | Frame-accurate seekable | Wall-clock during render |
| HTML/CSS passthrough | Paste and animate | Rewrite as JSX |
| Distributed rendering | Single-machine today | Lambda, production-ready |
| License | Apache 2.0 (fully open) | Source-available, paid tiers |
For agent-driven workflows, HTML-native is a decisive advantage. React components require understanding component lifecycles, hooks, and import paths, all common failure modes for agents. HTML is far more forgiving: browsers recover from malformed markup, whereas a single type or import error stops a TSX build.
The Ecosystem
HyperFrames isn't just a rendering engine — it's building a complete ecosystem:
- 50+ reusable blocks: social overlays, shader transitions, data visualizations, cinematic effects; install with npx hyperframes add <name>
- Studio editor: browser-based visual editor with hot reload
- Player Web Component: embeddable <hyperframes-player> tag
- Skills system: 6 agent skills covering composition authoring, CLI operations, GSAP animation, Remotion migration, and more
- MCP integration: exposes tools to agents via Model Context Protocol
Best Use Cases
HyperFrames excels at:
- Agent-generated videos — GitHub repo demos, data visualizations, product intros
- Programmatic batch video — same template, different data; deterministic rendering guarantees consistency
- HTML-to-video — web pages, emails, reports turned into video directly
- Rapid prototyping — no After Effects needed, usable video in minutes
Less ideal for: large-scale distributed production rendering (single-machine only today), pixel-perfect cinematic post-production.
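The "same template, different data" case can be sketched as a plain string template that stamps out one composition per record. The `escapeHtml` and `buildComposition` helpers and the sample `products` data are hypothetical; only the data-* attribute names come from the composition shown earlier.

```javascript
// Escape text so user data cannot break the generated markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build one composition from a data record (hypothetical template).
function buildComposition({ id, title, duration }) {
  return [
    `<div id="root" data-composition-id="${escapeHtml(id)}"`,
    `     data-start="0" data-width="1920" data-height="1080">`,
    `  <h1 id="title" class="clip"`,
    `      data-start="0" data-duration="${Number(duration)}" data-track-index="0">`,
    `    ${escapeHtml(title)}`,
    `  </h1>`,
    `</div>`,
  ].join("\n");
}

// Made-up sample data: one record per video to generate.
const products = [
  { id: "intro-alpha", title: "Alpha <v2>", duration: 5 },
  { id: "intro-beta", title: "Beta & Friends", duration: 8 },
];

const pages = products.map(buildComposition);
console.log(pages[0]);
```

Each generated string would be written to its own index.html and rendered separately; because rendering is deterministic, re-running the batch with unchanged data should reproduce the same videos.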
Getting Started
# Fastest: let an agent do it
npx skills add heygen-com/hyperframes
# Then describe the video you want
# Manual
npx hyperframes init my-video
cd my-video
npx hyperframes preview # browser preview
npx hyperframes render # render to MP4
Requirements: Node.js >= 22, FFmpeg. Licensed under Apache 2.0 — fully open source, commercially usable.
HyperFrames represents an interesting trend: AI tools are being redesigned around LLM-native capabilities. Instead of teaching agents new APIs, tools are adapting to formats agents already understand. HTML is just the beginning — next could be SQL, SVG, or any structured language LLMs have already "learned."