Google’s URL2Video can turn websites into videos using AI

Google has some sweet new artificial intelligence technology that can take elements of a website and convert them into a really slick video.

In this multi-channel world we live in, brands spend an awful lot of time and money reformatting content for different platforms.

A new project from Google Research, recently published on the Google AI Blog, is called URL2Video. It automatically converts a web page into a short video, and the great thing is that it can format that video in different aspect ratios to suit both vertical and horizontal orientations.

The tool inspects the website's code and walks the DOM looking for elements it can use to create the content: headings, images, video and other multimedia. Based on the extracted assets, it then arranges the content in a visually interesting way and renders it as an MP4 video.

This is a really impressive use of AI to cut a business process that would normally take hours down to just seconds. The process is fairly human-like in its ability to make editing decisions on font and color choices, timing, and content ordering from the source page.

At the moment this is simply an exploratory exercise, but it's easy to see how useful this could be, so don't be surprised to see Google offer this as a service in the future.

To understand how it works, you can watch the video, or read the full description below.

URL2Video Overview
Assume a user provides a URL to a web page that illustrates their business. The URL2Video pipeline automatically selects key content from the page and decides the temporal and visual presentation of each asset, based on a set of heuristics derived from an interview study with designers who were familiar with web design and video ad creation.

These designer-informed heuristics capture common video editing styles, including content hierarchy, constraining the amount of information in a shot and its time duration, providing consistent color and style for branding, and more.

Using this information, the URL2Video pipeline parses a web page, analyzing the content and selecting visually salient text or images while preserving their design styles, which it organizes according to the video specifications provided by the user.

Webpage Analysis

Given a webpage URL, URL2Video extracts document object model (DOM) information and multimedia materials. For the purposes of our research prototype, we limited the domain to static web pages that contain salient assets and headings preserved in an HTML hierarchy that follows recent web design principles, which encourage the use of prominent elements, distinct sections, and an order of visual focus that guides readers in perceiving information.

URL2Video identifies such visually-distinguishable elements as a candidate list of asset groups, each of which may contain a heading, a product image, detailed descriptions, and call-to-action buttons, and captures both the raw assets (text and multimedia files) and detailed design specifications (HTML tags, CSS styles, and rendered locations) for each element.
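To make the extraction step a little more concrete, here is a minimal sketch of walking static HTML and collecting headings and images in document order. It is not Google's implementation: the `AssetCollector` class is a hypothetical name, it uses only Python's standard `html.parser`, and it ignores the CSS styles and rendered locations that URL2Video also captures.

```python
from html.parser import HTMLParser


class AssetCollector(HTMLParser):
    """Hypothetical helper: collects headings and image sources in page order."""

    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.assets = []       # (kind, value) tuples in document order
        self._heading = None   # heading tag that is currently open, if any
        self._text = []        # text fragments seen inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._heading = tag
            self._text = []
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.assets.append(("img", src))

    def handle_data(self, data):
        # Only keep text that sits inside an open heading.
        if self._heading:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._heading:
            self.assets.append((tag, "".join(self._text).strip()))
            self._heading = None


collector = AssetCollector()
collector.feed('<h1>Acme <b>Bikes</b></h1><img src="hero.jpg"><h2>Shop now</h2>')
print(collector.assets)
```

A real pipeline would need a rendered page (for example from a headless browser) to get element sizes and positions, which plain HTML parsing cannot provide.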

It then ranks the asset groups by assigning each a priority score based on their visual appearance and annotations, including their HTML tags, rendered sizes, and ordering shown on the page. In this way, an asset group that occupies a larger area at the top of the page receives a higher score.
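The ranking idea can be illustrated with a small sketch. The description does not publish the actual scoring formula, so the tag weights, the `AssetGroup` fields and the way area and position are combined below are all assumptions chosen only to reproduce the stated behaviour: bigger groups nearer the top of the page score higher.

```python
from dataclasses import dataclass

# Hypothetical tag weights; the real values are not given in the description.
TAG_WEIGHTS = {"h1": 3.0, "h2": 2.0, "h3": 1.5, "img": 2.0, "p": 1.0}


@dataclass
class AssetGroup:
    name: str
    tag: str       # dominant HTML tag of the group
    width: float   # rendered width in pixels
    height: float  # rendered height in pixels
    top: float     # rendered offset from the top of the page in pixels


def priority_score(group, page_width, page_height):
    # Share of the page area the group occupies (0..1).
    area_share = (group.width * group.height) / (page_width * page_height)
    # 1.0 for a group at the very top, approaching 0 toward the bottom.
    position = 1.0 - group.top / page_height
    return TAG_WEIGHTS.get(group.tag, 1.0) * (area_share + position)


def rank(groups, page_width, page_height):
    # Highest score first.
    return sorted(
        groups,
        key=lambda g: priority_score(g, page_width, page_height),
        reverse=True,
    )


hero = AssetGroup("hero", "h1", width=1200, height=600, top=0)
footer = AssetGroup("footer", "p", width=1200, height=100, top=2900)
ranked = rank([footer, hero], page_width=1200, page_height=3000)
print([g.name for g in ranked])
```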

Constraints-Based Asset Selection

We consider two goals when composing a video: (1) each video shot should provide concise information, and (2) the visual design should be consistent with the source page.

Based on these goals and the video constraints provided by the user, including the intended video duration (in seconds) and aspect ratio (commonly 16:9, 4:3, 1:1, etc.), URL2Video automatically selects and orders the asset groups to optimize the total priority score.

To make the content concise, it presents only dominant elements from a page, such as a headline and a few multimedia assets.

It constrains the duration of each visual element for viewers to perceive the content. In this way, a short video highlights the most salient information from the top of the page, and a longer video contains more campaigns or products.
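One way to picture selection under a duration constraint is as a small 0/1 knapsack problem: pick the asset groups whose total priority score is highest while their combined screen time fits the requested video length. This is a swapped-in textbook technique used for illustration, not the method URL2Video is confirmed to use, and the scores and per-group durations in the example are made up.

```python
def select_assets(groups, max_seconds):
    """Pick groups maximising total score within max_seconds.

    groups: list of (name, score, seconds) tuples with whole-second durations,
    given in page order. Returns the chosen names, still in page order.
    """
    # best[t] holds (total score, chosen indices) for a budget of t seconds.
    best = [(0.0, [])] * (max_seconds + 1)
    for i, (_name, score, seconds) in enumerate(groups):
        # Walk the budget downward so each group is used at most once.
        for t in range(max_seconds, seconds - 1, -1):
            candidate = best[t - seconds][0] + score
            if candidate > best[t][0]:
                best[t] = (candidate, best[t - seconds][1] + [i])
    return [groups[i][0] for i in best[max_seconds][1]]


# Illustrative (name, score, seconds) values only.
page = [
    ("hero", 3.6, 4),
    ("product A", 1.5, 3),
    ("product B", 1.2, 3),
    ("footer", 0.1, 3),
]
short_video = select_assets(page, max_seconds=6)
long_video = select_assets(page, max_seconds=10)
print(short_video, long_video)
```

With these sample numbers the short budget keeps only the top-of-page hero, while the longer budget has room for the products as well, which matches the behaviour described above.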

Scene Composition & Video Rendering

Given an ordered list of assets based on the DOM hierarchy, URL2Video follows the design heuristics obtained from interview studies to make decisions about both the temporal and spatial arrangement to present the assets in individual shots.

It transfers the graphical layout of elements into the video’s aspect ratio, and applies the style choices including fonts and colors. To make a video more dynamic and engaging, it adjusts the presentation timing of assets. Finally, it renders the content into a video in the MPEG-4 container format.
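As a rough illustration of moving an element into a different aspect ratio, the sketch below scales an element's rendered box to fit inside a video frame without distortion and centres it. The `fit_box` function and the "contain and centre" rule are assumptions for this example; the description does not spell out how URL2Video actually lays out each shot.

```python
def fit_box(src_w, src_h, frame_w, frame_h):
    """Scale a source box to fit a frame, keeping its aspect ratio.

    Returns (x, y, width, height) in frame pixels, centred in the frame.
    """
    # Use the smaller scale factor so the box fits in both dimensions.
    scale = min(frame_w / src_w, frame_h / src_h)
    width = round(src_w * scale)
    height = round(src_h * scale)
    x = (frame_w - width) // 2
    y = (frame_h - height) // 2
    return x, y, width, height


# A 1200x600 hero image placed in a vertical 1080x1920 frame
# and in a horizontal 1920x1080 frame.
vertical = fit_box(1200, 600, 1080, 1920)
horizontal = fit_box(1200, 600, 1920, 1080)
print(vertical, horizontal)
```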

User Control

The interface to the research prototype allows the user to review the design attributes in each video shot extracted from the source page, reorder the materials, change the detailed design, such as colors and fonts, and adjust the constraints to generate a new video.

More information at the Google AI Blog.

Jason Cartwright
Creator of techAU, Jason has spent a dozen+ years covering technology in Australia and around the world. Bringing a background in multimedia and passion for technology to the job, Cartwright delivers detailed product reviews, event coverage and industry news on a daily basis. Disclaimer: Tesla Shareholder from 20/01/2021
