Helm.ai, a provider of advanced AI software for high-end ADAS, Level 4 autonomous driving and robotic automation, has unveiled a generative AI model that produces highly realistic video sequences of driving scenes for autonomous driving development and validation.
The technology, known as VidGen-1, follows Helm.ai’s earlier announcement of GenSim-1 for AI-generated labeled images and is intended for both prediction tasks and generative simulation.
Trained on thousands of hours of diverse driving footage, the AI video tech leverages deep neural network (DNN) architectures and deep teaching, an unsupervised training technology, to create realistic video sequences of driving scenes. These videos – produced at a resolution of 384 x 640 with variable frame rates up to 30 frames per second and lasting up to several minutes – can be generated either randomly without an input prompt or prompted with a single image or input video.
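Helm.ai has not published a programming interface for VidGen-1, but the prompting modes described above can be sketched as a hypothetical interface for illustration; the `VidGen` class, its methods and every parameter below are invented for this sketch and are not part of any released Helm.ai software.

```python
# Hypothetical sketch only: class, method names and parameters are invented
# to illustrate the prompting modes described in the article; this is not
# a real Helm.ai API.
from dataclasses import dataclass

@dataclass
class GenerationConfig:
    height: int = 384          # output resolution reported for VidGen-1
    width: int = 640
    fps: int = 30              # variable frame rates up to 30 fps
    duration_s: float = 120.0  # clips can run up to several minutes

class VidGen:
    """Placeholder for a generative video model of driving scenes."""

    def generate(self, config: GenerationConfig, prompt=None):
        # prompt=None          -> unconditional ("random") generation
        # prompt=single image  -> continue a scene from one frame
        # prompt=input video   -> extend an existing driving sequence
        num_frames = int(config.fps * config.duration_s)
        raise NotImplementedError("illustrative stub only")

# Usage sketch:
# model = VidGen()
# clip = model.generate(GenerationConfig())                        # unprompted
# clip = model.generate(GenerationConfig(), prompt=first_frame)    # image-prompted
```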
“Combining our deep teaching technology, which we’ve been developing for years, with additional in-house innovation on generative DNN architectures results in a highly effective and scalable method for producing realistic AI-generated videos. Our technology is general and can be applied equally effectively to autonomous driving, robotics and any other domain of video generation without change,” said Helm.ai’s CEO and co-founder, Vladislav Voroninski.
The company says VidGen-1 can generate videos of driving scenes across different geographies and from multiple types of cameras and vehicle perspectives.
The model can produce highly realistic imagery with temporally consistent object motion, and it can learn and reproduce human-like driving behaviors, generating motions of the ego-vehicle and surrounding agents that comply with traffic rules.
It can simulate realistic video footage of various scenarios in multiple international cities, encompassing urban and suburban environments; a variety of vehicles; pedestrians; bicyclists; intersections; turns; weather conditions such as rain and fog; illumination effects such as glare and night driving; and accurate reflections on wet road surfaces, reflective building walls and the hood of the ego-vehicle.
“Predicting the next frame in a video is similar to predicting the next word in a sentence but much more high dimensional,” said Voroninski.
“Generating realistic video sequences of a driving scene represents the most advanced form of prediction for autonomous driving, as it entails accurately modeling the appearance of the real world and includes both intent prediction and path planning as implicit sub-tasks at the highest level of the stack. This capability is crucial for autonomous driving because, fundamentally, driving is about predicting what will happen next.”
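To put Voroninski’s analogy in concrete terms, a rough back-of-the-envelope comparison illustrates the jump in dimensionality (the vocabulary size below is an assumed, typical figure, not one from Helm.ai): a language model picks the next token from a discrete vocabulary, whereas a video model must predict every pixel value of the next frame.

```python
# Rough illustration of "much more high dimensional" (assumed numbers, not
# from Helm.ai): compare the output space of one text token with the output
# space of one RGB frame at VidGen-1's reported 384 x 640 resolution.

vocab_size = 50_000            # typical LLM vocabulary size (assumption)
frame_values = 384 * 640 * 3   # RGB values in a single 384 x 640 frame

print(f"next-word prediction:  choose 1 of {vocab_size:,} tokens")
print(f"next-frame prediction: predict {frame_values:,} pixel values")
print(f"values per second at 30 fps: {frame_values * 30:,}")
```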