Parallel Domain says autonomous driving won’t scale without synthetic data • TechCrunch

Achieving safe autonomous driving requires almost endless hours of training software on every situation that could arise before a vehicle is put on the road. Historically, autonomous vehicle companies have collected vast amounts of real-world data with which to train their algorithms, but it’s impossible to train a system to handle edge cases based on real-world data alone. Not only that, but it takes a lot of time to collect, sort, and label all that data in the first place.

Most autonomous vehicle companies, like Cruise, Waymo, and Waabi, use synthetic data to train and test perception models with a speed and level of control not possible with data collected in the real world. Parallel Domain, a startup that has built a data-generation platform for self-driving businesses, says synthetic data is a critical component of scaling the AI that powers vision and perception systems and of preparing them for the unpredictability of the physical world.

The startup just closed a $30 million Series B led by March Capital, with the participation of returning investors Costanoa Ventures, Foundry Group, Calibrate Ventures and Ubiquity Ventures. Parallel Domain has focused on the automotive market, providing synthetic data to some of the major OEMs that build advanced driver assistance systems and self-driving companies that build much more advanced self-driving systems. Now, Parallel Domain is poised to expand into drones and mobile computer vision, according to co-founder and CEO Kevin McNamara.

“We’re also really doubling down on generative AI approaches for content generation,” McNamara told TechCrunch. “How can we use some of the advances in generative AI to bring greater diversity of objects, people, and behaviors into our worlds? Because again, the hardest part here is really, once you have a physically accurate renderer, how are you actually going to create the million different scenarios that a car will have to encounter?”

The startup also wants to hire a team to support its growing customer base in North America, Europe and Asia, according to McNamara.

Building a virtual world

A sample of Parallel Domain synthetic data. Image credit: Parallel Domain

When Parallel Domain was founded in 2017, the startup was hyper-focused on creating virtual worlds based on real-world map data. Over the past five years, Parallel Domain has added to its world generation by filling it with cars, people, different times of day, weather, and the whole range of behaviors that make these worlds interesting. This allows customers, among whom Parallel Domain counts Google, Continental, Woven Planet and Toyota Research Institute, to generate the dynamic camera, radar and lidar data they need to train and test their vision and perception systems, McNamara said.

Parallel Domain’s synthetic data platform consists of two modes: training and testing. During training, customers will describe the high-level parameters – for example, highway driving with 50% rain, 20% at night, and an ambulance in each sequence – on which they wish to train their model and the system will generate hundreds of thousands of examples to meet these parameters.
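To make the training-mode example concrete, here is a minimal sketch of how such high-level parameters could be expanded into per-sequence scenario descriptions. It is an illustration only, not Parallel Domain’s API: `ScenarioSpec`, `build_specs`, and every field name are hypothetical, and the article does not say whether rain and night are sampled independently (this sketch assumes they are).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioSpec:
    """One requested sequence (hypothetical structure for illustration)."""
    road_type: str
    weather: str
    time_of_day: str
    required_actors: tuple


def _quota_labels(n, fraction, on, off, rng):
    # Assign `on` to an exact share of the n sequences, then shuffle the order.
    k = round(n * fraction)
    labels = [on] * k + [off] * (n - k)
    rng.shuffle(labels)
    return labels


def build_specs(n, rain_fraction=0.5, night_fraction=0.2, seed=0):
    """Expand high-level parameters into n highway scenario specs."""
    rng = random.Random(seed)
    weather = _quota_labels(n, rain_fraction, "rain", "clear", rng)
    time_of_day = _quota_labels(n, night_fraction, "night", "day", rng)
    # Every sequence must contain an ambulance, as in the article's example.
    return [
        ScenarioSpec("highway", w, t, ("ambulance",))
        for w, t in zip(weather, time_of_day)
    ]


specs = build_specs(1000)
rainy = sum(s.weather == "rain" for s in specs)
night = sum(s.time_of_day == "night" for s in specs)
print(f"{len(specs)} sequences: {rainy} rainy, {night} at night")
```

Using exact quotas rather than independent coin flips means the requested percentages hold for the batch as a whole, which is one reasonable reading of “50% rain, 20% at night.”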

On the testing side, Parallel Domain offers an API that allows the customer to control the placement of dynamic elements in the world, which can then be plugged into their simulator to test specific scenarios.

Waymo, for example, is particularly fond of using synthetic data to test different weather conditions, the company told TechCrunch. (Disclaimer: Waymo is not a confirmed Parallel Domain customer.) Waymo considers weather a new lens that it can apply to all the miles it has driven in the real world and in simulation, because it would be impossible to collect all those experiences again under arbitrary weather conditions.

Whether for testing or training, each time Parallel Domain’s software creates a simulation, it is able to automatically generate labels corresponding to each simulated agent. This helps machine learning teams perform supervised learning and testing without having to go through the arduous process of labeling the data themselves.
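The reason labels come for free in simulation is that the simulator already knows where every agent is. The sketch below illustrates the idea under simplifying assumptions: agents are axis-aligned 3D boxes given in the camera frame, and a plain pinhole camera model projects them into 2D bounding boxes. The `Agent` class, the `auto_label` function, and the camera parameters are hypothetical and do not describe Parallel Domain’s actual pipeline.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Agent:
    """A simulated agent as an axis-aligned box in the camera frame (meters)."""
    category: str
    center: tuple  # (x, y, z), with z pointing forward from the camera
    size: tuple    # full extent along (x, y, z)


def auto_label(agents, fx, fy, cx, cy):
    """Derive 2D box labels directly from simulator state (pinhole model)."""
    labels = []
    for agent in agents:
        us, vs = [], []
        visible = True
        # Project all eight corners of the 3D box into the image.
        for sx, sy, sz in product((-0.5, 0.5), repeat=3):
            x = agent.center[0] + sx * agent.size[0]
            y = agent.center[1] + sy * agent.size[1]
            z = agent.center[2] + sz * agent.size[2]
            if z <= 0:  # corner at or behind the camera: skip this agent
                visible = False
                break
            us.append(fx * x / z + cx)
            vs.append(fy * y / z + cy)
        if visible:
            labels.append({
                "category": agent.category,
                "bbox": (min(us), min(vs), max(us), max(vs)),
            })
    return labels


scene = [
    Agent("ambulance", (0.0, 0.0, 10.0), (2.0, 2.0, 2.0)),
    Agent("pedestrian", (1.0, 0.0, -5.0), (0.5, 1.8, 0.5)),  # behind camera
]
for label in auto_label(scene, fx=1000.0, fy=1000.0, cx=640.0, cy=360.0):
    print(label)
```

A production system would also handle occlusion, image-boundary clipping, and other label types such as segmentation masks or lidar point annotations, none of which are modeled here.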

Parallel Domain envisions a world in which autonomous vehicle companies use synthetic data for most, if not all, of their training and testing needs. Today, the ratio of synthetic data to real-world data varies from company to company. More established companies with the historical resources to have collected a lot of data use synthetic data for around 20% to 40% of their needs, while companies that are earlier in their product development process rely on about 80% synthetic data versus 20% real-world data, according to McNamara.

Julia Klein, a partner at March Capital and now a board member of Parallel Domain, said she believes synthetic data will play a critical role in the future of machine learning.

“Getting the real-world data you need to train computer vision models is often a hurdle and there are delays in terms of being able to get that data, label that data, and get it to a position where it can actually be used,” Klein told TechCrunch. “What we’ve seen with Parallel Domain is that they speed up that process dramatically, and they also address things that you might not even get in real-world datasets.”