Data Storage Trying to Keep Up with a Speeding Car

Development and testing of autonomous vehicle technology are data intensive, requiring solutions for the management and storage of rapidly expanding volumes of data.

The primary reason for this explosion is the collection of video and LiDAR data, which has to be high-resolution. Naturally, the more sensors an AV carries, the more data it generates. An AV sensor suite may comprise eight cameras, two LiDAR sensors and two radar sensors, or it may comprise just one LiDAR and a couple of cameras.

Depending on the operating conditions, i.e. the operational design domain (ODD) of the automated driving system (ADS), the manufacturers’ choice of sensor suite may differ substantially. For example, the resolution of data required for a low-speed ADS application in a business park (a constrained ODD) may differ from that required for a high-speed application on highways.

“Vehicles are collecting data 24/7 as they create a 3D world via LiDAR and video and there is no easy way to compress data,” explained Ken Obuszewski, global general manager of NetApp’s automotive vertical. “Traditional methods of data reduction don’t work in this context.” He said when it comes to managing this data, it is critical to have rich metadata about the data you have captured in order to optimize the processing of the data, while data tiering and archiving to the cloud are necessary to store and retain massive amounts of data.

Data management methods such as tiering and archiving allow for proper insight into what data gets stored and what data gets discarded. As for storing the data, Obuszewski said that because these huge volumes are generated in multiple locations, the challenge is how to provide and access the data where and when it is needed. In short, flexibility and locality matter.

“Virtual desktop infrastructure is important, sometimes it’s easier to bring the engineer to the data,” he said. “The flexibility and delivery of data is made more efficient by the ability to create a data fabric across the data pipeline, a way to store and move data from the edge to the core and cloud and back again.” Obuszewski said some of the “heavy lifting”, such as AI processing needs, should be centralized for efficient data movement, noting data must be moved in a secure and efficient manner.

In the next 18-24 months, he sees more data processing being done at the edge, capturing event-based data. He also predicted an evolution towards more efficient use of the data, such as identifying and keeping only what is considered valuable, and using AI at the edge to both label and identify event-based data in order to reduce the amount of data being retained, moved, and managed. This includes data intelligence strategies that use AI at the edge to be more targeted in determining what is captured. “These concepts will make the data more valuable and increase the quality of the data you train against,” he said.
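The event-based capture described above can be illustrated with a minimal sketch. It assumes a hypothetical recorder that holds a short rolling window of recent frames and retains data only around a trigger; the class name `EventRecorder`, the frame representation and the `is_event` flag (which in practice might come from an edge AI model) are all illustrative assumptions, not anything described by NetApp.

```python
from collections import deque


class EventRecorder:
    """Keep only frames around an event; everything else is dropped.

    Hypothetical sketch: pre_frames of context before the trigger and
    post_frames after it are retained, the rest ages out of the buffer.
    """

    def __init__(self, pre_frames: int, post_frames: int) -> None:
        # Rolling window; old frames fall off automatically once full.
        self._buffer = deque(maxlen=pre_frames)
        self._post_frames = post_frames
        self._remaining = 0  # post-event frames still to be kept
        self.kept = []       # frames selected for retention

    def push(self, frame, is_event: bool) -> None:
        if is_event:
            # Flush the buffered context, then keep the event frame itself.
            self.kept.extend(self._buffer)
            self._buffer.clear()
            self.kept.append(frame)
            self._remaining = self._post_frames
        elif self._remaining > 0:
            # Still inside the post-event window.
            self.kept.append(frame)
            self._remaining -= 1
        else:
            # No event nearby: hold the frame only temporarily.
            self._buffer.append(frame)


recorder = EventRecorder(pre_frames=2, post_frames=1)
for frame_id in range(10):
    recorder.push(frame_id, is_event=(frame_id == 5))
print(recorder.kept)
```

In this toy run only the frames surrounding the single trigger are retained, which is the sense in which event-based capture reduces what has to be moved and managed.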

Obuszewski noted cloud technologies would also continue to play a major role, with data being centralized within the cloud, providing compute resources and state-of-the-art tools and applications in the development environment. “The cloud is the environment of choice for the data scientist, because it provides flexibility and availability, as well as long term data and cold data storage, offering cost efficiencies, for example, and legal and compliance needs,” he said. “Automakers are looking for ways to efficiently store this explosion of cold data. The cloud provides that option.”

Dr Siddartha Khastgir, head of verification and validation, intelligent vehicles at WMG, University of Warwick, pointed out that depending on the operating conditions, i.e. the ODD of the ADS, the manufacturers’ choice of sensor suite may differ substantially. Khastgir explained that the crucial aspect of AV testing lies not in the quantity of data but in understanding “what data” is useful. “Data storage and management is about storing the ‘right’ data and the challenges are associated within the process of defining the right data and subsequently extracting it,” he said. “Thus, there is now a shift in focus from storing ‘all data’ to storing the ‘right data’.”

Michael Erich Hammer, who leads the Big Data intelligence team at powertrain development, simulation and testing firm AVL, said it is still unclear what the main strategy will be for handling and storing such large amounts of data. “The costs for hot storage of data in the Petabyte range is huge,” he noted. “Therefore, a continuous trade-off between scalability, performance and costs between cloud and on-premise storage has to be expected.”

He noted that transfers of such large amounts of data (>10Gbit/sec created in the vehicle) will not be possible via the network for quite some time. “The workflow contains shipment of physical storage devices from the vehicle to back-office environments,” he explained. “This ‘long loop’ from data creation to information extraction, with potential for delays lasting more than a week, needs to be supported by a ‘short loop’ for live fleet monitoring during operation of the vehicles on road.”
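A rough back-of-the-envelope calculation shows why physical shipment remains part of the workflow. The sketch below takes the 10 Gbit/sec in-vehicle rate quoted above as a lower bound; the eight-hour test shift and the 1 Gbit/sec uplink are illustrative assumptions only, and the function names are hypothetical.

```python
def terabytes_generated(rate_gbit_per_s: float, hours: float) -> float:
    """Data volume in decimal terabytes (1 TB = 1e12 bytes)."""
    bits = rate_gbit_per_s * 1e9 * hours * 3600
    return bits / 8 / 1e12


def upload_hours(volume_tb: float, uplink_gbit_per_s: float) -> float:
    """Hours needed to move volume_tb over a sustained uplink."""
    bits = volume_tb * 1e12 * 8
    return bits / (uplink_gbit_per_s * 1e9) / 3600


# Assumed scenario: one vehicle, one eight-hour shift, 1 Gbit/sec uplink.
volume = terabytes_generated(rate_gbit_per_s=10, hours=8)
print(f"Data per shift: {volume:.0f} TB")
print(f"Upload time:    {upload_hours(volume, 1):.0f} hours")
```

Under these assumptions a single shift produces about 36 TB, which would take roughly 80 hours to upload over a sustained 1 Gbit/sec link, around ten times longer than the shift that generated it.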

Like Obuszewski, he said that when it comes to scalability in extracting information from the data, there is “no doubt” about the advantages of cloud-based services. “We expect to see hybrid solutions where workloads are executed on premise and in cloud environments,” he said. “One important point is to first reduce the data necessary to store.”

This means extracting relevant driving situations, or so-called “scenarios”, which are key to validating AVs. “For example, highly dynamic scenarios with more traffic participants and also vulnerable road users involved are considered more interesting than just simple, constant driving scenarios,” Hammer said. “This includes scenarios where an unexpected behavior of the environment or other traffic participants as well as the AV occurs, would be of great interest.”
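One way to picture this kind of scenario selection is a simple scoring pass over candidate scenarios. The sketch below is purely illustrative: the `Scenario` fields and the weights are assumptions chosen to mirror the criteria Hammer lists (traffic participants, vulnerable road users, unexpected behavior) and do not represent AVL's method.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    participants: int            # other traffic participants involved
    vulnerable_road_users: int   # e.g. pedestrians, cyclists
    unexpected_behavior: bool    # anything deviating from expectation


def interest_score(s: Scenario) -> float:
    """Hypothetical weighting: the numbers are assumptions, not a standard."""
    score = 1.0 * s.participants + 3.0 * s.vulnerable_road_users
    if s.unexpected_behavior:
        score += 5.0
    return score


def select_scenarios(scenarios: list[Scenario], keep: int) -> list[Scenario]:
    """Return the `keep` highest-scoring scenarios; the rest can be dropped."""
    return sorted(scenarios, key=interest_score, reverse=True)[:keep]


candidates = [
    Scenario("constant highway cruise", 1, 0, False),
    Scenario("busy junction", 6, 2, False),
    Scenario("sudden cut-in", 3, 0, True),
]
for s in select_scenarios(candidates, keep=2):
    print(s.name, interest_score(s))
```

With these assumed weights the constant cruise is the scenario that falls out of the selection, matching the intuition that simple, constant driving is the least interesting data to retain.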

Khastgir pointed out that depending on the system architecture for the AV application, the cloud could play an important role in data storage and management, and like Hammer, said the challenge currently lies in being able to get the data off the AV and stored in the cloud quickly enough. “High speed connectivity solutions like 5G may offer a solution to this challenge, along with data compression methods,” he said. “However, coverage and signal strength of 5G connectivity may pose a challenge for such an approach.”

He added that standards like ASAM OpenLabel, which focus on ways to label sensor data, are a big step towards getting the industry to speak the same language for raw data. Several activities are taking place (or have recently taken place) to provide some guidance on this aspect, including the Data Storage System for Automated Driving (DSSAD) at UNECE, ASAM OpenLabel and BSI PAS 1882.

“However, a more detailed description may be required to help manufacturers overcome data storage challenges,” Khastgir said. “I do see more efforts on both technology and standardization fronts happening for off-board storage.”
