Industry Voices: The Tesla Emphasis on Vision

Opinion piece by Brad Rosen, co-founder and chief operating officer of NODAR, a provider of multi-camera 3D vision technology for advanced driver-assistance systems and autonomous vehicles.

NHTSA has opened an investigation of Tesla, examining 11 collisions that happened at sites where first responders were working.

The accidents all involved either Autopilot or Traffic-Aware Cruise Control. The investigation comes on the heels of Tesla’s announcement that it has decided to drop radar from its sensor suite, putting an emphasis on camera-based Autopilot for Model 3 and Model Y vehicles. Tesla explains on its website that “Model 3 and Model Y vehicles built for the North American market will no longer be equipped with radar. Instead, these will be the first Tesla vehicles to rely on camera vision and neural net processing to deliver Autopilot, Full-Self Driving and certain active safety features.”

This is an unusual move: many makers of autonomous vehicles fuse cameras, radar and LiDAR, while Tesla, uniquely in the market, has used only cameras and radar. Tesla told investors that “a vision-only system is ultimately all that is needed for full autonomy,” and Tesla CEO Elon Musk tweeted that “pure vision” is being introduced to the market.

Observers speculate that Tesla may have made this monumental decision to drop radar in order to reduce the cost of the sensor stack, since its vehicles already require multiple cameras. Another factor may be that radar is known to generate false alarms, which create ambiguity, require additional processing to analyze and may degrade the overall reliability of the system.

Tesla is the only company doing this today. One question this raises: will a monocular system give Tesla the reliable depth maps and real range measurements it needs? A human driver relies on vision; however, a human can employ other cognitive abilities and is not completely reliant on vision alone.

A camera-based system using a monocular approach that relies on AI and training will find it difficult to attain a high degree of confidence in depth measurement. AI can only be as good as its training, which would need to cover every possible edge case, and it could potentially confuse an adult 10 m (33 ft.) away with a child 100 m (328 ft.) away. Identifying what an object actually is can be quite difficult. Tesla plans to solve this with more data and training for potential edge cases.
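The size-versus-distance ambiguity behind this concern can be sketched with the standard pinhole camera model, in which an object's height in the image is proportional to its real height divided by its distance. The function name, the focal length and the heights and distances below are illustrative assumptions, not figures from Tesla or any production system.

```python
import math


def projected_height_px(object_height_m, distance_m, focal_length_px=1000.0):
    """Pinhole model: image height (pixels) of an upright object.

    focal_length_px is an assumed value chosen only for illustration.
    """
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return focal_length_px * object_height_m / distance_m


# Hypothetical case: a taller person far away and a shorter person nearby.
taller_far = projected_height_px(object_height_m=1.8, distance_m=18.0)
shorter_near = projected_height_px(object_height_m=0.9, distance_m=9.0)

# Both project to about the same pixel height, so a single image cannot
# separate "small and near" from "large and far" without prior knowledge.
same_size_in_image = math.isclose(taller_far, shorter_near)
print(taller_far, shorter_near, same_size_in_image)
```

A single camera therefore has to infer range from learned cues such as object class and context, which is where the quality of the training data comes in.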

To address some of these issues, camera-based 3D vision technologies on the market today apply advances in compute, CMOS technology and computer vision, and can produce precise depth maps at ranges up to 1,000 m (3,281 ft.) in real time. Multiple cameras with overlapping views are mounted on the vehicle, and auto-calibration software allows them to be placed far apart, providing long-range sensing with high accuracy.
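Why camera spacing matters can be seen from the textbook stereo relation, where depth equals focal length times baseline divided by disparity, and the depth error for a fixed matching error grows with the square of the range and shrinks as the baseline widens. The sketch below uses that generic relation only; the function names, focal length, baselines and matching error are assumptions for illustration and do not describe any vendor's implementation.

```python
def depth_from_disparity(disparity_px, baseline_m, focal_length_px):
    """Textbook rectified-stereo depth: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


def depth_error(depth_m, baseline_m, focal_length_px, disparity_error_px=0.1):
    """First-order depth uncertainty: dZ = Z^2 * dd / (f * B)."""
    return depth_m ** 2 * disparity_error_px / (focal_length_px * baseline_m)


FOCAL_LENGTH_PX = 2000.0  # assumed focal length in pixels
RANGE_M = 100.0           # assumed target range

# Compare a narrow and a wide camera spacing (both values are hypothetical).
for baseline_m in (0.2, 1.2):
    err = depth_error(RANGE_M, baseline_m, FOCAL_LENGTH_PX)
    print(f"baseline {baseline_m} m: about {err:.2f} m error at {RANGE_M:.0f} m")
```

Under these assumed numbers, widening the baseline sixfold cuts the range error at 100 m by the same factor, which is the motivation for placing the cameras far apart and relying on software to keep them calibrated.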

Additionally, this approach is well suited to all environments, including harsh weather conditions. Another aspect of this technology is a confidence scoring system that rates each image based on reliability: it can say the image is 80% accurate, rather than taking an all-or-nothing approach. This way, the more precise aspects of the image can be used while the less visible aspects are discarded.
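One way to picture confidence scoring is a depth map paired with a confidence value per measurement, where only the measurements above a threshold are kept. The following is a minimal sketch under that assumption; the function name, the per-pixel layout, the 0.8 threshold and the sample values are hypothetical and are not taken from any specific product.

```python
def filter_by_confidence(depth_map, confidence_map, threshold=0.8):
    """Keep depth values whose confidence meets the threshold.

    Low-confidence entries become None instead of rejecting the whole frame.
    """
    return [
        [depth if conf >= threshold else None
         for depth, conf in zip(depth_row, conf_row)]
        for depth_row, conf_row in zip(depth_map, confidence_map)
    ]


# Hypothetical 2x3 depth map (meters) and matching confidence scores (0 to 1).
depths = [[12.0, 48.5, 150.0],
          [11.8, 47.9, 310.0]]
confidences = [[0.95, 0.90, 0.40],
               [0.97, 0.85, 0.20]]

usable = filter_by_confidence(depths, confidences)
print(usable)
```

The design choice here is to degrade gracefully: regions obscured by glare, rain or fog drop out, while the rest of the frame remains available to downstream planning.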

Having the right sensor system is crucial to the safety of ADAS and autonomous vehicles, to the confidence of consumers and lawmakers, and to moving the industry forward. Tesla’s Autopilot has reduced collisions dramatically but still has some hurdles to overcome. The industry is waiting to see the direction pure vision takes and the advances a vision-only approach can achieve.
