New camera design can identify threats faster while using less memory

Back in October 2021, Elon Musk tweeted that “humans drive with eyes and biological neural networks, so cameras and silicon neural networks are the only way to achieve a widespread solution to autonomous driving.” The problem with that logic is that human eyes are much better than RGB cameras at detecting fast-moving objects and estimating distances. Our brains also still far outperform artificial neural networks at the overall processing of visual information.

To close this gap, a team of scientists at the University of Zurich has developed a new in-car object detection system that brings the performance of digital cameras much closer to that of the human eye. “Unofficial sources say that Tesla uses multiple Sony IMX490 cameras with 5.4-megapixel resolution that [capture] up to 45 frames per second, which translates into a perceptual latency of 22 milliseconds. Comparing [these cameras] with our solution, we already see a 100-fold reduction in perceptual latency,” says Daniel Gehrig, a researcher at the University of Zurich and lead author of the study.
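For context, that 22-millisecond figure is roughly the interval between consecutive frames: at a fixed frame rate, a new stimulus can go unrecorded for up to one full frame period. A quick sketch of the conversion (plain Python, not code from the study):

```python
# Perceptual latency of a frame-based camera is bounded by its frame interval:
# a stimulus can appear just after one frame and only be captured in the next.
fps = 45
frame_interval_ms = 1000 / fps
print(f"worst-case perceptual latency ~ {frame_interval_ms:.1f} ms")  # ~22.2 ms
```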

Replicating human vision

When a pedestrian suddenly jumps in front of your car, several things must happen before a driver-assistance system initiates emergency braking. First, the pedestrian must be captured in images taken by a camera. The time this takes is called perceptual latency: the delay between a visual stimulus appearing and it showing up in the sensor’s readout. That readout must then reach a processing unit, which adds a network latency of around 4 milliseconds.

Processing the image to classify a pedestrian also takes precious milliseconds. Once that is done, the detection is passed to a decision-making algorithm, which needs some time of its own before it decides to hit the brakes; all of this is known as computational latency. In total, the reaction time ranges from 0.1 to half a second. In that time, a pedestrian moving at 12 km/h would cover between 0.3 and 1.7 meters, and your car, traveling at 50 km/h, would cover between 1.4 and 6.9 meters. In a close-range encounter, that most likely means you hit them.
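Those distances follow directly from speed multiplied by reaction time, with speeds converted from km/h to m/s. A minimal sketch of the arithmetic (plain Python, not code from the study):

```python
# Distance covered at a constant speed during the system's total reaction time.
def distance_traveled(speed_kmh: float, reaction_time_s: float) -> float:
    return speed_kmh / 3.6 * reaction_time_s  # km/h -> m/s, then multiply by seconds

for reaction_time in (0.1, 0.5):  # total reaction time in seconds
    pedestrian = distance_traveled(12, reaction_time)  # pedestrian at 12 km/h
    car = distance_traveled(50, reaction_time)         # car at 50 km/h
    print(f"{reaction_time} s: pedestrian {pedestrian:.1f} m, car {car:.1f} m")

# 0.1 s: pedestrian 0.3 m, car 1.4 m
# 0.5 s: pedestrian 1.7 m, car 6.9 m
```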

Gehrig and Davide Scaramuzza, a professor at the University of Zurich and co-author of the study, aimed to shorten these reaction times by reducing perceptual and computational latencies.

The most straightforward way to reduce the former would be to use standard high-speed cameras that simply record more frames per second. But even with cameras running at 30 to 45 fps, a self-driving car generates almost 40 terabytes of data per hour. Fitting something that would cut perceptual latency dramatically, such as a 5,000 fps camera, would overwhelm a car’s onboard computer in an instant: computational latency would skyrocket.
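To see why the data volume explodes with frame rate, here is a back-of-the-envelope estimate. It assumes uncompressed RGB frames and a hypothetical multi-camera rig; the exact numbers depend on the real sensor setup and encoding, so they will not match the article’s 40-terabyte figure precisely:

```python
# Rough, illustrative data-rate estimate for frame-based cameras (assumptions:
# 5.4-megapixel sensors, 3 bytes per pixel, no compression).
def tb_per_hour(megapixels: float, fps: float, num_cameras: int, bytes_per_pixel: int = 3) -> float:
    bytes_per_second = megapixels * 1e6 * bytes_per_pixel * fps * num_cameras
    return bytes_per_second * 3600 / 1e12  # convert to terabytes per hour

print(f"{tb_per_hour(5.4, 45, 8):.0f} TB/h")    # hypothetical 8-camera rig at 45 fps: ~21 TB/h
print(f"{tb_per_hour(5.4, 5000, 1):.0f} TB/h")  # a single 5,000 fps camera: ~290 TB/h
```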

So the Swiss team used something called an “event camera,” which mimics the way biological eyes work. “Compared to a frame-based video camera, which records dense images at a fixed rate (frames per second), event cameras contain independent smart pixels that only measure brightness changes,” explains Gehrig. Each of these pixels starts with a set brightness level. When the brightness change exceeds a certain threshold, the pixel registers an event and sets a new reference brightness level. All pixels in the event camera do this continuously, and each recorded event appears as a dot on an image.
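Conceptually, each of those smart pixels can be modeled like the sketch below. This is an illustrative simulation, not the sensor’s actual circuitry; the use of log brightness and the threshold value are assumptions commonly made when describing event cameras:

```python
import math

# Illustrative simulation of a single event-camera pixel (not actual sensor hardware).
# The pixel keeps a reference brightness; whenever the log brightness drifts past a
# threshold, it emits an event with a timestamp and polarity, then resets the reference.
THRESHOLD = 0.2  # assumed contrast threshold, in log-intensity units

def pixel_events(brightness_samples, timestamps, threshold=THRESHOLD):
    events = []
    reference = math.log(brightness_samples[0] + 1e-6)
    for t, b in zip(timestamps, brightness_samples):
        log_b = math.log(b + 1e-6)
        while abs(log_b - reference) >= threshold:
            polarity = 1 if log_b > reference else -1  # brighter (+1) or darker (-1)
            events.append((t, polarity))
            reference += polarity * threshold          # set a new reference level
    return events

# A static scene produces no events; a sudden brightness jump produces a burst of them.
print(pixel_events([100, 100, 100, 180, 180], [0, 1, 2, 3, 4]))  # [(3, 1), (3, 1)]
```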

This makes event cameras particularly good at detecting high-speed motion and allows them to do so using much less data. The problem with putting them in cars has been that they had trouble detecting things that were moving slowly or not moving at all relative to the camera. To solve this, Gehrig and Scaramuzza opted for a hybrid system, in which an event camera was combined with a traditional one.
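The study’s actual network architecture is more involved than this, but the division of labor in such a hybrid can be sketched roughly as follows; `detect_in_frame`, `update_with_events`, and `events_until_next_frame` are hypothetical stand-ins, not functions from the paper:

```python
# Rough, illustrative hybrid pipeline: a frame-based detector anchors detections
# (reliable even for slow or static objects), while event data refreshes them
# in the milliseconds between frames (good for fast motion, with far less data).
def hybrid_detection_loop(frames, events_until_next_frame, detect_in_frame, update_with_events):
    for frame in frames:                                # e.g. a few dozen frames per second
        detections = detect_in_frame(frame)             # full detection on the dense image
        yield detections
        for event_batch in events_until_next_frame(frame):
            detections = update_with_events(detections, event_batch)  # low-latency updates
            yield detections
```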
