LiDAR, or light detection and ranging, is a type of remote sensing technology MobiDev has been using extensively in computer vision projects since 2019. In this article, we'll delve into the Lidar-based application development process. Note that the mobile Lidar used in Apple devices relies on a different approach and technology stack; you can read our dedicated article on using LiDAR for measurement app development. Here, we focus on stand-alone Lidar, the pros and cons of working with its data, and the tasks we can complete within Lidar-based applications.
Working with Lidar Point Cloud
To keep this text focused, let's break the discussion down into three main parts:
- The data, and tasks suitable for Lidars
- Common challenges and how we can resolve them
- Hardware components
The most important aspect of working with Lidar is understanding its specific type of data and what information can be extracted from it. Lidars provide a detailed scan of the environment in 3D space. Depending on the sensor's type and size, the scanning distance will differ. What's common to any Lidar is that it scans surrounding objects along straight lines, both vertically and horizontally. This means Lidar can't "see" behind an object, which is why multiple Lidars are usually used for complex tasks where objects obstruct each other.
Point clouds are the data Lidar devices produce by emitting laser pulses and constantly measuring the distance to surrounding objects. Usually, point clouds include XYZ coordinates, but they may also include other channels like velocity, intensity, or ambient light.
3D point cloud taken from autonomous vehicle Lidar
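To make the data layout concrete, here is a toy sketch of a Lidar frame as a plain numpy array. This is an illustration, not any vendor's actual format: the column layout and channel names are assumptions for the example.

```python
import numpy as np

# A minimal illustration (not a vendor format): a Lidar frame as an (N, 5)
# array with XYZ coordinates plus optional intensity and ambient channels.
points = np.array([
    #    x,     y,    z,  intensity, ambient
    [ 1.20,  0.40, 0.10,      0.80,   0.05],
    [ 1.25,  0.38, 0.12,      0.75,   0.04],
    [10.10,  2.00, 0.50,      0.30,   0.10],
], dtype=np.float32)

xyz = points[:, :3]        # spatial coordinates, in meters
intensity = points[:, 3]   # laser return strength

# Distance from the sensor origin to each point (what the sensor measures)
ranges = np.linalg.norm(xyz, axis=1)
print(ranges.round(2))
```

Most downstream steps (preprocessing, segmentation, tracking) operate on exactly this kind of N-by-channels array.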
Point cloud data has a number of advantages over optical sensors used in standard computer vision:
- Accurate measurement. The way Lidar works is similar to what a laser rangefinder does, providing accurate depth perception within the application.
- Data privacy. Point clouds preserve data privacy, since dotted patterns don’t show people’s faces, car license numbers, or other sensitive information.
- Less distortion. Compared to video or still images, Lidar data is much closer to the real dimensions of objects, because we don't have to deal with optical distortion of the lens, image distortion caused by aberrations or coating defects, or lighting artifacts in the image itself.
- Better low light performance. As Lidar relies on laser beams, it can gather data in total darkness without any impact on the final output, compared to optical sensors that require enough reflected light to form the image.
This doesn't mean Lidar is superior to computer vision techniques based on images and video. Lidar data carries a limited amount of information: text, color, and smaller details of objects are largely lost. That is why each type of sensor is used for the tasks it suits, or in conjunction with the others. So let's discuss in more detail how we can handle point clouds and which tasks we can perform with them.
Depending on the Lidar resolution, point clouds may include more than 65,000 points. Similar to image/video-based computer vision, point clouds need to be preprocessed so that only information useful for the task is extracted. Processing the raw bulk of points is rarely required, since point clouds capture a lot of surrounding information that only adds weight to the whole system. Here are some of the most common preprocessing techniques we use on our Lidar projects:
Different point cloud density levels
Downsampling. This is the reduction of point cloud density, especially when the original data is too dense for efficient processing. Methods like voxel grid downsampling or random sampling can be applied.
Outlier removal. This technique includes identifying and removing outlier points that are caused by sensor noise or measurement errors.
Ground removal. This is the process of separating ground points from non-ground points, since ground will usually dominate point clouds.
Normalization. Normalize the point cloud data to a common reference frame or coordinate system. This ensures consistency and facilitates the comparison of Lidar data collected at different times or from different devices.
Smoothing, cropping and filtering. This reduces noise and enhances the overall quality of point clouds.
Transformations. Some basic operations on point clouds include shifting, rotating along the axis, scaling, mirroring, etc. These processes can be used for centering or axis-aligning the point cloud or aligning point clouds from different sources. There are both manual and automatic solutions for this task.
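As a minimal sketch of one of these steps, voxel grid downsampling can be implemented in a few lines of numpy: partition space into cubes of a chosen `voxel_size` and keep one centroid per occupied cube. Production pipelines would typically use a dedicated library (e.g. Open3D) rather than this hand-rolled version.

```python
import numpy as np

def voxel_downsample(points: np.ndarray, voxel_size: float) -> np.ndarray:
    """Reduce point cloud density by keeping one centroid per occupied voxel.

    A simplified sketch of voxel grid downsampling for illustration only.
    """
    # Map each point to an integer voxel index
    voxel_idx = np.floor(points / voxel_size).astype(np.int64)
    # Group points that fall into the same voxel
    _, inverse, counts = np.unique(voxel_idx, axis=0,
                                   return_inverse=True, return_counts=True)
    inverse = inverse.ravel()  # flatten for older/newer numpy compatibility
    # Average all points in each voxel to get its centroid
    sums = np.zeros((counts.size, points.shape[1]))
    np.add.at(sums, inverse, points)
    return sums / counts[:, None]

rng = np.random.default_rng(0)
cloud = rng.random((10_000, 3)) * 10.0   # synthetic 10 m cube of points
reduced = voxel_downsample(cloud, voxel_size=0.5)
print(cloud.shape[0], "->", reduced.shape[0])
```

The same grouping trick extends to other channels (e.g. averaging intensity per voxel alongside XYZ).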
For the majority of applications, we'll need to process Lidar output on the fly. Preprocessing, as a set of automated operations, makes the workload less resource-intensive for the target device and generally improves task performance. The set of preprocessing operations will depend on the task itself, the number of Lidars, and the data-gathering conditions.
Segmentation and Object Detection
Point clouds can be fed to different algorithms to perform segmentation and object detection tasks. Two common approaches here are machine learning algorithms based on radial search, and neural networks.
Radial search algorithms detect big clusters of points within a specified radius. These algorithms are easy to explain and fast, but require a lot of preprocessing work to provide decent results. However, they can be extremely beneficial to the development process since they don’t require training.
Neural networks like VoxelNet, PointCNN, or SparseConvNet are trained for point cloud segmentation. In this case, we need to gather a dataset and set up a training procedure. The output is usually more accurate, since neural networks can handle more difficult cases, and the data fed to the model requires significantly less preprocessing. Additionally, such models can handle multiple tasks at once, for example 3D segmentation and object classification.
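The radial-search idea can be sketched as a toy Euclidean clustering routine: grow a cluster by repeatedly collecting all unassigned points within a radius of the current members. This is a naive O(n²) illustration under assumed parameters; real pipelines use spatially indexed variants such as DBSCAN with a k-d tree.

```python
import numpy as np

def radial_cluster(points: np.ndarray, radius: float, min_points: int = 5):
    """Toy Euclidean clustering by radius search (illustrative sketch)."""
    n = len(points)
    labels = np.full(n, -1)          # -1 = not yet assigned
    current = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        # Expand the cluster from seed i through chains of nearby points
        frontier = [i]
        members = []
        labels[i] = current
        while frontier:
            p = frontier.pop()
            members.append(p)
            dists = np.linalg.norm(points - points[p], axis=1)
            for j in np.where((dists < radius) & (labels == -1))[0]:
                labels[j] = current
                frontier.append(j)
        if len(members) < min_points:
            labels[np.array(members)] = -2   # too small: mark as noise
        else:
            current += 1
    return labels

# Two well-separated synthetic blobs should yield two cluster labels
rng = np.random.default_rng(1)
a = rng.normal([0.0, 0.0, 0.0], 0.1, (50, 3))
b = rng.normal([5.0, 5.0, 0.0], 0.1, (50, 3))
labels = radial_cluster(np.vstack([a, b]), radius=0.5)
```

Note that `radius` and `min_points` must be tuned to the sensor's point density, which is exactly the preprocessing-heavy tuning work mentioned above.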
Object classification using Lidar data. Image source: semanticscholar.org
Object classification can be done more efficiently with optical sensors, because objects often differ by color or by much smaller details (car curves, face details, etc.). However, it is still possible with point clouds, because Lidars provide enough detail to classify objects. What's beneficial here is that we can perform classification using publicly captured scans, because point clouds are anonymous.
Point cloud details
Classification with point clouds is also done with neural networks, using either 3D or 2D data. Lidars that have additional data channels like intensity, ambient, or reflectivity can serve as a source of 2D data, since pretrained 2D neural networks can be found and applied to the task with minimal fine-tuning.
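One common way to obtain that 2D data is a spherical projection: map each point's azimuth to an image column, its elevation to a row, and fill the pixel with the intensity channel. The resolution below is an assumed toy value; real sensors define their exact angular resolutions per model.

```python
import numpy as np

def to_intensity_image(points: np.ndarray, intensity: np.ndarray,
                       h: int = 32, w: int = 256) -> np.ndarray:
    """Project a 3D point cloud onto a 2D grid (simplified spherical projection)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    azimuth = np.arctan2(y, x)                       # range [-pi, pi]
    elevation = np.arcsin(z / np.maximum(r, 1e-9))   # range [-pi/2, pi/2]
    col = ((azimuth + np.pi) / (2 * np.pi) * (w - 1)).astype(int)
    row = ((elevation + np.pi / 2) / np.pi * (h - 1)).astype(int)
    image = np.zeros((h, w), dtype=np.float32)
    image[row, col] = intensity                      # last write wins per pixel
    return image

pts = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
img = to_intensity_image(pts, np.array([0.9, 0.4]))
```

The resulting image can then be fed to an ordinary 2D network, which is what makes the minimal fine-tuning approach possible.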
Movement Prediction and Tracking
Objects can be detected or segmented in every separate frame, but often we also need to track them for navigation purposes. By assigning a tracking ID to every unique object in a frame, we can associate these objects between frames and predict movement trajectories.
Models like JPDAF, SORT, and DeepSORT perform 3D multiple object tracking out of the box and are used extensively in autonomous vehicles. Deep learning approaches such as RNNs and graph networks can also be used for object tracking.
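The core association step behind such trackers can be sketched as matching detections between frames by distance. The greedy nearest-neighbor matcher below is a simplification: real trackers like SORT add a motion model (Kalman filter) and solve the assignment optimally via the Hungarian algorithm.

```python
import numpy as np

def associate(prev_centroids: np.ndarray, curr_centroids: np.ndarray,
              max_dist: float = 2.0) -> dict:
    """Greedy nearest-neighbor association of detections between two frames.

    Returns {previous track index: current detection index}.
    """
    # Pairwise distance matrix between previous and current detections
    d = np.linalg.norm(
        prev_centroids[:, None, :] - curr_centroids[None, :, :], axis=2)
    matches, used = {}, set()
    # Walk candidate pairs from closest to farthest
    for idx in np.argsort(d, axis=None):
        i, j = np.unravel_index(idx, d.shape)
        if i in matches or j in used or d[i, j] > max_dist:
            continue
        matches[int(i)] = int(j)   # track i continues as detection j
        used.add(j)
    return matches

prev = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
curr = np.array([[9.8, 0.1, 0.0], [0.2, 0.0, 0.0]])
print(associate(prev, curr))   # {0: 1, 1: 0}
```

Unmatched previous tracks (objects that left the scene) and unmatched current detections (new objects) are then handled by the tracker's lifecycle logic.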
Navigation and tracking in outdoor environments are some of the most common scenarios for commercial Lidar devices.
However, there are much lighter Lidar sensors, e.g. the small Lidars used in iPhones and iPads alongside the TrueDepth camera, or the depth sensors in some Sony cameras that support autofocus and through-the-camera measurement of physical distances. We can also implement navigation applications for indoor and outdoor environments on mobile devices. But with a smaller Lidar, the maximum measuring distance is shorter, and the lower data density provides less detailed mapping than large industrial Lidar devices.
Depending on the task, different Lidar configurations and additional layers of hardware/software may be required.
Working with Lidar Sensors and Edge Devices
Here, I'll discuss the second part: the hardware components. If we're developing a Lidar system that relies on a standalone device, we need to know what hardware infrastructure is required to fit all the pieces together in a single pipeline.
Lidar vendors offer devices that generate different data formats and types of measurements we can work with. To interact with the device itself and extract the data output, we need an API to perform the following tasks:
- Connect to the sensors
- Set up and configure the sensor
- Stream data from sensors to other software applications
- Record data and read recordings
- Visualize data
These are the main ones, but many smaller configurations also need to be done on the sensor. If there are multiple Lidars in a single system, the configuration procedure will take more time, since all the devices need to be synchronized, the data has to be converted to a common format, etc.
However, since we'll receive huge amounts of data, we need to process it close to the device itself: transmitting raw data to a remote server undermines real-time processing. For this purpose, we have used edge devices like the Nvidia Jetson Xavier to provide mobility for the system and support an offline mode of work. Edge devices have fewer computing resources and a specific architecture, so our general approach is to use C++, a low-level language that provides the fastest computational speed, to overcome platform limitations and integration issues.
Edge devices are small computers optimized for machine learning tasks that provide a range of benefits such as real-time processing of the data input, bandwidth efficiency, energy efficiency, and regulatory compliance. In addition, edge devices can interact with other devices, including different types of sensors. You can read about our edge biometrics project that implements ML for face and voice recognition to learn more about the capabilities of edge devices.
Read also: Edge biometrics in security
Overcoming Lidar Application Development Challenges
Now we’re entering the last part of our discussion – the common challenges of working with Lidars and how we can overcome them.
A well-known issue with Lidars is poor scanning of black-painted objects. Since Lidar receives data through a laser beam reflected off a surface, black paint can absorb a large part of the light pulse sent from the sensor. This results in a corrupted point cloud for the object, which can be especially dangerous when an autonomous vehicle fails to recognize, say, a black car.
The issue can be addressed in multiple ways: by getting supplementary input from another sensor (camera, radar), providing additional lighting to the scene, or solving the problem algorithmically. Our approach is to develop a separate model that fills in the missing areas with values so black objects can be detected.
Low frame rate
Point clouds and videos both represent discrete sources of data. However, the majority of optical sensors set up for machine vision provide at least 30 frames per second, while a Lidar typically generates a new frame every 50 ms, i.e. 20 FPS. Such a frame rate can be too low for dynamic environments, which is why we need to address this problem as well.
- We can use multiple sensors to create a system where sensors gather data in shifts. While it requires careful data synchronization, this way we can merge the output from the sensors to generate more frames per second.
- Alternatively, we use ML algorithms to predict in-between frames.
We've used both approaches in our projects, and there are advantages to each. However, a multi-sensor setup is the more reliable option in our experience.
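The simplest baseline for predicting in-between frames is linear interpolation of associated object positions between two consecutive Lidar frames. Learned models replace this with proper motion prediction; the sketch below only illustrates the idea, and the timing values are assumed.

```python
import numpy as np

def interpolate_frame(pos_t0: np.ndarray, pos_t1: np.ndarray,
                      alpha: float = 0.5) -> np.ndarray:
    """Synthesize an in-between frame by linear interpolation.

    `alpha` is the fractional time between the two captured frames;
    positions must already be associated object-to-object.
    """
    return (1.0 - alpha) * pos_t0 + alpha * pos_t1

frame_a = np.array([[0.0, 0.0, 0.0]])      # object position at t = 0 ms
frame_b = np.array([[1.0, 0.0, 0.0]])      # same object at t = 50 ms
mid = interpolate_frame(frame_a, frame_b)  # estimated position at t = 25 ms
print(mid)
```

Inserting one interpolated frame between each real pair would double the effective rate from 20 to 40 FPS, at the cost of the interpolated frames being estimates rather than measurements.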
As I've mentioned before, real-time processing of Lidar data is required in the majority of cases. Data preprocessing and optimization of the whole system are key to implementing real-time processing. We have experience with the following optimization approaches:
- A code base written in C++ for the fastest performance
- System design optimization
- Algorithm choice based on the benchmark and testing
- Model architecture selection
Similar optimization techniques are used in real-time video processing for machine learning and computer vision.
Launching a Lidar-based CV project together
While Lidar data may be uncommon for automating visual tasks, it doesn’t necessarily mean more complex and costly projects. Lidar is just an alternative way to provide a computer with data about its surroundings. Compared to video or images, in some cases Lidar is a preferred type of sensor because of the data properties we can extract and how we can process this information. At MobiDev, we continue to extend our expertise with Lidar devices so that we are well prepared to handle standard issues and fulfill optimization requirements with various devices.