🧵 Gaussian Splats for Robotics - a Primer 🧵
Gaussian splats are the hyped new technique people hope will revolutionize VSLAM.
Is this promising tech a game changer or a Gaussian 🫠?
Let’s explore what they are, their potential, and the hurdles they face.
Let's start by defining Gaussian splats -
Imagine a tiny, elongated 3D blob floating in space, with:
• 3D position (x,y,z)
• Shape, orientation & uncertainty (covariance matrix)
• Colour
• Opacity
That's a splat!
Together, many splats form a continuous 3D scene.
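To make that concrete, here's a minimal sketch of one splat as a data structure. The field names are illustrative only, not taken from any particular library:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Splat:
    position: np.ndarray    # 3D centre (x, y, z)
    covariance: np.ndarray  # 3x3 matrix encoding shape, orientation & uncertainty
    colour: np.ndarray      # RGB in [0, 1]
    opacity: float          # alpha in [0, 1]

# A scene is simply a large collection of these blobs.
scene: list[Splat] = []
```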
Splats are made by back-projecting each pixel's colour & depth into a 3D blob.
Dense areas of the scene have more, smaller splats. Sparse areas have fewer, larger splats. This allows details to be represented very efficiently.
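A rough sketch of that back-projection, assuming an undistorted pinhole camera (fx, fy, cx, cy) and re-using the Splat sketch above; real pipelines also handle lens distortion and the camera-to-world transform:

```python
import numpy as np

def pixel_to_splat(u, v, depth, colour, fx, fy, cx, cy):
    """Back-project one pixel (u, v) with its depth reading into a Splat.
    Assumes a pinhole camera; pose handling is omitted for brevity."""
    # Pinhole back-projection: pixel + depth -> 3D point in the camera frame.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    position = np.array([x, y, depth], dtype=float)
    # Crude initial size: distant pixels cover more of the scene, so start
    # with a larger isotropic blob; later optimisation refines the shape.
    radius = depth / fx
    covariance = np.eye(3) * radius**2
    return Splat(position, covariance, np.asarray(colour, dtype=float), opacity=0.8)
```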
The good -
Splats offer a balance between realism and efficiency for SLAM.
They are dense and continuous, providing more information than sparse feature maps, while also being far faster to render and optimise than dense point clouds.
This has gotten people in the graphics world very excited!
So how can splats be used for SLAM?
First, your robot needs a camera & the ability to determine depth. Often an RGB-D camera is used.
1/ Initialization:
For the first frame, initialize the camera pose and create the initial splats.
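A toy version of that first step (identity pose by convention, map seeded from every Nth valid depth pixel using the pixel_to_splat sketch above):

```python
import numpy as np

def initialise(rgb, depth, intrinsics, stride=8):
    """First frame: the world frame is anchored to the first camera pose,
    and the map is seeded from a sparse grid of valid depth pixels."""
    fx, fy, cx, cy = intrinsics
    pose = np.eye(4)                      # initial camera pose: identity
    splats = []
    h, w = depth.shape
    for v in range(0, h, stride):
        for u in range(0, w, stride):
            if depth[v, u] > 0:           # skip missing depth readings
                splats.append(pixel_to_splat(u, v, depth[v, u], rgb[v, u],
                                             fx, fy, cx, cy))
    return pose, splats
```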
As the robot moves around:
2/ Camera Tracking:
Render the current Gaussian splat map from the predicted camera pose.
Compare the rendered image with the actual camera image & optimise the camera pose to minimise the error.
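Research systems do this by backpropagating a photometric loss through a differentiable rasteriser. The toy sketch below conveys the idea with a brute-force parameter search instead, treating the renderer (render_fn) as a black box you would plug in:

```python
import numpy as np

def photometric_error(rendered, observed):
    """Mean absolute colour difference between the render and the live image."""
    return np.abs(rendered - observed).mean()

def track_pose(pose, observed, render_fn, step=1e-3, iters=20):
    """Toy tracking: perturb each of 6 pose parameters (rotation + translation)
    and keep changes that lower the photometric error. render_fn(pose, params)
    is a stand-in for a real (differentiable) splat rasteriser."""
    params = np.zeros(6)                  # small pose correction to solve for
    best = photometric_error(render_fn(pose, params), observed)
    for _ in range(iters):
        for i in range(6):
            for delta in (step, -step):
                trial = params.copy()
                trial[i] += delta
                err = photometric_error(render_fn(pose, trial), observed)
                if err < best:
                    best, params = err, trial
    return params, best
```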
3/ Keyframe Selection:
Determine if the current frame should be a keyframe based on the overlap & deviation from the previous keyframe.
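One plausible heuristic for that decision (the thresholds here are invented for illustration, not taken from a specific paper):

```python
import numpy as np

def is_keyframe(overlap_ratio, prev_kf_pose, curr_pose,
                min_overlap=0.9, min_translation=0.1):
    """Promote the current frame to a keyframe when its view overlap with the
    last keyframe drops, or the camera has translated far enough (metres)."""
    translation = np.linalg.norm(curr_pose[:3, 3] - prev_kf_pose[:3, 3])
    return overlap_ratio < min_overlap or translation > min_translation
```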
4/ Map Expansion:
Identify areas in the new frame not covered by existing splats.
Create new Gaussian splats in these areas.
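Sketching those two sub-steps, assuming the renderer can also output a per-pixel opacity/coverage image (rendered_alpha) so uncovered regions are easy to find, and re-using pixel_to_splat from above:

```python
def expand_map(rgb, depth, rendered_alpha, intrinsics, splats,
               alpha_thresh=0.5, stride=8):
    """Pixels where the current map renders little opacity are 'uncovered';
    back-project them into fresh splats and append them to the map."""
    fx, fy, cx, cy = intrinsics
    h, w = depth.shape
    for v in range(0, h, stride):
        for u in range(0, w, stride):
            if rendered_alpha[v, u] < alpha_thresh and depth[v, u] > 0:
                splats.append(pixel_to_splat(u, v, depth[v, u], rgb[v, u],
                                             fx, fy, cx, cy))
    return splats
```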
5/ Map optimisation & maintenance:
Optimise splat parameters (position, shape, colour, opacity) to minimise the error between renders and actual images.
Prune unnecessary or unstable Gaussian splats.
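A minimal illustration of pruning, with made-up thresholds: drop splats whose opacity has collapsed or whose covariance has grown implausibly large during optimisation:

```python
import numpy as np

def prune_splats(splats, min_opacity=0.05, max_radius=1.0):
    """Keep only splats that are still visible and still plausibly sized.
    Thresholds (and metre units) are illustrative."""
    keep = []
    for s in splats:
        # Largest eigenvalue of the covariance gives the splat's longest axis.
        radius = float(np.sqrt(np.linalg.eigvalsh(s.covariance).max()))
        if s.opacity > min_opacity and radius < max_radius:
            keep.append(s)
    return keep
```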
6/ Output:
A continuously updated 3D map represented by Gaussian splats + an estimated camera trajectory.
This approach can render photorealistic scenes faster than NeRFs and other cutting-edge approaches.
7/ Localisation
In a given location, the robot predicts where it is and compares a view rendered from the predicted pose to the actual camera view.
The pose is optimised until the error is minimised, yielding an accurate estimate.
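In code terms, localisation is just the tracking loop from step 2 seeded with a coarse pose guess, so a sketch is a thin wrapper around the track_pose sketch above:

```python
def relocalise(pose_guess, live_image, render_fn):
    """Same render-compare-optimise loop as tracking, starting from a coarse
    guess of where the robot is."""
    correction, error = track_pose(pose_guess, live_image, render_fn)
    return pose_guess, correction, error
```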
Reality check:
This approach is still at the research stage and has so far only been demonstrated in lab environments.
This leaves areas for development:
• Scaling to larger environments
• Incorporating loop closure
• Robustness in dynamic & challenging conditions
Major challenge: Computational Load
Even though this approach aims to reduce computation, it is still a heavy workload for a mobile robot.
Current implementations run at < 5 FPS on the highest-end GPUs, whereas industrial systems need a pose at ~30 FPS to work in the field.
Conclusion -
Splats are an interesting approach but more work is needed to see them widely used in mobile robots.
As the industry pushes towards edge-only computing, the question remains how scalable splats will become, i.e. will they ever run on low-end CPUs at high FPS?