Simultaneous localization and mapping (SLAM) robotics techniques: a possible application in surgery
Simultaneous localization and mapping (SLAM) is the process by which a mobile robot constructs a map of an unknown environment while simultaneously computing its own location within that map (1). SLAM has been formulated and solved as a theoretical problem in many different forms. It has been implemented in several domains, both indoor and outdoor, and the possibility of applying it to robotic surgery has captured the attention of the medical community. The common point is that, regardless of the application field, the accuracy of navigation affects the success and the outcome of a task. Since its beginning, the SLAM problem has been developed and optimized in different ways, following three main paradigms: Kalman filters (KF), particle filters and graph-based SLAM. The first two are also referred to as filtering techniques, in which the position and map estimates are augmented and refined by incorporating new measurements as they become available. Because of their incremental nature, these approaches are generally known as online SLAM techniques. Conversely, graph-based SLAM estimates the entire trajectory and the map from the full set of measurements, which is known as the full SLAM problem.
The choice of the type of algorithm to use depends on the peculiarities of the application and on many factors, such as the desired map resolution, the update time, the nature of the environment, the type of sensor the robot is equipped with, and so on.
This article introduces the state of the art of SLAM techniques in Section II and focuses on SLAM applications in the medical field in Section III. Section IV draws conclusions on the benefits of employing these techniques in surgery.
State of the art SLAM techniques
KF techniques
Smith et al. (2) were the first to represent the structure of the navigation area in a discrete-time state-space framework, introducing the concept of the stochastic map. Because the original KF algorithm relies on a linearity assumption that is rarely fulfilled, two variants have mainly been employed since then: the extended KF (EKF) and the information filter (IF). The EKF overcomes the linearity assumption by describing the state transition and measurement probabilities with nonlinear functions, which it linearizes around the current estimate through a first-order Taylor expansion (3). The literature contains several examples of the use of the EKF algorithm (4-10), and it has been the basis of many recent developments in the field (11,12).
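The linearization at the heart of the EKF can be illustrated with a minimal scalar sketch in Python (standard library only). It assumes a hypothetical robot that moves along a line with an additive control u and measures its range to a beacon placed at a lateral offset d from that line, h(x) = sqrt(x² + d²). The function name, the noise variances Q and R and the scenario are illustrative assumptions, not details of the cited works.

```python
import math

def ekf_step(x, P, u, z, Q, R, d):
    """One EKF predict/update cycle for a hypothetical 1D robot.

    Motion model (linear):     x_k = x_{k-1} + u + w,      w ~ N(0, Q)
    Measurement (nonlinear):   z = sqrt(x^2 + d^2) + v,    v ~ N(0, R)
    """
    # Prediction: the motion model is linear, so its Jacobian is 1.
    x_pred = x + u
    P_pred = P + Q

    # Linearize h(x) = sqrt(x^2 + d^2) around the predicted state.
    z_pred = math.sqrt(x_pred ** 2 + d ** 2)
    H = x_pred / z_pred  # dh/dx evaluated at x_pred (d > 0 avoids division by zero)

    # Standard Kalman update using the linearized measurement model.
    S = H * P_pred * H + R  # innovation variance
    K = P_pred * H / S      # Kalman gain
    x_new = x_pred + K * (z - z_pred)
    P_new = (1.0 - K * H) * P_pred
    return x_new, P_new
```

Because H is recomputed at every predicted state, the filter works well as long as the linearization is accurate near the estimate. A poor initial guess combined with a strongly nonlinear model can make an EKF diverge.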
The unscented KF (UKF) was developed more recently to overcome some of the main problems of the EKF (13), notably the errors introduced by first-order linearization. Like the EKF, it approximates the state distribution with a Gaussian random variable, but here the distribution is represented by a minimal set of carefully chosen sample points, called σ-points. When propagated through the nonlinear system, these points capture the posterior mean and covariance accurately up to the third order of the Taylor series expansion for any nonlinearity (14). Examples of the use of the UKF for navigation and localization can be found in (15-17).
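The unscented transform at the core of the UKF can be sketched for a scalar state. The snippet below assumes the basic sigma-point set with a single spread parameter κ. It uses κ = 2, so that n + κ = 3, a common heuristic for Gaussian inputs. The function name and parameter choice are illustrative.

```python
import math

def unscented_transform_1d(mean, var, f, kappa=2.0):
    """Propagate a scalar Gaussian N(mean, var) through a nonlinear f
    using 2n + 1 = 3 sigma points (n = 1)."""
    n = 1
    spread = math.sqrt((n + kappa) * var)
    sigma_points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 1.0 / (2 * (n + kappa)), 1.0 / (2 * (n + kappa))]

    # Push each sigma point through the nonlinearity: no Jacobian is needed.
    transformed = [f(s) for s in sigma_points]

    # Recover the mean and variance of the transformed distribution.
    y_mean = sum(w * y for w, y in zip(weights, transformed))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, transformed))
    return y_mean, y_var
```

Take f(x) = x² and x ~ N(1, 0.5). The three sigma points return the exact mean, 1.5. Linearizing f at the mean, as the EKF does, would instead predict 1.0.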
The dual of the KF is the information filter, which relies on the same assumptions; the key difference lies in how the Gaussian belief is represented. The estimated covariance and the estimated state are replaced by the information matrix and the information vector, respectively. This brings several advantages over the KF: measurements are integrated by simply summing information matrices and vectors, providing more accurate estimates (18), and the information filter tends to be numerically more stable in many applications (3). The KF is more advantageous in the prediction step, which is additive in the KF, whereas in the IF it involves the inversion of two matrices, increasing the computational cost in high-dimensional state spaces. These roles are reversed in the measurement step, illustrating the dual character of Kalman and information filters.
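The additive measurement step of the information filter can be checked numerically on a scalar example. The sketch below assumes a linear measurement model z = Hx + v with noise variance R. It compares the covariance-form update with the information-form update, using Ω = 1/P and ξ = x/P. All function names are hypothetical.

```python
def kf_update(x, P, z, H, R):
    """Covariance-form (Kalman) measurement update, scalar case."""
    K = P * H / (H * P * H + R)  # Kalman gain
    return x + K * (z - H * x), (1.0 - K * H) * P

def to_information(x, P):
    """Convert (mean, variance) into (information vector, information matrix)."""
    return x / P, 1.0 / P

def from_information(xi, omega):
    """Recover (mean, variance); this is where the inversion happens."""
    return xi / omega, 1.0 / omega

def if_update(xi, omega, z, H, R):
    """Information-form measurement update: purely additive."""
    omega_new = omega + H * H / R  # information matrix grows by H^T R^-1 H
    xi_new = xi + H * z / R        # information vector grows by H^T R^-1 z
    return xi_new, omega_new
```

Both updates give the same posterior. The information form only adds H²/R and Hz/R, which is why independent measurements can be fused by summation. Recovering the mean, however, requires inverting Ω.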
Starting from the observation that the normalized information matrix is sparse, Thrun et al. (18) developed the sparse extended information filter (SEIF), a variant of the extended information filter (EIF) that maintains a sparse approximation of the dependencies between environment features in order to achieve constant-time updates. They were inspired by other SLAM filters that represent relative distances (19-22), none of which achieves constant-time updates.
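SEIF's sparsification is too involved for a short example, but the property it builds on can be shown with a small sketch. Consider a hypothetical chain of 1D poses linked by odometry constraints, plus a prior on the first pose. Each constraint only touches the information-matrix entries of the two poses it links, so the matrix stays tridiagonal, while its inverse (the covariance) is fully dense. All names are illustrative, and the dense Gauss-Jordan inversion is for demonstration only.

```python
def chain_information_matrix(n, odom_info=1.0, prior_info=1.0):
    """Information matrix of n 1D poses linked by hypothetical odometry
    constraints x_{k+1} - x_k = u_k, plus a prior on x_0."""
    omega = [[0.0] * n for _ in range(n)]
    omega[0][0] += prior_info
    for k in range(n - 1):
        # Each constraint only touches the two poses it connects.
        omega[k][k] += odom_info
        omega[k + 1][k + 1] += odom_info
        omega[k][k + 1] -= odom_info
        omega[k + 1][k] -= odom_info
    return omega

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def count_nonzeros(matrix, tol=1e-12):
    """Number of entries whose magnitude exceeds tol."""
    return sum(1 for row in matrix for v in row if abs(v) > tol)
```

For six poses, the information matrix has 3n - 2 = 16 nonzero entries, while the covariance has all 36. Its entries grow as 1 + min(i, j), reflecting accumulated odometry uncertainty.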
To overcome the difficulties of both the EKF and the IF, and to reduce computational complexity, a combined Kalman-information filter SLAM algorithm (CF-SLAM) was adopted in (23). It combines the EKF and the EIF, enabling highly efficient SLAM in large environments.
Particle filter techniques
Particle filters (13,24,25) comprise a large family of sequential Monte Carlo algorithms (26,27) in which the posterior is represented by a set of random state samples, called particles. Almost any probabilistic robot model that admits a Markov chain formulation is suitable for them. Their accuracy increases with the available computational resources, so they do not require a fixed computation time. They are also relatively easy to implement: they do not need to linearize nonlinear models, nor do they require closed-form solutions of the conditional probability as in the KF. Their main limitation is poor performance in high-dimensional spaces. Rao-Blackwellized particle filters (24,28-30) lead to more efficient solutions, including for the data association problem, but these algorithms are susceptible to considerable estimation inconsistencies because they generally underestimate their own error (31). The need to increase estimation consistency, together with the problem of heterogeneity of the trajectory samples, led to the adoption of different sampling strategies (32-34).
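A bootstrap (sampling importance resampling) particle filter fits in a few lines, which reflects the ease of implementation mentioned above. The sketch assumes a hypothetical 1D robot that moves by u along a line and measures its range to a landmark ahead of it. The noise levels, the landmark layout and the function name are illustrative assumptions.

```python
import math
import random

def particle_filter_step(particles, u, z, motion_std, meas_std, landmark):
    """One predict-weight-resample cycle of a bootstrap particle filter.

    Hypothetical 1D setup: the robot moves by u along a line and measures
    its range |landmark - x| to a landmark that stays ahead of it.
    """
    # 1. Prediction: sample each particle from the motion model.
    moved = [p + u + random.gauss(0.0, motion_std) for p in particles]

    # 2. Weighting: likelihood of the measurement for each particle.
    #    No linearization is needed, whatever the form of the model.
    weights = [math.exp(-0.5 * ((z - abs(landmark - p)) / meas_std) ** 2)
               for p in moved]
    total = sum(weights)
    if total == 0.0:
        # Degenerate case: every particle is implausible; fall back to uniform.
        weights = [1.0] * len(moved)
        total = float(len(moved))
    weights = [w / total for w in weights]

    # 3. Resampling: draw particles in proportion to their weights.
    return random.choices(moved, weights=weights, k=len(moved))
```

Resampling at every step is the simplest choice. Practical filters often resample only when the effective number of particles drops, which limits sample impoverishment.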
FastSLAM (24,35,36) denotes a family of algorithms that integrates particle filters and the EKF. It exploits the fact that feature estimates are conditionally independent given the observations, the controls and the robot path. The mapping problem can therefore be split into separate problems, one for each feature in the map, since the errors of the individual feature estimates are independent. FastSLAM uses a particle filter to estimate the robot path and, for each particle, an EKF for each feature location. This offers computational advantages over plain EKF implementations and copes well with nonlinear robot motion models. However, the particle approximation does not converge uniformly in time because the map, a static parameter, is part of the state space (37).
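The factorization exploited by FastSLAM can be sketched as a data structure: each particle carries its own pose hypothesis and one small, independent filter per landmark. The sketch assumes a hypothetical 1D linear measurement z = landmark - pose + noise, so each per-landmark EKF reduces to a Kalman filter. Path sampling and resampling are omitted, and the class and function names are illustrative.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Particle:
    """One FastSLAM-style particle: a path hypothesis plus its own map."""
    pose: float
    weight: float = 1.0
    # landmark id -> (mean, variance); each landmark has an independent filter
    landmarks: dict = field(default_factory=dict)

def observe_landmark(particle, lm_id, z, meas_var):
    """Update one landmark's filter conditioned on this particle's pose.

    Hypothetical 1D model: z = landmark - pose + v, v ~ N(0, meas_var).
    """
    if lm_id not in particle.landmarks:
        # First sighting: initialize the landmark from the measurement.
        particle.landmarks[lm_id] = (particle.pose + z, meas_var)
        return
    mean, var = particle.landmarks[lm_id]
    innovation = z - (mean - particle.pose)
    s = var + meas_var  # innovation variance (Jacobian w.r.t. landmark is 1)
    k = var / s         # Kalman gain
    particle.landmarks[lm_id] = (mean + k * innovation, (1.0 - k) * var)
    # Importance weight: how well this particle's map explains z.
    particle.weight *= math.exp(-0.5 * innovation ** 2 / s) / math.sqrt(2.0 * math.pi * s)
```

Updating one landmark leaves all the others untouched, which keeps the per-particle update cheap. The original FastSLAM work also stores landmarks in a balanced tree, so that particles can share unchanged estimates after resampling. A plain dictionary is used here for brevity.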
Expectation maximization (EM) techniques for missing or hidden data
EM (38) is an efficient iterative procedure for estimating the parameters of probabilistic models with missing or hidden data. Each iteration consists of two steps. The expectation step (E-step) estimates the missing data given the current model and the observed data. The maximization step (M-step) computes the parameters that maximize the expected log-likelihood obtained in the E-step. The estimates of the missing data from the E-step are used in place of the actual missing data. The algorithm is guaranteed to converge to a local maximum of the objective function. A real-time implementation of this algorithm is described in (39).
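As an example of the E- and M-steps, the sketch below fits a two-component 1D Gaussian mixture, where the hidden data are the unknown component labels of the samples. This is a generic EM illustration, not a SLAM formulation. The data, the initial guesses and the variance floor (which prevents a component from collapsing onto a single point) are illustrative assumptions.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, iterations=50, var_floor=1e-6):
    """Fit a 1D Gaussian mixture with EM; returns (means, variances, weights)."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(iterations):
        # E-step: responsibility of each component for each sample,
        # i.e. the expected hidden label given the current parameters.
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total if total > 0 else 1.0 / k for pj in p])
        # M-step: parameters maximizing the expected complete-data log-likelihood.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0.0:
                continue  # empty component: keep its previous parameters
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(var_floor,
                               sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj)
            weights[j] = nj / len(data)
    return means, variances, weights
```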
Since EM requires all the data to be available at each iteration, an online version has been implemented (40) in which the data need not be stored, because they are used sequentially. EM has also been used to relax the assumption, made in many SLAM problems, that the environment is static. Most existing methods are robust for mapping environments that are static, structured and limited in size, while mapping unstructured, dynamic or large-scale environments remains an open research problem. The literature follows mainly two directions: partitioning the model into two maps, one holding only the static landmarks and the other holding the dynamic landmarks (41,42), or tracking moving objects while mapping the static landmarks (43,44).
Graph-based SLAM techniques
Graph-based SLAM addresses the SLAM problem with a graphical formulation. It builds a graph whose nodes represent robot poses or landmarks, linked by soft constraints established by sensor measurements (45); this phase is called the front-end. The back-end corrects the robot poses in order to obtain a map of the environment that is consistent with the constraints. The critical point is the configuration of the nodes: finding the configuration that is maximally consistent with the measurements requires solving a large error minimization problem.
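The back-end can be illustrated with a minimal 1D pose graph solved by Gauss-Newton. The sketch assumes hypothetical scalar poses and relative-position constraints of the form x_j - x_i = z with weight w. A strong prior on the first pose removes the gauge freedom. With these linear constraints a single iteration already reaches the optimum, whereas real 2D/3D graphs with rotations need several.

```python
def solve_linear(a, b):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def optimize_pose_graph(initial, edges, iterations=5, anchor_weight=1e6):
    """Gauss-Newton back-end for a hypothetical 1D pose graph.

    edges: (i, j, z, w) meaning "x_j - x_i should equal z" with weight w.
    Pose 0 is softly anchored to its initial value, because the constraints
    only fix relative positions.
    """
    x = list(initial)
    n = len(x)
    for _ in range(iterations):
        H = [[0.0] * n for _ in range(n)]
        b = [0.0] * n
        for i, j, z, w in edges:
            e = x[j] - x[i] - z  # constraint error
            # Jacobian of e: -1 w.r.t. x_i, +1 w.r.t. x_j
            H[i][i] += w
            H[j][j] += w
            H[i][j] -= w
            H[j][i] -= w
            b[i] -= w * e
            b[j] += w * e
        # Soft prior keeping the first pose at its initial value.
        H[0][0] += anchor_weight
        b[0] += anchor_weight * (x[0] - initial[0])
        dx = solve_linear(H, [-v for v in b])
        x = [xi + dxi for xi, dxi in zip(x, dx)]
    return x
```

Suppose odometry reports steps of 1, 1 and 1, and a loop closure claims x_3 - x_0 = 2.7. The 0.3 discrepancy is spread evenly over the four constraints, giving poses 0, 0.925, 1.85 and 2.775.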
This technique was first introduced by Lu and Milios (22). Bosse et al. (46) developed the ATLAS framework, which integrates global and local mapping through multiple connected local maps. It confines the error representation to local areas and uses topological methods to provide a global map that manages the local submaps. Similarly, Estrada et al. proposed Hierarchical SLAM (47), a technique based on independent local maps, while Nüchter et al. (48) aimed at building an integrated SLAM system for 3D mapping.
Gutmann and Konolige (49) proposed an approach that combines network construction and loop-closure detection while running an incremental estimation algorithm. Some authors (50-52) applied gradient descent to optimize the SLAM problem. Konolige (53) and Montemerlo and Thrun (54) introduced the conjugate gradient method, which is more efficient than gradient descent, into the field of SLAM. GraphSLAM (55) reduces the dimensionality of the optimization problem through a variable elimination technique: the nonlinear constraints are linearized, and the resulting least-squares problem is solved with standard optimization techniques.
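The gap between gradient descent and conjugate gradient shows up on the linear system that a least-squares back-end solves at each step, Hx = b with H symmetric positive definite. The sketch below is a generic numerical illustration, not the implementations of (53,54). It compares steepest descent with exact line search against conjugate gradient on a small hypothetical 3×3 system.

```python
import math

def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    """Dense matrix-vector product."""
    return [dot(row, v) for row in A]

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A; returns (x, iterations)."""
    x = [0.0] * len(b)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    p = r[:]
    rs = dot(r, r)
    for k in range(max_iter):
        if math.sqrt(rs) < tol:
            return x, k
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # New search direction, A-conjugate to the previous ones.
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x, max_iter

def steepest_descent(A, b, tol=1e-10, max_iter=10000):
    """Gradient descent with exact line search on 0.5 x^T A x - b^T x."""
    x = [0.0] * len(b)
    for k in range(max_iter):
        r = [bi - ai for bi, ai in zip(b, matvec(A, x))]  # negative gradient
        rr = dot(r, r)
        if math.sqrt(rr) < tol:
            return x, k
        alpha = rr / dot(r, matvec(A, r))
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
    return x, max_iter
```

In exact arithmetic, conjugate gradient terminates in at most n iterations for an n×n system. Steepest descent converges only linearly, at a rate governed by the condition number of H.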
Visual SLAM
A separate subsection is dedicated to visual SLAM, since optical sensors are increasingly employed in robotics applications and specifically in surgery. Most vision-based SLAM systems are monocular or stereo, although trinocular configurations also exist (56). Monocular cameras are widely used (57-64), with a variety of camera types. Large-scale direct monocular SLAM (65) uses only RGB images from a monocular camera as information about the environment and sequentially builds a topological map. Omnidirectional cameras are gaining popularity: they have a 360° view of the environment, and because features stay longer in the field of view, they are easier to find and track (66,67). To improve feature accuracy, some works rely on multi-sensor systems. The system of Castellanos et al. (68) consists of a 2D laser scanner and a camera and implements an EKF-SLAM algorithm. Other examples of the use of the EKF in visual SLAM can be found in (69,70).
However, monocular systems show weaknesses in certain situations. They require extra computation to estimate depth, suffer from scale propagation problems, and can fail because of non-observability.
Stereo systems are widely adopted for both landmark detection and motion estimation (71-73), in indoor (74-79) as well as outdoor environments (80-82). The use of particle filter algorithms with stereo vision systems has been analysed in several works (83-86), but it is not the only technique exploited. Schleicher et al. (87), for example, applied a top-down Bayesian method to images from a wide-angle stereo camera to identify and localize natural landmarks. Lemaire et al. (88) discuss two vision-based SLAM strategies that use 3D points as landmarks. One relies on stereovision, where the landmark positions are fully observed from a single location; the other relies on a bearing-only approach applied to monocular sequences. There have also been many successful approaches to visual SLAM that use RGB-D sensors to exploit the 3D point clouds they provide (89,90).
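An ideal pinhole model shows both the scale problem of monocular systems and how stereo resolves it. The sketch assumes an undistorted camera with focal length f in pixels and, for stereo, a rectified pair with a known baseline B. Function names and numbers are illustrative.

```python
def pinhole_project(point, f):
    """Project a 3D point given in the camera frame to pixel offsets."""
    x, y, z = point
    return (f * x / z, f * y / z)

def stereo_depth(f, baseline, disparity):
    """Depth of a point seen by a rectified stereo pair: Z = f * B / d."""
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    return f * baseline / disparity
```

Scaling a point by any factor (together with the camera translation, for a moving camera) leaves its monocular projection unchanged, so metric depth cannot be recovered from a single image. With stereo, the known baseline turns disparity into metric depth through Z = fB/d.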
Most visual SLAM systems use algorithms from computer vision, in particular Structure from Motion (SfM). Thanks to high-performance computers, techniques such as bundle adjustment (BA) (91) are now attracting great interest in the robotics community, since their sparse representations can improve performance over the EKF. The first real-time application of BA is attributed to Mouragnon et al. (92), with their work on visual odometry, followed by the parallel tracking and mapping (PTAM) system of Klein and Murray (93). In ORB-SLAM (94), thanks to the covisibility graph, tracking and mapping operate in a local covisible area, independent of the global map size.
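Full BA jointly refines camera poses and 3D points by minimizing reprojection error. The sketch below is deliberately reduced to the structure-only case: it refines a single 3D point observed by cameras with known, fixed centres and identity rotation. It runs Gauss-Newton on the reprojection error with a forward-difference Jacobian. The focal length (500 px), the camera layout and all function names are illustrative assumptions.

```python
def project_to_camera(point, cam_center, f=500.0):
    """Pinhole projection for a camera with identity rotation at cam_center."""
    x, y, z = (p - c for p, c in zip(point, cam_center))
    return (f * x / z, f * y / z)

def reprojection_residuals(point, cams, observations):
    """Stacked reprojection errors (pixels) of one 3D point in all cameras."""
    res = []
    for cam, (u, v) in zip(cams, observations):
        pu, pv = project_to_camera(point, cam)
        res.extend([pu - u, pv - v])
    return res

def solve_3x3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            m[r] = [v_r - factor * v_c for v_r, v_c in zip(m[r], m[col])]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x

def refine_point(point, cams, observations, iterations=10, eps=1e-6):
    """Gauss-Newton on the squared reprojection error over the 3D point only."""
    p = list(point)
    for _ in range(iterations):
        r0 = reprojection_residuals(p, cams, observations)
        # Forward-difference Jacobian, stored by column: cols[k][i] = d r_i / d p_k
        cols = []
        for k in range(3):
            q = p[:]
            q[k] += eps
            rk = reprojection_residuals(q, cams, observations)
            cols.append([(a - b) / eps for a, b in zip(rk, r0)])
        # Normal equations: (J^T J) delta = -J^T r
        jtj = [[dot_col(cols[i], cols[j]) for j in range(3)] for i in range(3)]
        jtr = [-dot_col(cols[i], r0) for i in range(3)]
        delta = solve_3x3(jtj, jtr)
        p = [pi + di for pi, di in zip(p, delta)]
    return p

def dot_col(u, v):
    """Inner product of two Jacobian columns (or a column and the residuals)."""
    return sum(a * b for a, b in zip(u, v))
```

Full BA solves the same kind of problem jointly over all camera poses and points. Practical solvers exploit the sparse block structure of the normal equations, for example with the Schur complement, to keep this tractable.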
Surgical SLAM
This section focuses on SLAM applications in surgery. Medical areas such as assistance, rehabilitation and surgery can benefit from several devices and algorithms typical of robotics: special robot manipulators for surgery, control algorithms for teleoperation and cognitive algorithms for decision learning are just a few of them (95).
Medical surgical systems provide innovative products that have had a profound effect on the performance and welfare of health care professionals. Robotic technologies have been developed to assist surgeons and to achieve optimal and accurate results, even without the surgeon being in the same location as the patient. They are employed in different types of surgery, from cardiac and open-heart surgery to prostate surgery, hysterectomies, joint replacements and kidney surgery.
The da Vinci surgical platform (Intuitive Surgical Inc., Sunnyvale, CA, USA) is a well-known system for minimally invasive surgery, which has evolved since its first release. It consists of three main components: the surgical console; the side robotic cart, with four robotic arms that the surgeon manipulates from the console; and a high-definition 3D vision system. The surgical console is the main controller of the system, through which the surgeon manages the surgical instruments mounted on three of the arms of the robotic cart. Different instruments can be attached according to the planned procedure. The fourth robotic arm is dedicated to camera control. The surgeon views a 3D video image of the procedure while the robotic arms reproduce the movements of their hands. Another example is the Sensei X (Hansen Medical Inc., Mountain View, CA, USA), a medical robot designed to perform complex cardiac arrhythmia procedures using a flexible catheter with greater stability and control (96-98). Another commercial robotic tool for surgical applications is Navio PFS™ (Blue Belt Technologies Inc., Pittsburgh, PA, USA). This handheld device comprises a planning and navigation platform for precise bone preparation and dynamic soft tissue balancing.
Many other systems exist, such as DLR MIRO and the MiroSurge project, and a large part of these studies was conducted in the cardiac field. NeuroMate and NeuroArm, on the other hand, are designed for neurosurgery applications, as is the system introduced by Peters et al. (99). All these systems provide improved diagnostic capabilities and less invasive, more precise procedures. Furthermore, robots can reduce surgeons' strain and fatigue during surgeries lasting several hours. Image-guided surgery systems are the most widely adopted, since the surgeon can observe an operation from different viewpoints and thus make the best decision on how to proceed. Because these systems always include a tracking device integrated with a surgical tool, the surgeon knows the robot position relative to targets in the patient's body and can decide where to guide it. One of the best-known image-guidance systems is EnSite NavX (St Jude Medical, St Paul, MN, USA), used for cardiac mapping and ablation. It reconstructs in 3D the electrical activity and the cardiac cavities, within which the operator visualizes the ablation catheter in real time without exposing the patient to harmful radiation (100). Carto 3 (Biosense Webster, Diamond Bar, CA, USA) is one of the most sophisticated cardiac electro-anatomical mapping systems and can guide the ablation of numerous arrhythmias. It allows precise localization of the ablation catheter thanks to three ultra-low electromagnetic fields. The anatomical structure of the heart is reconstructed through the contact of the catheter with the endocardial surface, and an electrical signal is recorded for each point in the map (101,102). It has been shown clinically that these image-guidance and electro-anatomical mapping systems can reduce a doctor's reliance on radiation-based fluoroscopy (103).
Two other examples of image-guidance systems that use an electromagnetic (EM) tracker registered with preoperative images are presented in (104) and (105). Zhong et al. (106) described an automatic registration method based on the Iterative Closest Point (ICP) algorithm to align EM tracker measurements with preoperative images, but it takes a long time (on the order of 40 minutes) to complete (107). In (108), the authors proposed a software package for creating custom image-guidance solutions.
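The ICP idea can be sketched as plain point-to-point ICP in 2D with brute-force nearest-neighbour matching. It alternates between matching each point to its closest target point and computing the closed-form least-squares rigid transform for those matches. This is an illustrative sketch, not the 3D registration pipeline of (106); the point sets and all function names are hypothetical.

```python
import math

def best_rigid_transform_2d(src, dst):
    """Closed-form least-squares rotation angle and translation mapping
    paired points src[i] -> dst[i] (2D Procrustes/Kabsch)."""
    n = len(src)
    csx = sum(p[0] for p in src) / n
    csy = sum(p[1] for p in src) / n
    cdx = sum(q[0] for q in dst) / n
    cdy = sum(q[1] for q in dst) / n
    s_dot = s_cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - csx, py - csy
        bx, by = qx - cdx, qy - cdy
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)  # angle maximizing alignment of centred sets
    c, s = math.cos(theta), math.sin(theta)
    return theta, cdx - (c * csx - s * csy), cdy - (s * csx + c * csy)

def apply_rigid_2d(theta, tx, ty, pts):
    """Rotate points by theta, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in pts]

def icp_2d(src, dst, iterations=30, tol=1e-12):
    """Point-to-point ICP; returns the accumulated (theta, tx, ty)."""
    theta, tx, ty = 0.0, 0.0, 0.0
    current = list(src)
    prev_err = float("inf")
    for _ in range(iterations):
        # Brute-force nearest neighbour in dst for every current point.
        matches = [min(dst, key=lambda q: (q[0] - x) ** 2 + (q[1] - y) ** 2)
                   for x, y in current]
        dth, dtx, dty = best_rigid_transform_2d(current, matches)
        current = apply_rigid_2d(dth, dtx, dty, current)
        # Compose the increment with the accumulated transform.
        c, s = math.cos(dth), math.sin(dth)
        tx, ty = c * tx - s * ty + dtx, s * tx + c * ty + dty
        theta += dth
        err = sum((x - q[0]) ** 2 + (y - q[1]) ** 2 for (x, y), q in zip(current, matches))
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return theta, tx, ty
```

ICP converges only to a local minimum, so it needs a reasonable initial alignment. In a test with a small displacement relative to the point spacing, the first nearest-neighbour matching is already correct and the transform is recovered exactly.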
Filtering for surgical applications
Filtering techniques can be found in various works. In (109), the pose of a stereoscope is estimated jointly with the 3D positions of features detected in the images by means of an EKF algorithm. Grasa et al. (110) also implemented an EKF estimator for monocular SLAM on real sequences of endoscope images. Similarly, in (111) a CCD camera mounted on a fiberscope was employed to reconstruct its own motion and the 3D scene in which the surgery takes place. The EKF is not the only technique: some authors implemented an unscented particle filter to locate an intracardiac echo ultrasound catheter using the measurements coming from the instrument itself (112). The algorithm does not need a prior estimate of the registration: during the update step, it compares 21 live ultrasound images with the expected image for each particle in the filter. The authors demonstrated that the algorithm converges in about 30 seconds.
SLAM clinical benefits
Significant tissue deformation prevents precise registration and fusion of pre- and intra-operative data in minimally invasive surgery, particularly in cardiac, gastrointestinal and abdominal procedures. When tissue motion is manageable, image-guided surgery has been shown to be effective and to offer many advantages. Vision-based techniques such as SfM and visual SLAM are now spreading considerably thanks to their ability to recover 3D structure and laparoscope motion. They have been exploited in many anatomical settings, such as the abdomen (109,113), colon (114), bladder (115) and sinus (116), but they require the structure to be static. This is a recurrent hypothesis in research in this area, yet most surgical procedures cannot satisfy it (117). SfM has been formulated for non-rigid environments, but its reliance on offline batch processing makes real-time use difficult. In (118), for instance, the estimated cardiac surface is considered static at a selected point, while in (119) it is computed by tracking regions of interest on the organ. The laparoscopic camera is assumed to be fixed, which is not realistic for in vivo applications.
SLAM can considerably enhance the performance of image-guided surgery by enabling accurate navigation through continuous awareness of the robot's location relative to its surroundings, and can therefore provide a reliable model of the operation. In the probabilistic filtering approach, the registration parameters, the surface deformation and the robot configuration are recursively adjusted at each time step to obtain the most likely solution. Moreover, surgical procedures need to combine data from various tracking sensors, images, and pre-operative and live information. SLAM can contribute considerably here, because it was born as a sensor-fusion algorithm in which information from different sources is collected and fused.
Another improvement that SLAM can bring to image-guided systems is the annotation of surface models with motion data. SLAM can estimate the periodic motion of nearby surfaces, and adding this information to image-guidance models lets surgeons better plan paths toward anatomical targets through a more informative graphical interface. The uncertainty of the displayed model can also be computed, giving the surgeon feedback on which aspects of the visualization can be trusted to guide the robot precisely. Inaccuracies such as registration errors may cause the robot to be estimated in a pose that is infeasible given the preoperative models. In that case, a constrained update step can be computed to move the estimate into a feasible region, producing a more accurate and reliable representation of the operation.
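A simple way to picture the estimation of periodic surface motion is a least-squares fit of a sinusoid to tracked positions of a tissue point. The sketch assumes the motion frequency is known (for instance, a hypothetical respiratory rate of 0.25 Hz, i.e. 15 breaths per minute). Because the model c + a·sin(ωt) + b·cos(ωt) is linear in its parameters, the fit reduces to a 3×3 linear system. This is not a SLAM formulation; in a full system such a motion model would become part of the estimated state. All names are illustrative.

```python
import math

def solve_3x3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            m[r] = [v_r - factor * v_c for v_r, v_c in zip(m[r], m[col])]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x

def fit_periodic_motion(times, values, omega):
    """Least-squares fit of y(t) = c + a*sin(omega*t) + b*cos(omega*t) for a
    known angular frequency omega. Returns (c, amplitude, phase) such that
    y(t) = c + amplitude * sin(omega*t + phase)."""
    rows = [(1.0, math.sin(omega * t), math.cos(omega * t)) for t in times]
    # Normal equations A^T A p = A^T y for p = (c, a, b).
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    c, a, b = solve_3x3(ata, aty)
    return c, math.hypot(a, b), math.atan2(b, a)
```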
Analysing the literature considered in this paper, it can be noticed that the most employed technique in SLAM robotics applications, regardless of the specific field of application, is the KF (Figure 1). In particular, the EKF accounts for the largest share, possibly because it provides good-quality estimates with relatively low complexity. This is also reflected in surgical robotics applications, as illustrated in Figure 2, where almost all the works exploit vision sensors.
Conclusions
Emerging minimally invasive technologies have been embraced by many surgical disciplines over the past few years, and this has brought significant advancements in SLAM research in the medical field as well. This work first reviewed the main techniques adopted and implemented to solve the SLAM problem in any kind of environment. A separate subsection was dedicated to visual SLAM, since optical sensors are increasingly employed in robotics applications and specifically in surgery. Some advanced surgical robotic systems, which can translate a surgeon's movements into precise real-time movements of robotic instruments inside a patient's body, have also been summarized. The analysis shows that all known approaches to SLAM have their own limitations, but that the EKF is the most widely used.
