3.3 Isaac ROS: Real-Time Visual SLAM for Humanoid Localization

Introduction

The Localization Problem: A robot wakes up in an unknown room. Without GPS, how does it know where it is? How does it build a map while simultaneously figuring out its position within that map?

The SLAM Solution: Visual SLAM (Simultaneous Localization and Mapping) solves both problems at once — tracking the robot's pose (position + orientation) while building a 3D map of the environment using only cameras and an IMU.

This section teaches you to:

  1. Understand how NVIDIA Isaac ROS accelerates VSLAM on GPUs
  2. Configure stereo cameras and IMU for VSLAM input
  3. Launch Isaac ROS VSLAM and achieve 30Hz real-time pose estimation
  4. Visualize maps, trajectories, and loop closures in RViz
  5. Handle VSLAM failures (tracking loss, feature-poor environments)

Real-World Impact: Tesla's Autopilot, Boston Dynamics' Spot, and Amazon warehouse robots all use variants of Visual SLAM for navigation without GPS.


What is Visual SLAM?

The Simultaneous Localization and Mapping Problem

Chicken-and-Egg Dilemma:

  • To localize (know your position), you need a map
  • To build a map, you need to know your position

SLAM's Solution: Do both at the same time!

  1. Initialization: Start from an unknown pose, extract visual features (corners, edges) from the first camera frame
  2. Tracking: As the robot moves, track how features move between frames → estimate camera motion (visual odometry)
  3. Mapping: Use camera motion + triangulation to compute 3D positions of features → build a point cloud map
  4. Loop Closure: When the robot revisits a location, recognize it and correct accumulated drift
  5. Optimization: Continuously refine the map and trajectory to maintain global consistency

Example: A humanoid walks down a hallway:

  • Frame 1: See corner feature at pixel (320, 240)
  • Frame 2: Same corner moved to (350, 240) → Robot moved 0.1m left
  • Triangulate: Corner is 3.5m ahead at world coordinates (3.5, 0.5, 1.2)
  • Add to map: Store 3D point + visual descriptor
  • 100 frames later: See the same corner again → Loop closure! Correct drift in trajectory

Isaac ROS: GPU-Accelerated SLAM

Why Traditional SLAM is Slow

CPU-Based SLAM (e.g., ORB-SLAM2):

  • Extract 500 ORB features per frame: 30ms
  • Match features against map database (10,000 features): 50ms
  • Triangulate new 3D points: 15ms
  • Optimize trajectory (bundle adjustment): 30ms
  • Total: 125ms per frame → 8 Hz maximum update rate

Problem: A humanoid moving at 0.5 m/s travels 6.25cm between pose updates. At high speeds, this causes drift and instability.
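The arithmetic behind that figure can be checked directly — distance per update is just speed divided by update rate:

```python
def travel_per_update(speed_m_s: float, rate_hz: float) -> float:
    """Distance the robot covers between consecutive pose updates."""
    return speed_m_s / rate_hz

# At 0.5 m/s: an 8 Hz CPU pipeline vs. a 30 Hz GPU pipeline
cpu_step = travel_per_update(0.5, 8)   # 0.0625 m (6.25 cm)
gpu_step = travel_per_update(0.5, 30)  # ~0.0167 m (1.7 cm)
```

At 30 Hz the robot moves less than 2 cm between pose estimates, which is what keeps closed-loop control stable.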


Isaac ROS VSLAM: The GPU Advantage

NVIDIA Isaac ROS moves compute-heavy tasks to the GPU:

| Task               | CPU Time      | GPU Time (RTX 3060) | Speedup |
|--------------------|---------------|---------------------|---------|
| Feature Extraction | 30 ms         | 3 ms                | 10x     |
| Feature Matching   | 50 ms         | 8 ms                | 6x      |
| Triangulation      | 15 ms         | 2 ms                | 7.5x    |
| Bundle Adjustment  | 30 ms         | 10 ms               | 3x      |
| Total Pipeline     | 125 ms (8 Hz) | 23 ms (43 Hz)       | 5.4x    |

Real-Time Performance: Isaac ROS VSLAM runs at 30Hz (33ms per frame) on RTX 3060+ GPUs, providing smooth, low-latency localization.


Isaac ROS Architecture

Isaac ROS is built on ROS 2 with GPU-accelerated "GEMs" (GPU-Enabled Modules):

┌───────────────────────────────────────────────────────────────────┐
│                   Isaac ROS GEMs (GPU Packages)                   │
├───────────────────────────────────────────────────────────────────┤
│ Visual SLAM │ AprilTag  │ DNN Inference │ Image Processing        │
│ (cuVSLAM)   │ Detection │ (TensorRT)    │ (cuDLA)                 │
└─────────────────────────────────┬─────────────────────────────────┘
                                  │
                                  ↓ CUDA / cuDNN / TensorRT
┌───────────────────────────────────────────────────────────────────┐
│                      NVIDIA GPU (CUDA Cores)                      │
│  2,000+ parallel processors for feature extraction, matching,     │
│  triangulation, and optimization                                  │
└───────────────────────────────────────────────────────────────────┘

Key Components:

  • cuVSLAM: CUDA-accelerated Visual SLAM library (closed-source, NVIDIA proprietary)
  • isaac_ros_visual_slam: ROS 2 wrapper providing standard topics and services
  • TensorRT: Optimized deep learning inference for object detection (used in advanced perception)
  • cuDLA: GPU-accelerated image processing (rectification, undistortion, noise reduction)

How Visual SLAM Works: A Deep Dive

Step 1: Feature Extraction

Goal: Find distinct visual landmarks (corners, edges, textures) that can be reliably tracked across frames.

ORB Features (Oriented FAST and Rotated BRIEF):

  • FAST Detector: Identifies corner points (high-contrast regions)
  • BRIEF Descriptor: Creates a 256-bit binary fingerprint of each corner (describing its appearance)
  • Orientation: Adds rotation invariance so features match even if the camera rotates

Example: A corner where two walls meet generates an ORB feature:

  • Keypoint: Pixel location (u, v) = (512, 384)
  • Descriptor: Binary code 10110101... (256 bits)
  • Orientation: 45° (relative to image horizontal)

Isaac ROS Acceleration: Extracts 500 features per frame in 3ms (vs. 30ms on CPU).


Step 2: Feature Tracking

Goal: Match features between consecutive frames to estimate camera motion.

Process:

  1. Extract features from Frame N and Frame N+1
  2. Compare descriptors using Hamming distance (count differing bits)
  3. Match features with distance < threshold (typically 50 bits)
  4. Use RANSAC (Random Sample Consensus) to filter outliers

Example:

  • Frame 1: Feature A at pixel (320, 240), descriptor 10110101...
  • Frame 2: Feature A' at pixel (340, 245), descriptor 10110001... (Hamming distance = 3)
  • Match! Feature moved 20 pixels right, 5 pixels down → Camera moved left and down

Visual Odometry: By tracking many features (100-500), compute the camera's 3D motion (translation + rotation) using triangulation and perspective-n-point (PnP) algorithms.
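A minimal sketch of descriptor matching by Hamming distance (descriptors shown here as small Python ints for brevity; real ORB descriptors are 256-bit binary strings, and production matchers use indexing structures rather than this brute-force loop):

```python
def hamming(d1: int, d2: int) -> int:
    """Hamming distance: count of differing bits between two descriptors."""
    return bin(d1 ^ d2).count("1")

def match_features(desc_a, desc_b, max_dist=50):
    """Match each descriptor in desc_a to its nearest neighbour in desc_b,
    keeping only matches below the distance threshold."""
    matches = []
    for i, a in enumerate(desc_a):
        j, d = min(((j, hamming(a, b)) for j, b in enumerate(desc_b)),
                   key=lambda t: t[1])
        if d < max_dist:
            matches.append((i, j, d))
    return matches

# The 8-bit prefixes from the example above differ in a single bit
print(hamming(0b10110101, 0b10110001))  # 1
```

RANSAC would then be run on the surviving matches to reject geometrically inconsistent outliers.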


Step 3: Mapping (Triangulation)

Goal: Convert 2D feature positions into 3D world coordinates.

Stereo Triangulation: Given:

  • Left camera sees feature at pixel (u_L, v_L)
  • Right camera sees same feature at pixel (u_R, v_R)
  • Baseline (distance between cameras) = 0.2m
  • Focal length = 500 pixels

Disparity: d = u_L - u_R (horizontal pixel difference)
Depth: Z = (baseline × focal_length) / d
3D Position: X = (u_L - c_x) × Z / f_x, Y = (v_L - c_y) × Z / f_y

Example:

  • Left pixel: (400, 300), Right pixel: (350, 300)
  • Disparity: d = 50 pixels
  • Depth: Z = (0.2m × 500px) / 50px = 2.0m
  • 3D point: (0.0m, 0.0m, 2.0m) — directly ahead, 2 meters away

Map Storage: Each triangulated point is stored with its descriptor and uncertainty estimate (covariance matrix).
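The triangulation formulas above as a runnable sketch. For the worked example, the principal point (c_x, c_y) is assumed to coincide with the left pixel so the point lands directly ahead, matching the text:

```python
def triangulate(u_l, v_l, u_r, fx, fy, cx, cy, baseline_m):
    """Stereo triangulation of one feature seen in rectified left/right images."""
    d = u_l - u_r               # disparity (pixels)
    z = baseline_m * fx / d     # depth: Z = baseline * f / d
    x = (u_l - cx) * z / fx     # back-project to camera-frame X
    y = (v_l - cy) * z / fy     # back-project to camera-frame Y
    return x, y, z

# The example from the text: 50 px disparity, 0.2 m baseline, f = 500 px
print(triangulate(400, 300, 350, 500, 500, 400, 300, 0.2))  # (0.0, 0.0, 2.0)
```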


Step 4: Keyframe Selection

Problem: Storing every frame (30 FPS) creates a map with 1,800 frames per minute → Too much data, redundant information.

Solution: Select keyframes — frames that add new information to the map.

Keyframe Criteria:

  • Significant camera motion (>0.2m translation or >10° rotation since last keyframe)
  • New visual features visible (>30% of features not in previous keyframes)
  • Tracking quality is good (>100 matched features)

Example: Robot walks down a hallway:

  • Frames 1-10: Forward motion → Select frames 1, 5, 10 as keyframes (50cm apart)
  • Frames 11-20: Static (robot stopped) → No keyframes added
  • Frame 21: Robot turns 90° → New keyframe (different viewing angle)

Result: Map contains ~200 keyframes for a 100m² environment instead of 10,000+ total frames.
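cuVSLAM's actual keyframe logic is proprietary, but one plausible combination of the criteria above — require healthy tracking, then accept the frame if it adds new information — looks like this:

```python
def is_keyframe(translation_m, rotation_deg, new_feature_ratio, num_matches):
    """Illustrative keyframe test: tracking must be healthy AND the frame
    must add new information (motion or novel features)."""
    if num_matches < 100:       # too few matched features to anchor a keyframe
        return False
    return (translation_m > 0.2
            or rotation_deg > 10.0
            or new_feature_ratio > 0.30)

print(is_keyframe(0.5, 0.0, 0.10, 150))   # True  (moved 0.5 m)
print(is_keyframe(0.01, 1.0, 0.05, 150))  # False (robot nearly static)
```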


Step 5: Loop Closure Detection

The Drift Problem: Small errors in visual odometry accumulate over time. After 100 meters, the robot might think it's 2 meters from its true position.

Loop Closure: When the robot revisits a previously mapped area, recognize it and correct the drift.

Process:

  1. Extract features from current frame
  2. Compare against all keyframes in the map (using bag-of-words for fast lookup)
  3. If similarity score > threshold → Loop closure candidate
  4. Verify geometrically: Compute relative pose between current frame and candidate keyframe
  5. If consistent → Add loop closure constraint to the optimization graph

Example:

  • Robot explores a large room and returns to the start
  • VSLAM recognizes the starting location (matches 150+ features with first keyframe)
  • Detects 1.5m drift in accumulated pose estimate
  • Runs pose graph optimization to distribute correction across entire trajectory
  • Result: Map becomes globally consistent, drift eliminated

Isaac ROS Performance: Loop closure detection runs at 5Hz without blocking real-time tracking.
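Bag-of-words lookup reduces loop-closure candidate search to comparing histograms of quantized "visual words". A toy similarity score (real systems such as DBoW2 use TF-IDF-weighted vocabularies; plain cosine similarity is shown here for illustration):

```python
from collections import Counter

def bow_similarity(words_a, words_b):
    """Cosine similarity between two bag-of-visual-words histograms.
    words_* are lists of visual-word IDs quantized from descriptors."""
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(a[w] * b[w] for w in a)
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

# Same place revisited -> near-identical word histograms
print(round(bow_similarity([3, 7, 7, 42], [3, 7, 7, 42]), 2))  # 1.0
```

A frame whose score against a stored keyframe exceeds the threshold (e.g. 0.75) becomes a loop-closure candidate and goes on to geometric verification.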


Step 6: Pose Graph Optimization

Goal: Minimize accumulated error across all keyframes and loop closures.

Pose Graph:

  • Nodes: Keyframes with estimated poses (x, y, z, roll, pitch, yaw)
  • Edges: Constraints between keyframes (odometry, loop closures)
  • Optimization: Adjust all poses to minimize total error (graph SLAM problem)

Optimization Method: Gauss-Newton or Levenberg-Marquardt iterative solver

Example:

  • 100 keyframes in a loop
  • Accumulated drift: 2 meters by frame 100
  • Loop closure detects: Frame 100 is actually at the same location as Frame 1
  • Optimization: Distribute the 2m error across all 100 keyframes (0.02m correction per frame)
  • Result: Smooth, consistent trajectory with no jumps

Isaac ROS: Runs incremental optimization in background thread (doesn't block real-time tracking).
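The 100-keyframe example can be mimicked with a deliberately simplified 1-D correction. A real pose graph solver weights each correction by edge covariance and iterates (Gauss-Newton / Levenberg-Marquardt); this toy version just spreads the drift uniformly:

```python
def distribute_drift(poses_x, drift_m):
    """Spread a loop-closure correction linearly along a 1-D trajectory.
    Toy stand-in for pose graph optimization: uniform weighting, one axis."""
    n = len(poses_x)
    return [p - drift_m * i / (n - 1) for i, p in enumerate(poses_x)]

poses = [float(i) for i in range(101)]    # 101 keyframe x-positions
corrected = distribute_drift(poses, 2.0)  # 2 m drift detected at the end
print(corrected[0], corrected[-1])        # 0.0 98.0
```

Each keyframe absorbs 0.02 m of the 2 m error, so the trajectory stays smooth instead of jumping at the loop-closure frame.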


ROS 2 Topics: The VSLAM Interface

Input Topics

| Topic                     | Message Type           | Frequency | Purpose                               |
|---------------------------|------------------------|-----------|---------------------------------------|
| /camera/left/image_raw    | sensor_msgs/Image      | 30 Hz     | Left stereo camera (RGB or grayscale) |
| /camera/right/image_raw   | sensor_msgs/Image      | 30 Hz     | Right stereo camera (for depth)       |
| /camera/imu               | sensor_msgs/Imu        | 200 Hz    | IMU data (acceleration + gyroscope)   |
| /camera/left/camera_info  | sensor_msgs/CameraInfo | 30 Hz     | Camera calibration (intrinsics)       |

Note: IMU fusion is optional but highly recommended for better tracking during rapid motion.


Output Topics

| Topic                  | Message Type              | Frequency | Purpose                                       |
|------------------------|---------------------------|-----------|-----------------------------------------------|
| /vslam/pose            | geometry_msgs/PoseStamped | 30 Hz     | Robot pose in map frame (x, y, z, quaternion) |
| /vslam/odometry        | nav_msgs/Odometry         | 30 Hz     | Pose + velocity estimate                      |
| /vslam/map             | sensor_msgs/PointCloud2   | 1 Hz      | 3D point cloud map (feature positions)        |
| /vslam/path            | nav_msgs/Path             | 5 Hz      | Robot trajectory (history of poses)           |
| /vslam/tracking_status | std_msgs/String           | 10 Hz     | Status: "TRACKING", "LOST", "INITIALIZING"    |

Key Topic: /vslam/pose is used by Nav2 for localization (where the robot is in the map).


Hands-On: Launching Isaac ROS VSLAM

Prerequisites

Hardware:

  • NVIDIA GPU (RTX 2060 or higher)
  • Stereo camera (or simulated in Isaac Sim)

Software:

  • ROS 2 Humble or Iron
  • Isaac ROS installed: sudo apt install ros-humble-isaac-ros-visual-slam
  • Isaac Sim running (if using simulation)

Step 1: Verify Camera Topics

Check that camera data is publishing:

# List camera topics
ros2 topic list | grep camera

# Expected output:
/camera/left/image_raw
/camera/right/image_raw
/camera/left/camera_info
/camera/right/camera_info
/camera/imu

Check data rates:

# Verify 30Hz camera rate
ros2 topic hz /camera/left/image_raw

# Expected output: average rate: 30.0

# Verify 200Hz IMU rate
ros2 topic hz /camera/imu

# Expected output: average rate: 200.0

Step 2: Configure VSLAM Parameters

Create vslam_params.yaml:

/**:
  ros__parameters:
    # Enable IMU fusion for better tracking
    enable_imu_fusion: true

    # Enable loop closure for drift correction
    enable_loop_closure: true

    # Feature extraction parameters
    num_cameras: 2            # Stereo pair
    min_num_features: 150     # Minimum for tracking
    max_num_features: 500     # Maximum per frame

    # Frame IDs (coordinate frames)
    map_frame: "map"
    odom_frame: "odom"
    base_frame: "base_link"
    camera_optical_frame: "camera_left_optical_frame"

    # IMU parameters
    imu_frame: "imu_link"
    gyro_noise_density: 0.0001    # rad/s/√Hz
    gyro_random_walk: 0.00001     # rad/s²/√Hz
    accel_noise_density: 0.001    # m/s²/√Hz
    accel_random_walk: 0.0001     # m/s³/√Hz

    # Loop closure parameters
    loop_closure_frequency: 5.0   # Hz (how often to check)
    loop_closure_min_score: 0.75  # Similarity threshold

Parameter Explanations:

  • enable_imu_fusion: Combines visual odometry with IMU for smoother tracking during fast motion
  • enable_loop_closure: Corrects drift when robot revisits areas
  • min_num_features: If tracked features drop below 150, tracking quality degrades
  • max_num_features: Limits computation (more features = slower but more robust)

Step 3: Create Launch File

Create vslam_isaac_sim.launch.py:

from launch import LaunchDescription
from launch.actions import DeclareLaunchArgument
from launch.substitutions import LaunchConfiguration
from launch_ros.actions import Node


def generate_launch_description():
    # Launch arguments
    use_sim_time = LaunchConfiguration('use_sim_time', default='true')

    return LaunchDescription([
        DeclareLaunchArgument(
            'use_sim_time',
            default_value='true',
            description='Use simulation time from Isaac Sim'
        ),

        # Isaac ROS Visual SLAM node
        Node(
            package='isaac_ros_visual_slam',
            executable='visual_slam_node',
            name='visual_slam',
            parameters=[
                'config/vslam_params.yaml',
                {'use_sim_time': use_sim_time}
            ],
            remappings=[
                # Remap Isaac Sim topics to the names VSLAM expects
                ('/stereo_camera/left/image', '/camera/left/image_raw'),
                ('/stereo_camera/right/image', '/camera/right/image_raw'),
                ('/stereo_camera/left/camera_info', '/camera/left/camera_info'),
                ('/stereo_camera/right/camera_info', '/camera/right/camera_info'),
                ('/imu', '/camera/imu'),
            ],
            output='screen'
        ),

        # Static TF publishers (camera to base_link)
        Node(
            package='tf2_ros',
            executable='static_transform_publisher',
            name='camera_left_tf',
            arguments=['0.1', '0.05', '1.6', '0', '0', '0',
                       'base_link', 'camera_left_optical_frame']
            # Humanoid: camera 1.6m high, 0.1m forward, 0.05m left of center
        ),

        Node(
            package='tf2_ros',
            executable='static_transform_publisher',
            name='camera_right_tf',
            arguments=['0.1', '-0.05', '1.6', '0', '0', '0',
                       'base_link', 'camera_right_optical_frame']
            # Right camera: same height, 0.05m right of center (0.1m baseline)
        ),

        Node(
            package='tf2_ros',
            executable='static_transform_publisher',
            name='imu_tf',
            arguments=['0', '0', '0.5', '0', '0', '0',
                       'base_link', 'imu_link']
            # IMU at robot's center of mass (0.5m high)
        ),
    ])

Step 4: Launch VSLAM

# Terminal 1: Start Isaac Sim with humanoid scene
# (Assuming Isaac Sim is already running with cameras publishing)

# Terminal 2: Launch VSLAM
ros2 launch ./vslam_isaac_sim.launch.py

# Expected output:
[visual_slam]: Initializing Visual SLAM...
[visual_slam]: Camera intrinsics loaded: fx=500.0 fy=500.0 cx=320.0 cy=240.0
[visual_slam]: IMU fusion enabled
[visual_slam]: Loop closure enabled (frequency: 5.0 Hz)
[visual_slam]: Waiting for camera images...
[visual_slam]: First frame received, extracting features...
[visual_slam]: Initialization complete! Tracking started.
[visual_slam]: Pose published at 30.0 Hz

Step 5: Verify VSLAM Output

Check pose output rate:

ros2 topic hz /vslam/pose

# Expected output:
average rate: 30.0
min: 0.032s max: 0.034s std dev: 0.001s window: 100

Target: 30 Hz ± 2 Hz (consistent with camera frame rate).
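The rate statistic can be reproduced from raw message timestamps; a minimal version of the averaging that `ros2 topic hz` performs:

```python
def mean_rate_hz(stamps_s):
    """Average publish rate from consecutive message timestamps (seconds)."""
    dts = [b - a for a, b in zip(stamps_s, stamps_s[1:])]
    return 1.0 / (sum(dts) / len(dts))

# 31 stamps spaced 1/30 s apart -> ~30 Hz
stamps = [i / 30.0 for i in range(31)]
print(round(mean_rate_hz(stamps), 1))  # 30.0
```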


Check pose data:

ros2 topic echo /vslam/pose --once

# Example output:
header:
  stamp:
    sec: 125
    nanosec: 450000000
  frame_id: "map"
pose:
  position:
    x: 1.234
    y: 0.567
    z: 0.0
  orientation:    # Quaternion (x, y, z, w)
    x: 0.0
    y: 0.0
    z: 0.707
    w: 0.707      # Facing 90° left

Interpretation: Robot is 1.23m forward, 0.57m left, facing 90° (quaternion z=0.707, w=0.707).
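You can verify that interpretation by converting the quaternion to a yaw angle with the standard formula:

```python
import math

def yaw_from_quaternion(x, y, z, w):
    """Yaw (rotation about Z) of a unit quaternion, in radians."""
    return math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))

yaw = yaw_from_quaternion(0.0, 0.0, 0.707, 0.707)
print(round(math.degrees(yaw)))  # 90
```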


Step 6: Visualize in RViz

Launch RViz with VSLAM visualization:

ros2 run rviz2 rviz2 -d config/vslam_rviz.rviz

RViz Configuration:

  1. Fixed Frame: map
  2. Add PointCloud2:
    • Topic: /vslam/map
    • Color: By intensity (white points)
    • Size: 0.05m
  3. Add Path:
    • Topic: /vslam/path
    • Color: Blue
    • Line width: 0.02m
  4. Add PoseStamped:
    • Topic: /vslam/pose
    • Display: Axes (red=X, green=Y, blue=Z)
    • Length: 0.5m

What You'll See:

  • White point cloud: 3D map of visual features
  • Blue line: Robot's trajectory (history of poses)
  • RGB axes: Current robot pose (orientation)

Step 7: Test Loop Closure

Procedure:

  1. Drive robot in a square loop (5m × 5m)
  2. Observe drift as robot completes loop (may be 0.5m off starting position)
  3. When robot returns to start, VSLAM detects loop closure
  4. Map and trajectory snap to corrected positions

Monitor loop closures:

ros2 topic echo /vslam/tracking_status

# During loop:
data: "TRACKING"

# When loop detected:
data: "LOOP_CLOSURE_DETECTED"

# After optimization:
data: "TRACKING"

RViz Observation: Trajectory line "jumps" slightly when loop closure correction is applied.


Practice Exercise: VSLAM Performance Test

Objective

Validate that Isaac ROS VSLAM achieves real-time performance (30Hz pose updates) and successful loop closure.

Procedure

  1. Launch Isaac Sim with humanoid robot in a 10m × 10m room

  2. Start VSLAM: ros2 launch ./vslam_isaac_sim.launch.py

  3. Start RViz: ros2 run rviz2 rviz2 -d config/vslam_rviz.rviz

  4. Drive robot in a loop:

    # Use teleop or autonomous waypoints
    ros2 run teleop_twist_keyboard teleop_twist_keyboard

    # Drive: Forward 5m → Left 90° → Forward 5m → Left 90° → Forward 5m → Left 90° → Forward 5m
  5. Measure performance:

    # Terminal 3: Monitor pose rate
    ros2 topic hz /vslam/pose

    # Expected: ~30 Hz
  6. Check loop closure:

    • Observe map drift at 75% completion of loop
    • Wait for robot to return to start
    • Confirm loop closure detected: ros2 topic echo /vslam/tracking_status
    • Verify trajectory correction in RViz (path snaps to close the loop)

Success Criteria

✅ VSLAM initializes within 5 seconds of first camera frame
✅ Pose updates published at 30 Hz ± 2 Hz
✅ Map contains 500+ feature points after 30 seconds of motion
✅ Trajectory smoothly follows robot motion in RViz
✅ Loop closure detected when robot returns to start (tolerance: within 1 meter)
✅ Drift corrected within 3 seconds of loop closure detection


Reality Check: When VSLAM Fails

Failure Mode 1: Texture-less Environments

Symptom: Tracking status changes to "LOST", no pose updates

Cause: VSLAM relies on visual features (corners, edges, textures). In environments like:

  • White-walled corridors with no decoration
  • Uniform floors (plain carpet, smooth concrete)
  • Featureless ceilings

...there are too few features to track (< 50 features extracted per frame).

Mitigation:

  • Add artificial landmarks (AprilTags, posters, floor markings)
  • Supplement with LiDAR or wheel odometry
  • Improve lighting (dim scenes force high camera gain, and the resulting sensor noise produces false, unreliable "features")

Example: Tesla Autopilot struggles in parking garages with white walls — adds radar to compensate.


Failure Mode 2: Rapid Motion

Symptom: Jerky pose estimates, frequent re-initialization, high drift

Cause: Fast motion (>1 m/s or >90°/s rotation) causes:

  • Motion blur: Features become blurry → matching fails
  • Large disparity: Features move >50 pixels between frames → search space too large
  • IMU saturation: Gyroscope/accelerometer exceed measurement range

Mitigation:

  • Increase camera frame rate (60 Hz or 90 Hz)
  • Reduce robot maximum velocity
  • Enable IMU fusion to predict feature positions
  • Use global shutter cameras (eliminate rolling shutter distortion)

Example: High-speed drones use 120 Hz cameras for VSLAM to handle rapid motion.


Failure Mode 3: Dynamic Environments

Symptom: Map contains "ghost" features from moving objects, inconsistent loop closures

Cause: VSLAM assumes static world. Moving objects (people, chairs, doors) violate this assumption:

  • Features on moving objects get added to the map
  • Loop closure fails because scene has changed since last visit
  • Optimization distributes error incorrectly (trying to fit inconsistent data)

Mitigation:

  • Use semantic segmentation to filter dynamic objects (people, vehicles)
  • Use a dynamic-SLAM variant that masks moving objects (e.g., DynaSLAM, built on ORB-SLAM)
  • Rely on static features only (walls, floor, ceiling)

Example: Warehouse robots use LiDAR + VSLAM and filter out "person" class from object detection.
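The semantic-filtering idea can be sketched in a few lines. The feature records and label names here are hypothetical; in practice the labels would come from a segmentation network running alongside VSLAM:

```python
# Labels treated as dynamic (hypothetical class names)
DYNAMIC_CLASSES = {"person", "vehicle", "animal"}

def static_features(features):
    """Keep only features believed to lie on static structure."""
    return [f for f in features if f["label"] not in DYNAMIC_CLASSES]

features = [
    {"id": 1, "label": "wall"},
    {"id": 2, "label": "person"},
    {"id": 3, "label": "floor"},
]
print([f["id"] for f in static_features(features)])  # [1, 3]
```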


Failure Mode 4: Scale Ambiguity (Monocular SLAM)

Symptom: Map scale is incorrect (1m in real world = 2m in map)

Cause: Single cameras cannot measure absolute depth → SLAM estimates scale from motion, which can drift

Solution: Always use stereo or depth cameras for metric SLAM. If using monocular, require external scale references:

  • Known-size AprilTags
  • Depth sensor fusion
  • IMU gravity vector (provides absolute vertical scale)

Isaac ROS VSLAM: Designed for stereo cameras — avoids scale ambiguity entirely.


Isaac ROS GEMs: Beyond VSLAM

Isaac ROS includes GPU-accelerated packages for the entire perception pipeline:

Visual Odometry

Purpose: Estimate camera motion between frames (simpler than full SLAM, no loop closure)

Use Case: High-speed robots where map consistency is less important than real-time pose updates

Performance: 60 Hz pose updates on RTX 3060


AprilTag Detection

Purpose: Detect fiducial markers for localization and object tracking

Use Case:

  • Initial localization (robot wakes up, scans for AprilTag, knows starting pose)
  • Object manipulation (tags on objects provide precise 6DOF pose)
  • VSLAM recovery (if tracking lost, relocalize using AprilTags)

Performance: Detect 50+ tags per frame at 30 Hz


DNN Inference (TensorRT)

Purpose: Run deep learning models for object detection, segmentation, pose estimation

Use Case:

  • Semantic SLAM (classify features as "wall", "floor", "person")
  • Obstacle detection (detect dynamic objects)
  • Affordance detection (find "graspable objects")

Performance: YOLOv5 runs at 50 FPS (20ms latency) on RTX 3060


Image Processing (cuDLA)

Purpose: GPU-accelerated image operations (rectification, undistortion, noise reduction, format conversion)

Use Case:

  • Stereo rectification (align left/right images for disparity calculation)
  • Lens distortion correction (remove fisheye effects)
  • Bayer demosaicing (convert raw camera data to RGB)

Performance: Rectify 1920×1080 stereo pair in 2ms


Integration with Nav2: The VSLAM → Planning Bridge

Data Flow

Isaac ROS VSLAM                              Nav2
┌──────────────┐                      ┌──────────────┐
│ /vslam/pose  │ ───────────────────> │ Localization │
│   (30 Hz)    │                      │    Module    │
└──────────────┘                      └──────┬───────┘
                                             │
┌──────────────┐                      ┌──────▼───────┐
│ /vslam/map   │ ───────────────────> │   Costmap    │
│   (1 Hz)     │                      │ (obstacles)  │
└──────────────┘                      └──────┬───────┘
                                             │
                                      ┌──────▼───────┐
                                      │ Path Planner │
                                      └──────────────┘

Key Topics:

  • /vslam/pose → Nav2 AMCL (Adaptive Monte Carlo Localization) uses this as the prior estimate
  • /vslam/map → Converted to occupancy grid for costmap (free space = no features, occupied = dense features)

Next Section: Section 3.4 will configure Nav2 to use VSLAM output for humanoid navigation.


Check Your Understanding

  1. What does SLAM stand for?

    • A) Sensor Localization and Mapping
    • B) Simultaneous Localization and Mapping
    • C) Spatial Learning and Modeling
    • Answer: B
  2. What is loop closure?

    • A) Closing the robot's control loop
    • B) Detecting when the robot revisits a location to correct drift
    • C) Shutting down SLAM when complete
    • Answer: B
  3. Why does Isaac ROS VSLAM run faster than CPU-based SLAM?

    • A) Uses simpler algorithms
    • B) Parallel processing on GPU CUDA cores
    • C) Lower resolution images
    • Answer: B
  4. What is the purpose of IMU fusion in VSLAM?

    • A) Replace cameras entirely
    • B) Improve tracking during fast motion by predicting feature positions
    • C) Measure absolute position
    • Answer: B
  5. What happens if VSLAM tracking is lost?

    • A) Robot stops immediately
    • B) Switches to dead reckoning (odometry only) until tracking recovers
    • C) Robot self-destructs
    • Answer: B
  6. What is a keyframe?

    • A) Every camera frame
    • B) Selected frames that add new information to the map
    • C) Frames with the most features
    • Answer: B

Key Takeaways

Visual SLAM solves localization and mapping simultaneously using only cameras and IMU (no GPS required)

Isaac ROS VSLAM achieves 30Hz real-time performance through GPU acceleration (5x faster than CPU)

Loop closure corrects drift by recognizing previously visited areas and optimizing the entire trajectory

Stereo cameras eliminate scale ambiguity and enable metric maps (accurate distances)

VSLAM can fail in texture-less environments, during rapid motion, and with dynamic obstacles — require robust failure handling

ROS 2 topics provide standard interface: /vslam/pose for localization, /vslam/map for planning


Next: 3.4 Nav2 - Humanoid Path Planning