3.3 Isaac ROS: Real-Time Visual SLAM for Humanoid Localization
Introduction
The Localization Problem: A robot wakes up in an unknown room. Without GPS, how does it know where it is? How does it build a map while simultaneously figuring out its position within that map?
The SLAM Solution: Visual SLAM (Simultaneous Localization and Mapping) solves both problems at once — tracking the robot's pose (position + orientation) while building a 3D map of the environment using only cameras and an IMU.
This section teaches you to:
- Understand how NVIDIA Isaac ROS accelerates VSLAM on GPUs
- Configure stereo cameras and IMU for VSLAM input
- Launch Isaac ROS VSLAM and achieve 30Hz real-time pose estimation
- Visualize maps, trajectories, and loop closures in RViz
- Handle VSLAM failures (tracking loss, feature-poor environments)
Real-World Impact: Tesla's Autopilot, Boston Dynamics' Spot, and Amazon warehouse robots all use variants of Visual SLAM for navigation without GPS.
What is Visual SLAM?
The Simultaneous Localization and Mapping Problem
Chicken-and-Egg Dilemma:
- To localize (know your position), you need a map
- To build a map, you need to know your position
SLAM's Solution: Do both at the same time!
- Initialization: Start from an unknown pose, extract visual features (corners, edges) from the first camera frame
- Tracking: As the robot moves, track how features move between frames → estimate camera motion (visual odometry)
- Mapping: Use camera motion + triangulation to compute 3D positions of features → build a point cloud map
- Loop Closure: When the robot revisits a location, recognize it and correct accumulated drift
- Optimization: Continuously refine the map and trajectory to maintain global consistency
Example: A humanoid walks down a hallway:
- Frame 1: See corner feature at pixel (320, 240)
- Frame 2: Same corner moved to (350, 240) → Robot moved 0.1m left
- Triangulate: Corner is 3.5m ahead at world coordinates (3.5, 0.5, 1.2)
- Add to map: Store 3D point + visual descriptor
- 100 frames later: See the same corner again → Loop closure! Correct drift in trajectory
Isaac ROS: GPU-Accelerated SLAM
Why Traditional SLAM is Slow
CPU-Based SLAM (e.g., ORB-SLAM2):
- Extract 500 ORB features per frame: 30ms
- Match features against map database (10,000 features): 50ms
- Triangulate new 3D points: 15ms
- Optimize trajectory (bundle adjustment): 30ms
- Total: 125ms per frame → 8 Hz maximum update rate
Problem: A humanoid moving at 0.5 m/s travels 6.25cm between pose updates. At high speeds, this causes drift and instability.
Isaac ROS VSLAM: The GPU Advantage
NVIDIA Isaac ROS moves compute-heavy tasks to the GPU:
| Task | CPU Time | GPU Time (RTX 3060) | Speedup |
|---|---|---|---|
| Feature Extraction | 30 ms | 3 ms | 10x |
| Feature Matching | 50 ms | 8 ms | 6x |
| Triangulation | 15 ms | 2 ms | 7.5x |
| Bundle Adjustment | 30 ms | 10 ms | 3x |
| Total Pipeline | 125 ms (8 Hz) | 23 ms (43 Hz) | 5.4x |
Real-Time Performance: Isaac ROS VSLAM runs at 30Hz (33ms per frame) on RTX 3060+ GPUs, providing smooth, low-latency localization.
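The table's arithmetic is easy to verify. A quick sketch — the stage timings are the illustrative figures from the table above, not measured benchmarks:

```python
# Stage timings (ms) from the comparison table above -- illustrative, not benchmarks.
cpu_ms = {"extract": 30, "match": 50, "triangulate": 15, "bundle_adjust": 30}
gpu_ms = {"extract": 3, "match": 8, "triangulate": 2, "bundle_adjust": 10}

def update_rate_hz(stage_ms):
    """Maximum pose-update rate if the stages run back-to-back."""
    return 1000.0 / sum(stage_ms.values())

cpu_hz, gpu_hz = update_rate_hz(cpu_ms), update_rate_hz(gpu_ms)
speed = 0.5  # humanoid walking speed, m/s

print(f"CPU: {cpu_hz:.1f} Hz -> {100 * speed / cpu_hz:.2f} cm travelled per update")
print(f"GPU: {gpu_hz:.1f} Hz -> {100 * speed / gpu_hz:.2f} cm travelled per update")
# CPU: 8.0 Hz -> 6.25 cm travelled per update
# GPU: 43.5 Hz -> 1.15 cm travelled per update
```

At GPU rates the gap between pose updates shrinks from 6.25 cm to about 1 cm of travel, which is what makes closed-loop locomotion control practical.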
Isaac ROS Architecture
Isaac ROS is built on ROS 2 with GPU-accelerated "GEMs" (GPU-Enabled Modules):
┌───────────────────────────────────────────────────────────────────┐
│ Isaac ROS GEMs (GPU Packages) │
├───────────────────────────────────────────────────────────────────┤
│ Visual SLAM │ AprilTag │ DNN Inference │ Image Processing │
│ (cuVSLAM) │ Detection │ (TensorRT) │ (VPI/NPP) │
└───────────────┬───────────────────────────────────────────────────┘
│
↓ CUDA / cuDNN / TensorRT
┌───────────────────────────────────────────────────────────────────┐
│ NVIDIA GPU (CUDA Cores) │
│ 2,000+ parallel processors for feature extraction, matching, │
│ triangulation, and optimization │
└───────────────────────────────────────────────────────────────────┘
Key Components:
- cuVSLAM: CUDA-accelerated Visual SLAM library (closed-source, NVIDIA proprietary)
- isaac_ros_visual_slam: ROS 2 wrapper providing standard topics and services
- TensorRT: Optimized deep learning inference for object detection (used in advanced perception)
- Image processing (VPI/NPP): GPU-accelerated rectification, undistortion, and noise reduction
How Visual SLAM Works: A Deep Dive
Step 1: Feature Extraction
Goal: Find distinct visual landmarks (corners, edges, textures) that can be reliably tracked across frames.
ORB Features (Oriented FAST and Rotated BRIEF):
- FAST Detector: Identifies corner points (high-contrast regions)
- BRIEF Descriptor: Creates a 256-bit binary fingerprint of each corner (describing its appearance)
- Orientation: Adds rotation invariance so features match even if the camera rotates
Example: A corner where two walls meet generates an ORB feature:
- Keypoint: Pixel location (u, v) = (512, 384)
- Descriptor: Binary code 10110101... (256 bits)
- Orientation: 45° (relative to image horizontal)
Isaac ROS Acceleration: Extracts 500 features per frame in 3ms (vs. 30ms on CPU).
Step 2: Feature Tracking
Goal: Match features between consecutive frames to estimate camera motion.
Process:
- Extract features from Frame N and Frame N+1
- Compare descriptors using Hamming distance (count differing bits)
- Match features with distance < threshold (typically 50 bits)
- Use RANSAC (Random Sample Consensus) to filter outliers
Example:
- Frame 1: Feature A at pixel (320, 240), descriptor 10110101...
- Frame 2: Feature A' at pixel (340, 245), descriptor 10110001... (Hamming distance = 3)
- Match! Feature moved 20 pixels right, 5 pixels down → Camera moved left and up
Visual Odometry: By tracking many features (100-500), compute the camera's 3D motion (translation + rotation) using triangulation and perspective-n-point (PnP) algorithms.
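The matching step can be sketched in a few lines. Production systems match 256-bit descriptors with SIMD/GPU popcount instructions; this toy version uses Python integers and brute force:

```python
def hamming(d1: int, d2: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(d1 ^ d2).count("1")

def match_features(desc_a, desc_b, max_dist=50):
    """Brute-force nearest-neighbour matching under a Hamming threshold.
    Returns (index_in_a, index_in_b, distance) tuples."""
    matches = []
    for i, a in enumerate(desc_a):
        j = min(range(len(desc_b)), key=lambda k: hamming(a, desc_b[k]))
        d = hamming(a, desc_b[j])
        if d <= max_dist:
            matches.append((i, j, d))
    return matches

# The 8-bit prefixes shown in the example differ in a single bit
# (the full 256-bit descriptors in the example differ in 3):
print(hamming(0b10110101, 0b10110001))  # 1
```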
Step 3: Mapping (Triangulation)
Goal: Convert 2D feature positions into 3D world coordinates.
Stereo Triangulation: Given:
- Left camera sees feature at pixel (u_L, v_L)
- Right camera sees same feature at pixel (u_R, v_R)
- Baseline (distance between cameras) = 0.2m
- Focal length = 500 pixels
- Disparity: d = u_L - u_R (horizontal pixel difference)
- Depth: Z = (baseline × focal_length) / d
- 3D Position: X = (u_L - c_x) × Z / f_x, Y = (v_L - c_y) × Z / f_y
Example:
- Left pixel: (400, 300), Right pixel: (350, 300)
- Disparity: d = 50 pixels
- Depth: Z = (0.2m × 500px) / 50px = 2.0m
- 3D point: (0.0m, 0.0m, 2.0m) — directly ahead, 2 meters away
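The formulas and the worked example can be checked with a small function. The principal point (cx, cy) = (400, 300) is an assumption chosen so the feature lies exactly on the optical axis:

```python
def triangulate_stereo(u_l, v_l, u_r, baseline, fx, fy, cx, cy):
    """Recover a 3D point from a rectified stereo pair (pixels share a row)."""
    disparity = u_l - u_r              # horizontal pixel difference
    Z = baseline * fx / disparity      # depth along the optical axis
    X = (u_l - cx) * Z / fx            # right of the optical axis
    Y = (v_l - cy) * Z / fy            # below the optical axis
    return X, Y, Z

# Values from the example above:
print(triangulate_stereo(400, 300, 350, baseline=0.2, fx=500, fy=500, cx=400, cy=300))
# (0.0, 0.0, 2.0)
```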
Map Storage: Each triangulated point is stored with its descriptor and uncertainty estimate (covariance matrix).
Step 4: Keyframe Selection
Problem: Storing every frame (30 FPS) creates a map with 1,800 frames per minute → Too much data, redundant information.
Solution: Select keyframes — frames that add new information to the map.
Keyframe Criteria:
- Significant camera motion (>0.2m translation or >10° rotation since last keyframe)
- New visual features visible (>30% of features not in previous keyframes)
- Tracking quality is good (>100 matched features)
Example: Robot walks down a hallway:
- Frames 1-10: Forward motion → Select frames 1, 5, 10 as keyframes (50cm apart)
- Frames 11-20: Static (robot stopped) → No keyframes added
- Frame 21: Robot turns 90° → New keyframe (different viewing angle)
Result: Map contains ~200 keyframes for a 100m² environment instead of 10,000+ total frames.
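cuVSLAM's exact keyframe rule is internal to the library, but the criteria listed above suggest a decision function like this sketch (thresholds are the illustrative ones above; the combination — healthy tracking plus either motion or novelty — is one plausible reading):

```python
def is_keyframe(translation_m, rotation_deg, new_feature_ratio, matched_features):
    """Sketch of a keyframe decision: tracking must be healthy,
    and the frame must add something new to the map."""
    moved = translation_m > 0.2 or rotation_deg > 10.0
    novel = new_feature_ratio > 0.30
    healthy = matched_features > 100
    return healthy and (moved or novel)

print(is_keyframe(0.5, 0.0, 0.4, 250))    # True  (forward motion)
print(is_keyframe(0.0, 0.0, 0.05, 300))   # False (robot standing still)
print(is_keyframe(0.05, 90.0, 0.6, 250))  # True  (90-degree turn)
```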
Step 5: Loop Closure Detection
The Drift Problem: Small errors in visual odometry accumulate over time. After 100 meters, the robot might think it's 2 meters from its true position.
Loop Closure: When the robot revisits a previously mapped area, recognize it and correct the drift.
Process:
- Extract features from current frame
- Compare against all keyframes in the map (using bag-of-words for fast lookup)
- If similarity score > threshold → Loop closure candidate
- Verify geometrically: Compute relative pose between current frame and candidate keyframe
- If consistent → Add loop closure constraint to the optimization graph
Example:
- Robot explores a large room and returns to the start
- VSLAM recognizes the starting location (matches 150+ features with first keyframe)
- Detects 1.5m drift in accumulated pose estimate
- Runs pose graph optimization to distribute correction across entire trajectory
- Result: Map becomes globally consistent, drift eliminated
Isaac ROS Performance: Loop closure detection runs at 5Hz without blocking real-time tracking.
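The bag-of-words comparison in step 2 reduces to a similarity score between word-count histograms. A minimal cosine-similarity version (real systems use an inverted index for fast lookup; the histograms here are invented for illustration):

```python
import math

def bow_similarity(hist_a, hist_b):
    """Cosine similarity between two bag-of-visual-words histograms."""
    dot = sum(a * b for a, b in zip(hist_a, hist_b))
    na = math.sqrt(sum(a * a for a in hist_a))
    nb = math.sqrt(sum(b * b for b in hist_b))
    return dot / (na * nb) if na and nb else 0.0

current  = [12, 0, 7, 3, 0, 9]   # visual-word counts in the current frame
keyframe = [10, 1, 8, 2, 0, 11]  # an old keyframe of the same place
other    = [0, 15, 0, 0, 14, 1]  # a visually different keyframe

print(bow_similarity(current, keyframe) > 0.75)  # True  -> loop-closure candidate
print(bow_similarity(current, other) > 0.75)     # False
```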
Step 6: Pose Graph Optimization
Goal: Minimize accumulated error across all keyframes and loop closures.
Pose Graph:
- Nodes: Keyframes with estimated poses (x, y, z, roll, pitch, yaw)
- Edges: Constraints between keyframes (odometry, loop closures)
- Optimization: Adjust all poses to minimize total error (graph SLAM problem)
Optimization Method: Gauss-Newton or Levenberg-Marquardt iterative solver
Example:
- 100 keyframes in a loop
- Accumulated drift: 2 meters by frame 100
- Loop closure detects: Frame 100 is actually at the same location as Frame 1
- Optimization: Distribute the 2m error across all 100 keyframes (0.02m correction per frame)
- Result: Smooth, consistent trajectory with no jumps
Isaac ROS: Runs incremental optimization in background thread (doesn't block real-time tracking).
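The example's correction step can be sketched as a uniform spread of the 2 m error. Real pose-graph solvers (Gauss-Newton, Levenberg-Marquardt) weight each keyframe's correction by edge uncertainty; uniform spreading is the simplest stand-in for the same idea:

```python
def distribute_drift(xs, correction):
    """Spread a loop-closure correction linearly along a 1D trajectory."""
    n = len(xs)
    return [x - correction * i / (n - 1) for i, x in enumerate(xs)]

# 100 keyframe x-estimates; loop closure says the last keyframe should
# coincide with the first, but the estimate has drifted 2 m:
xs = [i * (2.0 / 99) for i in range(100)]
fixed = distribute_drift(xs, correction=2.0)
print(round(fixed[-1], 6) == 0.0)  # True: the loop now closes
print(round(xs[1] - fixed[1], 4))  # 0.0202 m correction per keyframe
```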
ROS 2 Topics: The VSLAM Interface
Input Topics
| Topic | Message Type | Frequency | Purpose |
|---|---|---|---|
| /camera/left/image_raw | sensor_msgs/Image | 30 Hz | Left stereo camera (RGB or grayscale) |
| /camera/right/image_raw | sensor_msgs/Image | 30 Hz | Right stereo camera (for depth) |
| /camera/imu | sensor_msgs/Imu | 200 Hz | IMU data (acceleration + gyroscope) |
| /camera/left/camera_info | sensor_msgs/CameraInfo | 30 Hz | Camera calibration (intrinsics) |
Note: IMU fusion is optional but highly recommended for better tracking during rapid motion.
Output Topics
| Topic | Message Type | Frequency | Purpose |
|---|---|---|---|
| /vslam/pose | geometry_msgs/PoseStamped | 30 Hz | Robot pose in map frame (x, y, z, quaternion) |
| /vslam/odometry | nav_msgs/Odometry | 30 Hz | Pose + velocity estimate |
| /vslam/map | sensor_msgs/PointCloud2 | 1 Hz | 3D point cloud map (feature positions) |
| /vslam/path | nav_msgs/Path | 5 Hz | Robot trajectory (history of poses) |
| /vslam/tracking_status | std_msgs/String | 10 Hz | Status: "TRACKING", "LOST", "INITIALIZING" |
Key Topic: /vslam/pose is used by Nav2 for localization (where the robot is in the map).
Hands-On: Launching Isaac ROS VSLAM
Prerequisites
Hardware:
- NVIDIA GPU (RTX 2060 or higher)
- Stereo camera (or simulated in Isaac Sim)
Software:
- ROS 2 Humble or Iron
- Isaac ROS installed: sudo apt install ros-humble-isaac-ros-visual-slam
- Isaac Sim running (if using simulation)
Step 1: Verify Camera Topics
Check that camera data is publishing:
# List camera topics
ros2 topic list | grep camera
# Expected output:
/camera/left/image_raw
/camera/right/image_raw
/camera/left/camera_info
/camera/right/camera_info
/camera/imu
Check data rates:
# Verify 30Hz camera rate
ros2 topic hz /camera/left/image_raw
# Expected output: average rate: 30.0
# Verify 200Hz IMU rate
ros2 topic hz /camera/imu
# Expected output: average rate: 200.0
Step 2: Configure VSLAM Parameters
Create vslam_params.yaml:
/**:
ros__parameters:
# Enable IMU fusion for better tracking
enable_imu_fusion: true
# Enable loop closure for drift correction
enable_loop_closure: true
# Feature extraction parameters
num_cameras: 2 # Stereo pair
min_num_features: 150 # Minimum for tracking
max_num_features: 500 # Maximum per frame
# Frame IDs (coordinate frames)
map_frame: "map"
odom_frame: "odom"
base_frame: "base_link"
camera_optical_frame: "camera_left_optical_frame"
# IMU parameters
imu_frame: "imu_link"
gyro_noise_density: 0.0001 # rad/s/√Hz
gyro_random_walk: 0.00001 # rad/s²/√Hz
accel_noise_density: 0.001 # m/s²/√Hz
accel_random_walk: 0.0001 # m/s³/√Hz
# Loop closure parameters
loop_closure_frequency: 5.0 # Hz (how often to check)
loop_closure_min_score: 0.75 # Similarity threshold
Parameter Explanations:
- enable_imu_fusion: Combines visual odometry with IMU for smoother tracking during fast motion
- enable_loop_closure: Corrects drift when robot revisits areas
- min_num_features: If tracked features drop below 150, tracking quality degrades
- max_num_features: Limits computation (more features = slower but more robust)
Step 3: Create Launch File
Create vslam_isaac_sim.launch.py:
from launch import LaunchDescription
from launch_ros.actions import Node
from launch.actions import DeclareLaunchArgument
from launch.substitutions import LaunchConfiguration
def generate_launch_description():
# Launch arguments
use_sim_time = LaunchConfiguration('use_sim_time', default='true')
return LaunchDescription([
DeclareLaunchArgument(
'use_sim_time',
default_value='true',
description='Use simulation time from Isaac Sim'
),
# Isaac ROS Visual SLAM Node
Node(
package='isaac_ros_visual_slam',
executable='visual_slam_node',
name='visual_slam',
parameters=[
'config/vslam_params.yaml',
{'use_sim_time': use_sim_time}
],
remappings=[
# Remap Isaac Sim topics to VSLAM expected names
('/stereo_camera/left/image', '/camera/left/image_raw'),
('/stereo_camera/right/image', '/camera/right/image_raw'),
('/stereo_camera/left/camera_info', '/camera/left/camera_info'),
('/stereo_camera/right/camera_info', '/camera/right/camera_info'),
('/imu', '/camera/imu'),
],
output='screen'
),
# Static TF Publishers (camera to base_link)
Node(
package='tf2_ros',
executable='static_transform_publisher',
name='camera_left_tf',
arguments=['0.1', '0.05', '1.6', '0', '0', '0', 'base_link', 'camera_left_optical_frame']
# Humanoid: Camera 1.6m high, 0.1m forward, 0.05m left of center
),
Node(
package='tf2_ros',
executable='static_transform_publisher',
name='camera_right_tf',
arguments=['0.1', '-0.05', '1.6', '0', '0', '0', 'base_link', 'camera_right_optical_frame']
# Right camera: Same height, 0.05m right of center (0.1m baseline)
),
Node(
package='tf2_ros',
executable='static_transform_publisher',
name='imu_tf',
arguments=['0', '0', '0.5', '0', '0', '0', 'base_link', 'imu_link']
# IMU at robot's center of mass (0.5m high)
),
])
Step 4: Launch VSLAM
# Terminal 1: Start Isaac Sim with humanoid scene
# (Assuming Isaac Sim is already running with cameras publishing)
# Terminal 2: Launch VSLAM
ros2 launch isaac_ros_vslam vslam_isaac_sim.launch.py
# Expected output:
[visual_slam]: Initializing Visual SLAM...
[visual_slam]: Camera intrinsics loaded: fx=500.0 fy=500.0 cx=320.0 cy=240.0
[visual_slam]: IMU fusion enabled
[visual_slam]: Loop closure enabled (frequency: 5.0 Hz)
[visual_slam]: Waiting for camera images...
[visual_slam]: First frame received, extracting features...
[visual_slam]: Initialization complete! Tracking started.
[visual_slam]: Pose published at 30.0 Hz
Step 5: Verify VSLAM Output
Check pose output rate:
ros2 topic hz /vslam/pose
# Expected output:
average rate: 30.0
min: 0.032s max: 0.034s std dev: 0.001s window: 100
Target: 30 Hz ± 2 Hz (consistent with camera frame rate).
Check pose data:
ros2 topic echo /vslam/pose --once
# Example output:
header:
stamp:
sec: 125
nanosec: 450000000
frame_id: "map"
pose:
position:
x: 1.234
y: 0.567
z: 0.0
orientation: # Quaternion (x, y, z, w)
x: 0.0
y: 0.0
z: 0.707
w: 0.707 # Facing 90° left
Interpretation: Robot is 1.23m forward, 0.57m left, facing 90° (quaternion z=0.707, w=0.707).
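The yaw claimed in the interpretation can be recovered from the quaternion with the standard conversion:

```python
import math

def yaw_from_quaternion(x, y, z, w):
    """Yaw (heading about Z) from a unit quaternion."""
    return math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))

# Quaternion from the pose message above:
yaw = yaw_from_quaternion(0.0, 0.0, 0.707, 0.707)
print(round(math.degrees(yaw), 1))  # 90.0
```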
Step 6: Visualize in RViz
Launch RViz with VSLAM visualization:
ros2 run rviz2 rviz2 -d config/vslam_rviz.rviz
RViz Configuration:
- Fixed Frame: map
- Add PointCloud2:
  - Topic: /vslam/map
  - Color: By intensity (white points)
  - Size: 0.05m
- Add Path:
  - Topic: /vslam/path
  - Color: Blue
  - Line width: 0.02m
- Add PoseStamped:
  - Topic: /vslam/pose
  - Display: Axes (red=X, green=Y, blue=Z)
  - Length: 0.5m
What You'll See:
- White point cloud: 3D map of visual features
- Blue line: Robot's trajectory (history of poses)
- RGB axes: Current robot pose (orientation)
Step 7: Test Loop Closure
Procedure:
- Drive robot in a square loop (5m × 5m)
- Observe drift as robot completes loop (may be 0.5m off starting position)
- When robot returns to start, VSLAM detects loop closure
- Map and trajectory snap to corrected positions
Monitor loop closures:
ros2 topic echo /vslam/tracking_status
# During loop:
data: "TRACKING"
# When loop detected:
data: "LOOP_CLOSURE_DETECTED"
# After optimization:
data: "TRACKING"
RViz Observation: Trajectory line "jumps" slightly when loop closure correction is applied.
Practice Exercise: VSLAM Performance Test
Objective
Validate that Isaac ROS VSLAM achieves real-time performance (30Hz pose updates) and successful loop closure.
Procedure
1. Launch Isaac Sim with humanoid robot in a 10m × 10m room
2. Start VSLAM:
   ros2 launch isaac_ros_vslam vslam_isaac_sim.launch.py
3. Start RViz:
   ros2 run rviz2 rviz2 -d config/vslam_rviz.rviz
4. Drive robot in a loop:
   # Use teleop or autonomous waypoints
   ros2 run teleop_twist_keyboard teleop_twist_keyboard
   # Drive: Forward 5m → Left 90° → Forward 5m → Left 90° → Forward 5m → Left 90° → Forward 5m
5. Measure performance:
   # Terminal 3: Monitor pose rate
   ros2 topic hz /vslam/pose
   # Expected: ~30 Hz
6. Check loop closure:
   - Observe map drift at 75% completion of loop
   - Wait for robot to return to start
   - Confirm loop closure detected: ros2 topic echo /vslam/tracking_status
   - Verify trajectory correction in RViz (path snaps to close the loop)
Success Criteria
✅ VSLAM initializes within 5 seconds of first camera frame
✅ Pose updates published at 30 Hz ± 2 Hz
✅ Map contains 500+ feature points after 30 seconds of motion
✅ Trajectory smoothly follows robot motion in RViz
✅ Loop closure detected when robot returns to start (tolerance: within 1 meter)
✅ Drift corrected within 3 seconds of loop closure detection
Reality Check: When VSLAM Fails
Failure Mode 1: Texture-less Environments
Symptom: Tracking status changes to "LOST", no pose updates
Cause: VSLAM relies on visual features (corners, edges, textures). In environments like:
- White-walled corridors with no decoration
- Uniform floors (plain carpet, smooth concrete)
- Featureless ceilings
...there are too few features to track (< 50 features extracted per frame).
Mitigation:
- Add artificial landmarks (AprilTags, posters, floor markings)
- Supplement with LiDAR or wheel odometry
- Project artificial texture onto surfaces (e.g., an IR dot projector, as active depth cameras do)
Example: Vision-only systems struggle in parking garages with plain white walls — many production robots add radar or LiDAR to compensate.
Failure Mode 2: Rapid Motion
Symptom: Jerky pose estimates, frequent re-initialization, high drift
Cause: Fast motion (>1 m/s or >90°/s rotation) causes:
- Motion blur: Features become blurry → matching fails
- Large disparity: Features move >50 pixels between frames → search space too large
- IMU saturation: Gyroscope/accelerometer exceed measurement range
Mitigation:
- Increase camera frame rate (60 Hz or 90 Hz)
- Reduce robot maximum velocity
- Enable IMU fusion to predict feature positions
- Use global shutter cameras (eliminate rolling shutter distortion)
Example: High-speed drones use 120 Hz cameras for VSLAM to handle rapid motion.
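A quick calculation shows why frame rate matters. For sideways camera translation, the image-space motion of a feature between frames shrinks linearly with frame rate (small-motion approximation; the speed, depth, and focal length below are illustrative):

```python
def pixel_shift_per_frame(speed_mps, frame_rate_hz, depth_m, focal_px):
    """Approximate image-space displacement of a feature at a given depth
    for sideways camera motion between consecutive frames."""
    dx = speed_mps / frame_rate_hz   # metres moved between frames
    return focal_px * dx / depth_m   # pixels

# Feature 2 m away, f = 500 px, robot moving 1 m/s sideways:
print(round(pixel_shift_per_frame(1.0, 30, 2.0, 500), 1))  # 8.3 px at 30 Hz
print(round(pixel_shift_per_frame(1.0, 90, 2.0, 500), 1))  # 2.8 px at 90 Hz
```

Tripling the frame rate cuts the matcher's search window by the same factor, which is why fast platforms favour high-rate cameras.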
Failure Mode 3: Dynamic Environments
Symptom: Map contains "ghost" features from moving objects, inconsistent loop closures
Cause: VSLAM assumes static world. Moving objects (people, chairs, doors) violate this assumption:
- Features on moving objects get added to the map
- Loop closure fails because scene has changed since last visit
- Optimization distributes error incorrectly (trying to fit inconsistent data)
Mitigation:
- Use semantic segmentation to filter dynamic objects (people, vehicles)
- Implement a dynamic SLAM variant such as DynaSLAM (built on ORB-SLAM2)
- Rely on static features only (walls, floor, ceiling)
Example: Warehouse robots use LiDAR + VSLAM and filter out "person" class from object detection.
Failure Mode 4: Scale Ambiguity (Monocular SLAM)
Symptom: Map scale is incorrect (1m in real world = 2m in map)
Cause: Single cameras cannot measure absolute depth → SLAM estimates scale from motion, which can drift
Solution: Always use stereo or depth cameras for metric SLAM. If using monocular, require external scale references:
- Known-size AprilTags
- Depth sensor fusion
- IMU gravity vector (provides absolute vertical scale)
Isaac ROS VSLAM: Designed for stereo cameras — avoids scale ambiguity entirely.
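Stereo resolves scale because the baseline is a known metric quantity in the depth formula. Halving the assumed baseline halves every estimated depth — which is exactly the ambiguity a monocular system cannot resolve:

```python
def depth_from_disparity(disparity_px, baseline_m, focal_px):
    """Stereo depth: Z = baseline * focal_length / disparity."""
    return baseline_m * focal_px / disparity_px

# Same disparity, two assumed baselines -> the whole map scales with baseline:
print(depth_from_disparity(50, 0.2, 500))  # 2.0 m
print(depth_from_disparity(50, 0.1, 500))  # 1.0 m
```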
Isaac ROS GEMs: Beyond VSLAM
Isaac ROS includes GPU-accelerated packages for the entire perception pipeline:
Visual Odometry
Purpose: Estimate camera motion between frames (simpler than full SLAM, no loop closure)
Use Case: High-speed robots where map consistency is less important than real-time pose updates
Performance: 60 Hz pose updates on RTX 3060
AprilTag Detection
Purpose: Detect fiducial markers for localization and object tracking
Use Case:
- Initial localization (robot wakes up, scans for AprilTag, knows starting pose)
- Object manipulation (tags on objects provide precise 6DOF pose)
- VSLAM recovery (if tracking lost, relocalize using AprilTags)
Performance: Detect 50+ tags per frame at 30 Hz
DNN Inference (TensorRT)
Purpose: Run deep learning models for object detection, segmentation, pose estimation
Use Case:
- Semantic SLAM (classify features as "wall", "floor", "person")
- Obstacle detection (detect dynamic objects)
- Affordance detection (find "graspable objects")
Performance: YOLOv5 runs at 50 FPS (20ms latency) on RTX 3060
Image Processing (VPI/NPP)
Purpose: GPU-accelerated image operations (rectification, undistortion, noise reduction, format conversion)
Use Case:
- Stereo rectification (align left/right images for disparity calculation)
- Lens distortion correction (remove fisheye effects)
- Bayer demosaicing (convert raw camera data to RGB)
Performance: Rectify 1920×1080 stereo pair in 2ms
Integration with Nav2: The VSLAM → Planning Bridge
Data Flow
Isaac ROS VSLAM Nav2
┌──────────────┐ ┌──────────────┐
│ /vslam/pose │ ──────────────────>│ Localization │
│ (30 Hz) │ │ Module │
└──────────────┘ └──────┬───────┘
│
┌──────────────┐ ┌──────▼───────┐
│ /vslam/map │ ──────────────────>│ Costmap │
│ (1 Hz) │ │ (obstacles) │
└──────────────┘ └──────┬───────┘
│
┌──────▼───────┐
│ Path Planner │
└──────────────┘
Key Topics:
- /vslam/pose → Feeds Nav2 localization (VSLAM typically supplies the map→odom transform directly, replacing laser-based AMCL)
- /vslam/map → Converted to an occupancy grid for the costmap (free space = no features, occupied = dense features)
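Nav2 consumes 2D occupancy grids, so the VSLAM point cloud must be flattened. A bare-bones sketch of that conversion (a real costmap conversion also filters by height and ray-casts free space; the function name, grid size, and example points are invented for illustration):

```python
def points_to_grid(points_xy, resolution=0.1, size_m=10.0):
    """Mark grid cells that contain map features as occupied."""
    cells = round(size_m / resolution)
    grid = [[0] * cells for _ in range(cells)]
    for x, y in points_xy:
        i, j = int(y / resolution), int(x / resolution)
        if 0 <= i < cells and 0 <= j < cells:
            grid[i][j] = 100  # occupied, nav_msgs/OccupancyGrid convention
    return grid

# Three hypothetical wall features near (3.5 m, 0.5 m):
grid = points_to_grid([(3.52, 0.55), (3.57, 0.53), (3.94, 0.55)])
print(grid[5][35], grid[5][39])  # 100 100
```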
Next Section: Section 3.4 will configure Nav2 to use VSLAM output for humanoid navigation.
Check Your Understanding
1. What does SLAM stand for?
   - A) Sensor Localization and Mapping
   - B) Simultaneous Localization and Mapping
   - C) Spatial Learning and Modeling
   - Answer: B
2. What is loop closure?
   - A) Closing the robot's control loop
   - B) Detecting when the robot revisits a location to correct drift
   - C) Shutting down SLAM when complete
   - Answer: B
3. Why does Isaac ROS VSLAM run faster than CPU-based SLAM?
   - A) Uses simpler algorithms
   - B) Parallel processing on GPU CUDA cores
   - C) Lower resolution images
   - Answer: B
4. What is the purpose of IMU fusion in VSLAM?
   - A) Replace cameras entirely
   - B) Improve tracking during fast motion by predicting feature positions
   - C) Measure absolute position
   - Answer: B
5. What happens if VSLAM tracking is lost?
   - A) Robot stops immediately
   - B) Switches to dead reckoning (odometry only) until tracking recovers
   - C) Robot self-destructs
   - Answer: B
6. What is a keyframe?
   - A) Every camera frame
   - B) Selected frames that add new information to the map
   - C) Frames with the most features
   - Answer: B
Key Takeaways
✅ Visual SLAM solves localization and mapping simultaneously using only cameras and IMU (no GPS required)
✅ Isaac ROS VSLAM achieves 30Hz real-time performance through GPU acceleration (5x faster than CPU)
✅ Loop closure corrects drift by recognizing previously visited areas and optimizing the entire trajectory
✅ Stereo cameras eliminate scale ambiguity and enable metric maps (accurate distances)
✅ VSLAM can fail in texture-less environments, during rapid motion, and with dynamic obstacles — robust failure handling is required
✅ ROS 2 topics provide standard interface: /vslam/pose for localization, /vslam/map for planning