
3.2 Isaac Sim: Photorealistic Simulation & Synthetic Data Generation

Introduction

The Data Problem: Training a vision model to detect obstacles requires 10,000+ labeled images. Collecting and labeling this data manually takes weeks. Running physical robots risks damage during trial-and-error learning.

The Simulation Solution: NVIDIA Isaac Sim generates photorealistic synthetic data at scale — 1,000 labeled images in minutes, not weeks. Train perception models in simulation, then deploy them on real robots with minimal accuracy loss.

This section teaches you to:

  1. Set up Isaac Sim and create a virtual humanoid environment
  2. Configure cameras, sensors, and domain randomization
  3. Export synthetic datasets for AI model training
  4. Validate sim-to-real transfer quality

Real-World Impact: Companies like Amazon Robotics and BMW use Isaac Sim to train warehouse robots and autonomous vehicles without risking physical hardware during development.


Why Photorealistic Simulation Matters

The Reality Gap

Challenge: Models trained on synthetic data often fail on real robots because simulated environments don't match reality:

  • Lighting: Simulated lights are too perfect (no glare, reflections, or shadows)
  • Textures: Materials look artificial (plastics too shiny, walls too smooth)
  • Physics: Objects behave unrealistically (no friction variation, perfect collisions)
  • Sensor noise: Cameras and depth sensors produce perfect data (no motion blur, lens distortion, or interference)

Isaac Sim's Approach: Ray-traced rendering + Domain randomization + Sensor noise models bridge the reality gap.


Isaac Sim's Photorealism Features

1. RTX Ray Tracing

Purpose: Physically accurate lighting that matches real-world cameras

How It Works:

  • Simulates millions of light rays bouncing between surfaces
  • Produces realistic shadows (soft shadows from area lights, sharp shadows from point lights)
  • Captures reflections (metallic surfaces, glossy floors, glass)
  • Models global illumination (indirect lighting from light bouncing off walls)

Example: A camera viewing a shiny metal surface will see reflections of overhead lights and nearby objects — just like a real camera would.

Performance: NVIDIA RTX GPUs (RTX 3060 or higher) render at 30-60 FPS with ray tracing enabled.


2. Material System (MDL - Material Definition Language)

Purpose: Create realistic surfaces with physical properties

Material Properties:

  • Albedo (base color): RGB color under neutral lighting
  • Roughness: Controls glossiness (0.0 = mirror, 1.0 = matte)
  • Metallic: Metallic vs. dielectric surfaces (0.0 = plastic, 1.0 = metal)
  • Normal maps: Add surface detail without extra geometry (scratches, bumps, grain)

Example: A wood floor material has:

  • Albedo: brown color
  • Roughness: 0.6 (semi-matte)
  • Metallic: 0.0 (non-metal)
  • Normal map: wood grain texture

Result: Camera sees realistic wood grain with directional highlights — indistinguishable from real wood.
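The effect of the roughness parameter can be sketched numerically. The mapping below (Blinn-Phong exponent `2/roughness² − 2`) is a common shading approximation, not Isaac Sim's actual MDL shader, but it shows why a low roughness value produces a tight, bright highlight and a high value a broad, dim one:

```python
import numpy as np

# Illustrative only: map the 0..1 roughness parameter onto a Blinn-Phong
# specular exponent to show how roughness controls highlight width.
# This is a textbook approximation, not Isaac Sim's MDL shading model.

def specular_lobe(cos_half_angle: np.ndarray, roughness: float) -> np.ndarray:
    """Relative specular intensity at a given angle from the mirror direction."""
    exponent = max(2.0 / max(roughness, 1e-3) ** 2 - 2.0, 1.0)
    return np.clip(cos_half_angle, 0.0, 1.0) ** exponent

angles = np.cos(np.radians([0, 5, 15, 30]))      # angles from the reflection direction
mirror = specular_lobe(angles, roughness=0.05)   # near-mirror: tight highlight
matte = specular_lobe(angles, roughness=0.6)     # semi-matte wood: broad, dim highlight
# The near-mirror lobe falls off far faster away from the reflection direction.
```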


3. Depth and Segmentation Sensors

Purpose: Generate ground-truth labels automatically (no manual annotation!)

Sensor Types:

  • RGB Camera: Standard color images (matching physical cameras)
  • Depth Camera: Distance to each pixel (in meters)
  • Semantic Segmentation: Per-pixel object class labels (wall, floor, person, furniture)
  • Instance Segmentation: Unique ID per object instance (person_01, person_02, chair_01)
  • Bounding Boxes: 2D/3D boxes around objects for object detection training

Example Output (single frame):

/output/
├── rgb_0001.png # Color image (1920x1080)
├── depth_0001.exr # Depth map (32-bit float, meters)
├── semantic_0001.png # Segmentation (each color = object class)
├── instance_0001.json # Instance IDs and bounding boxes
└── camera_info.yaml # Camera intrinsics (focal length, distortion)

Key Advantage: Ground truth is perfect — no human labeling errors. A depth map shows exact distances (accurate to 0.01 meters in simulation).
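A sanity check on exported depth frames can be sketched as below. The function operates on a plain array; loading a real `.exr` file would need a library such as OpenEXR or OpenCV, so here we substitute a hypothetical numpy array standing in for one frame:

```python
import numpy as np

# Sketch of a sanity check for an exported depth frame. Depth .exr files
# store per-pixel distance in meters; we fake one with a numpy array.

def depth_stats(depth: np.ndarray, max_range: float = 20.0) -> dict:
    """Summarize a depth map: valid-pixel ratio and min/max range in meters."""
    valid = np.isfinite(depth) & (depth > 0) & (depth < max_range)
    return {
        "valid_ratio": float(valid.mean()),
        "min_m": float(depth[valid].min()),
        "max_m": float(depth[valid].max()),
    }

# Simulated 4x4 depth map: a wall 3 m away with one invalid (inf) pixel
fake_depth = np.full((4, 4), 3.0, dtype=np.float32)
fake_depth[0, 0] = np.inf
stats = depth_stats(fake_depth)
# stats["valid_ratio"] is 15/16; min and max are both 3.0
```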


Domain Randomization: Teaching Robustness

The Problem

A model trained on one environment (e.g., a white-walled lab) fails when deployed in a different environment (e.g., a textured office).

Example: An obstacle detection model trained on images with:

  • White walls
  • Bright fluorescent lighting
  • Clean floors

...will fail when tested on:

  • Brick walls (different texture)
  • Dim natural lighting (different brightness)
  • Cluttered floors (different context)

The Solution: Domain Randomization

Concept: Train on thousands of randomized environments so the model learns features that generalize across all variations.

Randomization Parameters:

Lighting Randomization

  • Intensity: Vary brightness (500-2000 lux for indoor scenes)
  • Color temperature: Warm (2700K) to cool (6500K) lighting
  • Direction: Randomize sun angle, add/remove point lights
  • Shadows: Hard vs. soft shadows by varying light size

Example: Same scene rendered with 10 different lighting conditions → Model learns to detect obstacles under any lighting.
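The color-temperature sweep can be sketched with a crude interpolation. This is not Isaac Sim API: it just blends between an approximate 2700 K warm white and a neutral 6500 K white, then samples a random temperature per frame the way a lighting randomizer would:

```python
import random

# Crude sketch (not Isaac Sim API): linearly interpolate an RGB tint
# between warm (2700 K) and cool (6500 K) white points.

WARM = (1.00, 0.75, 0.52)   # approximate RGB of 2700 K incandescent light
COOL = (1.00, 1.00, 1.00)   # 6500 K daylight treated as neutral white

def temperature_to_rgb(kelvin: float) -> tuple:
    """Linear blend between the warm and cool white points (illustrative only)."""
    t = min(max((kelvin - 2700.0) / (6500.0 - 2700.0), 0.0), 1.0)
    return tuple(w + t * (c - w) for w, c in zip(WARM, COOL))

random.seed(0)
tints = [temperature_to_rgb(random.uniform(2700, 6500)) for _ in range(3)]
# Each frame's light gets a different tint: 2700 K maps to WARM, 6500 K to white
```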


Texture Randomization

  • Wall materials: Brick, concrete, drywall, wood paneling
  • Floor materials: Tile, carpet, hardwood, linoleum
  • Object colors: Randomize furniture colors, decals, posters

Example: A "chair" appears with 50 different textures → Model learns chair shape, not specific texture.


Camera Randomization

  • Exposure: Simulate over/underexposed images
  • Motion blur: Simulate fast robot motion during capture
  • Lens distortion: Add barrel/pincushion distortion matching real cameras
  • Noise: Add sensor noise (ISO 100-3200 equivalent)

Example: Training on blurry, noisy, poorly exposed images → Model is robust to real-world camera imperfections.
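What these camera randomizations do to pixel values can be sketched as numpy post-processing. Isaac Sim applies them in-renderer; this standalone version just illustrates the three degradations on a stand-in image array:

```python
import numpy as np

# Sketch of camera-imperfection randomization as post-processing
# (Isaac Sim applies these in-renderer; this is illustrative only).
rng = np.random.default_rng(42)

def degrade(img: np.ndarray) -> np.ndarray:
    """Apply random exposure shift, 1D motion blur, and Gaussian sensor noise."""
    out = img.astype(np.float32)
    out *= rng.uniform(0.5, 1.5)                       # over/under-exposure
    k = int(rng.integers(1, 4))                        # horizontal blur length (px)
    kernel = np.ones(k) / k
    out = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, out)
    out += rng.normal(0.0, rng.uniform(1.0, 8.0), out.shape)  # ISO-like noise
    return np.clip(out, 0, 255).astype(np.uint8)

clean = np.full((8, 8), 128, dtype=np.uint8)   # stand-in for a grayscale frame
noisy = degrade(clean)                         # same shape, degraded values
```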


Clutter and Occlusion

  • Random objects: Place boxes, chairs, plants in random positions
  • Partial occlusions: Objects partially hidden behind others
  • Dynamic elements: People walking, doors opening

Example: Training on cluttered scenes → Model learns to detect partially visible obstacles.


Domain Randomization in Code

Here's how Isaac Sim implements domain randomization using the Replicator API:

import omni.replicator.core as rep

# Register randomization function
def randomize_scene():
    # Randomize lighting
    lights = rep.get.prims(semantics=[("class", "light")])
    with lights:
        rep.modify.attribute("intensity", rep.distribution.uniform(500, 2000))
        rep.modify.attribute("color", rep.distribution.uniform((0.8, 0.8, 0.8), (1.0, 1.0, 1.0)))

    # Randomize textures
    walls = rep.get.prims(path_pattern="/World/Walls/*")
    with walls:
        rep.randomizer.texture(
            textures=rep.distribution.choice([
                "/Materials/Brick.mdl",
                "/Materials/Concrete.mdl",
                "/Materials/Drywall.mdl"
            ])
        )

    # Randomize object placement
    obstacles = rep.get.prims(semantics=[("class", "obstacle")])
    with obstacles:
        rep.modify.pose(
            position=rep.distribution.uniform((-5, 0, -5), (5, 0, 5)),
            rotation=rep.distribution.uniform((0, 0, 0), (0, 360, 0))
        )

    # Randomize camera parameters
    camera = rep.get.prims(path_pattern="/World/Camera")
    with camera:
        rep.modify.attribute("focalLength", rep.distribution.uniform(18, 35))  # mm
        rep.modify.attribute("fStop", rep.distribution.uniform(2.8, 8.0))

# Register and trigger
rep.randomizer.register(randomize_scene)

How This Works:

  1. rep.get.prims() selects scene elements (lights, walls, objects)
  2. rep.modify.attribute() changes properties (intensity, color, position)
  3. rep.distribution.uniform() samples random values from ranges
  4. rep.randomizer.register() runs randomization before each frame capture

Result: Rendering 1,000 frames produces 1,000 unique environments — equivalent to collecting data in 1,000 different real-world locations.


Setting Up Your First Isaac Sim Scene

Prerequisites

  • Hardware: NVIDIA RTX GPU (RTX 3060 or higher recommended)
  • Software: Isaac Sim 4.0+ (requires Omniverse Launcher)
  • OS: Ubuntu 22.04 LTS or Windows 10/11

Installation (see quickstart.md for full instructions):

  1. Install NVIDIA Omniverse Launcher
  2. Install Isaac Sim from the Omniverse app library
  3. Verify installation: launch Isaac Sim → you should see the viewport with the default scene

Creating a Simple Navigation Environment

Scenario: A 10m x 10m room with obstacles for humanoid navigation testing.

Step 1: Create the Scene Structure

from pxr import Gf, UsdGeom, UsdPhysics
from omni.isaac.core import World
from omni.isaac.core.utils.stage import add_reference_to_stage

# Initialize Isaac Sim world
world = World(stage_units_in_meters=1.0)
stage = world.stage

# Create room floor (10m x 10m)
floor_path = "/World/Floor"
floor = UsdGeom.Cube.Define(stage, floor_path)
floor.GetSizeAttr().Set(1.0)
floor.AddScaleOp().Set(Gf.Vec3f(10.0, 0.1, 10.0))  # 10m x 0.1m x 10m
floor.AddTranslateOp().Set(Gf.Vec3f(0.0, -0.05, 0.0))

# Create walls (4 walls forming a box)
def create_wall(path, position, scale):
    wall = UsdGeom.Cube.Define(stage, path)
    wall.GetSizeAttr().Set(1.0)
    wall.AddScaleOp().Set(Gf.Vec3f(*scale))
    wall.AddTranslateOp().Set(Gf.Vec3f(*position))
    return wall

create_wall("/World/WallNorth", (0, 1.5, 5), (10, 3, 0.2))
create_wall("/World/WallSouth", (0, 1.5, -5), (10, 3, 0.2))
create_wall("/World/WallEast", (5, 1.5, 0), (0.2, 3, 10))
create_wall("/World/WallWest", (-5, 1.5, 0), (0.2, 3, 10))

Explanation:

  • UsdGeom.Cube.Define(): Creates a cube primitive at the specified path
  • AddScaleOp(): Scales the cube to desired dimensions (X, Y, Z in meters)
  • AddTranslateOp(): Positions the cube in world coordinates

Step 2: Add a Humanoid Robot

Isaac Sim includes pre-built robot assets. We'll use a generic humanoid:

# Import humanoid robot USD asset
robot_path = "/World/Humanoid"
robot_usd_path = "omniverse://localhost/NVIDIA/Assets/Isaac/4.0/Isaac/Robots/Humanoid/humanoid.usd"
add_reference_to_stage(robot_usd_path, robot_path)

# Position robot at origin
robot_prim = stage.GetPrimAtPath(robot_path)
UsdGeom.Xformable(robot_prim).AddTranslateOp().Set(Gf.Vec3f(0.0, 0.0, 0.0))

Reality Check: The humanoid.usd file contains:

  • Mesh geometry: Visual appearance (body, limbs, head)
  • Collision shapes: Simplified geometry for physics
  • Joints: Articulations with DOF (degrees of freedom)
  • Sensors: Cameras, IMU (if configured)

Step 3: Configure Cameras and Sensors

Add stereo cameras (left/right) for VSLAM and depth sensing:

import numpy as np

from omni.isaac.sensor import Camera

# Left camera
camera_left = Camera(
    prim_path="/World/Humanoid/camera_left",
    position=np.array([0.1, 1.6, 0.06]),  # 1.6m height (head), 0.1m forward, 0.06m left
    frequency=30,  # 30 Hz
    resolution=(1280, 720),
    orientation=np.array([1, 0, 0, 0])  # Quaternion (forward-facing)
)
camera_left.initialize()

# Right camera (stereo pair, 12cm baseline)
camera_right = Camera(
    prim_path="/World/Humanoid/camera_right",
    position=np.array([0.1, 1.6, -0.06]),  # Same height, 0.06m right
    frequency=30,
    resolution=(1280, 720),
    orientation=np.array([1, 0, 0, 0])
)
camera_right.initialize()

# Depth camera (co-located with left camera for alignment)
camera_depth = Camera(
    prim_path="/World/Humanoid/camera_depth",
    position=np.array([0.1, 1.6, 0.06]),
    frequency=30,
    resolution=(640, 480),
    orientation=np.array([1, 0, 0, 0])
)
camera_depth.set_depth_enabled(True)  # Enable depth output
camera_depth.initialize()

Camera Coordinate Frame:

  • X: Forward (robot's front)
  • Y: Up (perpendicular to ground)
  • Z: Left (robot's left side)

Baseline Explanation: 12cm (0.12m) separation between left/right cameras enables stereo depth estimation. Larger baseline → better depth accuracy at long range, worse at close range.
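The trade-off follows from the stereo equation Z = f·B/d (depth Z, focal length f in pixels, baseline B, disparity d). A short sketch with an assumed focal length of 700 px (illustrative, not the configured cameras' actual intrinsics):

```python
# Stereo depth from disparity: Z = f * B / d.
# focal_px = 700 is an assumed intrinsics value for illustration.

def stereo_depth(disparity_px: float, focal_px: float = 700.0, baseline_m: float = 0.12) -> float:
    """Depth in meters from pixel disparity for a 12 cm stereo baseline."""
    return focal_px * baseline_m / disparity_px

near = stereo_depth(84.0)   # 84 px disparity  -> ~1 m
far = stereo_depth(8.4)     # 8.4 px disparity -> ~10 m
# A +/-0.5 px disparity error shifts the far estimate by ~0.6 m but the
# near estimate by only ~6 mm: depth error grows quadratically with range,
# which is why a larger baseline helps at long range.
```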


Step 4: Add Physics and Lighting

# Add physics scene (required for gravity, collisions)
from pxr import UsdPhysics, PhysxSchema
from omni.isaac.core.utils.prims import create_prim

# Configure physics
scene = UsdPhysics.Scene.Define(stage, "/World/PhysicsScene")
scene.CreateGravityDirectionAttr().Set(Gf.Vec3f(0.0, -1.0, 0.0))
scene.CreateGravityMagnitudeAttr().Set(9.81)

# Add PhysX scene settings
physx_scene = PhysxSchema.PhysxSceneAPI.Apply(scene.GetPrim())
physx_scene.CreateEnableCCDAttr().Set(True)  # Continuous collision detection
physx_scene.CreateEnableGPUDynamicsAttr().Set(True)  # GPU acceleration

# Add lighting (dome light for ambient + directional for sun)
dome_light = create_prim(
    "/World/DomeLight",
    "DomeLight",
    attributes={"intensity": 1000}
)

sun_light = create_prim(
    "/World/SunLight",
    "DistantLight",
    attributes={
        "intensity": 3000,
        "angle": 0.53,  # Angular size (degrees) - matches the sun
        "color": (1.0, 1.0, 0.95)  # Slightly warm white
    }
)

Physics Notes:

  • CCD (Continuous Collision Detection): Prevents fast-moving objects from passing through thin surfaces
  • GPU Dynamics: Offloads physics computation to GPU for faster simulation

Exporting Synthetic Datasets

Full Export Script

This script captures 1,000 frames with randomization:

import omni.replicator.core as rep
from pathlib import Path

class SyntheticDataExporter:
    def __init__(self, output_dir: str, camera_paths: list):
        self.output_dir = Path(output_dir)
        self.camera_paths = camera_paths
        self._setup_output_directories()
        self._setup_writers()

    def _setup_output_directories(self):
        """Create output directory structure."""
        (self.output_dir / "rgb").mkdir(parents=True, exist_ok=True)
        (self.output_dir / "depth").mkdir(parents=True, exist_ok=True)
        (self.output_dir / "semantic").mkdir(parents=True, exist_ok=True)
        (self.output_dir / "annotations").mkdir(parents=True, exist_ok=True)

    def _setup_writers(self):
        """Configure Replicator writers for data export."""
        # RGB writer
        self.rgb_writer = rep.WriterRegistry.get("BasicWriter")
        self.rgb_writer.initialize(
            output_dir=str(self.output_dir / "rgb"),
            rgb=True,
            distance_to_camera=False
        )

        # Depth writer
        self.depth_writer = rep.WriterRegistry.get("BasicWriter")
        self.depth_writer.initialize(
            output_dir=str(self.output_dir / "depth"),
            distance_to_camera=True,
            rgb=False
        )

        # Semantic segmentation writer
        self.semantic_writer = rep.WriterRegistry.get("BasicWriter")
        self.semantic_writer.initialize(
            output_dir=str(self.output_dir / "semantic"),
            semantic_segmentation=True,
            rgb=False
        )

        # COCO annotation writer (for object detection)
        self.coco_writer = rep.WriterRegistry.get("COCOWriter")
        self.coco_writer.initialize(
            output_dir=str(self.output_dir / "annotations"),
            bbox_2d_tight=True,
            semantic_types=["class"]
        )

    def export_dataset(self, num_frames: int = 1000):
        """Export synthetic dataset with domain randomization."""
        # Attach cameras to render products
        render_products = []
        for camera_path in self.camera_paths:
            rp = rep.create.render_product(camera_path, (1280, 720))
            render_products.append(rp)

        # Attach writers to render products
        self.rgb_writer.attach(render_products)
        self.depth_writer.attach(render_products)
        self.semantic_writer.attach(render_products)
        self.coco_writer.attach(render_products)

        # Run randomization and capture
        with rep.trigger.on_frame(num_frames=num_frames):
            randomize_scene()  # Call randomization function from earlier

        # Execute orchestrator
        rep.orchestrator.run()

        print(f"✅ Exported {num_frames} frames to {self.output_dir}")
        return {
            "num_frames": num_frames,
            "output_dir": str(self.output_dir),
            "data_types": ["rgb", "depth", "semantic", "bbox_2d"]
        }

# Usage
exporter = SyntheticDataExporter(
    output_dir="/workspace/datasets/humanoid_nav_v1",
    camera_paths=["/World/Humanoid/camera_left", "/World/Humanoid/camera_right"]
)
exporter.export_dataset(num_frames=1000)

Performance: On an RTX 4080, this exports at ~10-15 FPS, so 1,000 frames take roughly 70-100 seconds.


Output Dataset Structure

/workspace/datasets/humanoid_nav_v1/
├── rgb/
│ ├── rgb_0000.png
│ ├── rgb_0001.png
│ └── ...
├── depth/
│ ├── depth_0000.exr # 32-bit float, values in meters
│ ├── depth_0001.exr
│ └── ...
├── semantic/
│ ├── semantic_0000.png # Color-coded segmentation
│ ├── semantic_0001.png
│ └── ...
└── annotations/
└── coco_annotations.json # COCO format bounding boxes

COCO Annotations Example:

{
  "images": [
    {"id": 1, "file_name": "rgb_0000.png", "width": 1280, "height": 720}
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 1,                # "chair"
      "bbox": [450, 320, 180, 240],    # [x, y, width, height]
      "area": 43200,
      "iscrowd": 0
    }
  ],
  "categories": [
    {"id": 1, "name": "chair"},
    {"id": 2, "name": "table"},
    {"id": 3, "name": "person"}
  ]
}
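A quick consistency check on annotations like these can be sketched as below; the dict stands in for a loaded `coco_annotations.json` (with `json.load`), and the checks verify that area equals width × height and that each box fits inside its image:

```python
# Sketch: sanity-check COCO bounding boxes (the dict stands in for a
# file loaded via json.load from annotations/coco_annotations.json).
coco = {
    "images": [{"id": 1, "file_name": "rgb_0000.png", "width": 1280, "height": 720}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1,
                     "bbox": [450, 320, 180, 240], "area": 43200, "iscrowd": 0}],
    "categories": [{"id": 1, "name": "chair"}],
}

images = {img["id"]: img for img in coco["images"]}
for ann in coco["annotations"]:
    x, y, w, h = ann["bbox"]
    img = images[ann["image_id"]]
    assert ann["area"] == w * h, "area should equal bbox width * height"
    assert x + w <= img["width"] and y + h <= img["height"], "bbox outside image"
# For the example above: 180 * 240 = 43200, and the box ends at (630, 560),
# well inside the 1280x720 frame, so both checks pass.
```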

Validating Sim-to-Real Transfer

Common Transfer Failures

Problem 1: Sim data is "too perfect"

  • Symptom: Model achieves 99% accuracy in sim, 60% on real robot
  • Cause: No sensor noise, motion blur, or lens distortion in sim
  • Fix: Add camera randomization (noise, blur, distortion)

Problem 2: Lighting mismatch

  • Symptom: Model fails in dimly lit real environments
  • Cause: All sim training done with bright, uniform lighting
  • Fix: Randomize lighting intensity (500-2000 lux range)

Problem 3: Texture bias

  • Symptom: Model detects "blue chairs" but not "red chairs"
  • Cause: All chairs in sim had blue texture
  • Fix: Randomize object colors and textures

Validation Workflow

  1. Train on synthetic data (1,000 images from Isaac Sim)
  2. Evaluate on synthetic test set (200 held-out sim images) → Expect 95%+ accuracy
  3. Collect real-world validation set (50 images from physical robot)
  4. Evaluate on real test set → Target 85%+ accuracy (10% gap is acceptable)
  5. If gap > 15%: Analyze failure modes and add missing randomization

Example Results (obstacle detection):

  • Sim test accuracy: 97%
  • Real test accuracy: 89%
  • Gap: 8% → Acceptable sim-to-real transfer
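The accept/investigate rule from the workflow above can be written as a one-line check (thresholds are the ones stated in the workflow, here as fractions rather than percentages):

```python
# Sketch of the sim-to-real acceptance rule from the validation workflow.

def transfer_gap(sim_acc: float, real_acc: float, max_gap: float = 0.15) -> dict:
    """Sim-to-real accuracy gap and whether it is within the acceptable band."""
    gap = sim_acc - real_acc
    return {"gap": round(gap, 4), "acceptable": gap <= max_gap}

result = transfer_gap(sim_acc=0.97, real_acc=0.89)
# The 8% gap from the example results is below the 15% threshold, so
# result["acceptable"] is True; above 15%, add missing randomization instead.
```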

Reality Check: When Simulation Isn't Enough

Limitations:

  1. Physics inaccuracies: Friction, deformation, fluid dynamics are approximations
  2. Sensor models: Real cameras have lens aberrations, rolling shutter effects not in sim
  3. Rare events: Simulation may not capture edge cases (e.g., specular reflections from glass)

Best Practice: Use sim for initial training (90% of data), then fine-tune on small real-world dataset (10% of data).

Hybrid Approach:

  • Generate 10,000 synthetic images in Isaac Sim
  • Collect 1,000 real images from robot
  • Train model on combined dataset (11,000 images)
  • Result: Model inherits sim diversity + real-world realism
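The 10:1 mix above amounts to interleaving the two pools into one shuffled training list; a minimal sketch (file paths are hypothetical placeholders):

```python
import random

# Sketch of the hybrid 10:1 dataset mix; paths are hypothetical placeholders.
sim_samples = [f"sim/rgb_{i:05d}.png" for i in range(10_000)]
real_samples = [f"real/img_{i:05d}.png" for i in range(1_000)]

combined = sim_samples + real_samples   # 11,000 samples total
random.seed(0)
random.shuffle(combined)                # mix so every batch sees both domains

real_fraction = len(real_samples) / len(combined)
# real_fraction is ~0.0909: roughly 1 in 11 training images is real
```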

Check Your Understanding

  1. What is the primary advantage of synthetic data over real data?

    • A) Higher image resolution
    • B) Automatic ground-truth labels without manual annotation
    • C) Better color accuracy
    • Answer: B
  2. What does domain randomization prevent?

    • A) Overfitting to specific environments (e.g., one room layout)
    • B) GPU overheating during rendering
    • C) Physics simulation errors
    • Answer: A
  3. Which Isaac Sim feature enables realistic shadows and reflections?

    • A) Material Definition Language (MDL)
    • B) RTX ray tracing
    • C) Replicator API
    • Answer: B
  4. What is a typical sim-to-real accuracy gap for well-randomized datasets?

    • A) 50% (model completely fails on real data)
    • B) 5-15% (acceptable transfer with some fine-tuning)
    • C) 0% (perfect transfer)
    • Answer: B
  5. Which file format does Isaac Sim use for depth maps?

    • A) .png (8-bit integer)
    • B) .jpg (compressed image)
    • C) .exr (32-bit float)
    • Answer: C

Key Takeaways

Synthetic data solves the labeling bottleneck: Generate 1,000 labeled images in minutes vs. weeks of manual work

Photorealism requires ray tracing + materials: RTX GPUs enable realistic lighting and textures matching real cameras

Domain randomization prevents overfitting: Train on diverse environments → Model generalizes to new scenes

Replicator API automates data export: One script exports RGB, depth, segmentation, and bounding boxes simultaneously

Sim-to-real gap is manageable: 5-15% accuracy drop is expected; fine-tune on small real dataset to close the gap


Next: 3.3 Isaac ROS - Real-Time Visual SLAM