
3.2 Isaac Sim: Photorealistic Simulation & Synthetic Data Generation

Introduction

The Data Problem: Training a vision model to detect obstacles requires 10,000+ labeled images. Collecting and labeling this data manually takes weeks. Running physical robots risks damage during trial-and-error learning.

The Simulation Solution: NVIDIA Isaac Sim generates photorealistic synthetic data at scale — 1,000 labeled images in minutes, not weeks. Train perception models in simulation, then deploy them on real robots with minimal accuracy loss.

This section teaches you to:

  1. Set up Isaac Sim and create a virtual humanoid environment
  2. Configure cameras, sensors, and domain randomization
  3. Export synthetic datasets for AI model training
  4. Validate sim-to-real transfer quality

Real-World Impact: Companies like Amazon Robotics and BMW use Isaac Sim to train warehouse robots and autonomous vehicles without risking physical hardware during development.


Why Photorealistic Simulation Matters

The Reality Gap

Challenge: Models trained on synthetic data often fail on real robots because simulated environments don't match reality:

  • Lighting: Simulated lights are too perfect (no glare, reflections, or shadows)
  • Textures: Materials look artificial (plastics too shiny, walls too smooth)
  • Physics: Objects behave unrealistically (no friction variation, perfect collisions)
  • Sensor noise: Cameras and depth sensors produce perfect data (no motion blur, lens distortion, or interference)

Isaac Sim's Approach: Ray-traced rendering + Domain randomization + Sensor noise models bridge the reality gap.


Isaac Sim's Photorealism Features

1. RTX Ray Tracing

Purpose: Physically accurate lighting that matches real-world cameras

How It Works:

  • Simulates millions of light rays bouncing between surfaces
  • Produces realistic shadows (soft shadows from area lights, sharp shadows from point lights)
  • Captures reflections (metallic surfaces, glossy floors, glass)
  • Models global illumination (indirect lighting from light bouncing off walls)

Example: A camera viewing a shiny metal surface will see reflections of overhead lights and nearby objects — just like a real camera would.

Performance: NVIDIA RTX GPUs (RTX 3060 or higher) render at 30-60 FPS with ray tracing enabled.


2. Material System (MDL - Material Definition Language)

Purpose: Create realistic surfaces with physical properties

Material Properties:

  • Albedo (base color): RGB color under neutral lighting
  • Roughness: Controls glossiness (0.0 = mirror, 1.0 = matte)
  • Metallic: Metallic vs. dielectric surfaces (0.0 = plastic, 1.0 = metal)
  • Normal maps: Add surface detail without extra geometry (scratches, bumps, grain)

Example: A wood floor material has:

  • Albedo: brown color
  • Roughness: 0.6 (semi-matte)
  • Metallic: 0.0 (non-metal)
  • Normal map: wood grain texture

Result: Camera sees realistic wood grain with directional highlights — indistinguishable from real wood.
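The effect of the roughness parameter can be sketched numerically. The mapping below (Blinn-Phong exponent `2/roughness² − 2`) is a common shading approximation, not Isaac Sim's actual MDL shader, but it shows why a low roughness value produces a tight, bright highlight and a high value a broad, dim one:

```python
import numpy as np

# Illustrative only: map the 0..1 roughness parameter onto a Blinn-Phong
# specular exponent to show how roughness controls highlight width.
# This is a textbook approximation, not Isaac Sim's MDL shading model.

def specular_lobe(cos_half_angle: np.ndarray, roughness: float) -> np.ndarray:
    """Relative specular intensity at a given angle from the mirror direction."""
    exponent = max(2.0 / max(roughness, 1e-3) ** 2 - 2.0, 1.0)
    return np.clip(cos_half_angle, 0.0, 1.0) ** exponent

angles = np.cos(np.radians([0, 5, 15, 30]))      # angles from the reflection direction
mirror = specular_lobe(angles, roughness=0.05)   # near-mirror: tight highlight
matte = specular_lobe(angles, roughness=0.6)     # semi-matte wood: broad, dim highlight
# The near-mirror lobe falls off far faster away from the reflection direction.
```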


3. Depth and Segmentation Sensors

Purpose: Generate ground-truth labels automatically (no manual annotation!)

Sensor Types:

  • RGB Camera: Standard color images (matching physical cameras)
  • Depth Camera: Distance to each pixel (in meters)
  • Semantic Segmentation: Per-pixel object class labels (wall, floor, person, furniture)
  • Instance Segmentation: Unique ID per object instance (person_01, person_02, chair_01)
  • Bounding Boxes: 2D/3D boxes around objects for object detection training

Example Output (single frame):

/output/
├── rgb_0001.png # Color image (1920x1080)
├── depth_0001.exr # Depth map (32-bit float, meters)
├── semantic_0001.png # Segmentation (each color = object class)
├── instance_0001.json # Instance IDs and bounding boxes
└── camera_info.yaml # Camera intrinsics (focal length, distortion)

Key Advantage: Ground truth is perfect — no human labeling errors. A depth map shows exact distances (accurate to 0.01 meters in simulation).
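A sanity check on exported depth frames can be sketched as below. The function operates on a plain array; loading a real `.exr` file would need a library such as OpenEXR or OpenCV, so here we substitute a hypothetical numpy array standing in for one frame:

```python
import numpy as np

# Sketch of a sanity check for an exported depth frame. Depth .exr files
# store per-pixel distance in meters; we fake one with a numpy array.

def depth_stats(depth: np.ndarray, max_range: float = 20.0) -> dict:
    """Summarize a depth map: valid-pixel ratio and min/max range in meters."""
    valid = np.isfinite(depth) & (depth > 0) & (depth < max_range)
    return {
        "valid_ratio": float(valid.mean()),
        "min_m": float(depth[valid].min()),
        "max_m": float(depth[valid].max()),
    }

# Simulated 4x4 depth map: a wall 3 m away with one invalid (inf) pixel
fake_depth = np.full((4, 4), 3.0, dtype=np.float32)
fake_depth[0, 0] = np.inf
stats = depth_stats(fake_depth)
# stats["valid_ratio"] is 15/16; min and max are both 3.0
```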


Domain Randomization: Teaching Robustness

The Problem

A model trained on one environment (e.g., a white-walled lab) fails when deployed in a different environment (e.g., a textured office).

Example: An obstacle detection model trained on images with:

  • White walls
  • Bright fluorescent lighting
  • Clean floors

...will fail when tested on:

  • Brick walls (different texture)
  • Dim natural lighting (different brightness)
  • Cluttered floors (different context)

The Solution: Domain Randomization

Concept: Train on thousands of randomized environments so the model learns features that generalize across all variations.

Randomization Parameters:

Lighting Randomization

  • Intensity: Vary brightness (500-2000 lux for indoor scenes)
  • Color temperature: Warm (2700K) to cool (6500K) lighting
  • Direction: Randomize sun angle, add/remove point lights
  • Shadows: Hard vs. soft shadows by varying light size

Example: Same scene rendered with 10 different lighting conditions → Model learns to detect obstacles under any lighting.
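The color-temperature sweep can be sketched with a crude interpolation. This is not Isaac Sim API: it just blends between an approximate 2700 K warm white and a neutral 6500 K white, then samples a random temperature per frame the way a lighting randomizer would:

```python
import random

# Crude sketch (not Isaac Sim API): linearly interpolate an RGB tint
# between warm (2700 K) and cool (6500 K) white points.

WARM = (1.00, 0.75, 0.52)   # approximate RGB of 2700 K incandescent light
COOL = (1.00, 1.00, 1.00)   # 6500 K daylight treated as neutral white

def temperature_to_rgb(kelvin: float) -> tuple:
    """Linear blend between the warm and cool white points (illustrative only)."""
    t = min(max((kelvin - 2700.0) / (6500.0 - 2700.0), 0.0), 1.0)
    return tuple(w + t * (c - w) for w, c in zip(WARM, COOL))

random.seed(0)
tints = [temperature_to_rgb(random.uniform(2700, 6500)) for _ in range(3)]
# Each frame's light gets a different tint: 2700 K maps to WARM, 6500 K to white
```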


Texture Randomization

  • Wall materials: Brick, concrete, drywall, wood paneling
  • Floor materials: Tile, carpet, hardwood, linoleum
  • Object colors: Randomize furniture colors, decals, posters

Example: A "chair" appears with 50 different textures → Model learns chair shape, not specific texture.


Camera Randomization

  • Exposure: Simulate over/underexposed images
  • Motion blur: Simulate fast robot motion during capture
  • Lens distortion: Add barrel/pincushion distortion matching real cameras
  • Noise: Add sensor noise (ISO 100-3200 equivalent)

Example: Training on blurry, noisy, poorly exposed images → Model is robust to real-world camera imperfections.
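What these camera randomizations do to pixel values can be sketched as numpy post-processing. Isaac Sim applies them in-renderer; this standalone version just illustrates the three degradations on a stand-in image array:

```python
import numpy as np

# Sketch of camera-imperfection randomization as post-processing
# (Isaac Sim applies these in-renderer; this is illustrative only).
rng = np.random.default_rng(42)

def degrade(img: np.ndarray) -> np.ndarray:
    """Apply random exposure shift, 1D motion blur, and Gaussian sensor noise."""
    out = img.astype(np.float32)
    out *= rng.uniform(0.5, 1.5)                       # over/under-exposure
    k = int(rng.integers(1, 4))                        # horizontal blur length (px)
    kernel = np.ones(k) / k
    out = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, out)
    out += rng.normal(0.0, rng.uniform(1.0, 8.0), out.shape)  # ISO-like noise
    return np.clip(out, 0, 255).astype(np.uint8)

clean = np.full((8, 8), 128, dtype=np.uint8)   # stand-in for a grayscale frame
noisy = degrade(clean)                         # same shape, degraded values
```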


Clutter and Occlusion

  • Random objects: Place boxes, chairs, plants in random positions
  • Partial occlusions: Objects partially hidden behind others
  • Dynamic elements: People walking, doors opening

Example: Training on cluttered scenes → Model learns to detect partially visible obstacles.


Domain Randomization in Code

Here's how Isaac Sim implements domain randomization using the Replicator API:

import omni.replicator.core as rep

# Register randomization function
def randomize_scene():
    # Randomize lighting
    lights = rep.get.prims(semantics=[("class", "light")])
    with lights:
        rep.modify.attribute("intensity", rep.distribution.uniform(500, 2000))
        rep.modify.attribute("color", rep.distribution.uniform((0.8, 0.8, 0.8), (1.0, 1.0, 1.0)))

    # Randomize textures
    walls = rep.get.prims(path_pattern="/World/Walls/*")
    with walls:
        rep.randomizer.texture(
            textures=rep.distribution.choice([
                "/Materials/Brick.mdl",
                "/Materials/Concrete.mdl",
                "/Materials/Drywall.mdl"
            ])
        )

    # Randomize object placement
    obstacles = rep.get.prims(semantics=[("class", "obstacle")])
    with obstacles:
        rep.modify.pose(
            position=rep.distribution.uniform((-5, 0, -5), (5, 0, 5)),
            rotation=rep.distribution.uniform((0, 0, 0), (0, 360, 0))
        )

    # Randomize camera parameters
    camera = rep.get.prims(path_pattern="/World/Camera")
    with camera:
        rep.modify.attribute("focalLength", rep.distribution.uniform(18, 35))  # mm
        rep.modify.attribute("fStop", rep.distribution.uniform(2.8, 8.0))

# Register and trigger
rep.randomizer.register(randomize_scene)

How This Works:

  1. rep.get.prims() selects scene elements (lights, walls, objects)
  2. rep.modify.attribute() changes properties (intensity, color, position)
  3. rep.distribution.uniform() samples random values from ranges
  4. rep.randomizer.register() runs randomization before each frame capture

Result: Rendering 1,000 frames produces 1,000 unique environments — equivalent to collecting data in 1,000 different real-world locations.


Setting Up Your First Isaac Sim Scene

Prerequisites

  • Hardware: NVIDIA RTX GPU (RTX 3060 or higher recommended)
  • Software: Isaac Sim 4.0+ (requires Omniverse Launcher)
  • OS: Ubuntu 22.04 LTS or Windows 10/11

Installation (see quickstart.md for full instructions):

  1. Install NVIDIA Omniverse Launcher
  2. Install Isaac Sim from the Omniverse app library
  3. Verify installation: launch Isaac Sim → you should see the viewport with the default scene

Creating a Simple Navigation Environment

Scenario: A 10m x 10m room with obstacles for humanoid navigation testing.

Step 1: Create the Scene Structure

from pxr import Gf, UsdGeom, UsdPhysics
from omni.isaac.core import World
from omni.isaac.core.utils.stage import add_reference_to_stage

# Initialize Isaac Sim world
world = World(stage_units_in_meters=1.0)
stage = world.stage

# Create room floor (10m x 10m)
floor_path = "/World/Floor"
floor = UsdGeom.Cube.Define(stage, floor_path)
floor.GetSizeAttr().Set(1.0)
floor.AddScaleOp().Set(Gf.Vec3f(10.0, 0.1, 10.0))  # 10m x 0.1m x 10m
floor.AddTranslateOp().Set(Gf.Vec3f(0.0, -0.05, 0.0))

# Create walls (4 walls forming a box)
def create_wall(path, position, scale):
    wall = UsdGeom.Cube.Define(stage, path)
    wall.GetSizeAttr().Set(1.0)
    wall.AddScaleOp().Set(Gf.Vec3f(*scale))
    wall.AddTranslateOp().Set(Gf.Vec3f(*position))
    return wall

create_wall("/World/WallNorth", (0, 1.5, 5), (10, 3, 0.2))
create_wall("/World/WallSouth", (0, 1.5, -5), (10, 3, 0.2))
create_wall("/World/WallEast", (5, 1.5, 0), (0.2, 3, 10))
create_wall("/World/WallWest", (-5, 1.5, 0), (0.2, 3, 10))

Explanation:

  • UsdGeom.Cube.Define(): Creates a cube primitive at the specified path
  • AddScaleOp(): Scales the cube to desired dimensions (X, Y, Z in meters)
  • AddTranslateOp(): Positions the cube in world coordinates

Step 2: Add a Humanoid Robot

Isaac Sim includes pre-built robot assets. We'll use a generic humanoid:

# Import humanoid robot USD asset
robot_path = "/World/Humanoid"
robot_usd_path = "omniverse://localhost/NVIDIA/Assets/Isaac/4.0/Isaac/Robots/Humanoid/humanoid.usd"
add_reference_to_stage(robot_usd_path, robot_path)

# Position robot at origin
robot_prim = stage.GetPrimAtPath(robot_path)
UsdGeom.Xformable(robot_prim).AddTranslateOp().Set(Gf.Vec3f(0.0, 0.0, 0.0))

Reality Check: The humanoid.usd file contains:

  • Mesh geometry: Visual appearance (body, limbs, head)
  • Collision shapes: Simplified geometry for physics
  • Joints: Articulations with DOF (degrees of freedom)
  • Sensors: Cameras, IMU (if configured)

Step 3: Configure Cameras and Sensors

Add stereo cameras (left/right) for VSLAM and depth sensing:

import numpy as np

from omni.isaac.sensor import Camera

# Left camera
camera_left = Camera(
    prim_path="/World/Humanoid/camera_left",
    position=np.array([0.1, 1.6, 0.06]),  # 1.6m height (head), 0.1m forward, 0.06m left
    frequency=30,  # 30 Hz
    resolution=(1280, 720),
    orientation=np.array([1, 0, 0, 0])  # Quaternion (forward-facing)
)
camera_left.initialize()

# Right camera (stereo pair, 12cm baseline)
camera_right = Camera(
    prim_path="/World/Humanoid/camera_right",
    position=np.array([0.1, 1.6, -0.06]),  # Same height, 0.06m right
    frequency=30,
    resolution=(1280, 720),
    orientation=np.array([1, 0, 0, 0])
)
camera_right.initialize()

# Depth camera (co-located with left camera for alignment)
camera_depth = Camera(
    prim_path="/World/Humanoid/camera_depth",
    position=np.array([0.1, 1.6, 0.06]),
    frequency=30,
    resolution=(640, 480),
    orientation=np.array([1, 0, 0, 0])
)
camera_depth.set_depth_enabled(True)  # Enable depth output
camera_depth.initialize()

Camera Coordinate Frame:

  • X: Forward (robot's front)
  • Y: Up (perpendicular to ground)
  • Z: Left (robot's left side)

Baseline Explanation: 12cm (0.12m) separation between left/right cameras enables stereo depth estimation. Larger baseline → better depth accuracy at long range, worse at close range.
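The trade-off follows from the stereo equation Z = f·B/d (depth Z, focal length f in pixels, baseline B, disparity d). A short sketch with an assumed focal length of 700 px (illustrative, not the configured cameras' actual intrinsics):

```python
# Stereo depth from disparity: Z = f * B / d.
# focal_px = 700 is an assumed intrinsics value for illustration.

def stereo_depth(disparity_px: float, focal_px: float = 700.0, baseline_m: float = 0.12) -> float:
    """Depth in meters from pixel disparity for a 12 cm stereo baseline."""
    return focal_px * baseline_m / disparity_px

near = stereo_depth(84.0)   # 84 px disparity  -> ~1 m
far = stereo_depth(8.4)     # 8.4 px disparity -> ~10 m
# A +/-0.5 px disparity error shifts the far estimate by ~0.6 m but the
# near estimate by only ~6 mm: depth error grows quadratically with range,
# which is why a larger baseline helps at long range.
```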


Step 4: Add Physics and Lighting

# Add physics scene (required for gravity, collisions)
from pxr import UsdPhysics, PhysxSchema
from omni.isaac.core.utils.prims import create_prim

# Configure physics
scene = UsdPhysics.Scene.Define(stage, "/World/PhysicsScene")
scene.CreateGravityDirectionAttr().Set(Gf.Vec3f(0.0, -1.0, 0.0))
scene.CreateGravityMagnitudeAttr().Set(9.81)

# Add PhysX scene settings
physx_scene = PhysxSchema.PhysxSceneAPI.Apply(scene.GetPrim())
physx_scene.CreateEnableCCDAttr().Set(True)  # Continuous collision detection
physx_scene.CreateEnableGPUDynamicsAttr().Set(True)  # GPU acceleration

# Add lighting (dome light for ambient + directional for sun)
dome_light = create_prim(
    "/World/DomeLight",
    "DomeLight",
    attributes={"intensity": 1000}
)

sun_light = create_prim(
    "/World/SunLight",
    "DistantLight",
    attributes={
        "intensity": 3000,
        "angle": 0.53,  # Angular size (degrees) - matches the sun
        "color": (1.0, 1.0, 0.95)  # Slightly warm white
    }
)

Physics Notes:

  • CCD (Continuous Collision Detection): Prevents fast-moving objects from passing through thin surfaces
  • GPU Dynamics: Offloads physics computation to GPU for faster simulation

Exporting Synthetic Datasets

Full Export Script

This script captures 1,000 frames with randomization:

import omni.replicator.core as rep
from pathlib import Path

class SyntheticDataExporter:
    def __init__(self, output_dir: str, camera_paths: list):
        self.output_dir = Path(output_dir)
        self.camera_paths = camera_paths
        self._setup_output_directories()
        self._setup_writers()

    def _setup_output_directories(self):
        """Create output directory structure."""
        (self.output_dir / "rgb").mkdir(parents=True, exist_ok=True)
        (self.output_dir / "depth").mkdir(parents=True, exist_ok=True)
        (self.output_dir / "semantic").mkdir(parents=True, exist_ok=True)
        (self.output_dir / "annotations").mkdir(parents=True, exist_ok=True)

    def _setup_writers(self):
        """Configure Replicator writers for data export."""
        # RGB writer
        self.rgb_writer = rep.WriterRegistry.get("BasicWriter")
        self.rgb_writer.initialize(
            output_dir=str(self.output_dir / "rgb"),
            rgb=True,
            distance_to_camera=False
        )

        # Depth writer
        self.depth_writer = rep.WriterRegistry.get("BasicWriter")
        self.depth_writer.initialize(
            output_dir=str(self.output_dir / "depth"),
            distance_to_camera=True,
            rgb=False
        )

        # Semantic segmentation writer
        self.semantic_writer = rep.WriterRegistry.get("BasicWriter")
        self.semantic_writer.initialize(
            output_dir=str(self.output_dir / "semantic"),
            semantic_segmentation=True,
            rgb=False
        )

        # COCO annotation writer (for object detection)
        self.coco_writer = rep.WriterRegistry.get("COCOWriter")
        self.coco_writer.initialize(
            output_dir=str(self.output_dir / "annotations"),
            bbox_2d_tight=True,
            semantic_types=["class"]
        )

    def export_dataset(self, num_frames: int = 1000):
        """Export synthetic dataset with domain randomization."""
        # Attach cameras to render products
        render_products = []
        for camera_path in self.camera_paths:
            rp = rep.create.render_product(camera_path, (1280, 720))
            render_products.append(rp)

        # Attach writers to render products
        self.rgb_writer.attach(render_products)
        self.depth_writer.attach(render_products)
        self.semantic_writer.attach(render_products)
        self.coco_writer.attach(render_products)

        # Run randomization and capture
        with rep.trigger.on_frame(num_frames=num_frames):
            randomize_scene()  # Call randomization function from earlier

        # Execute orchestrator
        rep.orchestrator.run()

        print(f"✅ Exported {num_frames} frames to {self.output_dir}")
        return {
            "num_frames": num_frames,
            "output_dir": str(self.output_dir),
            "data_types": ["rgb", "depth", "semantic", "bbox_2d"]
        }

# Usage
exporter = SyntheticDataExporter(
    output_dir="/workspace/datasets/humanoid_nav_v1",
    camera_paths=["/World/Humanoid/camera_left", "/World/Humanoid/camera_right"]
)
exporter.export_dataset(num_frames=1000)

Performance: On an RTX 4080, this exports at ~10-15 FPS, so 1,000 frames take roughly 70-100 seconds.


Output Dataset Structure

/workspace/datasets/humanoid_nav_v1/
├── rgb/
│ ├── rgb_0000.png
│ ├── rgb_0001.png
│ └── ...
├── depth/
│ ├── depth_0000.exr # 32-bit float, values in meters
│ ├── depth_0001.exr
│ └── ...
├── semantic/
│ ├── semantic_0000.png # Color-coded segmentation
│ ├── semantic_0001.png
│ └── ...
└── annotations/
└── coco_annotations.json # COCO format bounding boxes

COCO Annotations Example:

{
  "images": [
    {"id": 1, "file_name": "rgb_0000.png", "width": 1280, "height": 720}
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 1,                # "chair"
      "bbox": [450, 320, 180, 240],    # [x, y, width, height]
      "area": 43200,
      "iscrowd": 0
    }
  ],
  "categories": [
    {"id": 1, "name": "chair"},
    {"id": 2, "name": "table"},
    {"id": 3, "name": "person"}
  ]
}
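A quick consistency check on annotations like these can be sketched as below; the dict stands in for a loaded `coco_annotations.json` (with `json.load`), and the checks verify that area equals width × height and that each box fits inside its image:

```python
# Sketch: sanity-check COCO bounding boxes (the dict stands in for a
# file loaded via json.load from annotations/coco_annotations.json).
coco = {
    "images": [{"id": 1, "file_name": "rgb_0000.png", "width": 1280, "height": 720}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1,
                     "bbox": [450, 320, 180, 240], "area": 43200, "iscrowd": 0}],
    "categories": [{"id": 1, "name": "chair"}],
}

images = {img["id"]: img for img in coco["images"]}
for ann in coco["annotations"]:
    x, y, w, h = ann["bbox"]
    img = images[ann["image_id"]]
    assert ann["area"] == w * h, "area should equal bbox width * height"
    assert x + w <= img["width"] and y + h <= img["height"], "bbox outside image"
# For the example above: 180 * 240 = 43200, and the box ends at (630, 560),
# well inside the 1280x720 frame, so both checks pass.
```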

Validating Sim-to-Real Transfer

Common Transfer Failures

Problem 1: Sim data is "too perfect"

  • Symptom: Model achieves 99% accuracy in sim, 60% on real robot
  • Cause: No sensor noise, motion blur, or lens distortion in sim
  • Fix: Add camera randomization (noise, blur, distortion)

Problem 2: Lighting mismatch

  • Symptom: Model fails in dimly lit real environments
  • Cause: All sim training done with bright, uniform lighting
  • Fix: Randomize lighting intensity (500-2000 lux range)

Problem 3: Texture bias

  • Symptom: Model detects "blue chairs" but not "red chairs"
  • Cause: All chairs in sim had blue texture
  • Fix: Randomize object colors and textures

Validation Workflow

  1. Train on synthetic data (1,000 images from Isaac Sim)
  2. Evaluate on synthetic test set (200 held-out sim images) → Expect 95%+ accuracy
  3. Collect real-world validation set (50 images from physical robot)
  4. Evaluate on real test set → Target 85%+ accuracy (10% gap is acceptable)
  5. If gap > 15%: Analyze failure modes and add missing randomization

Example Results (obstacle detection):

  • Sim test accuracy: 97%
  • Real test accuracy: 89%
  • Gap: 8% → Acceptable sim-to-real transfer
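The accept/investigate rule from the workflow above can be written as a one-line check (thresholds are the ones stated in the workflow, here as fractions rather than percentages):

```python
# Sketch of the sim-to-real acceptance rule from the validation workflow.

def transfer_gap(sim_acc: float, real_acc: float, max_gap: float = 0.15) -> dict:
    """Sim-to-real accuracy gap and whether it is within the acceptable band."""
    gap = sim_acc - real_acc
    return {"gap": round(gap, 4), "acceptable": gap <= max_gap}

result = transfer_gap(sim_acc=0.97, real_acc=0.89)
# The 8% gap from the example results is below the 15% threshold, so
# result["acceptable"] is True; above 15%, add missing randomization instead.
```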

Reality Check: When Simulation Isn't Enough

Limitations:

  1. Physics inaccuracies: Friction, deformation, fluid dynamics are approximations
  2. Sensor models: Real cameras have lens aberrations, rolling shutter effects not in sim
  3. Rare events: Simulation may not capture edge cases (e.g., specular reflections from glass)

Best Practice: Use sim for initial training (90% of data), then fine-tune on small real-world dataset (10% of data).

Hybrid Approach:

  • Generate 10,000 synthetic images in Isaac Sim
  • Collect 1,000 real images from robot
  • Train model on combined dataset (11,000 images)
  • Result: Model inherits sim diversity + real-world realism
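The 10:1 mix above amounts to interleaving the two pools into one shuffled training list; a minimal sketch (file paths are hypothetical placeholders):

```python
import random

# Sketch of the hybrid 10:1 dataset mix; paths are hypothetical placeholders.
sim_samples = [f"sim/rgb_{i:05d}.png" for i in range(10_000)]
real_samples = [f"real/img_{i:05d}.png" for i in range(1_000)]

combined = sim_samples + real_samples   # 11,000 samples total
random.seed(0)
random.shuffle(combined)                # mix so every batch sees both domains

real_fraction = len(real_samples) / len(combined)
# real_fraction is ~0.0909: roughly 1 in 11 training images is real
```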

Check Your Understanding

  1. What is the primary advantage of synthetic data over real data?

    • A) Higher image resolution
    • B) Automatic ground-truth labels without manual annotation
    • C) Better color accuracy
    • Answer: B
  2. What does domain randomization prevent?

    • A) Overfitting to specific environments (e.g., one room layout)
    • B) GPU overheating during rendering
    • C) Physics simulation errors
    • Answer: A
  3. Which Isaac Sim feature enables realistic shadows and reflections?

    • A) Material Definition Language (MDL)
    • B) RTX ray tracing
    • C) Replicator API
    • Answer: B
  4. What is a typical sim-to-real accuracy gap for well-randomized datasets?

    • A) 50% (model completely fails on real data)
    • B) 5-15% (acceptable transfer with some fine-tuning)
    • C) 0% (perfect transfer)
    • Answer: B
  5. Which file format does Isaac Sim use for depth maps?

    • A) .png (8-bit integer)
    • B) .jpg (compressed image)
    • C) .exr (32-bit float)
    • Answer: C

Key Takeaways

Synthetic data solves the labeling bottleneck: Generate 1,000 labeled images in minutes vs. weeks of manual work

Photorealism requires ray tracing + materials: RTX GPUs enable realistic lighting and textures matching real cameras

Domain randomization prevents overfitting: Train on diverse environments → Model generalizes to new scenes

Replicator API automates data export: One script exports RGB, depth, segmentation, and bounding boxes simultaneously

Sim-to-real gap is manageable: 5-15% accuracy drop is expected; fine-tune on small real dataset to close the gap


Next: 3.3 Isaac ROS - Real-Time Visual SLAM