3.2 Isaac Sim: Photorealistic Simulation & Synthetic Data Generation
Introduction
The Data Problem: Training a vision model to detect obstacles requires 10,000+ labeled images. Collecting and labeling this data manually takes weeks. Running physical robots risks damage during trial-and-error learning.
The Simulation Solution: NVIDIA Isaac Sim generates photorealistic synthetic data at scale — 1,000 labeled images in minutes, not weeks. Train perception models in simulation, then deploy them on real robots with minimal accuracy loss.
This section teaches you to:
- Set up Isaac Sim and create a virtual humanoid environment
- Configure cameras, sensors, and domain randomization
- Export synthetic datasets for AI model training
- Validate sim-to-real transfer quality
Real-World Impact: Companies such as Amazon Robotics and BMW use Isaac Sim to develop warehouse and factory robots without risking physical hardware during development.
Why Photorealistic Simulation Matters
The Reality Gap
Challenge: Models trained on synthetic data often fail on real robots because simulated environments don't match reality:
- Lighting: Simulated lights are too perfect (no glare, reflections, or shadows)
- Textures: Materials look artificial (plastics too shiny, walls too smooth)
- Physics: Objects behave unrealistically (no friction variation, perfect collisions)
- Sensor noise: Cameras and depth sensors produce perfect data (no motion blur, lens distortion, or interference)
Isaac Sim's Approach: Ray-traced rendering + Domain randomization + Sensor noise models bridge the reality gap.
Isaac Sim's Photorealism Features
1. RTX Ray Tracing
Purpose: Physically accurate lighting that matches real-world cameras
How It Works:
- Simulates millions of light rays bouncing between surfaces
- Produces realistic shadows (soft shadows from area lights, sharp shadows from point lights)
- Captures reflections (metallic surfaces, glossy floors, glass)
- Models global illumination (indirect lighting from light bouncing off walls)
Example: A camera viewing a shiny metal surface will see reflections of overhead lights and nearby objects — just like a real camera would.
Performance: NVIDIA RTX GPUs (RTX 3060 or higher) render at 30-60 FPS with ray tracing enabled.
2. Material System (MDL - Material Definition Language)
Purpose: Create realistic surfaces with physical properties
Material Properties:
- Albedo (base color): RGB color under neutral lighting
- Roughness: Controls glossiness (0.0 = mirror, 1.0 = matte)
- Metallic: Metallic vs. dielectric surfaces (0.0 = plastic, 1.0 = metal)
- Normal maps: Add surface detail without extra geometry (scratches, bumps, grain)
Example: A wood floor material has:
- Albedo: brown color
- Roughness: 0.6 (semi-matte)
- Metallic: 0.0 (non-metal)
- Normal map: wood grain texture
Result: Camera sees realistic wood grain with directional highlights — indistinguishable from real wood.
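These material parameters can be bundled and range-checked before they are assigned to a surface; a pure-Python illustration (the `PBRMaterial` class is hypothetical, not an Isaac Sim or MDL API):

```python
from dataclasses import dataclass

@dataclass
class PBRMaterial:
    """Illustrative PBR parameter bundle (hypothetical, not an MDL class)."""
    albedo: tuple      # RGB base color, each channel in [0, 1]
    roughness: float   # 0.0 = mirror, 1.0 = matte
    metallic: float    # 0.0 = dielectric (plastic), 1.0 = metal

    def __post_init__(self):
        # Physically based renderers expect these parameters in [0, 1]
        if not 0.0 <= self.roughness <= 1.0:
            raise ValueError("roughness must be in [0, 1]")
        if not 0.0 <= self.metallic <= 1.0:
            raise ValueError("metallic must be in [0, 1]")

# The wood-floor example from above (albedo values are illustrative)
wood_floor = PBRMaterial(albedo=(0.45, 0.30, 0.18), roughness=0.6, metallic=0.0)
```

Bundling the parameters this way makes it easy to catch out-of-range values before a randomizer sweeps over hundreds of materials.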
3. Depth and Segmentation Sensors
Purpose: Generate ground-truth labels automatically (no manual annotation!)
Sensor Types:
- RGB Camera: Standard color images (matching physical cameras)
- Depth Camera: Distance to each pixel (in meters)
- Semantic Segmentation: Per-pixel object class labels (wall, floor, person, furniture)
- Instance Segmentation: Unique ID per object instance (person_01, person_02, chair_01)
- Bounding Boxes: 2D/3D boxes around objects for object detection training
Example Output (single frame):
/output/
├── rgb_0001.png # Color image (1920x1080)
├── depth_0001.exr # Depth map (32-bit float, meters)
├── semantic_0001.png # Segmentation (each color = object class)
├── instance_0001.json # Instance IDs and bounding boxes
└── camera_info.yaml # Camera intrinsics (focal length, distortion)
Key Advantage: Ground truth is perfect — no human labeling errors. A depth map shows exact distances (accurate to 0.01 meters in simulation).
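Because depth is stored metrically, each pixel can be back-projected into a 3D point with the standard pinhole model; a minimal pure-Python sketch (the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical values for a 1280x720 camera):

```python
def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Pinhole back-projection: pixel (u, v) + metric depth -> camera-frame XYZ."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

# Hypothetical intrinsics for a 1280x720 camera
fx = fy = 900.0
cx, cy = 640.0, 360.0

# A pixel at the image center 2 m away lies on the optical axis
print(backproject(640, 360, 2.0, fx, fy, cx, cy))  # -> (0.0, 0.0, 2.0)
```

This is exactly the computation a perception pipeline performs when turning a `.exr` depth map plus `camera_info.yaml` intrinsics into a point cloud.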
Domain Randomization: Teaching Robustness
The Problem
A model trained on one environment (e.g., a white-walled lab) fails when deployed in a different environment (e.g., a textured office).
Example: An obstacle detection model trained on images with:
- White walls
- Bright fluorescent lighting
- Clean floors
...will fail when tested on:
- Brick walls (different texture)
- Dim natural lighting (different brightness)
- Cluttered floors (different context)
The Solution: Domain Randomization
Concept: Train on thousands of randomized environments so the model learns features that generalize across all variations.
Randomization Parameters:
Lighting Randomization
- Intensity: Vary brightness (500-2000 lux for indoor scenes)
- Color temperature: Warm (2700K) to cool (6500K) lighting
- Direction: Randomize sun angle, add/remove point lights
- Shadows: Hard vs. soft shadows by varying light size
Example: Same scene rendered with 10 different lighting conditions → Model learns to detect obstacles under any lighting.
Texture Randomization
- Wall materials: Brick, concrete, drywall, wood paneling
- Floor materials: Tile, carpet, hardwood, linoleum
- Object colors: Randomize furniture colors, decals, posters
Example: A "chair" appears with 50 different textures → Model learns chair shape, not specific texture.
Camera Randomization
- Exposure: Simulate over/underexposed images
- Motion blur: Simulate fast robot motion during capture
- Lens distortion: Add barrel/pincushion distortion matching real cameras
- Noise: Add sensor noise (ISO 100-3200 equivalent)
Example: Training on blurry, noisy, poorly exposed images → Model is robust to real-world camera imperfections.
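Exposure and sensor-noise effects like these can also be approximated in post-processing when re-rendering every variation is too slow; a simplified per-pixel sketch in pure Python (the gain and noise ranges are illustrative, not Isaac Sim defaults):

```python
import random

def randomize_pixel(value, exposure_gain, noise_sigma, rng):
    """Apply a random exposure gain plus additive Gaussian sensor noise
    to one 8-bit pixel value, clamping to the valid [0, 255] range."""
    noisy = value * exposure_gain + rng.gauss(0.0, noise_sigma)
    return min(255, max(0, int(round(noisy))))

rng = random.Random(42)             # fixed seed for reproducibility
exposure = rng.uniform(0.5, 1.5)    # under- to over-exposed
sigma = rng.uniform(1.0, 10.0)      # low to high ISO-like noise

row = [120, 130, 250]
augmented = [randomize_pixel(p, exposure, sigma, rng) for p in row]
```

In practice the same idea is applied per-image with vectorized array operations, but the clamping and randomized gain are the essential ingredients.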
Clutter and Occlusion
- Random objects: Place boxes, chairs, plants in random positions
- Partial occlusions: Objects partially hidden behind others
- Dynamic elements: People walking, doors opening
Example: Training on cluttered scenes → Model learns to detect partially visible obstacles.
Domain Randomization in Code
Here's how Isaac Sim implements domain randomization using the Replicator API:
```python
import omni.replicator.core as rep

# Register randomization function
def randomize_scene():
    # Randomize lighting
    lights = rep.get.prims(semantics=[("class", "light")])
    with lights:
        rep.modify.attribute("intensity", rep.distribution.uniform(500, 2000))
        rep.modify.attribute("color", rep.distribution.uniform((0.8, 0.8, 0.8), (1.0, 1.0, 1.0)))

    # Randomize textures
    walls = rep.get.prims(path_pattern="/World/Walls/*")
    with walls:
        rep.randomizer.texture(
            textures=rep.distribution.choice([
                "/Materials/Brick.mdl",
                "/Materials/Concrete.mdl",
                "/Materials/Drywall.mdl"
            ])
        )

    # Randomize object placement
    obstacles = rep.get.prims(semantics=[("class", "obstacle")])
    with obstacles:
        rep.modify.pose(
            position=rep.distribution.uniform((-5, 0, -5), (5, 0, 5)),
            rotation=rep.distribution.uniform((0, 0, 0), (0, 360, 0))
        )

    # Randomize camera parameters
    camera = rep.get.prims(path_pattern="/World/Camera")
    with camera:
        rep.modify.attribute("focalLength", rep.distribution.uniform(18, 35))  # mm
        rep.modify.attribute("fStop", rep.distribution.uniform(2.8, 8.0))

# Register and trigger
rep.randomizer.register(randomize_scene)
```
How This Works:
- `rep.get.prims()` selects scene elements (lights, walls, objects)
- `rep.modify.attribute()` changes properties (intensity, color, position)
- `rep.distribution.uniform()` samples random values from ranges
- `rep.randomizer.register()` runs the randomization before each frame capture
Result: Rendering 1,000 frames produces 1,000 unique environments — equivalent to collecting data in 1,000 different real-world locations.
Setting Up Your First Isaac Sim Scene
Prerequisites
- Hardware: NVIDIA RTX GPU (RTX 3060 or higher recommended)
- Software: Isaac Sim 4.0+ (requires Omniverse Launcher)
- OS: Ubuntu 22.04 LTS or Windows 10/11
Installation (see quickstart.md for full instructions):
- Install NVIDIA Omniverse Launcher
- Install Isaac Sim from the Omniverse app library
- Verify installation: Launch Isaac Sim → should see viewport with default scene
Creating a Simple Navigation Environment
Scenario: A 10m x 10m room with obstacles for humanoid navigation testing.
Step 1: Create the Scene Structure
```python
from pxr import Gf, UsdGeom
from omni.isaac.core import World
from omni.isaac.core.utils.stage import add_reference_to_stage

# Initialize Isaac Sim world
world = World(stage_units_in_meters=1.0)
stage = world.stage

# Create room floor (10m x 10m)
# Note: translate is added before scale so the offset is applied in world
# units rather than being multiplied by the scale (USD xformOpOrder).
floor_path = "/World/Floor"
floor = UsdGeom.Cube.Define(stage, floor_path)
floor.GetSizeAttr().Set(1.0)
floor.AddTranslateOp().Set(Gf.Vec3f(0.0, -0.05, 0.0))
floor.AddScaleOp().Set(Gf.Vec3f(10.0, 0.1, 10.0))  # 10m x 0.1m x 10m

# Create walls (4 walls forming a box)
def create_wall(path, position, scale):
    wall = UsdGeom.Cube.Define(stage, path)
    wall.GetSizeAttr().Set(1.0)
    wall.AddTranslateOp().Set(Gf.Vec3f(*position))  # translate before scale
    wall.AddScaleOp().Set(Gf.Vec3f(*scale))
    return wall

create_wall("/World/WallNorth", (0, 1.5, 5), (10, 3, 0.2))
create_wall("/World/WallSouth", (0, 1.5, -5), (10, 3, 0.2))
create_wall("/World/WallEast", (5, 1.5, 0), (0.2, 3, 10))
create_wall("/World/WallWest", (-5, 1.5, 0), (0.2, 3, 10))
```
Explanation:
- `UsdGeom.Cube.Define()`: Creates a cube primitive at the specified path
- `AddTranslateOp()`: Positions the cube in world coordinates
- `AddScaleOp()`: Scales the cube to the desired dimensions (X, Y, Z in meters)
Step 2: Add a Humanoid Robot
Isaac Sim includes pre-built robot assets. We'll use a generic humanoid:
```python
# Import humanoid robot USD asset
robot_path = "/World/Humanoid"
robot_usd_path = "omniverse://localhost/NVIDIA/Assets/Isaac/4.0/Isaac/Robots/Humanoid/humanoid.usd"
add_reference_to_stage(robot_usd_path, robot_path)

# Position robot at origin
robot_prim = stage.GetPrimAtPath(robot_path)
UsdGeom.Xformable(robot_prim).AddTranslateOp().Set(Gf.Vec3f(0.0, 0.0, 0.0))
```
Reality Check: The humanoid.usd file contains:
- Mesh geometry: Visual appearance (body, limbs, head)
- Collision shapes: Simplified geometry for physics
- Joints: Articulations with DOF (degrees of freedom)
- Sensors: Cameras, IMU (if configured)
Step 3: Configure Cameras and Sensors
Add stereo cameras (left/right) for VSLAM and depth sensing:
```python
import numpy as np
from omni.isaac.sensor import Camera

# Left camera
camera_left = Camera(
    prim_path="/World/Humanoid/camera_left",
    position=np.array([0.1, 1.6, 0.05]),  # 0.1m forward, 1.6m height (head), 0.05m left
    frequency=30,                          # 30 Hz
    resolution=(1280, 720),
    orientation=np.array([1, 0, 0, 0])     # Quaternion (forward-facing)
)
camera_left.initialize()

# Right camera (stereo pair, 10cm baseline)
camera_right = Camera(
    prim_path="/World/Humanoid/camera_right",
    position=np.array([0.1, 1.6, -0.05]),  # Same height, 0.05m right
    frequency=30,
    resolution=(1280, 720),
    orientation=np.array([1, 0, 0, 0])
)
camera_right.initialize()

# Depth camera (co-located with left camera for alignment)
camera_depth = Camera(
    prim_path="/World/Humanoid/camera_depth",
    position=np.array([0.1, 1.6, 0.05]),
    frequency=30,
    resolution=(640, 480),
    orientation=np.array([1, 0, 0, 0])
)
camera_depth.initialize()
camera_depth.add_distance_to_image_plane_to_frame()  # Enable depth output
```
Camera Coordinate Frame:
- X: Forward (robot's front)
- Y: Up (perpendicular to ground)
- Z: Left (robot's left side)
Baseline Explanation: The 10cm (0.10m) separation between the left/right cameras (positioned at ±0.05m from center) enables stereo depth estimation. A larger baseline improves depth accuracy at long range but degrades it at close range.
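That trade-off follows from the stereo depth relation Z = f·B/d (focal length in pixels times baseline, divided by disparity); a quick sketch, with a hypothetical focal length:

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth from a rectified stereo pair: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px

f = 900.0  # hypothetical focal length in pixels for a 1280x720 camera
B = 0.10   # 10 cm baseline, matching the camera positions above

# Disparity shrinks as depth grows, so a one-pixel disparity error
# costs far more depth accuracy at long range than at close range.
print(stereo_depth(f, B, 30.0))  # -> 3.0 (meters)
print(stereo_depth(f, B, 90.0))  # -> 1.0 (meters)
```

Doubling the baseline doubles the disparity at a given depth, which is why wider stereo rigs resolve distant obstacles better.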
Step 4: Add Physics and Lighting
```python
# Add physics scene (required for gravity, collisions)
from pxr import Gf, UsdPhysics, PhysxSchema
from omni.isaac.core.utils.prims import create_prim

# Configure physics
scene = UsdPhysics.Scene.Define(stage, "/World/PhysicsScene")
scene.CreateGravityDirectionAttr().Set(Gf.Vec3f(0.0, -1.0, 0.0))
scene.CreateGravityMagnitudeAttr().Set(9.81)

# Add PhysX scene settings
physx_scene = PhysxSchema.PhysxSceneAPI.Apply(scene.GetPrim())
physx_scene.CreateEnableCCDAttr().Set(True)          # Continuous collision detection
physx_scene.CreateEnableGPUDynamicsAttr().Set(True)  # GPU acceleration

# Add lighting (dome light for ambient + distant light for sun)
dome_light = create_prim(
    "/World/DomeLight",
    "DomeLight",
    attributes={"intensity": 1000}
)
sun_light = create_prim(
    "/World/SunLight",
    "DistantLight",
    attributes={
        "intensity": 3000,
        "angle": 0.53,             # Angular size (degrees) - matches the sun
        "color": (1.0, 1.0, 0.95)  # Slightly warm white
    }
)
```
Physics Notes:
- CCD (Continuous Collision Detection): Prevents fast-moving objects from passing through thin surfaces
- GPU Dynamics: Offloads physics computation to GPU for faster simulation
Exporting Synthetic Datasets
Full Export Script
This script captures 1,000 frames with randomization:
```python
import omni.replicator.core as rep
from pathlib import Path

class SyntheticDataExporter:
    def __init__(self, output_dir: str, camera_paths: list):
        self.output_dir = Path(output_dir)
        self.camera_paths = camera_paths
        self._setup_output_directories()
        self._setup_writers()

    def _setup_output_directories(self):
        """Create output directory structure."""
        (self.output_dir / "rgb").mkdir(parents=True, exist_ok=True)
        (self.output_dir / "depth").mkdir(parents=True, exist_ok=True)
        (self.output_dir / "semantic").mkdir(parents=True, exist_ok=True)
        (self.output_dir / "annotations").mkdir(parents=True, exist_ok=True)

    def _setup_writers(self):
        """Configure Replicator writers for data export."""
        # RGB writer
        self.rgb_writer = rep.WriterRegistry.get("BasicWriter")
        self.rgb_writer.initialize(
            output_dir=str(self.output_dir / "rgb"),
            rgb=True,
            distance_to_camera=False
        )
        # Depth writer
        self.depth_writer = rep.WriterRegistry.get("BasicWriter")
        self.depth_writer.initialize(
            output_dir=str(self.output_dir / "depth"),
            distance_to_camera=True,
            rgb=False
        )
        # Semantic segmentation writer
        self.semantic_writer = rep.WriterRegistry.get("BasicWriter")
        self.semantic_writer.initialize(
            output_dir=str(self.output_dir / "semantic"),
            semantic_segmentation=True,
            rgb=False
        )
        # COCO annotation writer (for object detection)
        self.coco_writer = rep.WriterRegistry.get("COCOWriter")
        self.coco_writer.initialize(
            output_dir=str(self.output_dir / "annotations"),
            bbox_2d_tight=True,
            semantic_types=["class"]
        )

    def export_dataset(self, num_frames: int = 1000):
        """Export synthetic dataset with domain randomization."""
        # Attach cameras to render products
        render_products = []
        for camera_path in self.camera_paths:
            rp = rep.create.render_product(camera_path, (1280, 720))
            render_products.append(rp)

        # Attach writers to render products
        self.rgb_writer.attach(render_products)
        self.depth_writer.attach(render_products)
        self.semantic_writer.attach(render_products)
        self.coco_writer.attach(render_products)

        # Run randomization and capture
        with rep.trigger.on_frame(num_frames=num_frames):
            randomize_scene()  # Randomization function defined earlier

        # Execute orchestrator
        rep.orchestrator.run()

        print(f"✅ Exported {num_frames} frames to {self.output_dir}")
        return {
            "num_frames": num_frames,
            "output_dir": str(self.output_dir),
            "data_types": ["rgb", "depth", "semantic", "bbox_2d"]
        }

# Usage
exporter = SyntheticDataExporter(
    output_dir="/workspace/datasets/humanoid_nav_v1",
    camera_paths=["/World/Humanoid/camera_left", "/World/Humanoid/camera_right"]
)
exporter.export_dataset(num_frames=1000)
```
Performance: On an RTX 4080, this exports at roughly 10-15 FPS, so 1,000 frames take about 70-100 seconds.
Output Dataset Structure
/workspace/datasets/humanoid_nav_v1/
├── rgb/
│ ├── rgb_0000.png
│ ├── rgb_0001.png
│ └── ...
├── depth/
│ ├── depth_0000.exr # 32-bit float, values in meters
│ ├── depth_0001.exr
│ └── ...
├── semantic/
│ ├── semantic_0000.png # Color-coded segmentation
│ ├── semantic_0001.png
│ └── ...
└── annotations/
└── coco_annotations.json # COCO format bounding boxes
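A training loader then pairs these files by their zero-padded frame index; a small pure-Python sketch of that pairing step (the `pair_frames` helper is illustrative and follows the file-name pattern above):

```python
from pathlib import Path

def pair_frames(root: Path):
    """Pair rgb/depth/semantic files that share the same frame index,
    skipping frames where any modality is missing."""
    pairs = []
    for rgb in sorted((root / "rgb").glob("rgb_*.png")):
        idx = rgb.stem.split("_")[1]  # e.g. "0000"
        depth = root / "depth" / f"depth_{idx}.exr"
        semantic = root / "semantic" / f"semantic_{idx}.png"
        if depth.exists() and semantic.exists():
            pairs.append((rgb, depth, semantic))
    return pairs

# Usage: pairs = pair_frames(Path("/workspace/datasets/humanoid_nav_v1"))
```

Skipping incomplete frames up front avoids silent misalignment between images and labels during training.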
COCO Annotations Example:
```json
{
  "images": [
    {"id": 1, "file_name": "rgb_0000.png", "width": 1280, "height": 720}
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 1,
      "bbox": [450, 320, 180, 240],
      "area": 43200,
      "iscrowd": 0
    }
  ],
  "categories": [
    {"id": 1, "name": "chair"},
    {"id": 2, "name": "table"},
    {"id": 3, "name": "person"}
  ]
}
```
Here `bbox` is `[x, y, width, height]`, and `category_id: 1` maps to `"chair"` via the `categories` list.
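COCO's `[x, y, width, height]` boxes often need converting to corner coordinates for training, and the stored `area` should equal width times height; a small conversion and sanity-check sketch (`coco_bbox_to_corners` and `check_annotation` are illustrative helpers, not part of the COCO tooling):

```python
def coco_bbox_to_corners(bbox):
    """Convert COCO [x, y, w, h] to corner form (x1, y1, x2, y2)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)

def check_annotation(ann):
    """Sanity-check that the stored area matches the bbox dimensions."""
    _, _, w, h = ann["bbox"]
    return ann["area"] == w * h

ann = {"bbox": [450, 320, 180, 240], "area": 43200}  # the chair example above
print(coco_bbox_to_corners(ann["bbox"]))  # -> (450, 320, 630, 560)
print(check_annotation(ann))              # -> True
```

Running checks like this over an exported annotation file catches exporter misconfigurations before they reach the training loop.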
Validating Sim-to-Real Transfer
Common Transfer Failures
Problem 1: Sim data is "too perfect"
- Symptom: Model achieves 99% accuracy in sim, 60% on real robot
- Cause: No sensor noise, motion blur, or lens distortion in sim
- Fix: Add camera randomization (noise, blur, distortion)
Problem 2: Lighting mismatch
- Symptom: Model fails in dimly lit real environments
- Cause: All sim training done with bright, uniform lighting
- Fix: Randomize lighting intensity (500-2000 lux range)
Problem 3: Texture bias
- Symptom: Model detects "blue chairs" but not "red chairs"
- Cause: All chairs in sim had blue texture
- Fix: Randomize object colors and textures
Validation Workflow
- Train on synthetic data (1,000 images from Isaac Sim)
- Evaluate on synthetic test set (200 held-out sim images) → Expect 95%+ accuracy
- Collect real-world validation set (50 images from physical robot)
- Evaluate on real test set → Target 85%+ accuracy (10% gap is acceptable)
- If gap > 15%: Analyze failure modes and add missing randomization
Example Results (obstacle detection):
- Sim test accuracy: 97%
- Real test accuracy: 89%
- Gap: 8% → Acceptable sim-to-real transfer
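The accept/iterate decision from the workflow above can be codified; a minimal sketch using the thresholds stated in the text (`evaluate_transfer` is an illustrative helper):

```python
def evaluate_transfer(sim_acc, real_acc):
    """Classify sim-to-real transfer quality per the validation workflow."""
    gap = sim_acc - real_acc
    if gap <= 0.10:
        return gap, "acceptable transfer"
    if gap <= 0.15:
        return gap, "borderline -- consider fine-tuning on real data"
    return gap, "add missing randomization and re-train"

gap, verdict = evaluate_transfer(0.97, 0.89)  # the example results above
print(f"gap = {gap:.0%}: {verdict}")  # gap = 8%: acceptable transfer
```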
Reality Check: When Simulation Isn't Enough
Limitations:
- Physics inaccuracies: Friction, deformation, fluid dynamics are approximations
- Sensor models: Real cameras have lens aberrations, rolling shutter effects not in sim
- Rare events: Simulation may not capture edge cases (e.g., specular reflections from glass)
Best Practice: Use sim for initial training (90% of data), then fine-tune on small real-world dataset (10% of data).
Hybrid Approach:
- Generate 10,000 synthetic images in Isaac Sim
- Collect 1,000 real images from robot
- Train model on combined dataset (11,000 images)
- Result: Model inherits sim diversity + real-world realism
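Combining the two sources is just a concatenation plus a deterministic shuffle; a minimal sketch (file names are placeholders):

```python
import random

def build_hybrid_dataset(sim_samples, real_samples, seed=0):
    """Concatenate sim + real samples and shuffle with a fixed seed
    so the mixed ordering is reproducible across training runs."""
    combined = list(sim_samples) + list(real_samples)
    random.Random(seed).shuffle(combined)
    return combined

sim = [f"sim_{i:05d}.png" for i in range(10_000)]   # Isaac Sim export
real = [f"real_{i:04d}.png" for i in range(1_000)]  # robot captures
dataset = build_hybrid_dataset(sim, real)
print(len(dataset))  # -> 11000 (~9% real, in line with the ~10% guideline)
```

Shuffling matters: interleaving real and synthetic samples within each batch keeps the model from treating the two sources as separate training phases.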
Check Your Understanding
1. What is the primary advantage of synthetic data over real data?
   - A) Higher image resolution
   - B) Automatic ground-truth labels without manual annotation
   - C) Better color accuracy
   - Answer: B
2. What does domain randomization prevent?
   - A) Overfitting to specific environments (e.g., one room layout)
   - B) GPU overheating during rendering
   - C) Physics simulation errors
   - Answer: A
3. Which Isaac Sim feature enables realistic shadows and reflections?
   - A) Material Definition Language (MDL)
   - B) RTX ray tracing
   - C) Replicator API
   - Answer: B
4. What is a typical sim-to-real accuracy gap for well-randomized datasets?
   - A) 50% (model completely fails on real data)
   - B) 5-15% (acceptable transfer with some fine-tuning)
   - C) 0% (perfect transfer)
   - Answer: B
5. Which file format does Isaac Sim use for depth maps?
   - A) `.png` (8-bit integer)
   - B) `.jpg` (compressed image)
   - C) `.exr` (32-bit float)
   - Answer: C
Key Takeaways
✅ Synthetic data solves labeling bottleneck: Generate 1,000 labeled images in minutes vs. weeks of manual work
✅ Photorealism requires ray tracing + materials: RTX GPUs enable realistic lighting and textures matching real cameras
✅ Domain randomization prevents overfitting: Train on diverse environments → Model generalizes to new scenes
✅ Replicator API automates data export: One script exports RGB, depth, segmentation, and bounding boxes simultaneously
✅ Sim-to-real gap is manageable: 5-15% accuracy drop is expected; fine-tune on small real dataset to close the gap