Maxence Boels

Artificial Intelligence Researcher

RC-Sim2Real (Work in Progress)

#ROBOTICS #SIM2REAL #VISION-LANGUAGE #ISAAC-LAB #AUTONOMOUS-NAVIGATION

Project Overview

RC-Sim2Real extends the QuantumTracer UGV project by implementing vision-language control through simulation-to-reality transfer. Instead of learning directly from real-world demonstrations, policies are trained entirely in photorealistic Isaac Lab simulations using 3D Gaussian Splatting reconstructions of real environments, then deployed to physical hardware.

The system accepts natural language navigation commands such as "Go to the kitchen" together with onboard camera vision; all training is conducted in NVIDIA Isaac Sim, and the resulting policy is deployed on a Jetson Orin Nano-powered RC car.

Technical Implementation

Phase 1: Environment Reconstruction

  • 3D Gaussian Splatting: Photorealistic reconstruction using Nerfstudio
  • Mesh Extraction: Converting splats to collision-ready meshes (see the sketch after this list)
  • USD Integration: Importing environments into Isaac Sim with proper physics materials
  • Lighting Matching: Preserving real-world lighting conditions for visual fidelity
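
A minimal sketch of the mesh-extraction step, assuming the Gaussian splat centers are exported from Nerfstudio as a point cloud; the `splat_points.ply` filename and the Poisson/decimation parameters are illustrative. Open3D reconstructs and simplifies the surface before it is imported into Isaac Sim as collision geometry:

```python
import numpy as np
import open3d as o3d

# Load splat centers exported as a point cloud (filename is hypothetical).
pcd = o3d.io.read_point_cloud("splat_points.ply")
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30)
)

# Poisson surface reconstruction, then drop low-confidence vertices and decimate
# so the result stays light enough to serve as a collision mesh in Isaac Sim.
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=9)
densities = np.asarray(densities)
mesh.remove_vertices_by_mask(densities < np.quantile(densities, 0.05))
mesh = mesh.simplify_quadric_decimation(target_number_of_triangles=200_000)

o3d.io.write_triangle_mesh("collision_mesh.obj", mesh)
```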

Phase 2: Robot Modeling

  • URDF Configuration: Accurate kinematic and dynamic model of RC car
  • Camera Calibration: Matching the simulated camera to the real hardware specs (see the sketch after this list)
  • Isaac Lab Integration: Custom robot asset with proper actuator configurations
  • Physics Validation: Tuning simulation parameters to match real-world behavior
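
One concrete piece of the camera-matching step, as a sketch: deriving the field of view for the simulated camera from the calibrated pinhole intrinsics of the real 640x480 camera. The focal-length values below are placeholders, not measured calibration results:

```python
import math

def fov_deg(focal_px: float, size_px: int) -> float:
    """Field of view implied by a pinhole focal length, in degrees."""
    return math.degrees(2.0 * math.atan(size_px / (2.0 * focal_px)))

# Placeholder intrinsics from an OpenCV checkerboard calibration of the real camera.
fx, fy = 525.0, 525.0
h_fov = fov_deg(fx, 640)   # horizontal FOV to apply to the simulated camera
v_fov = fov_deg(fy, 480)   # vertical FOV (or derive it from the aspect ratio)
print(f"horizontal FOV: {h_fov:.1f} deg, vertical FOV: {v_fov:.1f} deg")
```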

Phase 3: Vision-Language Architecture

  • Vision Encoder: Pretrained CLIP image encoder for robust visual understanding
  • Language Encoder: CLIP text encoder for natural language grounding
  • Policy Network: Fusion architecture mapping vision-language features to continuous control (sketched after this list)
  • Multi-modal Training: Joint training on image observations and text instructions
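
A sketch of the fusion policy under these assumptions: both CLIP ViT-B/32 encoders stay frozen, their 512-dimensional embeddings are concatenated, and a small MLP head (layer sizes are illustrative) regresses continuous steering and throttle:

```python
import torch
import torch.nn as nn
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

class VisionLanguagePolicy(nn.Module):
    """Frozen CLIP encoders fused by a small MLP head -> [steering, throttle]."""

    def __init__(self, device: str = "cuda"):
        super().__init__()
        self.clip_model, self.preprocess = clip.load("ViT-B/32", device=device)
        self.clip_model.requires_grad_(False)          # encoders stay frozen
        self.head = nn.Sequential(
            nn.Linear(512 + 512, 256), nn.ReLU(),
            nn.Linear(256, 2), nn.Tanh(),              # actions in [-1, 1]
        ).to(device)

    def forward(self, images: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            img = self.clip_model.encode_image(images).float()
            txt = self.clip_model.encode_text(tokens).float()
        return self.head(torch.cat([img, txt], dim=-1))

# Usage sketch:
# policy = VisionLanguagePolicy()
# tokens = clip.tokenize(["Go to the kitchen"]).to("cuda")
# action = policy(preprocessed_image_batch, tokens)    # (B, 2): steering, throttle
```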

Phase 4: Training Pipeline

  • Behavior Cloning: Initial policy from teleoperation demonstrations in simulation (see the sketch after this list)
  • PPO Fine-tuning: Reinforcement learning for performance optimization
  • Domain Randomization: Camera effects, lighting, physics, and texture variation
  • Curriculum Learning: Progressive instruction complexity during training
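
A minimal behavior-cloning update, assuming the simulated teleoperation demonstrations are batched into image tensors, CLIP token ids, and demonstrated actions (the batch keys are hypothetical); PPO fine-tuning and the randomization hooks would wrap around a loop like this:

```python
import torch
import torch.nn.functional as F

def bc_step(policy: torch.nn.Module, optimizer: torch.optim.Optimizer, batch: dict) -> float:
    """One behavior-cloning step on simulated teleoperation data.

    batch["images"]:  (B, 3, 224, 224) preprocessed frames
    batch["tokens"]:  (B, 77) CLIP token ids for the instruction
    batch["actions"]: (B, 2) demonstrated steering and throttle
    """
    pred = policy(batch["images"], batch["tokens"])
    loss = F.mse_loss(pred, batch["actions"])   # regress the demonstrated controls
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```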

Phase 5: Deployment Architecture

  • Model Optimization: PyTorch → ONNX → TensorRT for Jetson inference
  • Jetson Orin Nano: Edge AI inference with sub-50ms latency
  • MCU Control: Real-time PWM control via Arduino/ESP32
  • Communication Protocol: Low-latency UART commands for steering and throttle (see the sketch after this list)
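
A sketch of the UART side of the control link, assuming a simple fixed-size frame (header byte, signed steering and throttle bytes, additive checksum) that the Arduino/ESP32 decodes into PWM. The frame layout and serial device name are illustrative, not the project's actual protocol:

```python
import struct
import serial  # pyserial

HEADER = 0xAA  # illustrative start-of-frame byte

def send_command(port: serial.Serial, steering: float, throttle: float) -> None:
    """Pack policy outputs in [-1, 1] into a 4-byte frame and write it to the MCU."""
    s = int(max(-1.0, min(1.0, steering)) * 127)
    t = int(max(-1.0, min(1.0, throttle)) * 127)
    payload = struct.pack("<bb", s, t)
    checksum = (HEADER + sum(payload)) & 0xFF
    port.write(bytes([HEADER]) + payload + bytes([checksum]))

# port = serial.Serial("/dev/ttyTHS1", baudrate=115200, timeout=0.01)  # Jetson UART (example)
# send_command(port, steering=0.2, throttle=0.5)
```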

Target Performance

Simulation Benchmarks

  • Success Rate: Target >80% on diverse language instructions
  • Collision Rate: Target <10% during autonomous navigation
  • Instruction Following: Target >85% semantic alignment with commands
  • Training Efficiency: Convergence within 50-100 episodes

Real-World Deployment Goals

  • Sim2Real Success: Target >70% transfer rate to real hardware
  • Inference Latency: Target <100ms end-to-end (camera to motors)
  • Language Generalization: Understanding novel instruction variations
  • Safe Operation: Collision avoidance and emergency stop capabilities

Key Innovations

Photorealistic Sim2Real

3D Gaussian Splatting produces high-fidelity digital twins of real environments, reducing the visual sim-to-real gap compared to traditional hand-modeled 3D assets.

Vision-Language Integration

Pretrained CLIP encoders ground instructions in the visual scene and generalize zero-shot to novel phrasings, enabling flexible natural language control without task-specific fine-tuning of the encoders.

Domain Randomization

Comprehensive randomization of camera parameters, lighting conditions, physics properties, and floor textures builds policies that remain robust under real-world variability.
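
As an illustration of the camera-side randomization only (physics and texture randomization live in the Isaac Lab environment configuration), a sketch that jitters brightness, contrast, and sensor noise per image; the ranges are placeholders:

```python
import torch

def randomize_camera(images: torch.Tensor) -> torch.Tensor:
    """Per-image brightness/contrast jitter plus Gaussian sensor noise.

    images: (B, 3, H, W) in [0, 1]; the ranges below are illustrative.
    """
    b = images.shape[0]
    brightness = torch.empty(b, 1, 1, 1, device=images.device).uniform_(0.7, 1.3)
    contrast = torch.empty(b, 1, 1, 1, device=images.device).uniform_(0.8, 1.2)
    noise = 0.02 * torch.randn_like(images)
    mean = images.mean(dim=(2, 3), keepdim=True)
    return ((images - mean) * contrast + mean * brightness + noise).clamp(0.0, 1.0)
```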

Distributed Edge Inference

An optimized deployment pipeline applies TensorRT FP16 quantization for real-time inference on the Jetson Orin Nano, with modular MCU control providing hardware abstraction.
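
A sketch of the first hop of that pipeline, assuming the policy's forward pass traces cleanly to ONNX; the file names, input shapes, and opset are illustrative. The FP16 TensorRT engine is then built on the Jetson:

```python
import torch

def export_policy(policy: torch.nn.Module, path: str = "rc_policy.onnx") -> None:
    """Trace the trained policy to ONNX so an FP16 engine can be built on the Jetson."""
    policy.eval()
    dummy_images = torch.randn(1, 3, 224, 224, device="cuda")
    dummy_tokens = torch.zeros(1, 77, dtype=torch.long, device="cuda")
    torch.onnx.export(
        policy,
        (dummy_images, dummy_tokens),
        path,
        input_names=["images", "tokens"],
        output_names=["action"],
        opset_version=17,
    )

# Then, on the Jetson:
#   trtexec --onnx=rc_policy.onnx --fp16 --saveEngine=rc_policy.engine
```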

System Architecture

Hardware Stack

  • RC Platform: Modified FTX Tracer Truggy with custom electronics
  • Compute: NVIDIA Jetson Orin Nano (8GB) for AI inference
  • Vision: 640x480@30fps camera with calibrated intrinsics
  • Control MCU: Arduino/ESP32 for PWM motor control
  • Communication: UART serial for low-latency commands

Software Stack

  • Simulation: NVIDIA Isaac Lab 0.47.1 + Isaac Sim 5.0.0
  • Training: PyTorch 2.7.0 with CUDA 12.8 support
  • Vision-Language: OpenAI CLIP (ViT-B/32) encoders
  • Deployment: TensorRT 8.6 with FP16 optimization
  • Framework: Custom Isaac Lab environments and tasks

Related Projects

This project builds upon the QuantumTracer UGV project, exploring a complementary sim-to-real approach with vision-language control. Key differences include simulation-first training (Isaac Lab vs. real-world), vision+language input (vs. vision-only), domain randomization transfer method, and Jetson Orin Nano deployment (vs. Raspberry Pi 5).