Maxence Boels

Artificial Intelligence Researcher

RC-Sim2Real (Work in Progress)

#ROBOTICS #SIM2REAL #VISION-LANGUAGE #ISAAC-LAB #AUTONOMOUS-NAVIGATION

Project Overview

RC-Sim2Real extends the QuantumTracer UGV project by implementing vision-language control through simulation-to-reality transfer. Instead of learning directly from real-world demonstrations, policies are trained entirely in photorealistic Isaac Lab simulations using 3D Gaussian Splatting reconstructions of real environments, then deployed to physical hardware.

The system accepts natural language navigation commands such as "Go to the kitchen" together with onboard camera vision; all training is conducted in NVIDIA Isaac Sim, and the resulting policy is deployed on a Jetson Orin Nano-powered RC car.

Technical Implementation

Phase 1: Environment Reconstruction

  • 3D Gaussian Splatting: Photorealistic reconstruction using Nerfstudio
  • Mesh Extraction: Converting splats to collision-ready meshes (see the sketch after this list)
  • USD Integration: Importing environments into Isaac Sim with proper physics materials
  • Lighting Matching: Preserving real-world lighting conditions for visual fidelity
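
A minimal sketch of the mesh-extraction step, assuming the Gaussian splat centers are exported from Nerfstudio as a point cloud; the `splat_points.ply` filename and the Poisson/decimation parameters are illustrative. Open3D reconstructs and simplifies the surface before it is imported into Isaac Sim as collision geometry:

```python
import numpy as np
import open3d as o3d

# Load splat centers exported as a point cloud (filename is hypothetical).
pcd = o3d.io.read_point_cloud("splat_points.ply")
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30)
)

# Poisson surface reconstruction, then drop low-confidence vertices and decimate
# so the result stays light enough to serve as a collision mesh in Isaac Sim.
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=9)
densities = np.asarray(densities)
mesh.remove_vertices_by_mask(densities < np.quantile(densities, 0.05))
mesh = mesh.simplify_quadric_decimation(target_number_of_triangles=200_000)

o3d.io.write_triangle_mesh("collision_mesh.obj", mesh)
```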

Phase 2: Robot Modeling

  • URDF Configuration: Accurate kinematic and dynamic model of RC car
  • Camera Calibration: Matching the simulated camera to the real hardware specs (see the sketch after this list)
  • Isaac Lab Integration: Custom robot asset with proper actuator configurations
  • Physics Validation: Tuning simulation parameters to match real-world behavior
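
One concrete piece of the camera-matching step, as a sketch: deriving the field of view for the simulated camera from the calibrated pinhole intrinsics of the real 640x480 camera. The focal-length values below are placeholders, not measured calibration results:

```python
import math

def fov_deg(focal_px: float, size_px: int) -> float:
    """Field of view implied by a pinhole focal length, in degrees."""
    return math.degrees(2.0 * math.atan(size_px / (2.0 * focal_px)))

# Placeholder intrinsics from an OpenCV checkerboard calibration of the real camera.
fx, fy = 525.0, 525.0
h_fov = fov_deg(fx, 640)   # horizontal FOV to apply to the simulated camera
v_fov = fov_deg(fy, 480)   # vertical FOV (or derive it from the aspect ratio)
print(f"horizontal FOV: {h_fov:.1f} deg, vertical FOV: {v_fov:.1f} deg")
```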

Phase 3: Vision-Language Architecture

  • Vision Encoder: Pretrained CLIP image encoder for robust visual understanding
  • Language Encoder: CLIP text encoder for natural language grounding
  • Policy Network: Fusion architecture mapping vision-language features to continuous control (sketched after this list)
  • Multi-modal Training: Joint training on image observations and text instructions
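
A sketch of the fusion policy under these assumptions: both CLIP ViT-B/32 encoders stay frozen, their 512-dimensional embeddings are concatenated, and a small MLP head (layer sizes are illustrative) regresses continuous steering and throttle:

```python
import torch
import torch.nn as nn
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

class VisionLanguagePolicy(nn.Module):
    """Frozen CLIP encoders fused by a small MLP head -> [steering, throttle]."""

    def __init__(self, device: str = "cuda"):
        super().__init__()
        self.clip_model, self.preprocess = clip.load("ViT-B/32", device=device)
        self.clip_model.requires_grad_(False)          # encoders stay frozen
        self.head = nn.Sequential(
            nn.Linear(512 + 512, 256), nn.ReLU(),
            nn.Linear(256, 2), nn.Tanh(),              # actions in [-1, 1]
        ).to(device)

    def forward(self, images: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            img = self.clip_model.encode_image(images).float()
            txt = self.clip_model.encode_text(tokens).float()
        return self.head(torch.cat([img, txt], dim=-1))

# Usage sketch:
# policy = VisionLanguagePolicy()
# tokens = clip.tokenize(["Go to the kitchen"]).to("cuda")
# action = policy(preprocessed_image_batch, tokens)    # (B, 2): steering, throttle
```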

Phase 4: Training Pipeline

  • Behavior Cloning: Initial policy from teleoperation demonstrations in simulation (see the sketch after this list)
  • PPO Fine-tuning: Reinforcement learning for performance optimization
  • Domain Randomization: Camera effects, lighting, physics, and texture variation
  • Curriculum Learning: Progressive instruction complexity during training
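
A minimal behavior-cloning update, assuming the simulated teleoperation demonstrations are batched into image tensors, CLIP token ids, and demonstrated actions (the batch keys are hypothetical); PPO fine-tuning and the randomization hooks would wrap around a loop like this:

```python
import torch
import torch.nn.functional as F

def bc_step(policy: torch.nn.Module, optimizer: torch.optim.Optimizer, batch: dict) -> float:
    """One behavior-cloning step on simulated teleoperation data.

    batch["images"]:  (B, 3, 224, 224) preprocessed frames
    batch["tokens"]:  (B, 77) CLIP token ids for the instruction
    batch["actions"]: (B, 2) demonstrated steering and throttle
    """
    pred = policy(batch["images"], batch["tokens"])
    loss = F.mse_loss(pred, batch["actions"])   # regress the demonstrated controls
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```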

Phase 5: Deployment Architecture

  • Model Optimization: PyTorch → ONNX → TensorRT for Jetson inference
  • Jetson Orin Nano: Edge AI inference with sub-50ms latency
  • MCU Control: Real-time PWM control via Arduino/ESP32
  • Communication Protocol: Low-latency UART commands for steering and throttle (see the sketch after this list)
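
A sketch of the UART side of the control link, assuming a simple fixed-size frame (header byte, signed steering and throttle bytes, additive checksum) that the Arduino/ESP32 decodes into PWM. The frame layout and serial device name are illustrative, not the project's actual protocol:

```python
import struct
import serial  # pyserial

HEADER = 0xAA  # illustrative start-of-frame byte

def send_command(port: serial.Serial, steering: float, throttle: float) -> None:
    """Pack policy outputs in [-1, 1] into a 4-byte frame and write it to the MCU."""
    s = int(max(-1.0, min(1.0, steering)) * 127)
    t = int(max(-1.0, min(1.0, throttle)) * 127)
    payload = struct.pack("<bb", s, t)
    checksum = (HEADER + sum(payload)) & 0xFF
    port.write(bytes([HEADER]) + payload + bytes([checksum]))

# port = serial.Serial("/dev/ttyTHS1", baudrate=115200, timeout=0.01)  # Jetson UART (example)
# send_command(port, steering=0.2, throttle=0.5)
```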

Target Performance

Simulation Benchmarks

  • Success Rate: Target >80% on diverse language instructions
  • Collision Rate: Target <10% during autonomous navigation
  • Instruction Following: Target >85% semantic alignment with commands
  • Training Efficiency: Convergence within 50-100 episodes

Real-World Deployment Goals

  • Sim2Real Success: Target >70% transfer rate to real hardware
  • Inference Latency: Target <100ms end-to-end (camera to motors)
  • Language Generalization: Understanding novel instruction variations
  • Safe Operation: Collision avoidance and emergency stop capabilities

Key Innovations

Photorealistic Sim2Real

3D Gaussian Splatting produces high-fidelity digital twins of real environments, reducing the visual sim-to-real gap compared to traditional hand-modeled 3D assets.

Vision-Language Integration

Pretrained CLIP encoders ground instructions in the visual scene and generalize zero-shot to novel phrasings, enabling flexible natural language control without task-specific fine-tuning of the encoders.

Domain Randomization

Comprehensive randomization of camera parameters, lighting conditions, physics properties, and floor textures builds policies that remain robust under real-world variability.
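
As an illustration of the camera-side randomization only (physics and texture randomization live in the Isaac Lab environment configuration), a sketch that jitters brightness, contrast, and sensor noise per image; the ranges are placeholders:

```python
import torch

def randomize_camera(images: torch.Tensor) -> torch.Tensor:
    """Per-image brightness/contrast jitter plus Gaussian sensor noise.

    images: (B, 3, H, W) in [0, 1]; the ranges below are illustrative.
    """
    b = images.shape[0]
    brightness = torch.empty(b, 1, 1, 1, device=images.device).uniform_(0.7, 1.3)
    contrast = torch.empty(b, 1, 1, 1, device=images.device).uniform_(0.8, 1.2)
    noise = 0.02 * torch.randn_like(images)
    mean = images.mean(dim=(2, 3), keepdim=True)
    return ((images - mean) * contrast + mean * brightness + noise).clamp(0.0, 1.0)
```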

Distributed Edge Inference

An optimized deployment pipeline applies TensorRT FP16 quantization for real-time inference on the Jetson Orin Nano, with modular MCU control providing hardware abstraction.
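
A sketch of the first hop of that pipeline, assuming the policy's forward pass traces cleanly to ONNX; the file names, input shapes, and opset are illustrative. The FP16 TensorRT engine is then built on the Jetson:

```python
import torch

def export_policy(policy: torch.nn.Module, path: str = "rc_policy.onnx") -> None:
    """Trace the trained policy to ONNX so an FP16 engine can be built on the Jetson."""
    policy.eval()
    dummy_images = torch.randn(1, 3, 224, 224, device="cuda")
    dummy_tokens = torch.zeros(1, 77, dtype=torch.long, device="cuda")
    torch.onnx.export(
        policy,
        (dummy_images, dummy_tokens),
        path,
        input_names=["images", "tokens"],
        output_names=["action"],
        opset_version=17,
    )

# Then, on the Jetson:
#   trtexec --onnx=rc_policy.onnx --fp16 --saveEngine=rc_policy.engine
```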

System Architecture

Hardware Stack

  • RC Platform: Modified FTX Tracer Truggy with custom electronics
  • Compute: NVIDIA Jetson Orin Nano (8GB) for AI inference
  • Vision: 640x480@30fps camera with calibrated intrinsics
  • Control MCU: Arduino/ESP32 for PWM motor control
  • Communication: UART serial for low-latency commands

Software Stack

  • Simulation: NVIDIA Isaac Lab 0.47.1 + Isaac Sim 5.0.0
  • Training: PyTorch 2.7.0 with CUDA 12.8 support
  • Vision-Language: OpenAI CLIP (ViT-B/32) encoders
  • Deployment: TensorRT 8.6 with FP16 optimization
  • Framework: Custom Isaac Lab environments and tasks

Related Projects

This project builds upon the QuantumTracer UGV project, exploring a complementary sim-to-real approach with vision-language control. Key differences include simulation-first training (Isaac Lab vs. real-world), vision+language input (vs. vision-only), domain randomization transfer method, and Jetson Orin Nano deployment (vs. Raspberry Pi 5).