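Embodied Decision Intelligence Lab (EDI Lab), Tsinghua University

RLinf and NVIDIA Teams Open-Source Their Joint Work

The work targets medical device assembly and strongly validates RLinf's capabilities. For details, see https://github.com/isaac-for-healthcare/i4h-workflows/tree/main/workflows/rheo

The related documentation follows: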

🔬 Technical Overview

Rheo is a blueprint for smart hospital automation and Physical AI development, designed for healthcare robotics researchers and developers building intelligent, autonomous clinical environments, starting with the Operating Room (OR). Healthcare faces a structural demand–capacity crisis: a projected global shortfall of millions of clinicians, costly OR inefficiencies, and billions of diagnostic exams with significant unmet demand. The future hospital must be automation-enabled, where robotics extends clinician capacity, increases procedural throughput, and democratizes access to high-quality care. However, hospitals are heterogeneous, high-stakes environments—every facility has different layouts, workflows, equipment, and patient populations—making it economically and operationally infeasible to capture exhaustive real-world training data across every edge case. Rheo addresses this through simulation-first development, providing a complete pipeline from digital twin composition through demonstration capture, synthetic data generation, policy training, and pre-deployment validation, all built on NVIDIA Isaac Sim and Isaac Lab.

The workflow provides an end-to-end development pipeline for Physical AI in clinical settings:

🚀 Quick Start

Clone the repository

  git clone https://github.com/isaac-for-healthcare/i4h-workflows.git
  cd i4h-workflows

Prepare the models directory (default: $HOME/models) and install huggingface-hub

mkdir -p $HOME/models
pip install huggingface-hub

Run Individual Task Inference and Evaluation

Estimated setup duration: 30-40 minutes for cloning the repositories and building the Docker image.

Loco-Manipulation Task: Surgical Tray Pick and Place

# Download GR00T-N1.6-Rheo-PickNPlaceTray model
hf download nvidia/GR00T-N1.6-Rheo-PickNPlaceTray --local-dir $HOME/models/GR00T-N1.6-Rheo-PickNPlaceTray
# Use Gr00t N1.6 environment
./workflows/rheo/docker/run_docker.sh -g1.6 \
  python scripts/simulation/examples/policy_runner.py \
    --policy_type gr00t_closedloop \
    --policy_config_yaml_path scripts/config/g1_gr00t_closedloop_pick_and_place_config.yaml \
    --num_steps 15000 \
    --enable_cameras \
    --success_hold_steps 150 \
    g1_locomanip_tray_pick_and_place \
    --object surgical_tray \
    --embodiment g1_wbc_joint

Manipulation Task: Assemble Trocar

# Download GR00T-N1.5-RL-Rheo-AssembleTrocar model
hf download nvidia/GR00T-N1.5-RL-Rheo-AssembleTrocar --local-dir $HOME/models/GR00T-N1.5-RL-Rheo-AssembleTrocar
# Use Gr00t N1.5 environment
./workflows/rheo/docker/run_docker.sh -g1.5 \
  python -u scripts/simulation/examples/eval_assemble_trocar.py \
    --enable_cameras \
    --task Isaac-Assemble-Trocar-G129-Dex3-Joint \
    --model_path /models/GR00T-N1.5-RL-Rheo-AssembleTrocar \
    --rl_ckpt \
    --num_episodes 10 \
    --max_steps 500
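
The download step writes the model under $HOME/models, while --model_path refers to /models/GR00T-N1.5-RL-Rheo-AssembleTrocar. That suggests the container sees the host models directory at /models. The sketch below shows the path translation under that assumption; the mount point is inferred from the commands above rather than read from run_docker.sh, and the helper name and the /home/user home directory are hypothetical.

```python
from pathlib import PurePosixPath

# Assumption: run_docker.sh bind-mounts the host models directory at /models.
# This is inferred from the commands above, not taken from the script itself.
CONTAINER_MODELS_ROOT = PurePosixPath("/models")


def container_model_path(host_path: str, host_models_root: str) -> str:
    """Translate a host model path into the path expected inside the container.

    Raises ValueError if host_path is not located under host_models_root.
    """
    relative = PurePosixPath(host_path).relative_to(PurePosixPath(host_models_root))
    return str(CONTAINER_MODELS_ROOT / relative)


if __name__ == "__main__":
    # Hypothetical host layout with $HOME=/home/user.
    host_root = "/home/user/models"
    host_model = host_root + "/GR00T-N1.5-RL-Rheo-AssembleTrocar"
    print(container_model_path(host_model, host_root))
    # prints /models/GR00T-N1.5-RL-Rheo-AssembleTrocar
```

If a model is stored outside the models directory, the helper raises ValueError, which mirrors the likely failure mode of pointing --model_path at a location the container cannot see.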

🏠 Environment Requirements

System Prerequisites Validation

GPU Architecture Requirements

Driver & System Requirements

Software Dependencies

⚡ Running Workflows

Running Agent Workflow

Run Physical Agent

This is a minimal example to run the physical agent in Isaac Sim, in which the camera is streamed to the VLM Surgical Agent Framework via WebRTC.

./workflows/rheo/docker/run_docker.sh -g1.6 \
  python scripts/simulation/examples/triggered_policy_runner.py \
    --enable_cameras \
    --webrtc_cam \
    --webrtc_host 0.0.0.0 \
    --webrtc_port 8080 \
    --webrtc_fps 30 \
    --trigger_port 8081 \
    --trigger_host 0.0.0.0 \
    g1_locomanip_tray_pick_and_place \
    --object surgical_tray \
    --embodiment g1_wbc_joint

Or simply observe the surgical tray:

./workflows/rheo/docker/run_docker.sh -g1.6 \
  python scripts/simulation/examples/observe_runner.py \
    --num_steps 15000 \
    --enable_cameras \
    --webrtc_cam \
    --webrtc_host 0.0.0.0 \
    --webrtc_port 8080 \
    --webrtc_fps 30 \
    observe_object \
    --object surgical_tray_no_lid \
    --embodiment g1_wbc_pink

Run the Vision Language Model (VLM) Agent

./tools/env_setup/install_vlm_surgical_agent_fx.sh

Open a web browser and navigate to http://127.0.0.1:8050 to see the camera stream. Click the “livestream” button to connect to the WebRTC server at http://localhost:8080, then click the “play” button to start the video stream.

You can interact with the VLM agent by chatting with it, or use the “Start Mic” button to speak to it.

The agent responds to your messages in real time and suggests actions based on the camera stream.

To reset the environment, switch to the Isaac Sim window and press the “R” key. After the reset, disconnect and reconnect the WebRTC server to reset the monitoring systems.
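
If the browser cannot connect, it may help to check that the three endpoints used in this section are listening. The following is a minimal sketch using only the Python standard library. The port numbers come from the commands and URLs in this section; treating the trigger port as a TCP listener is an assumption, since its protocol is not described here.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Ports taken from this section; the labels are descriptive only.
ENDPOINTS = {
    "VLM agent web UI": ("127.0.0.1", 8050),
    "WebRTC camera stream": ("127.0.0.1", 8080),
    "policy trigger (assumed TCP)": ("127.0.0.1", 8081),
}

if __name__ == "__main__":
    for name, (host, port) in ENDPOINTS.items():
        state = "reachable" if port_is_open(host, port) else "not reachable"
        print(f"{name} ({host}:{port}): {state}")
```

A reachable port only shows that something is listening; it does not confirm that the stream itself is healthy.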

Individual Task Inference and Evaluation

Assemble Trocar

# Download GR00T-N1.5-RL-Rheo-AssembleTrocar model
hf download nvidia/GR00T-N1.5-RL-Rheo-AssembleTrocar --local-dir $HOME/models/GR00T-N1.5-RL-Rheo-AssembleTrocar
# Start Gr00t N1.5 container and run Assemble Trocar evaluation
./workflows/rheo/docker/run_docker.sh -g1.5 \
  python -u scripts/simulation/examples/eval_assemble_trocar.py \
    --enable_cameras \
    --task Isaac-Assemble-Trocar-G129-Dex3-Joint \
    --model_path /models/GR00T-N1.5-RL-Rheo-AssembleTrocar \
    --rl_ckpt \
    --num_episodes 10 \
    --max_steps 500

Notes:

Surgical Tray Pick and Place

# Download GR00T-N1.6-Rheo-PickNPlaceTray model
hf download nvidia/GR00T-N1.6-Rheo-PickNPlaceTray --local-dir $HOME/models/GR00T-N1.6-Rheo-PickNPlaceTray
# Use Gr00t N1.6 environment
./workflows/rheo/docker/run_docker.sh -g1.6 \
  python scripts/simulation/examples/policy_runner.py \
    --policy_type gr00t_closedloop \
    --policy_config_yaml_path scripts/config/g1_gr00t_closedloop_pick_and_place_config.yaml \
    --num_steps 15000 \
    --enable_cameras \
    --success_hold_steps 150 \
    g1_locomanip_tray_pick_and_place \
    --object surgical_tray \
    --embodiment g1_wbc_joint

Surgical Case Cart Pushing

# Download GR00T-N1.6-Rheo-Sim-PushCart model
hf download nvidia/GR00T-N1.6-Rheo-Sim-PushCart --local-dir $HOME/models/GR00T-N1.6-Rheo-Sim-PushCart
# Use Gr00t N1.6 environment
./workflows/rheo/docker/run_docker.sh -g1.6 \
  python scripts/simulation/examples/policy_runner.py \
    --policy_type gr00t_closedloop \
    --policy_config_yaml_path scripts/config/g1_gr00t_closedloop_push_cart_config.yaml \
    --num_steps 20000 \
    --enable_cameras \
    --success_hold_steps 45 \
    g1_locomanip_push_cart \
    --object cart \
    --embodiment g1_wbc_joint
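
The two policy_runner.py invocations above differ only in the config file, step budget, success hold length, environment name, and object. The sketch below collects those values in one table and builds the argument list from it, which can make batch evaluation less error-prone. All argument values are copied from the commands above; the dictionary keys and the function name are hypothetical.

```python
import shlex

# Per-task arguments copied from the commands above.
# The keys ("tray_pick_and_place", "push_cart") are hypothetical labels.
TASKS = {
    "tray_pick_and_place": {
        "config": "scripts/config/g1_gr00t_closedloop_pick_and_place_config.yaml",
        "num_steps": 15000,
        "success_hold_steps": 150,
        "environment": "g1_locomanip_tray_pick_and_place",
        "object": "surgical_tray",
    },
    "push_cart": {
        "config": "scripts/config/g1_gr00t_closedloop_push_cart_config.yaml",
        "num_steps": 20000,
        "success_hold_steps": 45,
        "environment": "g1_locomanip_push_cart",
        "object": "cart",
    },
}


def policy_runner_argv(task_name: str) -> list[str]:
    """Build the argument list for one closed-loop evaluation run."""
    task = TASKS[task_name]
    return [
        "./workflows/rheo/docker/run_docker.sh", "-g1.6",
        "python", "scripts/simulation/examples/policy_runner.py",
        "--policy_type", "gr00t_closedloop",
        "--policy_config_yaml_path", task["config"],
        "--num_steps", str(task["num_steps"]),
        "--enable_cameras",
        "--success_hold_steps", str(task["success_hold_steps"]),
        task["environment"],
        "--object", task["object"],
        "--embodiment", "g1_wbc_joint",
    ]


if __name__ == "__main__":
    # Print each command; pass the list to subprocess.run to execute it.
    for name in TASKS:
        print(shlex.join(policy_runner_argv(name)))
```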

Notes:

Create Your Own Tasks/Datasets/Models

You can create your own tasks/datasets/models by following the instructions below.

Tasks Setup

One of the core features of the Rheo blueprint is the rapid composition of new environments and tasks using the IsaacLab-Arena Concepts. Refer to g1_locomanip_tray_pick_and_place_environment.py for an example of how to define a locomotion-manipulation task—specifically, having the Unitree G1 robot pick up a surgical tray and place it onto a cart within a pre-operative room scene.

For precise, multi-stage bimanual manipulation such as Assemble Trocar, Rheo uses a focused Isaac Lab track where the OR twin is defined explicitly as a scene configuration: robot, cameras, USD scene, objects, and lighting. It follows the Isaac Lab convention and uses this task package structure:

Data Collection

Locomanipulation tasks (Surgical Tray Pick and Place, Surgical Case Cart Pushing) support both keyboard and XR teleoperation. The trocar assembly task supports XR teleoperation only. For XR teleoperation, first follow the documentation to set up and connect a Meta Quest headset.

Locomanipulation Tasks

Keyboard teleoperation:

./workflows/rheo/docker/run_docker.sh -g1.6 \
  python scripts/simulation/record_demos_locomanip.py \
  --dataset_file /datasets/demo.hdf5 \
  --num_demos 1 \
  --num_success_steps 50 \
  --step_hz 50 \
  --pos_sensitivity 0.01 \
  --vel_sensitivity 0.2 \
  --enable_cameras \
  --mimic \
  g1_locomanip_tray_pick_and_place \
  --object surgical_tray \
  --embodiment g1_wbc_pink

Optionally, you can replay keyboard teleoperation demos:

./workflows/rheo/docker/run_docker.sh -g1.6 \
  python scripts/simulation/replay_demos.py \
  --dataset_file /datasets/demo.hdf5 \
  --enable_cameras \
  g1_locomanip_tray_pick_and_place \
  --object surgical_tray \
  --embodiment g1_wbc_pink

XR teleoperation (Meta Quest):

./workflows/rheo/docker/run_docker.sh -g1.5 \
  python scripts/simulation/record_demos_locomanip.py \
  --dataset_file /datasets/tray_xr_demo.hdf5 \
  --num_demos 1 \
  --num_success_steps 50 \
  --enable_pinocchio \
  --enable_cameras \
  --xr \
  --teleop_device motion_controllers \
  g1_locomanip_tray_pick_and_place \
  --object surgical_tray \
  --embodiment g1_wbc_pink

Trocar Assembly Task (XR only)

./workflows/rheo/docker/run_docker.sh -g1.5 \
  python scripts/simulation/record_demos_assemble_trocar.py \
  --task Isaac-Assemble-Trocar-G129-Dex3-Teleop \
  --teleop_device motion_controllers \
  --enable_pinocchio \
  --enable_cameras \
  --num_demos 1 \
  --xr

Synthetic Data Generation

Mimic Gen

NOTE: Currently, synthetic data generation with Isaac Lab Mimic/SkillGen and Cosmos Transfer 2.5 is supported only for the locomanipulation tasks.

First, you need to annotate the demos with the following command. This process requires a definition of the subtasks in the task package.

./workflows/rheo/docker/run_docker.sh -g1.6 \
  python scripts/simulation/annotate_demos.py \
  --input_file /datasets/demo.hdf5 \
  --output_file /datasets/demo_annotated.hdf5 \
  --enable_cameras \
  --mimic \
  g1_locomanip_tray_pick_and_place \
  --object surgical_tray \
  --embodiment g1_wbc_pink

Then generate the synthetic data with Mimic Gen using the following command:

# generate 10 successful demos
./workflows/rheo/docker/run_docker.sh -g1.6 \
  python scripts/simulation/generate_dataset.py \
  --enable_cameras \
  --mimic \
  --num_steps 150 \
  --headless \
  --input_file /datasets/demo_annotated.hdf5 \
  --output_file /datasets/demo_generated.hdf5 \
  --generation_num_trials 10 \
  g1_locomanip_tray_pick_and_place \
  --object surgical_tray \
  --embodiment g1_wbc_pink

If you want to merge multiple generated demos into a single dataset, you can use the following command:

./workflows/rheo/docker/run_docker.sh -g1.6 \
  python scripts/simulation/merge_demos.py \
  --input /datasets/demo_annotated*.hdf5 \
  --output /datasets/demo_merged.hdf5
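
The --input argument above is a shell-style wildcard pattern. The sketch below uses Python's fnmatch module to show which of the file names produced earlier in this section the pattern would select. Whether the shell or merge_demos.py expands the pattern is not stated here, so this only illustrates the matching rule.

```python
from fnmatch import fnmatchcase

# Basename of the --input pattern used in the merge command above.
PATTERN = "demo_annotated*.hdf5"

# File names written by the earlier steps in this section.
candidates = [
    "demo.hdf5",
    "demo_annotated.hdf5",
    "demo_generated.hdf5",
    "demo_merged.hdf5",
]

# Keep only the names that the wildcard pattern matches.
matched = [name for name in candidates if fnmatchcase(name, PATTERN)]
print(matched)  # prints ['demo_annotated.hdf5']
```

Under these rules the pattern selects only files whose names begin with demo_annotated. If the goal is to merge outputs of generate_dataset.py, a pattern such as /datasets/demo_generated*.hdf5 may be what is needed.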

Cross-Scene Generalization Benchmark with Cosmos Transfer 2.5

We also used Cosmos Transfer 2.5 combined with guided generation to augment the training data and improve model generalization across diverse environments. See the Cosmos Transfer 2.5 Tutorial for detailed instructions. We evaluated both the base model and the Cosmos-augmented model on the Surgical Tray Pick and Place task across four distinct scenes, with around 200 evaluation episodes per scene.

Benchmark Results (Success Rate):

Model                   Scene 1  Scene 2  Scene 3  Scene 4
Base Model              0.64     0.31     0.00     0.00
Cosmos Augmented Model  0.60     0.49     0.37     0.30
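
To put the table in context, the sketch below computes the mean success rate per model and a rough binomial standard error for each per-scene difference. The success rates are copied from the table; the episode count of exactly 200 per scene is an assumption, since the text says only "around 200", and the independence of episodes is also assumed.

```python
import math

# Success rates copied from the table above (Scene 1 to Scene 4).
BASE = [0.64, 0.31, 0.00, 0.00]
COSMOS = [0.60, 0.49, 0.37, 0.30]
EPISODES_PER_SCENE = 200  # assumption: the text says "around 200"


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def standard_error(p, n):
    """Binomial standard error of a success rate p measured over n episodes."""
    return math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    print(f"mean base:   {mean(BASE):.4f}")
    print(f"mean cosmos: {mean(COSMOS):.4f}")
    for scene, (b, c) in enumerate(zip(BASE, COSMOS), start=1):
        # Standard error of the difference between two independent rates.
        se_diff = math.hypot(
            standard_error(b, EPISODES_PER_SCENE),
            standard_error(c, EPISODES_PER_SCENE),
        )
        print(f"scene {scene}: delta = {c - b:+.2f}, SE of delta ~ {se_diff:.3f}")
```

Under these assumptions the mean success rate rises from about 0.24 to 0.44. The Scene 1 difference (-0.04) is smaller than its standard error of roughly 0.05, while the gains in Scenes 2 to 4 are several standard errors in size.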

Scene Descriptions:

Fine-Tuning and Reinforcement Learning

Please check the following fine-tuning and reinforcement learning recipes for detailed instructions:

After fine-tuning or reinforcement learning, you can evaluate the policy's success rate by following the Individual Task Inference and Evaluation section.

🛠 Troubleshooting


📚 Attribution and Citation

SimReady Assets

The SimReady Assets in this workflow are powered by Lightwheel. Please refer to the Isaac for Healthcare Asset Catalog for details.

RL Training Framework

The RL training framework is powered by RLinf. If you find the RL capabilities helpful, please cite:

@article{yu2025rlinf,
  title={RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flexibility},
  author={Yu, Chao and Wang, Yuanqing and Guo, Zhen and Lin, Hao and Xu, Si and Zang, Hongzhi},
  journal={arXiv preprint arXiv:2509.15965},
  year={2025}
}

@article{chen2025pi_,
  title={$\pi_{\texttt{RL}}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models},
  author={Chen, Kang and Liu, Zhihao and Zhang, Tonghe and Guo, Zhen and Xu, Si and Lin, Hao and Zang, Hongzhi and Zhang, Quanlu and Yu, Zhaofei and Fan, Guoliang and others},
  journal={arXiv preprint arXiv:2510.25889},
  year={2025}
}