Overview


We present a first-of-its-kind synthetic multimodal, multi-task scene understanding dataset of healthcare facilities. The dataset is composed of RGB-D images captured under varying illumination conditions and across a wide variety of scenes, including 13 different room types such as operating rooms and consultation rooms, with different types of medical equipment including surgical robots and surgical tools. We provide ground truth annotations for five tasks: object detection, semantic segmentation, instance segmentation, panoptic segmentation, and monocular depth estimation. Syn-Mediverse contains over 1.5M annotations spanning 31 distinct semantic classes.

Syn-Mediverse Generation

Camera view

The Syn-Mediverse dataset was generated using NVIDIA Isaac Sim, a leading-edge simulation platform designed for robotics and AI research. The simulator provides direct access to ground truth labels for various computer vision tasks, along with rich metadata about the simulated scene. We captured the dataset using a multi-camera setup modeled on industry-standard surgical navigation cameras such as the NDI Polaris Vega and the Stryker FP8000. The setup comprises three cameras (left, center, and right) arranged such that the distance between the left and right cameras is approximately 422 mm, with both outer cameras oriented at an angle of 9.5 degrees to the center camera.
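
To make the stated geometry concrete, the following minimal Python sketch constructs extrinsics for such a rig. It assumes the center camera at the origin looking down +Z, offsets along the X axis, and a symmetric inward toe-in; these conventions are illustrative assumptions, not specifications from the dataset release.

import numpy as np

def toe_in_rig(baseline_mm=422.0, toe_in_deg=9.5):
    """Illustrative extrinsics for the three-camera setup described above.

    Assumed conventions (not taken from the dataset release): the center
    camera sits at the origin looking down +Z, the outer cameras are offset
    along X by half the baseline, and each is yawed inward by the toe-in
    angle about the vertical (Y) axis.
    """
    half = baseline_mm / 2000.0          # half-baseline in meters
    yaw = np.deg2rad(toe_in_deg)

    def pose(x_offset, yaw_sign):
        # Rotation about Y by +/- the toe-in angle.
        c, s = np.cos(yaw_sign * yaw), np.sin(yaw_sign * yaw)
        R = np.array([[c, 0.0, s],
                      [0.0, 1.0, 0.0],
                      [-s, 0.0, c]])
        t = np.array([x_offset, 0.0, 0.0])
        return R, t

    return {
        "left":   pose(-half, +1),       # yawed inward toward the center axis
        "center": (np.eye(3), np.zeros(3)),
        "right":  pose(+half, -1),
    }

rig = toe_in_rig()
print(rig["left"][1], rig["right"][1])   # camera centers at +/-0.211 m on X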

Room Examples

Syn-Mediverse includes 13 distinct medical rooms that cater to a diverse range of scenarios, including operating theatres, consultation rooms, dental clinics, inpatient wards, and radiology labs. Each room in the dataset offers diverse variations, showcasing different interior designs, objects, lighting conditions, and medical staff, aiming to capture a wide spectrum of realistic scenarios.

Semantic Classes

Syn-Mediverse comprises 31 distinct semantic classes, capturing the intricate details of medical environments. While every class is annotated at the semantic level, instance-level annotations are provided for 27 of them, allowing for more granular studies. These annotations encompass a range of objects and scenarios, from medical equipment to healthcare personnel, ensuring a holistic representation of healthcare settings.
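
As an illustration of how this thing/stuff split can be consumed, the sketch below partitions a class table by an instance-annotation flag. The class names, IDs, and the has_instances field are hypothetical placeholders; the authoritative mapping ships with label_helper.py in the download section below.

from dataclasses import dataclass

@dataclass(frozen=True)
class SemClass:
    name: str
    train_id: int
    has_instances: bool  # True for the instance-annotated ("thing") classes

# Hypothetical entries for illustration only; the real mapping is defined
# in label_helper.py.
CLASSES = [
    SemClass("floor",          0, False),
    SemClass("surgical_robot", 1, True),
    SemClass("surgical_tool",  2, True),
    SemClass("hospital_bed",   3, True),
]

thing_ids = {c.train_id for c in CLASSES if c.has_instances}
stuff_ids = {c.train_id for c in CLASSES if not c.has_instances}
print(sorted(thing_ids), sorted(stuff_ids))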

Dataset Download & Benchmarks

Follow these steps to download the dataset:

Step 1: Download the script by clicking this link or by running the command below in your terminal:
wget https://syn-mediverse.cs.uni-freiburg.de/datasets/download_syn-mediverse.sh
Step 2: Run it to fetch all parts of the dataset:
bash download_syn-mediverse.sh /your/target/path
Alternatively, add flags such as --img, --depth, or --label to selectively download components:
bash download_syn-mediverse.sh /your/target/path --img --label

You can also:
  • Download label_helper.py – This script provides a complete mapping of semantic classes used in the dataset, including label IDs, training IDs, categories, evaluation masks, and RGB colors.
  • Download Dataset in COCO Format – This download includes annotations formatted for COCO-style object detection and segmentation; a minimal loading sketch follows this list. The class mappings align with the training and evaluation categories defined in label_helper.py. It contains:
    • COCO-style annotations for thing-only objects
    • stuffthingmaps – semantic segmentation maps with class indices from 0 to 20, and 255 marking void areas
    • train, val, and test folders containing the corresponding images
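
Below is a minimal Python loading sketch using pycocotools and PIL. The annotation file name, directory layout, and the .jpg-to-.png name mapping for the semantic maps are assumptions for illustration, not taken from the release.

import numpy as np
from PIL import Image
from pycocotools.coco import COCO

# All paths and file names below are assumptions; adjust them to match
# the layout of the unpacked archives.
coco = COCO("annotations/instances_train.json")   # thing-only annotations

img_info = coco.loadImgs(coco.getImgIds()[:1])[0]
ann_ids = coco.getAnnIds(imgIds=img_info["id"])
anns = coco.loadAnns(ann_ids)
print(img_info["file_name"], "->", len(anns), "thing instances")

# stuffthingmaps: per-pixel class indices 0-20, with 255 marking void.
sem_path = "stuffthingmaps/train/" + img_info["file_name"].replace(".jpg", ".png")
sem = np.array(Image.open(sem_path))
valid = sem != 255                       # drop void pixels before counting
classes, counts = np.unique(sem[valid], return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))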

Explore Benchmarks: Click on any image or button below to view detailed benchmark pages for each task, including leaderboard results and evaluation settings.

Videos

Publications

Rohit Mohan, José Arce, Sassan Mokhtar, Daniele Cattaneo, and Abhinav Valada
"Syn-Mediverse: A Multimodal Synthetic Dataset for Intelligent Scene Understanding of Healthcare Facilities"
IEEE Robotics and Automation Letters (RA-L), 2024.

(PDF) (BibTeX)