We present a first-of-its-kind synthetic, multimodal, multitask scene understanding dataset of healthcare facilities. The dataset comprises RGB-D images captured under varying illumination conditions across a wide variety of scenes, including 13 different rooms such as operating rooms and consultation rooms, with different types of medical equipment including surgical robots and surgical tools. We provide ground truth annotations for five tasks: object detection, semantic segmentation, instance segmentation, panoptic segmentation, and monocular depth estimation. Syn-Mediverse contains over 1.5M annotations spanning 31 distinct semantic classes.

Syn-Mediverse Generation

Camera view

The Syn-Mediverse dataset was generated using the NVIDIA Isaac Sim simulator, a simulation platform designed for robotics and AI research. The simulator provides direct access to ground truth labels for various computer vision tasks and exposes additional information about the simulated scene. We captured the dataset using a multi-camera setup modeled on industry-standard cameras for surgical navigation, such as the NDI Polaris Vega and the Stryker FP8000. The setup comprises three cameras (left, center, and right) arranged such that the distance between the left and right cameras is approximately 422 mm. The left and right cameras are each oriented to form an angle of 9.5 degrees with the center camera.
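The rig geometry described above can be written down as a short Python sketch. It is only an illustration under stated assumptions: the text gives the 422 mm baseline and the 9.5 degree angle, while the placement of the center camera at the midpoint, the inward (toed-in) rotation of the side cameras, and the names `CameraPose`, `build_rig`, and `convergence_distance_mm` are hypothetical and not taken from the dataset tooling.

```python
import math
from dataclasses import dataclass

BASELINE_MM = 422.0  # left-to-right camera distance, from the text
TOE_IN_DEG = 9.5     # angle between each side camera and the center camera, from the text


@dataclass(frozen=True)
class CameraPose:
    name: str
    x_mm: float     # lateral offset from the center camera
    yaw_deg: float  # rotation about the vertical axis; positive turns toward +x


def build_rig(baseline_mm=BASELINE_MM, toe_in_deg=TOE_IN_DEG):
    """Return the three camera poses, assuming a symmetric, toed-in layout."""
    half = baseline_mm / 2.0
    return [
        CameraPose("left", -half, +toe_in_deg),   # assumed: rotated inward
        CameraPose("center", 0.0, 0.0),           # assumed: at the midpoint
        CameraPose("right", +half, -toe_in_deg),  # assumed: rotated inward
    ]


def convergence_distance_mm(baseline_mm=BASELINE_MM, toe_in_deg=TOE_IN_DEG):
    """Distance in front of the center camera where the side optical axes cross."""
    return (baseline_mm / 2.0) / math.tan(math.radians(toe_in_deg))


if __name__ == "__main__":
    for pose in build_rig():
        print(pose)
    print(f"convergence distance: {convergence_distance_mm():.0f} mm")
```

Under these assumptions the optical axes of the side cameras would cross roughly 1.26 m in front of the center camera. If the side cameras were rotated outward instead, the axes would diverge and the views would overlap less.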

Syn-Mediverse Room Examples

Syn-Mediverse includes 13 distinct medical rooms, which cater to a diverse range of scenarios, including operating theatres, consultation rooms, dental clinics, inpatient wards, and radiology labs. Each room offers multiple variations with different interior designs, objects, lighting conditions, and medical staff, aiming to capture a wide spectrum of realistic scenarios.

Semantic Classes

Syn-Mediverse covers 31 distinct semantic classes that capture the fine details of medical environments. All 31 classes carry semantic annotations, and instance-level annotations are provided for 27 of them, allowing for more granular studies. These annotations encompass a range of objects and scenarios, from medical equipment to healthcare participants, giving broad coverage of healthcare settings.
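To illustrate how semantic and instance annotations relate in a panoptic label, the sketch below merges a per-pixel semantic map and an instance map into a single panoptic map. It is a minimal example under assumptions: the `semantic_id * 1000 + instance_id` encoding is a common convention in panoptic segmentation tooling and is not confirmed as the Syn-Mediverse file format, and the class IDs and the names `to_panoptic` and `split_panoptic` are hypothetical.

```python
INSTANCE_DIVISOR = 1000  # assumed encoding factor, not taken from the dataset


def to_panoptic(semantic, instance, thing_ids, divisor=INSTANCE_DIVISOR):
    """Merge per-pixel semantic and instance maps into one panoptic map.

    Classes in `thing_ids` keep their instance ID; all other classes are
    treated as having no instances and get instance ID 0.
    """
    panoptic = []
    for sem_row, inst_row in zip(semantic, instance):
        row = []
        for sem_id, inst_id in zip(sem_row, inst_row):
            if sem_id in thing_ids:
                row.append(sem_id * divisor + inst_id)
            else:
                row.append(sem_id * divisor)
        panoptic.append(row)
    return panoptic


def split_panoptic(value, divisor=INSTANCE_DIVISOR):
    """Recover (semantic_id, instance_id) from one panoptic value."""
    return divmod(value, divisor)


if __name__ == "__main__":
    # Hypothetical 2x2 image: class 0 has no instances, class 5 does.
    semantic = [[0, 5], [5, 5]]
    instance = [[7, 1], [1, 2]]
    print(to_panoptic(semantic, instance, thing_ids={5}))
```

In this scheme the 27 classes with instance-level annotations would be passed as `thing_ids`, while the remaining classes would share instance ID 0.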

Benchmarking and Dataset Download

Explore each specific benchmarking task page to access the dataset download links, learn more about the specific challenges of each task, and gain insights into our evaluation methodologies and criteria.



Rohit Mohan, José Arce, Sassan Mokhtar, Dr. Daniele Cattaneo, and Abhinav Valada
"Syn-Mediverse: A Multimodal Synthetic Dataset for Intelligent Scene Understanding of Healthcare Facilities"
arXiv preprint arXiv:2308.03193, 2023.
