Multimodal Dataset

We built the DIML multi-modal benchmark to cover both photometric and geometric variations. All images were captured with a Sony Cyber-shot DSC-RX100 camera in a darkroom equipped with a GretagMacbeth SpectraLight III lighting booth. For the geometric deformations, we captured 10 geometry image sets by combining variations in viewpoint, scale, and rotation. Each image set contains images taken under 5 different photometric variation pairs: illumination, exposure, flash-noflash, blur, and noise. With two images per pair, the DIML multi-modal benchmark therefore consists of 100 images (10 sets x 5 pairs x 2 images), each of size 1200 x 800. We also manually built ground-truth object annotation maps so that performance can be evaluated quantitatively.
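The image count can be checked by enumerating the benchmark's structure. The following is a minimal sketch, not the dataset's actual file layout: the identifiers (`NUM_GEOMETRY_SETS`, `enumerate_images`, the tuple layout) are hypothetical, and it assumes that each photometric variation pair contributes exactly two images, which is what the stated total of 100 implies.

```python
from itertools import product

# Values taken from the description above.
NUM_GEOMETRY_SETS = 10
PHOTOMETRIC_PAIRS = ["illumination", "exposure", "flash-noflash", "blur", "noise"]

# Assumption: a "pair" means two images (e.g. flash and no-flash).
IMAGES_PER_PAIR = 2


def enumerate_images():
    """Return one (geometry_set, variation, index_in_pair) tuple per image."""
    return [
        (geometry_set, variation, index_in_pair)
        for geometry_set, variation, index_in_pair in product(
            range(1, NUM_GEOMETRY_SETS + 1),
            PHOTOMETRIC_PAIRS,
            range(IMAGES_PER_PAIR),
        )
    ]


if __name__ == "__main__":
    images = enumerate_images()
    print(len(images))  # 10 sets x 5 pairs x 2 images
```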


RGB-D Dataset

We introduce an RGB-D scene dataset consisting of more than 200 indoor and outdoor scenes. The dataset contains synchronized RGB-D frames from both a Kinect v2 and a ZED stereo camera. For the outdoor scenes, we first generate disparity maps using an accurate stereo matching method and then convert them to depth using the calibration parameters. A per-pixel confidence map of the disparity is also provided. Our scenes were captured at various places, e.g., offices, rooms, a dormitory, an exhibition center, streets, and roads, at Yonsei University and Chungnam National University.
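For a rectified stereo pair, disparity is commonly converted to metric depth with the pinhole relation depth = focal length x baseline / disparity. The sketch below illustrates that relation under stated assumptions: it is not the dataset's own conversion code, the function name `disparity_to_depth` and its parameters are hypothetical, and the focal length and baseline in the usage example are placeholder values rather than the dataset's calibration.

```python
from typing import List, Optional


def disparity_to_depth(
    disparity: List[List[float]],
    focal_px: float,
    baseline_m: float,
    min_disparity: float = 1e-6,
) -> List[List[Optional[float]]]:
    """Convert a disparity map (in pixels) to a depth map (in meters).

    Assumes a rectified stereo pair, with the focal length given in pixels
    and the baseline in meters. Pixels whose disparity is not above
    `min_disparity` have no valid depth and are returned as None.
    """
    depth_map = []
    for row in disparity:
        depth_row = [
            (focal_px * baseline_m / d) if d > min_disparity else None
            for d in row
        ]
        depth_map.append(depth_row)
    return depth_map


if __name__ == "__main__":
    # Placeholder calibration values, for illustration only.
    example_disparity = [[42.0, 0.0], [21.0, 84.0]]
    depth = disparity_to_depth(example_disparity, focal_px=700.0, baseline_m=0.12)
    for row in depth:
        print(row)
```

Because depth is inversely proportional to disparity, small disparity errors at long range produce large depth errors, which is one reason a per-pixel confidence map is useful when filtering the converted depth.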


