Digital Image Media Lab

Dataset

Multimodal Dataset

We built the DIML multi-modal benchmark to cover both photometric and geometric variations. All images were captured with a Sony Cyber-shot DSC-RX100 camera in a darkroom equipped with a GretagMacbeth SpectraLight III lighting booth. For the geometric deformations, we captured 10 geometry image sets that combine variations in viewpoint, scale, and rotation. Each image set contains images taken under 5 photometric variation pairs: illumination, exposure, flash-noflash, blur, and noise. In total, the DIML multi-modal benchmark consists of 100 images (10 sets x 5 pairs x 2 images per pair), each of size 1200 x 800. We also manually built ground-truth object annotation maps for quantitative evaluation.
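The image count above can be checked by enumerating the combinations. This is only a sketch: the identifier format and the `a`/`b` pair members are hypothetical, and it assumes each photometric variation pair contributes exactly two images per geometry set.

```python
from itertools import product

GEOMETRY_SETS = range(1, 11)  # 10 geometry image sets
PHOTOMETRIC_PAIRS = ["illumination", "exposure", "flash-noflash", "blur", "noise"]
PAIR_MEMBERS = ("a", "b")  # assumption: two images per photometric pair


def enumerate_images():
    """Return one hypothetical identifier per image in the benchmark."""
    return [
        f"set{g:02d}_{pair}_{member}"
        for g, pair, member in product(GEOMETRY_SETS, PHOTOMETRIC_PAIRS, PAIR_MEMBERS)
    ]


images = enumerate_images()
print(len(images))  # 10 * 5 * 2 = 100
```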

RGB-D Dataset

We introduce an RGB-D scene dataset consisting of more than 200 indoor and outdoor scenes. The dataset contains synchronized RGB-D frames from both a Kinect v2 and a ZED stereo camera. For the outdoor scenes, we first generate disparity maps using an accurate stereo matching method and then convert them to depth using the calibration parameters. A per-pixel confidence map of the disparity is also provided. The scenes were captured at various places, e.g., offices, rooms, dormitories, exhibition centers, streets, and roads, at Yonsei University and Chungnam National University.
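The disparity conversion mentioned above is not spelled out here. A minimal sketch, assuming the target is metric depth and the usual rectified-stereo relation Z = f * B / d holds (f is the focal length in pixels, B the stereo baseline in meters, d the disparity in pixels). The function names and the example values are illustrative, not the dataset's actual calibration.

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Convert one disparity value (pixels) to depth (meters).

    Returns None for non-positive disparity, where depth is undefined.
    """
    if disparity_px <= 0:
        return None
    return focal_length_px * baseline_m / disparity_px


def disparity_map_to_depth(disparity_map, focal_length_px, baseline_m):
    """Apply the conversion to a 2-D list of disparities, row by row."""
    return [
        [disparity_to_depth(d, focal_length_px, baseline_m) for d in row]
        for row in disparity_map
    ]


# Illustrative values only (assumed, not taken from the dataset).
depth = disparity_map_to_depth([[40.0, 80.0], [0.0, 20.0]], 800.0, 0.5)
print(depth)
```

Pixels with zero or negative disparity are mapped to `None` rather than to a large number, so that invalid matches stay distinguishable from far-away points.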

Lane Dataset

We built the DIML lane detection benchmark dataset, which consists of 470 video sequences. All data were acquired with the OV10630 image sensor, a high-dynamic-range system-on-chip sensor that delivers a resolution of 1280 x 800 at 15 frames per second. We collected the data over a period of two months on downtown and urban roads in South Korea. Because the database covers diverse driving environments, including traffic jams, pedestrians, and obstacles, it can be used to develop and evaluate lane detection algorithms and other vision-based ADAS algorithms.

Emotion Recognition Dataset (CAER)

We introduce the CAER benchmark, which contains more than 13,000 annotated videos. The videos are annotated with an extended list of 7 emotion categories. You can use the CAER benchmark to train deep convolutional neural networks for emotion recognition.

CoVieW'18 Dataset

We built the Multi-task Action and Scene Recognition Dataset, which consists of untrimmed videos sampled from the YouTube-8M dataset, with action and scene class labels annotated per video. It contains about 90,000 YouTube video URLs (we will provide a feature for each video), split into 84,853 training, 3,000 validation, and 3,000 testing videos. There are 285 action class labels and 29 scene class labels. A video can carry an action class label, a scene class label, or both.
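The labeling scheme above can be modeled as a record with two optional labels. This is a sketch under stated assumptions, not the dataset's actual file format: the `VideoLabel` class, the zero-based class indices, the limit of at most one action and one scene label per video, and the placeholder URL are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

NUM_ACTION_CLASSES = 285
NUM_SCENE_CLASSES = 29
SPLIT_SIZES = {"train": 84853, "val": 3000, "test": 3000}


@dataclass
class VideoLabel:
    """One video with an action label, a scene label, or both."""

    url: str
    action: Optional[int] = None  # assumed zero-based index, None if absent
    scene: Optional[int] = None   # assumed zero-based index, None if absent

    def __post_init__(self):
        if self.action is None and self.scene is None:
            raise ValueError("a video needs an action label, a scene label, or both")
        if self.action is not None and not 0 <= self.action < NUM_ACTION_CLASSES:
            raise ValueError("action index out of range")
        if self.scene is not None and not 0 <= self.scene < NUM_SCENE_CLASSES:
            raise ValueError("scene index out of range")


# Placeholder URL, for illustration only.
sample = VideoLabel(url="https://example.com/video", action=12, scene=3)
print(sample)
print(sum(SPLIT_SIZES.values()))  # total number of videos across the three splits
```

Keeping the two labels as separate optional fields lets a multi-task model skip the loss term for whichever label a video lacks.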