CVPR 2019

Deep Defocus Map Estimation using Domain Adaptation

¹POSTECH ²Sungkyunkwan University ³DGIST
[Figure: overall framework]

We present DMENet (Defocus Map Estimation Network), the first end-to-end CNN framework that directly estimates a defocus map from a defocused image.


In this paper, we propose the first end-to-end convolutional neural network (CNN) architecture, Defocus Map Estimation Network (DMENet), for spatially varying defocus map estimation. To train the network, we produce a novel depth-of-field (DOF) dataset, SYNDOF, in which each image is synthetically blurred using a ground-truth depth map. Because SYNDOF is synthetic, the feature characteristics of its images can differ from those of real defocused photos. To bridge this gap, we use domain adaptation to transfer the features of real defocused photos into those of synthetically blurred ones. DMENet consists of four subnetworks: blur estimation, domain adaptation, content preservation, and sharpness calibration networks. The subnetworks are connected to one another and trained jointly in an end-to-end manner, each with its corresponding supervision. We evaluate our method on publicly available blur detection and blur estimation datasets, where it achieves state-of-the-art performance.
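One way to picture the joint end-to-end training is as a single objective made up of one term per supervision. The sketch below assumes a plain weighted sum. The loss names and the weights \(\lambda\) are hypothetical labels used only for illustration, and the paper's exact formulation may differ. In a standard adversarial setup, \(D\) would additionally be updated with its own discriminator loss, alternating with the updates of the other networks.

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{B}
  + \lambda_{\text{adv}} \, \mathcal{L}_{\text{adv}}
  + \lambda_{C} \, \mathcal{L}_{C}
  + \lambda_{S} \, \mathcal{L}_{S}
```

Here \(\mathcal{L}_{B}\) stands for the defocus-map regression loss on SYNDOF, \(\mathcal{L}_{\text{adv}}\) for the adversarial term provided by \(D\), \(\mathcal{L}_{C}\) for the VGG-based content preservation loss, and \(\mathcal{L}_{S}\) for the sharpness calibration loss.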

Synthetic Depth of Field (SYNDOF) Dataset

To enable end-to-end learning of defocus map estimation, a high-quality dataset is crucial. However, currently available datasets [29, 4] are insufficient: [29] is designed for blur detection rather than blur estimation, and [4] is small. We therefore generate a defocus-blur dataset, which we call the SYNDOF dataset. Because it would be almost impossible, even manually, to produce ground-truth defocus maps for real defocused photos, we instead use pinhole image datasets, in which each image is accompanied by a depth map, to synthesize defocused images together with their ground-truth defocus maps.
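The following sketch shows, in rough terms, how a depth map can yield a ground-truth defocus map. It assumes a standard thin-lens model, where the circle-of-confusion (CoC) diameter grows with a point's distance from the focal plane. The function names, the camera parameters, and `pixel_pitch` are hypothetical. The parameters and blur kernels actually used to build SYNDOF may differ, and the step that blurs the image itself is not shown.

```python
def coc_diameter(depth, focus_dist, focal_len, aperture):
    """Thin-lens circle-of-confusion diameter (same length unit as the inputs).

    depth:      distance of the scene point from the lens (> focal_len)
    focus_dist: distance of the in-focus plane (> focal_len)
    focal_len:  focal length of the lens
    aperture:   aperture diameter (focal_len / f-number)
    """
    return aperture * focal_len * abs(depth - focus_dist) / (depth * (focus_dist - focal_len))


def defocus_map(depth_map, focus_dist, focal_len, aperture, pixel_pitch):
    """Convert a 2D depth map (nested lists) into a per-pixel CoC map in pixels.

    pixel_pitch is the hypothetical physical size of one sensor pixel; dividing
    by it expresses each CoC diameter in pixels instead of physical units.
    """
    return [
        [coc_diameter(d, focus_dist, focal_len, aperture) / pixel_pitch for d in row]
        for row in depth_map
    ]


# Example: a 50 mm lens at f/8, focused at 1 m, with 5-micrometre pixels.
depths = [[1.0, 2.0], [4.0, 0.5]]  # metres
coc_px = defocus_map(depths, focus_dist=1.0, focal_len=0.05,
                     aperture=0.05 / 8, pixel_pitch=5e-6)
```

Pixels lying on the focal plane get a CoC of zero. Points farther in front of or behind that plane get progressively larger values, and these values act as the per-pixel blur amounts that a defocus map records.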


Network Overview

Blur Estimation Network \(B\)

Network \(B\) is the main component of our DMENet. It is supervised with the synthetic ground-truth defocus maps from the SYNDOF dataset to predict the per-pixel blur amount of an input image.
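Below is a minimal sketch of a per-pixel regression loss that could supervise \(B\). It assumes mean squared error between the predicted and ground-truth SYNDOF defocus maps. Both the choice of loss and the name `mse_loss` are illustrative assumptions. A real implementation would operate on framework tensors, not nested lists.

```python
def mse_loss(pred, target):
    """Mean squared error between two equally sized 2D maps (nested lists)."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("pred and target must have the same shape")
    diffs = [(p - t) ** 2 for prow, trow in zip(pred, target) for p, t in zip(prow, trow)]
    return sum(diffs) / len(diffs)


# Example: predicted blur amounts versus a ground-truth defocus map.
loss = mse_loss([[0.0, 1.0], [2.0, 3.0]], [[0.0, 0.0], [0.0, 0.0]])
```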

Domain Adaptation Network \(D\)

Network \(D\) minimizes domain differences between synthetic and real features and enables the blur estimation network to measure the blur amounts on real defocused images.
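The adversarial objective described above can be sketched using standard GAN-style binary cross-entropy terms. The sketch assumes that \(D\) outputs the probability that a feature comes from the synthetic domain. Under that assumption, \(D\) is trained to separate the two domains, while \(B\) is trained so that its features from real photos get classified as synthetic. This labeling convention and the function names are assumptions made for illustration, not necessarily the paper's exact formulation.

```python
import math

EPS = 1e-12  # clamps probabilities away from 0 and 1 so log() stays finite


def bce(prob, label):
    """Binary cross-entropy for one predicted probability and a 0/1 label."""
    prob = min(max(prob, EPS), 1.0 - EPS)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def discriminator_loss(d_on_synthetic, d_on_real):
    """D learns to output 1 for synthetic-domain features and 0 for real-domain ones."""
    terms = [bce(p, 1) for p in d_on_synthetic] + [bce(p, 0) for p in d_on_real]
    return sum(terms) / len(terms)


def adaptation_loss(d_on_real):
    """B is updated so that D labels its real-domain features as synthetic (1)."""
    return sum(bce(p, 1) for p in d_on_real) / len(d_on_real)
```

In training, the two losses would normally be minimized in alternation. The discriminator loss updates only \(D\), and the adaptation loss updates only the feature layers of \(B\).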

Content Preservation Network \(C\)

Network \(C\) supplements the blur estimation network to avoid blurry output. We reduce blurriness in the predicted defocus map by minimizing a content preservation loss computed with a pretrained VGG network.
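A content preservation loss of this kind compares feature representations rather than raw pixels. The sketch below measures the squared distance between features of the prediction and features of the ground truth. Running a pretrained VGG network is out of scope for a self-contained example, so `toy_gradient_features` is a hypothetical stand-in feature extractor, and `content_loss` is likewise an illustrative name. Any callable returning a flat feature list can be plugged in.

```python
def content_loss(pred, target, feature_fn):
    """Mean squared distance between features of the predicted and ground-truth maps.

    feature_fn stands in for a pretrained VGG layer: any callable that maps a
    2D map (nested lists) to a flat list of floats.
    """
    fp, ft = feature_fn(pred), feature_fn(target)
    return sum((a - b) ** 2 for a, b in zip(fp, ft)) / len(fp)


def toy_gradient_features(img):
    """Hypothetical stand-in for VGG features: horizontal intensity differences.

    It is sensitive to edge strength, so a prediction whose transitions are
    smoothed out is penalized even when its overall intensities look similar.
    """
    return [row[i + 1] - row[i] for row in img for i in range(len(row) - 1)]


sharp = [[0.0, 0.0, 1.0, 1.0]]    # ground truth with a crisp transition
blurry = [[0.0, 0.33, 0.67, 1.0]]  # prediction with the transition smeared out
```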

Sharpness Calibration Network \(S\)

Network \(S\) allows real domain features to induce correct sharpness in a defocus map by informing the blur estimation network whether the given real domain feature corresponds to a sharp or blurred pixel.
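Below is a minimal sketch of the supervision signal for \(S\). It treats sharpness calibration as per-pixel binary classification on real images, assuming binary sharp/blurred labels such as those provided by a blur detection dataset. The label convention (1 = sharp) and the function name are hypothetical.

```python
import math


def sharpness_calibration_loss(sharp_prob, blur_mask, eps=1e-12):
    """Mean per-pixel binary cross-entropy.

    sharp_prob: 2D map of predicted probabilities that each pixel is sharp
    blur_mask:  2D binary map, 1 = sharp pixel, 0 = blurred pixel
    """
    total, count = 0.0, 0
    for prow, mrow in zip(sharp_prob, blur_mask):
        for p, m in zip(prow, mrow):
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            total += -(m * math.log(p) + (1 - m) * math.log(1.0 - p))
            count += 1
    return total / count
```

A confident, correct classification (sharp pixels near 1, blurred pixels near 0) yields a small loss. Swapping the two predictions yields a large one. That gradient is what would push the real-domain features used by \(B\) toward the correct sharpness.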


Ablation Study


Quantitative Comparison


Qualitative Comparison




    author    = {Junyong Lee and Sungkil Lee and Sunghyun Cho and Seungyong Lee},
    title     = {Deep Defocus Map Estimation Using Domain Adaptation},
    booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2019}