Pyramidal Layered Scene Inference with Image Outpainting for Monocular View Synthesis

Abstract

Generating novel views from a single input image is a challenging task that requires predicting occluded and otherwise non-visible content. Nevertheless, it is an active area of research due to its many applications, such as in entertainment. In this work, we propose an end-to-end architecture for monocular view synthesis based on the layered scene inference (LSI) method. LSI uses layered depth images (LDIs), which can represent complex scenes with a small number of layers. To improve the LSI predictions, we develop two new strategies: (i) a pyramidal architecture that learns LDI predictions at different resolutions of the input and (ii) an image outpainting strategy that fills in missing information at the LDI borders. We evaluate our method on the KITTI dataset and show that the proposed variants outperform the baseline.
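The pyramidal strategy (i) can be illustrated with a minimal sketch: the input image is decomposed into a multi-resolution pyramid, and an LDI (per-layer texture and depth) is predicted at each level. The `predict_ldi` placeholder below stands in for the trained network, which is not part of this sketch; the pooling-based pyramid is an assumption for illustration only.

```python
import numpy as np

def build_pyramid(image, num_levels=3):
    """Build a simple image pyramid by 2x average-pooling at each level."""
    levels = [image]
    for _ in range(num_levels - 1):
        h, w = levels[-1].shape[:2]
        # Crop to even dimensions, then average 2x2 blocks.
        cropped = levels[-1][:h - h % 2, :w - w % 2]
        pooled = cropped.reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))
        levels.append(pooled)
    return levels

def predict_ldi(image, num_layers=2):
    """Placeholder LDI predictor: returns per-layer textures and depths.
    A real system would use a trained CNN here."""
    h, w = image.shape[:2]
    textures = np.repeat(image[None], num_layers, axis=0)          # (L, h, w, 3)
    depths = np.ones((num_layers, h, w)) * \
        np.arange(1, num_layers + 1)[:, None, None]                # (L, h, w)
    return textures, depths

# Predict one LDI per pyramid level.
image = np.random.rand(64, 64, 3)
pyramid = build_pyramid(image, num_levels=3)
predictions = [predict_ldi(level) for level in pyramid]
```

The coarser levels let the model reason about large-scale scene structure, while the finer levels refine detail; the per-level predictions would then be merged into a final LDI.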

Publication
Computer Analysis of Images and Patterns