A single glimpse is hardly enough to triangulate the 3D shapes of a scene. However, training examples are readily available, so statistical models can be trained to map appearance to shape. The details matter, because 3D shapes have different representations and can have many degrees of freedom, and training data is rarely as clean as we’d wish.
I will present two separate learning-based methods for shape reconstruction, developed by my team at UCL. In the first, we propose an algorithm that can complete the unobserved geometry of tabletop-sized objects from a single depth image. This approach is based on a supervised model trained on already available volumetric elements. In the second, the input is just an RGB image rather than a depth image, and from it we predict a depth image. This CNN-based method exploits epipolar geometry constraints to learn depth prediction from binocular pairs, which overcomes the absence of good ground-truth depth data. The two systems are not joined, because there is still more exciting work to be done!
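To make the second idea concrete, here is a minimal sketch of how a binocular pair can stand in for ground-truth depth. It is not the network or loss from the talk. It assumes a rectified stereo pair of grayscale images, so each epipolar line is a horizontal scanline, and a per-pixel disparity map that a model would predict for the left view. The function names and the parameters `focal_px` and `baseline_m` are hypothetical, chosen for illustration only.

```python
import numpy as np


def warp_right_to_left(right, disparity):
    """Reconstruct the left view by sampling the right image along scanlines.

    Assumption: the pair is rectified, so the match for left pixel (x, y)
    lies on row y of the right image, at column x - disparity.
    """
    h, w = disparity.shape
    # Sub-pixel sample positions in the right image, clamped to the image.
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity
    xs = np.clip(xs, 0.0, w - 1.0)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    # Linear interpolation between the two nearest columns.
    return (1.0 - frac) * right[rows, x0] + frac * right[rows, x1]


def photometric_loss(left, right, disparity):
    """Mean absolute error between the left image and its reconstruction.

    A better disparity map gives a better reconstruction, so this value can
    act as a training signal without any measured depth.
    """
    return float(np.mean(np.abs(left - warp_right_to_left(right, disparity))))


def disparity_to_depth(disparity, focal_px, baseline_m):
    """Rectified-stereo relation: depth = focal length * baseline / disparity."""
    return focal_px * baseline_m / np.maximum(disparity, 1e-6)
```

In a training loop, `disparity` would be the output of the CNN given only the left image, and the loss would be written in a differentiable framework so that gradients flow back through the warp. The right image is needed only during training; at test time a single RGB image is enough.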
See more about this video at www.microsoft.com/en-us/research/video/predicting-3d-volume-depth-single-view/