Chapter 32 — 3-D Vision for Navigation and Grasping

Danica Kragic and Kostas Daniilidis

In this chapter, we describe algorithms for three-dimensional (3-D) vision that help robots accomplish navigation and grasping. To model cameras, we start with the basics of perspective projection and distortion due to lenses. This projection from a 3-D world to a two-dimensional (2-D) image can be inverted only by using information from the world or multiple 2-D views. If we know the 3-D model of an object or the location of 3-D landmarks, we can solve the pose estimation problem from one view. When two views are available, we can compute the 3-D motion and triangulate to reconstruct the world up to a scale factor. When multiple views are given, either as sparse viewpoints or as continuous incoming video, the robot path can be computed and point tracks can yield a sparse 3-D representation of the world. In order to grasp objects, we can estimate the 3-D pose of the end effector or the 3-D coordinates of the graspable points on the object.
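
As a concrete illustration of the pinhole model just described, the following Python/NumPy sketch projects 3-D world points into a 2-D image; the function name, the example intrinsic matrix K, and the camera pose (R, t) are our own illustrative choices, not values from the chapter.

```python
import numpy as np

def project(points_3d, K, R, t):
    """Project Nx3 world points into the image via x ~ K [R | t] X."""
    X_cam = points_3d @ R.T + t        # world -> camera coordinates
    x = X_cam @ K.T                    # apply the intrinsic matrix
    return x[:, :2] / x[:, 2:3]        # perspective division

# Illustrative intrinsics: 500 px focal length, principal point (320, 240)
K = np.array([[500.,   0., 320.],
              [  0., 500., 240.],
              [  0.,   0.,   1.]])
R, t = np.eye(3), np.zeros(3)          # camera at the world origin
print(project(np.array([[0.1, -0.2, 2.0]]), K, R, t))  # -> [[345. 190.]]
```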

Google's Project Tango

Author  Google, Inc.

Video ID : 120

Google's Project Tango has been collaborating with robotics laboratories from around the world to synthesize the past decade of research in computer vision into the development of a new class of mobile devices. This video contains one of the first public announcements and presentations of a device that can be used for multiple robot-perception applications described in this chapter.

Finding paths through the world's photos

Author  Noah Snavely, Rahul Garg, Steven M. Seitz, Richard Szeliski

Video ID : 121

When a scene is photographed many times by different people, the viewpoints often cluster along certain paths. These paths are largely specific to the scene being photographed and follow interesting patterns and viewpoints. We seek to discover a range of such paths and turn them into controls for image-based rendering. Our approach takes as input a large set of community or personal photos, reconstructs camera viewpoints, and automatically computes orbits, panoramas, canonical views, and optimal paths between views. The scene can then be interactively browsed in 3-D using these controls or with six-DOF free-viewpoint control. As the user browses the scene, nearby views are continuously selected and transformed, using control-adaptive reprojection techniques.
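
The paper's controls are considerably more elaborate, but the underlying idea of view selection can be sketched in a few lines of Python: given the reconstructed camera centers, pick the photo taken closest to the viewpoint the user requests. Everything below is a toy stand-in, not the authors' method.

```python
import numpy as np

def nearest_view(centers, target):
    """Index of the reconstructed camera whose optical center lies
    closest to the requested 3-D viewpoint (toy view selection)."""
    return int(np.argmin(np.linalg.norm(centers - target, axis=1)))

# Three reconstructed camera centers along a hypothetical path
centers = np.array([[0., 0., 0.], [1., 0., 0.], [2., 0.5, 0.]])
print(nearest_view(centers, np.array([1.2, 0.1, 0.])))  # -> 1
```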

LIBVISO: Visual odometry for intelligent vehicles

Author  Andreas Geiger

Video ID : 122

This video demonstrates the performance of a visual-odometry algorithm on the vehicle Annieway (a VW Passat). Visual odometry is the estimation of a video camera's 3-D motion and orientation; in this case it is based purely on stereo vision. The blue trajectory is the motion estimated by visual odometry, and the red trajectory is the ground truth provided by a high-precision OXTS RT3000 GPS+IMU system. The software is available from http://www.cvlibs.net/
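
A full stereo visual-odometry pipeline matches features across frames and refines the motion estimate iteratively; as a simplified sketch of one core step, the Python code below recovers the rigid motion between two sets of matched 3-D points (e.g., triangulated from stereo) with the closed-form Kabsch/SVD method. All names and the toy test are our own, not part of LIBVISO.

```python
import numpy as np

def rigid_transform(P, Q):
    """Closed-form least-squares R, t such that Q ≈ P @ R.T + t,
    for Nx3 arrays of matched 3-D points (Kabsch/SVD method)."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1., 1., np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                   # guard against reflections
    t = cQ - R @ cP
    return R, t

# Toy check: recover a known 10-degree yaw and 0.5 m forward motion
a = np.radians(10.)
R_true = np.array([[np.cos(a), -np.sin(a), 0.],
                   [np.sin(a),  np.cos(a), 0.],
                   [0., 0., 1.]])
P = np.random.rand(20, 3)                    # points seen in frame k
Q = P @ R_true.T + np.array([0., 0., 0.5])   # same points in frame k+1
R_est, t_est = rigid_transform(P, Q)
assert np.allclose(R_est, R_true) and np.allclose(t_est, [0., 0., 0.5])
```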

Parallel tracking and mapping for small AR workspaces (PTAM)

Author  Georg Klein, David Murray

Video ID : 123

Video results for an augmented-reality tracking system. The system tracks a hand-held camera and builds a map of the environment in real time; this map can then be used to overlay virtual graphics. Presented at the ISMAR 2007 conference.
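
The key architectural idea, splitting fast per-frame tracking from slower keyframe-based mapping, can be caricatured with two Python threads; everything below is a schematic stand-in for PTAM's actual pose estimation and bundle adjustment.

```python
import threading, queue

keyframes = queue.Queue()
map_lock = threading.Lock()
world_map = {"points": []}

def mapping_loop():
    # Slow background thread: integrate new keyframes into the map
    # (a stand-in for PTAM's keyframe-based bundle adjustment).
    while True:
        kf = keyframes.get()
        if kf is None:                   # shutdown signal
            break
        with map_lock:
            world_map["points"].append(kf)

def track(frame):
    # Fast foreground loop: estimate the camera pose against the map
    # (stand-in), and occasionally promote a frame to a keyframe.
    with map_lock:
        _ = len(world_map["points"])
    if frame % 10 == 0:
        keyframes.put(frame)

worker = threading.Thread(target=mapping_loop, daemon=True)
worker.start()
for frame in range(50):
    track(frame)
keyframes.put(None)
worker.join()
print(f"map built from {len(world_map['points'])} keyframes")
```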

DTAM: Dense tracking and mapping in real-time

Author  Richard A. Newcombe, Steven J. Lovegrove, Andrew J. Davison

Video ID : 124

This video demonstrates the system described in the paper "DTAM: Dense Tracking and Mapping in Real-Time" by Richard Newcombe, Steven Lovegrove, and Andrew Davison, presented at ICCV 2011.

3-D models from 2-D video - automatically

Author  Marc Pollefeys

Video ID : 125

We show how a video is automatically converted into a 3-D model using computer-vision techniques. More details on this approach can be found in: M. Pollefeys, L. Van Gool, M. Vergauwen, F. Verbiest, K. Cornelis, J. Tops, R. Koch: Visual modeling with a hand-held camera, Int. J. Comp. Vis. 59(3), 207-232 (2004).
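
Such structure-from-motion pipelines rest on the standard linear (direct linear transform, DLT) triangulation step, sketched below in Python; the projection matrices and the test point are illustrative assumptions, not values from the paper.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation: recover a 3-D point from its pixel
    projections x1, x2 under two 3x4 projection matrices P1, P2."""
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    X = np.linalg.svd(A)[2][-1]          # null vector of A
    return X[:3] / X[3]                  # dehomogenize

# Illustrative two-camera setup with a 0.2 m baseline
K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.], [0.]])])
X_true = np.array([0.3, -0.1, 2.0, 1.0])
x1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
x2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]
print(triangulate(P1, P2, x1, x2))       # -> approx. [0.3, -0.1, 2.0]
```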