
Duo: Towards a Wearable System that Learns about Everyday Objects and Actions

Charles C. Kemp, MIT CSAIL

● Goal: help machines learn an important form of commonsense in pursuit of better AI
● Wearable first-person video and motion capture
● Segmentation-based perception
● Online and offline methods for processing and annotation

The Platform

● Hardware Infrastructure
  – Four InterSense gyros provide absolute orientation information (they use acceleration, gravity, and the earth's magnetic field), which is sufficient to model the kinematic chain from the camera to the hand
  – 8+ hours of battery life and hard disk space
  – FireWire camera, standard laptop
● Software Infrastructure
  – Python with C++ and SWIG
    ● interpreted high-level code for fast development
    ● C++/C for speed when necessary
  – GNU/Linux, Debian testing
  – SQLite 3.0 (see the sketch after this list)
  – Many libre libraries (twisted, numarray, opencv, glade, ...)
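As one concrete illustration of how this stack fits together, here is a minimal sketch of storing and querying action-segment records with Python's standard sqlite3 module. The file name, table, and column names are hypothetical, not Duo's actual schema.

    import sqlite3

    conn = sqlite3.connect("duo.db")  # hypothetical database file name
    conn.execute(
        """CREATE TABLE IF NOT EXISTS action_segments (
               id INTEGER PRIMARY KEY,
               start_frame INTEGER,  -- first video frame of the segment
               end_frame INTEGER,    -- last video frame of the segment
               label TEXT            -- human annotation; NULL until labeled
           )"""
    )
    conn.execute(
        "INSERT INTO action_segments (start_frame, end_frame, label) VALUES (?, ?, ?)",
        (0, 119, "hand reaching"),
    )
    conn.commit()

    # List segments that still need a human label in the annotation tool.
    for seg_id, start, end in conn.execute(
        "SELECT id, start_frame, end_frame FROM action_segments WHERE label IS NULL"
    ):
        print(seg_id, start, end)
    conn.close()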

In our initial application, the human and the wearable cooperated to acquire segmentations of the objects with which the person interacted (see below; presented at Humanoids 2003). That approach relied on active illumination and cooperation; we are now working on more general methods, which we describe in this poster.

The kinematic segmentation algorithm breaks actions at local minima of multi-scale smoothed hand speed estimates. These segments can serve as the units of search for detection, recognition, and learning. They can also be used to summarize scenes for annotation and browsing, such as with this summarized 120-frame sequence.
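The sketch below shows one plausible Python implementation of this idea: smooth the hand-speed signal at several scales, combine the estimates, and return the frame indices of local minima. The scales, the averaging step, and the function name are assumptions; the poster does not specify them.

    import numpy as np
    from scipy.ndimage import gaussian_filter1d

    def segment_boundaries(hand_speed, scales=(2.0, 4.0, 8.0)):
        """Frame indices where multi-scale smoothed hand speed has a local minimum."""
        speed = np.asarray(hand_speed, dtype=float)
        # Average Gaussian smoothings at several scales; a simple stand-in
        # for the poster's (unspecified) multi-scale speed estimate.
        smoothed = np.mean([gaussian_filter1d(speed, s) for s in scales], axis=0)
        # Interior local minima: below the left neighbor, at or below the right.
        interior = np.flatnonzero(
            (smoothed[1:-1] < smoothed[:-2]) & (smoothed[1:-1] <= smoothed[2:])
        ) + 1
        return interior.tolist()

Consecutive boundary pairs then delimit candidate action segments, e.g. segments = list(zip(boundaries[:-1], boundaries[1:])).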

The visual segmentation system initializes visual segments around the kinematically estimated hand location, tracks the segments, and filters them.
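Below is a hedged sketch of this initialize/track/filter loop using OpenCV's CamShift tracker, one plausible choice; the poster does not name the actual tracking or filtering method, and the window size and collapse threshold here are illustrative.

    import cv2

    def track_hand_segment(frames, hand_xy, win=60):
        """Initialize a color model around the estimated hand location,
        track it with CamShift, and filter out collapsed tracks."""
        x0 = max(int(hand_xy[0]) - win // 2, 0)
        y0 = max(int(hand_xy[1]) - win // 2, 0)
        window = (x0, y0, win, win)
        # Hue histogram of the initial region serves as the appearance model.
        hsv = cv2.cvtColor(frames[0], cv2.COLOR_BGR2HSV)
        roi = hsv[y0:y0 + win, x0:x0 + win]
        hist = cv2.calcHist([roi], [0], None, [16], [0, 180])
        cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
        term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1.0)
        track = []
        for frame in frames[1:]:
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            backproj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
            box, window = cv2.CamShift(backproj, window, term)
            track.append(box)  # box is ((cx, cy), (w, h), angle)
        # Simple filter: drop boxes whose tracked region has collapsed.
        return [b for b in track if b[1][0] > 4 and b[1][1] > 4]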

Clustering the hand positions at the detected transitions between action segments (local minima in hand speed) results in clusters that correspond with significant hand states: hand at rest, hand reaching, hand carrying.
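A minimal sketch of this clustering step follows, using k-means from scipy; the poster does not name the clustering algorithm, and k=3 simply mirrors the three example states above.

    import numpy as np
    from scipy.cluster.vq import kmeans2

    def cluster_transition_positions(hand_positions, boundary_frames, k=3):
        """Cluster hand positions sampled at the detected segment boundaries."""
        pts = np.asarray([hand_positions[f] for f in boundary_frames], dtype=float)
        # k=3 mirrors the three states shown above; in practice the number
        # of significant hand states would come from the data.
        centroids, labels = kmeans2(pts, k, minit="++")
        return centroids, labels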

Detect and track known visual segments

Annotation Software

● for visual segments
● for browsing the database
● for action segments