Robust Real-time Object Detection

Paul Viola Michael Jones

SECOND INTERNATIONAL WORKSHOP ON STATISTICAL AND COMPUTATIONAL THEORIES OF VISION – MODELING, LEARNING, COMPUTING AND SAMPLING. VANCOUVER, CANADA, JULY 13, 2001.

Student: Lourdes Ramírez Cerna.

Introduction

Face recognition has become an active area of research that spans disciplines such as image processing, pattern recognition, computer vision, and neural networks.

The first step in a face recognition system is face detection. Given an image or video, a face detector must be able to identify and locate all faces regardless of their position, scale, age, orientation, and lighting conditions.

The Problem

There are hundreds of detection methods in the literature, but many of them do not run in real time. The method proposed by Viola and Jones was the first robust real-time detection system.

This paper presents new algorithms to construct a framework for robust and extremely rapid object detection, which achieves detection and false positive rates equivalent to the best published results.

Framework Scheme

The framework consists of two steps:

1. Trainer: works with positive samples (images with faces) and negative samples (images without faces). Training is a lengthy computation.

2. Detector: uses the trained classifier to analyze each input image. This second stage is very fast and allows real-time detection.
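To give a sense of how much work the detector stage faces on one image, the sketch below counts the square sub-windows visited by a multi-scale sliding-window scan. Only the 24 by 24 base size comes from these slides; the function name `count_sub_windows`, the scale factor of 1.25, and the step of one pixel are assumptions chosen for illustration.

```python
def count_sub_windows(image_w, image_h, base=24, scale_factor=1.25, step=1.0):
    """Count square sub-windows visited by a multi-scale sliding-window scan.

    base matches the 24 by 24 training resolution; scale_factor and step
    are assumed values, not settings taken from the slides.
    """
    total = 0
    scale = 1.0
    while True:
        size = int(round(base * scale))
        if size > min(image_w, image_h):
            break  # the window no longer fits inside the image
        stride = max(1, int(round(step * scale)))
        across = (image_w - size) // stride + 1
        down = (image_h - size) // stride + 1
        total += across * down
        scale *= scale_factor
    return total


# A 24x24 image holds exactly one base-size window.
single = count_sub_windows(24, 24)
# A 384x288 image holds a very large number of candidate windows.
many = count_sub_windows(384, 288)
print(single, many)
```

Under these assumptions a single 384 by 288 image already produces several hundred thousand candidate windows, which is why the per-window cost has to be very small.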

Features

The object detection procedure classifies images based on the value of simple features called Haar-like Features.

A feature-based system operates much faster than a pixel-based system.
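As a rough illustration of what such a feature measures, the sketch below computes a two-rectangle (edge) feature as the difference between the pixel sums of two adjacent rectangles. The function names, the left/right split, and the sign convention are illustrative assumptions; the sums are computed directly from the pixels to keep the idea visible.

```python
def pixel_sum(image, top, left, height, width):
    """Sum the pixel values inside a rectangle by visiting every pixel."""
    return sum(
        image[r][c]
        for r in range(top, top + height)
        for c in range(left, left + width)
    )


def two_rect_feature(image, top, left, height, width):
    """Two-rectangle feature: right-half sum minus left-half sum.

    The split direction and sign are illustrative choices.
    """
    half = width // 2
    left_sum = pixel_sum(image, top, left, height, half)
    right_sum = pixel_sum(image, top, left + half, height, half)
    return right_sum - left_sum


# Toy 4x4 grayscale patch: dark left half, bright right half (a vertical edge).
edge_patch = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
flat_patch = [[50] * 4 for _ in range(4)]

edge_response = two_rect_feature(edge_patch, 0, 0, 4, 4)  # 1600 - 80 = 1520
flat_response = two_rect_feature(flat_patch, 0, 0, 4, 4)  # 800 - 800 = 0
print(edge_response, flat_response)
```

The feature responds strongly to the edge and not at all to the uniform patch, which is the kind of simple contrast cue a classifier can threshold.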

Integral Image

Rectangle features can be computed very rapidly using an intermediate representation of the image, which the authors call the integral image.

Integral Image: Calculating a Rectangle Feature
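A minimal sketch of the idea, in plain Python: each entry of the integral image stores the sum of all pixels above and to the left, so the sum over any rectangle needs only four lookups regardless of its size. The zero-padded first row and column and the names `integral_image` and `rect_sum` are assumptions made here to keep the indexing simple.

```python
def integral_image(image):
    """Build a (rows+1) x (cols+1) table where ii[r][c] is the sum of
    image[0:r][0:c]. The extra zero row and column avoid edge checks."""
    rows, cols = len(image), len(image[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0  # running sum of the current row
        for c in range(cols):
            row_sum += image[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of a rectangle from four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(image)
total = rect_sum(ii, 0, 0, 3, 3)  # whole image: 45
inner = rect_sum(ii, 1, 1, 2, 2)  # 5 + 6 + 8 + 9 = 28
print(total, inner)
```

The table is built in a single pass over the image; after that, a two-rectangle feature costs only a handful of lookups at any scale.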

Training The Attentional Cascade

Detection

Experiments

• The positive training set consisted of 4916 hand-labeled faces, scaled and aligned to a base resolution of 24 by 24 pixels.

• The negative set consisted of 10,000 examples selected by randomly picking sub-windows from 9500 images that did not contain faces.

• The speed of the cascaded detector is directly related to the number of features evaluated per scanned sub-window.

• The final classifier had 32 layers and a total of 4297 features.

• On the MIT+CMU test set, an average of 8 features out of the total of 4297 are evaluated per sub-window.

• The processing time for a 384 by 288 pixel image on a conventional personal computer is about 0.067 seconds.
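The link between speed and features evaluated per sub-window can be sketched as follows: a cascade runs its stages in order and stops at the first stage that rejects the window, so most background windows pay only for the first few features. This is a simplified illustration, not the trained detector. The stage format (a list of feature functions plus a threshold), the function name `run_cascade`, and the toy values are all assumptions.

```python
def run_cascade(stages, window):
    """Evaluate stages in order and reject at the first failing stage.

    Each stage is (features, threshold), where features is a list of
    callables scoring the window. Returns (accepted, features_evaluated).
    """
    evaluated = 0
    for features, threshold in stages:
        score = sum(feature(window) for feature in features)
        evaluated += len(features)
        if score < threshold:
            return False, evaluated  # early rejection: later stages are skipped
    return True, evaluated


# Hypothetical two-stage cascade over a toy "window" of three numbers.
stages = [
    ([lambda w: w[0]], 5),                    # stage 1: one cheap feature
    ([lambda w: w[1], lambda w: w[2]], 10),   # stage 2: two more features
]

background = [1, 9, 9]
face_like = [9, 9, 9]

rejected = run_cascade(stages, background)  # (False, 1)
accepted = run_cascade(stages, face_like)   # (True, 3)
print(rejected, accepted)
```

Because the vast majority of sub-windows in an image are background, the average cost stays close to the cost of the first stages. At about 0.067 seconds per image, the reported timing corresponds to roughly 15 frames per second (1 / 0.067 ≈ 14.9).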

Results

Testing of the final face detector was performed using the MIT+CMU frontal face test set, which consists of:

• 130 images

• 507 frontal faces