
Robust Real-time Object Detection

Paul Viola, Michael Jones

SECOND INTERNATIONAL WORKSHOP ON STATISTICAL AND COMPUTATIONAL THEORIES OF VISION – MODELING, LEARNING, COMPUTING AND SAMPLING. VANCOUVER, CANADA, JULY 13, 2001.

Student: Lourdes Ramírez Cerna.


Introduction

Face recognition has become an area of active research that spans disciplines such as image processing, pattern recognition, computer vision, and neural networks.

The first step in a face recognition system is face detection. Given an image or video, a face detector must be able to identify and locate all faces regardless of their position, scale, age, orientation, and lighting conditions.


The Problem

There are hundreds of detection methods in the literature, but many of them do not work in real time. The method proposed by Viola and Jones was the first robust real-time detection system.

This paper presents new algorithms to construct a framework for robust and extremely rapid object detection, which achieves detection and false positive rates equivalent to the best published results.


Framework Scheme

The framework consists of two steps:

1. Trainer: works with positive (images with faces) and negative (images without faces) samples. Training is a lengthy computation.

2. Detector: uses the trained detector to analyze each input image. This second stage is very fast and allows real-time detection.
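As a rough illustration of what "analyzing each input image" involves, the sketch below enumerates the square sub-windows a detector could scan at several positions and scales. The function name `sub_windows`, the scale step of 1.25, and the one-pixel shift are assumptions made for this example; only the 24 by 24 base size comes from the training setup described in the Experiments slide.

```python
def sub_windows(img_w, img_h, base=24, scale_step=1.25, shift=1.0):
    """Yield (x, y, size) for square sub-windows over all positions and scales.

    base, scale_step and shift are illustrative assumptions, not values
    stated on the slides (apart from the 24-pixel base resolution).
    """
    scale = 1.0
    while True:
        size = int(round(base * scale))
        if size > img_w or size > img_h:
            break  # the window no longer fits in the image
        step = max(1, int(round(shift * scale)))  # larger windows move in larger steps
        for y in range(0, img_h - size + 1, step):
            for x in range(0, img_w - size + 1, step):
                yield x, y, size
        scale *= scale_step


# A 26x24 image only has room for three 24x24 windows (x = 0, 1, 2).
tiny = list(sub_windows(26, 24))
```

Calling `sum(1 for _ in sub_windows(384, 288))` gives the number of windows for an image of the size used in the timing experiment, which shows why the cost per sub-window dominates detection time.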


Features

The object detection procedure classifies images based on the value of simple features called Haar-like Features.

A feature-based system operates much faster than a pixel-based system.
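To make the idea of a Haar-like feature concrete, here is a minimal sketch of a two-rectangle feature computed directly from pixel sums. The function name `two_rect_feature`, its parameters, and the sign convention (right half minus left half) are assumptions for illustration; the slides do not specify them.

```python
import numpy as np


def two_rect_feature(window, x, y, w, h):
    """Value of a horizontal two-rectangle feature.

    The feature covers a region of width 2*w and height h with its
    top-left corner at (x, y). Its value is the pixel sum of the right
    half minus the pixel sum of the left half (assumed sign convention).
    """
    left = window[y:y + h, x:x + w].sum()
    right = window[y:y + h, x + w:x + 2 * w].sum()
    return float(right - left)


# Toy 4x4 window with a vertical edge: dark left half, bright right half.
edge = np.array([[0, 0, 1, 1]] * 4, dtype=np.int64)
edge_response = two_rect_feature(edge, 0, 0, 2, 4)

# A uniform window has no edge, so the two halves cancel.
flat = np.ones((4, 4), dtype=np.int64)
flat_response = two_rect_feature(flat, 0, 0, 2, 4)
```

The feature responds strongly where the image has an edge aligned with the boundary between the two rectangles and gives zero on uniform regions.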


Integral Image

Rectangle features can be computed very rapidly using an intermediate representation of the image, which the authors call the integral image.

[Figures: integral image; calculating a rectangular feature]
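The integral image idea can be sketched in a few lines. The version below pads the table with a leading row and column of zeros so that any rectangle sum needs exactly four lookups; this padding and the names `integral_image` and `rect_sum` are choices made for this example, not necessarily the paper's exact formulation.

```python
import numpy as np


def integral_image(img):
    """Return ii where ii[y, x] is the sum of img[:y, :x].

    A zero row and column are prepended (an assumed convention) so that
    rectangles touching the image border need no special case.
    """
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii, x, y, w, h):
    """Pixel sum of the rectangle with top-left (x, y), width w, height h.

    Uses four array references regardless of the rectangle size.
    """
    return int(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])


img = np.arange(16, dtype=np.int64).reshape(4, 4)
ii = integral_image(img)
total = rect_sum(ii, 0, 0, 4, 4)   # whole image
inner = rect_sum(ii, 1, 1, 2, 2)   # central 2x2 block: 5 + 6 + 9 + 10
```

Because each rectangle costs four lookups, a two-rectangle feature can be evaluated in constant time at any position or scale once the integral image has been built in a single pass.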

Training The Attentional Cascade


Detection


Experiments

• The positive training set consisted of 4916 hand-labeled faces, scaled and aligned to a base resolution of 24 by 24 pixels.

• The 10,000 negative examples were selected by randomly picking sub-windows from 9500 images that did not contain faces.


• The speed of the cascaded detector is directly related to the number of features evaluated per scanned sub-window.

• The final classifier had 32 layers and 4297 features in total.

• Evaluated on the MIT-CMU test set, an average of 8 features out of the 4297 are evaluated per sub-window.

• The processing time for a 384 by 288 pixel image on a conventional personal computer is about 0.067 seconds.
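The low average feature count follows from early rejection: a sub-window that fails any layer is discarded without evaluating the remaining layers. The sketch below shows that control flow only. It is a simplification under stated assumptions: each stage here just sums raw feature values and compares them with a threshold, whereas the paper's stages are boosted classifiers; the name `cascade_classify` and the toy features are hypothetical.

```python
def cascade_classify(window, stages):
    """Return (is_face, n_features_evaluated) for one sub-window.

    stages is a list of (features, threshold) pairs, where each feature
    is a callable window -> float. A stage passes when the sum of its
    feature values reaches the threshold (a simplified stand-in for a
    boosted stage classifier).
    """
    evaluated = 0
    for features, threshold in stages:
        score = 0.0
        for feature in features:
            score += feature(window)
            evaluated += 1
        if score < threshold:
            return False, evaluated  # rejected early, later stages are skipped
    return True, evaluated


# Toy two-stage cascade over hypothetical precomputed measurements.
stages = [
    ([lambda w: w["contrast"]], 0.5),
    ([lambda w: w["contrast"], lambda w: w["symmetry"]], 1.5),
]
background = cascade_classify({"contrast": 0.1, "symmetry": 0.9}, stages)
face_like = cascade_classify({"contrast": 0.9, "symmetry": 0.8}, stages)
```

In the toy run, the background window is rejected after one feature while the face-like window costs all three. Since most sub-windows in an image are background, the average cost stays close to the size of the first layers. At about 0.067 seconds per image, the detector runs at roughly 1 / 0.067 ≈ 15 frames per second.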


Results

Testing of the final face detector was performed using the MIT-CMU frontal face test set, which consists of:

• 130 images

• 507 frontal faces
