Recognition of moving human figures on video sequences using disk thickness maps

2014, vol. 21, no. 5, pp. 157-166

Computer engineering. Information technology

Authors

Sidyakin S. V.^*, Egorov A. I.^**, Malin I. K.^***

State Institute of Aviation Systems, 7, Victorenko str., Moscow, 125319, Russia

*e-mail: sersid@bk.ru
**e-mail: dohxehapo@gmail.com
***e-mail: imalin@gosniias.ru

Abstract

Many approaches currently exist for detecting and recognizing objects in video. The most popular methods combine descriptions of object attributes with machine learning algorithms. Any object characteristic, such as shape, color, or texture, can serve as an attribute. Some attributes describe local characteristics, while others characterize an object globally. The choice of attributes determines whether a detector is also suitable for the accompanying problem of object tracking and, in the case of a human figure, for posture analysis.

One existing and potentially useful global object descriptor is the rectangle cover, which was used earlier in [1] for automatic classification of moving humans. A rectangle cover is the set of all maximal empty rectangles inscribed entirely in a 2D figure. These rectangles were computed with the morphological opening operation: to find all maximal rectangles of a given size, one must compute an opening of the entire figure for that size, and repeat this for every size. The whole process was extremely time consuming.

This approach therefore proved inapplicable when the image contains a large number of objects; it ran very slowly even with three objects. A further problem is that using a rectangle loses rotation invariance. These problems defined the goal of the research: to modify the method of [1] in order to increase the speed and overall quality of the human figure detector.

The algorithm is as follows. The video contains moving objects, some of which are humans. In the first stage, we extract the silhouettes of all moving objects using a background model built from a mixture of Gaussians. This method yields a high density of pixels within the selected moving regions and also allows us to separate shadows from the objects.
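The first stage can be illustrated with a minimal sketch of an online mixture of Gaussians for a single grayscale pixel. It is a simplified pure-Python variant and not the authors' implementation: the class name `PixelMixture`, all parameter values, the simplified update rule (one learning rate for weights, means, and variances), and the omission of shadow handling are assumptions made for illustration.

```python
import math


class PixelMixture:
    """Simplified online mixture of Gaussians for one grayscale pixel (illustrative)."""

    def __init__(self, max_modes=3, alpha=0.05, init_var=100.0,
                 min_var=4.0, match_sigmas=2.5, background_ratio=0.7):
        # All defaults are assumed values, not taken from the paper.
        self.max_modes = max_modes
        self.alpha = alpha
        self.init_var = init_var
        self.min_var = min_var
        self.match_sigmas = match_sigmas
        self.background_ratio = background_ratio
        self.modes = []  # each mode is [weight, mean, variance]

    def _background_modes(self):
        # Rank modes by weight / sigma; the top-ranked modes whose weights
        # together exceed background_ratio are treated as background.
        ranked = sorted(self.modes,
                        key=lambda m: m[0] / math.sqrt(m[2]), reverse=True)
        chosen, total = [], 0.0
        for mode in ranked:
            chosen.append(mode)
            total += mode[0]
            if total > self.background_ratio:
                break
        return chosen

    def update(self, value):
        """Classify `value` against the current model, then update the model.

        Returns True when the value is judged to be foreground.
        """
        matched = None
        for mode in self.modes:
            if (value - mode[1]) ** 2 <= self.match_sigmas ** 2 * mode[2]:
                matched = mode
                break
        background = self._background_modes()
        is_foreground = matched is None or not any(matched is m for m in background)

        # Decay every weight, then reinforce the matched mode or add a new one.
        for mode in self.modes:
            mode[0] *= 1.0 - self.alpha
        if matched is not None:
            diff = value - matched[1]
            matched[0] += self.alpha
            matched[1] += self.alpha * diff
            matched[2] = max(self.min_var,
                             matched[2] + self.alpha * (diff * diff - matched[2]))
        else:
            if len(self.modes) >= self.max_modes:
                self.modes.remove(min(self.modes, key=lambda m: m[0]))
            self.modes.append([self.alpha, float(value), self.init_var])

        # Renormalize so that the weights sum to one.
        total = sum(m[0] for m in self.modes)
        for mode in self.modes:
            mode[0] /= total
        return is_foreground
```

In this sketch a pixel that keeps showing the same intensity is absorbed into the background, while a sudden change is reported as foreground until the new mode has gathered enough weight. A full detector would keep one such model per pixel (and per color channel) and would add the shadow test mentioned above.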

In the second stage, we use continuous morphological models (skeletons) [2]. They make it possible to efficiently build morphological thickness maps (covers) of the objects (Fig.) with a disk structuring element rather than a rectangular one [3]. The advantage of the disk is its invariance to translation and rotation. This is the key step of the proposed algorithm. Unlike the skeleton, the disk thickness map is a stable descriptor, as shown in [3].
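To make the descriptor concrete, the sketch below computes a disk thickness map by brute force on a small binary mask: each figure pixel receives the radius of the largest disk that is centered on a figure pixel, fits inside the figure, and covers that pixel. This is a slow discrete illustration only. It does not use the continuous skeleton construction of [2], [3], and the function name `disk_thickness_map` and the pixel-center distance convention are assumptions.

```python
import math


def disk_thickness_map(mask):
    """Brute-force disk thickness map of a binary mask (list of rows of 0/1).

    Illustrative only: the cost grows quickly with image size.
    """
    h, w = len(mask), len(mask[0])
    # Background pixels, plus a one-pixel ring outside the image so that
    # figures touching the border are still bounded.
    background = [(y, x)
                  for y in range(-1, h + 1)
                  for x in range(-1, w + 1)
                  if not (0 <= y < h and 0 <= x < w) or mask[y][x] == 0]
    thickness = [[0.0] * w for _ in range(h)]
    for cy in range(h):
        for cx in range(w):
            if mask[cy][cx] == 0:
                continue
            # Radius of the largest disk centered here that stays in the figure.
            r = min(math.hypot(cy - by, cx - bx) for by, bx in background)
            reach = int(r)
            # Every pixel strictly inside that disk is at least r "thick".
            for y in range(max(0, cy - reach), min(h, cy + reach + 1)):
                for x in range(max(0, cx - reach), min(w, cx + reach + 1)):
                    if math.hypot(y - cy, x - cx) < r:
                        thickness[y][x] = max(thickness[y][x], r)
    return thickness
```

Because the radius depends only on distances to the figure boundary, shifting the figure leaves the values unchanged; on a pixel grid, rotation invariance holds only approximately.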

An object feature vector is generated from the thickness map. It is a set of disks, each characterized by five normalized parameters: radius, the two center coordinates, area, and the number of unique pixels not overlapped by other disks. The classification stage follows: we need to determine whether the given silhouette, described by its feature vector, is human or not. For this we use the extremely randomized trees algorithm. Every disk is classified by every tree, and each disk is assigned to a class by majority vote of the trees. The votes from all disks are then collected to make the final decision on whether the silhouette belongs to a human.
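The two-level vote described above can be sketched as follows. The trees are represented by hypothetical callables that map a five-element disk feature tuple (radius, center x, center y, area, unique pixels) to a class label; they stand in for trained extremely randomized trees. The function name `classify_silhouette` and the use of a simple majority over disks for the final decision are assumptions, since the text does not specify how the disk votes are combined.

```python
from collections import Counter


def classify_silhouette(disk_features, trees):
    """Two-level vote: trees vote on each disk, then disks vote on the silhouette.

    disk_features: list of 5-tuples (radius, cx, cy, area, unique_pixels).
    trees: list of callables returning a class label (hypothetical stand-ins).
    Returns (silhouette_label, per_disk_labels).
    """
    if not disk_features:
        raise ValueError("silhouette has no disks to classify")
    disk_labels = []
    for features in disk_features:
        # Each tree classifies the disk; the disk takes the majority label.
        # On a tie, Counter keeps the label that was seen first.
        tree_votes = Counter(tree(features) for tree in trees)
        disk_labels.append(tree_votes.most_common(1)[0][0])
    # Assumed final rule: the silhouette takes the majority label of its disks.
    silhouette_votes = Counter(disk_labels)
    return silhouette_votes.most_common(1)[0][0], disk_labels
```

Using an odd number of trees avoids ties at the disk level for a two-class problem; other ways of combining disk votes, such as weighting each disk by its area, would fit the same structure.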

The qualitative evaluations obtained lead to the conclusion that the proposed method successfully solves the problem of recognizing moving human silhouettes. Owing to its invariance properties, the disk thickness map increases the number of correctly classified human silhouettes and reduces the number of false detections.

The proposed approach outperforms the previously known method based on rectangular covers (12 objects are processed at 10 frames per second versus 3 objects at 5 frames per second). It can be used in video surveillance systems installed in small premises with low pedestrian traffic.

Fig. Examples of human disk thickness maps

Keywords:

mathematical morphology, thickness map, covers, continuous skeleton, video surveillance, human detection

References

Dollar P., Wojek C., Schiele B., Perona P. Pedestrian Detection: An Evaluation of the State of the Art, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, pp. 743-761.
Lowe David G. Object recognition from local scale-invariant features, Proceedings of the International Conference on Computer Vision, 1999, pp. 1150-1157.
Bay H., Ess A., Tuytelaars T., Van Gool L. Speeded-Up Robust Features (SURF), Computer Vision and Image Understanding, 2008, vol. 110, no. 3, pp. 346-359.
Barnich O., Jodogne S., Van Droogenbroeck M. Robust Analysis of Silhouettes by Morphological Size Distributions, Advanced Concepts for Intelligent Vision Systems, 2006, vol. 4179, pp. 734-745.
Mestetskiy L.M. Nepreryvnaya morfologiya binarnykh izobrazhenii. Figury. Skelety. Tsirkulyary (Continuous morphology of binary images: figures, skeletons, circulars), Moscow, Fizmatlit, 2009, 288 p.
Sidyakin S.V. Razrabotka algoritmov postroeniya morfologicheskikh spektrov dlya analiza tsifrovykh izobrazhenii i videoposledovatelnostei (Morphological pattern spectra algorithm development for digital image and video sequences analysis), Ph.D. thesis, Moscow, VTs RAN, 2013, 163 p.
Vizilter Yu.V., Sidyakin S.V. Materialy 15 Mezhdunarodnoi konferentsii «Matematicheskie metodi raspoznavaniya obrazov», Moscow, MAKS Press, 2011, pp. 416-419.
Geurts P., Ernst D., Wehenkel L. Extremely randomized trees, Machine Learning Journal, April 2006, vol. 63, no. 1, pp. 3-42.
Maree R., Geurts P., Piater J., Wehenkel L. Random subwindows for robust image classification, IEEE Conference on Computer Vision and Pattern Recognition, San Diego (CA, USA), 2005, vol. 1, pp. 34-40.
KaewTraKulPong P., Bowden R. An improved adaptive background mixture model for real-time tracking with shadow detection, Video-Based Surveillance Systems, 2002, pp. 135-144.
Gonzalez R., Woods R. Digital Image Processing, Prentice Hall, 2007, 976 p.
Serra J. Image Analysis and Mathematical Morphology, London, Academic Press, 1982, 610 p.
Nene S., Nayar S., Murase H. Columbia object library, available at: www.cs.columbia.edu/CAVE/software/softlib/coil-100.php, 1996.
ETISEO, Video Understanding Evaluation project, available at: http://www-sop.inria.fr/orion/ETISEO/, 2014.
