Tutorial on "Recognition in Video"

Recognition in Video

by Dr. Dmitry O. Gorodnichy

National Research Council Canada

Tutorial

at

Canadian Conference on Electrical and Computer Engineering (CCECE'06)

May 7 to 10, 2006

Ottawa Congress Centre, Ottawa, Canada.

When: Sunday, May 7, 8:30 AM-12:00 PM

Where: Room SITE E0126, University of Ottawa

For the official CCECE'06 tutorial site, visit http://www.ieee.ca/ccece06/tutorials.html.

For registration information, visit http://www.ieee.ca/ccece06/registration.html

(NB: Early registration deadline - April 7, 2006)

This page is for those who are interested in attending the tutorial and would like to prepare for it in advance.

Below you will find the detailed outline of the tutorial and the related additional information.

Preliminaries:

The tutorial is designed to benefit both beginners and researchers already familiar with video processing.

Each tutorial attendee will receive a printed hand-out of tutorial materials.

If you have a webcam and Visual Studio C++ installed on your laptop, you may bring it to the class. To be able to write a video recognition program, you will need to download the following free software:

Microsoft DirectX 8 or later SDK [required for camera functionality]:

From www.microsoft.com/directx/, or directly from

http://www.microsoft.com/downloads/details.aspx?FamilyID=1c8dc451-2dbe-4ecc-8c57-c52eea50c20a&DisplayLang=en

Intel OpenCV (beta 4) Library for Windows [required for image capture and processing functionality]

From http://sourceforge.net/projects/opencvlibrary or directly from

http://prdownloads.sourceforge.net/opencvlibrary/OpenCV_b4a.exe?download

Abstract:

Video processing is no longer a prerogative of a few. With video cameras now affordable, computers powerful enough to process video in real time, and a large pool of Open Source video libraries available to everybody, practically anybody with basic engineering skills can now create a video processing system. The variety of applications for video processing is just as impressive: from surveillance, video coding and annotation to computer-human interaction and collaborative environments to multi-media, video conferencing and computer games, to name a few. Building a vision system for a specific application, however, is a big challenge because of the complexity and versatility of the video recognition problem inherent to all video processing tasks, regardless of whether it’s tracking of objects, recognition of events or person identification. While humans find it very easy to give meaning to observed visual stimuli, computers do not, and engineering skills alone may not be sufficient to resolve the problem. This tutorial aims to provide a comprehensive background to the problem of recognition in video and to present an overview of the available solutions to this problem. Both live demonstrations and simple video coding techniques will be shown. By the end of the tutorial, the audience will be able to create their own perceptual vision system using a webcam in a Windows environment.

Keywords: video processing, video analysis, content extraction, intelligent video.

Outline of the tutorial:

Part I. Getting the background. How we (human visual system) do it.

    1. What makes video processing special:
        a) Highly demanded, many open problems - Overview of various applications: from security, to multi-media, to computer-human interaction, to video annotation and video-based search.
        b) No longer a prerogative of a few - Anybody can do it now. Introduction to the Open Source Intel OpenCV library.
        c) Critical differences from image processing: low resolution and low quality.
        d) Real-time nature of video - constraint or advantage?
        e) Another critical difference and advantage: cues from biological vision.
    2. From video input to recognition results output:
        a) Video recognition as the intersection of Computer Vision, Machine Learning/Pattern Recognition and Neurobiology.
        b) Four fundamental video recognition problems: Detection, Tracking, Memorization & Association.
        c) Hierarchy of video processing tasks.
        d) Four modalities of video information: motion, colour, luminance (orientational and spatial frequency) and disparity. Their suitability for video recognition problems.
    3. Live hands-on examples: Examining the tasks, modalities and the issues.
        a) "Hello world - I see you" example using the Intel OpenCV video capture and processing library.
        b) Processing TV recordings.
        c) On video formats, sizes & quality: webcams vs. firewire cams, VCD vs. DVD, NTSC vs. PAL.
    4. Main results from Neurobiology (video processing and recognition by biological vision systems):
        1. Dealing with low resolution/quality, video processing mechanisms:
            a) Visual pathway in the brain (from retina, to visual cortex, to hippocampus).
            b) Attention-based vision, salience detection mechanisms, accumulation over time.
            c) Separation into channels, where vs. what processing, top-down vs. bottom-up processing.
            d) Visual illusions.
        2. Recognition in the brain. Associative recall and thinking: from neurons via synapses to memories.
    5. Main results from Associative Learning and Recognition:
        a) Coding memories using neural networks.
        b) Learning rules for tuning inter-neuronal synapses.
        c) Open source High-Performance Associative Neural Network library.

Part II. How to make computers do it. Building your own Perceptual Vision System.

    1. Using Motion: Main results and applications.
        a) Statistics-based background computation (inc. Mixtures of Gaussians)
        b) Change vs. motion detection. Illumination-invariant change detection (inc. non-linear change detection)
        c) Motion history and patterns
    2. Using Colour: Main results and applications.
        a) Segmentation (inc. hysteresis- and statistics-based)
        b) Histogram-based tracking (inc. MeanShift)
        c) Special consideration: Skin detection
    3. Using Intensity: Main results and applications.
        a) Detection of salient features: edges, corners (inc. hysteresis-based)
        b) Detection of orientational and frequency saliencies (inc. Gabor filters simulating visual cortex processing)
        c) Using shape-from-shading information
    4. Using depth/stereo-vision (briefly: references only)
    5. Image processing techniques.
        a) Manipulations with blobs: classification, grouping
        b) Deformable templates
    6. Accumulation of information over time:
        a) Histograms and evidence accumulation (inc. Dempster-Shafer evidence theory)
        b) Probabilistic framework
        c) Correction learning framework (inc. associative neural network)
    7. Special consideration: faces in video
        a) Four basic face processing tasks: detection, tracking, classification & identification.
        b) FaceRec from video vs. FaceRec from photos. Nominal face resolution.
        c) Overview of solutions
    8. Putting it all together: Building Perceptual Vision Systems (computer vision systems with recognition capability) tailored to your needs.
        a) General framework: detection->tracking->classification, motion->colour->intensity
        b) Spatio-temporal decisions (accumulation over time and space)
        c) Doing it with the Intel OpenCV library
        d) Overview of other Open Source video processing libraries and datasets
        e) On testing and benchmarks, using your film collection.
    9. Examples: NRC-developed technologies:
        a) Perceptual Vision Interface Nouse
        b) (Multicamera, multi-person, robust, back-) tracking of faces
        c) Face annotation, recognition of faces in TV recordings
        d) Pianist hands/fingers recognition
    10. Concluding remarks:
        a) Areas for future development
        b) References

Slides

Acknowledgement of authorship is required when using the slides for this tutorial: slides.

For the full set of slides, contact the author.