Looking at People: The past, the present and the future
Tutorial at CVPR 2012, Providence, Rhode Island, USA, 2012
General Information
| Organizers: | Thomas B. Moeslund (tbm@create.aau.dk), Aalborg University, Denmark |
| | Leonid Sigal (lsigal@disneyresearch.com), Disney Research, Pittsburgh, USA |
| | Adrian Hilton (A.Hilton@surrey.ac.uk), University of Surrey, UK |
| | Volker Krüger (vok@m-tech.aau.dk), Aalborg University, Denmark |
| Instructors: | Aaron Bobick, Georgia Tech, USA |
| | Amit Roy Chowdhury, UC Riverside, USA |
| | Jeffrey Cohn, CMU, USA |
| | Rogerio Feris, IBM T.J. Watson Research Center, New York |
| | David Fleet, University of Toronto, Canada |
| | Shaogang Gong, Queen Mary University, UK |
| | Raghuraman Gopalan (on behalf of Rama Chellappa), AT&T Labs-Research, USA |
| | Haowei Liu, Intel, Santa Clara |
| | Deva Ramanan, UC Irvine, USA |
| | Fernando De la Torre, CMU, USA |
| | Mohan Trivedi, UC San Diego, USA |
| Time: | June 21st, 2012 |
| Duration: | Full-day (~8 hours) |
| Location: | Room 555B |
Course Description
Over the last 10-20 years the field of computer vision has been preoccupied with the problem of looking at people. Hundreds, if not thousands, of papers have been published on the subject, spanning face detection, pose estimation, tracking, activity recognition, etc. This tutorial is designed to give an introduction to, and an assessment of, the state of the art in this very active field. The tutorial builds on the book Visual Analysis of Humans: Looking at People, published by Springer in 2012. The book is a collection of chapters written by the top experts in the field; the organizers of the tutorial are also the editors of the book. The list of contributing authors and content of the book can be found here. The book is intended to serve the dual purpose of being a reference and a tutorial for people entering the field.
Because this tutorial is an extension of this idea, it will similarly consist of a series of talks by experts in the corresponding fields. The tutorial will be broken down into 4 parts: (1) detection and tracking, (2) articulated pose estimation and tracking, (3) activity recognition, and (4) applications. In each part we will have 2-3 invited lecturers. Each invited lecturer will give a talk of roughly 35 minutes on a focused subject within the larger context of looking at people. The lectures will be geared towards a general CV audience and will outline the key advances and future challenges in the problems involved. The rough schedule, list of the proposed invited lecturers, and the topics covered are listed below.
Syllabus and Schedule
Below is the syllabus and a rough schedule for the tutorial.
- [8:40 - 8:50] Introduction, motivation and welcome remarks by the organizers
- [8:50 - 10:00] Detection and tracking
  - Face detection (by Raghuraman Gopalan)
  - Wide area tracking in single and multiple views (by Amit Roy Chowdhury)
- coffee break (30 minutes)
- [10:30 - 11:40] Articulated pose estimation and tracking
  - Motion models for people tracking (by David Fleet)
  - Part-based models for pose estimation (by Deva Ramanan)
- [11:40 - 12:15] Activity recognition
  - On human action (by Aaron Bobick)
- lunch (1h 35min)
- [1:50 - 3:00] Activity recognition (cont.)
  - Facial expression analysis (by Fernando De la Torre and Jeffrey Cohn)
  - Benchmarking datasets for human activity recognition (by Haowei Liu and Rogerio Feris)
- coffee break (30 minutes)
- [3:30 - 4:40] Applications
  - Computer vision in cars (by Mohan Trivedi)
  - Security and surveillance (by Shaogang Gong)
Course Materials
Bobick
Chowdhury
Cohn
Fernando
Fleet
Gopalan
Liu
Ramanan
Trivedi
Instructor Biographies
Dr. Bobick's
research spans a variety of aspects of computer vision. His primary
work has focused on video sequences where the imagery varies over time
either because of change in camera viewpoint or change in the scene
itself. He has published papers addressing many levels of the problem
from validating low level optic flow algorithms to constructing
multi-representational systems for an autonomous vehicle to the
representation and recognition of high level human activities. The
current emphasis of his work is on action understanding, where the
imagery is of a dynamic scene and the goal is to describe the action or
behavior. Three examples are the basic recognition of human movements,
natural gesture understanding, and the classification of football
plays. Each of these examples requires describing human activity in a
manner appropriate for the domain, and developing recognition
techniques suitable for those representations. Recently, Dr. Bobick has also explored the development of interactive environments where advanced sensing modalities provide input based upon the users' actions and, hopefully, intentions. The intriguing element of interactive environments is that the context of the situation can be exploited in the interpretation of the user's behavior. An example of such an environment is the KidsRoom, the world's first interactive narrative play-space for children. The room employed large-scale video and sound to take the children through a fantasy story; all the sensing was accomplished using computer vision. A more current and ambitious project is the Aware Home Research Initiative. The goal of that effort is to impart sufficient perception and interface capabilities to a house such that it can enhance the quality of life of the inhabitants. A domestic setting provides a wealth of contextual information that will be needed to assist in understanding the activities of the people within.
Dr. Roy-Chowdhury leads the Video Computing Group at UCR. His group is studying problems
in video analysis with applications in national and homeland security,
commercial multimedia and computational biology. The underlying
approach of his research is to harness various methods in systems
theory, signal processing, machine learning, mathematics and statistics
to the analysis of images and videos in order to obtain an
understanding of their content. This scientific understanding can lead
to machine vision technologies that can provide an
automated/semi-automated analysis of the 3D environment from
images/videos, analogous to the capabilities of biological visual
systems. Currently, the group is focused on multi-agent autonomous
camera networks, modeling and recognition of complex behaviors in
video, and image-based modeling of biological growth dynamics
(specifically in plants). Prof. Roy-Chowdhury is a PI on several grants
from the National Science Foundation, Office of Naval Research, Army
Research Office, DARPA, and private industries like CISCO and
Lockheed-Martin. His recent book on Camera Networks provides an overview of current research in the field. He has served as
a program committee member and reviewer in various capacities,
organized workshops and special sessions, is an Associate Editor of the
IEEE Trans. on Systems, Man and Cybernetics - B and Machine Vision
Applications, and a Section Editor of Elsevier’s Electronic Reference
on Signal Processing. He has recently co-edited a book on the topic of
"Distributed Video Sensor Networks". For more details, please see his CV.
Jeffrey Cohn is Professor of Psychology at the University of Pittsburgh
and Adjunct Faculty at the Robotics Institute, Carnegie Mellon
University. He received his PhD in psychology from the University of
Massachusetts at Amherst. Dr. Cohn has led interdisciplinary and
inter-institutional efforts to develop advanced methods of automatic
analysis of facial expression and prosody and applied those tools to
research in human emotion, interpersonal processes, social development,
and psychopathology. He co-developed influential databases
(Cohn-Kanade, MultiPIE, and Pain Archive), co-edited two recent special
issues of Image and Vision Computing on facial expression analysis, and
co-chaired the 8th IEEE International Conference on Automatic Face and
Gesture Recognition (FG 2008).
Dr. Rogerio Schmidt Feris is currently a research scientist at IBM T.J.
Watson Research Center, New York, and an Affiliate Assistant Professor
at University of Washington. He joined IBM in 2006 after receiving a
PhD in computer science from the University of California, Santa
Barbara. In 2008, he worked as Adjunct Professor at Columbia
University. His publications have appeared in major computer
vision/graphics conferences and journals, including ICCV, CVPR,
SIGGRAPH, and PAMI. He received several awards, including a recent IBM
Master Inventor honor and a prestigious IBM Outstanding Innovation
Achievement Award in 2011. For more details, see http://rogerioferis.com
David J Fleet received the PhD in Computer Science from the University of Toronto in 1991. He was on faculty at Queen's University in Kingston from 1991 to 1998, and then Area Manager and Research Scientist at the Palo Alto Research Center (PARC) from 1999 to 2003. In 2004 he joined the University of Toronto as Professor of Computer Science.
His research interests include computer vision, image processing, visual perception, and visual neuroscience. He has published research articles, book chapters and one book on various topics including the estimation of optical flow and stereoscopic disparity, probabilistic methods in motion analysis, modeling appearance in image sequences, motion perception and human stereopsis, hand tracking, human pose tracking, latent variable models, and physics-based models for human motion analysis. In 1996 Dr. Fleet was awarded an Alfred P. Sloan Research Fellowship for his work on computational models of perception. He has won paper awards at ICCV 1999, CVPR 2001, UIST 2003, and BMVC 2009. In 2010 he was awarded the Koenderink Prize for his work with Michael Black and Hedvig Sidenbladh on human pose tracking. He has served as Area Chair for numerous computer vision and machine learning conferences. He was Program Co-chair for the 2003 IEEE Conference on Computer Vision and Pattern Recognition. He will be Program Co-Chair for the 2014 European Conference on Computer Vision. He has been Associate Editor and Associate Editor-in-Chief for IEEE TPAMI, and currently serves on the TPAMI Advisory Board.
Shaogang Gong is Professor of Visual Computation and Head of the Computer Vision Group
at Queen Mary University of London. He has published over 250 papers and
written 2 books (Visual Analysis of Behaviour: From Pixels to Semantics,
Springer, 2011; Dynamic Vision: From Images to Face Recognition, Imperial
College Press, 2000). For the last 20 years, he has led numerous UK, EU and US
academic, governmental and industrial collaborative projects on developing
computer vision systems for public security and safety applications. He served
on the UK Government Chief Scientific Adviser Beddington Science Review
Steering Panel (2008-2009). He is a founding director and chief scientist of
Vision Semantics Limited. He is a Fellow of IEE and BCS, and a member of the UK
Computing Research Committee.
Raghuraman Gopalan is a senior member of technical staff at AT&T
Labs-Research. He received his Ph.D. in Electrical and Computer
Engineering at the University of Maryland, College Park in 2011. His
research interests are in computer vision and machine learning, with a
specific focus on object recognition and video understanding problems.
Haowei Liu is a research engineer in the Perceptual Computing Group,
Intel, Santa Clara. He received his PhD degree from University of
Washington in
June, 2011. He has interned in major research organizations during his
PhD study including Intel Lab Seattle and IBM T.J. Watson Research
Center. Prior to his PhD study, he was a software design engineer in
Microsoft. He holds an MS and BS in Computer Science from University of
California, San Diego and National Taiwan University.
Deva Ramanan is an assistant professor of Computer Science and the
co-director of the Computational Vision Lab at the University of
California at Irvine. Prior to joining UCI, he was a Research Assistant
Professor at the Toyota Technological Institute at Chicago (2005-2007).
He also held visiting researcher positions in the Robotics Institute at
Carnegie Mellon University in 2006 and Microsoft Research in 2008. He
received his B.S. degree with distinction in computer engineering from
the University of Delaware in 2000, graduating summa cum laude. He
received his Ph.D. in Electrical Engineering and Computer Science with
a Designed Emphasis in Communication, Computation, and Statistics from
UC Berkeley in 2005. His research interests span computer vision,
machine learning, and computer graphics, with a focus on the
application of understanding people through images and video. His past
work focused on articulated tracking, while recent work has focused on
object recognition. His work in this area won or received special
recognition at the PASCAL Visual Object Class Challenge, 2007-2010,
including a Lifetime Achievement Prize in 2010. His work on contextual
object modeling won the 2009 David Marr prize. He was awarded an NSF
Career Award in 2010. His work is supported by NSF, ONR, DARPA, as well
as industrial collaborations with the Intel Science and Technology
Center for Visual Computing, Google Research, and Microsoft Research.
He serves on the editorial board of the International Journal of
Computer Vision (IJCV), is a senior program committee member for the
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), and
has served on multiple NSF panels for computer vision and machine
learning.
Fernando De la Torre is an Associate Research Professor in the
Robotics Institute at Carnegie Mellon University. He received his B.Sc.
degree in Telecommunications, as well as his M.Sc. and Ph.D. degrees in
Electronic Engineering from La Salle School of Engineering at Ramon
Llull University, Barcelona, Spain in 1994, 1996, and 2002,
respectively. His research interests are in the fields of Computer
Vision and Machine Learning. Specifically, he is interested in modeling
and recognizing human behavior with a focus on understanding human
behavior from multimodal sensors (e.g., video, body sensors). He has
done extensive work on facial image analysis (e.g., facial expression
recognition, facial feature tracking). In machine learning his interest
centers on developing efficient and robust supervised and unsupervised
methods to model high-dimensional data. Currently, he is directing the
Component Analysis Laboratory (http://ca.cs.cmu.edu) and the Human
Sensing Laboratory (http://humansensing.cs.cmu.edu) at Carnegie Mellon
University. He has over 100 publications in refereed journals and
conferences. He has organized and co-organized several workshops and
has given tutorials at international conferences on the use and
extensions of Component Analysis.
Mohan Trivedi received his PhD in Electrical Engineering from Utah State University
in 1979, after completing undergraduate work in India. At Utah State,
he received a Graduate Research Scholarship, and went on to teach at
.... He has published extensively and has edited over a dozen volumes
including books, special issues, video presentations, and conference
proceedings. Trivedi is a recipient of the Pioneer Award and the
Meritorious Service Award from the IEEE Computer Society; and the
Distinguished Alumnus Award from Utah State University. He is a Fellow
of the International Society for Optical Engineering (SPIE). He is a
founding member of the Executive Committee of the UC System-wide
Digital Media Innovation Program (DiMI). Trivedi is also
Editor-in-Chief of Machine Vision & Applications.
Organizer Biographies
Thomas B. Moeslund is Professor
of computer vision and head of the Visual Analysis of People Lab at
Aalborg University, Denmark. In 2000 - 2003 he acted as a Vision
Engineer consultant at the company Thoustrup and Overgaard, Randers,
Denmark. Prof. Moeslund's research interests include: Computer vision,
Machine vision, Looking at people (human motion capture, gesture
recognition, tracking, pose estimation), augmented reality, HCI,
computer graphics animations, and multi-modal systems.
Prof. Moeslund has been involved in nine national and international
research projects, both as coordinator, WP leader and researcher. He
has published more than 75 peer reviewed journal and conference papers.
Awards include a best IEEE paper award, a most cited paper award (from
CVIU), and a teacher of the year award. He serves as associate
editor/editorial board member of four international journals. He acts
as reviewer for all major journals within the field of computer vision
and image processing, and has served on a number of program committees.
Moreover, he has co-chaired five international workshops/tutorials
related to human motion analysis. He co-edited a recent CVIU SI on
human motion.
Leonid Sigal is a
Research Scientist at Disney Research Pittsburgh, in conjunction with
Carnegie Mellon University. Prior to this he was a postdoctoral fellow
in the Department of Computer Science at University of Toronto. He
completed his Ph.D. under the supervision of Prof. Michael J. Black at
Brown University in 2008; he received his B.Sc. degrees in Computer
Science and Mathematics from Boston University (1999), his M.A. from
Boston University (1999), and his M.S. from Brown University (2003).
From 1999 to 2001, he worked as a senior vision engineer at Cognex
Corporation, where he developed industrial vision applications for
pattern analysis and verification.
Leonid's research interests mainly lie in the areas of computer vision,
machine learning, and computer graphics, but also borderline fields of
psychology and humanoid robotics. He has published more than 30 papers
in top venues and journals in computer vision, computer graphics and
machine learning (including publications in PAMI, IJCV, CVPR, ICCV,
ECCV, NIPS, and ACM SIGGRAPH). His work received the Best Paper Award
at the Articulated Motion and Deformable Objects Conference in 2006
(with Prof. Michael J. Black). He acts as reviewer for all major
conferences and journals within the fields of computer vision and
computer graphics, and has consistently served on program committees
for CVPR, ICCV, ECCV, and IJCAI. He co-edited an IJCV special issue on
Evaluation of Human Motion and Pose Estimation last year.
Adrian
Hilton is Professor of Computer Vision and Graphics and Head of the
Visual Media Research Group at the University of Surrey, UK. Over the
past decade he has published over 100 refereed journal and
international conference research articles in robust computer vision
techniques to build models of real world objects from images to meet
the requirements of the entertainment and communication industries.
Scientific contributions have been recognized by two journal and one
conference best paper awards. Innovative contributions of this
research, which led to the first commercial hand-held 3D scanner and
the first system for capturing animated models of people, have been
recognized through two EU IST Awards for Innovation, a DTI
Manufacturing Industry Achievement Award and a Computer Graphics World
Innovation Award. He currently serves as an area editor for the journal
Computer Vision and Image Understanding, and on the EPSRC Peer Review
College for UK funding applications and the Executive of the IEE
Professional Network in Multimedia Communications. He is a Chartered
Engineer and member of IEE, IEEE and ACM.
Volker Krüger received his Dipl.-Inf. degree and doctoral degree from
Christian-Albrechts-Universität (CAU) Kiel, Germany, in 1997 and 2000,
respectively. He was a postdoctoral fellow at the Center for Automation
Research at Univ. of Maryland from 2000-2002. Since 2002, Volker Krüger
has been Assoc. Prof. at Aalborg University in Denmark. Volker Krüger
is with the Computer Vision and Machine Intelligence Lab (CVMI) at the
Copenhagen Inst. of Technology (CIT) of Aalborg University. His research
focuses on computer vision and robotics based approaches for learning
and recognizing human actions and activities.