Tuesday, June 4, 2019
Techniques for Understanding Human Walking Motion
1.1 Introduction

Multimedia is a term that collectively describes a variety of media content available in various forms, such as text, speech, audio, still images, video, animation, graphics, 3D models, and combinations of them, used to capture real-time moments. In recent years, technological advances have enabled wide availability of and easy access to multimedia content. Much research has been dedicated to performing automated computational tasks for a wide spectrum of applications, such as surveillance, crime investigation, fashion and design, traditional aerospace, publishing and advertising, medical applications, and virtual reality, to name a few. The volume of multimedia information is now so large that improving the tasks of representing, analyzing, searching, and retrieving it has become the need of the hour. Among all the available types of media, video is one of the major forms widely used for analyzing multimedia content.

Several types of videos can be captured by various recording devices, but even the most suitable devices for acquiring videos have to deal with two important problems: the sensory gap and the semantic gap. The sensory gap is the difference between the real world and its representation; more precisely, it is the gap between the object in the world and the information in a (computational) description derived from a recording of that scene (Smeulders et al., 2000). The semantic gap is the difference between the behavior description given by human vision and the computational model used by human activity/behavior analysis systems.
More precisely, the semantic gap is the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation (Smeulders et al., 2000). Many researchers have proposed building computational models of the human visual system that represent reality as closely as possible. A major development was the model proposed by David Marr at MIT, who used a bottom-up approach to represent scene understanding (Marr, 1982). Various state-of-the-art methods evolved later, but the technology that helps people merge multimedia content into meaningful expressions still lags behind.

Within the realm of multimedia content analysis, computer vision methods and algorithms have been used as a foundation, and the coupling between multimedia analysis and computer vision poses a well-known challenge. Currently, human movement analysis is one of the most common research topics. Many types of human activities can be captured by various recording devices, and human motion analysis systems have been built for specific application contexts. The aim of a human movement analysis system is to automatically analyze input video sequences and transform them into a semantic interpretation. The recognition of human activities has been studied in computer vision for quite some time, but it remains far behind the capabilities of human vision.
In the human visual system, when a moving person is observed, the brain recognizes that person's action by analyzing the transitions between the postures the person adopts, or interprets the person's behavior by tracking those posture transitions and noting the intent of the action. This analysis is complex for computer vision systems: because the human body is non-rigid, deformable, and articulated, a person can adopt a wide variety of postures over time. Work on human activity analysis has not yet produced satisfactory results.

To solve problems related to human movement analysis in videos, the paradigm of data fusion is recommended. Multimedia data fusion is a way to integrate multiple media, their associated features, or intermediate decisions in order to perform an analysis task. According to Dasarathy (2001), multimedia data fusion is a formal framework in which means and tools are expressed for the alliance of data originating from different sources, exploiting their synergy to obtain information whose quality could not be achieved otherwise. In the existing literature, several contributions have been made to research on data fusion techniques used in multisensory environments and on multimodal fusion, with the aim of fusing and aggregating data obtained from multiple sources. Video data is characteristically multimodal, and combining the information gathered from multiple modalities is a valid approach to increasing accuracy (Atrey et al., 2010). Multimedia fusion is useful for several tasks, such as detection, recognition, identification, and tracking, across a wide range of applications.

This research work combines multimedia analysis with computer vision and data fusion perspectives to understand human walking motion in video sequences, which is a challenging research problem.

1.2 Motivation

From the viewpoint of data fusion, this research work is motivated by the observation that living organisms use multiple senses to learn about their environment, and the brain then fuses all of this information to make a decision. A human observer can recognize an action almost instantly. However, human visual sensing has its own limitations: a limited range of visual perception, and the limitations and compromises of the human brain. Automatic systems, in contrast, can work 24 hours a day, 7 days a week, allowing accurate event detection at a lower maintenance cost.

On the other hand, from the viewpoint of computer vision, algorithms and techniques still need to improve their performance in analyzing human walking in videos. Computer vision systems are far behind the capabilities of human vision and have to deal with two important problems: the sensory gap and the semantic gap.
Here, the sensory gap is the difference between the real world and its representation, and the semantic gap is the difference between the behavior description given by human vision and the computational model used by human activity/behavior analysis systems. A promising strategy is to integrate different data fusion and computer vision techniques in a unified framework, both to improve performance on the tasks associated with analyzing human walking motion and to overcome these drawbacks.

1.3 The Goal

The aim of this research work is to conduct a detailed investigation of currently available tools and techniques for understanding human walking motion, and to develop a generic framework in which data fusion and computer vision perspectives are used to analyze human walking actions in the context of real-life applications. During the fusion process, correlation coefficients between activities and between patterns of activities can be computed to predict intent. Finally, performance will be evaluated in terms of true positives, false positives, and misclassifications.

1.4 Summary of contributions

Our work in this thesis focuses on the following significant contributions:
Design of a unified framework that combines data fusion and computer vision methodology to improve the performance of automatic analysis of human movements in videos.
Detection of moving humans, and related sub-problems, in video frames using unsupervised techniques.
An efficient technique for handling occlusion when tracking walking humans.
A new strategy for performing correlation and prediction during the detection and tracking of humans.
Noticing and interpreting stance changes in walking movements.

1.5 Outline

The thesis is organized as follows:
Chapter 2 presents background and a review of the related literature on existing data fusion and computer vision strategies and approaches, while providing motivation for the approaches proposed in this thesis.
Chapter 3 provides a detailed definition of the unified framework. It shows how the framework helps accomplish the analysis tasks in multimedia content for correlation and prediction, and compares the proposed framework with the JDL and Dasarathy data fusion models.
Chapter 4 presents an overview of state-of-the-art methods for detecting humans in videos, followed by the proposed novel work, experiments, and evaluations.
Chapter 5 presents an overview of state-of-the-art methods for tracking humans in videos, followed by the proposed novel work, experiments, and evaluations.
Chapter 6 presents the automatic interpretation of stance changes in human walking.
Chapter 7 discusses conclusions, future directions, and related open issues.

References

Smeulders, A. W. M., Worring, M., Santini, S., Gupta, A., and Jain, R. (2000). Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12), 1349-1380.
Marr, D. (1982). Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. Freeman, San Francisco.
Dasarathy, B. V. (2001). Information fusion: what, where, why, when, and how? Information Fusion, 2, 75-76.
Atrey, P. K., Hossain, M. A., El Saddik, A., and Kankanhalli, M. S. (2010). Multimodal fusion for multimedia analysis: a survey. Multimedia Systems, 16(6), 345-379.