Human Computer Interaction Where Controlling Computer And Applications Using Image Processing and Voice Recognition

ISB&M School of Technology, Pune


    CHAPTER 1

    1. INTRODUCTION

One of the important challenges in Human Computer Interaction is to develop more intuitive and more natural interfaces. Computing environments are presently tied to the availability of a high-resolution pointing device with a single, discrete, two-dimensional cursor. The modern graphical user interface (GUI), the current standard interface on personal computers (PCs), is well defined and provides an efficient way for a user to operate various applications on a computer. Yet GUIs combined with devices such as mice and track pads are extremely effective at reducing the richness and variety of human communication down to a single point.

While the utility of such devices in today's interfaces cannot be denied, many users find the GUI rather limiting when they try to perform tasks using gestures. There are opportunities to apply other kinds of sensors and techniques to enrich the experience of such users. For example, video cameras and computer vision techniques may be used to capture many details of human shape and movement. The shape of the hand may be analyzed over time to manipulate an onscreen object in a way analogous to the hand's manipulation of paper on a desk. Such an approach may lead to a faster, more natural, and more fluid style of interaction for certain tasks.

Ubiquitous computing is devoted to changing the relationship between humans and the computers with which we interact, allowing computers to become invisible and recede into the periphery of people's lives.

Our project, Human Computer Interaction Where Controlling Computer and Applications Using Image Processing and Voice Recognition, is an attempt at ubiquitous computing. We place colored tapes on the fingers: one tape controls cursor movement, the relative distance between two tapes triggers mouse click events, and a third, central tape drives gesture commands. We also enrich the system with voice recognition to perform basic actions such as shutdown, search and surfing. The system will thus provide a new experience for users interacting with the computer.
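To make this scheme concrete, the following minimal Java sketch (Java being our implementation language) thresholds one captured frame in HSV space to locate two tapes and drives the pointer through java.awt.Robot. The hue windows, the saturation and brightness cut-offs, and the 40-pixel click distance are illustrative assumptions, not the project's tuned values.

    import java.awt.Color;
    import java.awt.Robot;
    import java.awt.event.InputEvent;
    import java.awt.image.BufferedImage;

    public class TapePointerSketch {
        private final Robot robot;

        public TapePointerSketch() throws Exception {
            robot = new Robot();
        }

        // Returns the centroid of pixels whose hue falls in [loHue, hiHue], or null.
        static int[] findTape(BufferedImage frame, float loHue, float hiHue) {
            long sx = 0, sy = 0, count = 0;
            float[] hsb = new float[3];
            for (int y = 0; y < frame.getHeight(); y++) {
                for (int x = 0; x < frame.getWidth(); x++) {
                    int rgb = frame.getRGB(x, y);
                    Color.RGBtoHSB((rgb >> 16) & 0xFF, (rgb >> 8) & 0xFF, rgb & 0xFF, hsb);
                    // keep only saturated, reasonably bright pixels inside the hue window
                    if (hsb[0] >= loHue && hsb[0] <= hiHue && hsb[1] > 0.5f && hsb[2] > 0.3f) {
                        sx += x; sy += y; count++;
                    }
                }
            }
            return count == 0 ? null : new int[] { (int) (sx / count), (int) (sy / count) };
        }

        // One tape moves the cursor; a click fires when a second tape comes close to it.
        void update(BufferedImage frame) {
            int[] pointer = findTape(frame, 0.95f, 1.00f); // red-ish tape (assumed hue window)
            int[] clicker = findTape(frame, 0.30f, 0.45f); // green-ish tape (assumed hue window)
            if (pointer == null) return;
            robot.mouseMove(pointer[0], pointer[1]);       // frame coordinates, for brevity
            if (clicker != null
                    && Math.hypot(pointer[0] - clicker[0], pointer[1] - clicker[1]) < 40) {
                robot.mousePress(InputEvent.BUTTON1_MASK);
                robot.mouseRelease(InputEvent.BUTTON1_MASK);
            }
        }
    }

In the full system the tape centroid would additionally be scaled from camera coordinates to screen coordinates and smoothed across frames before being handed to Robot.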


    CHAPTER 2

2. PROBLEM DEFINITION

The project we are developing will change the way people use the computer system. In the proposed system, a camera detects hand gestures and a microphone captures voice commands, and together these control the computer and its applications.

This would lead to a new era of Human Computer Interaction (HCI) in which no physical contact with the device is required. The approach can be used in many media applications as well as in new product designs. For example, the advertisement industry can use such a natural user interface to connect users with advertisers more effectively.


    CHAPTER 3

    3. LITERATURE SURVEY

    3.1. RELATED WORK

A lot of research is being done in the fields of Human Computer Interaction (HCI) and robotics. Researchers have tried to control mouse movement using video devices for HCI; however, each of them used a different method to produce mouse cursor movement and clicking events.

3.1.1] A Method for Controlling Mouse Movement using a Real-Time Camera [1]
Hojoon Park

One approach, by Hojoon Park [1], used the index finger for cursor movement and the angle between the index finger and thumb for clicking events.

Working:

Park [1] used index finger movement to drive the mouse cursor on the computer, employing an effective algorithm to detect the fingers, and showed that the angle between finger and thumb can be used for clicking events.

To recognize whether a finger is inside the palm area or not, he used a convex hull algorithm, which solves the problem of finding the smallest convex polygon enclosing all vertices; using this property, fingertips on the hand can be detected. He used the same algorithm to recognize whether a finger is folded or spread: he scaled the hand radius by a factor of 2 (a number obtained through multiple trials) and checked the distance between the hand center and each pixel in the convex hull set. If the distance is longer than this scaled radius, the finger is spread. In addition, if two or more interesting points existed in the result, he regarded the farthest vertex as the index finger, and the hand gesture is read as a click when the result contains two or more vertices.
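A minimal sketch of that spread-finger test, assuming the palm centre, hand radius and convex hull vertices have already been computed by the earlier processing stages:

    import java.awt.Point;
    import java.util.ArrayList;
    import java.util.List;

    public class FingerSpreadSketch {
        // A hull vertex counts as a fingertip when its distance from the palm
        // centre exceeds the hand radius scaled by Park's empirical factor of 2.
        static List<Point> fingertips(List<Point> hull, Point center, double handRadius) {
            List<Point> tips = new ArrayList<Point>();
            for (Point p : hull) {
                if (center.distance(p) > 2.0 * handRadius) {
                    tips.add(p); // vertex lies well outside the palm: a spread finger
                }
            }
            return tips;
        }

        // Park reads two or more detected tips as a click gesture.
        static boolean isClickGesture(List<Point> tips) {
            return tips.size() >= 2;
        }
    }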

Advantages:

He developed a system to control the mouse cursor using a real-time camera and implemented all mouse tasks, such as left and right clicking, double clicking, and scrolling. The system is based on computer vision algorithms and can perform all mouse tasks.

Park's [1] system is simple and effective, and therefore less complicated: the algorithms are straightforward and take little time. The system is easy to use and can be used to control small applications.

Limitations:

In this project, the main problem was that the finger shook a lot. Since real-time video is used, the illumination changes every frame, so the detected position of the hand changes every frame, and with it the fingertip position found by the convex hull algorithm; the mouse pointer therefore shakes rapidly. To fix this problem, code was added so that the cursor does not move if the difference between the previous and current fingertip positions is within 5 pixels. This constraint worked well, but it makes it difficult to control the mouse cursor with fine sensitivity. Another problem caused by illumination is segmenting the background to extract the hand shape. Since the hand reflects all light sources, the hand color changes from place to place. If the hand shape is not segmented well, the algorithm cannot work, because it assumes a well-segmented hand shape; without it, the radius of the hand cannot be estimated.
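The 5-pixel stabilisation rule reduces to a small filter; the class and method names below are ours, for illustration only:

    import java.awt.Point;

    public class CursorStabiliser {
        private static final int JITTER_PX = 5; // threshold from Park's fix
        private Point last = new Point(0, 0);

        Point stabilise(Point fingertip) {
            if (fingertip.distance(last) <= JITTER_PX) {
                return last;      // treat small changes as camera noise
            }
            last = fingertip;     // real movement: follow the finger
            return fingertip;
        }
    }

Raising JITTER_PX steadies the pointer further, but at the cost of exactly the fine-grained control this limitation describes.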

3.1.2] Virtual Mouse Vision Based Interface, January 2004 [3]
Robertson P., Laddaga R., Van Kleek M. [3]

Working:

Their solution was to develop a virtual mouse that enables users to control a kiosk with hand signs and movements. The kiosk has a standard visual user interface, with an arrow cursor to indicate pointer movement. The user walks up to the kiosk; people approaching it are tracked by a robotic head called IGOR (Intelligent Gaze Oriented Robot) [3], described below. When the user makes a recognized hand sign, the kiosk lets the movement of the hand move the mouse pointer on the kiosk display, and separate hand signs allow clicking of the mouse buttons to make selections on the display.

Note that the arrow pointer is the only feedback the user gets as to where he or she is pointing. The user can use that feedback to adjust to imperfections in tracking, without the distraction of a distinct and different other signal.


Advantages:

In the face tracking state, face recognition has a higher priority than sign recognition; in the sign tracking state, optical flow is given the higher priority. In this way, good responsiveness is achieved in both user tracking and gesture tracking.

Optical flow allows smooth tracking of the hand gestures. It is robust because no recognition is required to achieve mouse motion, and it also provides smooth motion estimates.

Limitations:

The system is not capable of complex operations and cannot control high-end systems and applications, which lowers its scope.

3.1.3] Real-time Hand Tracking and Finger Tracking for Interaction [4]
Shahzad Malik, CSC2503F Project Report, December 18, 2003 [4]

Working:

This system is primarily based on the single-hand tracker presented in [Segen99] [4]. It can extract the 3D position and 2D orientation of the index finger of each hand and, when present, the pose of the thumb as well. In interactive applications, a single pointing gesture can then be used for selection operations, while the thumb and index finger together can form pinching gestures [4] to grasp and manipulate virtual objects.

Advantages:

The project presents the implementation and analysis of a real-time stereo-vision hand tracking system that can be used for interaction purposes. The system uses two low-cost web cameras mounted above the work area, facing downward. In real time, it can track the 3D position and 2D orientation of the thumb and index finger of each hand without the use of special markers or gloves, resulting in up to 8 degrees of freedom for each hand.

Limitations:

A misclassification problem occurs when two hands appear close together in the captured images: the background subtraction and skin detection phases segment a single large region, so the contour detector interprets the two hands as one hand and labels all the fingers as belonging to a single (right) hand.


3.1.4] Portable Vision-Based HCI - A Real-time Hand Mouse System on Handheld Devices [9]
Chu-Feng Lien [9]

Working:

They assume the popularity of low-resolution cameras on handheld devices. Using these embedded cameras, the system detects a user's hand motion in real time and autonomously manipulates the corresponding programs on the device. To run a vision-based HCI system on handheld devices, the computing power available for frame processing is a critical concern. Although the AdaBoost [9] method proposed by Viola and Jones seemed a natural choice for the project, it did not give good results with a low-resolution camera. Instead of gesture recognition methods, they therefore use Motion History Images: by grabbing and processing the image pixels directly, they obtain an efficient way of computing.
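A minimal sketch of the Motion History Image update this relies on, assuming 8-bit grayscale frames; the difference threshold and decay step are illustrative values, not Lien's:

    public class MotionHistorySketch {
        private static final int DIFF_THRESHOLD = 30; // per-pixel change (assumed)
        private static final int DECAY = 16;          // fade per frame (assumed)
        private final int[][] mhi;

        public MotionHistorySketch(int width, int height) {
            mhi = new int[height][width];
        }

        // gray and prevGray are 8-bit grayscale frames of identical size.
        void update(int[][] gray, int[][] prevGray) {
            for (int y = 0; y < mhi.length; y++) {
                for (int x = 0; x < mhi[0].length; x++) {
                    if (Math.abs(gray[y][x] - prevGray[y][x]) > DIFF_THRESHOLD) {
                        mhi[y][x] = 255;                            // fresh motion
                    } else {
                        mhi[y][x] = Math.max(0, mhi[y][x] - DECAY); // old motion fades
                    }
                }
            }
        }
    }

Recent movement thus shows up as bright trails whose direction and speed can be read directly from the image, with no per-gesture recognition step.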

Advantages:

Their system can find the projected screen, which is very useful for projection purposes. The system is convenient because it can be used on all portable devices, and low-resolution cameras are supported.

Limitations:

A high error rate on fast-moving motion (an issue that can be improved by increasing the frame rate) results in many false positive detections. If the speaker walks around the projected area continuously, the system adapts to this behaviour and performs well, but if the speaker suddenly stops in the middle of the screen, the system raises a false alarm. If the environmental lighting changes or a shadow is projected within the scope of the camera, the system is misled to a different result, and if the edge color of the projected screen is similar to its neighbouring objects, the screen will not be well detected.


    CHAPTER 4

4. SOFTWARE REQUIREMENT SPECIFICATION

    4.1 INTRODUCTION

As computer technology continues to develop, people have smaller and smaller electronic devices and want to use them ubiquitously. There is a need for new interfaces designed specifically for these smaller devices. Increasingly we recognize the importance of human computer interaction (HCI), and in particular vision-based gesture and object recognition. Simple interfaces already exist, such as the embedded keyboard, folding keyboard and mini-keyboard; however, these interfaces need some amount of space to use and cannot be used while moving. Touch screens are also a good control interface and are nowadays used globally in many applications. However, touch screens cannot be applied to desktop systems because of cost and other hardware limitations. By applying vision technology and controlling the mouse with natural hand gestures, we can reduce the work space required. In this report, we propose a novel approach that uses a video device to control the mouse system. This mouse system can carry out all mouse tasks, such as right and left clicking, double clicking and scrolling; we employ several image processing algorithms to implement this.

Our project, Human Computer Interaction Where Controlling Computer and Applications Using Image Processing and Voice Recognition, is an attempt at ubiquitous computing. We place colored tapes on the fingers: one tape controls cursor movement, the relative distance between two tapes triggers mouse click events, and a third, central tape drives gesture commands. We also enrich the system with voice recognition to perform basic actions such as shutdown, search and surfing. The system will thus provide a new experience for users interacting with the computer.


    4.2 SYSTEM FEATURES

We introduce an effective way of Human Computer Interaction in which the proposed system has the following modules:

Controlling the mouse movements through hand movements.
Controlling media player options through hand gestures.
Controlling applications like games, maps and image viewers.
Performing basic operations through voice recognition, such as search, shutdown and restart.

We capture the user's hand gestures through a webcam and process them using image processing techniques, and we use voice recognition to make interaction with the system more natural.

    4.3 EXTERNAL INTERFACE REQUIREMENT

There are several types of interfaces, such as user interfaces, software interfaces and hardware interfaces.

User Interfaces

The user interface for the software shall be compatible with the Windows operating system. The user interface is developed using the Java Media Framework, and the camera should be compatible with the system.

Software Interfaces

The system utilizes the JDK (1.6) framework, which provides the necessary components to build system components and objects, along with the required data access components.

Communication Interfaces

A graphical user interface is the most convenient way to interact with the system, so in our proposed system we have designed and developed a GUI using Java Swing classes.
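As an illustration, a minimal Swing shell for such a GUI might look like the following; the window title and placeholder label are ours, not the final design:

    import javax.swing.JFrame;
    import javax.swing.JLabel;
    import javax.swing.SwingUtilities;

    public class HciMainWindow {
        public static void main(String[] args) {
            SwingUtilities.invokeLater(new Runnable() { // build the UI on the event thread
                public void run() {
                    JFrame frame = new JFrame("HCI - Finger Mouse & Voice Control");
                    frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
                    frame.add(new JLabel("Camera preview goes here", JLabel.CENTER));
                    frame.setSize(640, 480);
                    frame.setVisible(true);
                }
            });
        }
    }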


4.4 FUNCTIONAL REQUIREMENT

Class                  Function           Requirement
Color_Tape_Detection   Color_Extraction   HSV Color Detection Algorithm
Color_Tape_Detection   blob_Detection     Histogram-based Skin Classifier
Cursor_Move            Mapping            Cursor Control Algorithm, Weighted Speed Cursor Control Algorithm
Voice_Recognition      Voice_Recognise    Dynamic Time Warping Algorithm

Table 1
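The Dynamic Time Warping algorithm named in Table 1 aligns a spoken feature sequence with a stored command template, and the command whose template yields the lowest warped distance is selected. A self-contained sketch follows; extracting the feature vectors (e.g. MFCC frames) is assumed to happen upstream:

    public class DtwSketch {
        // DTW distance between two sequences of feature vectors a and b.
        static double dtw(double[][] a, double[][] b) {
            int n = a.length, m = b.length;
            double[][] d = new double[n + 1][m + 1];
            for (int i = 0; i <= n; i++)
                for (int j = 0; j <= m; j++)
                    d[i][j] = Double.POSITIVE_INFINITY;
            d[0][0] = 0;
            for (int i = 1; i <= n; i++) {
                for (int j = 1; j <= m; j++) {
                    double cost = dist(a[i - 1], b[j - 1]);
                    // extend the cheapest of match / insertion / deletion
                    d[i][j] = cost + Math.min(d[i - 1][j - 1],
                                     Math.min(d[i - 1][j], d[i][j - 1]));
                }
            }
            return d[n][m];
        }

        // Euclidean distance between two feature vectors.
        static double dist(double[] x, double[] y) {
            double s = 0;
            for (int k = 0; k < x.length; k++) s += (x[k] - y[k]) * (x[k] - y[k]);
            return Math.sqrt(s);
        }
    }

Matching a length-n utterance against v templates of similar length costs O(n^2 v), which is the figure quoted for voice recognition in Annexure A.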

4.5 NON FUNCTIONAL REQUIREMENT

Performance Requirements

High speed: the system should process voice commands in parallel for various users so that it gives a quick response rather than making each user wait for process completion.
Better component design: to get better performance at peak time.

Security Requirements

Secure access to confidential data (user details). Information security means protecting information and information systems from unauthorized access, use, disclosure, disruption, modification or destruction.


1. User passwords must be stored in encrypted form for security reasons.
2. All user details shall be accessible only to highly authorized persons.
3. Access will be controlled with usernames and passwords.
4. The voice database must be secure.

Extensibility

Extensibility allows new components to be added to the system, or existing ones to be replaced, without affecting the components already in place. A flexible, service-based architecture is highly desirable for future extension.

Scalability

The solution should be able to accommodate a large number of users, who may be geographically distributed.

Compatibility

Compatibility is the measure of how far one type of application can be extended or combined with another.

Serviceability

In software and hardware engineering, serviceability, also known as supportability, is one of the aspects of IBM's RASU (Reliability, Availability, Serviceability, and Usability). It refers to the ability of technical support personnel to install, configure, and monitor computer products, identify exceptions or faults, debug or isolate faults to their root cause, and provide hardware or software maintenance in pursuit of solving a problem and restoring the product to service.


    4.6 ANALYSIS MODEL

    4.6.1 DFD level 0

    Figure 1

4.6.2 DFD level 1

    Figure 2


4.6.3 DFD level 2

    Figure 3

    4.6.4 Class Diagram:

    Figure 4


    4.6.5 ER Diagram

    Figure 5

4.7 SPECIFIC REQUIREMENTS

The system will be Windows based, supporting versions from Windows XP onwards. The minimum configuration required for the system is:

4.7.1 Hardware:

CPU: 2.4 GHz (Intel or AMD)
Hard disk: 80 GB
RAM: 1 GB
Camera: 2 megapixel (minimum) at 30 FPS (frames per second)


    4.7.2 Software:

    JDK 1.6

    NetBeans IDE

Java Advanced Imaging

    Java Media Framework

    SAPI

4.8 SYSTEM IMPLEMENTATION PLAN

Sr. No.  Planning                                    Start Date           Completion Date
1.       Topic Search and Finalization               4th July 2014        25th July 2014
2.       Literature Survey                           8th August 2014      21st August 2014
3.       Objective and Planning                      22nd August 2014     28th August 2014
4.       Software, Hardware Requirements & Budget    29th August 2014     5th September 2014
5.       DFD, UML Diagrams                           8th September 2014   10th September 2014
6.       Algorithm Design                            10th September 2014  13th September 2014
7.       Algorithm Analysis                          15th September 2014  21st October 2014
8.       Preliminary Report                          22nd October 2014    29th October 2014
9.       Study of Project-related Technology         1st November 2014    3rd January 2015
10.      Coding and Implementation of Project        5th January 2015     20th February 2015
11.      Working Model and Testing                   24th February 2015   15th March 2015
12.      Tested and Executable Project Model         16th March 2015      29th March 2015
13.      Final Report and Deployment of Project      2nd April 2015       20th April 2015

Table 2


4.9 BUDGET

Note: OSS = Open Source Software

Sr. No.  Product             Quantity  Cost
1.       Computer            1         20000
2.       Web Camera          1         2500
3.       Windows XP          1         3000
4.       NetBeans IDE 7.0.1  1         OSS
5.       JDK 1.6             1         OSS
6.       StarUML             1         OSS
         Total                         25500

Table 3


CHAPTER 5

5. SYSTEM DESIGN

5.1 SYSTEM ARCHITECTURE:

Our proposed system has three main blocks:

1. Functional Block
2. Voice Recognition Module
3. Technology

The functional block takes the camera feed as input and processes it through image processing algorithms, while the voice recognition module takes voice commands as input and processes them. Both the functional block and the voice recognition module perform their processing using the technology block.

    Figure 6


5.2 UML DIAGRAMS

5.2.1 Use Case Diagram

Use case diagrams are one of the five diagrams in the UML for modelling the dynamic aspects of systems (activity diagrams, statechart diagrams, sequence diagrams, and collaboration diagrams are the four other kinds of diagrams in the UML for modelling the dynamic aspects of systems). Use case diagrams are central to modelling the behaviour of a system, a subsystem, or a class. Each one shows a set of use cases and actors and their relationships. You apply use case diagrams to model the use case view of a system. For the most part, this involves modelling the context of a system, subsystem, or class, or modelling the requirements of the behaviour of these elements. Use case diagrams are important for visualizing, specifying, and documenting the behaviour of an element. They make systems, subsystems, and classes approachable and understandable by presenting an outside view of how those elements may be used in context. Use case diagrams are also important for testing executable systems through forward engineering and for comprehending executable systems through reverse engineering.

    Figure 7


5.2.2 Activity Diagram

An activity diagram shows the flow from activity to activity within a system. Activity diagrams address the dynamic view of a system: they are essentially flowcharts, emphasizing the flow of control from one activity to the next, and are especially important in modelling the function of a system.

    Figure 8


5.2.3 Sequence Diagram

A sequence diagram is an interaction diagram that emphasizes the time ordering of messages. It shows a set of objects and the messages sent and received by those objects. Sequence diagrams address the dynamic view of a system and are especially important in modelling the behaviour of collaborating objects in time order.

    Figure 9


CHAPTER 6

6. TECHNICAL SPECIFICATION

    6.1 TECHNOLOGY USED IN PROJECT

6.1.1 JDK 1.6

JDK (Java Development Kit) is a free software development package from Sun Microsystems that implements the basic set of tools needed to write, test and debug Java applications and applets.

6.1.2 Java Advanced Imaging

The Java Advanced Imaging API extends the Java 2 platform by allowing sophisticated, high-performance image processing to be incorporated into Java applets and applications. It is a set of classes providing imaging functionality beyond that of Java 2D and the Java Foundation Classes, though it is designed for compatibility with those APIs. The API implements a set of core image processing capabilities, including image tiling, regions of interest and deferred execution, and a set of core image processing operators, including many common point, area and frequency-domain operators.

The Java Advanced Imaging API is intended to meet the needs of technical imaging (medical, seismological, remote sensing, etc.) as well as commercial imaging (such as document production and photography). The API can benefit all Java developers who want to incorporate imaging into their Java applets and applications.

Features of JAI:

Rich set of functionality for digital imaging.
High level of extensibility, allowing arbitrary processing capabilities.
Support for a wide variety of data types.
Deferred execution.
Remote imaging and truly distributed imaging.
Multiple implementations with different trade-offs of memory usage, operator optimization and hardware acceleration.
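A small example of the deferred-execution style JAI encourages; the operator chain is only evaluated when the result is needed, and the file names here are placeholders:

    import java.awt.image.renderable.ParameterBlock;
    import javax.media.jai.JAI;
    import javax.media.jai.RenderedOp;

    public class JaiSketch {
        public static void main(String[] args) {
            // "fileload" builds a node in the processing graph; pixels are read on demand
            RenderedOp source = JAI.create("fileload", "frame.jpg");

            ParameterBlock pb = new ParameterBlock();
            pb.addSource(source);
            RenderedOp inverted = JAI.create("invert", pb); // a simple point operator

            ParameterBlock store = new ParameterBlock();
            store.addSource(inverted);
            store.add("frame-inverted.jpg"); // output file
            store.add("JPEG");               // output format
            JAI.create("filestore", store);  // forces evaluation of the chain
        }
    }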


6.1.3 Java Media Framework

JMF is a framework for handling streaming media in Java programs. JMF is an optional package of the Java 2 standard platform. It provides a unified architecture and messaging protocol for managing the acquisition, processing and delivery of time-based media. JMF enables Java programs to:

Present (play back) multimedia content.
Capture audio through a microphone and video through a camera.
Do real-time streaming of media over the Internet.
Process media (such as changing the media format or adding special effects).
Store media in a file.

Features of JMF:

JMF supports many popular media formats, such as JPEG, MPEG-1, MPEG-2, QuickTime, AVI, WAV, MP3, GSM, G723, H263 and MIDI, and popular media access protocols such as FILE, HTTP, HTTPS, FTP, RTP and RTSP.

JMF uses a well-defined event reporting mechanism that follows the Observer design pattern, and the Factory design pattern to simplify the creation of JMF objects. JMF supports the reception and transmission of media streams using the Real-time Transport Protocol (RTP), including management of RTP sessions.

JMF scales across different media data types, protocols and delivery mechanisms, and provides a plug-in architecture that allows it to be customized and extended. Technology providers can extend JMF to support additional media formats, and high-performance custom implementations of media players or codecs, possibly using hardware accelerators, can be defined and integrated with JMF.
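The following sketch shows how a single webcam frame can be grabbed through JMF for the image processing pipeline; "vfw://0" is the usual Video for Windows capture locator, but the exact locator varies from machine to machine:

    import java.awt.Image;
    import javax.media.Buffer;
    import javax.media.Manager;
    import javax.media.MediaLocator;
    import javax.media.Player;
    import javax.media.control.FrameGrabbingControl;
    import javax.media.format.VideoFormat;
    import javax.media.util.BufferToImage;

    public class JmfFrameGrab {
        public static void main(String[] args) throws Exception {
            Player player = Manager.createRealizedPlayer(new MediaLocator("vfw://0"));
            player.start();
            Thread.sleep(2000); // crude camera warm-up, for the sketch only

            // returns null if the device does not expose this control
            FrameGrabbingControl fgc = (FrameGrabbingControl)
                    player.getControl("javax.media.control.FrameGrabbingControl");
            Buffer buf = fgc.grabFrame();
            Image frame = new BufferToImage((VideoFormat) buf.getFormat()).createImage(buf);
            System.out.println("Grabbed " + frame.getWidth(null) + "x" + frame.getHeight(null));

            player.close();
        }
    }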

6.2 ADVANTAGES

The hand becomes an acceptable tool to control the computer cursor.
It enables people to interact with computers without physical contact.
It gives a more natural way to interact with the computer.
The finger mouse and voice control can also benefit other applications, such as commercials and interactive advertisements.


6.3 APPLICATIONS

Our system can be used for human computer interaction.
It can be used to control the computer.
It can be used to control various software applications.
It can be used to control PowerPoint presentations.
It can be used to control or play games.


    CHAPTER 7

    7. CONCLUSION

The product that we are trying to develop will improve the way people use the computer system. Presently, the webcam, microphone and mouse are an integral part of the computer system. Our product, which uses only two of them, the webcam and the microphone, may eliminate the mouse altogether. This would lead to a new era of Human Computer Interaction (HCI) in which no physical contact with the device is required.

This technology can be further enhanced for use in robotics, gaming and systems that understand human behaviour from the way people interact.


CHAPTER 8

8. REFERENCES

1. Hojoon Park, A Method for Controlling Mouse Movement using a Real-Time Camera.
2. Hart Lambur, Blake Shaw, Gesture Recognition, CS4731 Project, December 21, 2004.
3. Robertson P., Laddaga R., Van Kleek M., Virtual Mouse Vision Based Interface, January 2004.
4. Shahzad Malik, Real-time Hand Tracking and Finger Tracking for Interaction, CSC2503F Project Report, December 18, 2003.
5. A. Erdem, E. Yardimci, Y. Atalay, V. Cetin, Computer Vision Based Mouse, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2002.
6. Y. Sato, Y. Kobayashi, H. Koike, Fast Tracking of Hands and Fingertips in Infrared Images for Augmented Desk Interface, Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2000, pp. 462-467.
7. J. Segen, S. Kumar, Shadow Gestures: 3D Hand Pose Estimation Using a Single Camera, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1999, Vol. 1, pp. 479-485.
8. James M. Rehg, Takeo Kanade, DigitEyes: Vision-Based Hand Tracking for Human-Computer Interaction, Proceedings of the IEEE Workshop on Motion of Non-Rigid and Articulated Objects, Austin, Texas, November 1994, pp. 16-22.
9. Chu-Feng Lien, Portable Vision-Based HCI - A Real-time Hand Mouse System on Handheld Devices.
10. Asanterabi Malima, Erol Ozgur, Mujdat Cetin, A Fast Algorithm for Vision-Based Hand Gesture Recognition for Robot Control.
11. Stephen Tu, HSV Color Detection Algorithm, White Paper.


    ANNEXURE

    ANNEXURE A:

    Project Analysis of Algorithm Design

    Project Analysis:

Let G be a closed graph that represents our system of mouse simulation and application control, such that G = {E, V}, where E represents the set of edges, E = {e0, e1, e2, e3, ..., e14}, and V is the set of vertices, V = {v0, v1, v2, v3, ..., v10}.

In the graphical representation of the system, the vertices in V represent the modules, which are connected through the directed edges in E representing the inputs/outputs of the modules. We define the vertices as,

    Figure 10


    VERTEX MODULE

    v0 Initialize

    v1 Image Capture

    v2 Voice Capture

    v3 Finger Tape Colour Detect

    v4 Voice Recognition

    v5 Event Detect

    v6 Mouse Operation Events

v7 Operation Gesture Events

    v8 Key Strokes

    v9 Voice Operations

    v10 Aggregation

    Table 4

    We define the edges as,

    EDGE INPUT/OUTPUT

    e0 Call to camera

    e1 Call to microphone

    e2 Image frames

    e3 Voice records

    e4 Pixel position

    e5 Distance between tapes and wait time

    e6 Voice command

    e7 Operation Gesture Detected

    e8 Gesture For Key stroke

    e9 Voice Command


    e10 Call to aggregation module

    e11 -------------------------------

    e12 -------------------------------

    e13 -------------------------------

    e14 Iteration Call

    Table 5

Let fe be a mapping from E into V such that, for a given edge, it returns the vertex that edge leads to: fe: E -> V.

Thus, for our system,

fe(e0) = {v1} ...... v1 is called using e0 to capture images.
fe(e1) = {v2} ...... v2 is called using e1 to capture voice.
fe(e2) = {v3} ...... frames are passed to v3 using e2 for detection.
fe(e3) = {v4} ...... voice data is passed to v4 using e3 for recognition.
fe(e4) = {v6} ...... the pixel position is passed to v6 using e4 for cursor movement.
fe(e5) = {v5} ...... the distance between coloured tapes or the wait time is passed to v5 using e5 for event detection.
fe(e6) = {v5} ...... the voice command is passed to v5 using e6 for event detection.
fe(e7) = {v7} ...... v7 is called for an operation gesture event using e7.
fe(e8) = {v8} ...... v8 is called for a key stroke event using e8.
fe(e9) = {v9} ...... v9 is called for a voice operation event using e9.
fe(e10) = {v10}
fe(e11) = {v10}
fe(e12) = {v10} ...... e10, e11, e12 and e13 aggregate at v10.
fe(e13) = {v10}
fe(e14) = {v0} ...... v0 is called again to iterate using e14.


Overloading on {e4, e5, e6}

The mouse movement and events are overloaded using the pixel position (e4), the distance between coloured tapes and the wait time (e5), and voice commands (e6).

COMPLEXITY

Our system involves three main modules:

Image recognition and analysis (v3)
Voice recognition and analysis (v4)
Event selection (v5)

For a standard image recognition and analysis module, the complexity is given as

O(mn + (mn/k^2) log(mn/k^2))

where m x n is the width and height of the image and k x k is the segmentation blob size.

For a standard voice recognition module, the complexity is quadratic in nature and is given as

O(n^2 v)

where v is the number of words in the dictionary and n is the length of the input sequence.

The complexity of the event selection module depends on the number of events involved; for n events, the complexity is

O(n)

Thus, the total complexity of our system is given as

Total complexity = image recognition complexity + voice recognition complexity + event selection complexity
                 = O(mn + (mn/k^2) log(mn/k^2)) + O(n^2 v) + O(n)

Hence, the overall complexity of our system comes out to be nearly O(n^2).
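To put rough numbers on the image term (an illustrative calculation, assuming a 640 x 480 frame and base-2 logarithms): mn = 307200, and with k = 8 we get mn/k^2 = 4800 with log(4800) of about 12.2, so the term is roughly 307200 + 4800 x 12.2, i.e. about 3.7 x 10^5 operations per frame, comfortably within reach of the 2.4 GHz CPU specified in Section 4.7.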

P Class Problem

A problem is in the class P if it is solvable in polynomial time by a deterministic algorithm. For our system, the algorithms are deterministic and the overall complexity is O(n^2), which shows that it is in the class P.


    ANNEXURE B:

    Project Quality and Reliability Testing of Project Design

Testing is the process of evaluating a system or its component(s) with the intent of finding whether it satisfies the specified requirements. This activity records the actual results, the expected results and the differences between them. In simple words, testing is executing a system in order to identify any gaps, errors or missing requirements relative to the actual requirements.

There are different types of testing which may be used to test software during the SDLC:

Manual testing: testing the software manually, i.e. without using any automated tool or script.
Automation testing: also known as test automation, this is when the tester writes scripts and uses other software to test the software.

There are different methods which can be used for software testing:

Black box testing: the technique of testing without any knowledge of the interior workings of the application.
White box testing: the detailed investigation of the internal logic and structure of the code; also called glass testing or open box testing.
Grey box testing: a technique to test the application with limited knowledge of its internal workings.


    ANNEXURE C

    ABBREVIATIONS

    A

    AMD: Advanced Micro Devices

    API: Application Programming Interface

    AVI: Audio Video Interleave

    C

    CPU: Central Processing Unit

    D

    DFD: Data Flow Diagram

    E

    ER Diagram: Entity Relationship Diagram

    F

    FPS: Frame per Seconds

FTP: File Transfer Protocol

    G

    GB: GigaByte

GUI: Graphical User Interface

    GSM: Global System for Mobile

    H

    HCI: Human Computer Interaction

    HSV: Hue Saturation Value

    HTTP: Hyper Text Transfer Protocol

HTTPS: Hyper Text Transfer Protocol Secure

    I

    IGOR: Intelligent Gaze Oriented Robot

    IBM: International Business Machines

    IDE: Integrated Development Environment


    J

JDK (1.6): Java Development Kit 1.6

JAI: Java Advanced Imaging

    JMF: Java Media Framework

    JPEG: Joint Photographic Experts Group

    M

MPEG-1: Moving Picture Experts Group 1

MPEG-2: Moving Picture Experts Group 2

    MP3: MPEG-2 Audio Layer III

    MIDI: Musical Instrument Digital Interface

    O

OSS: Open Source Software

    P

    PC : Personal Computer

    R

RASU: Reliability, Availability, Serviceability, and Usability

    RAM: Random Access Memory

    RTP: Real-Time Transport Protocol

    RTSP: Real-Time Streaming Protocol.

    S

    SAPI: Speech Application Programming Interface

    SDLC: Software Development Life Cycle

    U

    UML: Unified Modelling Language

    W

WAV: Waveform Audio Format

    Numbers

    3D: Three Dimensional

    2D: Two Dimensional