
Gesture Recognition in Human-Computer Interaction

Description

This cluster of papers focuses on the research and development of gesture recognition systems, particularly in the context of human-computer interaction. The topics covered include hand gesture recognition, sign language recognition, depth sensor technology (e.g., Kinect), neural networks, real-time tracking, deep learning, and continuous recognition.

Keywords

Gesture Recognition; Human-Computer Interaction; Hand Gesture; Sign Language; Depth Sensor; Kinect Sensor; Neural Networks; Real-time Tracking; Deep Learning; Continuous Recognition

The recently developed depth sensors, e.g., the Kinect sensor, have provided new opportunities for human-computer interaction (HCI). Although great progress has been made by leveraging the Kinect sensor, e.g., in human body tracking, face recognition and human action recognition, robust hand gesture recognition remains an open problem. Compared to the entire human body, the hand is a smaller object with more complex articulations and more easily affected by segmentation errors. It is thus a very challenging problem to recognize hand gestures. This paper focuses on building a robust part-based hand gesture recognition system using Kinect sensor. To handle the noisy hand shapes obtained from the Kinect sensor, we propose a novel distance metric, Finger-Earth Mover's Distance (FEMD), to measure the dissimilarity between hand shapes. As it only matches the finger parts while not the whole hand, it can better distinguish the hand gestures of slight differences. The extensive experiments demonstrate that our hand gesture recognition system is accurate (a 93.2% mean accuracy on a challenging 10-gesture dataset), efficient (average 0.0750 s per frame), robust to hand articulations, distortions and orientation or scale changes, and can work in uncontrolled environments (cluttered backgrounds and lighting conditions). The superiority of our system is further demonstrated in two real-life HCI applications.
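To make the Earth Mover's Distance idea above concrete, the following is a toy sketch (not the paper's FEMD): each hand is reduced to a list of detected fingers, each described by an assumed (angle, length) pair, and two hands are compared with OpenCV's cv2.EMD on those finger signatures. The signature layout and features are illustrative assumptions.

```python
# Toy sketch of an EMD-based finger dissimilarity (not the paper's FEMD).
# Each signature row is [weight, feature...]; here a finger's normalized
# angular position and length are assumed as features, with length as weight.
import numpy as np
import cv2

def finger_signature(fingers):
    """fingers: list of (angle, length) tuples detected for one hand."""
    sig = np.array([[length, angle, length] for angle, length in fingers],
                   dtype=np.float32)          # weight = finger length
    sig[:, 0] /= sig[:, 0].sum()              # normalize weights
    return sig

def finger_emd(fingers_a, fingers_b):
    sig_a, sig_b = finger_signature(fingers_a), finger_signature(fingers_b)
    emd, _, _ = cv2.EMD(sig_a, sig_b, cv2.DIST_L2)
    return emd

# Example: a two-finger "V" pose vs. a slightly rotated version of it.
v_pose    = [(0.30, 0.9), (0.45, 0.9)]
v_rotated = [(0.35, 0.9), (0.50, 0.9)]
print(finger_emd(v_pose, v_rotated))
```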
We present a realtime hand tracking system using a depth sensor. It tracks a fully articulated hand under large viewpoints in realtime (25 FPS on a desktop without using a GPU) and with high accuracy (error below 10 mm). To our knowledge, it is the first system that achieves such robustness, accuracy, and speed simultaneously, as verified on challenging real data. Our system is made of several novel techniques. We model a hand simply using a number of spheres and define a fast cost function. Those are critical for realtime performance. We propose a hybrid method that combines gradient based and stochastic optimization methods to achieve fast convergence and good accuracy. We present new finger detection and hand initialization methods that greatly enhance the robustness of tracking.
The Leap Motion Controller is a new device for hand gesture controlled user interfaces with declared sub-millimeter accuracy. However, up to this point its capabilities in real environments have not been analyzed. Therefore, this paper presents a first study of a Leap Motion Controller. The main focus of attention is on the evaluation of the accuracy and repeatability. For an appropriate evaluation, a novel experimental setup was developed making use of an industrial robot with a reference pen allowing a position accuracy of 0.2 mm. Thereby, a deviation between a desired 3D position and the average measured positions below 0.2 mm has been obtained for static setups and of 1.2 mm for dynamic setups. The conclusions of this analysis can improve the development of applications for the Leap Motion Controller in the field of Human-Computer Interaction.
Automatic recognition of gestures using computer vision is important for many real-world applications such as sign language recognition and human-robot interaction (HRI). Our goal is a real-time hand gesture-based HRI interface for mobile robots. We use a state-of-the-art big and deep neural network (NN) combining convolution and max-pooling (MPCNN) for supervised feature learning and classification of hand gestures given by humans to mobile robots using colored gloves. The hand contour is retrieved by color segmentation, then smoothed by morphological image processing which eliminates noisy edges. Our big and deep MPCNN classifies 6 gesture classes with 96% accuracy, nearly three times better than the nearest competitor. Experiments with mobile robots using an ARM 11 533 MHz processor achieve real-time gesture recognition performance.
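As a hedged illustration of the preprocessing step described above (colour segmentation of a glove followed by morphological smoothing of the contour), here is a minimal OpenCV sketch. The HSV range is a placeholder for an assumed green glove, and the MPCNN classifier itself is not shown.

```python
# Minimal sketch of colour-glove segmentation and morphological smoothing
# (illustrative, not the paper's code): threshold in HSV, clean the mask with
# opening/closing, and keep the largest contour as the hand.
import cv2
import numpy as np

def hand_contour(bgr_frame, hsv_lo=(40, 60, 60), hsv_hi=(80, 255, 255)):
    hsv = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(hsv_lo), np.array(hsv_hi))  # assumed glove colour range
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # remove speckle noise
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # fill small holes, smooth edges
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea) if contours else None
```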
This paper presents a framework for hand gesture recognition based on the information fusion of a three-axis accelerometer (ACC) and multichannel electromyography (EMG) sensors. In our framework, the start and end points of meaningful gesture segments are detected automatically by the intensity of the EMG signals. A decision tree and multistream hidden Markov models are utilized as decision-level fusion to get the final results. For sign language recognition (SLR), experimental results on the classification of 72 Chinese Sign Language (CSL) words demonstrate the complementary functionality of the ACC and EMG sensors and the effectiveness of our framework. Additionally, the recognition of 40 CSL sentences is implemented to evaluate our framework for continuous SLR. For gesture-based control, a real-time interactive system is built as a virtual Rubik's cube game using 18 kinds of hand gestures as control commands. While ten subjects play the game, the performance is also examined in user-specific and user-independent classification. Our proposed framework facilitates intelligent and natural control in gesture-based interaction.
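The segmentation step described above, detecting start and end points from EMG intensity, can be sketched roughly as a moving RMS compared against onset/offset thresholds. The window size and thresholds below are illustrative assumptions, and the decision-tree/HMM fusion stage is not reproduced.

```python
# Hedged sketch of EMG-intensity gesture segmentation (not the paper's exact
# method): a moving RMS of the multichannel EMG is compared against onset and
# offset thresholds to find start and end points of candidate gesture segments.
import numpy as np

def segment_by_emg_intensity(emg, win=64, on_thr=0.15, off_thr=0.08):
    """emg: array of shape (n_samples, n_channels); returns (start, end) index pairs."""
    power = np.convolve((emg ** 2).mean(axis=1), np.ones(win) / win, mode="same")
    rms = np.sqrt(power)
    segments, start = [], None
    for i, v in enumerate(rms):
        if start is None and v > on_thr:          # onset: intensity rises above threshold
            start = i
        elif start is not None and v < off_thr:   # offset: intensity falls back down
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(rms) - 1))
    return segments
```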
Research in automatic analysis of sign language has largely focused on recognizing the lexical (or citation) form of sign gestures as they appear in continuous signing, and developing algorithms that scale well to large vocabularies. However, successful recognition of lexical signs is not sufficient for a full understanding of sign language communication. Nonmanual signals and grammatical processes which result in systematic variations in sign appearance are integral aspects of this communication but have received comparatively little attention in the literature. In this survey, we examine data acquisition, feature extraction and classification methods employed for the analysis of sign language gestures. These are discussed with respect to issues such as modeling transitions between signs in continuous signing, modeling inflectional processes, signer independence, and adaptation. We further examine works that attempt to analyze nonmanual signals and discuss issues related to integrating these with (hand) sign gestures. We also discuss the overall progress toward a true test of sign recognition systems--dealing with natural signing by native signers. We suggest some future directions for this research and also point to contributions it can make to other fields of research. Web-based supplemental materials (appendices) which contain several illustrative examples and videos of signing can be found at www.computer.org/publications/dlib.
Articulated hand-tracking systems have been widely used in virtual reality but are rarely deployed in consumer applications due to their price and complexity. In this paper, we propose an easy-to-use and inexpensive system that facilitates 3-D articulated user-input using the hands. Our approach uses a single camera to track a hand wearing an ordinary cloth glove that is imprinted with a custom pattern. The pattern is designed to simplify the pose estimation problem, allowing us to employ a nearest-neighbor approach to track hands at interactive rates. We describe several proof-of-concept applications enabled by our system that we hope will provide a foundation for new interactions in modeling, animation control and augmented reality.
A method for the representation, recognition, and interpretation of parameterized gesture is presented. By parameterized gesture we mean gestures that exhibit a systematic spatial variation; one example is a point gesture where the relevant parameter is the two-dimensional direction. Our approach is to extend the standard hidden Markov model method of gesture recognition by including a global parametric variation in the output probabilities of the HMM states. Using a linear model of dependence, we formulate an expectation-maximization (EM) method for training the parametric HMM. During testing, a similar EM algorithm simultaneously maximizes the output likelihood of the PHMM for the given sequence and estimates the quantifying parameters. Using visually derived and directly measured three-dimensional hand position measurements as input, we present results that demonstrate the recognition superiority of the PHMM over standard HMM techniques, as well as greater robustness in parameter estimation with respect to noise in the input features. Finally, we extend the PHMM to handle arbitrary smooth (nonlinear) dependencies. The nonlinear formulation requires the use of a generalized expectation-maximization (GEM) algorithm for both training and the simultaneous recognition of the gesture and estimation of the value of the parameter. We present results on a pointing gesture, where the nonlinear approach permits the natural spherical coordinate parameterization of pointing direction.
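A brief sketch of the linear parametric-HMM output model described above, written in notation of my own choosing rather than copied from the paper: each state's Gaussian output mean is an affine function of a global gesture parameter, and testing maximizes likelihood jointly over the state sequence and that parameter.

```latex
% Sketch of the linear PHMM output model (notation is mine): state j emits
% observations whose mean depends linearly on the gesture parameter \theta.
\[
  x_t \mid q_t = j,\ \theta \;\sim\; \mathcal{N}\!\bigl(\hat{\mu}_j(\theta),\, \Sigma_j\bigr),
  \qquad
  \hat{\mu}_j(\theta) \;=\; W_j\,\theta + \bar{\mu}_j ,
\]
% Recognition estimates \theta by maximizing the sequence likelihood under the
% trained model \lambda (an EM procedure, per the abstract above):
\[
  \hat{\theta} \;=\; \arg\max_{\theta}\; P\bigl(x_1,\dots,x_T \mid \lambda,\ \theta\bigr).
\]
```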
This paper presents a novel and real-time system for interaction with an application or video game via hand gestures. Our system includes detecting and tracking the bare hand in a cluttered background using skin detection and a hand posture contour comparison algorithm after face subtraction, recognizing hand gestures via bag-of-features and a multiclass support vector machine (SVM), and building a grammar that generates gesture commands to control an application. In the training stage, after extracting the keypoints for every training image using the scale invariant feature transform (SIFT), a vector quantization technique maps keypoints from every training image into a unified dimensional histogram vector (bag-of-words) after K-means clustering. This histogram is treated as an input vector for a multiclass SVM to build the training classifier. In the testing stage, for every frame captured from a webcam, the hand is detected using our algorithm; then the keypoints are extracted for every small image that contains the detected hand gesture only and fed into the cluster model to map them into a bag-of-words vector, which is finally fed into the multiclass SVM training classifier to recognize the hand gesture.
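For readers unfamiliar with the bag-of-features pipeline described above, here is a hedged sketch of the training stage using OpenCV and scikit-learn: SIFT descriptors feed a k-means visual vocabulary, each image becomes a bag-of-words histogram, and a multiclass SVM is trained on those histograms. Dataset loading and the hand-detection step are omitted; parameters are illustrative.

```python
# Sketch of a bag-of-features training stage (illustrative, not the authors'
# code): SIFT descriptors -> k-means vocabulary -> per-image histograms -> SVM.
import numpy as np
import cv2
from sklearn.cluster import KMeans
from sklearn.svm import SVC

sift = cv2.SIFT_create()

def descriptors(gray_img):
    _, desc = sift.detectAndCompute(gray_img, None)
    return desc if desc is not None else np.empty((0, 128), np.float32)

def bow_histogram(gray_img, kmeans, k):
    desc = descriptors(gray_img)
    hist = np.bincount(kmeans.predict(desc), minlength=k) if len(desc) else np.zeros(k)
    return hist / max(hist.sum(), 1)          # normalized bag-of-words vector

def train(train_images, labels, k=200):
    all_desc = np.vstack([descriptors(img) for img in train_images])
    kmeans = KMeans(n_clusters=k, n_init=10).fit(all_desc)   # visual vocabulary
    X = np.array([bow_histogram(img, kmeans, k) for img in train_images])
    clf = SVC(kernel="rbf", C=10).fit(X, labels)             # multiclass SVM
    return kmeans, clf
```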
A new method is developed using the hidden Markov model (HMM) based technique. To handle nongesture patterns, we introduce the concept of a threshold model that calculates the likelihood threshold of an input pattern and provides a confirmation mechanism for the provisionally matched gesture patterns. The threshold model is a weak model for all trained gestures in the sense that its likelihood is smaller than that of the dedicated gesture model for a given gesture. Consequently, the likelihood can be used as an adaptive threshold for selecting the proper gesture model. It has, however, a large number of states and needs to be reduced because the threshold model is constructed by collecting the states of all gesture models in the system. To overcome this problem, the states with similar probability distributions are merged, utilizing the relative entropy measure. Experimental results show that the proposed method can successfully extract trained gestures from continuous hand motion with 93.14% reliability.
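The decision rule above can be illustrated with a short, hedged sketch: a candidate segment is accepted only if its best gesture-model log-likelihood exceeds the threshold (garbage) model's log-likelihood. The models are assumed to be already trained (for example, hmmlearn GaussianHMM instances exposing a score() method); training and the state-merging step are not shown.

```python
# Illustration of the threshold-model decision rule (assumes per-gesture HMMs
# and a "threshold" HMM already trained, e.g. hmmlearn.hmm.GaussianHMM objects
# whose .score(X) returns the log-likelihood of an observation sequence).
def spot_gesture(segment, gesture_models, threshold_model):
    """segment: (T, d) observation sequence. Returns a gesture name or None."""
    scores = {name: model.score(segment) for name, model in gesture_models.items()}
    best_name = max(scores, key=scores.get)
    # Accept only if the dedicated model beats the adaptive likelihood threshold.
    if scores[best_name] > threshold_model.score(segment):
        return best_name
    return None  # non-gesture pattern
```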
We apply Convolutional Networks (ConvNets) to the task of traffic sign classification as part of the GTSRB competition. ConvNets are biologically-inspired multi-stage architectures that automatically learn hierarchies of invariant features. While many popular vision approaches use hand-crafted features such as HOG or SIFT, ConvNets learn features at every level from data that are tuned to the task at hand. The traditional ConvNet architecture was modified by feeding 1st stage features in addition to 2nd stage features to the classifier. The system yielded the 2nd-best accuracy of 98.97% during phase I of the competition (the best entry obtained 98.98%), above the human performance of 98.81%, using 32×32 color input images. Experiments conducted after phase I produced a new record of 99.17% by increasing the network capacity, and by using greyscale images instead of color. Interestingly, random features still yielded competitive results (97.33%).
The use of hand gestures provides an attractive alternative to cumbersome interface devices for human-computer interaction (HCI). In particular, visual interpretation of hand gestures can help in achieving the ease and naturalness desired for HCI. This has motivated a very active research area concerned with computer vision-based analysis and interpretation of hand gestures. We survey the literature on visual interpretation of hand gestures in the context of its role in HCI. This discussion is organized on the basis of the method used for modeling, analyzing, and recognizing gestures. Important differences in the gesture interpretation approaches arise depending on whether a 3D model of the human hand or an image appearance model of the human hand is used. 3D hand models offer a way of more elaborate modeling of hand gestures but lead to computational hurdles that have not been overcome given the real-time requirements of HCI. Appearance-based models lead to computationally efficient "purposive" approaches that work well under constrained situations but seem to lack the generality desirable for HCI. We also discuss implemented gestural systems as well as other potential applications of vision-based gesture recognition. Although the current progress is encouraging, further theoretical as well as computational advances are needed before gestures can be widely used for HCI. We discuss directions of future research in gesture recognition, including its integration with other natural modes of human-computer interaction.
Body posture and finger pointing are a natural modality for human-machine interaction, but first the system must know what it's seeing.
In many applications today user interaction is moving away from mouse and pens and is becoming pervasive and much more physical and tangible. New emerging interaction technologies allow developing and experimenting with new interaction methods on the long way to providing intuitive human computer interaction. In this paper, we aim at recognizing gestures to interact with an application and present the design and evaluation of our sensor-based gesture recognition. As input device we employ the Wii-controller (Wiimote), which recently gained much attention worldwide. We use the Wiimote's acceleration sensor independent of the gaming console for gesture recognition. The system allows the training of arbitrary gestures by users which can then be recalled for interacting with systems like photo browsing on a home TV. The developed library exploits Wii-sensor data and employs a hidden Markov model for training and recognizing user-chosen gestures. Our evaluation shows that we can already recognize gestures with a small number of training samples. In addition to the gesture recognition we also present our experiences with the Wii-controller and the implementation of the gesture recognition. The system forms the basis for our ongoing work on multimodal intuitive media browsing and is available to other researchers in the field.
Pfinder is a real-time system for tracking people and interpreting their behavior. It runs at 10 Hz on a standard SGI Indy computer, and has performed reliably on thousands of people in many different physical locations. The system uses a multiclass statistical model of color and shape to obtain a 2D representation of head and hands in a wide range of viewing conditions. Pfinder has been successfully used in a wide range of applications including wireless interfaces, video databases, and low-bandwidth coding.
Hand movement data acquisition is used in many engineering applications ranging from the analysis of gestures to the biomedical sciences. Glove-based systems represent one of the most important efforts aimed at acquiring hand movement data. While they have been around for over three decades, they keep attracting the interest of researchers from increasingly diverse fields. This paper surveys such glove systems and their applications. It also analyzes the characteristics of the devices, provides a road map of the evolution of the technology, and discusses limitations of current technology and trends at the frontiers of research. A foremost goal of this paper is to provide readers who are new to the area with a basis for understanding glove systems technology and how it can be applied, while offering specialists an updated picture of the breadth of applications in several engineering and biomedical sciences areas.
We present two real-time hidden Markov model-based systems for recognizing sentence-level continuous American sign language (ASL) using a single camera to track the user's unadorned hands. The first system observes the user from a desk mounted camera and achieves 92 percent word accuracy. The second system mounts the camera in a cap worn by the user and achieves 98 percent accuracy (97 percent with an unrestricted grammar). Both experiments use a 40-word lexicon.
The proliferation of accelerometers on consumer electronics has brought an opportunity for interaction based on gestures or physical manipulation of the devices. We present uWave, an efficient recognition algorithm for such interaction using a single three-axis accelerometer. Unlike statistical methods, uWave requires a single training sample for each gesture pattern and allows users to employ personalized gestures and physical manipulations. We evaluate uWave using a large gesture library with over 4000 samples collected from eight users over an elongated period of time for a gesture vocabulary with eight gesture patterns identified by a Nokia research. It shows that uWave achieves 98.6% accuracy, competitive with statistical methods that require significantly more training samples. Our evaluation data set is the largest and most extensive in published studies, to the best of our knowledge. We also present applications of uWave in gesture-based user authentication and interaction with three-dimensional mobile user interfaces using user created gestures.
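In the spirit of the single-template approach above, the following is a hedged sketch of template matching with dynamic time warping on 3-axis accelerometer sequences: one stored sample per gesture, classification by minimum warping cost. The quantization step used by uWave is omitted, so this is an illustration of the idea rather than the published algorithm.

```python
# Sketch of DTW-based template matching (one template per gesture, classify by
# minimum accumulated warping cost). Illustrative only; uWave's quantization
# and other details are not reproduced here.
import numpy as np

def dtw(a, b):
    """a, b: (T, 3) accelerometer sequences; returns accumulated DTW cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def classify(sequence, templates):
    """templates: dict mapping gesture name -> a single (T, 3) training sample."""
    return min(templates, key=lambda name: dtw(sequence, templates[name]))
```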
With the invention of the low-cost Microsoft Kinect sensor, high-resolution depth and visual (RGB) sensing has become available for widespread use. The complementary nature of the depth and visual information provided by the Kinect sensor opens up new opportunities to solve fundamental problems in computer vision. This paper presents a comprehensive review of recent Kinect-based computer vision algorithms and applications. The reviewed approaches are classified according to the type of vision problems that can be addressed or enhanced by means of the Kinect sensor. The covered topics include preprocessing, object tracking and recognition, human activity analysis, hand gesture analysis, and indoor 3-D mapping. For each category of methods, we outline their main algorithmic contributions and summarize their advantages/differences compared to their RGB counterparts. Finally, we give an overview of the challenges in this field and future research trends. This paper is expected to serve as a tutorial and source of references for Kinect-based computer vision researchers.
This paper describes a probabilistic syntactic approach to the detection and recognition of temporally extended activities and interactions between multiple agents. The fundamental idea is to divide the recognition problem into two levels. The lower level detections are performed using standard independent probabilistic event detectors to propose candidate detections of low-level features. The outputs of these detectors provide the input stream for a stochastic context-free grammar parsing mechanism. The grammar and parser provide longer range temporal constraints, disambiguate uncertain low-level detections, and allow the inclusion of a priori knowledge about the structure of temporal events in a given domain. We develop a real-time system and demonstrate the approach in several experiments on gesture recognition and in video surveillance. In the surveillance application, we show how the system correctly interprets activities of multiple interacting objects.
Gesture recognition pertains to recognizing meaningful expressions of motion by a human, involving the hands, arms, face, head, and/or body. It is of utmost importance in designing an intelligent and efficient human-computer interface. The applications of gesture recognition are manifold, ranging from sign language through medical rehabilitation to virtual reality. In this paper, we provide a survey on gesture recognition with particular emphasis on hand gestures and facial expressions. Applications involving hidden Markov models, particle filtering and condensation, finite-state machines, optical flow, skin color, and connectionist models are discussed in detail. Existing challenges and future research possibilities are also highlighted.
The variables that help make a handwritten signature a unique human identifier also provide a unique digital signature in the form of a stream of latency periods between keystrokes. This article describes a method of verifying the identity of a user based on such a digital signature, and reports results from trial usage of the system.
This paper presents Soli, a new, robust, high-resolution, low-power, miniature gesture sensing technology for human-computer interaction based on millimeter-wave radar. We describe a new approach to developing a radar-based sensor optimized for human-computer interaction, building the sensor architecture from the ground up with the inclusion of radar design principles, high temporal resolution gesture tracking, a hardware abstraction layer (HAL), a solid-state radar chip and system architecture, interaction models and gesture vocabularies, and gesture recognition. We demonstrate that Soli can be used for robust gesture recognition and can track gestures with sub-millimeter accuracy, running at over 10,000 frames per second on embedded hardware.
Sign Language Recognition (SLR) has been an active research field for the last two decades. However, most research to date has considered SLR as a naive gesture recognition problem. SLR seeks to recognize a sequence of continuous signs but neglects the underlying rich grammatical and linguistic structures of sign language that differ from spoken language. In contrast, we introduce the Sign Language Translation (SLT) problem. Here, the objective is to generate spoken language translations from sign language videos, taking into account the different word orders and grammar. We formalize SLT in the framework of Neural Machine Translation (NMT) for both end-to-end and pretrained settings (using expert knowledge). This allows us to jointly learn the spatial representations, the underlying language model, and the mapping between sign and spoken language. To evaluate the performance of Neural SLT, we collected the first publicly available Continuous SLT dataset, RWTH-PHOENIX-Weather 2014T. It provides spoken language translations and gloss level annotations for German Sign Language videos of weather broadcasts. Our dataset contains over 0.95M frames with >67K signs from a sign vocabulary of >1K and >99K words from a German vocabulary of >2.8K. We report quantitative and qualitative results for various SLT setups to underpin future research in this newly established field. The upper bound for translation performance is calculated at 19.26 BLEU-4, while our end-to-end frame-level and gloss-level tokenization networks were able to achieve 9.58 and 18.13 respectively.
We address the highly challenging problem of real-time 3D hand tracking based on a monocular RGB-only sequence. Our tracking method combines a convolutional neural network with a kinematic 3D hand model, such that it generalizes well to unseen data, is robust to occlusions and varying camera viewpoints, and leads to anatomically plausible as well as temporally smooth hand motions. For training our CNN we propose a novel approach for the synthetic generation of training data that is based on a geometrically consistent image-to-image translation network. To be more specific, we use a neural network that translates synthetic images to "real" images, such that the so-generated images follow the same statistical distribution as real-world hand images. For training this translation network we combine an adversarial loss and a cycle-consistency loss with a geometric consistency loss in order to preserve geometric properties (such as hand pose) during translation. We demonstrate that our hand tracking system outperforms the current state-of-the-art on challenging RGB-only footage.
We present a real-time on-device hand tracking pipeline that predicts the hand skeleton from a single RGB camera for AR/VR applications. The pipeline consists of two models: 1) a palm detector and 2) a hand landmark model. It is implemented via MediaPipe, a framework for building cross-platform ML solutions. The proposed model and pipeline architecture demonstrate real-time inference speed on mobile GPUs and high prediction quality. MediaPipe Hands is open sourced at https://mediapipe.dev.
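Since MediaPipe Hands is open source, a minimal usage sketch of its Python solution API is shown below: read webcam frames and print the 21 normalized hand landmarks per detected hand. The capture source and downstream logic are placeholders.

```python
# Minimal usage sketch of the MediaPipe Hands Python solution API: run the
# palm detector + landmark model on webcam frames and print the 21 landmarks.
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(static_image_mode=False,
                                 max_num_hands=1,
                                 min_detection_confidence=0.5,
                                 min_tracking_confidence=0.5)
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        for hand in results.multi_hand_landmarks:
            # 21 normalized (x, y, z) landmarks, indexed per the MediaPipe hand model
            print([(lm.x, lm.y, lm.z) for lm in hand.landmark])
    if cv2.waitKey(1) & 0xFF == 27:   # Esc to quit
        break
cap.release()
hands.close()
```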
Hand gestures are a form of nonverbal communication that can be used in several fields such as communication between deaf-mute people, robot control, human–computer interaction (HCI), home automation and medical applications. Research papers based on hand gestures have adopted many different techniques, including those based on instrumented sensor technology and computer vision. In other words, the hand sign can be classified under many headings, such as posture and gesture, as well as dynamic and static, or a hybrid of the two. This paper focuses on a review of the literature on hand gesture techniques and introduces their merits and limitations under different circumstances. In addition, it tabulates the performance of these methods, focusing on computer vision techniques that deal with the similarity and difference points, technique of hand segmentation used, classification algorithms and drawbacks, number and types of gestures, dataset used, detection range (distance) and type of camera used. This paper is a thorough general overview of hand gesture methods with a brief discussion of some possible applications.
Objective. In this article, we present data and methods for decoding speech articulations using surface electromyogram (EMG) signals. EMG-based speech neuroprostheses offer a promising approach for restoring audible speech in individuals who have lost the ability to speak intelligibly due to laryngectomy, neuromuscular diseases, stroke, or trauma-induced damage (e.g., from radiotherapy) to the speech articulators. Approach. To achieve this, we collect EMG signals from the face, jaw, and neck as subjects articulate speech, and we perform EMG-to-speech translation. Main results. Our findings reveal that the manifold of symmetric positive definite (SPD) matrices serves as a natural embedding space for EMG signals. Specifically, we provide an algebraic interpretation of the manifold-valued EMG data using linear transformations, and we analyze and quantify distribution shifts in EMG signals across individuals. Significance. Overall, our approach demonstrates significant potential for developing neural networks that are both data- and parameter-efficient, an important consideration for EMG-based systems, which face challenges in large-scale data collection and operate under limited computational resources on embedded devices.
Hand gesture recognition is an approach to comprehending human body language, applied in various fields such as human-computer interaction. However, some issues remain in edge blurring generated by complex backgrounds, rotation inaccuracy induced by fast movement, and delay caused by computing cost. Recently, the emergence of deep learning has ameliorated these issues: convolutional neural networks (CNNs) enhanced edge clarity, long short-term memory (LSTM) improved rotation accuracy, and attention mechanisms optimized response time. In this context, this review starts with the deep learning models, specifically CNN, LSTM, and attention mechanisms, which are compared and discussed in terms of the utilization rate of each, their contribution to improving accuracy or efficiency, and their role in the recognition stage, such as feature extraction. Furthermore, to evaluate the performance of these deep learning models, the evaluation metrics, datasets, and ablation studies are analyzed and discussed. The choice of evaluation metrics and dataset is critical since different tasks require different evaluation parameters, and the model learns more patterns and features from diverse data. Therefore, the evaluation metrics are categorized into accuracy and efficiency. The datasets are analyzed from self-created to public datasets. The ablation studies are summarized in four aspects: similar underlying models, integrating specific models, pre-processing, and others. Finally, the existing research gaps and further research on accuracy, efficiency, application range, and environmental adaptation are discussed.
Accurate human pose estimation is essential for anti-cheating detection in unattended truck scale systems, where human intervention must be reliably identified under challenging conditions such as poor lighting and small target pixel areas. This paper proposes a human joint detection system tailored for truck scale scenarios. To enable efficient deployment, several lightweight structures are introduced, among which an innovative channel hourglass convolution module is designed. By employing a channel compression-recover strategy, the module effectively reduces computational overhead while preserving network depth, significantly outperforming traditional grouped convolution and residual compression structures. In addition, a hybrid attention mechanism based on depthwise separable convolution is constructed, integrating spatial and channel attention to guide the network in focusing on key features, thereby enhancing robustness against noise interference and complex backgrounds. Ablation studies validate the optimal insertion position of the attention mechanism. Experiments conducted on the MPII dataset show that the proposed system achieves improvements of 8.00% in percentage of correct keypoints (PCK) and 2.12% in mean absolute error (MAE), alongside a notable enhancement in inference frame rate. The proposed approach promotes computational efficiency, system autonomy, and operational sustainability, offering a viable solution for energy-efficient, intelligent transportation systems, and long-term automated supervision in logistics and freight environments.
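One plausible reading of the "channel compression-recover" idea described above is a bottleneck-style block: squeeze channels with a 1x1 convolution, process at the reduced width, then restore the original channel count. The PyTorch sketch below follows that reading only; layer sizes are placeholders and this is not the paper's module.

```python
# Hedged sketch of a channel compress-recover block (an interpretation, not
# the paper's design): 1x1 conv squeeze -> 3x3 conv at reduced width -> 1x1
# conv recover, with a residual connection.
import torch
import torch.nn as nn

class ChannelHourglassBlock(nn.Module):
    def __init__(self, channels=128, squeeze_ratio=4):
        super().__init__()
        mid = channels // squeeze_ratio
        self.compress = nn.Conv2d(channels, mid, kernel_size=1)       # squeeze channels
        self.body = nn.Sequential(
            nn.Conv2d(mid, mid, kernel_size=3, padding=1),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True))
        self.recover = nn.Conv2d(mid, channels, kernel_size=1)        # restore channels

    def forward(self, x):
        return x + self.recover(self.body(self.compress(x)))          # residual add

block = ChannelHourglassBlock()
print(block(torch.randn(1, 128, 32, 32)).shape)   # torch.Size([1, 128, 32, 32])
```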
This paper presents the design and implementation of a real-time sign language recognition system focused on detecting static hand gestures representing the letters of the English alphabet (A–Z). The project leverages computer vision and machine learning to provide an accessible, low-cost tool for communication between hearing-impaired individuals and non-signers. The system captures webcam input, extracts hand landmarks using Google's Mediapipe framework, and classifies gestures through a trained Random Forest model. A Streamlit-based user interface displays the detected letters and enables real-time sentence construction with editing controls. The model, trained on a custom dataset, achieved an accuracy of 99.71% in testing. This paper discusses the system's methodology, architecture, dataset preparation, model training, and evaluation. The results highlight the potential of combining machine learning and vision-based solutions to support inclusive communication technologies. Key Words: Sign Language, Gesture Recognition, Mediapipe, Random Forest, Streamlit, Computer Vision, Accessibility, Human-Computer Interaction
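As a hedged sketch of the classification stage described above, the snippet below flattens 21 hand landmarks (for example from MediaPipe Hands) into a feature vector and trains a scikit-learn Random Forest. Dataset collection, the Streamlit interface, and the paper's exact feature normalization are not reproduced; the wrist-relative normalization is my own assumption.

```python
# Sketch of landmark-based letter classification (illustrative, not the
# paper's code): flatten 21 (x, y) hand landmarks into a 42-dim feature
# vector and train a Random Forest to predict the letter A-Z.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def landmarks_to_features(landmarks):
    """landmarks: list of 21 (x, y) pairs, e.g. from MediaPipe Hands,
    translated so the wrist is the origin to reduce position dependence."""
    pts = np.asarray(landmarks, dtype=np.float32)
    pts -= pts[0]                       # wrist-relative coordinates
    return pts.flatten()                # 42-dimensional feature vector

def train_letter_classifier(X, y):
    """X: (n_samples, 42) features, y: letter labels 'A'..'Z'."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y)
    clf = RandomForestClassifier(n_estimators=200).fit(X_tr, y_tr)
    print("held-out accuracy:", accuracy_score(y_te, clf.predict(X_te)))
    return clf
```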
This paper describes a real-time robotic hand control system using MediaPipe Hand for gesture recognition and wireless control via Bluetooth. The system provides three primary functionalities: individual finger control, real-time gesture simulation, and an interactive Rock-Paper-Scissors game. MediaPipe Hand, configured for robust hand detection, extracts 21 3D landmarks. These are utilized directly, via a rule-based algorithm, to control individual finger flexion/extension, and as input to a fine-tuned MobileNetV2-based CNN trained on 3600 augmented images for Rock-Paper-Scissors gesture classification. An HC-05 Bluetooth module (9600 bps baud rate) transmits commands to an Arduino-based, 3D-printed robotic hand developed by 3D LIFE Maker JuN Collaboration. Experiments using a Logitech C920 webcam (30 FPS) demonstrate high gesture recognition accuracy (97.5% for Rock-Paper-Scissors, with an 8 ms inference time) and an overall system response time under one second, ensuring responsive and engaging interaction. This research contributes to the development of more accessible and intuitive human-robot interfaces, with potential benefits for assistive technologies and human-computer interaction.
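The command path described above, sending finger states over a Bluetooth serial link at 9600 bps, can be sketched with pyserial as below. The port name and the single-byte-per-finger command format are assumptions for illustration, not the paper's protocol.

```python
# Hedged sketch of sending finger states over a Bluetooth serial link at the
# HC-05 default 9600 bps. Port name and command format are assumptions.
import serial

def send_finger_states(finger_states, port="/dev/rfcomm0"):
    """finger_states: list of 5 booleans (thumb..pinky), True = extended."""
    command = "".join("1" if s else "0" for s in finger_states) + "\n"
    with serial.Serial(port, baudrate=9600, timeout=1) as link:
        link.write(command.encode("ascii"))

if __name__ == "__main__":
    # Example: an open hand ("paper" in Rock-Paper-Scissors).
    send_finger_states([True, True, True, True, True])
```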
Purpose This paper aims to propose a novel action recognition method for shuttlecock-kicking using wearable inertial sensors, focusing on improving recognition accuracy through the analysis of angular time-series features and the application of deep learning models. Design/methodology/approach Skeletal data was collected using wearable inertial sensors, and time-series data of key skeletal points relevant to shuttlecock-kicking actions was extracted. An angular time-series feature analysis method was proposed to describe motion characteristics by analyzing changes in angles between key skeletal points. These features were used as input for classification models, including convolutional neural network (CNN), long short-term memory (LSTM) and support vector machine (SVM), whose performance was evaluated based on accuracy, precision, recall and F1 score. Findings The proposed CNN model, using the angular time-series recognition method (ATRM), achieved an average accuracy of 0.9681 and an F1 score of 96.99%, surpassing other input methods including accelerometer and gyroscope data. The CNN model clearly demonstrated the superior potential of combining angular time-series features for more accurate and stable recognition of shuttlecock-kicking actions than the LSTM and SVM models. Practical implications The method provided will benefit real-time sports virtual games and wearable technology applications. Originality/value This work proposed a novel ATRM for action recognition using wearable sensors. The method enhances recognition accuracy and efficiency, providing strong support for real-time sports analysis and wearable technology applications.
Human Pose Estimation (HPE) has become one of the most relevant topics in computer vision research. This technology can be applied in various fields such as video surveillance, medical care, and sports motion analysis. Due to the increasing demand for HPE, many libraries for this technology have been developed in the last 20 years. Since 2017, many HPE algorithms based on the skeletal model have been published and packaged into libraries for easy use by researchers. These libraries are important for researchers who want to integrate them into real-world applications for video surveillance, medical care, and sports motion analysis. This paper investigates the strengths and weaknesses of four popular advanced HPE libraries that can run on mobile devices: Lightweight OpenPose, PoseNet, MoveNet, and BlazePose.
Gesture recognition plays a vital role in computer vision, especially for interpreting sign language and enabling human-computer interaction. Many existing methods struggle with challenges like heavy computational demands, difficulty in understanding long-range relationships, sensitivity to background noise, and poor performance in varied environments. While CNNs excel at capturing local details, they often miss the bigger picture. Vision Transformers, on the other hand, are better at modeling global context but usually require significantly more computational resources, limiting their use in real-time systems. To tackle these issues, we propose a Hybrid Transformer-CNN model that combines the strengths of both architectures. Our approach begins with CNN layers that extract detailed local features from both the overall hand and specific hand regions. These CNN features are then refined by a Vision Transformer module, which captures long-range dependencies and global contextual information within the gesture. This integration allows the model to effectively recognize subtle hand movements while maintaining computational efficiency. Tested on the ASL Alphabet dataset, our model achieves a high accuracy of 99.97%, runs at 110 frames per second, and requires only 5.0 GFLOPs, much less than traditional Vision Transformer models, which need over twice the computational power. Central to this success is our feature fusion strategy using element-wise multiplication, which helps the model focus on important gesture details while suppressing background noise. Additionally, we employ advanced data augmentation techniques and a training approach incorporating contrastive learning and domain adaptation to boost robustness. Overall, this work offers a practical and powerful solution for gesture recognition, striking an optimal balance between accuracy, speed, and efficiency, an important step toward real-world applications.
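The element-wise multiplication fusion mentioned above can be illustrated with the hedged PyTorch sketch below: CNN features and transformer features are projected to a shared dimension and fused by element-wise product before classification. Layer sizes and the class count are placeholders, not the authors' architecture.

```python
# Hedged PyTorch sketch of element-wise multiplicative feature fusion
# (placeholder dimensions, not the paper's model): project both streams to a
# shared width, fuse by element-wise product, then classify.
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    def __init__(self, cnn_dim=512, vit_dim=768, fused_dim=256, num_classes=29):
        super().__init__()
        self.proj_cnn = nn.Linear(cnn_dim, fused_dim)   # project CNN stream
        self.proj_vit = nn.Linear(vit_dim, fused_dim)   # project transformer stream
        self.classifier = nn.Sequential(
            nn.LayerNorm(fused_dim), nn.ReLU(), nn.Linear(fused_dim, num_classes))

    def forward(self, cnn_feat, vit_feat):
        fused = self.proj_cnn(cnn_feat) * self.proj_vit(vit_feat)  # element-wise product
        return self.classifier(fused)

# Example with random feature vectors standing in for the two backbones.
head = FusionHead()
logits = head(torch.randn(8, 512), torch.randn(8, 768))   # (batch=8, 29 classes)
```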
Hand hygiene is paramount for public health, especially in critical sectors like healthcare and the food industry. Ensuring compliance with recommended hand washing gestures is vital, necessitating autonomous evaluation systems leveraging machine learning techniques. However, the scarcity of comprehensive datasets poses a significant challenge. This study addresses this issue by presenting an open synthetic hand washing dataset, created using 3D computer-generated imagery, comprising 96,000 frames (equivalent to 64 min of footage), encompassing eight gestures performed by four characters in four diverse environments. This synthetic dataset includes RGB images, depth/isolated depth images and hand mask images. Using this dataset, four neural network models, Inception-V3, Yolo-8n, Yolo-8n segmentation and PointNet, were trained for gesture classification. The models were subsequently evaluated on a large real-world hand washing dataset, demonstrating successful classification accuracies of 56.9% for Inception-V3, 76.3% for Yolo-8n and 79.3% for Yolo-8n segmentation. These findings underscore the effectiveness of synthetic data in training machine learning models for hand washing gesture recognition.
"Sign Language Detection" is a real-time system developed to bridge the communication gap between the deaf-mute community and the wider population by converting sign language gestures into text. The project … "Sign Language Detection" is a real-time system developed to bridge the communication gap between the deaf-mute community and the wider population by converting sign language gestures into text. The project leverages MediaPipe for accurate and efficient hand gesture tracking and integrates it with a lightweight TensorFlow model trained on datasets of Indian Sign Language (ISL) and International Sign Language (ASL) sourced from Kaggle. The system takes video input, detects and interprets hand gestures frame by frame, and translates them into meaningful text in real time. The frontend is built using HTML and CSS, with a backend powered by Flask for API integration and MongoDB for managing gesture data and user records. Designed to run efficiently even on low-resource systems, this project provides an accessible and scalable solution for enhancing communication for individuals with hearing and speech impairments.
Introduction Parkinson's Disease (PD) is a progressive neurodegenerative disorder that primarily impacts motor function and is prevalent among older adults worldwide. Gait performance (such as speed, stride, step, and so on) has been shown to play a significant role in diagnosis, treatment, and rehabilitation. Fortunately, advancements in computer science have provided several ways to calculate gait-related parameters, offering a more accurate alternative to the complex and often imprecise assessments traditionally relied upon by trained professionals. However, most of the current methods depend on data preprocessing and feature engineering, often require domain knowledge and laborious human involvement, and require additional manual adjustments when dealing with new tasks. Methods To reduce the model's reliance on data preprocessing, feature engineering, and traversal rules, we employed the Spatial-Temporal Graph Convolutional Networks (ST-GCN) model. We also defined five distinct states within a complete gait cycle: standstill (S), left swing (L), double support (D), right swing (R), and turnaround (T). Using ST-GCN, we extracted spatial and temporal patterns from these five states directly from the data, thereby enhancing the accuracy of gait parameter calculation. Furthermore, to improve the interpretability of the ST-GCN model and increase its clinical relevance, we trained the model on data from both healthy individuals and PD patients. This allowed us to explore how the model's parameters (different ST-GCN layers) could assist clinicians in understanding. Results The dataset used to evaluate the model in this paper includes motion data from 65 PD participants and 77 healthy control participants. Regarding the classification results from the 5 classifiers, ST-GCN achieved an average precision, recall, and F1-score of 93.48%, 93.21%, and 93.32%, outperforming both the Transformer-based and LSTM-based methods. Displaying the joints and edge weights from various layers of the ST-GCN, particularly when comparing data from healthy individuals and PD patients, enhances the model's feasibility and offers greater interpretability. This approach is more informative than relying on a purely black-box model. Conclusion This study demonstrated that the ST-GCN approach can effectively support accurate gait parameter assessment, assisting medical professionals in making diagnoses and reasonable rehabilitation plans for patients with PD.
Life drawing of people in motion demands advanced skill in graphic synthesis and an approach entirely different from the one used for fixed poses. This article describes the development of a freely accessible online resource to complement the practice of movement drawing, based on 3D scanning and animation of the models who pose in the Grado en Bellas Artes at the Universidad de La Laguna. Used as a support for life drawing, the digital resource provides the rehearsal needed before the later confrontation with a real, unposed model. To verify its application, a quasi-experimental design with convenience sampling was carried out. Results were then evaluated before and after its use, and qualitative data were added through a satisfaction questionnaire. It is concluded that use of the developed digital resource improves drawing of the real human figure in motion (a gain of 1.47 points overall) and that students show a positive acceptance of it in relation to drawing practice and improvement (3.98 out of 5).
Accelerometers are nowadays included in almost any portable or mobile device, including smartphones, smartwatches, wrist-bands, and even smart rings. The data collected from them is therefore an ideal candidate to … Accelerometers are nowadays included in almost any portable or mobile device, including smartphones, smartwatches, wrist-bands, and even smart rings. The data collected from them is therefore an ideal candidate to tackle human motion recognition, as it can easily and unobtrusively be acquired. In this work we analyze the performance of a hand-gesture classification system implemented using LSTM neural networks on a resource-constrained microcontroller platform, which required trade-offs between network accuracy and resource utilization. Using a publicly available dataset, which includes data for 20 different hand gestures recorded from 10 subjects using a wrist-worn device with a 3-axial accelerometer, we achieved nearly 90.25% accuracy while running the model on an STM32L4-series microcontroller, with an inference time of 418 ms for 4 s sequences, corresponding to an average CPU usage of about 10% for the recognition task.
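As a hedged companion to the microcontroller study above, the Keras sketch below defines a compact LSTM classifier for fixed-length 3-axis accelerometer windows, of the kind that could later be converted to TensorFlow Lite for an embedded target. The window length, layer sizes, and class count are illustrative, not the paper's configuration.

```python
# Hedged sketch of a compact LSTM gesture classifier for 3-axis accelerometer
# windows (hyperparameters are illustrative). A model like this is typically
# converted with tf.lite.TFLiteConverter before MCU deployment.
import tensorflow as tf

def build_model(timesteps=200, channels=3, num_gestures=20):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(timesteps, channels)),
        tf.keras.layers.LSTM(32),                       # kept small to fit MCU RAM
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(num_gestures, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()
model.summary()
```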
This paper presents a comprehensive real-time American Sign Language (ASL) recognition system that leverages deep learning and computer vision technologies to bridge communication gaps for the deaf and hard of hearing community. The system integrates MediaPipe for efficient hand landmark detection, OpenCV for image processing, and a custom Convolutional Neural Network (CNN) architecture implemented in TensorFlow for gesture classification. The solution employs an augmented AtoZ_3.1 dataset enhanced with geometric transformations, color variations, and noise-based augmentations to improve model robustness. The CNN architecture incorporates multiple convolutional layers with batch normalization, max pooling, dropout regularization, and a Softmax classifier, achieving over 98% validation accuracy on the ASL alphabet recognition task. The system features a PyQt5-based graphical user interface that provides real-time gesture visualization, text-to-speech conversion, word suggestions, and customizable themes. Multi-threading ensures a smooth user experience while maintaining prediction accuracy through temporal smoothing algorithms. Despite challenges including lighting sensitivity and single-hand gesture limitations, the system demonstrates significant potential for assistive communication applications and human-computer interaction interfaces. Key Words: American Sign Language, Deep Learning, Computer Vision, Real-time Recognition, Convolutional Neural Networks, MediaPipe, Assistive Technology, Human-Computer Interaction
This paper introduces a novel approach to hand motion gesture recognition by integrating the Fourier transform with hypergraph convolutional networks (HGCNs). Traditional recognition methods often struggle to capture the complex spatiotemporal dynamics of hand gestures. HGCNs, which are capable of modeling intricate relationships among joints, are enhanced by the Fourier transform to analyze gesture features in the frequency domain. A hypergraph is constructed to represent the interdependencies among hand joints, allowing for dynamic adjustments based on joint movements. Hypergraph convolution is applied to update node features, while the Fourier transform facilitates frequency-domain analysis. The T-Module, a multiscale temporal convolution module, aggregates features from multiple frames to capture gesture dynamics across different time scales. Experiments on the dynamic hypergraph (DHG14/28) and shape retrieval contest (SHREC'17) datasets demonstrate the effectiveness of the proposed method, achieving accuracies of 96.4% and 97.6%, respectively, and outperforming traditional gesture recognition algorithms. Ablation studies further validate the contributions of each component in enhancing recognition performance.
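To make the frequency-domain step above concrete, here is a hedged numpy sketch: take the Fourier transform of each joint-coordinate time series and keep a few low-frequency magnitudes as features. The joint count and coefficient count are placeholders, and the hypergraph convolution itself is not shown.

```python
# Hedged sketch of frequency-domain features for joint trajectories: FFT along
# the time axis, keep the first few magnitude coefficients per joint and axis.
import numpy as np

def frequency_features(joints, n_coeff=8):
    """joints: (T, J, 3) sequence of J hand-joint 3D positions over T frames.
    Returns a (J, 3, n_coeff) array of low-frequency FFT magnitudes."""
    spectrum = np.fft.rfft(joints, axis=0)          # FFT along the time axis
    return np.abs(spectrum[:n_coeff]).transpose(1, 2, 0)

feats = frequency_features(np.random.randn(64, 22, 3))
print(feats.shape)   # (22, 3, 8)
```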
Rather than using speech, people who are deaf or mute communicate with one another through a set of signs known as "sign language". Yet interacting with this community through signs is difficult for those who do not know sign language. To facilitate communication for the deaf public, an application that can identify sign language motions must be developed. Existing approaches for recognizing American Sign Language (ASL) achieve differing degrees of accuracy. This study aims to enhance the accuracy of current ASL identification approaches by putting forward a deep-learning model. A CNN was developed and trained to correctly recognize hand gestures that describe the ASL letters (A-Z). The proposed model performs exceptionally well, attaining high accuracy on the dataset, with a test accuracy of 99.97%. The model is a possible tool for practical applications in assistive technology for the hearing impaired, since the results show that it can distinguish between distinct ASL hand signs.
Purpose This study aims to achieve precise hand rotation angle regression using an economical RGB camera and to apply this technology for the teleoperation of an Elite robot in tasks requiring posture adjustment. Design/methodology/approach Leveraging the benefits of affordable RGB cameras while preserving the natural movement of human hands in teleoperation, the authors propose a novel regression model. It features an innovative hand distance metric and a dual-stream convolutional architecture for extracting global visual features. These are integrated using a weighted fusion structure to enhance angle regression accuracy. Findings Through ablation studies and teleoperation application experiments, the authors demonstrate that distance metrics based on the middle finger, coupled with ResNet-18 within the dual-stream model, significantly improve performance. Incorporating a weighted fusion structure achieved a minimal Mean Absolute Error of 1.53° and a maximum accuracy of 75.00% within a 2° error threshold. Furthermore, the model is suitable for teleoperating the Elite robot in precision tasks such as spooning and pouring sugar, demonstrating the effectiveness of the proposed method. Originality/value This paper presents an advanced model for hand rotation angle regression using an RGB camera, combining global visual and local geometric features. Through a weighted fusion strategy, regression precision is significantly improved. To the best of the authors' knowledge, this is the first instance of such high-accuracy regression achieved with a low-cost camera, offering a viable and innovative solution for teleoperation systems with limited budgets that also need to maintain human-like habits, and laying a data foundation for the application of large models in robotics.
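A minimal PyTorch sketch of a dual-stream regressor with a learnable weighted fusion is given below; the ResNet-18 visual stream follows the abstract, but the geometric-feature branch, fusion form, and dimensions are simplified assumptions rather than the authors' exact design.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class DualStreamAngleRegressor(nn.Module):
    def __init__(self, geom_dim=21):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()               # global visual features (512-d)
        self.visual = backbone
        self.geom = nn.Sequential(                # local geometric (hand-distance) features
            nn.Linear(geom_dim, 128), nn.ReLU(), nn.Linear(128, 512))
        self.w = nn.Parameter(torch.tensor(0.5))  # learnable fusion weight (assumed form)
        self.head = nn.Linear(512, 1)             # hand rotation angle (degrees)

    def forward(self, image, distances):
        v = self.visual(image)
        g = self.geom(distances)
        fused = self.w * v + (1.0 - self.w) * g   # weighted fusion of the two streams
        return self.head(fused).squeeze(-1)

model = DualStreamAngleRegressor()
angle = model(torch.randn(2, 3, 224, 224), torch.randn(2, 21))
print(angle.shape)  # torch.Size([2])
```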
International Research Journal of Modernization in Engineering Technology and Science
Paavani Jain | International Journal of Scientific Research in Engineering and Management
Abstract - The absence of standardized and easily available technology solutions for people with disabilities—especially those who use Indian Sign Language (ISL) to access vital services like healthcare—means that there are still significant communication hurdles in India. Unlike American Sign Language (ASL), which is mostly one-handed, ISL relies on intricate two-handed motions, which creates unique difficulties for software-based interpretation systems. The lack of extensive, standardized ISL datasets, which are essential for developing precise machine learning and gesture recognition models, exacerbates these difficulties even further. The lack of a complete ISL-based solution still prevents ISL users from accessing essential services like healthcare, even with improvements in sign language recognition technology. Although several platforms provide sign language translation, the majority are not prepared to deal with the particular needs of ISL. In addition to investigating recent developments in ISL translation, gesture recognition, and letter recognition, this project seeks to create an ISL communication system especially suited for hospital situations. The foundation for improved ISL accessibility in the healthcare industry and beyond will be laid by investigating fundamental techniques including deep learning, machine learning, and real-time processing. Key Words: Indian Sign Language (ISL), gesture recognition, real-time translation, accessibility, deep learning.
Shahab Fatima, M. Ali | International Journal of Information Technology and Computer Engineering
Effective communication is a vital aspect of human interaction, yet individuals with speech and hearing disabilities frequently encounter barriers in conveying their thoughts or understanding others. Conventional solutions such as sign language interpreters or visual aids are not always accessible, which can result in social isolation and reduced opportunities for participation. To address this issue, Allie – AI Voice and Sign Assistant has been developed as a smart Android-based mobile application that leverages artificial intelligence (AI) and mobile technology to enable seamless two-way communication. Allie offers two key features: (1) Voice/Text to Sign Translation, where spoken or typed input is translated into sign language using animated GIFs for commonly used words or phrases (e.g., "Hello", "Thank you") and static alphabet images (A–Z) for letter-by-letter expression; and (2) Sign to Text/Voice Recognition, which utilizes MediaPipe's HandLandmarker to detect hand landmarks and classifies them through a custom-trained TensorFlow Lite (TFLite) model. This enables users to see the recognized gesture as text and perform context-based actions such as launching apps (e.g., WhatsApp, YouTube, Spotify) through specific signs. The app is designed for flexibility and usability in both online and offline environments, with sign media stored in the res/drawable directory and optionally in cloud storage using Firebase or Supabase. The custom gesture classification model is built using landmark data collected through a dedicated in-app feature that records hand movements into CSV format. The app's interface is built using Kotlin, incorporating Jetpack Compose and ViewBinding to ensure a responsive and accessible user experience. This paper explores the app's architecture, development methodology, technological choices, and implementation details. It highlights how machine learning and computer vision contribute to building inclusive communication tools for differently-abled individuals. The target users include not only those with speech or hearing impairments but also their families, educators, and learners of sign language. Future enhancements include integration of text-to-speech functionality for gesture outputs, animated avatars for real-time sign rendering, and a built-in chatbot for conversational assistance. By combining AI and mobile innovation, Allie represents a step forward in bridging communication gaps and fostering inclusivity through technology.
This research presents an innovative real-time method for detecting leg postural abnormalities using deep learning techniques and smartphone sensors. The objectives are to: (1) develop a smartphone-based system for real-time classification of leg postures using accelerometer and gyroscope data, (2) evaluate the effectiveness of three deep learning models (DNN, CNN, and CNN-LSTM) in identifying spatial and temporal features, and (3) offer a low-cost, objective alternative to traditional assessment methods by addressing issues such as observer inconsistency and computational complexity. Accelerometer and gyroscope data from smartphones were used to develop a system that classifies four leg postures: Pronation, Supination, Normal, and Postural Sway. Participants from various age groups carried a smartphone in their left pocket while standing and walking for 10, 20, and 30 seconds. This process produced a dataset of 29,823 records, which were verified and labeled by medical professionals based on observed postural characteristics. The CNN-LSTM model achieved the highest accuracy (96.4%) with strong class differentiation, demonstrating its effectiveness in capturing temporal dependencies. All three models were applied to unknown instances, and a majority voting approach was used for the final classification. The proposed smartphone-based assessment system addresses limitations of traditional methods, such as inconsistencies due to subjective visual evaluations. The approach supports applications where leg posture is critical, such as military and sports assessments and disability certification, by offering an objective and accessible solution. Unlike video-based methods, it leverages widely available mobile technology, offering a low-computation, tamper-proof, and non-intrusive real-time surveillance system. Designed for automated and transparent evaluation, it has the potential to enhance the integrity of physical disability certifications.
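The majority-voting step mentioned above can be sketched as follows, assuming three already trained Keras-style models (DNN, CNN, CNN-LSTM) and a single sensor window; the tie-breaking rule is an assumption.

```python
# Minimal majority-voting sketch over three per-model predictions; the model
# objects themselves are assumed to exist and are not reproduced here.
from collections import Counter
import numpy as np

LABELS = ["Pronation", "Supination", "Normal", "Postural Sway"]

def majority_vote(window, models):
    """Each model returns class probabilities for one accelerometer/gyroscope window."""
    votes = [int(np.argmax(m.predict(window[None], verbose=0))) for m in models]
    winner, count = Counter(votes).most_common(1)[0]
    if count == 1:
        # Full disagreement: fall back to the last model's vote
        # (assumed here to be the strongest, the CNN-LSTM).
        winner = votes[-1]
    return LABELS[winner]

# usage (hypothetical models): majority_vote(window, [dnn, cnn, cnn_lstm])
```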
Human communication generally relies on speech. However, this is not applicable to deaf people, who depend on sign language for daily interactions. Unfortunately, not everyone is able to understand sign language. In higher education environments, the lack of individuals proficient in sign language often creates inequality in the learning process for deaf students. This limitation can be addressed by fostering a more inclusive environment, for example through the implementation of a sign language translation system. Therefore, this study aims to develop a machine learning model capable of detecting and translating Indonesian Sign Language (BISINDO) alphabet gestures. The model was built using transfer learning with the Xception Convolutional Neural Network (CNN) architecture. The dataset consists of 26 BISINDO alphabet gestures with a total of 650 images. The model was evaluated using K-Fold cross-validation and achieved an F1-score of 94% during testing.
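A possible shape for the Xception transfer-learning and K-fold evaluation pipeline is sketched below; image size, fold count, and training settings are guesses, not the reported configuration.

```python
# Sketch of Xception transfer learning with K-fold evaluation for a 26-class
# alphabet; not the paper's code, and all hyperparameters are assumptions.
import numpy as np
import tensorflow as tf
from sklearn.model_selection import KFold

def build_model(num_classes=26):
    base = tf.keras.applications.Xception(
        include_top=False, weights="imagenet",
        input_shape=(299, 299, 3), pooling="avg")
    base.trainable = False                         # freeze pretrained features
    out = tf.keras.layers.Dense(num_classes, activation="softmax")(base.output)
    model = tf.keras.Model(base.input, out)
    model.compile("adam", "sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

def kfold_evaluate(images, labels, k=5):
    """images: (N, 299, 299, 3) float array, labels: (N,) integer classes."""
    scores = []
    for train_idx, val_idx in KFold(n_splits=k, shuffle=True, random_state=42).split(images):
        model = build_model()
        model.fit(images[train_idx], labels[train_idx], epochs=10, verbose=0)
        _, acc = model.evaluate(images[val_idx], labels[val_idx], verbose=0)
        scores.append(acc)
    return float(np.mean(scores))
```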
This paper presents a new methodology for analyzing lip articulation during fingerspelling aimed at extracting robust visual patterns that can overcome the inherent ambiguity and variability of lip shape. The proposed approach is based on unsupervised clustering of lip movement trajectories to identify consistent articulatory patterns across different time profiles. The methodology is not limited to a single model; rather, it includes the exploration of varying cluster configurations and an assessment of their robustness, as well as a detailed analysis of the correspondence between individual alphabet letters and specific clusters. In contrast to direct classification based on raw visual features, this approach pre-tests clustered representations using a model-based assessment of their discriminative potential. This structured approach enhances the interpretability and robustness of the extracted features, highlighting the importance of lip dynamics as an auxiliary modality in multimodal sign language recognition. The obtained results demonstrate that trajectory clustering can serve as a practical method for generating features, providing more accurate and context-sensitive gesture interpretation.
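As an illustration of the clustering idea (not the paper's pipeline), the snippet below clusters flattened lip-trajectory features with K-means, scores the configuration with a silhouette coefficient, and builds a letter-to-cluster correspondence table; the trajectory representation is assumed.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_lip_trajectories(trajectories, letters, n_clusters=8):
    """trajectories: (num_samples, T, D) resampled lip-landmark paths (assumed format);
    letters: list of the fingerspelled letter for each sample."""
    X = trajectories.reshape(len(trajectories), -1)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    quality = silhouette_score(X, km.labels_)   # robustness check for this configuration
    # Letter-to-cluster correspondence: dominant cluster per letter.
    mapping = {}
    for letter in sorted(set(letters)):
        idx = [i for i, l in enumerate(letters) if l == letter]
        counts = np.bincount(km.labels_[idx], minlength=n_clusters)
        mapping[letter] = int(np.argmax(counts))
    return km, quality, mapping
```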
Arun D. Mahindrakar | International Journal of Scientific Research in Engineering and Management
Abstract - Deaf and hard-of-hearing people communicate with each other and with society by using sign language. Because gestures are also a natural way for people to interact with computers, many researchers are working to make such interaction less complex and more convenient. The main objective of gesture recognition research is therefore to build systems that can understand human gestures and use them to communicate information. Vision-based hand gesture interfaces should enable fast, highly accurate hand detection and real-time gesture identification. Learning and recognizing sign movements and gestures is the starting point for forming words and sentences in computer-assisted sign language interpretation. Both dynamic and static sign gestures are considered; both are important, even though static gesture recognition is much easier than dynamic gesture recognition. When a user enters alphabetic or numeric values as input, the system immediately displays the corresponding recognized character and shows the gesture on the monitor screen. The work described here builds on research toward a system that uses convolutional neural networks to identify characters on the basis of depth pictures and hand languages (Brain lipi). Keywords - Convolutional neural network, text recognition, convert to image, StackGAN, simultaneous tracking of text and converting to image, gesture recognition, display output as sign gesture, training machine (A-Z, 1-0).
Rania Binth Zubair, Sania Faheem, Mohammad Hashmi | International Journal of Information Technology and Computer Engineering
The evolution of human-computer interaction (HCI) has led to the emergence of more natural, intuitive methods of control, especially in the domain of gaming. Traditional input devices such as keyboards, mice, and joysticks, though reliable, can limit user immersion and interactivity. This project presents a novel hand gesture-based virtual driving simulator that utilizes OpenCV to detect and interpret real-time hand gestures, thereby offering an innovative and engaging alternative to conventional control mechanisms in driving games. The proposed system employs a standard webcam to capture live video input of the user's hand gestures. Using a combination of computer vision techniques such as contour analysis and convex hull algorithms, the system accurately identifies gestures corresponding to steering, acceleration, and braking. These recognized gestures are then mapped to simulated keyboard events, enabling control of a wide range of commercially available driving games. By eliminating the need for additional hardware, the system offers an affordable and accessible solution for enhancing user experience. Experimental results demonstrate the system's efficiency, achieving over 90% accuracy in gesture recognition and maintaining a response latency under 100 milliseconds, thereby ensuring smooth gameplay. The system was successfully tested with multiple racing simulators, receiving positive feedback from users who appreciated its intuitive and immersive nature. This project not only showcases the potential of vision-based gesture recognition in gaming but also sets a foundation for future enhancements involving complex gesture sets and virtual reality (VR) integration.
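A classical OpenCV sketch of the contour and convex-hull stage is shown below; the skin-color range, defect-depth threshold, and finger-count-to-command mapping are placeholders, and key simulation (e.g., via pyautogui) is only indicated in a comment.

```python
# Classical-CV sketch (not the project's exact pipeline): segment the hand,
# take the largest contour, and count convexity defects to decide a command.
import cv2
import numpy as np

def gesture_command(frame_bgr):
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (0, 30, 60), (20, 150, 255))        # rough skin range (assumed)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return "idle"
    hand = max(contours, key=cv2.contourArea)
    hull = cv2.convexHull(hand, returnPoints=False)
    defects = cv2.convexityDefects(hand, hull)
    fingers = 0
    if defects is not None:
        for s, e, f, depth in defects[:, 0]:
            if depth > 10000:          # deep defects roughly correspond to finger gaps
                fingers += 1
    # In the real system the command would be mapped to a simulated key press
    # (e.g., with pyautogui); here we just return a label.
    return {0: "brake", 1: "steer_left", 2: "steer_right"}.get(fingers, "accelerate")
```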
C. Srinivasa Kumar | International Journal of Scientific Research in Engineering and Management
Abstract - The Two-way Sign Language Translator is a Python-based desktop application designed to bridge communication gaps for individuals who are deaf or hard of hearing by enabling bidirectional translation between text and sign language. The system supports two primary functionalities: text-to-sign, which converts typed text into animated sign language gestures, and sign-to-text, which recognizes hand gestures captured via a webcam using a convolutional neural network (CNN). Developed with Tkinter for the graphical user interface, OpenCV for real-time video processing, Keras for machine learning, and Pytesseract for optical character recognition (OCR), the system integrates advanced computer vision and image processing techniques. Keywords—Bidirectional translation, CNN, graphical user interface, computer vision, Tkinter, image processing techniques.
Shobha S. Raskar | International Journal for Research in Applied Science and Engineering Technology
Sign language serves as a vital means of communication for individuals who are deaf or speech-impaired. Despite its growing use, a communication barrier still exists between signers and non-signers. Recent advances in computer vision and deep learning have enabled the development of gesture recognition systems that can bridge this gap. In this research, we propose a real-time sign language recognition system that uses transfer learning with MobileNetV2 and a custom classification head. The system captures American Sign Language (ASL) gestures through a webcam and converts them into corresponding text in real time. The model is trained on a preprocessed ASL dataset and achieves high accuracy using efficient neural architectures, data augmentation techniques, and optimized training workflows. The system includes gamified learning levels, ranging from easy to complex, that provide feedback, scoring, and progress tracking to promote consistent user engagement and structured skill development.
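A minimal version of the MobileNetV2 transfer-learning setup could look like the following; input size, head width, and dropout rate are assumptions.

```python
# Sketch only: frozen MobileNetV2 backbone with a small custom classification
# head for ASL letters; not the authors' exact architecture or training setup.
import tensorflow as tf

NUM_CLASSES = 26
base = tf.keras.applications.MobileNetV2(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# Webcam frames would be resized to 224x224 and passed through
# tf.keras.applications.mobilenet_v2.preprocess_input before prediction.
```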
Abstract - Sign language is a vital communication tool for individuals who are deaf or hard of hearing, yet it remains largely inaccessible to the wider population. This project aims to address this barrier by developing a sign language recognition system that converts hand gestures into text, followed by text-to-speech (TTS) conversion. The system utilizes Convolutional Neural Networks (CNNs) to recognize static hand gestures and translate them into corresponding textual representations. The text is then processed by a TTS engine, which generates spoken language, making it comprehensible to individuals who are not familiar with sign language. The approach leverages deep learning techniques to improve gesture recognition accuracy, particularly in diverse real-world scenarios. By training the CNN on a comprehensive dataset of sign language gestures, the model learns important features such as hand shape, orientation, and motion, which are critical for identifying specific signs. Keywords: Sign Language Recognition, Gesture to Text, Text to Speech (TTS), Convolutional Neural Networks (CNN), Deep Learning, Hand Gesture Recognition, Assistive Technology, Real-Time Translation, Speech Synthesis, Accessibility, Inclusivity, Communication Aid, Deaf and Hard of Hearing, Human-Computer Interaction, Static Hand Gestures
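The gesture-to-speech hand-off can be illustrated with a few lines using pyttsx3 as a stand-in TTS engine (the paper does not name the engine it uses); the label buffer is hypothetical.

```python
# Tiny sketch of the text-to-speech stage: recognized sign labels are joined
# into text and spoken with an offline TTS engine.
import pyttsx3

def speak_prediction(labels, engine=None):
    """labels: list of recognized sign labels, e.g. ['H', 'E', 'L', 'L', 'O']."""
    engine = engine or pyttsx3.init()
    text = "".join(labels)
    engine.say(text)
    engine.runAndWait()
    return text

# usage: speak_prediction(list("HELLO"))
```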
Yang Zou, Yanguang Wan, Y. Zhang, et al. | Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
In the realm of VR/MR interactions, gestures serve as a critical bridge between users and interfaces, with custom gestures enhancing creativity and providing a personalized immersive experience. We introduce a novel gesture definition and recognition framework that allows users to customize a wide array of gestures by demonstrating them just three times. A major challenge lies in effectively representing gestures computationally. To address this, we have pre-trained a hand posture representation model using a Vector Quantized Variational Autoencoder (VQ-VAE) with a codebook of adaptive size, allowing hand postures defined by 23 joint positions of the hand to be projected into a latent space. In this space, different postures are formed into clusters, and a testing posture can be assigned to a cluster by a specific distance metric. The dynamic gestures are then represented as sequences of discrete hand postures and wrist positions. Employing a straightforward sequence matching algorithm, our framework achieves highly efficient recognition with minimal computational demands. We evaluated this system through a user study that includes 16 pre-defined gestures and 106 user-defined gestures. The results confirm that our system can provide robust real-time gesture recognition and effectively supports the customization of gestures according to user preferences. Our approach surpasses previous methods by enhancing gesture diversity and reducing constraints on gesture customization. Project page: https://iscas3dv.github.io/GestureBuilder/.
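Conceptually, the recognition stage reduces to codebook quantization plus sequence matching; the sketch below uses nearest-neighbor assignment against an arbitrary codebook matrix and a plain edit distance, whereas the paper's own distance metric and matching algorithm may differ.

```python
import numpy as np

def quantize(posture_embedding, codebook):
    """Assign a latent posture vector to its nearest codebook index."""
    return int(np.argmin(np.linalg.norm(codebook - posture_embedding, axis=1)))

def edit_distance(a, b):
    """Levenshtein distance between two symbol sequences."""
    dp = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    dp[:, 0] = np.arange(len(a) + 1)
    dp[0, :] = np.arange(len(b) + 1)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            dp[i, j] = min(dp[i - 1, j] + 1, dp[i, j - 1] + 1,
                           dp[i - 1, j - 1] + (a[i - 1] != b[j - 1]))
    return int(dp[-1, -1])

def recognize(sequence, templates, max_dist=3):
    """templates: {gesture_name: [posture-symbol sequences from the 3 demonstrations]}."""
    best_name, best_d = None, max_dist + 1
    for name, demos in templates.items():
        d = min(edit_distance(sequence, demo) for demo in demos)
        if d < best_d:
            best_name, best_d = name, d
    return best_name if best_d <= max_dist else None
```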
Xie Zhang, Chengxiao Li, Chenshu Wu | Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
This paper presents the design and implementation of TAPOR, a privacy-preserving, non-contact, and fully passive sensing system for accurate and robust 3D hand pose reconstruction for around-device interaction using a single low-cost thermal array sensor. Thermal sensing using inexpensive and miniature thermal arrays emerges with an excellent utility-privacy balance, offering an imaging resolution significantly lower than cameras but far superior to RF signals like radar or WiFi. The design of TAPOR, however, is challenging, mainly because the captured temperature maps are low-resolution and textureless. To overcome the challenges, we investigate thermo-depth and thermo-pose properties, proposing a novel physics-inspired neural network that learns effective 3D spatial representations of potential hand poses. We then formulate the 3D pose reconstruction problem as a distinct retrieval task, enabling accurate hand pose determination from the input temperature map. To deploy TAPOR on IoT devices, we introduce an effective heterogeneous knowledge distillation method, reducing computation by 377×. TAPOR is fully implemented and tested in real-world scenarios, showing remarkable performance, supported by four gesture control and finger tracking case studies. We envision TAPOR to be a ubiquitous interface for around-device control and have open-sourced it at https://github.com/aiot-lab/TAPOR.
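The retrieval formulation can be pictured as a nearest-neighbor lookup over pose embeddings, as in the toy sketch below; the embedding network and gallery are abstracted away and are not TAPOR's actual components.

```python
import numpy as np

def retrieve_pose(temp_map, embed_fn, gallery_embeddings, gallery_poses):
    """temp_map: low-resolution temperature array; embed_fn: callable mapping a
    temperature map to a feature vector (stand-in for the learned network);
    gallery_poses: (M, 21, 3) candidate hand poses paired with gallery_embeddings."""
    q = embed_fn(temp_map)
    q = q / (np.linalg.norm(q) + 1e-8)
    g = gallery_embeddings / (np.linalg.norm(gallery_embeddings, axis=1, keepdims=True) + 1e-8)
    best = int(np.argmax(g @ q))   # cosine-similarity retrieval
    return gallery_poses[best]
```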
Gesture recognition technology is a pivotal element in human-computer interaction, enabling users to communicate with machines in a natural and intuitive manner. This paper introduces a novel approach to gesture recognition that enhances accuracy and robustness by integrating multiscale feature extraction and spatial attention mechanisms. Specifically, we have developed a multiscale feature extraction module inspired by the Inception architecture, which captures comprehensive features across various scales, providing a more holistic feature representation. Additionally, we incorporate a spatial attention mechanism that focuses on the image regions most relevant to the current gesture, thereby improving the discriminative power of the features. Extensive experiments conducted on multiple benchmark datasets demonstrate that our method significantly outperforms existing gesture recognition techniques in terms of accuracy.
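For concreteness, an Inception-style multiscale block followed by a simplified spatial-attention gate (a single sigmoid convolution mask) might be written as follows in Keras; filter counts, input size, and the surrounding network are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def multiscale_block(x, filters=32):
    # Parallel 1x1, 3x3, 5x5 convolutions plus a pooled branch, Inception-style.
    b1 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    b5 = layers.Conv2D(filters, 5, padding="same", activation="relu")(x)
    bp = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    bp = layers.Conv2D(filters, 1, padding="same", activation="relu")(bp)
    return layers.Concatenate()([b1, b3, b5, bp])

def spatial_attention(x):
    # Simplified spatial attention: a 7x7 conv produces a per-pixel sigmoid mask.
    attn = layers.Conv2D(1, 7, padding="same", activation="sigmoid")(x)
    return layers.Multiply()([x, attn])

inputs = layers.Input((64, 64, 3))
x = spatial_attention(multiscale_block(inputs))
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(10, activation="softmax")(x)   # e.g., 10 gesture classes (assumed)
model = tf.keras.Model(inputs, outputs)
```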