C022

EMOTION RECOGNITION FROM AUDIO AND VIDEO USING DEEP LEARNING

Mr. Ashwin A/L Govindha Rajan, Prof. Ts. Dr. Tan Shing Chiang

AFFILIATION
Faculty of Information Science & Technology, Multimedia University

Description of Invention

This project develops a deep learning framework for emotion recognition that fuses audio and video data from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). Audio features are extracted as Mel-Frequency Cepstral Coefficients (MFCCs), while video features are extracted with ResNet or EfficientNet. The fused features are then used to train and evaluate models such as AlexNet, a 2D CNN, and DenseNet. The resulting emotion recognition system has potential real-world applications such as customer service.
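The feature-level fusion described above can be sketched as follows. The specific dimensions (40 MFCC coefficients, a 512-dimensional video embedding) and the mean-pooling over audio frames are illustrative assumptions, not the project's exact configuration:

```python
import numpy as np

# Assumed dimensions (illustrative, not the project's actual settings):
# 40 MFCC coefficients per audio frame, and a 512-dim video embedding
# such as the pooled output of a ResNet backbone.
N_MFCC = 40
VIDEO_DIM = 512

def fuse_features(mfcc_frames: np.ndarray, video_embedding: np.ndarray) -> np.ndarray:
    """Pool MFCC frames over time, then concatenate with the video embedding."""
    audio_vec = mfcc_frames.mean(axis=1)  # (n_mfcc, n_frames) -> (n_mfcc,)
    return np.concatenate([audio_vec, video_embedding])

# Toy inputs standing in for real extracted features.
mfcc = np.random.randn(N_MFCC, 120)      # 120 audio frames
video = np.random.randn(VIDEO_DIM)

fused = fuse_features(mfcc, video)
print(fused.shape)                       # (552,) — input to the classifier
```

The concatenated vector would then be fed to one of the classifiers named above (e.g. a 2D CNN or DenseNet head) for emotion prediction.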