Mr. Ashwin A/L Govindha Rajan, Prof. Ts. Dr. Tan Shing Chiang
Description of Invention
This project develops a deep learning framework for emotion recognition from fused audio and video data drawn from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). Audio features are extracted as Mel Frequency Cepstral Coefficients (MFCCs), and video features are extracted with convolutional networks such as ResNet or EfficientNet. The two feature sets are fused, and classifiers including AlexNet, a 2D CNN, and DenseNet are trained and evaluated on the combined representation. The resulting emotion recognition system has potential use in real-world applications such as customer service.
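To make the fusion pipeline concrete, the following is a minimal Python sketch of the described approach, assuming the librosa and PyTorch/torchvision libraries. The helper names (extract_mfcc, extract_frame_features, FusionClassifier), the ResNet-18 backbone, and the small classifier head are illustrative choices, not the authors' actual implementation; RAVDESS labels eight emotion classes, which fixes the output dimension.

import librosa
import torch
import torch.nn as nn
import torchvision.models as models

def extract_mfcc(wav_path: str, n_mfcc: int = 40) -> torch.Tensor:
    """Load an audio clip and return a fixed-length MFCC feature
    vector by averaging the coefficients over time (illustrative pooling)."""
    y, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    return torch.from_numpy(mfcc.mean(axis=1)).float()       # (n_mfcc,)

# Pretrained ResNet-18 as the video-frame feature extractor; replacing the
# final classification layer with Identity yields a 512-d embedding per frame.
resnet = models.resnet18(weights="DEFAULT")
resnet.fc = nn.Identity()
resnet.eval()

def extract_frame_features(frame: torch.Tensor) -> torch.Tensor:
    """frame: (3, 224, 224) normalized RGB tensor -> (512,) embedding."""
    with torch.no_grad():
        return resnet(frame.unsqueeze(0)).squeeze(0)

class FusionClassifier(nn.Module):
    """Concatenates the audio and video feature vectors and maps the
    fused representation to the 8 RAVDESS emotion classes."""
    def __init__(self, audio_dim: int = 40, video_dim: int = 512,
                 n_classes: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(audio_dim + video_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, audio_feat: torch.Tensor,
                video_feat: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([audio_feat, video_feat], dim=-1)  # feature-level fusion
        return self.net(fused)

# Quick check with stand-in tensors in place of real extracted features:
audio_feat = torch.randn(40)    # stand-in for extract_mfcc("clip.wav")
video_feat = torch.randn(512)   # stand-in for extract_frame_features(frame)
logits = FusionClassifier()(audio_feat, video_feat)  # shape: (8,)

In this sketch the fusion is simple feature concatenation before a shallow classifier; deeper models such as AlexNet or DenseNet, as named above, would replace the small head while the MFCC and CNN feature extraction stages stay the same.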