Implementation of Music Emotion Classification using Deep Learning
DOI:
https://doi.org/10.58915/ijaris.v1i1.2258

Keywords:
Music Emotion Classification, Deep Learning, CNN, CNN-LSTM, CNN-GRU, MFCC Extraction, Spectral Contrast

Abstract
Music plays a crucial role in shaping emotions and experiences, making its classification an important area of research with applications in therapy, recommendation systems, and affective computing. This study develops a deep learning-based system to classify music into three emotional categories: "Angry," "Happy," and "Sad." The dataset, consisting of 22 audio files collected from YouTube, was manually labelled, segmented into 30-second clips, and augmented with pitch shifting and time stretching to increase its diversity. Features were extracted using Mel-Frequency Cepstral Coefficients (MFCC) and spectral contrast to capture the harmonic and timbral characteristics of the audio. Three deep learning architectures were evaluated: CNN, CNN-LSTM, and CNN-GRU. The CNN-GRU model achieved the highest weighted accuracy of 99.10%, outperforming the CNN and CNN-LSTM models. Future work includes adding more emotion categories, diversifying the dataset, exploring advanced architectures such as transformers, optimising hyperparameters, implementing real-time applications, and conducting user studies to assess effectiveness. This research successfully developed and evaluated a music emotion classification system, contributing to advancements in the field.
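
As a rough illustration of the feature pipeline described above, the sketch below shows how a 30-second clip might be augmented with pitch shifting and time stretching and converted into MFCC and spectral contrast features using the librosa library. The file path, number of MFCCs, shift and stretch amounts, and helper names are illustrative assumptions, not values reported in the paper.

# Minimal sketch (assumed parameters, not the authors' exact settings):
# augment a 30-second clip and extract MFCC + spectral contrast features.
import numpy as np
import librosa

def augment(y, sr):
    """Return the original clip plus pitch-shifted and time-stretched versions."""
    return [
        y,
        librosa.effects.pitch_shift(y, sr=sr, n_steps=2),  # shift up 2 semitones (assumed)
        librosa.effects.time_stretch(y, rate=1.1),         # speed up by 10% (assumed)
    ]

def extract_features(y, sr, n_mfcc=13):
    """Stack frame-level MFCC and spectral contrast into one feature matrix."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)    # shape (n_mfcc, frames)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)  # shape (7, frames)
    return np.vstack([mfcc, contrast]).T                      # shape (frames, n_mfcc + 7)

# Hypothetical usage on one labelled 30-second clip
y, sr = librosa.load("clips/happy_001.wav", duration=30.0)
features = [extract_features(clip, sr) for clip in augment(y, sr)]

A CNN-GRU classifier of the kind compared in the study could be sketched in Keras as follows; the layer sizes, input shape, and training configuration are assumptions for illustration, not the architecture reported by the authors.

# Illustrative CNN-GRU sketch (layer sizes and input shape are assumed).
from tensorflow.keras import layers, models

n_frames, n_features = 1292, 20  # assumed: ~30 s of frames x (13 MFCC + 7 contrast bands)

model = models.Sequential([
    layers.Input(shape=(n_frames, n_features)),
    layers.Conv1D(64, kernel_size=3, activation="relu"),  # local spectral patterns
    layers.MaxPooling1D(pool_size=2),
    layers.GRU(64),                                        # temporal dynamics across frames
    layers.Dense(3, activation="softmax"),                 # "Angry" / "Happy" / "Sad"
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

Swapping the GRU layer for an LSTM layer gives the CNN-LSTM variant compared in the study.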