Automated Audio Captioning Using Transfer Learning and Reconstruction Latent Space Similarity Regularization

Read the full blog post about this research

Authors: A. Koh, X. Fuzhao, C. E. Siong

Published in: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022

DOI: 10.1109/ICASSP43922.2022.9747676

Abstract #

This paper presents a novel approach to automated audio captioning that leverages transfer learning and introduces reconstruction latent space similarity regularization. The method improves caption quality by learning robust audio representations through a combination of transfer learning and latent space constraints.

Key Contributions #

  • Transfer learning framework for audio captioning
  • Novel reconstruction latent space similarity regularization technique
  • Improved audio representation learning
  • Enhanced caption quality and coherence

Technologies & Methods #

  • Transfer learning from pre-trained models
  • Encoder-decoder architectures
  • Latent space regularization
  • Reconstruction-based learning
  • Audio feature extraction and processing
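The paper's exact loss formulation is not reproduced on this page, but the core idea of a reconstruction latent-space similarity penalty can be illustrated with a minimal sketch. The code below assumes cosine similarity between the encoder's latent vector and the latent vector of the reconstructed audio, scaled by a hypothetical `weight` hyperparameter; the function names and the choice of cosine similarity are illustrative assumptions, not the authors' published implementation.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two latent vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def latent_similarity_regularizer(z_encoder, z_reconstruction, weight=0.1):
    # Illustrative penalty (not the paper's exact formulation): encourage the
    # latent code of the reconstructed audio to stay close to the encoder's
    # latent code. Identical latents incur ~0 penalty; orthogonal latents
    # incur the full weight.
    return weight * (1.0 - cosine_similarity(z_encoder, z_reconstruction))

# Usage: add the penalty to the captioning loss during training.
z_enc = np.array([1.0, 0.0, 2.0])
z_rec = np.array([0.9, 0.1, 1.8])
penalty = latent_similarity_regularizer(z_enc, z_rec)  # small, near-identical latents
```

In training, this term would be summed with the usual caption cross-entropy loss, so the encoder is pushed toward representations that both support captioning and survive reconstruction.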

Research Impact #

This work was the first paper published from the author’s PhD research, introducing a regularization technique that improves the quality of automatically generated audio captions.

Citation #

A. Koh, X. Fuzhao and C. E. Siong, "Automated Audio Captioning Using Transfer Learning and Reconstruction Latent Space Similarity Regularization,"
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),
2022, doi: 10.1109/ICASSP43922.2022.9747676.