Automated Audio Captioning Using Transfer Learning and Reconstruction Latent Space Similarity Regularization
Authors: A. Koh, X. Fuzhao, C. E. Siong
Published in: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
DOI: 10.1109/ICASSP43922.2022.9747676
Abstract #
This paper presents a novel approach to automated audio captioning that combines transfer learning with a reconstruction latent space similarity regularization. By constraining the latent space through a reconstruction objective, the model learns more robust audio representations, which in turn improves the quality and coherence of the generated captions.
Key Contributions #
- Transfer learning framework for audio captioning
- Novel reconstruction latent space similarity regularization technique
- Improved audio representation learning
- Enhanced caption quality and coherence
Technologies & Methods #
- Transfer learning from pre-trained models
- Encoder-decoder architectures
- Latent space regularization
- Reconstruction-based learning
- Audio feature extraction and processing
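To make the regularization idea above concrete, here is a minimal toy sketch of a reconstruction latent space similarity penalty. This is not the paper's implementation: the linear `encode`/`decode` maps, the cosine-similarity penalty, and the weighting factor `lam` are all illustrative assumptions. The idea sketched is that the latent code of a reconstructed input should stay close to the latent code of the original input, and the gap is added to the loss as a regularizer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy encoder/decoder: linear maps standing in for the
# audio encoder and reconstruction decoder (illustrative only).
W_enc = rng.standard_normal((64, 16)) * 0.1   # features -> latent
W_dec = rng.standard_normal((16, 64)) * 0.1   # latent -> reconstruction

def encode(x):
    # Project audio features into the latent space.
    return x @ W_enc

def decode(z):
    # Reconstruct audio features from the latent code.
    return z @ W_dec

def cosine_similarity(a, b, eps=1e-8):
    # Similarity between two latent vectors, in [-1, 1].
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def regularized_loss(x, lam=0.1):
    """Sketch of a reconstruction latent space similarity regularizer:
    penalize dissimilarity between the latent of the input and the
    latent of its reconstruction, on top of a reconstruction loss."""
    z = encode(x)                 # latent of the original input
    x_hat = decode(z)             # reconstructed input
    z_hat = encode(x_hat)         # latent of the reconstruction
    recon_loss = float(np.mean((x - x_hat) ** 2))
    sim_penalty = 1.0 - cosine_similarity(z, z_hat)
    return recon_loss + lam * sim_penalty

x = rng.standard_normal(64)
print(regularized_loss(x))        # total loss is non-negative
```

In a full captioning model this term would be added to the usual caption cross-entropy loss; here only the regularizer itself is shown.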
Research Impact #
This work was the first paper of the author’s doctoral research, introducing a regularization technique that improves the quality of automatically generated audio captions.
Citation #
A. Koh, X. Fuzhao and C. E. Siong, "Automated Audio Captioning Using Transfer Learning and Reconstruction Latent Space Similarity Regularization,"
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),
2022, doi: 10.1109/ICASSP43922.2022.9747676.