Automated Audio Captioning Using Transfer Learning and Reconstruction Latent Space Similarity Regularization

Read the full blog post about this research

Authors: A. Koh, X. Fuzhao, C. E. Siong

Published in: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022

DOI: 10.1109/ICASSP43922.2022.9747676

Abstract #

This paper presents a novel approach to automated audio captioning that leverages transfer learning and introduces reconstruction latent space similarity regularization. The method improves caption quality by learning robust audio representations through a combination of transfer learning and latent space constraints.

Key Contributions #

  • Transfer learning framework for audio captioning
  • Novel reconstruction latent space similarity regularization technique
  • Improved audio representation learning
  • Enhanced caption quality and coherence

Technologies & Methods #

  • Transfer learning from pre-trained models
  • Encoder-decoder architectures
  • Latent space regularization
  • Reconstruction-based learning
  • Audio feature extraction and processing
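The paper's exact loss formulation is not reproduced on this page, but the core idea of a reconstruction latent-space similarity penalty can be illustrated with a minimal sketch. The code below assumes cosine similarity between the encoder's latent vector and the latent vector of the reconstructed audio, scaled by a hypothetical `weight` hyperparameter; the function names and the choice of cosine similarity are illustrative assumptions, not the authors' published implementation.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two latent vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def latent_similarity_regularizer(z_encoder, z_reconstruction, weight=0.1):
    # Illustrative penalty (not the paper's exact formulation): encourage the
    # latent code of the reconstructed audio to stay close to the encoder's
    # latent code. Identical latents incur ~0 penalty; orthogonal latents
    # incur the full weight.
    return weight * (1.0 - cosine_similarity(z_encoder, z_reconstruction))

# Usage: add the penalty to the captioning loss during training.
z_enc = np.array([1.0, 0.0, 2.0])
z_rec = np.array([0.9, 0.1, 1.8])
penalty = latent_similarity_regularizer(z_enc, z_rec)  # small, near-identical latents
```

In training, this term would be summed with the usual caption cross-entropy loss, so the encoder is pushed toward representations that both support captioning and survive reconstruction.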

Research Impact #

This work was the first paper published from the author’s PhD research, introducing a regularization technique that improves the quality of automatically generated audio captions.

Citation #

A. Koh, X. Fuzhao and C. E. Siong, "Automated Audio Captioning Using Transfer Learning and Reconstruction Latent Space Similarity Regularization,"
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),
2022, doi: 10.1109/ICASSP43922.2022.9747676.