ISCA Archive Interspeech 2019

Fréchet Audio Distance: A Reference-Free Metric for Evaluating Music Enhancement Algorithms

Kevin Kilgour, Mauricio Zuluaga, Dominik Roblek, Matthew Sharifi

We propose the Fréchet Audio Distance (FAD), a novel, reference-free evaluation metric for music enhancement algorithms. We demonstrate how typical evaluation metrics for speech enhancement and blind source separation can fail to accurately measure the perceived effect of a wide variety of distortions. As an alternative, we propose adapting the Fréchet Inception Distance (FID) metric, which is used to evaluate generative image models, to the audio domain. FAD is validated using a wide variety of artificial distortions and is compared to the signal-based metrics signal-to-distortion ratio (SDR), cosine distance, and magnitude L2 distance. We show that, with a correlation coefficient of 0.52, FAD correlates more closely with human perception than SDR, cosine distance, or magnitude L2 distance, which have correlation coefficients of 0.39, -0.15, and -0.01, respectively.
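
As a rough illustration of the kind of computation an FID-style metric involves, the sketch below computes the Fréchet distance between two multivariate Gaussians fitted to sets of embedding vectors. It is not the authors' implementation: the function name `frechet_distance` and both parameter names are hypothetical, and the sketch assumes the audio clips have already been mapped to fixed-size embeddings by some model, a step the abstract does not describe. The formula used is the standard closed form, ||mu_r - mu_e||^2 + Tr(S_r + S_e - 2 (S_r S_e)^(1/2)).

```python
import numpy as np


def frechet_distance(emb_ref, emb_eval):
    """Frechet distance between Gaussians fitted to two embedding sets.

    Both arguments are (n_samples, n_dims) arrays of embeddings.
    How the embeddings are produced is outside this sketch (assumption).
    """
    # Fit a Gaussian (mean and covariance) to each embedding set.
    mu_r = emb_ref.mean(axis=0)
    mu_e = emb_eval.mean(axis=0)
    sigma_r = np.cov(emb_ref, rowvar=False)
    sigma_e = np.cov(emb_eval, rowvar=False)

    # Squared Euclidean distance between the means.
    diff = mu_r - mu_e
    mean_term = diff @ diff

    # Tr((S_r S_e)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of S_r S_e, which are real and non-negative in exact
    # arithmetic; clip small negative values caused by round-off.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_e)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    return float(mean_term + np.trace(sigma_r) + np.trace(sigma_e) - 2.0 * tr_sqrt)
```

Under these assumptions the distance is close to zero when both sets come from the same distribution and grows as the means or covariances of the two sets diverge. Because only the statistics of a reference set are needed, no clean counterpart of each evaluated clip is required, which matches the reference-free property described above.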


doi: 10.21437/Interspeech.2019-2219

Cite as: Kilgour, K., Zuluaga, M., Roblek, D., Sharifi, M. (2019) Fréchet Audio Distance: A Reference-Free Metric for Evaluating Music Enhancement Algorithms. Proc. Interspeech 2019, 2350-2354, doi: 10.21437/Interspeech.2019-2219

@inproceedings{kilgour19_interspeech,
  author={Kevin Kilgour and Mauricio Zuluaga and Dominik Roblek and Matthew Sharifi},
  title={{Fréchet Audio Distance: A Reference-Free Metric for Evaluating Music Enhancement Algorithms}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={2350--2354},
  doi={10.21437/Interspeech.2019-2219}
}