We propose the Fréchet Audio Distance (FAD), a novel, reference-free evaluation metric for music enhancement algorithms. We demonstrate how typical evaluation metrics for speech enhancement and blind source separation can fail to accurately measure the perceived effect of a wide variety of distortions. As an alternative, we propose adapting the Fréchet Inception Distance (FID) metric, which is used to evaluate generative image models, to the audio domain. FAD is validated using a wide variety of artificial distortions and is compared to three signal-based metrics: signal-to-distortion ratio (SDR), cosine distance, and magnitude L2 distance. We show that FAD, with a correlation coefficient of 0.52, correlates more closely with human perception than SDR, cosine distance, or magnitude L2 distance, whose correlation coefficients are 0.39, -0.15, and -0.01, respectively.
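The abstract does not spell out the computation, but FID, the metric being adapted, fits a multivariate Gaussian to each of two sets of embeddings and reports the Fréchet distance between them: F = ||μ_b − μ_e||² + tr(Σ_b + Σ_e − 2(Σ_b Σ_e)^{1/2}). The sketch below shows that computation under stated assumptions. It is not the authors' implementation. The embedding model is left out entirely: the rows of `emb_background` and `emb_eval` stand in for per-frame embeddings of clean reference music and of enhanced audio. The function names are hypothetical. The matrix square-root trace is computed with NumPy eigendecompositions instead of a general `sqrtm`.

```python
import numpy as np

def _sqrt_psd(mat):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    eigvals = np.clip(eigvals, 0.0, None)  # clip tiny negative values caused by round-off
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between the Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    s1_half = _sqrt_psd(sigma1)
    # tr((sigma1 @ sigma2)^(1/2)) equals tr((s1_half @ sigma2 @ s1_half)^(1/2)),
    # and the inner matrix is symmetric PSD, so its eigenvalues are real and >= 0.
    inner = s1_half @ sigma2 @ s1_half
    inner = (inner + inner.T) / 2.0  # remove asymmetry introduced by round-off
    tr_covmean = np.sum(np.sqrt(np.clip(np.linalg.eigvalsh(inner), 0.0, None)))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)

def frechet_audio_distance(emb_background, emb_eval):
    """Hypothetical helper: fit a Gaussian to each embedding set (one row per frame) and compare."""
    mu_b, mu_e = emb_background.mean(axis=0), emb_eval.mean(axis=0)
    sigma_b = np.cov(emb_background, rowvar=False)
    sigma_e = np.cov(emb_eval, rowvar=False)
    return frechet_distance(mu_b, sigma_b, mu_e, sigma_e)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=(2000, 8))               # placeholder "background" embeddings
    shifted = rng.normal(loc=0.5, size=(2000, 8))    # placeholder "evaluation" embeddings
    print(frechet_audio_distance(clean, clean))      # close to 0: identical distributions
    print(frechet_audio_distance(clean, shifted))    # larger: the means differ
```

No reference signal is paired sample by sample with the enhanced output. Only the embedding statistics of the two sets are compared, which is what makes the metric reference-free in the sense used above.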
Cite as: Kilgour, K., Zuluaga, M., Roblek, D., Sharifi, M. (2019) Fréchet Audio Distance: A Reference-Free Metric for Evaluating Music Enhancement Algorithms. Proc. Interspeech 2019, 2350-2354, doi: 10.21437/Interspeech.2019-2219
@inproceedings{kilgour19_interspeech, author={Kevin Kilgour and Mauricio Zuluaga and Dominik Roblek and Matthew Sharifi}, title={{Fréchet Audio Distance: A Reference-Free Metric for Evaluating Music Enhancement Algorithms}}, year=2019, booktitle={Proc. Interspeech 2019}, pages={2350--2354}, doi={10.21437/Interspeech.2019-2219} }