Vocal at 1.5 kbps
Audio reconstruction from discrete tokens using different models and configurations.
+CFG2 means classifier-free guidance with a guidance coefficient of 2.0.
+STFT means that an STFT loss was applied.
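As a rough illustration of what a guidance coefficient of 2.0 does, the sketch below blends a conditional and an unconditional model prediction. It assumes the common formulation `uncond + scale * (cond - uncond)`; this page does not say which variant RFWave uses, and the function name `cfg_combine` and the toy inputs are hypothetical.

```python
import numpy as np

def cfg_combine(cond, uncond, scale=2.0):
    """Blend conditional and unconditional predictions.

    Assumed formulation (not confirmed by this page):
        guided = uncond + scale * (cond - uncond)
    scale = 1.0 returns the conditional prediction unchanged;
    scale > 1.0 pushes the output further toward the condition.
    """
    cond = np.asarray(cond, dtype=float)
    uncond = np.asarray(uncond, dtype=float)
    return uncond + scale * (cond - uncond)

# Toy predictions standing in for model outputs (hypothetical values).
cond_pred = [1.0, 2.0]
uncond_pred = [0.0, 1.0]
guided = cfg_combine(cond_pred, uncond_pred, scale=2.0)
```

With a scale of 2.0, the difference between the two predictions is doubled before being added back to the unconditional one, which is why guided outputs tend to follow the conditioning tokens more closely.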
Ground Truth | EnCodec | MBD | RFWave | RFWave + STFT | RFWave + CFG2 | RFWave + CFG2 + STFT