Vocal at 1.5 kbps

Audio reconstruction from discrete tokens using different models and configurations.

Ground Truth
EnCodec
MBD
RFWave
RFWave + STFT
RFWave + CFG2
RFWave + CFG2 + STFT