Creative Text-to-Audio Generation via Synthesizer Programming

Singh*, N., Cherep*, M., & Shand, J. (2023, December). Creative Text-to-Audio Generation via Synthesizer Programming. In NeurIPS Machine Learning for Audio Workshop.


Sound designers have long harnessed the power of abstraction to distill and highlight the semantic essence of real-world auditory phenomena, akin to how simple sketches can vividly convey visual concepts. However, current neural audio synthesis methods lean heavily toward capturing acoustic realism. We introduce a novel, open-source method centered on meaningful abstraction. Our approach takes a text prompt and iteratively refines the parameters of a virtual modular synthesizer to produce sounds with high semantic alignment, as predicted by a pretrained audio-language model. Our results underscore the distinctiveness of our method compared with both real recordings and state-of-the-art generative models.
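The core loop the abstract describes — iteratively refining synthesizer parameters to raise a text-audio alignment score — can be sketched as a simple hill-climbing search. This is an illustrative toy, not the paper's implementation: `render` and `alignment_score` below are hypothetical stand-ins for a real modular synthesizer and a pretrained audio-language model (e.g., a CLAP-style similarity), and the actual method may use a different optimizer entirely.

```python
import random

def render(params):
    # Toy "synthesizer": stands in for rendering audio from patch
    # parameters; here the "audio" is summarized by the parameter mean.
    return sum(params) / len(params)

def alignment_score(audio, target=0.7):
    # Toy "audio-language model": stands in for text-audio similarity;
    # higher when the rendered audio is closer to a fixed target.
    return -abs(audio - target)

def optimize(n_params=8, iters=200, sigma=0.05, seed=0):
    # Hill-climbing over synthesizer parameters in [0, 1]:
    # perturb, re-render, and keep the candidate if alignment improves.
    rng = random.Random(seed)
    params = [rng.random() for _ in range(n_params)]
    best = alignment_score(render(params))
    for _ in range(iters):
        cand = [min(1.0, max(0.0, p + rng.gauss(0, sigma))) for p in params]
        score = alignment_score(render(cand))
        if score > best:
            params, best = cand, score
    return params, best
```

In the real system, the gradient-free search above would be replaced by whatever optimizer the paper uses, and `alignment_score` would query the pretrained audio-language model with the user's text prompt.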
