I'm working on a project with text-to-speech features (besides image recognition, voice input and estim features). For text-to-speech (TTS) I'm using the Web Speech API. The API is well documented and not complicated to implement (see e.g. https://developer.mozilla.org/en-US/doc ... Speech_API as a starting point).
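Here's a minimal TypeScript sketch of the Web Speech API approach, assuming a browser that exposes `speechSynthesis` and `SpeechSynthesisUtterance` on the global object. The `speak` helper, the `SpeakOptions` type and the small `...Like` interfaces are hypothetical names I made up for illustration. The interfaces only keep the snippet independent of the DOM type definitions. `cancel()` and `speak()` are the real `speechSynthesis` methods, and `lang`, `rate`, `pitch` and `volume` are real utterance properties.

```typescript
// Optional settings for one utterance; the defaults below are illustrative.
interface SpeakOptions {
  lang?: string;   // BCP 47 language tag, e.g. "en-US"
  rate?: number;   // 0.1 to 10, 1 is normal speed
  pitch?: number;  // 0 to 2, 1 is normal pitch
  volume?: number; // 0 to 1
}

// Minimal shapes of the Web Speech API objects used below (hypothetical
// helper types, so the sketch compiles without the DOM type library).
interface UtteranceLike {
  lang: string;
  rate: number;
  pitch: number;
  volume: number;
}
interface SynthesisLike {
  cancel(): void;
  speak(utterance: UtteranceLike): void;
}
type UtteranceCtor = new (text: string) => UtteranceLike;

// Reads `text` aloud with the browser's built-in TTS voices.
// Returns false when the Web Speech API is not available (e.g. outside a browser).
function speak(text: string, options: SpeakOptions = {}): boolean {
  const g = globalThis as unknown as {
    speechSynthesis?: SynthesisLike;
    SpeechSynthesisUtterance?: UtteranceCtor;
  };
  const synth = g.speechSynthesis;
  const Utterance = g.SpeechSynthesisUtterance;
  if (!synth || !Utterance) {
    return false;
  }
  const utterance = new Utterance(text);
  utterance.lang = options.lang ?? "en-US";
  utterance.rate = options.rate ?? 1;
  utterance.pitch = options.pitch ?? 1;
  utterance.volume = options.volume ?? 1;
  synth.cancel(); // drop anything still queued so the new text plays right away
  synth.speak(utterance);
  return true;
}
```

In a browser, something like `speak("Hello there", { rate: 0.9 })` should read the text aloud. Calling `cancel()` first is just a design choice so new text interrupts old text instead of queueing behind it. Some browsers may also block speech until the user has interacted with the page (e.g. a click).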
If you don't want to use TTS 'on the fly' and would rather convert text into MP3 files, there are several free options available (e.g. http://www.fromtexttospeech.com).
I'm always interested in sharing ideas about kinky software. Feel free to PM me.