Speech synthesis

Deepvoice3 PyTorch - PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models.

WaveNet vocoder - Can generate high quality raw speech samples conditioned on linguistic or acoustic features.

Papercup - Translate your content into other languages with a voice that sounds like yours.

WaveNet implementation in Keras

nv-wavenet - CUDA reference implementation of autoregressive WaveNet inference.

PyTorch implementation of Tacotron speech synthesis model

Yet another WaveNet implementation in PyTorch

Flowtron - Auto-regressive flow-based generative network for text to speech synthesis.

A highly efficient, real-time text-to-speech system deployed on CPUs (2020) (HN)

Sonatic - Emotionally Expressive Text to Speech.

GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis

Ask HN: My wife might lose the ability to speak in 3 weeks – how to prepare? (2020)

DiffWave - Fast, high-quality neural vocoder and waveform synthesizer.

Voice Conversion with Non-Parallel Data

Speech Synthesis Papers

VoiceFilter - Unofficial PyTorch implementation of Google AI's VoiceFilter system. (Web)

ForwardTacotron - Generating speech in a single forward pass without any attention. (Web)

HiFi-GAN - Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis.

Parakeet - Text-to-speech toolkit (supporting WaveFlow, ClariNet, WaveNet, Deep Voice 3, Transformer TTS and FastSpeech).

pyttsx3 - Offline Text To Speech synthesis for Python.

SOVA TTS - Speech synthesis solution based on Tacotron 2 architecture.

eSpeak NG - Open source speech synthesizer that supports more than a hundred languages and accents.

PRiSM SampleRNN - Neural sound synthesis with TensorFlow 2.

Flite - Small fast portable speech synthesis system.

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech (2020) (Code) (Code)

Neural Granular Sound Synthesis (Code)

CLEESE - Combinatorial Expressive Speech Engine.

LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation

LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search (2021) (Code)

A Survey on Neural Speech Synthesis (2021) (Code)

Binaural Speech Synthesis - Code to train a mono-to-binaural neural sound renderer.

NN-SVS - Neural network-based singing voice synthesis library for research.

Larynx - End to end text to speech system using gruut and onnx, 50 voices, 9 languages.

WellSaid Labs - Voice Narration. Simplified.

Neural Waveshaping Synthesis - Efficient neural audio synthesis in the waveform domain. (Article)

Catch-A-Waveform: Learning to Generate Audio from a Single Short Example (Code)

TFGAN: Time and Frequency Domain Based Generative Adversarial Network for High-fidelity Speech Synthesis (2020) (Code)

EdiTTS: Score-based Editing for Controllable Text-to-Speech

PortaSpeech: Portable and High-Quality Generative Text-to-Speech (2021) (Code)

Speech Resynthesis from Discrete Disentangled Self-Supervised Representations (2021) (Code)

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge (2021) (Code)

Grail-rs - Rust speech synth.

RAVE: A variational autoencoder for fast and high-quality neural audio synthesis (2021) (Code)

WaveFlow: A Compact Flow-based Model for Raw Audio (2020) (Code)

VoiceFixer - Framework for general speech restoration.

TTS-RS - High-level Text-To-Speech (TTS) interface supporting various backends.

Speech synthesis using AVSpeechSynthesizer (2021)

Towards Lightweight Controllable Audio Synthesis with Conditional Implicit Neural Representations (2021) (Code)

TTS - Library for advanced Text-to-Speech generation. (Web) (HN)

YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

SubSync - Subtitle Speech Synchronizer. (Overview) (HN)

Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation (2021) (Code)

NATSpeech - Non-Autoregressive Text-to-Speech Framework.

VocBench: A Neural Vocoder Benchmark for Speech Synthesis (2021) (Code)

TransformerTTS - Text-to-Speech Transformer in TensorFlow 2.

Awesome Speech Recognition Speech Synthesis Papers

Neural Instrument Cloning from very few samples (2022) (Code)

MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis (2021) (Code)

IMS Toucan - Toolkit to train state-of-the-art Speech Synthesis models.

BDDM: Bilateral Denoising Diffusion Models for Fast and High-quality Speech Synthesis (2022)

Deep Learning for Emotional Text-to-speech - Summary of our attempts at using Deep Learning approaches for Emotional Text to Speech.

Nix-TTS - Incredibly Lightweight End-to-End Text-to-Speech Model via Non End-to-End Distillation.

xVA Synth - Machine learning based speech synthesis Electron app, with voices from specific characters from video games.

Bandwidth Extension is All You Need (2021) (Code)

TorToiSe - Multi-voice TTS system trained with an emphasis on quality.

Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech (2020) (Code)

UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation (2021) (Code)

TikTok TTS - Generate the funny TikTok lady voice (& more) in your browser. (Code)

TikTok Text-to-speech API - Simple Python script to interact with the TikTok TTS API.

Unreal Speech - Text-to-Speech API. Better & 8x Cheaper than AWS.

15.ai - Natural TTS with minimal viable data. (HN)

JDC-PitchExtractor - Deep Neural Pitch Extractor for Voice Conversion and TTS Training.

Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech (2021) (Code)

Publicly Available Emotional Speech Dataset (ESD) for Speech Synthesis and Voice Conversion

Mimic 3 - Fast local neural text to speech engine for Mycroft. (Intro) (HN)

DiffWave: A Versatile Diffusion Model for Audio Synthesis (2021) (Code)

FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis

HiFi-GAN - Training and inference scripts for the vocoder models in A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion.

Acoustic-Model - Training and inference scripts for the acoustic models in A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion.

HuBERT - Training and inference scripts for the HuBERT content encoders in A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion.

Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech (2021) (Code)

Diffsound: Discrete Diffusion Model for Text-to-sound Generation (Code)

DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation (2022) (Code)

Links