Speech synthesis
Links
- Deepvoice3 PyTorch - PyTorch implementation of convolutional neural network-based text-to-speech synthesis models.
- WaveNet vocoder - Can generate high quality raw speech samples conditioned on linguistic or acoustic features.
- Papercup - Translate your content into other languages with a voice that sounds like yours.
- WaveNet implementation in Keras
- nv-wavenet - CUDA reference implementation of autoregressive WaveNet inference.
- PyTorch implementation of Tacotron speech synthesis model
- Yet another WaveNet implementation in PyTorch
- Flowtron - Auto-regressive flow-based generative network for text to speech synthesis.
- A highly efficient, real-time text-to-speech system deployed on CPUs (2020) (HN)
- Sonatic - Emotionally Expressive Text to Speech.
- GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis
- Ask HN: My wife might lose the ability to speak in 3 weeks – how to prepare? (2020)
- DiffWave - Fast, high-quality neural vocoder and waveform synthesizer.
- Voice Conversion with Non-Parallel Data
- Speech Synthesis Papers
- VoiceFilter - Unofficial PyTorch implementation of Google AI's VoiceFilter system. (Web)
- ForwardTacotron - Generating speech in a single forward pass without any attention. (Web)
- HiFi-GAN - Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis.
- Parakeet - Text-to-speech toolkit (supporting WaveFlow, ClariNet, WaveNet, Deep Voice 3, Transformer TTS and FastSpeech).
- pyttsx3 - Offline Text To Speech synthesis for Python.
- SOVA TTS - Speech synthesis solution based on Tacotron 2 architecture.
- eSpeak NG - Open source speech synthesizer that supports more than a hundred languages and accents.
- PRiSM SampleRNN - Neural sound synthesis with TensorFlow 2.
- Flite - Small, fast, portable speech synthesis system.
- FastSpeech 2: Fast and High-Quality End-to-End Text to Speech (2020) (Code) (Code)
- Neural Granular Sound Synthesis (Code)
- CLEESE - Combinatorial Expressive Speech Engine.
- LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation
- LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search (2021) (Code)
- A Survey on Neural Speech Synthesis (2021) (Code)
- Binaural Speech Synthesis - Code to train a mono-to-binaural neural sound renderer.
- NN-SVS - Neural network-based singing voice synthesis library for research.
- Larynx - End-to-end text-to-speech system using gruut and ONNX, with 50 voices in 9 languages.
- WellSaid Labs - Voice Narration. Simplified.
- Neural Waveshaping Synthesis - Efficient neural audio synthesis in the waveform domain. (Article)
- Catch-A-Waveform: Learning to Generate Audio from a Single Short Example (Code)
- TFGAN: Time and Frequency Domain Based Generative Adversarial Network for High-fidelity Speech Synthesis (2020) (Code)
- EdiTTS: Score-based Editing for Controllable Text-to-Speech
- PortaSpeech: Portable and High-Quality Generative Text-to-Speech (2021) (Code)
- Speech Resynthesis from Discrete Disentangled Self-Supervised Representations (2021) (Code)
- Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge (2021) (Code)
- Grail-rs - Rust speech synth.
- RAVE: A variational autoencoder for fast and high-quality neural audio synthesis (2021) (Code)
- WaveFlow: A Compact Flow-based Model for Raw Audio (2020) (Code)
- VoiceFixer - Framework for general speech restoration.
- TTS-RS - High-level Text-To-Speech (TTS) interface supporting various backends.
- Speech synthesis using AVSpeechSynthesizer (2021)
- Towards Lightweight Controllable Audio Synthesis with Conditional Implicit Neural Representations (2021) (Code)
- TTS - Library for advanced Text-to-Speech generation. (Web) (HN)
- YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
- SubSync - Subtitle Speech Synchronizer. (Overview) (HN)
- Meta-StyleSpeech: Multi-Speaker Adaptive Text-to-Speech Generation (2021) (Code)
- NATSpeech - Non-Autoregressive Text-to-Speech Framework.
- VocBench: A Neural Vocoder Benchmark for Speech Synthesis (2021) (Code)
- TransformerTTS - Text-to-Speech Transformer in TensorFlow 2.
- Awesome Speech Recognition Speech Synthesis Papers
- Neural Instrument Cloning from very few samples (2022) (Code)
- MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis (2021) (Code)
- IMS Toucan - Toolkit to train state-of-the-art Speech Synthesis models.
- BDDM: Bilateral Denoising Diffusion Models for Fast and High-quality Speech Synthesis (2022)
- Deep Learning for Emotional Text-to-speech - Summary of our attempts at using deep learning approaches for emotional text-to-speech.
- Nix-TTS - Incredibly Lightweight End-to-End Text-to-Speech Model via Non End-to-End Distillation.
- xVA Synth - Machine learning-based speech synthesis Electron app, with voices of specific video game characters.
- Bandwidth Extension is All You Need (2021) (Code)
- TorToiSe - Multi-voice TTS system trained with an emphasis on quality.
- Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech (2020) (Code)
- UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation (2021) (Code)
- TikTok TTS - Generate the funny TikTok lady voice (& more) in your browser. (Code)
- TikTok Text-to-speech API - Simple Python script to interact with the TikTok TTS API.
- Unreal Speech - Text-to-Speech API. Better & 8x Cheaper than AWS.
- 15.ai - Natural TTS with minimal viable data. (HN)
- JDC-PitchExtractor - Deep Neural Pitch Extractor for Voice Conversion and TTS Training.
- Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech (2021) (Code)
- Publicly Available Emotional Speech Dataset (ESD) for Speech Synthesis and Voice Conversion
- Mimic 3 - Fast local neural text to speech engine for Mycroft. (Intro) (HN)
- DiffWave: A Versatile Diffusion Model for Audio Synthesis (2021) (Code)
- FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis
- HiFi-GAN - Training and inference scripts for the vocoder models in A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion.
- Acoustic-Model - Training and inference scripts for the acoustic models in A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion.
- HuBERT - Training and inference scripts for the HuBERT content encoders in A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion.
- Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech (2021) (Code)
- Diffsound: Discrete Diffusion Model for Text-to-sound Generation (Code)
- DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation (2022) (Code)