Coqui TTS is a text-to-speech (TTS) library that converts regular text into speech and is completely free to use. This is not true of the other text-to-speech engines used by text2speech.
Coqui TTS provides pre-trained TTS and vocoder models as part of its package. To get a sense of the best TTS and vocoder models, take a look at this GitHub Discussion post. In the Coqui TTS Hugging Face Space, you can experiment with a few of these models by entering text and listening to the resulting audio.
The underlying technology of text-to-speech is highly intricate and will not be the focus of this vignette. However, if you’re interested in delving deeper into the subject, here are some recommended talks:
Coqui TTS includes pre-trained spectrogram models (such as Tacotron2 and FastSpeech2), end-to-end models (including VITS and YourTTS), and vocoder models (like MelGAN and WaveGrad).
To install Coqui TTS, you will need to enter the following command in the terminal:
$ pip install TTS
Note: If you are using a Mac with an M1 chip, first run the following command in the terminal:
$ brew install mecab
Afterward, you can proceed to install TTS by executing the following command:
$ pip install TTS
To use Coqui TTS, text2speech needs to know the correct path to the Coqui TTS executable. This path can be obtained through two methods: manual and automatic.
You have the option to manually specify the path to the Coqui TTS executable in R. This is done by setting a global option with the set_coqui_path() function, which takes the path to the executable.
To determine the location of the Coqui TTS executable, you can enter the command which tts in the terminal.
Internally, the set_coqui_path() function runs options("path_to_coqui" = path) to set the provided path as the value of the path_to_coqui global option, as long as the Coqui TTS executable exists at that location.
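The manual route can be sketched in base R. This is only an illustration of the behaviour described above, not the package's source: set_coqui_path_sketch() is a hypothetical stand-in for set_coqui_path(), and Sys.which() is used as the R counterpart of running which tts in the terminal.

```r
# Hypothetical stand-in for set_coqui_path(); not the package's actual code.
set_coqui_path_sketch <- function(path) {
  # Refuse to store a path that does not point at an existing file.
  if (!file.exists(path)) {
    stop("No Coqui TTS executable found at: ", path)
  }
  options("path_to_coqui" = path)
  invisible(path)
}

# Sys.which() looks up an executable on the PATH, much like `which tts`.
# It returns "" (or NA on some platforms) when nothing is found.
found <- unname(Sys.which("tts"))
if (!is.na(found) && nzchar(found)) {
  set_coqui_path_sketch(found)
}

# NULL here means no path has been registered yet.
getOption("path_to_coqui")
```

Checking that the file exists before touching the option means a typo in the path fails immediately instead of surfacing later, when speech is requested.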
The functions tts_auth(service = "coqui"), tts_voices(service = "coqui"), and tts(service = "coqui") search through a predetermined list of known locations for the Coqui TTS executable. If none of these paths yields a valid TTS executable, an error message is generated, directing you to use set_coqui_path() to manually set the correct path.
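The automatic search could look roughly like the sketch below. The candidate locations and the helper name find_coqui_sketch() are assumptions made for illustration; the list that text2speech actually checks may differ.

```r
# Hypothetical candidate locations; the package's real list may differ.
candidate_paths <- c(
  getOption("path_to_coqui", default = ""),
  unname(Sys.which("tts")),
  "/usr/local/bin/tts",
  "/opt/homebrew/bin/tts",
  path.expand("~/.local/bin/tts")
)

# Return the first candidate that exists, or fail with a pointer to the
# manual route.
find_coqui_sketch <- function(paths) {
  paths <- paths[!is.na(paths) & nzchar(paths)]
  hits <- paths[file.exists(paths)]
  if (length(hits) == 0) {
    stop("Coqui TTS executable not found; set it with set_coqui_path().")
  }
  hits[[1]]
}

# tryCatch() keeps the sketch runnable on machines without Coqui TTS.
result <- tryCatch(
  find_coqui_sketch(candidate_paths),
  error = function(e) conditionMessage(e)
)
print(result)
```

Putting the user-set option first in the list means a manually chosen path always wins over the guessed locations.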
The function tts_voices(service = "coqui") is a wrapper for the system command tts --list_models, which lists the released Coqui TTS models.
The result is a tibble with the following columns: language, dataset, model_name, and service.
The language column contains the language code associated with the speaker.
The dataset column indicates the specific dataset on which the text-to-speech model, denoted by model_name, was trained.
The model_name column refers to the name of the text-to-speech model.
The service column refers to the specific TTS service used (Amazon, Google, Microsoft, or Coqui TTS).
You can find a list of papers associated with some of the implemented models for Coqui TTS here.
By providing the values from this tibble (language, dataset, and model_name) in tts(), you can select the specific voice you want for text-to-speech synthesis.
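To make the column layout concrete, the sketch below splits a few model identifiers into the same columns and then picks out the English voices. It assumes identifiers of the form type/language/dataset/model, which is how Coqui TTS names its released models; the identifiers are typed in by hand for illustration rather than read from tts --list_models, and a plain data frame stands in for the tibble.

```r
# Hand-written sample identifiers (assumed format: type/language/dataset/model).
ids <- c(
  "tts_models/en/ljspeech/tacotron2-DDC_ph",
  "tts_models/en/ljspeech/fast_pitch",
  "tts_models/de/thorsten/vits",
  "vocoder_models/en/ljspeech/hifigan_v2"
)

# Split each identifier on "/" and pull out one component per column.
parts <- strsplit(ids, "/", fixed = TRUE)
models <- data.frame(
  type       = vapply(parts, `[`, character(1), 1),
  language   = vapply(parts, `[`, character(1), 2),
  dataset    = vapply(parts, `[`, character(1), 3),
  model_name = vapply(parts, `[`, character(1), 4),
  service    = "coqui",
  stringsAsFactors = FALSE
)

# Keep only the TTS models and the four columns described above.
voices <- models[models$type == "tts_models",
                 c("language", "dataset", "model_name", "service")]

# Filter by language to shortlist voices before passing values to tts().
english <- voices[voices$language == "en", ]
print(english)
```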
To convert text to speech, you can use the function tts(text = "Hello world!", service = "coqui").
The result is a tibble with the following columns: index, original_text, text, wav, file, audio_type, duration, and service. Some of the noteworthy ones are:
text: If original_text exceeds the character limit, text holds the result of splitting original_text. Otherwise, text is the same as original_text.
file: The location where the audio output is saved.
audio_type: The format of the audio file, either mp3 or wav.
By default, the function tts(service = "coqui") uses the tacotron2-DDC_ph model and the ljspeech/univnet vocoder. You can specify a different model with the argument model_name, or a different vocoder with the argument vocoder_name.
tts(text = "Hello world, using a different voice!",
    service = "coqui",
    model_name = "fast_pitch",
    vocoder_name = "ljspeech/hifigan_v2")
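Returning to the text column described earlier, the splitting of long input can be pictured with a small base R sketch. The limit of 50 characters and the helper split_text_sketch() are assumptions chosen for illustration; text2speech may use a different limit and a different splitting strategy.

```r
# Hypothetical helper: split text that exceeds `limit` characters.
split_text_sketch <- function(original_text, limit = 50) {
  # Short input is returned unchanged, so text equals original_text.
  if (nchar(original_text) <= limit) {
    return(original_text)
  }
  # strwrap() breaks on whitespace and keeps each piece shorter than
  # `limit` characters, provided no single word is longer than that.
  strwrap(original_text, width = limit)
}

short_text <- "Hello world!"
long_text <- paste(
  "This sentence is deliberately long so that it goes past",
  "the assumed character limit and has to be split into pieces"
)

split_text_sketch(short_text)
split_text_sketch(long_text)
```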
Another default is that tts(service = "coqui") saves the audio output in a temporary folder; its path is shown in the file column of the resulting tibble. However, a temporary directory lasts only as long as the current R session, so after you restart R that path no longer exists.
A more sustainable workflow is to save the audio output in a local folder. To do so, set the arguments save_local = TRUE and save_local_dest = "/full/path/to/local/folder". Make sure to provide the full path to the local folder.
tts(text = "Hello world! I am saving the audio output in a local folder",
    service = "coqui",
    save_local = TRUE,
    save_local_dest = "/full/path/to/local/folder")
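If audio has already been written to the temporary folder, it can also be copied somewhere durable by hand before the session ends. The sketch below is a manual alternative under stated assumptions, not a description of what save_local does internally: keep_audio_sketch() is a hypothetical helper, and a small placeholder file stands in for real audio.

```r
# Hypothetical helper: copy a file out of the temp folder into dest_dir.
keep_audio_sketch <- function(file, dest_dir) {
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(dest_dir, basename(file))
  ok <- file.copy(file, dest, overwrite = TRUE)
  if (!ok) {
    stop("Could not copy ", file, " to ", dest_dir)
  }
  normalizePath(dest)
}

# Placeholder for an audio file that tts() would leave in the temp folder.
temp_wav <- tempfile(fileext = ".wav")
writeBin(raw(16), temp_wav)

# A subfolder of tempdir() is used only so the sketch runs anywhere;
# in practice, pass a full path outside the temporary directory.
demo_dir <- file.path(tempdir(), "coqui-audio")
kept <- keep_audio_sketch(temp_wav, demo_dir)
print(kept)
```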
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] text2speech_1.0.0 rmarkdown_2.29
#>
#> loaded via a namespace (and not attached):
#> [1] vctrs_0.6.5 cli_3.6.3 knitr_1.49 rlang_1.1.4
#> [5] xfun_0.50 purrr_1.0.2 generics_0.1.3 jsonlite_1.8.9
#> [9] glue_1.8.0 buildtools_1.0.0 htmltools_0.5.8.1 maketools_1.3.1
#> [13] sys_3.4.3 sass_0.4.9 tibble_3.2.1 evaluate_1.0.3
#> [17] jquerylib_0.1.4 fastmap_1.2.0 yaml_2.3.10 lifecycle_1.0.4
#> [21] compiler_4.4.2 dplyr_1.1.4 pkgconfig_2.0.3 tidyr_1.3.1
#> [25] digest_0.6.37 R6_2.5.1 tidyselect_1.2.1 pillar_1.10.1
#> [29] magrittr_2.0.3 bslib_0.8.0 withr_3.0.2 tools_4.4.2
#> [33] cachem_1.1.0