Title: | Text to Speech Conversion |
---|---|
Description: | Converts text into speech using various text-to-speech (TTS) engines and provides a unified interface for accessing their functionality. With this package, users can easily generate audio files of spoken words, phrases, or sentences from plain text data. The package supports multiple TTS engines, including Google's 'Cloud Text-to-Speech API', 'Amazon Polly', Microsoft's 'Cognitive Services Text to Speech REST API', and a free TTS engine called 'Coqui TTS'. |
Authors: | Howard Baek [cre], John Muschelli [aut, ctb] |
Maintainer: | Howard Baek <[email protected]> |
License: | GPL-3 |
Version: | 1.0.0 |
Built: | 2025-01-13 02:47:14 UTC |
Source: | https://github.com/jhudsl/text2speech |
Accepts PCM audio data as input and generates a corresponding WAV file.
pcm_to_wav(input, output = tempfile(fileext = ".wav"), sample_rate = 16000, extensible = FALSE)
input |
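PCM audio data to convert, for example the output of 'get_synthesis' |
output |
Path of the output WAV file |
sample_rate |
Sampling rate passed to tuneR::Wave |
extensible |
Passed to tuneR::writeWave; whether to write a WAVE_FORMAT_EXTENSIBLE file |
The file name of the output WAV file.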
## Not run:
fname = system.file("extdata", "pcm_file.wav", package = "text2speech")
res = pcm_to_wav(fname)
testthat::expect_error(tuneR::readWave(fname))
testthat::expect_is(tuneR::readWave(res), "Wave")
## End(Not run)
## Not run:
if (requireNamespace("aws.polly", quietly = TRUE)) {
  text = "hey, ho, let's go!"
  if (tts_amazon_auth()) {
    res = tts_amazon(text, output_format = "wav")
  }
}
## End(Not run)
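pcm_to_wav() delegates the actual writing to tuneR. To show what turning raw PCM into a WAV file involves, here is a base-R sketch that prepends the canonical 44-byte RIFF/WAVE header to raw PCM bytes. It assumes 16-bit, little-endian samples; the helper write_pcm_wav() is hypothetical and not part of text2speech.

```r
# Hypothetical helper (not part of text2speech): wrap raw PCM bytes in a
# canonical 44-byte WAV header. Assumes 16-bit little-endian samples.
write_pcm_wav <- function(pcm, output = tempfile(fileext = ".wav"),
                          sample_rate = 16000, channels = 1, bits = 16) {
  stopifnot(is.raw(pcm))
  byte_rate   <- sample_rate * channels * bits / 8
  block_align <- channels * bits / 8
  con <- file(output, "wb")
  on.exit(close(con))
  # RIFF chunk: size field = 36 header bytes after it + data bytes
  writeChar("RIFF", con, eos = NULL)
  writeBin(as.integer(36 + length(pcm)), con, size = 4, endian = "little")
  writeChar("WAVE", con, eos = NULL)
  # fmt sub-chunk: 16 bytes describing uncompressed PCM
  writeChar("fmt ", con, eos = NULL)
  writeBin(16L, con, size = 4, endian = "little")
  writeBin(1L, con, size = 2, endian = "little")  # audio format 1 = PCM
  writeBin(as.integer(channels), con, size = 2, endian = "little")
  writeBin(as.integer(sample_rate), con, size = 4, endian = "little")
  writeBin(as.integer(byte_rate), con, size = 4, endian = "little")
  writeBin(as.integer(block_align), con, size = 2, endian = "little")
  writeBin(as.integer(bits), con, size = 2, endian = "little")
  # data sub-chunk: the PCM samples themselves
  writeChar("data", con, eos = NULL)
  writeBin(as.integer(length(pcm)), con, size = 4, endian = "little")
  writeBin(pcm, con)
  output
}

# One second of a 440 Hz tone encoded as 16-bit PCM bytes
sr <- 16000
samples <- as.integer(round(0.3 * 32767 * sin(2 * pi * 440 * seq_len(sr) / sr)))
pcm <- writeBin(samples, raw(), size = 2, endian = "little")
wav <- write_pcm_wav(pcm, sample_rate = sr)
file.size(wav)  # 44 header bytes + 32000 data bytes = 32044
```

If tuneR is installed, tuneR::readWave(wav) should be able to read the result, which is the same property the pcm_to_wav() example checks with testthat.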
This uses HTML5 audio tags to play audio in your browser.
play_audio(audio = "output.wav", html = "player.html")
audio |
Path to the audio file. The audio format must be supported by HTML5. |
html |
Path of the HTML file that will be created to host the audio player. |
Borrowed from googleLanguageR::gl_talk_player().
## Not run:
play_audio(audio = "audio.wav", html = "player.html")
## End(Not run)
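play_audio() is described as writing an HTML page that plays the file through an HTML5 audio tag. A minimal base-R sketch of that idea follows; the helper write_audio_player() and its extension-to-MIME-type lookup are illustrative assumptions, not the package's implementation.

```r
# Hypothetical sketch: write a minimal HTML page containing an HTML5 <audio> tag
write_audio_player <- function(audio, html = tempfile(fileext = ".html")) {
  ext <- tolower(tools::file_ext(audio))
  # Map common extensions to MIME types that browsers understand
  mime <- switch(ext,
    wav = "audio/wav",
    mp3 = "audio/mpeg",
    ogg = "audio/ogg",
    stop("Unsupported audio format: ", ext)
  )
  page <- c(
    "<!DOCTYPE html>",
    "<html><body>",
    "<audio controls autoplay>",
    sprintf('  <source src="%s" type="%s">', audio, mime),
    "  Your browser does not support the audio element.",
    "</audio>",
    "</body></html>"
  )
  writeLines(page, html)
  html
}

player <- write_audio_player("output.wav")
# utils::browseURL(player)  # would open the page in the default browser
```

A relative src such as "output.wav" is resolved by the browser relative to the HTML file's own directory, so the audio file and the page usually need to sit side by side.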
Sets an option that points to the local Coqui TTS executable file, tts.
set_coqui_path(path)
path |
Path to the local Coqui TTS executable file |
Possible file path locations for the local Coqui TTS executable file include:
/usr/bin/tts, /usr/local/bin/tts
/opt/homebrew/Caskroom/miniforge/base/bin/tts
C:\Program Files\tts
Returns nothing; the function sets the option variable path_to_coqui.
set_coqui_path("~/path/to/tts")
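To find a value worth passing to set_coqui_path(), one could probe the locations listed above. The sketch below is an assumption-laden illustration: find_coqui_tts() is a hypothetical helper, and it only checks the listed candidates, not every place tts might be installed.

```r
# Hypothetical helper (not part of text2speech): return the first candidate
# location of the Coqui `tts` executable that exists, or NA if none do.
find_coqui_tts <- function(candidates = c(
  "/usr/bin/tts",
  "/usr/local/bin/tts",
  "/opt/homebrew/Caskroom/miniforge/base/bin/tts",
  "C:\\Program Files\\tts"
)) {
  hits <- candidates[file.exists(candidates)]
  if (length(hits) == 0) NA_character_ else hits[[1]]
}

tts_path <- find_coqui_tts()
if (!is.na(tts_path)) {
  # Store it under the option name that set_coqui_path() is documented to set
  options(path_to_coqui = tts_path)
}
getOption("path_to_coqui")  # NULL if nothing was found and the option was never set
```

With the package loaded, calling set_coqui_path(tts_path) instead of options() keeps the option name in one place.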
Convert text-to-speech using various engines, including Amazon Polly, Coqui TTS, Google Cloud Text-to-Speech API, and Microsoft Cognitive Services Text to Speech REST API.
With the exception of Coqui TTS, all these engines are accessible as R packages:
aws.polly is a client for Amazon Polly.
googleLanguageR is a client to the Google Cloud Text-to-Speech API.
conrad is a client to the Microsoft Cognitive Services Text to Speech REST API.
tts(text, output_format = c("mp3", "wav"), service = c("amazon", "google", "microsoft", "coqui"), bind_audio = TRUE, ...)
tts_amazon(text, output_format = c("mp3", "wav"), voice = "Joanna", bind_audio = TRUE, save_local = FALSE, save_local_dest = NULL, ...)
tts_google(text, output_format = c("mp3", "wav"), voice = "en-US-Standard-C", bind_audio = TRUE, save_local = FALSE, save_local_dest = NULL, ...)
tts_microsoft(text, output_format = c("mp3", "wav"), voice = NULL, bind_audio = TRUE, save_local = FALSE, save_local_dest = NULL, ...)
tts_coqui(text, exec_path, output_format = c("wav", "mp3"), model_name = "tacotron2-DDC_ph", vocoder_name = "ljspeech/univnet", bind_audio = TRUE, save_local = FALSE, save_local_dest = NULL, ...)
text |
A character vector of text to be spoken |
output_format |
Format of output files: "mp3" or "wav" |
service |
Service to use (Amazon, Google, Microsoft, or Coqui) |
bind_audio |
Should the |
... |
Additional arguments |
voice |
Full voice name |
save_local |
Should the audio file be saved locally? |
save_local_dest |
If saving locally, the destination where the output file will be saved |
exec_path |
System path to Coqui TTS executable |
model_name |
(Coqui TTS only) Deep learning model used for text-to-speech conversion |
vocoder_name |
(Coqui TTS only) Vocoder (voice coder) model that converts the model's output into an audio waveform |
A standardized tibble featuring the following columns:
index: Sequential identifier number
original_text: The text input provided by the user
text: If original_text exceeds the character limit, text holds the pieces produced by splitting original_text; otherwise, text is the same as original_text.
wav: Wave object (S4 class)
file: File path to the audio file
audio_type: The audio format, either mp3 or wav
duration: The duration of the audio file
service: The text-to-speech engine used
## Not run:
# Amazon Polly
tts("Hello world! This is Amazon Polly", service = "amazon")
# Coqui TTS
tts("Hello world! This is Coqui TTS", service = "coqui")
# Google Cloud Text-to-Speech API
tts("Hello world! This is Google Cloud", service = "google")
# Microsoft Cognitive Services Text to Speech REST API
tts("Hello world! This is Microsoft", service = "microsoft")
## End(Not run)
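The text column is filled by splitting long inputs so that each piece stays under a character limit. A minimal base-R sketch of one way to do this, breaking only at whitespace, is shown below; split_text() and its 1500-character default are assumptions, not text2speech's actual splitter or any service's documented limit.

```r
# Hypothetical sketch: split text into chunks of at most `limit` characters,
# breaking only at whitespace. The default limit is an assumed value.
split_text <- function(text, limit = 1500) {
  words <- strsplit(text, "\\s+")[[1]]
  words <- words[nzchar(words)]
  chunks <- character(0)
  current <- ""
  for (w in words) {
    candidate <- if (nzchar(current)) paste(current, w) else w
    if (nchar(candidate) <= limit || !nzchar(current)) {
      current <- candidate            # word fits (or is a single over-long word)
    } else {
      chunks <- c(chunks, current)    # close this chunk and start a new one
      current <- w
    }
  }
  if (nzchar(current)) chunks <- c(chunks, current)
  chunks
}

# Build a table shaped like the index / original_text / text columns above;
# here index numbers the original inputs (an assumption for illustration)
original <- c("Hello world!", "This sentence is long enough to be split apart.")
pieces <- lapply(original, split_text, limit = 20)
df <- data.frame(
  index = rep(seq_along(original), lengths(pieces)),
  original_text = rep(original, lengths(pieces)),
  text = unlist(pieces),
  stringsAsFactors = FALSE
)
df
```

With limit = 20 the second input becomes three rows ("This sentence is", "long enough to be", "split apart."), all sharing the same original_text.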
Verify the authentication status of different text-to-speech engines, including Amazon Polly, Coqui TTS, Google Cloud Text-to-Speech API, and Microsoft Cognitive Services Text to Speech REST API.
tts_auth(service = c("amazon", "google", "microsoft", "coqui"), key_or_json_file = NULL, ...)
tts_amazon_auth(key_or_json_file = NULL, ...)
tts_google_auth(key_or_json_file = NULL, ...)
tts_microsoft_auth(key_or_json_file = NULL, ...)
tts_coqui_auth()
service |
Service to use (Amazon, Google, Microsoft, or Coqui) |
key_or_json_file |
Either an API key (for Microsoft) or a JSON key file (for Google) |
... |
Additional arguments |
To determine the availability of Coqui TTS, tts_auth() checks whether the tts executable exists on the local system.
A logical value indicating whether authorization succeeded.
# Amazon Polly
tts_auth("amazon")
# Google Cloud Text-to-Speech API
tts_auth("google")
# Microsoft Cognitive Services Text to Speech REST API
tts_auth("microsoft")
# Coqui TTS
tts_auth("coqui")
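For Coqui TTS there is no remote credential, so the check reduces to "can a tts executable be found?". A minimal sketch of such a check, assuming it consults the path_to_coqui option first and then the system PATH; coqui_available() is hypothetical and not the package's tts_coqui_auth().

```r
# Hypothetical sketch of a Coqui availability check: TRUE when a `tts`
# executable is found via the path_to_coqui option or on the system PATH.
coqui_available <- function() {
  opt <- getOption("path_to_coqui")
  if (!is.null(opt) && file.exists(opt)) {
    return(TRUE)
  }
  # Sys.which() returns "" for commands it cannot find
  unname(nzchar(Sys.which("tts")))
}

coqui_available()
```

Returning a single logical mirrors the documented return value of tts_auth(), so the result can guard calls in if () statements the way the examples above do.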
Because long text is split into chunks to stay within API limits, tts_bind_wav() harmonizes the text and the results by binding the audio chunks back together.
tts_bind_wav(result, same_sample_rate = TRUE)
result |
A |
same_sample_rate |
A logical value indicating whether to force the same sample rate. |
A data.frame with the same structure as the output of tts().
## Not run:
# res is assumed to be a result returned by tts()
# Same sample rate
tts_bind_wav(res, same_sample_rate = TRUE)
# Different sample rate
tts_bind_wav(res, same_sample_rate = FALSE)
## End(Not run)
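The real function works on the Wave objects in the wav column. The base-R sketch below uses plain numeric vectors instead, to illustrate what binding chunks per input and forcing a common sample rate can mean. bind_chunks() and resample_linear() are hypothetical; choosing the first chunk's rate as the target and using naive linear interpolation are assumptions, not text2speech's behavior.

```r
# Naive linear-interpolation resampling (no anti-aliasing filter)
resample_linear <- function(x, from, to) {
  n_out <- round(length(x) * to / from)
  stats::approx(seq_along(x), x, n = n_out)$y
}

# Hypothetical sketch: concatenate the chunks that share an index. When
# same_sample_rate is TRUE, chunks at another rate are resampled to the
# first chunk's rate; when FALSE they are concatenated unchanged, which is
# only meaningful if their rates already agree.
bind_chunks <- function(chunks, rates, index, same_sample_rate = TRUE) {
  target <- rates[[1]]
  lapply(split(seq_along(chunks), index), function(rows) {
    pieces <- lapply(rows, function(i) {
      if (same_sample_rate && rates[[i]] != target) {
        resample_linear(chunks[[i]], rates[[i]], target)
      } else {
        chunks[[i]]
      }
    })
    unlist(pieces)
  })
}

# Placeholder signals: input 1 has two chunks (16000 Hz and 22050 Hz),
# input 2 has one chunk at 16000 Hz
chunks <- list(rep(0, 16000), rep(0, 16000), rep(0, 8000))
rates  <- c(16000, 22050, 16000)
bound  <- bind_chunks(chunks, rates, index = c(1, 1, 2))
durations <- lengths(bound) / rates[[1]]  # seconds = samples / sample rate
durations
```

The duration arithmetic (samples divided by sample rate) is why a common rate matters: concatenating samples recorded at different rates without resampling would play some chunks too fast or too slow.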
Default voice for a text-to-speech engine
tts_default_voice(service = c("amazon", "google", "microsoft", "coqui"))
service |
Text-to-speech engine |
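The service argument uses R's common vector-of-choices default, which is conventionally resolved with match.arg(): with no argument the first choice ("amazon") wins, and partial names are matched. The lookup below is a hypothetical illustration, not the package's tts_default_voice(). Its values are copied from the default arguments in the tts_*() usage lines; Microsoft is NA because tts_microsoft() defaults to voice = NULL and the voice it resolves to is not documented here.

```r
# Hypothetical lookup (not the package's implementation). Values come from
# the default arguments shown in the tts_*() usage lines.
default_voice_sketch <- function(service = c("amazon", "google", "microsoft", "coqui")) {
  service <- match.arg(service)  # no argument: the first choice ("amazon") is used
  switch(service,
    amazon    = "Joanna",            # tts_amazon(voice = "Joanna")
    google    = "en-US-Standard-C",  # tts_google(voice = "en-US-Standard-C")
    microsoft = NA_character_,       # tts_microsoft(voice = NULL): not documented here
    coqui     = "tacotron2-DDC_ph"   # tts_coqui(model_name = ...): a model, not a voice
  )
}

default_voice_sketch()        # "Joanna"
default_voice_sketch("goog")  # partial match to "google": "en-US-Standard-C"
```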
Speak Engine for knitr
tts_speak_engine(options)
options |
A list of chunk options. Usually this is just the object options passed to the engine function; see |
A character string generated from the source code and output using the appropriate output hooks.
## Not run:
knitr::knit_engines$set(speak = tts_speak_engine)
options = list(
  code = "hey let's go to the park",
  eval = FALSE,
  label = "random",
  fig.path = tempdir(),
  echo = TRUE,
  results = "asis",
  engine = "speak")
tts_speak_engine(options)
if (tts_auth("google")) {
  options$eval = TRUE
  tts_speak_engine(options)
}
## End(Not run)
Various services offer a range of voice options:
Amazon Polly : https://docs.aws.amazon.com/polly/latest/dg/voicelist.html
Microsoft Cognitive Services Text to Speech REST API : https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support?tabs=tts#voice-styles-and-roles
Google Cloud Text-to-Speech API : https://cloud.google.com/text-to-speech/docs/voices
Coqui TTS : https://huggingface.co/spaces/coqui/CoquiTTS
tts_voices(service = c("amazon", "google", "microsoft", "coqui"), ...)
tts_amazon_voices(...)
tts_microsoft_voices(region = "westus")
tts_google_voices(...)
tts_coqui_voices()
service |
Service to use (Amazon, Google, Microsoft, or Coqui) |
... |
Additional arguments to service voice listings. |
region |
(Microsoft only) Region of your Microsoft Speech Service API Key |
(Amazon, Microsoft, and Google) A standardized data.frame featuring the following columns:
voice: Name of the voice
language: Spoken language
language_code: Abbreviation for the language of the speaker
gender: Male or female
service: The text-to-speech engine used
(Coqui TTS) A tibble featuring the following columns:
language: Spoken language
dataset: Dataset the deep learning model was trained on
model_name: Name of the deep learning model
service: The text-to-speech engine used
# Amazon Polly
if (tts_auth(service = "amazon")) {
  tts_voices(service = "amazon")
}
# Microsoft Cognitive Services Text to Speech REST API
if (tts_auth(service = "microsoft")) {
  tts_voices(service = "microsoft")
}
# Google Cloud Text-to-Speech API
if (tts_auth(service = "google")) {
  tts_voices(service = "google")
}
# Coqui TTS
if (tts_auth(service = "coqui")) {
  tts_voices(service = "coqui")
}
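Each service returns its voice listing in its own shape, and the standardized data.frame above renames those fields into a common layout. A minimal sketch of that renaming step follows. standardize_voices() is hypothetical, and the raw column names (Id, LanguageName, LanguageCode, Gender) are an assumed shape for an Amazon Polly listing, used only for illustration.

```r
# Hypothetical sketch: rename service-specific columns to the standard
# voice / language / language_code / gender / service layout.
standardize_voices <- function(raw, map, service) {
  stopifnot(all(map %in% names(raw)))
  out <- raw[, unname(map), drop = FALSE]
  names(out) <- names(map)  # standard names are the names of `map`
  out$service <- service
  out
}

# Assumed shape of an Amazon Polly voice listing (one illustrative row)
polly_raw <- data.frame(
  Id = "Joanna", LanguageName = "US English",
  LanguageCode = "en-US", Gender = "Female",
  stringsAsFactors = FALSE
)
voices <- standardize_voices(
  polly_raw,
  map = c(voice = "Id", language = "LanguageName",
          language_code = "LanguageCode", gender = "Gender"),
  service = "amazon"
)
voices
```

Keeping the per-service differences in a small name map means adding another service only requires a new map, not a new code path.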