API Reference
Toplevel API
- phonemizer.phonemize(text, language: str = 'en-us', backend: Literal['espeak', 'espeak-mbrola', 'festival', 'segments'] = 'espeak', separator: Separator | None = <phonemizer.separator.Separator object>, strip: bool = False, prepend_text: bool = False, preserve_empty_lines: bool = False, preserve_punctuation: bool = False, punctuation_marks: str | Pattern = ';:, .!?¡¿—…"«»“”(){}[]', with_stress: bool = False, tie: bool | str = False, language_switch: Literal['keep-flags', 'remove-flags', 'remove-utterance'] = 'keep-flags', words_mismatch: Literal['warn', 'ignore'] = 'ignore', njobs: int = 1, logger: Logger = <Logger phonemizer (WARNING)>)
Multilingual text to phonemes converter
Return a phonemized version of an input text, given its language and a phonemization backend.
Note
To improve processing speed, minimize the number of calls to this function: providing the input text as a list and calling phonemize() a single time is much more efficient than calling it on each element of the list, because initializing the phonemization backend can be expensive, especially for espeak. For example:
Do this:
>>> text = [line1, line2, ...]
>>> phonemize(text, ...)
Not this:
>>> for line in text:
...     phonemize(line, ...)
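The cost difference described in this note can be illustrated without any backend installed. The sketch below is a toy model only: `CountingBackend` and `toy_phonemize` are hypothetical stand-ins (not part of phonemizer) that assume one backend set-up per call, as the note describes, and simply count how often that set-up runs.

```python
from typing import List, Union


class CountingBackend:
    """Hypothetical stand-in for a real backend; counts costly set-ups."""

    initializations = 0

    def __init__(self) -> None:
        # In a real backend this is where espeak would be loaded.
        type(self).initializations += 1

    def phonemize(self, lines: List[str]) -> List[str]:
        # Placeholder transformation, not real phonemization.
        return [line.upper() for line in lines]


def toy_phonemize(text: Union[str, List[str]]) -> Union[str, List[str]]:
    """Mimics the call shape of phonemize(): one backend set-up per call."""
    backend = CountingBackend()
    if isinstance(text, str):
        return backend.phonemize([text])[0]
    return backend.phonemize(text)


lines = ["hello world", "good morning", "see you soon"]

# One call on the whole list: a single set-up.
CountingBackend.initializations = 0
batched = toy_phonemize(lines)
batched_setups = CountingBackend.initializations

# One call per line: one set-up per line.
CountingBackend.initializations = 0
looped = [toy_phonemize(line) for line in lines]
looped_setups = CountingBackend.initializations

print(batched_setups, looped_setups)
```

Both approaches produce the same output in this toy model; only the number of set-ups differs.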
- Parameters:
text (str or list of str) – The text to be phonemized. Any empty line will be ignored. If text is a str, it can be multiline (lines being separated by \n). If text is a list, each element is considered as a separate line. Each line is considered as a text utterance.
language (str) – The language code of the input text, must be supported by the backend. If backend is ‘segments’, the language can be a file with a grapheme to phoneme mapping.
backend (str, optional) – The software backend to use for phonemization, must be ‘festival’ (US English only is supported, coded ‘en-us’), ‘espeak’, ‘espeak-mbrola’ or ‘segments’.
separator (Separator) – String separators between phonemes, syllables and words, defaults to separator.default_separator. The syllable separator is considered only for the festival backend. The word separator is ignored by the ‘espeak-mbrola’ backend. Initialize it as follows:
>>> from phonemizer.separator import Separator
>>> separator = Separator(phone='-', word=' ')
strip (bool, optional) – If True, don’t output the last word and phone separators of a token, default to False.
prepend_text (bool, optional) – When True, returns a pair (input utterance, phonemized utterance) for each line of the input text. When False, returns only the phonemized utterances. Defaults to False.
preserve_empty_lines (bool, optional) – When True, keeps the empty lines in the phonemized output. Defaults to False, which removes all empty lines.
preserve_punctuation (bool, optional) – When True, keeps the punctuation in the phonemized output. Not supported by the ‘espeak-mbrola’ backend. Defaults to False, which removes all the punctuation.
punctuation_marks (str or re.Pattern, optional) – The punctuation marks to consider when dealing with punctuation, either for removal or preservation. Can be defined as a string or regular expression. Default to Punctuation.default_marks().
with_stress (bool, optional) – This option is only valid for the ‘espeak’ backend. When True, the stress marks on phonemes are kept (the stress characters are ˈ’ˌ). When False, stress marks are removed. Defaults to False.
tie (bool or char, optional) – This option is only valid for the ‘espeak’ backend with espeak>=1.49. It is incompatible with the phone separator. When not False, a tie character is used within multi-letter phoneme names. When True, the character U+361 is used (as in d͡ʒ); ‘z’ means the ZWJ character. Defaults to False.
language_switch (str, optional) – Espeak can output some words in another language (typically English) when phonemizing a text. This option sets the policy to use when such a language switch occurs. Three values are available: ‘keep-flags’ (the default), ‘remove-flags’ or ‘remove-utterance’. The ‘keep-flags’ policy keeps the language switching flags, for example “(en)” or “(jp)”, in the output. The ‘remove-flags’ policy removes them, and the ‘remove-utterance’ policy removes the whole line of text containing a language switch. This option is only valid for the ‘espeak’ backend.
words_mismatch (str, optional) – Espeak can join two consecutive words or drop some words, yielding a word count mismatch between orthographic and phonemized text. This option sets the policy to use when such a word count mismatch occurs. Three values are available: ‘ignore’ (the default), which does nothing, ‘warn’, which issues a warning for each mismatched line, and ‘remove’, which removes the mismatched lines from the output.
njobs (int) – The number of parallel jobs to launch. The input text is split into njobs parts, phonemized on parallel instances of the backend, and the outputs are finally collapsed.
logger (logging.Logger) – The logging instance where to send messages. If not specified, use the default system logger.
- Returns:
phonemized text – The input text phonemized for the given language and backend. The returned value has the same type as the input text (either a list or a string), except if prepend_text is True, in which case the output is forced to a list of pairs (input text, phonemized text).
- Return type:
str or list of str
- Raises:
RuntimeError – if the backend is not valid or is valid but not installed, if the language is not supported by the backend, or if any incompatible options are used.
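As an illustration of the parameters and the documented RuntimeError, here is a hedged sketch of a single batched call. The wrapper name `phonemize_or_none` is hypothetical; it assumes phonemizer and the espeak backend are installed and returns None when they are not, so it stays runnable either way.

```python
from typing import List, Optional


def phonemize_or_none(lines: List[str]) -> Optional[List[str]]:
    """Phonemize US English lines with espeak, or return None if unavailable."""
    try:
        # Imported lazily so the sketch also runs where phonemizer is missing.
        from phonemizer import phonemize
        from phonemizer.separator import Separator
    except ImportError:
        return None

    try:
        return phonemize(
            lines,                      # a list: one utterance per element
            language="en-us",
            backend="espeak",
            separator=Separator(phone="-", word=" "),
            strip=True,                 # drop trailing separators
            preserve_punctuation=True,
        )
    except RuntimeError as error:
        # Documented failure mode: backend missing, unsupported language,
        # or incompatible options.
        print(f"phonemization unavailable: {error}")
        return None


result = phonemize_or_none(["hello world", "good morning"])
print(result)
```

Because the input is a list, a successful call returns a list with one phonemized string per non-empty input line.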
- class phonemizer.separator.Separator(word: str = ' ', syllable: str | None = None, phone: str | None = None)
Defines phone, syllable and word boundary tokens
- input_output_separator(field_separator: str | bool) str | bool
Returns a suitable input/output separator based on token separator
The input/output separator splits orthographic and phonetic texts when using the --prepend-text option from the command line.
- Parameters:
field_separator (bool or str) – If str, ensures its value is not already defined as a token separator. If True, chooses one of “|”, “||”, “|||”, “||||” (the first one that is not defined as a token separator).
- Return type:
The input/output separator, or False if field_separator is False
- Raises:
RuntimeError – if field_separator is a str but is already registered as token separator
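The selection rule described for input_output_separator() can be sketched in plain Python. This is an illustration of the documented behaviour under stated assumptions, not the library's implementation: `choose_field_separator` and its `token_separators` argument are hypothetical names.

```python
from typing import Iterable, Optional, Union


def choose_field_separator(
    field_separator: Union[str, bool],
    token_separators: Iterable[Optional[str]],
) -> Union[str, bool]:
    """Pick an input/output separator that does not clash with token separators."""
    taken = {sep for sep in token_separators if sep is not None}

    if field_separator is False:
        return False

    if isinstance(field_separator, str):
        # An explicit separator must not already be a token separator.
        if field_separator in taken:
            raise RuntimeError(
                f"{field_separator!r} is already used as a token separator")
        return field_separator

    # field_separator is True: take the first free candidate.
    for candidate in ("|", "||", "|||", "||||"):
        if candidate not in taken:
            return candidate
    raise RuntimeError("no free input/output separator available")


# Word, syllable and phone separators of a hypothetical configuration.
print(choose_field_separator(True, [" ", None, "|"]))
```

With three token separators and four candidates, at least one candidate is always free, so the final RuntimeError is only a safeguard in this sketch.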
- property phone
Phones separator
- property syllable
Syllables separator
- property word
Words separator
Backends
- class phonemizer.backend.base.BaseBackend(language: str, punctuation_marks: str | Pattern | None = None, preserve_punctuation: bool = False, logger: Logger | None = None)
Abstract base class of all the phonemization backends
Provides a common interface to all backends. The central method is phonemize()
- Parameters:
language (str) – The language code of the input text, must be supported by the backend. If backend is ‘segments’, the language can be a file with a grapheme to phoneme mapping.
preserve_punctuation (bool) – When True, will keep the punctuation in the phonemized output. Not supported by the ‘espeak-mbrola’ backend. Default to False and remove all the punctuation.
punctuation_marks (str) – The punctuation marks to consider when dealing with punctuation, either for removal or preservation. Can be defined as a string or regular expression. Default to Punctuation.default_marks().
logger (logging.Logger) – the logging instance where to send messages. If not specified, use the default system logger.
- Raises:
RuntimeError – if the backend is not available or if the language cannot be initialized.
- abstract classmethod is_available()
Returns True if the backend is installed, False otherwise
- classmethod is_supported_language(language: str)
Returns True if language is supported by the backend
- property language
The language code configured to be used for phonemization
- property logger
A logging.Logger instance where to send messages
- abstract static name()
The name of the backend
- phonemize(text: List[str], separator: Separator | None = None, strip: bool = False, njobs: int = 1) List[str]
Returns the text phonemized for the given language
- Parameters:
text (list of str) – The text to be phonemized. Each string in the list is considered as a separate line. Each line is considered as a text utterance. Any empty utterance will be ignored.
separator (Separator) – String separators between phonemes, syllables and words, defaults to separator.default_separator. The syllable separator is considered only for the festival backend. The word separator is ignored by the ‘espeak-mbrola’ backend.
strip (bool) – If True, don’t output the last word and phone separators of a token, default to False.
njobs (int) – The number of parallel jobs to launch. The input text is split into njobs parts, phonemized on parallel instances of the backend, and the outputs are finally collapsed.
- Returns:
phonemized text – The input text phonemized for the given language and backend.
- Return type:
list of str
- Raises:
RuntimeError – if something went wrong during the phonemization
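The split, process and collapse flow described for njobs can be sketched as follows. This is a simplified model, not the library's code: `split_in_parts` and `run_in_parallel` are hypothetical helpers, threads stand in for parallel backend instances, and the worker is a placeholder for real phonemization.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def split_in_parts(lines: List[str], njobs: int) -> List[List[str]]:
    """Split lines into at most njobs contiguous, nearly equal parts."""
    njobs = max(1, min(njobs, len(lines)))
    size, extra = divmod(len(lines), njobs)
    parts, start = [], 0
    for index in range(njobs):
        # The first `extra` parts receive one additional line.
        end = start + size + (1 if index < extra else 0)
        parts.append(lines[start:end])
        start = end
    return parts


def run_in_parallel(
    lines: List[str],
    worker: Callable[[List[str]], List[str]],
    njobs: int = 1,
) -> List[str]:
    """Apply worker to each part concurrently and collapse the outputs in order."""
    parts = split_in_parts(lines, njobs)
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        results = list(pool.map(worker, parts))  # map() preserves part order
    return [item for part in results for item in part]


def fake_worker(part: List[str]) -> List[str]:
    # Placeholder for a backend instance phonemizing its share of the text.
    return [line.upper() for line in part]


print(run_in_parallel(["a", "b", "c", "d", "e"], fake_worker, njobs=2))
```

Keeping the parts contiguous means the collapsed output preserves the order of the input lines.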
- abstract static supported_languages() Dict[str, str]
Return a dict of language codes -> name supported by the backend
- abstract classmethod version()
Return the backend version as a tuple (major, minor, patch)
- class phonemizer.backend.espeak.espeak.EspeakBackend(language: str, punctuation_marks: str | Pattern | None = None, preserve_punctuation: bool = False, with_stress: bool = False, tie: bool | str = False, language_switch: Literal['keep-flags', 'remove-flags', 'remove-utterance'] = 'keep-flags', words_mismatch: Literal['warn', 'ignore'] = 'ignore', logger: Logger | None = None)
Espeak backend for the phonemizer
- static name()
The name of the backend
- classmethod supported_languages()
Return a dict of language codes -> name supported by the backend
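The three language_switch policies accepted by this backend can be illustrated on already-phonemized lines. The sketch below only models the documented behaviour: `apply_language_switch` is a hypothetical helper, the flag pattern is an assumption about what flags such as “(en)” look like, and the sample strings are made up rather than real espeak output.

```python
import re
from typing import List

# Assumed shape of a language switch flag, e.g. "(en)" or "(fr)".
_FLAG = re.compile(r"\([a-z]{2,3}(?:-[a-z0-9]+)*\)")


def apply_language_switch(lines: List[str], policy: str = "keep-flags") -> List[str]:
    """Apply one of the three documented policies to phonemized lines."""
    if policy == "keep-flags":
        return list(lines)
    if policy == "remove-flags":
        # Drop the flags but keep the utterance.
        return [_FLAG.sub("", line) for line in lines]
    if policy == "remove-utterance":
        # Drop every line that contains at least one flag.
        return [line for line in lines if not _FLAG.search(line)]
    raise ValueError(f"unknown language switch policy: {policy!r}")


sample = ["bonjour", "le (en)weekend(fr) arrive"]  # made-up strings
for policy in ("keep-flags", "remove-flags", "remove-utterance"):
    print(policy, apply_language_switch(sample, policy))
```

Note that ‘remove-utterance’ changes the number of output lines, which matters when the output must stay aligned with the input text.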
- class phonemizer.backend.espeak.mbrola.EspeakMbrolaBackend(language: str, logger: Logger | None = None)
Espeak-mbrola backend for the phonemizer
- classmethod is_available() bool
Mbrola backend is available for espeak>=1.49
- static name()
The name of the backend
- classmethod supported_languages() Dict[str, str]
Returns the list of installed mbrola voices
- class phonemizer.backend.segments.SegmentsBackend(language: str, punctuation_marks: str | Pattern | None = None, preserve_punctuation: bool = False, logger: Logger | None = None)
Segments backend for the phonemizer
The phonemize method will raise a ValueError when parsing an unknown morpheme.
- classmethod is_available()
Returns True if the backend is installed, False otherwise
- classmethod is_supported_language(language: str) bool
Returns True if language is supported by the backend
- static name()
The name of the backend
- static supported_languages()
Returns a dict of language: file supported by the segments backend
The supported languages have a grapheme to phoneme conversion file bundled with phonemizer. Users can also use their own file as parameter of the phonemize() function.
- classmethod version()
Return the backend version as a tuple (major, minor, patch)
- class phonemizer.backend.festival.festival.FestivalBackend(language: str, punctuation_marks: str | Pattern | None = None, preserve_punctuation: bool = False, logger: Logger | None = None)
Festival backend for the phonemizer
- classmethod executable() Path
Returns the absolute path to the festival executable used as backend
The following precedence rule applies for executable lookup:
As specified by FestivalBackend.set_executable()
Or as specified by the environment variable PHONEMIZER_FESTIVAL_EXECUTABLE
Or the default ‘festival’ binary found on the system with shutil.which('festival')
- Raises:
RuntimeError – if the festival executable cannot be found or if the environment variable PHONEMIZER_FESTIVAL_EXECUTABLE is set to a non-executable file
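The precedence rule above can be sketched in plain Python. This is an illustration under assumptions, not the backend's implementation: `resolve_festival` and `_checked` are hypothetical names, and the explicit argument stands in for a path registered with FestivalBackend.set_executable().

```python
import os
import shutil
import sys
from pathlib import Path
from typing import Mapping, Optional

ENV_VAR = "PHONEMIZER_FESTIVAL_EXECUTABLE"


def _checked(path_str: str) -> Path:
    """Return the absolute path, or raise if it is not an executable file."""
    path = Path(path_str)
    if not (path.is_file() and os.access(path, os.X_OK)):
        raise RuntimeError(f"{path} is not an executable file")
    return path.resolve()


def resolve_festival(
    explicit: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Path:
    """Look up the executable: explicit path, then env variable, then PATH."""
    environ = os.environ if environ is None else environ

    if explicit is not None:      # 1. explicitly configured executable
        return _checked(explicit)
    if ENV_VAR in environ:        # 2. environment variable
        return _checked(environ[ENV_VAR])
    found = shutil.which("festival")  # 3. default binary on the system
    if found is None:
        raise RuntimeError("festival executable not found")
    return Path(found).resolve()


# The Python interpreter stands in for a festival binary in this demo.
print(resolve_festival(explicit=sys.executable))
```

Passing the environment as an argument keeps the lookup easy to test without modifying os.environ.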
- classmethod is_available()
True if the festival executable is available, False otherwise
- static name()
The name of the backend
- classmethod set_executable(executable: str)
Sets the festival backend to use executable
If this is not set, the backend uses the default festival executable from the system installation.
- Parameters:
executable (str) – The festival executable to use as backend. Set executable to None to restore the default.
- Raises:
RuntimeError – if executable is not an executable file.
- static supported_languages() Dict[str, str]
A dictionary of language codes -> name supported by festival
Currently only en-us (American English) is supported.
- classmethod version()
Festival version as a tuple of integers (major, minor, patch)
- Raises:
RuntimeError – if FestivalBackend.is_available() is False or if the version cannot be extracted for some reason.