Changelog
Version numbers follow semantic versioning.
phonemizer-3.3.0
- **This version depends on python>=3.8 (was 3.6 in previous versions). Tests now
require pytest>=6.0**
improvements
Replaced the dependency on the deprecated `pkg_resources` by `importlib` (requires python>=3.8).
Replaced the deprecated `setup.py` by `pyproject.toml`.
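As an illustration of the first change, package metadata that used to be read through `pkg_resources` can be read with the standard-library `importlib.metadata` module (Python 3.8+). The changelog does not say which `pkg_resources` calls were replaced, so the helper `installed_version` below is a hypothetical sketch, not phonemizer's own code:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent.

    Hypothetical helper: it stands in for the older
    pkg_resources.get_distribution(name).version idiom.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # prints a version string if phonemizer is installed, None otherwise
    print(installed_version("phonemizer"))
```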
bug fix
`espeak` backend: words mismatch now works when using custom word separators. See issue #169.
phonemizer-3.2.1
bug fixes
Fixed a bug when trying to restore punctuation on a multiline text. See issue #129.
phonemizer-3.2.0
bug fixes
Fixed a bug when trying to restore punctuation on very long text. See issue #108.
improvements
Improved consistency with the handling of word separators when preserving punctuation, and when using a word separator that is not a literal space character. See issue #106.
new features
Added the option to define punctuation with a regular expression. Previously only strings were accepted. See PR #120.
In the python API, the `punctuation_marks` parameter can now be passed to `phonemize` (or a backend constructor) as a `re.Pattern` that defines which characters will be matched as punctuation. Passing `punctuation_marks` as a str will continue to function as before, treating each character in the string as a punctuation mark.
Added the optional parameter `--punctuation_marks_is_regex` to the CLI interface. When used, the CLI will attempt to compile a `re.Pattern` from the value passed to `--punctuation-marks`.
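The difference between the two accepted forms can be sketched with the standard `re` module alone. The helper `as_punctuation_pattern` is hypothetical and only mimics the behaviour described above (a str is read as a set of single characters, a `re.Pattern` is used as given); it is not phonemizer's implementation:

```python
import re
from typing import Union


def as_punctuation_pattern(marks: Union[str, re.Pattern]) -> re.Pattern:
    """Turn a str or a compiled regex into a pattern matching punctuation.

    Hypothetical helper: a str is treated as a set of single-character
    marks, a re.Pattern is returned unchanged.
    """
    if isinstance(marks, re.Pattern):
        return marks
    # escape the characters so that e.g. '.' or '?' are matched literally
    return re.compile("[" + re.escape(marks) + "]")


if __name__ == "__main__":
    text = "hello, world! (really)"
    # str form: only ',' and '!' count as punctuation
    print(as_punctuation_pattern(",!").sub("", text))
    # regex form: anything that is neither a word character nor a space
    print(as_punctuation_pattern(re.compile(r"[^\w\s]")).sub("", text))
```

The regex form is useful when the set of marks is easier to describe by exclusion than by enumeration.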
phonemizer-3.1.1
improvements
Preserve empty lines in texts when using `--preserve-empty-lines`. Without this option, empty lines used to be automatically dropped. See PR #103.
new features
Type hinted most of `phonemizer`’s API. This makes the usage of our API a bit clearer, and can be easily leveraged by IDEs and type checkers to prevent typing issues.
phonemizer-3.0.1
bug fixes
The method `BaseBackend.phonemize` now raises a `RuntimeError` if the input text is a str instead of a list of str (was only logging an error message).
Preserve punctuation alignment when using `--preserve-punctuation`. It was inserting a space before each punctuation token, see issue #97.
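The stricter input check in the first fix can be pictured as a guard of the following kind. This is a minimal sketch: the function name `check_text` and the error message are made up for illustration and are not taken from phonemizer:

```python
from typing import List


def check_text(text: List[str]) -> List[str]:
    """Accept a list of str, reject a bare str (hypothetical guard)."""
    if isinstance(text, str):
        # the message wording is an assumption, not phonemizer's own
        raise RuntimeError(
            "input text must be a list of str, got a str; "
            "wrap it as [text] to phonemize a single line")
    return list(text)
```

Raising instead of logging makes the mistake visible at the call site, where a caller can fix it by passing `[text]`.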
phonemizer-3.0
breaking change
Do not remove empty lines from output. For example:
    # this is now
    phonemize(["hello", "!??"]) == ['həloʊ ', '']
    # this was
    phonemize(["hello", "!??"]) == ['həloʊ ']
Default backend in the `phonemize` function is now `espeak` (was `festival`).
`espeak-mbrola` backend now requires `espeak>=1.49`.
`--espeak-path` option renamed as `--espeak-library` and `PHONEMIZER_ESPEAK_PATH` environment variable renamed as `PHONEMIZER_ESPEAK_LIBRARY`.
`--festival-path` option renamed as `--festival-executable` and `PHONEMIZER_FESTIVAL_PATH` environment variable renamed as `PHONEMIZER_FESTIVAL_EXECUTABLE`.
The methods `backend.phonemize()` from the backend classes take only a list of str as input text (was either a str or a list of str).
The methods `backend.version()` from the backend classes return a tuple of int instead of a str.
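One practical benefit of a tuple of int over a str is that version comparisons become numeric rather than lexicographic. The sketch below uses a hypothetical helper, `as_version_tuple`, and plain dotted version strings; it does not call phonemizer:

```python
from typing import Tuple


def as_version_tuple(version: str) -> Tuple[int, ...]:
    """Convert a dotted version string such as '1.49.2' to (1, 49, 2)."""
    return tuple(int(part) for part in version.split("."))


if __name__ == "__main__":
    required = (1, 49)
    # tuples compare element by element, as integers
    print(as_version_tuple("1.9") >= required)   # False
    # strings compare character by character, so '9' beats '4'
    print("1.9" >= "1.49")                       # True
```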
improvements
`espeak` and `mbrola` backends now rely on the `espeak` shared library using the `ctypes` Python module, instead of relying on the `espeak` executable through subprocesses. This implies drastic speed improvements, up to 40 times faster.
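Loading a shared library through `ctypes`, rather than spawning an executable, looks roughly like the sketch below. The helper `load_shared_library` is hypothetical, and the library names `espeak-ng` and `espeak` are assumptions about how the library is installed on a given system; no espeak function is called here:

```python
import ctypes
import ctypes.util
from typing import Optional


def load_shared_library(name: str) -> Optional[ctypes.CDLL]:
    """Load a shared library by name, or return None if unavailable."""
    path = ctypes.util.find_library(name)
    if path is None:
        return None
    try:
        return ctypes.CDLL(path)
    except OSError:
        # the library was located but could not be loaded
        return None


if __name__ == "__main__":
    # 'espeak-ng' and 'espeak' are assumed library names
    lib = load_shared_library("espeak-ng") or load_shared_library("espeak")
    print("espeak library loaded" if lib else "espeak library not found")
```

Once loaded, the library stays in the Python process, which avoids paying the cost of starting a subprocess for each call.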
new features
New option `--prepend-text` to prepend the input text to phonemized utterances, so as to have both orthographic and phonemized text available at output.
New option `--tie` for the `espeak` backend to display a tie character within multi-letter phonemes (see issue #74).
New option `--words-mismatch` for the `espeak` backend. This makes it possible to detect when espeak merges consecutive words or drops a word from the orthographic text. Possible actions are to ignore those mismatches, to issue a warning for each line where a mismatch is detected, or to remove those lines from the output.
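The idea behind `--words-mismatch` can be sketched as a line-by-line comparison of word counts. The function below is a hypothetical illustration: the name `handle_mismatch` and the mode names `ignore`, `warn` and `remove` are assumptions chosen to mirror the three actions listed above, and the logic is not phonemizer's implementation:

```python
import warnings
from typing import List


def handle_mismatch(
        text: List[str], phonemized: List[str],
        mode: str = "ignore", separator: str = " ") -> List[str]:
    """Compare word counts of orthographic and phonemized lines.

    mode is 'ignore', 'warn' or 'remove' (assumed names).
    """
    if mode not in ("ignore", "warn", "remove"):
        raise ValueError(f"unknown mode: {mode}")

    output = []
    for num, (source, phones) in enumerate(zip(text, phonemized), start=1):
        expected = len(source.split())
        observed = len([word for word in phones.split(separator) if word])
        mismatch = expected != observed
        if mismatch and mode == "warn":
            warnings.warn(
                f"line {num}: expected {expected} words, got {observed}")
        if mismatch and mode == "remove":
            continue  # drop the whole line from the output
        output.append(phones)
    return output
```

Counting words on both sides only detects merges and drops; it does not tell which word was affected.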
bugfixes
phonemizer’s logger no longer conflicts with other loggers when imported from Python (see PR #61).
phonemizer-2.2.2
bugfixes
fixed installation from source (bug introduced in 2.2.1, see issue #52).
Fixed a bug when trying to restore punctuation on an empty text (see issue #54).
Fixed an edge case bug when using custom punctuation marks (see issue #55).
Fixed regex issue that causes digits to be considered punctuation (see issue #60).
phonemizer-2.2.1
improvements
From Python import the phonemize function using `from phonemizer import phonemize` instead of `from phonemizer.phonemize import phonemize`. The second import is still available for compatibility.
bugfixes
phonemizer-2.2
new features
New option `--list-languages` to list the available languages for a given backend from the command line.
The `--sampa` option of the `espeak` backend has been replaced by a new backend `espeak-mbrola`.
The former `--sampa` option (introduced in phonemizer-2.0) outputs phones that are not standard SAMPA but are adapted to the espeak TTS front-end.
On the other hand the `espeak-mbrola` backend allows espeak to output phones in standard SAMPA (adapted to the mbrola TTS front-end). This backend requires mbrola to be installed, as well as additional mbrola voices to support needed languages. This backend does not support word separation nor punctuation preservation.
bugfixes
phonemizer-2.1
new features
Possibility to preserve the punctuation (ignored and silently removed by default) in the phonemized output with the new option `--preserve-punctuation` from command line (or the equivalent `preserve-punctuation` from Python API). With the `punctuation-marks` option, one can override the default marks considered as punctuation.
It is now possible to specify the path to a custom `espeak` or `festival` executable (for instance to use a local installation or to test different versions). Either specify the `PHONEMIZER_ESPEAK_PATH` environment variable, the `--espeak-path` option from command line or use the `EspeakBackend.set_espeak_path` method from the Python API. Similarly for festival use `PHONEMIZER_FESTIVAL_PATH`, `--festival-path` or `FestivalBackend.set_festival_path`.
The `--sampa` option is now available for espeak (was available only for espeak-ng).
When using `espeak` with SAMPA output, some SAMPA phones are corrected to correspond to the normalized SAMPA alphabet (espeak seems not to respect it). The corrections are language specific. A correction file must be placed in `phonemizer/share/espeak`. This has been implemented only for French so far.
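The three ways of pointing to a custom executable can be combined as in the sketch below. The helper `resolve_executable` is hypothetical, and the precedence shown (explicit argument, then environment variable, then a `PATH` lookup) is an assumption; the changelog does not state which source wins:

```python
import os
import shutil
from typing import Optional


def resolve_executable(
        explicit: Optional[str] = None,
        env_var: str = "PHONEMIZER_ESPEAK_PATH",
        default_name: str = "espeak") -> Optional[str]:
    """Pick an executable path (hypothetical helper, assumed precedence).

    Order: explicit argument, then environment variable, then PATH lookup.
    """
    if explicit:
        return explicit
    from_env = os.environ.get(env_var)
    if from_env:
        return from_env
    # returns None when the executable is not on the PATH
    return shutil.which(default_name)
```

For festival the same helper would be called with `env_var="PHONEMIZER_FESTIVAL_PATH"` and `default_name="festival"`.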
bugfixes
parses correctly the version of `espeak-ng` even for dev versions (e.g. `1.51-dev`).
fixed an issue with `espeak` backend, where multiple phone separators can be present at the end of a word, see #31.
added an additional stress symbol `-` for `espeak`.
phonemizer-2.0.1
bugfixes
`keep-flags` was not the default argument for `language_switch` in the class `EspeakBackend`.
fixed an issue with punctuation processing in the espeak backend, see #26.
improvements
log a warning if using `python2`.
phonemizer-2.0
incompatible change
Starting with `phonemizer-2.0` only python3 is supported. Compatibility with python2 is no longer ensured or tested. See https://pythonclock.org.
bugfixes
new `--language-switch` option to use with `espeak` backend to deal with language switching on phonemized output. In previous versions there was a bug in the detection of the language switching flags (sometimes removed, sometimes not). Now you can choose to keep the flags, to remove them, or to delete the whole utterance.
bugfix in a test with `espeak>=1.49.3`.
bugfix using `NamedTemporaryFile` on windows, see #21.
bugfix when calling festival or espeak subprocesses on Windows, see #17.
bugfix in detecting recent versions of espeak-ng, see #18.
bugfix when using utf8 input on espeak backend (python2), see #19.
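The three choices offered by `--language-switch` can be sketched as follows. This is an illustration under stated assumptions: flags are taken to look like `(en)` or `(fr)` inside the phonemized text, the function name `process_language_switch` is hypothetical, and of the three mode names only `keep-flags` appears in this changelog (`remove-flags` and `remove-utterance` are assumed):

```python
import re
from typing import List

# assumption: a flag is a short parenthesised tag such as '(en)' or '(fr)'
_FLAG = re.compile(r"\(.+?\)")


def process_language_switch(
        lines: List[str], mode: str = "keep-flags") -> List[str]:
    """Keep flags, strip them, or drop flagged utterances (sketch)."""
    if mode == "keep-flags":
        return list(lines)
    if mode == "remove-flags":
        return [_FLAG.sub("", line) for line in lines]
    if mode == "remove-utterance":
        return [line for line in lines if not _FLAG.search(line)]
    raise ValueError(f"unknown mode: {mode}")
```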
new features and improvements
new `--sampa` option to output phonemes in SAMPA alphabet instead of IPA, available for espeak-ng only.
new `--with-stress` option to use with `espeak` backend to not remove the stresses on phonemized output. For instance:
    $ echo "hello world" | phonemize
    həloʊ wɜːld
    $ echo "hello world" | phonemize --with-stress
    həlˈoʊ wˈɜːld
improved logging: by default only warnings are displayed, use the new `--quiet` option to inhibit all log messages or `--verbose` to see all of them. Log messages now display level name (debug/info/warning).
improved code organization:
backends are now implemented in the `backend` submodule as separate source files.
improved version string (displays uninstalled backends, moved outside of main for use from Python).
improved logger implemented in its own module so that a call to phonemizer from CLI or API yields the same log messages.
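The effect of leaving out `--with-stress` can be reproduced on the example output above by deleting the stress marks. This is a minimal sketch: the helper `remove_stress` is hypothetical and assumes that only the IPA primary (ˈ) and secondary (ˌ) stress marks are involved:

```python
# assumption: primary (ˈ) and secondary (ˌ) IPA stress marks only
STRESS_MARKS = "ˈˌ"


def remove_stress(phonemized: str, marks: str = STRESS_MARKS) -> str:
    """Delete every stress mark from a phonemized string (sketch)."""
    return phonemized.translate(str.maketrans("", "", marks))


if __name__ == "__main__":
    print(remove_stress("həlˈoʊ wˈɜːld"))  # həloʊ wɜːld
```

The length mark `ː` is left untouched, since it is part of the phoneme rather than a stress symbol.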
phonemizer-1.0
incompatible changes
The following changes break the compatibility with previous versions of phonemizer (0.X.Y):
command-line `phonemize` program: new `--backend <espeak|festival|segments>` option, default language is now espeak en-us (was festival en-us),
it is now illegal to have the same separator at different levels (for instance a space for both word and phone),
from Python, must import the phonemize function as `from phonemizer.phonemize import phonemize`, was `from phonemizer import phonemize`.
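The rule on separators amounts to a uniqueness check across levels. The function `check_separators` below is a hypothetical sketch of such a check; treating an empty separator as unused (and therefore allowed to repeat) is an assumption:

```python
def check_separators(**levels: str) -> None:
    """Raise ValueError if two levels share the same separator (sketch)."""
    seen = {}
    for level, separator in levels.items():
        if not separator:
            continue  # assumption: an empty separator means "unused"
        if separator in seen:
            raise ValueError(
                f"separator {separator!r} is used for both "
                f"{seen[separator]} and {level}")
        seen[separator] = level


if __name__ == "__main__":
    check_separators(word=" ", phone="|")      # accepted
    try:
        check_separators(word=" ", phone=" ")  # rejected
    except ValueError as err:
        print(err)
```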
New backend segments for phonemization based on grapheme-to-phoneme mappings.
Major refactoring of the backends implementation and separators (as Python classes).
Input to phonemizer now supports utf8.
Better handling of errors (display of a meaningful message).
Fixed a bug in fetching espeak version on macos, see #14.
phonemizer-0.3.3
Fix a bug introduced in phonemizer-0.3.2 (apostrophes in festival backend). See #12.
phonemizer-0.3.2
Continuous integration with travis-ci.
Support for docker.
Better support for different versions of espeak/festival.
Minor bugfixes and improved tests.
phonemizer-0.3.1
New espeak or espeak-ng backend with more than 100 languages.
Support for Python 2.7 and 3.5.
Integration with zenodo for citation.
Various bugfixes and minor improvements.
phonemizer-0.2
First public release.
Support for festival backend, American English only.