====================== Command Line Examples ====================== The ``phonemize`` command can be used to replicate most of the features exposed in :ref:`python-examples`. For a complete list of available options, have a: .. code:: shell phonemize --help See the installed backends with the ``--version`` option: .. code:: shell $ phonemize --version phonemizer-3.0 available backends: espeak-ng-1.50, espeak-mbrola, festival-2.5.0, segments-2.1.3 Input/output examples --------------------- - from stdin to stdout: .. code:: shell $ echo "hello world" | phonemize həloʊ wɜːld - Prepend the input text to output: .. code:: shell $ echo "hello world" | phonemize --prepend-text hello world | həloʊ wɜːld $ echo "hello world" | phonemize --prepend-text=';' hello world ; həloʊ wɜːld - from file to stdout .. code:: shell $ echo "hello world" > hello.txt $ phonemize hello.txt həloʊ wɜːld - from file to file .. code:: shell $ phonemize hello.txt -o hello.phon --strip $ cat hello.phon həloʊ wɜːld Backends -------- - The default is to use **espeak** us-english: .. code:: shell $ echo "hello world" | phonemize həloʊ wɜːld $ echo "hello world" | phonemize -l en-us -b espeak həloʊ wɜːld $ echo 'hello world' | phonemize -l en-us -b espeak --tie həlo͡ʊ wɜːld - Use **festival** US English instead .. code:: shell $ echo "hello world" | phonemize -l en-us -b festival hhaxlow werld - In French, using **espeak** and **espeak-mbrola**, with custom token separators (see below). **espeak-mbrola** does not support words separation. .. code:: shell $ echo "bonjour le monde" | phonemize -b espeak -l fr-fr -p ' ' -w '/w ' b ɔ̃ ʒ u ʁ /w l ə /w m ɔ̃ d /w $ echo "bonjour le monde" | phonemize -b espeak-mbrola -l mb-fr1 -p ' ' -w '/w ' b o~ Z u R l @ m o~ d - In Japanese, using **segments** .. code:: shell $ echo 'konnichiwa' | phonemize -b segments -l japanese konnitʃiwa $ echo 'konnichiwa' | phonemize -b segments -l ./phonemizer/share/japanese.g2p konnitʃiwa Supported languages ------------------- The exhaustive list of supported languages is available with the command ``phonemize --list-languages [--backend ]``. - Languages supported by **espeak** are available `here `__. - Languages supported by **espeak-mbrola** are available `here `__. Please note that the mbrola voices are not bundled with the phonemizer nor the mbrola binary and must be installed separately. - Languages supported by **festival** are: :: en-us -> english-us - Languages supported by the **segments** backend are: :: chintang -> ./phonemizer/share/segments/chintang.g2p cree -> ./phonemizer/share/segments/cree.g2p inuktitut -> ./phonemizer/share/segments/inuktitut.g2p japanese -> ./phonemizer/share/segments/japanese.g2p sesotho -> ./phonemizer/share/segments/sesotho.g2p yucatec -> ./phonemizer/share/segments/yucatec.g2p Instead of a language you can also provide a file specifying a grapheme to phone mapping (see the files above for examples). Token separators ---------------- You can specify separators for phones, syllables (**festival** only) and words (excepted **espeak-mbrola**). .. code:: shell $ echo "hello world" | phonemize -b festival -w ' ' -p '' hhaxlow werld $ echo "hello world" | phonemize -b festival -p ' ' -w '' hh ax l ow w er l d $ echo "hello world" | phonemize -b festival -p '-' -s '|' hh-ax-l-|ow-| w-er-l-d-| $ echo "hello world" | phonemize -b festival -p '-' -s '|' --strip hh-ax-l|ow w-er-l-d $ echo "hello world" | phonemize -b festival -p ' ' -s ';esyll ' -w ';eword ' hh ax l ;esyll ow ;esyll ;eword w er l d ;esyll ;eword You cannot specify the same separator for several tokens (for instance a space for both phones and words): .. code:: shell $ echo "hello world" | phonemize -b festival -p ' ' -w ' ' fatal error: illegal separator with word=" ", syllable="" and phone=" ", must be all differents if not empty Punctuation ----------- By default the punctuation is removed in the phonemized output. You can preserve it using the ``--preserve-punctuation`` option (not supported by the **espeak-mbrola** backend): .. code:: shell $ echo "hello, world!" | phonemize --strip həloʊ wɜːld $ echo "hello, world!" | phonemize --preserve-punctuation --strip həloʊ, wɜːld! The default punctuation marks are each of the following characters: ``;:,.!?¡¿—…"«»“”``. These can be overridden by the ``--punctuation-marks`` option. .. code-block:: shell $ echo "hello, world!" | phonemize --preserve-punctuation --strip --punctuation-marks '!?' həloʊ wɜːld! The punctuation marks can be specified as a regular expression by additionally using the ``--punctuation-marks-is-regex`` option. For example, to preserve the default punctuation marks except for commas and periods in the middle of numbers, the following will work: .. code-block:: shell $ echo "1,000, or so." | phonemize --preserve-punctuation --strip --punctuation-marks '[;:!?¡¿—…"«»“”]|[,.](?!\d)' --punctuation-marks-is-regex wʌn θaʊzənd, ɔːɹ soʊ. Espeak specific options ----------------------- - The espeak backend can output the **stresses** on phones: .. code:: shell $ echo "hello world" | phonemize -l en-us -b espeak --with-stress həlˈoʊ wˈɜːld - The espeak backend can add **tie** on multi-characters phonemes: .. code:: shell $ echo "hello world" | phonemize -l en-us -b espeak --tie həlo͡ʊ wɜːld .. warning:: The espeak backend can **switch languages** during phonemization (below from French to English), use the ``--language-switch`` option to deal with it: .. code:: shell $ echo "j'aime le football" | phonemize -l fr-fr -b espeak --language-switch keep-flags [WARNING] fount 1 utterances containing language switches on lines 1 [WARNING] extra phones may appear in the "fr-fr" phoneset [WARNING] language switch flags have been kept (applying "keep-flags" policy) ʒɛm lə- (en)fʊtbɔːl(fr) $ echo "j'aime le football" | phonemize -l fr-fr -b espeak --language-switch remove-flags [WARNING] fount 1 utterances containing language switches on lines 1 [WARNING] extra phones may appear in the "fr-fr" phoneset [WARNING] language switch flags have been removed (applying "remove-flags" policy) ʒɛm lə- fʊtbɔːl $ echo "j'aime le football" | phonemize -l fr-fr -b espeak --language-switch remove-utterance [WARNING] removed 1 utterances containing language switches (applying "remove-utterance" policy) .. warning:: The espeak backend sometimes **merge words together** in the output, use the ``--words-mismatch`` option to deal with it: .. code:: shell $ echo "that's it, words are merged" | phonemize -l en-us -b espeak [WARNING] words count mismatch on 100.0% of the lines (1/1) ðætsɪt wɜːdz ɑːɹ mɜːdʒd