Here is a Pig Latin dialect that takes into account how words are pronounced:
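```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import re

sentences = ["Pig qoph an egg.",
             "Quiet European rhythms.",
             "My nth happy hour.",
             "Herb unit -- a dynasty heir."]
for sent in sentences:
    # Split each whitespace-separated chunk into words and punctuation
    # (the capturing group keeps the punctuation) and convert every piece.
    # to_piglatin() is defined further down in this answer.
    entsay = " ".join(["".join(map(to_piglatin, re.split(r"(\W+)", nonws)))
                       for nonws in sent.split()])
    print(u'"{}" → "{}"'.format(sent, entsay))
```

Output: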
"Pig qoph an egg." → "igpay ophqay anway eggway."

"Quiet European rhythms." → "ietquay uropeaneay ythmsrhay."

"My nth happy hour." → "ymay nthway appyhay hourway."

"Herb unit -- a dynasty heir." → "herbway itunay -- away ynastyday heirway."
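Note:

- The "-way" suffix is used for words that start with a vowel sound.
- "qu" in "quiet" is treated as a unit.
- "European" and "unit" start with a consonant sound.
- "y" in "rhythms" and "dynasty" is a vowel.
- "nth", "hour", "herb", and "heir" start with a vowel sound.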
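The vowel-sound test behind these notes can be illustrated in isolation. In CMUdict's ARPAbet notation, vowel phonemes end in a stress digit (0, 1, or 2) and consonants do not. The sketch below is only an illustration: `SAMPLE_PRONUNCIATIONS`, `first_vowel_index`, and `starts_with_vowel_sound` are hypothetical names, and the entries are hand-written in CMUdict style rather than loaded from nltk. It also assumes a single pronunciation per word, whereas `cmudict.dict()` maps each word to a list of pronunciations.

```python
# Hand-written sample entries in CMUdict style (illustrative only,
# not loaded from nltk; one pronunciation per word for simplicity).
SAMPLE_PRONUNCIATIONS = {
    "hour": ["AW1", "ER0"],
    "unit": ["Y", "UW1", "N", "AH0", "T"],
    "pig": ["P", "IH1", "G"],
}


def first_vowel_index(phonemes):
    """Return the index of the first vowel phoneme, or None if there is none."""
    for i, phoneme in enumerate(phonemes):
        if phoneme[-1].isdigit():  # vowels end in a stress digit: 0, 1 or 2
            return i
    return None


def starts_with_vowel_sound(word, pronunciations=SAMPLE_PRONUNCIATIONS):
    """Return True/False for known words, None for unknown words."""
    phonemes = pronunciations.get(word.lower())
    if phonemes is None:
        return None  # word is not in the sample dictionary
    return first_vowel_index(phonemes) == 0


for w in ["hour", "unit", "pig"]:
    print(w, starts_with_vowel_sound(w))
```

This is why "hour" takes the "-way" suffix even though it is spelled with a leading consonant, while "unit" does not.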
The `to_piglatin()` function used above is defined as:

```python
from nltk.corpus import cmudict  # $ pip install nltk
# $ python -c "import nltk; nltk.download('cmudict')"

def to_piglatin(word, pronunciations=cmudict.dict()):
    word = word.lower()  # NOTE: ignore Unicode casefold
    i = 0
    # find out whether the word starts with a vowel sound using
    # the pronunciations dictionary
    for syllables in pronunciations.get(word, []):
        for i, syl in enumerate(syllables):
            isvowel = syl[-1].isdigit()  # vowel phonemes end in a stress digit
            if isvowel:
                break
        else:  # no vowels
            assert 0
        if i == 0:  # starts with a vowel
            return word + "way"
        elif "y" in word:  # allow 'y' as a vowel for known words
            return to_piglatin_naive(word, vowels="aeiouy", start=i)
        break  # use only the first pronunciation
    return to_piglatin_naive(word, start=i)

def to_piglatin_naive(word, vowels="aeiou", start=0):
    word = word.lower()
    i = 0
    # find the first vowel letter at or after `start`
    for i, c in enumerate(word[start:], start=start):
        if c in vowels:
            break
    else:  # no vowel in the word
        i += 1
    # punctuation and empty strings are returned unchanged (no "ay" suffix)
    return word[i:] + word[:i] + "w"*(i == 0) + "ay"*word.isalnum()
```

To split text into sentences and words, you could use the nltk tokenizers. The code can be modified to respect letter case (uppercase/lowercase) and contractions.
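As one possible way to respect letter case, the capitalization pattern of the original word could be copied onto the converted word after the fact. This is a minimal sketch under stated assumptions, not part of the code above: `match_case` is a hypothetical helper, and it only handles all-uppercase and initial-capital words; mixed-case words fall through unchanged.

```python
def match_case(original, converted):
    """Copy a simple capitalization pattern from `original` onto `converted`.

    Hypothetical helper: handles ALL CAPS and Initial-capital words only.
    """
    if len(original) > 1 and original.isupper():
        return converted.upper()       # "NASA" style: upper-case everything
    if original[:1].isupper():
        return converted.capitalize()  # "Pig" style: capitalize first letter
    return converted                   # lower-case or empty: leave as is


print(match_case("Pig", "igpay"))
print(match_case("egg", "eggway"))
```

With the functions above, this could be applied as `match_case(word, to_piglatin(word))` for each word piece, since `to_piglatin()` always returns lowercase text.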



