| No. | Character set | Characters | Unicode range |
|---|---|---|---|
| 1 | Basic CJK ideographs | 20902 | 4E00-9FA5 |
| 2 | Basic CJK supplement | 74 | 9FA6-9FEF |
| 3 | Extension A | 6582 | 3400-4DB5 |
| 4 | Extension B | 42711 | 20000-2A6D6 |
| 5 | Extension C | 4149 | 2A700-2B734 |
| 6 | Extension D | 222 | 2B740-2B81D |
| 7 | Extension E | 5762 | 2B820-2CEA1 |
| 8 | Extension F | 7473 | 2CEB0-2EBE0 |
| 9 | Kangxi radicals | 214 | 2F00-2FD5 |
| 10 | Radicals supplement | 115 | 2E80-2EF3 |
| 11 | Compatibility ideographs | 477 | F900-FAD9 |
| 12 | Compatibility ideographs supplement | 542 | 2F800-2FA1D |
| 13 | PUA (GBK) components | 81 | E815-E86F |
| 14 | Component extension | 452 | E400-E5E8 |
| 15 | PUA supplement | 207 | E600-E6CF |
| 16 | CJK strokes | 36 | 31C0-31E3 |
| 17 | Ideographic structure | 12 | 2FF0-2FFB |
| 18 | Bopomofo (Zhuyin) | 43 | 3105-312F |
| 19 | Bopomofo extended | 22 | 31A0-31BA |
| 20 | 〇 | 1 | 3007 |
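To see whether these ranges are contiguous, one option is to sort the rows by start code point and merge any range that overlaps or directly touches the previous one. The sketch below does that; the names `CJK_RANGES` and `merge_ranges` are hypothetical, and the bounds are copied from the table as-is.

```python
# Hypothetical data: rows of the table above as (name, first, last) code points.
CJK_RANGES = [
    ("Basic CJK ideographs", 0x4E00, 0x9FA5),
    ("Basic CJK supplement", 0x9FA6, 0x9FEF),
    ("Extension A", 0x3400, 0x4DB5),
    ("Extension B", 0x20000, 0x2A6D6),
    ("Extension C", 0x2A700, 0x2B734),
    ("Extension D", 0x2B740, 0x2B81D),
    ("Extension E", 0x2B820, 0x2CEA1),
    ("Extension F", 0x2CEB0, 0x2EBE0),
    ("Kangxi radicals", 0x2F00, 0x2FD5),
    ("Radicals supplement", 0x2E80, 0x2EF3),
    ("Compatibility ideographs", 0xF900, 0xFAD9),
    ("Compatibility ideographs supplement", 0x2F800, 0x2FA1D),
    ("PUA (GBK) components", 0xE815, 0xE86F),
    ("Component extension", 0xE400, 0xE5E8),
    ("PUA supplement", 0xE600, 0xE6CF),
    ("CJK strokes", 0x31C0, 0x31E3),
    ("Ideographic structure", 0x2FF0, 0x2FFB),
    ("Bopomofo", 0x3105, 0x312F),
    ("Bopomofo extended", 0x31A0, 0x31BA),
    ("〇", 0x3007, 0x3007),
]


def merge_ranges(ranges):
    """Sort the (first, last) pairs and merge any that overlap or touch."""
    merged = []
    for first, last in sorted((f, l) for _, f, l in ranges):
        # "Touching" means the new range starts right after the previous one ends.
        if merged and first <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], last)
        else:
            merged.append([first, last])
    return [tuple(r) for r in merged]


if __name__ == "__main__":
    for first, last in merge_ranges(CJK_RANGES):
        print(f"{first:X}-{last:X}")
```

Run against the table, only rows 1 and 2 join up (into 4E00-9FEF); the other 18 ranges remain separated by gaps, so the 20 rows collapse to 19 intervals rather than one.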
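# Returns False as soon as a single non-Chinese character is found.
# The if condition is a long chain; there is surely a simpler way to write it. Still learning!
def is_ch(word):
    for ch in word:
        # Bounds must be real escapes: '\uXXXX' for 4-digit code points,
        # '\UXXXXXXXX' for the 5-digit ones (Extensions B-F, compatibility supplement).
        if (not ('\u4e00' <= ch <= '\u9fef') and not ('\u3400' <= ch <= '\u4db5')
                and not ('\U00020000' <= ch <= '\U0002a6d6') and not ('\U0002a700' <= ch <= '\U0002b734')
                and not ('\U0002b740' <= ch <= '\U0002b81d') and not ('\U0002b820' <= ch <= '\U0002cea1')
                and not ('\U0002ceb0' <= ch <= '\U0002ebe0') and not ('\u2f00' <= ch <= '\u2fd5')
                and not ('\u2e80' <= ch <= '\u2ef3') and not ('\uf900' <= ch <= '\ufad9')
                and not ('\U0002f800' <= ch <= '\U0002fa1d') and not ('\ue815' <= ch <= '\ue86f')
                and not ('\ue400' <= ch <= '\ue5e8') and not ('\ue600' <= ch <= '\ue6cf')
                and not ('\u31c0' <= ch <= '\u31e3') and not ('\u2ff0' <= ch <= '\u2ffb')
                and not ('\u3105' <= ch <= '\u312f') and not ('\u31a0' <= ch <= '\u31ba')
                and ch != '\u3007'):  # 〇 (row 20 of the table)
            return False
    return True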
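Why the range bounds have to be written as escapes: a bound like `'u4e00'` is just the five letters `u`, `4`, `e`, `0`, `0`, so the comparison is between ordinary characters, not code points. Also, `\u` always consumes exactly four hex digits, so a 5-digit code point needs the 8-digit `\U` form. This small demo (variable names are illustrative only) shows both pitfalls, plus the equivalent `ord()` comparison:

```python
plain = 'u4e00'        # five ordinary characters: 'u', '4', 'e', '0', '0'
bmp = '\u4e00'         # one character, U+4E00 (一)
wrong = '\u20000'      # \u takes exactly 4 hex digits: U+2000 followed by '0'
right = '\U00020000'   # \U takes 8 hex digits: one character, U+20000

print(len(plain), len(bmp), len(wrong), len(right))  # 5 1 2 1

# Against the plain-letter bounds, '一' (U+4E00) sorts after 'u', so the test fails:
print('u4e00' <= '一' <= 'u9fef')     # False
# With real escapes, or by comparing code points via ord(), it succeeds:
print('\u4e00' <= '一' <= '\u9fef')   # True
print(0x4E00 <= ord('一') <= 0x9FEF)  # True
```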
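3. Possible extensions, when there is time
(1) For example: if the string is entirely Chinese, return True and the original string; if it contains non-Chinese characters, return False and those non-Chinese characters.
(2) The if statement chains a long list of conditions; find a simpler, or more elegant, way to write it.
(3) Is there an even simpler implementation? Are these Unicode ranges contiguous? Look into it when there's time!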



