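Note 3: Strings

Strings (character sequences) and byte sequences

Characters

For historical reasons it is not yet entirely accurate to define a character as a Unicode character, but going forward "character" will mean a Unicode character.

Bytes

A byte sequence is the binary form that characters take once they are encoded.

Code points

Each character is identified by a code point, and that is what the computer is actually displaying.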

>>> '你好'.encode("unicode_escape").decode()
'\\u4f60\\u597d'
>>> '\u4f60\u597d'
'你好'

The Unicode standard writes a code point as 4 to 6 hexadecimal digits, for example U+4F60.
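To make the link between characters and code points concrete, here is a small sketch using the built-ins ord() and chr(). The variable names are only for illustration.

```python
# ord() maps a character to its code point; chr() maps a code point back.
text = "你好"

code_points = [ord(ch) for ch in text]
hex_labels = [f"U+{cp:04X}" for cp in code_points]
restored = "".join(chr(cp) for cp in code_points)

print(code_points)  # [20320, 22909]
print(hex_labels)   # ['U+4F60', 'U+597D']
print(restored)     # 你好
```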

Encoding

Character sequence (str) -> byte sequence (bytes): encoding (encode)

>>> "你好".encode("utf-8")
b'\xe4\xbd\xa0\xe5\xa5\xbd'

Byte sequence (bytes) -> character sequence (str): decoding (decode)

>>> b = "你好".encode("utf-8")
>>> b
b'\xe4\xbd\xa0\xe5\xa5\xbd'
>>> b.decode("utf-8")
'你好'
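The same two characters can become different byte sequences depending on the encoding. The sketch below compares UTF-8 and GBK for the string used above and checks that each one round-trips with its own codec.

```python
text = "你好"

utf8_bytes = text.encode("utf-8")
gbk_bytes = text.encode("gbk")

print(len(text))        # 2 (characters)
print(len(utf8_bytes))  # 6 (bytes, 3 per character in UTF-8)
print(len(gbk_bytes))   # 4 (bytes, 2 per character in GBK)

# Decoding with the same encoding that produced the bytes restores the text.
round_trip_utf8 = utf8_bytes.decode("utf-8")
round_trip_gbk = gbk_bytes.decode("gbk")
print(round_trip_utf8 == text, round_trip_gbk == text)  # True True
```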

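Encoding errors

Garbled text and mixed encodings

Detecting the encoding

The encoding cannot be determined with certainty from the raw bytes alone; detection tools estimate the most likely encoding statistically.

# Install chardet
pip install chardet

# Import chardet
>>> import chardet
>>> chardet.detect(b)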
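When chardet is not available, a much cruder approach is to try a short list of candidate encodings in order. This is not what chardet does (it is statistical); the helper name guess_encoding and the candidate list are assumptions made up for this sketch. A successful decode only means the bytes are valid in that encoding, not that the guess is right.

```python
def guess_encoding(data, candidates=("utf-8", "gbk", "latin-1")):
    """Return the first candidate that decodes data without error, else None."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


print(guess_encoding("你好".encode("utf-8")))  # utf-8
print(guess_encoding("你好".encode("gbk")))    # gbk
```

Order matters: latin-1 accepts any byte sequence, so it belongs last, as a fallback.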

Fixing garbled text and mixed encodings

Ignoring undecodable bytes

>>> b_2 = "你好".encode("utf-8") + "啊".encode("gbk")

>>> b_2
b'\xe4\xbd\xa0\xe5\xa5\xbd\xb0\xa1'

>>> b_2.decode("utf-8", errors='ignore')
'你好'

Substituting the replacement character (U+FFFD)

>>> b_2.decode("utf-8", errors='replace')
'你好��'
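For comparison, the sketch below shows what happens with the default errors="strict", plus the "backslashreplace" handler, which keeps the bad bytes visible instead of dropping them. The last step assumes the boundary between the two encodings is already known (offset 6 here), which is rarely true for real data.

```python
mixed = "你好".encode("utf-8") + "啊".encode("gbk")

# errors="strict" is the default: the first invalid byte raises an exception.
try:
    mixed.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict failed at byte offset", exc.start)  # 6

# Keep the undecodable bytes visible as escape sequences.
escaped = mixed.decode("utf-8", errors="backslashreplace")
print(escaped)  # 你好\xb0\xa1

# If the boundary is known, decode each part with its own encoding.
utf8_part, gbk_part = mixed[:6], mixed[6:]
recovered = utf8_part.decode("utf-8") + gbk_part.decode("gbk")
print(recovered)  # 你好啊
```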

CRUD operations on strings

Calling dir("") lists the methods available on a string.

Create

>>> a = "a"
>>> id(a)
1798051035376
>>> a = a+"b"
>>> id(a)
1798054084336

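a += "b" is shorthand for a = a + "b". Strings are immutable, so the concatenation builds a new string object and rebinds a to it, which is why id(a) differs before and after.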
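A short sketch of what immutability means in practice: item assignment fails, and when many pieces have to be combined, collecting them in a list and joining once is the usual idiom. The variable names are illustrative.

```python
s = "ab"

# Strings do not support item assignment.
try:
    s[0] = "x"
except TypeError as exc:
    print("TypeError:", exc)

# Build a string from many pieces with a single join.
pieces = ["a", "b", "c"]
joined = "".join(pieces)
print(joined)  # abc
```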

Retrieve

Getting a character by index

In Python, as in most programming languages, indexes start at 0.

>>> a = "Hello world"
>>> a[1]
'e'

find and index (get the index of a target substring)

>>> a.find("e")
1

>>> a.index("d")
10

>>> a.find("!")
-1

# When the target is not found, index raises an error
>>> a.index("!")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found
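Because find and index report a miss differently, the calling code differs too. The sketch below shows both styles, plus the in operator for a plain membership test and rfind for searching from the right.

```python
a = "Hello world"

# find: check for the -1 sentinel.
pos = a.find("!")
if pos == -1:
    print("find: not found")

# index: catch the exception.
try:
    a.index("!")
except ValueError:
    print("index: not found")

# When only membership matters, "in" is enough.
print("wor" in a)  # True

# find returns the first match, rfind the last.
first_o, last_o = a.find("o"), a.rfind("o")
print(first_o, last_o)  # 4 7
```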

startswith and endswith

>>> f = "2020-11-22-xxxxx"
>>> f.startswith("2020-11-22")
True

>>> f = "xxxx.jpg"
>>> f.endswith("jpg")
True
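Both methods also accept a tuple of candidates, which is handy for filtering by several extensions at once. The file names below are made up for illustration. Including the dot avoids matching a file that is literally named "jpg".

```python
filenames = ["a.jpg", "b.png", "notes.txt", "jpg"]

# A tuple means "ends with any of these".
images = [name for name in filenames if name.endswith((".jpg", ".png"))]
print(images)  # ['a.jpg', 'b.png']
```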

Update

replace

replace returns a new string; the original is left unchanged.

>>> a = "Hello werld,hello werld"
>>> a.replace("wer","wor")
'Hello world,hello world'

split

>>> a = "<>,<>,<>"
>>> a.split(",")
['<>', '<>', '<>']

join

>>> b = a.split(",")
>>> b
['<>', '<>', '<>']
>>> ",".join(b)
'<>,<>,<>'
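A few details of split are easy to miss: the optional maxsplit argument, and the difference between split() with no argument and split(" "). The sample strings are illustrative.

```python
line = "name=alice=smith"

# maxsplit=1 splits only at the first separator.
key, value = line.split("=", 1)
print(key, value)  # name alice=smith

# With no argument, runs of whitespace count as one separator.
no_arg = "a  b   c".split()
print(no_arg)  # ['a', 'b', 'c']

# With an explicit " ", every single space is a separator.
explicit = "a  b".split(" ")
print(explicit)  # ['a', '', 'b']

# split and join with the same separator are inverses.
round_trip = ",".join("x,y,z".split(","))
print(round_trip)  # x,y,z
```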

Delete

strip

>>> a ="      hello,world          "
>>> a
'      hello,world          '
>>> a.strip()
'hello,world'

lstrip

>>> a = "         hello world"
>>> a
'         hello world'
>>> a.lstrip()
'hello world'

rstrip

>>> a = "hello world               "
>>> a
'hello world               '
>>> a.rstrip()
'hello world'
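strip, lstrip and rstrip also take an optional argument. It is treated as a set of characters to remove from the ends, not as a prefix or suffix, as this sketch shows.

```python
# Remove a specific character from both ends.
trimmed = "xxhello,worldxx".strip("x")
print(trimmed)  # hello,world

# The argument is a set of characters: any mix of "w" and "." is removed.
host = "www.example.com".strip("w.")
print(host)  # example.com

# Characters inside the string are never touched.
inner = "hello".strip("ho")
print(inner)  # ell
```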

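String output and input

Saving to a file

# open() opens a file; in "w" mode a missing file is created, but a path whose directory does not exist raises an error
# Arguments: file name, mode (read, write, append), encoding
output = open("output.txt","w",encoding="utf-8")
content = "hello,world"
# Write the content to the file
output.write(content)
# Close the file handle
output.close()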

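Reading a file

# The variable is named infile rather than input to avoid shadowing the built-in input()
infile = open("output.txt","r",encoding="utf-8")
# Read the whole file
content = infile.read()
print(content)

# A second read() returns an empty string because the file position is already at the end
content_2 = infile.read()
print(content_2)
# Close the file handle
infile.close()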
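A with block closes the file automatically, even if an error occurs, and seek(0) moves the position back to the start so the content can be read again. This sketch writes to a temporary directory (an assumption, so it does not overwrite a real output.txt).

```python
import os
import tempfile

# Use a throwaway directory so no existing file is overwritten.
path = os.path.join(tempfile.mkdtemp(), "output.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write("hello,world")

with open(path, "r", encoding="utf-8") as f:
    first = f.read()    # reads everything
    second = f.read()   # position is at the end, so this is ''
    f.seek(0)           # rewind to the beginning
    third = f.read()    # reads everything again

print(repr(first), repr(second), repr(third))
```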

Appending to a file

output = open("output.txt","a",encoding="utf-8")
content = "\nhello,world"
# Write the content to the end of the file
output.write(content)
# Close the file handle
output.close()
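To see the effect of "a" mode, the sketch below writes one line, appends a second, and reads both back. As before, the temporary path is an assumption made to keep the example self-contained.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "output.txt")

# "w" creates or truncates the file.
with open(path, "w", encoding="utf-8") as f:
    f.write("hello,world")

# "a" keeps the existing content and writes at the end.
with open(path, "a", encoding="utf-8") as f:
    f.write("\nhello,world")

with open(path, "r", encoding="utf-8") as f:
    lines = f.read().splitlines()

print(lines)  # ['hello,world', 'hello,world']
```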

String formatting

format

By default positional order

a = "playing"
b = "basketball"

print("I like {} {}!".format(a,b))

By explicit positional index

a = "playing"
b = "basketball"

print("I like {0} {1}!{0} {1} is fun!".format(a,b))

By keyword arguments

print("I like {a} {b}!{a} {b} is fun!".
      format(a = "playing",b = "basketball"))

By variable name with f-strings (recommended, but requires Python 3.6 or later)

a = "playing"
b = "basketball"

print(f"I like {a} {b}!")

Formatting decimals

a ="{:.2f}".format(3.289)
print(a)
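The part after the colon is a format specification, and the same syntax works in format() and in f-strings. A few common cases, sketched with illustrative values:

```python
value = 3.289

two_places = f"{value:.2f}"   # fixed-point, 2 decimal places
padded = f"{value:8.1f}"      # width 8, right-aligned, 1 decimal place
percent = f"{0.256:.1%}"      # multiply by 100 and append %
grouped = f"{1234567:,}"      # thousands separator
zero_filled = f"{42:05d}"     # pad with zeros to width 5

print(two_places)   # 3.29
print(repr(padded)) # '     3.3'
print(percent)      # 25.6%
print(grouped)      # 1,234,567
print(zero_filled)  # 00042
```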

% operator (older style)

>>> "playing %s %s"%("ping","pong")
'playing ping pong'

Homework

Practice encoding and decoding strings

Encoding

>>> "牛批".encode("utf-8")
b'\xe7\x89\x9b\xe6\x89\xb9'

Decoding

>>> a = "牛批".encode("utf-8")
>>> a
b'\xe7\x89\x9b\xe6\x89\xb9'
>>> a.decode("utf-8")
'牛批'

Practice string CRUD

Create

>>> a = "哈哈哈哈哈"
>>> a
'哈哈哈哈哈'

Retrieve

find and index

>>> a.find("哈")
0
>>> a[1]
'哈'
>>> a.index("哈")
0

Update

split

>>> a = "p o p o p "
>>> a.split(" ")
['p', 'o', 'p', 'o', 'p', '']

join

>>> ",".join(a.split(" "))
'p,o,p,o,p,'

Delete

>>> a = "     哈麻批      "
>>> a
'     哈麻批      '
>>> a.strip()
'哈麻批'

Practice string formatting

a = "你哈皮"
b = "哈？"

print(f"{a},{b}")

Save content to a local file

content = "你哈皮,哈？"
output = open("output.txt","w",encoding="utf-8")
output.write(content)
output.close()
