Python Character Set Conversion

Post author: xfxia
Post published: September 10, 2023
Post category: python

1. encode and decode in Python 3:

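# -*- coding: utf-8 -*-
- Placed at the top of a Python file, this declaration tells the interpreter which encoding to use when reading the source file. Python 3 assumes UTF-8 by default, so the line is optional for UTF-8 files.
encode() and decode()
- String <=> bytes: encoding and decoding convert between strings (str) and byte sequences (bytes).
- str.encode() turns a string into bytes. The encoding defaults to utf-8; another one can be specified, e.g. encode("gbk").
- bytes.decode() turns bytes back into a string. The encoding defaults to utf-8; another one can be specified, e.g. decode("gbk").
In Python 3, the separate unicode type is gone; it is replaced by str, a string type that holds Unicode text.
Python 3 draws a clearer line between text and binary data and no longer converts implicitly between str and bytes. Text is always Unicode and is represented by str; binary data is represented by bytes.
In Python 3, converting between two character sets always goes through Unicode: decode the bytes to str first, then encode the str with the target encoding.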
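As a quick illustration of the defaults described above, the following sketch encodes the same two-character string with the default encoding and with GBK, then compares the resulting bytes. The variable names are chosen here for illustration only.

```python
text = "你好"

# encode() with no argument uses UTF-8
utf8_bytes = text.encode()
print(utf8_bytes)                            # b'\xe4\xbd\xa0\xe5\xa5\xbd'
print(utf8_bytes == text.encode("utf-8"))    # True

# The same text produces different bytes under GBK
gbk_bytes = text.encode("gbk")
print(gbk_bytes)                             # b'\xc4\xe3\xba\xc3'

# decode() must use the encoding the bytes were produced with;
# with no argument it also defaults to UTF-8
print(gbk_bytes.decode("gbk") == text)       # True
print(utf8_bytes.decode() == text)           # True
```

UTF-8 needs three bytes per character here and GBK needs two, which is why the two byte strings are not interchangeable.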

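# -*- coding: utf-8 -*-

# Named text rather than str, so the built-in str type is not shadowed
text = "你好"
print(type(text))

# Encode the str (Unicode) to GB2312 bytes
gb_str = text.encode("gb2312")
print(type(gb_str))

# Converting gb_str straight to UTF-8 does not work:
# bytes has no encode() method, so this would raise AttributeError
"""
utf_str = gb_str.encode("utf8")
"""

# Different character sets cannot be converted directly; decode to str (Unicode) first
unicode_str = gb_str.decode("gb2312")
print(type(unicode_str))
utf_str = unicode_str.encode("utf8")
print(type(utf_str))
print("============================")

# UTF-8 bytes cannot be converted directly either; decode to str (Unicode) first
unicode_str = utf_str.decode("utf8")
print(type(unicode_str))
gb_str = unicode_str.encode("gb2312")
print(type(gb_str))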

Output:

<class 'str'>
<class 'bytes'>
<class 'str'>
<class 'bytes'>
============================
<class 'str'>
<class 'bytes'>

Process finished with exit code 0
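The decode-then-encode step can be wrapped in a small helper. This is only a sketch: transcode is a hypothetical name introduced here, not a standard library function. It also shows what happens when bytes are decoded with the wrong codec.

```python
def transcode(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Re-encode bytes by going through str (Unicode). Hypothetical helper."""
    text = data.decode(source_encoding)    # bytes -> str
    return text.encode(target_encoding)    # str -> bytes


gb_bytes = "你好".encode("gb2312")
utf_bytes = transcode(gb_bytes, "gb2312", "utf-8")
print(utf_bytes == "你好".encode("utf-8"))   # True

# Decoding with the wrong codec fails: these GB2312 bytes are not valid UTF-8
try:
    gb_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)
```

Keeping the source encoding as an explicit parameter makes the caller state what the bytes actually are, which is the piece of information a bare bytes object cannot carry by itself.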

2. Sockets in Python:

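The methods that send messages between sockets take data of type bytes, so a str must be encoded before it is sent, and the bytes that arrive must be decoded to get a str back.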
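A minimal sketch of this round trip, assuming both ends agree on UTF-8. It uses socket.socketpair() to get two already-connected sockets so that no server setup is needed; the message text and buffer size are illustrative.

```python
import socket

# socketpair() returns two sockets that are already connected to each other
sender, receiver = socket.socketpair()
try:
    message = "你好, socket"
    sender.sendall(message.encode("utf-8"))   # str -> bytes before sending
    data = receiver.recv(1024)                # recv() returns bytes
    received_text = data.decode("utf-8")      # bytes -> str after receiving
finally:
    sender.close()
    receiver.close()

print(type(data))                 # <class 'bytes'>
print(received_text == message)   # True
```

Both sides must use the same encoding; if the sender encoded with GBK and the receiver decoded as UTF-8, the decode step would fail or produce the wrong text.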

3. The u, r, and b prefixes in Python:

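r: a raw string. Backslashes ('\') are kept as literal characters and do not start escape sequences.
u: a Unicode string. In Python 3 every str is already Unicode, so u"你好" is identical to "你好"; the prefix is accepted only for compatibility with Python 2 code and is not needed.
b: creates a bytes literal. Only ASCII characters are allowed in it; anything else raises SyntaxError: bytes can only contain ASCII literal characters. Bytes created with b can be decoded into a str (Unicode).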

# -*- coding: utf-8 -*-

b_str = b"hello world"
print(type(b_str))

# Decode as UTF-8 into a str (Unicode)
unicode_str = b_str.decode("utf8")
print(type(unicode_str))
print("=========================")

b_str = b"hello world"
print(type(b_str))

# Decode as GB2312 into a str (Unicode)
unicode_str = b_str.decode("gb2312")
print(type(unicode_str))

Output:

<class 'bytes'>
<class 'str'>
=========================
<class 'bytes'>
<class 'str'>
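The r and u prefixes can be checked in the same way. The sketch below is illustrative only; the variable names are not from the original post.

```python
# r prefix: backslashes are kept literally instead of starting an escape sequence
escaped = "a\nb"    # 3 characters: 'a', newline, 'b'
raw = r"a\nb"       # 4 characters: 'a', backslash, 'n', 'b'
print(len(escaped), len(raw))                 # 3 4

# u prefix: accepted in Python 3 but has no effect; the result is an ordinary str
print(u"hello" == "hello", type(u"hello"))    # True <class 'str'>

# b prefix: only ASCII characters may appear literally, so non-ASCII text
# has to be turned into bytes with an explicit encode() call instead
ascii_bytes = b"hello"
gbk_bytes = "你好".encode("gbk")
print(type(ascii_bytes), len(gbk_bytes))      # <class 'bytes'> 4
```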
