utf8
Package utf8 implements functions and constants to support text encoded in UTF-8. It includes functions to translate between runes and UTF-8 byte sequences. See https://en.wikipedia.org/wiki/UTF-8
Constants
Numbers fundamental to the encoding.
Variables
This section is empty.
Functions
func AppendRune <- go1.18
AppendRune appends the UTF-8 encoding of r to the end of p and returns the extended buffer. If the rune is out of range, it appends the encoding of RuneError.
AppendRune Example
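A minimal sketch of building a byte buffer rune by rune (requires Go 1.18 or later). The helper name `appendRunes` is illustrative and not part of the package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// appendRunes is a hypothetical helper: it starts from prefix and appends
// the UTF-8 encoding of each rune in turn.
func appendRunes(prefix string, runes ...rune) []byte {
	buf := []byte(prefix)
	for _, r := range runes {
		buf = utf8.AppendRune(buf, r)
	}
	return buf
}

func main() {
	fmt.Printf("%s\n", appendRunes("go: ", '世', '界')) // go: 世界
	// An out-of-range rune is stored as the encoding of RuneError.
	fmt.Printf("% x\n", appendRunes("", -1)) // ef bf bd
}
```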
func DecodeLastRune
DecodeLastRune unpacks the last UTF-8 encoding in p and returns the rune and its width in bytes. If p is empty it returns (RuneError, 0). Otherwise, if the encoding is invalid, it returns (RuneError, 1). Both are impossible results for correct, non-empty UTF-8.
An encoding is invalid if it is incorrect UTF-8, encodes a rune that is out of range, or is not the shortest possible UTF-8 encoding for the value. No other validation is performed.
DecodeLastRune Example
func DecodeLastRuneInString
DecodeLastRuneInString is like DecodeLastRune but its input is a string. If s is empty it returns (RuneError, 0). Otherwise, if the encoding is invalid, it returns (RuneError, 1). Both are impossible results for correct, non-empty UTF-8.
An encoding is invalid if it is incorrect UTF-8, encodes a rune that is out of range, or is not the shortest possible UTF-8 encoding for the value. No other validation is performed.
DecodeLastRuneInString Example
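As a small illustration, the returned width can be used to trim the final rune from a string without splitting a multi-byte encoding. `dropLastRune` is a hypothetical helper written for this sketch.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// dropLastRune is a hypothetical helper that removes the last rune of s.
// For an empty string the width is 0, so s is returned unchanged; for an
// invalid trailing byte the width is 1, so exactly that byte is removed.
func dropLastRune(s string) string {
	_, size := utf8.DecodeLastRuneInString(s)
	return s[:len(s)-size]
}

func main() {
	fmt.Println(dropLastRune("Hello, 世界"))  // Hello, 世
	fmt.Printf("%q\n", dropLastRune(""))     // ""
	fmt.Printf("%q\n", dropLastRune("a\xff")) // "a"
}
```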
func DecodeRune
DecodeRune unpacks the first UTF-8 encoding in p and returns the rune and its width in bytes. If p is empty it returns (RuneError, 0). Otherwise, if the encoding is invalid, it returns (RuneError, 1). Both are impossible results for correct, non-empty UTF-8.
An encoding is invalid if it is incorrect UTF-8, encodes a rune that is out of range, or is not the shortest possible UTF-8 encoding for the value. No other validation is performed.
DecodeRune Example
func DecodeRuneInString
DecodeRuneInString is like DecodeRune but its input is a string. If s is empty it returns (RuneError, 0). Otherwise, if the encoding is invalid, it returns (RuneError, 1). Both are impossible results for correct, non-empty UTF-8.
An encoding is invalid if it is incorrect UTF-8, encodes a rune that is out of range, or is not the shortest possible UTF-8 encoding for the value. No other validation is performed.
DecodeRuneInString Example
func EncodeRune
EncodeRune writes into p (which must be large enough) the UTF-8 encoding of the rune. If the rune is out of range, it writes the encoding of RuneError. It returns the number of bytes written.
EncodeRune Example
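Sizing the buffer with UTFMax is one simple way to satisfy the "large enough" requirement, since no rune needs more than that many bytes. The helper `encode` below is illustrative only.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encode is a hypothetical helper that returns the UTF-8 bytes of r.
func encode(r rune) []byte {
	buf := make([]byte, utf8.UTFMax) // always big enough for one rune
	n := utf8.EncodeRune(buf, r)
	return buf[:n]
}

func main() {
	fmt.Printf("% x\n", encode('A'))  // 41
	fmt.Printf("% x\n", encode('世')) // e4 b8 96
}
```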
EncodeRune Example (OutOfRange)
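To illustrate the out-of-range behaviour, the sketch below encodes a rune and decodes the result again to see whether it came back as RuneError. `encodesAsRuneError` is a hypothetical helper; note that it also reports true for RuneError itself, which is a valid rune.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodesAsRuneError is a hypothetical helper that reports whether
// EncodeRune stored the encoding of RuneError for r.
func encodesAsRuneError(r rune) bool {
	buf := make([]byte, utf8.UTFMax)
	n := utf8.EncodeRune(buf, r)
	got, _ := utf8.DecodeRune(buf[:n])
	return got == utf8.RuneError
}

func main() {
	fmt.Println(encodesAsRuneError(-1))       // true: negative value
	fmt.Println(encodesAsRuneError(0x110000)) // true: above MaxRune
	fmt.Println(encodesAsRuneError(0xD800))   // true: surrogate half
	fmt.Println(encodesAsRuneError('A'))      // false
}
```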
func FullRune
FullRune reports whether the bytes in p begin with a full UTF-8 encoding of a rune. An invalid encoding is considered a full rune since it will convert as a width-1 error rune.
FullRune Example
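FullRune is handy when data arrives in chunks and a multi-byte encoding may be cut off at a chunk boundary. The sketch below uses a hypothetical helper, `needMoreBytes`, to make that decision.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// needMoreBytes is a hypothetical helper for chunked input: it reports
// whether p is non-empty but stops partway through a rune.
func needMoreBytes(p []byte) bool {
	return len(p) > 0 && !utf8.FullRune(p)
}

func main() {
	buf := []byte("世") // three bytes: e4 b8 96
	fmt.Println(needMoreBytes(buf))     // false: complete rune
	fmt.Println(needMoreBytes(buf[:2])) // true: truncated encoding
	// An invalid byte counts as "full" because it decodes as a width-1 error rune.
	fmt.Println(needMoreBytes([]byte{0xff})) // false
}
```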
func FullRuneInString
FullRuneInString is like FullRune but its input is a string.
FullRuneInString Example
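As a sketch, FullRuneInString can guard a decode loop so that it stops before a truncated trailing encoding. `completeRunes` is an illustrative helper name.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// completeRunes is a hypothetical helper that counts the runes at the start
// of s that are fully present, stopping at a truncated tail or at the end.
func completeRunes(s string) int {
	n := 0
	for utf8.FullRuneInString(s) { // false for "" and for a partial rune
		_, size := utf8.DecodeRuneInString(s)
		s = s[size:]
		n++
	}
	return n
}

func main() {
	s := "ab世"                       // bytes: 61 62 e4 b8 96
	fmt.Println(completeRunes(s))     // 3
	fmt.Println(completeRunes(s[:4])) // 2: the last rune is cut off
}
```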
func RuneCount
RuneCount returns the number of runes in p. Erroneous and short encodings are treated as single runes of width 1 byte.
RuneCount Example
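The difference between byte length and rune count is the usual reason to reach for this function. A minimal sketch, with `byteAndRuneCount` as a hypothetical helper:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCount is a hypothetical helper returning both lengths of p.
func byteAndRuneCount(p []byte) (nbytes, nrunes int) {
	return len(p), utf8.RuneCount(p)
}

func main() {
	fmt.Println(byteAndRuneCount([]byte("Hello, 世界"))) // 13 9
	// Each invalid byte is counted as one rune of width 1.
	fmt.Println(byteAndRuneCount([]byte{0xff, 0xfe})) // 2 2
}
```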
func RuneCountInString
RuneCountInString is like RuneCount but its input is a string.
RuneCountInString Example
func RuneLen
RuneLen returns the number of bytes required to encode the rune. It returns -1 if the rune is not a valid value to encode in UTF-8.
RuneLen Example
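A brief sketch showing the width for runes of each encoded size, plus an invalid value. The helper `runeLens` is illustrative.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeLens is a hypothetical helper that maps each rune to its encoded width.
func runeLens(rs ...rune) []int {
	out := make([]int, len(rs))
	for i, r := range rs {
		out[i] = utf8.RuneLen(r)
	}
	return out
}

func main() {
	// U+0061, U+00DF, U+4E16, U+1F600, and an invalid rune.
	fmt.Println(runeLens('a', 'ß', '世', '😀', -1)) // [1 2 3 4 -1]
}
```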
func RuneStart
RuneStart reports whether the byte could be the first byte of an encoded, possibly invalid rune. Second and subsequent bytes always have the top two bits set to 10.
RuneStart Example
func Valid
Valid reports whether p consists entirely of valid UTF-8-encoded runes.
Valid Example
func ValidRune <- go1.1
ValidRune reports whether r can be legally encoded as UTF-8. Code points that are out of range or a surrogate half are illegal.
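Personal notes
A "surrogate half" is one of the two 16-bit code units that make up a surrogate pair. In UTF-16, some Unicode characters need two 16-bit code units, and that two-unit combination is called a surrogate pair.
Surrogates are a concept specific to UTF-16, used for characters that a single 16-bit unit cannot represent. UTF-16 assigns two 16-bit code units to each supplementary character: the first is the high surrogate (or leading) code unit, and the second is the low surrogate (or trailing) code unit. Any character whose code point is above 65535 (U+FFFF) is represented by such a pair. When a program processing 16-bit units meets a value in the range 0xD800 to 0xDFFF, it knows the value must be combined with the preceding or following 16-bit value to recover the full character. Because that range is reserved for UTF-16 surrogates, those code points cannot be legally encoded in UTF-8, which is why ValidRune rejects them.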
ValidRune Example
func ValidString
ValidString reports whether s consists entirely of valid UTF-8-encoded runes.
ValidString Example
Types
This section is empty.