utf8
Package utf8 implements functions and constants to support text encoded in UTF-8. It includes functions to translate between runes and UTF-8 byte sequences. See https://en.wikipedia.org/wiki/UTF-8.
Constants
Numbers fundamental to the encoding.
Variables
This section is empty.
Functions
func AppendRune <- go1.18
func AppendRune(p []byte, r rune) []byte
AppendRune appends the UTF-8 encoding of r to the end of p and returns the extended buffer. If the rune is out of range, it appends the encoding of RuneError.
AppendRune Example
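A minimal sketch of AppendRune in use (requires Go 1.18 or later). The helper appendAll is hypothetical and exists only for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// appendAll is a hypothetical helper: it appends the UTF-8 encoding
// of each rune in rs to buf and returns the extended buffer.
func appendAll(buf []byte, rs ...rune) []byte {
	for _, r := range rs {
		buf = utf8.AppendRune(buf, r)
	}
	return buf
}

func main() {
	fmt.Println(string(appendAll([]byte("init"), '世', '界')))
	// An out-of-range rune is stored as the encoding of RuneError (U+FFFD).
	fmt.Printf("%q\n", appendAll(nil, -1))
}
```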
func DecodeLastRune
func DecodeLastRune(p []byte) (r rune, size int)
DecodeLastRune unpacks the last UTF-8 encoding in p and returns the rune and its width in bytes. If p is empty it returns (RuneError, 0). Otherwise, if the encoding is invalid, it returns (RuneError, 1). Both are impossible results for correct, non-empty UTF-8.
An encoding is invalid if it is incorrect UTF-8, encodes a rune that is out of range, or is not the shortest possible UTF-8 encoding for the value. No other validation is performed.
DecodeLastRune Example
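A minimal sketch that walks a byte slice from the end, one rune at a time. The helper reverseRunes is hypothetical. The loop always terminates because a non-empty input yields a size of at least 1.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// reverseRunes is a hypothetical helper that decodes b back to front
// and returns the runes in reverse order.
func reverseRunes(b []byte) []rune {
	var out []rune
	for len(b) > 0 {
		r, size := utf8.DecodeLastRune(b)
		out = append(out, r)
		b = b[:len(b)-size] // drop the rune just decoded
	}
	return out
}

func main() {
	fmt.Printf("%c\n", reverseRunes([]byte("Hello, 世界")))
}
```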
func DecodeLastRuneInString
func DecodeLastRuneInString(s string) (r rune, size int)
DecodeLastRuneInString is like DecodeLastRune but its input is a string. If s is empty it returns (RuneError, 0). Otherwise, if the encoding is invalid, it returns (RuneError, 1). Both are impossible results for correct, non-empty UTF-8.
An encoding is invalid if it is incorrect UTF-8, encodes a rune that is out of range, or is not the shortest possible UTF-8 encoding for the value. No other validation is performed.
DecodeLastRuneInString Example
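A minimal sketch that removes the final rune of a string. The helper trimLastRune is hypothetical. An empty string is returned unchanged because the reported size is 0, and a trailing invalid byte is removed as a single byte because the reported size is 1.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// trimLastRune is a hypothetical helper that returns s without its
// last rune (or without its last byte if that byte is invalid UTF-8).
func trimLastRune(s string) string {
	_, size := utf8.DecodeLastRuneInString(s)
	return s[:len(s)-size]
}

func main() {
	fmt.Printf("%q\n", trimLastRune("世界"))   // "世"
	fmt.Printf("%q\n", trimLastRune(""))     // ""
	fmt.Printf("%q\n", trimLastRune("a\xff")) // "a"
}
```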
func DecodeRune
func DecodeRune(p []byte) (r rune, size int)
DecodeRune unpacks the first UTF-8 encoding in p and returns the rune and its width in bytes. If p is empty it returns (RuneError, 0). Otherwise, if the encoding is invalid, it returns (RuneError, 1). Both are impossible results for correct, non-empty UTF-8.
An encoding is invalid if it is incorrect UTF-8, encodes a rune that is out of range, or is not the shortest possible UTF-8 encoding for the value. No other validation is performed.
DecodeRune Example
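A minimal sketch that decodes a byte slice front to back and records the width of each rune. The helper runeWidths is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeWidths is a hypothetical helper that returns the encoded width,
// in bytes, of every rune in b.
func runeWidths(b []byte) []int {
	var widths []int
	for len(b) > 0 {
		_, size := utf8.DecodeRune(b)
		widths = append(widths, size)
		b = b[size:] // advance past the rune just decoded
	}
	return widths
}

func main() {
	// 'a' (U+0061), U+00A9, U+4E16 and U+1F600 need 1, 2, 3 and 4 bytes.
	fmt.Println(runeWidths([]byte("a\u00a9\u4e16\U0001F600"))) // [1 2 3 4]
}
```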
func DecodeRuneInString
func DecodeRuneInString(s string) (r rune, size int)
DecodeRuneInString is like DecodeRune but its input is a string. If s is empty it returns (RuneError, 0). Otherwise, if the encoding is invalid, it returns (RuneError, 1). Both are impossible results for correct, non-empty UTF-8.
An encoding is invalid if it is incorrect UTF-8, encodes a rune that is out of range, or is not the shortest possible UTF-8 encoding for the value. No other validation is performed.
DecodeRuneInString Example
func EncodeRune
func EncodeRune(p []byte, r rune) int
EncodeRune writes into p (which must be large enough) the UTF-8 encoding of the rune. If the rune is out of range, it writes the encoding of RuneError. It returns the number of bytes written.
EncodeRune Example
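A minimal sketch of EncodeRune. The helper encode is hypothetical; it sizes the buffer with utf8.UTFMax so that any rune fits.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encode is a hypothetical helper that returns the UTF-8 bytes of r.
func encode(r rune) []byte {
	buf := make([]byte, utf8.UTFMax) // large enough for any rune
	n := utf8.EncodeRune(buf, r)
	return buf[:n]
}

func main() {
	fmt.Printf("% x\n", encode('世')) // e4 b8 96
	fmt.Println(len(encode('a')))    // 1
}
```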
EncodeRune Example(OutOfRange)
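A minimal sketch of the out-of-range case: negative values, values above MaxRune, and surrogate halves are all written as the encoding of RuneError. The helper encodesAsRuneError is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodesAsRuneError is a hypothetical helper reporting whether
// EncodeRune writes the bytes of U+FFFD for r.
func encodesAsRuneError(r rune) bool {
	buf := make([]byte, utf8.UTFMax)
	n := utf8.EncodeRune(buf, r)
	return string(buf[:n]) == "\uFFFD"
}

func main() {
	for _, r := range []rune{-1, 0x110000, 0xD800, 'a'} {
		fmt.Printf("%d: %v\n", r, encodesAsRuneError(r))
	}
}
```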
func FullRune
func FullRune(p []byte) bool
FullRune reports whether the bytes in p begin with a full UTF-8 encoding of a rune. An invalid encoding is considered a full rune since it will convert as a width-1 error rune.
FullRune Example
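A minimal sketch of FullRune, framed as the check a streaming reader might make before decoding. The helper needMore is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// needMore is a hypothetical helper: it reports whether b is non-empty
// but ends before its first rune is complete.
func needMore(b []byte) bool {
	return len(b) > 0 && !utf8.FullRune(b)
}

func main() {
	buf := []byte("世") // three bytes: e4 b8 96
	fmt.Println(utf8.FullRune(buf))          // true
	fmt.Println(utf8.FullRune(buf[:2]))      // false: the third byte is missing
	fmt.Println(utf8.FullRune([]byte{0xff})) // true: invalid, but decodes as a width-1 error rune
	fmt.Println(needMore(buf[:2]))           // true
}
```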
func FullRuneInString
func FullRuneInString(s string) bool
FullRuneInString is like FullRune but its input is a string.
FullRuneInString Example
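A minimal sketch that truncates a string to a byte budget without leaving a partial rune at the end. The helper cutAtRuneBoundary is hypothetical and is only one of several ways to do this.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// cutAtRuneBoundary is a hypothetical helper that returns the longest
// prefix of s[:n] that does not end in an incomplete rune.
func cutAtRuneBoundary(s string, n int) string {
	if n > len(s) {
		n = len(s)
	}
	i := 0
	// Advance while the window s[i:n] still starts with a full rune.
	for i < n && utf8.FullRuneInString(s[i:n]) {
		_, size := utf8.DecodeRuneInString(s[i:n])
		i += size
	}
	return s[:i]
}

func main() {
	fmt.Printf("%q\n", cutAtRuneBoundary("世界", 4)) // "世"
	fmt.Printf("%q\n", cutAtRuneBoundary("世界", 6)) // "世界"
}
```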
func RuneCount
func RuneCount(p []byte) int
RuneCount returns the number of runes in p. Erroneous and short encodings are treated as single runes of width 1 byte.
RuneCount Example
func RuneCountInString
func RuneCountInString(s string) (n int)
RuneCountInString is like RuneCount but its input is a string.
RuneCountInString Example
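A minimal sketch that pads a string to a given number of runes. The helper padRight is hypothetical. Note that a rune count is not a display width: wide characters such as CJK ideographs usually occupy two terminal columns.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// padRight is a hypothetical helper that appends spaces until s
// contains at least width runes.
func padRight(s string, width int) string {
	n := utf8.RuneCountInString(s)
	if n >= width {
		return s
	}
	return s + strings.Repeat(" ", width-n)
}

func main() {
	fmt.Printf("%q\n", padRight("世界", 4)) // "世界  "
	fmt.Printf("%q\n", padRight("abc", 2)) // "abc"
}
```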
func RuneLen
func RuneLen(r rune) int
RuneLen returns the number of bytes required to encode the rune. It returns -1 if the rune is not a valid value to encode in UTF-8.
RuneLen Example
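A minimal sketch that uses RuneLen to size a buffer before encoding. The helper encodedLen is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodedLen is a hypothetical helper that returns the number of bytes
// needed to encode rs, or -1 if any rune cannot be encoded in UTF-8.
func encodedLen(rs []rune) int {
	total := 0
	for _, r := range rs {
		n := utf8.RuneLen(r)
		if n < 0 {
			return -1
		}
		total += n
	}
	return total
}

func main() {
	fmt.Println(utf8.RuneLen('a'), utf8.RuneLen('世'), utf8.RuneLen(0xD800)) // 1 3 -1
	fmt.Println(encodedLen([]rune("a世")))                                  // 4
}
```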
func RuneStart
func RuneStart(b byte) bool
RuneStart reports whether the byte could be the first byte of an encoded, possibly invalid rune. Second and subsequent bytes always have the top two bits set to 10.
RuneStart Example
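A minimal sketch that backs up from an arbitrary byte offset to the start of the rune containing it. The helper runeStartBefore is hypothetical and assumes 0 <= i < len(b).

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeStartBefore is a hypothetical helper that returns the index of
// the nearest rune-start byte at or before i. It assumes 0 <= i < len(b).
func runeStartBefore(b []byte, i int) int {
	for i > 0 && !utf8.RuneStart(b[i]) {
		i-- // b[i] is a continuation byte (top bits 10), keep backing up
	}
	return i
}

func main() {
	b := []byte("a世界") // 'a' at 0, '世' at 1..3, '界' at 4..6
	fmt.Println(utf8.RuneStart(b[1]), utf8.RuneStart(b[2])) // true false
	fmt.Println(runeStartBefore(b, 3))                      // 1
}
```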
func Valid
func Valid(p []byte) bool
Valid reports whether p consists entirely of valid UTF-8-encoded runes.
Valid Example
func ValidRune <- go1.1
func ValidRune(r rune) bool
ValidRune reports whether r can be legally encoded as UTF-8. Code points that are out of range or a surrogate half are illegal.
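Personal note
A "surrogate half" is one of the two 16-bit code units that make up a surrogate pair. In UTF-16, a code point that does not fit in a single 16-bit code unit is represented by two code units, and that two-unit combination is called a surrogate pair.
Surrogates are a UTF-16 concept. For each supplementary character, UTF-16 allocates two 16-bit code units: the first is the high surrogate (or leading code unit), and the second is the low surrogate (or trailing code unit). Any code point above U+FFFF (that is, 65536 or greater) is represented this way. When a program processing 16-bit units meets a value in the range 0xD800 to 0xDFFF, it knows the value must be paired with the preceding or following 16-bit unit to recover the full character. Because these values are reserved for UTF-16, they are not valid code points on their own, which is why ValidRune rejects them.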
ValidRune Example
func ValidString
func ValidString(s string) bool
ValidString reports whether s consists entirely of valid UTF-8-encoded runes.
ValidString Example
Types
This section is empty.