scanner
https://pkg.go.dev/text/scanner@go1.20.1
Package scanner provides a scanner and tokenizer for UTF-8-encoded text. It takes an io.Reader providing the source, which can then be tokenized through repeated calls to the Scan function. For compatibility with existing tools, the NUL character is not allowed. If the first character in the source is a UTF-8 encoded byte order mark (BOM), it is discarded.
By default, a Scanner skips white space and Go comments and recognizes all literals as defined by the Go language specification. It may be customized to recognize only a subset of those literals and to recognize different identifier and white space characters.
Constants
Predefined mode bits to control recognition of tokens. For instance, to configure a Scanner such that it only recognizes (Go) identifiers, integers, and skips comments, set the Scanner’s Mode field to:
ScanIdents | ScanInts | SkipComments
With the exception of comments, which are skipped if SkipComments is set, unrecognized tokens are not ignored. Instead, the scanner simply returns the respective individual characters (or possibly sub-tokens). For instance, if the mode is ScanIdents (not ScanStrings), the string "foo" is scanned as the token sequence '"' Ident '"'.
Use GoTokens to configure the Scanner such that it accepts all Go literal tokens including Go identifiers. Comments will be skipped.
The result of Scan is one of these tokens or a Unicode character.
GoWhitespace is the default value for the Scanner’s Whitespace field. Its value selects Go’s white space characters.
Variables
This section is empty.
Functions
func TokenString
TokenString returns a printable string for a token or Unicode character.
Types
type Position
Position is a value that represents a source position. A position is valid if Line > 0.
(*Position) IsValid
IsValid reports whether the position is valid.
(Position) String
type Scanner
A Scanner implements reading of Unicode characters and tokens from an io.Reader.
(*Scanner) Init
Init initializes a Scanner with a new source and returns s. Error is set to nil, ErrorCount is set to 0, Mode is set to GoTokens, and Whitespace is set to GoWhitespace.
(*Scanner) Next
Next reads and returns the next Unicode character. It returns EOF at the end of the source. It reports a read error by calling s.Error, if not nil; otherwise it prints an error message to os.Stderr. Next does not update the Scanner’s Position field; use Pos() to get the current position.
(*Scanner) Peek
Peek returns the next Unicode character in the source without advancing the scanner. It returns EOF if the scanner’s position is at the last character of the source.
(*Scanner) Pos
Pos returns the position of the character immediately after the character or token returned by the last call to Next or Scan. Use the Scanner’s Position field for the start position of the most recently scanned token.
(*Scanner) Scan
Scan reads the next token or Unicode character from source and returns it. It only recognizes tokens t for which the respective Mode bit (1<<-t) is set. It returns EOF at the end of the source. It reports scanner errors (read and token errors) by calling s.Error, if not nil; otherwise it prints an error message to os.Stderr.
(*Scanner) TokenText
TokenText returns the string corresponding to the most recently scanned token. Valid after calling Scan and in calls of Scanner.Error.