A new Go API for Protocol Buffers - 用于协议缓冲区的新 Go API

https://go.dev/blog/protobuf-apiv2

Joe Tsai, Damien Neil, and Herbie Ong 2 March 2020

Introduction 简介

We are pleased to announce the release of a major revision of the Go API for protocol buffers, Google’s language-neutral data interchange format.

我们很高兴地宣布,针对谷歌语言中立的数据交换格式–协议缓冲区的Go API的重大修订已经发布。

Motivations for a new API 新API的动因

The first protocol buffer bindings for Go were announced by Rob Pike in March of 2010. Go 1 would not be released for another two years.

Rob Pike 在 2010 年 3 月宣布了 Go 的第一个协议缓冲区绑定。Go 1将在两年后才发布。

In the decade since that first release, the package has grown and developed along with Go. Its users’ requirements have grown too.

在第一次发布后的十年里,该软件包与Go一起成长和发展。其用户的需求也在增长。

Many people want to write programs that use reflection to examine protocol buffer messages. The reflect package provides a view of Go types and values, but omits information from the protocol buffer type system. For example, we might want to write a function that traverses a log entry and clears any field annotated as containing sensitive data. The annotations are not part of the Go type system.

许多人想要编写使用反射来检查协议缓冲区消息的程序。reflect包提供了Go类型和值的视图,但省略了协议缓冲区类型系统中的信息。例如,我们可能想写一个函数,遍历一个日志条目并清除任何被标注为包含敏感数据的字段。这些标注不是Go类型系统的一部分。

Another common desire is to use data structures other than the ones generated by the protocol buffer compiler, such as a dynamic message type capable of representing messages whose type is not known at compile time.

另一个常见的愿望是使用协议缓冲区编译器生成的数据结构以外的数据结构,例如能够表示在编译时不知道类型的消息的动态消息类型。

We also observed that a frequent source of problems was that the proto.Message interface, which identifies values of generated message types, does very little to describe the behavior of those types. When users create types that implement that interface (often inadvertently by embedding a message in another struct) and pass values of those types to functions expecting a generated message value, programs crash or behave unpredictably.

我们还观察到,一个经常出现的问题来源是:用于标识生成的消息类型的值的proto.Message接口,对这些类型的行为几乎没有描述。当用户创建了实现该接口的类型(通常是无意中通过在另一个结构体中嵌入一个消息造成的),并将这些类型的值传递给期望接收生成的消息值的函数时,程序会崩溃或表现得不可预测。
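
The accidental implementation described above is ordinary Go method promotion. The following is a minimal sketch in plain Go, with no protobuf dependency; `Message`, `Generated`, `Wrapper`, and `Describe` are hypothetical stand-ins chosen for illustration and are not the real proto package types.

```go
package main

// Message is a hypothetical stand-in for a loosely specified
// message interface; it is not the real proto.Message.
type Message interface {
	Reset()
	String() string
}

// Generated plays the role of a generated message type.
type Generated struct{ Name string }

func (g *Generated) Reset()         { *g = Generated{} }
func (g *Generated) String() string { return "name:" + g.Name }

// Wrapper embeds a message pointer. Reset and String are promoted,
// so *Wrapper satisfies Message without the author asking for it.
type Wrapper struct {
	*Generated
	Secret string // unknown to any code that only understands Generated
}

// Describe accepts any Message and sees only what the interface exposes.
func Describe(m Message) string { return m.String() }
```

A function that assumes every `Message` is a generated type has no way to notice `Wrapper.Secret`, which is one way such programs can end up misbehaving.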

All three of these problems have a common cause, and a common solution: The Message interface should fully specify the behavior of a message, and functions operating on Message values should freely accept any type that correctly implements the interface.

这三个问题有一个共同的原因,也有一个共同的解决方案:Message接口应该完整地规定消息的行为,对Message值进行操作的函数应该能够自由地接受任何正确实现了该接口的类型。

Since it is not possible to change the existing definition of the Message type while keeping the package API compatible, we decided that it was time to begin work on a new, incompatible major version of the protobuf module.

由于不可能在保持包的API兼容的情况下改变现有的Message类型的定义,我们决定是时候开始在protobuf模块的一个新的、不兼容的主要版本上工作了。

Today, we’re pleased to release that new module. We hope you like it.

今天,我们很高兴发布这个新模块。我们希望您喜欢它。

Reflection 反射

Reflection is the flagship feature of the new implementation. Similar to how the reflect package provides a view of Go types and values, the google.golang.org/protobuf/reflect/protoreflect package provides a view of values according to the protocol buffer type system.

反射是新实现的旗舰功能。与reflect包提供Go类型和值的视图类似,google.golang.org/protobuf/reflect/protoreflect包根据协议缓冲区类型系统提供值的视图。

A complete description of the protoreflect package would run too long for this post, but let’s look at how we might write the log-scrubbing function we mentioned previously.

对protoreflect包的完整描述对这篇文章来说太长了,但让我们来看看我们如何编写之前提到的log-scrubbing函数。

First, we’ll write a .proto file defining an extension of the google.protobuf.FieldOptions type so we can annotate fields as containing sensitive information or not.

首先,我们要写一个.proto文件,定义google.protobuf.FieldOptions类型的扩展,这样我们就可以把字段注释为是否包含敏感信息。

syntax = "proto3";
import "google/protobuf/descriptor.proto";
package golang.example.policy;
extend google.protobuf.FieldOptions {
    bool non_sensitive = 50000;
}

We can use this option to mark certain fields as non-sensitive.

我们可以使用这个选项将某些字段标记为非敏感。

message MyMessage {
    string public_name = 1 [(golang.example.policy.non_sensitive) = true];
}

Next, we will write a Go function which accepts an arbitrary message value and removes all the sensitive fields.

接下来,我们将编写一个Go函数,接受一个任意的消息值,并删除所有的敏感字段。

// Redact clears every sensitive field in pb.
func Redact(pb proto.Message) {
   // ...
}

This function accepts a proto.Message, an interface type implemented by all generated message types. This type is an alias for one defined in the protoreflect package:

这个函数接受一个proto.Message,一个由所有生成的消息类型实现的接口类型。这个类型是protoreflect包中定义的一个的别名:

type ProtoMessage interface {
    ProtoReflect() Message
}

To avoid filling up the namespace of generated messages, the interface contains only a single method returning a protoreflect.Message, which provides access to the message contents.

为了避免填满生成消息的命名空间,该接口只包含一个返回protoreflect.Message的方法,它提供了对消息内容的访问。

(Why an alias? Because protoreflect.Message has a corresponding method returning the original proto.Message, and we need to avoid an import cycle between the two packages.)

(为什么要有一个别名?因为protoreflect.Message有一个相应的方法返回原始的proto.Message,而我们需要避免两个包之间的导入循环)。

The protoreflect.Message.Range method calls a function for every populated field in a message.

protoreflect.Message.Range方法为消息中每个填充的字段调用一个函数。

m := pb.ProtoReflect()
m.Range(func(fd protoreflect.FieldDescriptor, v protoreflect.Value) bool {
    // ...
    return true
})

The range function is called with a protoreflect.FieldDescriptor describing the protocol buffer type of the field, and a protoreflect.Value containing the field value.

范围函数是用一个描述字段的协议缓冲区类型的protoreflect.FieldDescriptor和一个包含字段值的protoreflect.Value调用的。

The protoreflect.FieldDescriptor.Options method returns the field options as a google.protobuf.FieldOptions message.

protoreflect.FieldDescriptor.Options方法以google.protobuf.FieldOptions消息返回字段选项。

opts := fd.Options().(*descriptorpb.FieldOptions)

(Why the type assertion? Since the generated descriptorpb package depends on protoreflect, the protoreflect package can’t return the concrete options type without causing an import cycle.)

(为什么要做类型断言?因为生成的descriptorpb包依赖于protoreflect,protoreflect包不能返回具体的选项类型而不引起导入循环)。

We can then check the options to see the value of our extension boolean:

然后我们可以检查选项,看看我们的扩展布尔值:

if proto.GetExtension(opts, policypb.E_NonSensitive).(bool) {
    return true // don't redact non-sensitive fields
}

Note that we are looking at the field descriptor here, not the field value. The information we’re interested in lies in the protocol buffer type system, not the Go one.

注意,我们在这里看的是字段描述符,而不是字段值。我们感兴趣的信息是在协议缓冲区类型系统中,而不是在Go系统中。

This is also an example of an area where we have simplified the proto package API. The original proto.GetExtension returned both a value and an error. The new proto.GetExtension returns just a value, returning the default value for the field if it is not present. Extension decoding errors are reported at Unmarshal time.

这也是我们简化proto包API的一个例子。原来的proto.GetExtension同时返回一个值和一个错误。新的proto.GetExtension只返回一个值,如果不存在,则返回该字段的默认值。扩展解码错误会在Unmarshal时报告。

Once we have identified a field that needs redaction, clearing it is simple:

一旦我们确定了一个需要脱敏的字段,清除它就很简单了:

m.Clear(fd)

Putting all the above together, our complete redaction function is:

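将上述所有内容放在一起,我们完整的脱敏函数是:

// Redact clears every sensitive field in pb.
func Redact(pb proto.Message) {
    m := pb.ProtoReflect()
    // Range visits each populated field; returning true continues the iteration.
    m.Range(func(fd protoreflect.FieldDescriptor, v protoreflect.Value) bool {
        // The annotation lives on the field descriptor, not on the Go value.
        opts := fd.Options().(*descriptorpb.FieldOptions)
        if proto.GetExtension(opts, policypb.E_NonSensitive).(bool) {
            return true // don't redact non-sensitive fields
        }
        // Anything not explicitly marked non-sensitive is cleared.
        m.Clear(fd)
        return true
    })
}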

A more complete implementation might recursively descend into message-valued fields. We hope that this simple example gives a taste of protocol buffer reflection and its uses.

一个更完整的实现可能会递归到消息值字段。我们希望这个简单的例子能让大家体会到协议缓冲区反射及其用途。

Versions 版本

We call the original version of Go protocol buffers APIv1, and the new one APIv2. Because APIv2 is not backwards compatible with APIv1, we need to use different module paths for each.

我们把Go协议缓冲区的原始版本称为APIv1,而新版本称为APIv2。因为APIv2与APIv1并不向后兼容,所以我们需要为两者使用不同的模块路径。

(These API versions are not the same as the versions of the protocol buffer language: proto1, proto2, and proto3. APIv1 and APIv2 are concrete implementations in Go that both support the proto2 and proto3 language versions.)

(这些API版本与协议缓冲区语言的版本不同:proto1、proto2和proto3。APIv1和APIv2是Go中的具体实现,都支持proto2和proto3的语言版本。)

The github.com/golang/protobuf module is APIv1.

github.com/golang/protobuf模块是APIv1。

The google.golang.org/protobuf module is APIv2. We have taken advantage of the need to change the import path to switch to one that is not tied to a specific hosting provider. (We considered google.golang.org/protobuf/v2, to make it clear that this is the second major version of the API, but settled on the shorter path as being the better choice in the long term.)

google.golang.org/protobuf模块是APIv2。我们利用了改变导入路径的需要,换成了不与特定主机提供商绑定的路径。(我们考虑过 google.golang.org/protobuf/v2,以表明这是 API 的第二个主要版本,但最终选择了较短的路径,因为从长远来看是更好的选择。)

We know that not all users will move to a new major version of a package at the same rate. Some will switch quickly; others may remain on the old version indefinitely. Even within a single program, some parts may use one API while others use another. It is essential, therefore, that we continue to support programs that use APIv1.

我们知道并不是所有的用户都会以同样的速度转移到一个新的软件包的主要版本。有些人会很快转换;有些人可能会无限期地留在旧版本上。即使在一个程序中,有些部分可能使用一个API,而其他部分则使用另一个。因此,我们必须继续支持使用APIv1的程序。

  • github.com/golang/protobuf@v1.3.4 is the most recent pre-APIv2 version of APIv1. github.com/golang/protobuf@v1.3.4 是APIv1的最新的前APIv2版本。
  • github.com/golang/protobuf@v1.4.0 is a version of APIv1 implemented in terms of APIv2. The API is the same, but the underlying implementation is backed by the new one. This version contains functions to convert between the APIv1 and APIv2 proto.Message interfaces to ease the transition between the two. github.com/golang/protobuf@v1.4.0 是以APIv2实现的APIv1的版本。API是相同的,但底层实现是由新的API支持的。这个版本包含了在APIv1和APIv2的proto.Message接口之间转换的函数,以方便两者之间的过渡。
  • google.golang.org/protobuf@v1.20.0 is APIv2. This module depends upon github.com/golang/protobuf@v1.4.0, so any program which uses APIv2 will automatically pick a version of APIv1 which integrates with it. google.golang.org/protobuf@v1.20.0 是APIv2。这个模块依赖于github.com/golang/protobuf@v1.4.0,所以任何使用APIv2的程序都会自动选择一个与之整合的APIv1版本。

(Why start at version v1.20.0? To provide clarity. We do not anticipate APIv1 to ever reach v1.20.0, so the version number alone should be enough to unambiguously differentiate between APIv1 and APIv2.)

(为什么从v1.20.0版本开始?为了提供明确的信息。我们预计APIv1不会达到v1.20.0版本,所以单单版本号就足以明确区分APIv1和APIv2了)。

We intend to maintain support for APIv1 indefinitely.

我们打算无限期地维持对APIv1的支持。

This organization ensures that any given program will use only a single protocol buffer implementation, regardless of which API version it uses. It permits programs to adopt the new API gradually, or not at all, while still gaining the advantages of the new implementation. The principle of minimum version selection means that programs may remain on the old implementation until the maintainers choose to update to the new one (either directly, or by updating a dependency).

这种组织方式确保任何给定的程序将只使用一个协议缓冲区的实现,而不管它使用的是哪个API版本。它允许程序逐步采用新的API,或者根本不采用,同时仍然获得新实现的优势。最小版本选择原则意味着程序可以保留在旧的实现上,直到维护者选择更新到新的实现(直接或通过更新依赖关系)。

Additional features of note 其他值得注意的特性

The google.golang.org/protobuf/encoding/protojson package converts protocol buffer messages to and from JSON using the canonical JSON mapping, and fixes a number of issues with the old jsonpb package that were difficult to change without causing problems for existing users.

google.golang.org/protobuf/encoding/protojson包使用规范的JSON映射在协议缓冲区消息与JSON之间相互转换,并修复了旧的jsonpb包的一些问题,这些问题很难在不给现有用户带来问题的情况下进行修改。

The google.golang.org/protobuf/types/dynamicpb package provides an implementation of proto.Message for messages whose protocol buffer type is derived at runtime.

google.golang.org/protobuf/types/dynamicpb包提供了proto.Message的一个实现,用于在运行时派生协议缓冲区类型的消息。

The google.golang.org/protobuf/testing/protocmp package provides functions to compare protocol buffer messages with the github.com/google/go-cmp/cmp package.

google.golang.org/protobuf/testing/protocmp包提供了配合github.com/google/go-cmp/cmp包比较协议缓冲区消息的函数。

The google.golang.org/protobuf/compiler/protogen package provides support for writing protocol compiler plugins.

google.golang.org/protobuf/compiler/protogen包提供了对编写协议编译器插件的支持。

Conclusion 结论

The google.golang.org/protobuf module is a major overhaul of Go’s support for protocol buffers, providing first-class support for reflection, custom message implementations, and a cleaned up API surface. We intend to maintain the previous API indefinitely as a wrapper of the new one, allowing users to adopt the new API incrementally at their own pace.

google.golang.org/protobuf模块是对Go的协议缓冲区支持的一次大修,提供了对反射、自定义消息实现的一流支持,并清理了API的表面。我们打算无限期地维护以前的API,作为新API的封装,允许用户按照自己的节奏逐步采用新API。

Our goal in this update is to improve upon the benefits of the old API while addressing its shortcomings. As we completed each component of the new implementation, we put it into use within Google’s codebase. This incremental rollout has given us confidence in both the usability of the new API and the performance and correctness of the new implementation. We believe it is production ready.

我们在这次更新中的目标是改进旧API的优点,同时解决其缺点。当我们完成了新实现的每一个组件时,我们在谷歌的代码库中把它投入使用。这种渐进式的推广使我们对新API的可用性以及新实现的性能和正确性都充满信心。我们相信它已经为生产做好了准备。

We are excited about this release and hope that it will serve the Go ecosystem for the next ten years and beyond!

我们对这个版本感到兴奋,并希望它能在未来十年甚至更久的时间里为Go生态系统服务。