技术说明 6

原文:https://www.lua.org/notes/ltn006.html

Technical Note 6 技术说明 6

Last update: Wed Jun 16 10:43:27 BRT 2004

Weak references: implementation and use in Lua 弱引用:Lua 中的实现和使用

by John Belmonte 作者:John Belmonte

Overview 概述

In computer languages such as Lua that employ garbage collection, a reference to an object is said to be weak if it does not prevent collection of the object. Weak references are useful for determining when an object has been collected and for caching objects without impeding their collection.

​ 在使用垃圾回收的计算机语言(如 Lua)中,如果引用不会阻止对象被回收,则称该引用是弱引用。弱引用可用于确定对象何时被回收,以及在不阻止对象被回收的情况下缓存对象。

Although weak references are available in the Lua C API, there is no standard support in the Lua language itself. This note proposes an interface for weak references in Lua, describes an implementation, and also presents some practical examples of their use: safe destructor events for table objects and object caching.

​ 尽管 Lua C API 中提供了弱引用,但 Lua 语言本身并不提供标准支持。本说明提出了 Lua 中弱引用的接口,描述了实现,还提供了一些其实际用例:表对象的析构函数安全事件和对象缓存。

Interface 接口

Here is a synopsis of the proposed interface:

​ 以下是提议的接口的概要:

    -- creation
    ref = weakref(obj)
    -- dereference
    obj = ref()

That is, a new global function named “weakref” is used to create a weak reference to an object. Weak references can be dereferenced using the function call operator. A dereference returning nil means that the object has been garbage collected. Since nil has this special meaning weak references to the nil object itself are not allowed.

​ 即,使用名为“weakref”的新全局函数来创建对对象的弱引用。可以使用函数调用运算符取消引用弱引用。返回 nil 的取消引用意味着该对象已被垃圾回收。由于 nil 具有这种特殊含义,因此不允许对 nil 对象本身进行弱引用。

Implementation 实现

The Lua C API provides an interface for referencing Lua objects. Weak references are directly supported by way of lua_ref()’s lock flag: a zero value permits the object to be garbage collected.

​ Lua C API 提供了一个用于引用 Lua 对象的接口。弱引用通过 lua_ref() 的锁定标志直接支持:零值允许对象被垃圾回收。

Our weakref function needs to call lua_ref() and return an object that holds the resulting reference id. Dereferences, implemented with a function call tag method on the reference object, simply call lua_getref(). Finally it’s necessary to release the reference when the reference object itself is collected, so a garbage collection (gc) tag method is used to call lua_unref().

​ 我们的 weakref 函数需要调用 lua_ref() 并返回一个保存结果引用 ID 的对象。取消引用(通过引用对象上的函数调用标记方法实现)只需调用 lua_getref()。最后,当引用对象本身被回收时,有必要释放引用,因此使用垃圾回收 (gc) 标记方法来调用 lua_unref()。

The userdata type is the natural choice for the reference object since it is the only type that provides a gc event. Furthermore, as only a single integer is required, the state can be stored as the userdata pointer itself which eliminates a dynamic memory allocation.

​ 用户数据类型是引用对象的自然选择,因为它是唯一提供 gc 事件的类型。此外,由于只需要一个整数,因此可以将状态存储为用户数据指针本身,从而消除了动态内存分配。

The source code of this implementation is provided as a patch to the official Lua 4.0 distribution and is available here. It is applied with the patch utility as follows:

​ 此实现的源代码作为补丁提供给官方 Lua 4.0 发行版,可在此处获得。它通过补丁实用程序应用,如下所示:

    cd <lua distrubution directory>
    patch -p1 < weakrefs.patch

The patch includes a new addition to the test directory “weakref.lua” which shows a trivial example of the extension. 该补丁包括对测试目录“weakref.lua”的新增内容,其中显示了扩展的一个简单示例。

It is proposed that this implementation be added to Lua’s “baselib” standard library for several reasons: weak references are of general usefulness; the implementation is simple and already supported by the C API; and since only one new Lua function is required it would be overkill to create a separate library for its purpose.

​ 出于以下几个原因,建议将此实现添加到 Lua 的“baselib”标准库中:弱引用具有普遍的实用性;该实现简单,并且 C API 已支持;而且由于只需要一个新的 Lua 函数,因此为其目的创建单独的库将是矫枉过正。

Safe Object Destructors 安全对象析构函数

In a language with garbage collection the most common need for destructors– freeing other objects owned by the object being destructed– is eliminated. As a result Lua programmers seldom miss the lack of gc events for (table) objects. The reason such an event is not supported is mainly to keep the garbage collector simple. If gc events were allowed, the collector would have to handle the case of a destructor making a fresh reference to the object being collected.

​ 在具有垃圾回收的语言中,对析构函数最常见的需求——释放被析构对象拥有的其他对象——被消除了。因此,Lua 程序员很少会错过(表)对象的 gc 事件。不支持此类事件的主要原因是为了保持垃圾回收器简单。如果允许 gc 事件,则回收器必须处理析构函数对正在收集的对象进行全新引用的情况。

However there are cases when an object owns some system resource that is not automatically freed such as file handles, graphic buffers, etc. A somewhat tedious solution is to implement such objects from C with the userdata type, which does have a gc event. Weak references provide an elegant alternative to this, allowing safe garbage collection events for table objects from the comfort of Lua.

​ 但是,在某些情况下,对象拥有某些不会自动释放的系统资源,例如文件句柄、图形缓冲区等。一个有点繁琐的解决方案是用 C 实现此类对象,并使用具有 gc 事件的 userdata 类型。弱引用为此提供了一种优雅的替代方案,允许从 Lua 的舒适性中为表对象提供安全的垃圾回收事件。

The implementation uses a table containing weak reference/ destructor function pairs. When a reference’s object is collected the corresponding destructor function is called. These destructors are safe in that they do not have access to the object being destroyed. Any information required by the destructor (such as resource handles) must be accessible independently from the object. This is fairly light work for Lua thanks to first class function objects.

​ 该实现使用包含弱引用/析构函数对的表。当引用对象被回收时,将调用相应的析构函数。这些析构函数是安全的,因为它们无法访问正在销毁的对象。析构函数所需的任何信息(例如资源句柄)都必须独立于对象访问。这对于 Lua 来说是一项相当轻松的工作,这要归功于一流函数对象。

A small interface is required to manage the table, consisting of a function to bind destructor functions to objects and a function to check for collected objects. Here’s an implementation in Lua:

​ 需要一个小型接口来管理表,该接口包含一个将析构函数绑定到对象的功能和一个检查已回收对象的功能。以下是 Lua 中的实现:

    ------------------------------------------
    -- Destructor manager
    local destructor_table = { }
    function RegisterDestructor(obj, destructor)
        %destructor_table[weakref(obj)] = destructor
    end
    function CheckDestructors()
        local delete_list = { }
        for objref, destructor in %destructor_table do
            if not objref() then
                destructor()
                tinsert(delete_list, objref)
            end
        end
        for i = 1, getn(delete_list) do
            %destructor_table[delete_list[i]] = nil
        end
    end

Instead of calling CheckDestructors() manually at some interval, the natural thing to do is chain it to Lua’s garbage collection cycle. The Lua virtual machine supports this by calling the gc tag method for the nil type at the end of a cycle.

​ 除了在某个时间间隔手动调用 CheckDestructors() 之外,最自然的做法是将其链接到 Lua 的垃圾回收周期。Lua 虚拟机通过在周期结束时调用 nil 类型的 gc 标记方法来支持此操作。

As an example of safe destructor use, consider an object used for logging program messages to a file. When the object is garbage collected we would like the log file to be closed. (This example is trivial because file handles are closed when a program terminates. However the method is easily applied to other types of resources.)

​ 作为安全析构函数使用的示例,考虑一个用于将程序消息记录到文件中的对象。当对象被垃圾回收时,我们希望日志文件被关闭。(此示例很简单,因为当程序终止时文件句柄会被关闭。但是该方法很容易应用于其他类型的资源。)

    ------------------------------------------
    -- example object using safe destructor
    function make_logobj(filename)
        local id = openfile(filename, "w")
        assert(id)
        local obj =
        {
            file = id,
            write = function(self, message)
                write(self.file, message)
            end,
        }
        local destructor = function()
            closefile(%id)
        end
        RegisterDestructor(obj, destructor)
        return obj
    end

Object Caching 对象缓存

Consider a web server that dynamically generates web pages from a database such as a mailing list or program source code. In this kind of application it is common to cache generated pages in memory to improve performance. However if the caching were implemented by simply storing page objects in a table, they would never be collected and memory usage would grow unchecked.

​ 考虑一个从数据库(例如邮件列表或程序源代码)动态生成网页的 Web 服务器。在这种类型的应用程序中,通常会将生成的页面缓存在内存中以提高性能。但是,如果缓存只是通过将页面对象存储在表中来实现,那么它们将永远不会被回收,并且内存使用量将不受控制地增长。

One remedy would be to cache only the most recently accessed n pages, but by failing to take the data size into account this will not make good use of available memory. An improvement would be to cache the most recently accessed x kilobytes of generated data. Besides the added program complexity, here the issue that arises is finding a suitable value for x. This is similar to an issue faced by garbage collectors: how often and after how much memory use should a cycle occur?

​ 一种补救方法是仅缓存最近访问的 n 个页面,但如果不考虑数据大小,这将无法充分利用可用内存。一种改进方法是缓存最近访问的 x 千字节生成的数据。除了增加程序复杂性之外,这里出现的问题是找到一个适合 x 的值。这类似于垃圾回收器面临的问题:循环应该多久发生一次以及在使用多少内存后发生?

By using weak references for caching, program complexity is kept low while leaving the memory use issue up to the garbage collector. Instead of storing generated page objects, the cache table consists of weak references to those objects. When a garbage collection cycle occurs page objects not currently in use will be collected.

​ 通过对缓存使用弱引用,可以保持较低的程序复杂性,同时将内存使用问题留给垃圾回收器。缓存表不存储生成的页面对象,而是包含对这些对象的弱引用。当垃圾回收循环发生时,将回收当前未使用的页面对象。

Here is an implementation assuming a function GeneratePage() that makes a page object given its “name”. The function CleanCache() is needed for removing table entries for collected objects, which again should be chained to Lua’s gc cycle.

​ 这里有一个实现,假设一个函数 GeneratePage() 给定其“名称”来生成一个页面对象。CleanCache() 函数用于删除已回收对象的表项,这些表项应再次链接到 Lua 的 gc 循环。

    ------------------------------------------
    -- Page cache
    local cache_table = { }
    function GetPage(name)
        local ref = %cache_table[name]
        local obj = ref and ref()
        if not obj then
            obj = GeneratePage(name)
            %cache_table[name] = weakref(obj)
        end
        return obj
    end
    function CleanCache()
        local delete_list = { }
        for name, ref in %cache_table do
            if not ref() then
                tinsert(delete_list, name)
            end
        end
        for i = 1, getn(delete_list) do
            %cache_table[delete_list[i]] = nil
        end
    end

Acknowledgements 致谢

The author would like to thank Anthony Carrico for discussions on weak references and garbage collection, Roberto Ierusalimschy for kindly pointing out the obvious (that the C API supports weak references), and also NanaOn-Sha, Co. Ltd. and Sony Computer Entertainment, Inc. for permission to share the source code presented here.

​ 作者要感谢 Anthony Carrico 就弱引用和垃圾回收进行的讨论,感谢 Roberto Ierusalimschy 友善地指出显而易见的事实(C API 支持弱引用),还要感谢 NanaOn-Sha, Co. Ltd. 和 Sony Computer Entertainment, Inc. 允许共享此处提供的源代码。

最后修改 January 31, 2024: 更新 (e0896df)