Compacting a WeakReference Dictionary

weakreference example
weakreference android
weak reference c#
weakreference kotlin
weakreference php
weak reference c
weakreference python
weakreference set

I've got a class Foo with a property Id. My goal is that there are no two instances of Foo with the same Id at the same time.

So I created a factory method CreateFoo which uses a cache in order to return the same instance for the same Id.

static Foo CreateFoo(int id) {
    Foo foo;
    if (!cache.TryGetValue(id, out foo)) {
        foo = new Foo(id);
        foo.Initialize(...);
        cache.Put(id, foo);
    }
    return foo;
}

The cache is implemented as a Dictionary<TKey,WeakReference>, based on @JaredPar's Building a WeakReference Hashtable:

class WeakDictionary<TKey, TValue> where TValue : class {
    private readonly Dictionary<TKey, WeakReference> items;
    public WeakDictionary() {
        this.items = new Dictionary<TKey, WeakReference>();
    }
    public void Put(TKey key, TValue value) {
        this.items[key] = new WeakReference(value);
    }
    public bool TryGetValue(TKey key, out TValue value) {
        WeakReference weakRef;
        if (!this.items.TryGetValue(key, out weakRef)) {
            value = null;
            return false;
        } else {
            value = (TValue)weakRef.Target;
            return (value != null);
        }
    }
}

The problem is that the WeakReferences remain in the dictionary after their targets have been garbage collected. This implies the need for some strategy how to manually "garbage collect" dead WeakReferences, as explained by @Pascal Cuoq in What happens to a WeakReference after GC of WeakReference.Target.


My question is: What's the best strategy to compact a WeakReference Dictionary?

The options that I see are:

  1. Don't remove WeakReferences from the Dictionary. IMO this is bad, because the cache is used in the full lifetime of my application, and a lot of dead WeakReferences will accumulate over time.

  2. Walk the entire dictionary on each Put and TryGetValue, and remove dead WeakReferences. This defeats somewhat the purpose of a dictionary because both operations become O(n).

  3. Walk the entire dictionary periodically in a background thread. What would be a good interval, given that I don't know the usage pattern of CreateFoo?

  4. Append each inserted KeyValuePair to a double-ended linked list. Each call to Put and TryGetValue examines the head of the list. If the WeakReference is alive, move the pair to the end of the list. If it is dead, remove the pair from the list and the WeakReference from the Dictionary.

  5. Implement a custom hash table with the minor difference that, when a bucket is full, dead WeakReferences are first removed from the bucket before proceeding as usual.

Are there other strategies?

The best strategy is probably an algorithm with amortized time complexity. Does such a strategy exist?

Your Option 3 (a Thread) has the big disadvantage of making synchronization necessary on all Put/TryGetvalue actions. If you do use this, your interval is not in milliseconds but every N TryGet actions.

Option 2, scanning the Dictionary, would incur a serious overhead. You can improve by only scanning 1 in 1000 actions and/or by watching how often the GC has run.

But i would seriously consider option 1: Do nothing. You may have "a lot" of dead entries but on the other hand they are pretty small (and get recycled). Probably not an option for a Server App but for a Client application I would try to get a measure on how many entries (kByte) per hour we are talking about.

After some discussion:

Does such a[n amortized] strategy exist?

I would guess no. Your problem is a miniature version of the GC. You will have to scan the whole thing once in a while. So only options 2) and 3) provide a real solution. And they are both expensive but they can be (heavily) optimized with some heuristics. Option 2) would still give you the occasional worst-case though.

Comparable WeakReferences in C#, C# provides us with a WeakReference object. the advantage that if you use an EquatableWeakReference as a key in a Dictionary, it will http://stackoverflow.​com/questions/2047591/compacting-a-weakreference-dictionary. The WeakReference is a reference type and so, when you allocate a WeakReference you are allocating an entire object (with a finalizer too) to reference another object. Only that other object will be "weakly referenced". So it is usually not recommended to use WeakReferences to reference small data.

If you can switch the managed object to be the key of the dictionary, then you can use .Net 4.0's ConditionalWeakTable (namespace System.Runtime.CompilerServices).

According to Mr. Richter, ConditionalWeakTable is notified of object collection by the garbage collector rather than using a polling thread.

    static ConditionalWeakTable<TabItem, TIDExec> tidByTab = new ConditionalWeakTable<TabItem, TIDExec>();

    void Window_Loaded(object sender, RoutedEventArgs e)
    {
        ...
        dataGrid.SelectionChanged += (_sender, _e) =>
        {
            var cs = dataGrid.SelectedItem as ClientSession;

            this.tabControl.Items.Clear();

            foreach (var tid in cs.GetThreadIDs())
            {
                tid.tabItem = new TabItem() { Header = ... };
                tid.tabItem.AddHandler(UIElement.MouseDownEvent,
                    new MouseButtonEventHandler((__sender, __e) =>
                    {
                        tabControl_SelectionChanged(tid.tabItem);
                    }), true);
                tidByTab.Add(tid.tabItem, tid);
                this.tabControl.Items.Add(tid.tabItem);
            }
        };
    }

    void tabControl_SelectionChanged(TabItem tabItem)
    {
        this.tabControl.SelectedItem = tabItem;
        if (tidByTab.TryGetValue(tabControl.SelectedItem as TabItem, out tidExec))
        {
            tidExec.EnsureBlocksLoaded();
            ShowStmt(tidExec.CurrentStmt);
        }
        else
            throw new Exception("huh?");
    }

What's important here is that the only thing referencing the TabItem object is the tabControls.Items collection, and the key of ConditionalWeakTable. The key of ConditionalWeakTable does not count. So when we clear all the items from the tabControl, then those TabItems can be garbage-collected (because nothing references them any longer, again the key of ConditionalWeakTable does not count). When they are garabage collected, ConditionalWeakTable is notified and the entry with that key value is removed. So my bulky TIDExec objects are also garbage-collected at that point (nothing references them, except the value of ConditionalWeakTable).

Weak reference, In computer programming, a weak reference is a reference that does not protect the referenced object from collection by a garbage collector, unlike a strong reference. An object referenced only by weak references – meaning "every chain of Mark-compact algorithm · Reference counting · Tracing garbage collection  Private Shared _cache As Dictionary(Of Integer, WeakReference) ' Track the number of times an object is regenerated. Dim regenCount As Integer = 0 Public Sub New(ByVal count As Integer) _cache = New Dictionary(Of Integer, WeakReference) ' Add data objects with a short weak reference to the cache.

Approach #5 is interesting, but has the disadvantage that it could be difficult to know what the real level of hash-table utilization is, and consequently when the hash table should be expanded. That difficulty might be overcome if, whenever it "seems" like the hash table should be expanded, one first does a whole-table scan to remove dead entries. If more than half of the entries in the table were dead, don't bother expanding it. Such an approach should yield amortized O(1) behavior, since one wouldn't do the whole-table scan until one had added back as many entries as had been deleted.

A simpler approach, which would also yield O(1) amortized time and O(1) space per recently-live element would be to keep a count of how many items were alive after the last time the table was purged, and how many elements have been added since then. Whenever the latter count exceeds the first, do a whole-table scan-and-purge. The time required for a scan and purge will be proportional to the number of elements added between purges, thus retaining amortized O(1) time, and the number of total elements in the collection will not exceed twice the number of elements that were recently observed to be alive, so the number of dead elements cannot exceed twice the number of recently-live elements.

Building A Weakreference Hashtable, Since the value in the hashtable is a WeakReference the mere Here is the implementation of the dictionary without any compaction support. At least the weakreference will allow the GC to get clean-up the objects you’re pointing to, but there is no good way to get rid of the dictionary entries. They will continue to grow. See discussion at… http://stackoverflow.com/questions/2047591/compacting-a-weakreference-dictionary

I had this same problem, and solved it like this (WeakDictionary is the class I was trying to clean up):

internal class CleanerRef
{
    ~CleanerRef()
    {
        if (handle.IsAllocated)
            handle.Free();
    }

    public CleanerRef(WeakDictionaryCleaner cleaner, WeakDictionary dictionary)
    {
        handle = GCHandle.Alloc(cleaner, GCHandleType.WeakTrackResurrection);
        Dictionary = dictionary;
    }

    public bool IsAlive
    {
        get {return handle.IsAllocated && handle.Target != null;}
    }

    public object Target
    {
        get {return IsAlive ? handle.Target : null;}
    }

    GCHandle handle;
    public WeakDictionary Dictionary;
}


internal class WeakDictionaryCleaner
{
    public WeakDictionaryCleaner(WeakDictionary dict)
    {
        refs.Add(new CleanerRef(this, dict));
    }

    ~WeakDictionaryCleaner()
    {
        foreach(var cleanerRef in refs)
        {
            if (cleanerRef.Target == this)
            {
                cleanerRef.Dictionary.ClearGcedEntries();
                refs.Remove(cleanerRef);
                break;
            }
        }
    }
    private static readonly List<CleanerRef> refs = new List<CleanerRef>();
}

What this two classes try to achieve is to "hook" the GC. You activate this mechanism by creating an instance of WeakDictionaryCleaner during the construction of the weak collection:

new WeakDictionaryCleaner(weakDictionary);

Notice that I don't create any reference to the new instance, so that the GC will dispose it during the next cycle. In the ClearGcedEntries() method I create a new instance again, so that each GC cycle will have a cleaner to finalize that in turn will execute the collection compaction. You can make the CleanerRef.Dictionary also a weak reference so that it won't hold the dictionary in memory.

Hope this helps

WeakReference Class (System), Represents a weak reference, which references an object while still allowing that Dictionary<int, WeakReference>(); // Add objects with a short weak reference  // TODO: convert to BaseDictionary- easier now to have custom dictionary since this does compacting // and weak reference resolution template < class TKey , class TValue , class KeyComparer = DefaultComparer< const TKey*>, bool cleanOnInsert = true > class WeaklyReferencedKeyDictionary

I guess this is a right place to put it, even though it might look like necromancy. Just in case someone stumbles upon this question like I did. Lack of a dedicated Identity Map in .net is somewhat surprising, and I feel the most natural way for it work is as described in the last option: when the table is full and about to double its capacity, it checks to see if there is enough dead entries that can be recycled for further use so that growing is not necessary.

static IdentityMap<int, Entity> Cache = new IdentityMap<int, Entity>(e => e.ID);
...
var entity = Cache.Get(id, () => LoadEntity(id));

The class exposes just one public method Get with key and optional value parameter that lazily loads and caches an entity if it is not in the cache.

using System;
class IdentityMap<TKey, TValue>
    where TKey : IEquatable<TKey>
    where TValue : class
{
    Func<TValue, TKey> key_selector;
    WeakReference<TValue>[] references;
    int[] buckets;
    int[] bucket_indexes;
    int tail_index;
    int entries_count;
    int capacity;

    public IdentityMap(Func<TValue, TKey> key_selector, int capacity = 10) {
        this.key_selector = key_selector;
        Init(capacity);
    }
    void Init(int capacity) {
        this.bucket_indexes = new int[capacity];
        this.buckets = new int[capacity];
        this.references = new WeakReference<TValue>[capacity];
        for (int i = 0; i < capacity; i++) {
            bucket_indexes[i] = -1;
            buckets[i] = i - 1;
        }
        this.tail_index = capacity - 1;
        this.entries_count = 0;
        this.capacity = capacity;
    }

    public TValue Get(TKey key, Func<TValue> value = null) {
        int bucket_index = Math.Abs(key.GetHashCode() % this.capacity);
        var ret = WalkBucket(bucket_index, true, key);
        if (ret == null && value != null) Add(bucket_index, ret = value());
        return ret;
    }

    void Add(int bucket_index, TValue value) {
        if (this.entries_count == this.capacity) {
            for (int i = 0; i < capacity; i++) WalkBucket(i, false, default(TKey));
            if (this.entries_count * 2 > this.capacity) {
                var old_references = references;
                Init(this.capacity * 2);
                foreach (var old_reference in old_references) {
                    TValue old_value;
                    if (old_reference.TryGetTarget(out old_value)) {
                        int hash = key_selector(value).GetHashCode();
                        Add(Math.Abs(hash % this.capacity), old_value);
                    }
                }
            }
        }
        int new_index = this.tail_index;
        this.tail_index = buckets[this.tail_index];
        this.entries_count += 1;
        buckets[new_index] = bucket_indexes[bucket_index];
        if (references[new_index] != null) references[new_index].SetTarget(value);
        else references[new_index] = new WeakReference<TValue>(value);
        bucket_indexes[bucket_index] = new_index;
    }

    TValue WalkBucket(int bucket_index, bool is_searching, TKey key) {
        int curr_index = bucket_indexes[bucket_index];
        int prev_index = -1;
        while (curr_index != -1) {
            TValue value;
            int next_index = buckets[curr_index];
            if (references[curr_index].TryGetTarget(out value)) {
                if (is_searching && key_selector(value).Equals(key)) return value;
                prev_index = curr_index;
            } else {
                if (prev_index != -1) buckets[prev_index] = next_index;
                else bucket_indexes[bucket_index] = next_index;

                buckets[curr_index] = this.tail_index;
                this.tail_index = curr_index;
                this.entries_count -= 1;
            }
            curr_index = next_index;
        }
        return null;
    }
}

Handbook of Research Methods and Applications in Happiness and , With its weak reference to an artificial intelligence that is able to better orient however, was not born ex nihilo. its imagery and vocabulary draw on several notion of the compact city, a misura d'uomo, organized according to the basic needs  I just assumed that ConditionalWeakTable is what applications 2 and 3 would use whereas some people in other posts actually use Dictionary<WeakReference, T>. No idea why—you’d always end up with a ton of null WeakReference s with values that cannot be accessed by any key regardless of how you do it.

Weak Reference Collections - Discussion, active, it's probably a good time to bring weak reference collections up. on base type, so that it can be used for arrays, sets, dictionaries and so on. Removing nil values can be easily done with compact() as desired. compacting definition: 1. present participle of compact 2. to press something together in a tight and solid way: . Learn more.

[PDF] Under the Hood of .NET Memory Management, Garbage collection of the Small Object Heap (SOH) involves compaction. Concurrent GC has a separate thread for the GC to run on, meaning that the appli​- returned from the Target property of a weak reference, that means the object has  The ConditionalWeakTable<TKey,TValue> class enables language compilers to attach arbitrary properties to managed objects at run time. A ConditionalWeakTable<TKey,TValue> object is a dictionary that binds a managed object, which is represented by a key, to its attached property, which is represented by a value.

Swift Arrays Holding Elements With Weak References, Now, we have an array type-safe which maintains a weak reference of your MyClass objects. func compact() { array = array.filter { $0.value != nil } } if you need something similar to NSPointerArray for Dictionary you can  Recently I ran into a situation on a personal project where I needed a hashtable like structure for a set of WeakReference values. When poking around for an existing implementation I saw found several versions which were very thin, type safe wrapper around a Dictionary<TKey,WeakReference> (usually the class even implements IDictionary<TKey,TValue> directly).

Comments
  • One of the serious deficiencies of C#/.NET weak references is the missing communication from the GC, such as Java has with associating a ReferenceQueue with the weak reference.
  • dtb, discussion becomes easier when you number the options instead of bulleting them.
  • I have added two more options. What do you think of those? ... The problem is that I'm building a library that could be used both in server and client apps. In both cases it's likely that my class is used for the entire lifetime of the app. It feels wrong to design a library that acquires memory with each method call and never releases it, even if it's just a a couple of bytes per call.
  • You're right about the library angle. Both new options will probably work but have really bad worst-case behaviour. I vote for nr 2, periodical full scan.
  • Why do you think option 5 has a really bad worst-case behaviour? A hash table implementation will always have to scan the buckets for an empty spot, so why no check for dead WeakReferences here?
  • It won't scan all buckets. It's about the same as checking a few random entries. Underlying problem is that there is no correlation with what the GC cleans up and what you can easily access.
  • This was quite helpful to me recently as I tried to solve the same problem. If I could vote twice I would. Thanks!
  • Thanks for sharing - this helped me a lot!
  • In this case it will be re-inserted by CreateFoo immediately, so this is sadly no improvement.
  • You could do the scans to compact the dictionary when you remove a weak reference.
  • There is not only the dead WeakReference that occupies space, but also the internal structures of the Dictionary. So even if the number of active Foo objects stays constants, the hash table needs to constantly grow to accommodate all the dead WeakReferences.
  • Correct, I didn't take that into account. That would be additional 4 bytes (on a 32-bit system) for each array member (on average of course, there may be some small constant overhead).
  • It is an improvement, it doesn't grow.
  • Finalizers should not access managed objects. It is possible that the dictionary and the entry are collected at the same time and cause very unpredictable behavior by accessing a already finalized dictionary. And also you will cause threading issues in addition by calling Add and Remove concurrently on different threads.
  • @NickWhaley Thank you, I was not fully aware of potential issues you pointed out. So I see this kind of "workaround" might work under special circumstances on what we cannot control substantially, like choice of finalizer thread or destruction order of objects on application exit, which may end up throwing exceptions from finalizer.