
C# Hashtable Source Code Study

This is the second entry in my series of C# source-code reading notes; this time the subject is the Hashtable class, again from the .NET 4.8 reference source.

Hashtable

Hashtable can be thought of as the non-generic version of Dictionary, but beyond that difference it also resolves hash collisions differently: Hashtable uses double hashing. The source file carries a large number of comments, and its naming style differs from many other files; Hashtable and Dictionary read as if they came from two different eras (or two different teams).

[DebuggerTypeProxy(typeof(System.Collections.Hashtable.HashtableDebugView))]
[DebuggerDisplay("Count = {Count}")]
[System.Runtime.InteropServices.ComVisible(true)]
[Serializable]
public class Hashtable : IDictionary, ISerializable, IDeserializationCallback, ICloneable {
    // ...
}

Hashtable implements IDictionary, which covers adding, removing, updating, and looking up data; in addition there are the serialization-related interfaces and ICloneable for copying the table (note that Clone actually produces a shallow copy).

Nested Classes and Structs

bucket

This struct is the storage unit for the data; it is defined as follows:

private struct bucket {
    public Object key;
    public Object val;
    public int hash_coll;   // Store hash code; sign bit means there was a collision.
}

hash_coll is an int: its low 31 bits store the key's hash code, and its sign bit records whether a collision has ever occurred at this slot. Hashtable uses the indices of the buckets array as the probe space for open addressing; the hashing scheme is covered in detail below.
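As a quick illustration of this bit layout, the two pieces of information can be unpacked and set with a few hypothetical helpers (mine, not part of the source):

static class HashCollBits
{
    // Low 31 bits hold the key's hash code.
    public static int HashPart(int hashColl) => hashColl & 0x7FFFFFFF;

    // The sign bit marks "a collision has happened at this slot".
    public static bool HasCollided(int hashColl) => hashColl < 0;

    // Setting the collision bit preserves the stored hash code.
    public static int MarkCollision(int hashColl) => hashColl | unchecked((int)0x80000000);
}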

HashtableEnumerator

A helper class for enumerating a Hashtable; it implements the IEnumerator interface.

SyncHashtable

private class SyncHashtable : Hashtable, IEnumerable
{
    // ...
}

SyncHashtable is the thread-safe version of Hashtable; it is examined in more detail later.

Fundamentals

Size and Count Fields

// The total number of entries in the hash table.
private int count;

// The total number of collision bits set in the hashtable
private int occupancy;

private int loadsize;
private float loadFactor;

Hashtable keeps a few fields that record sizes or counts; their exact meanings are:

count: the number of key/value pairs stored in the Hashtable. It is incremented when new data is added and decremented when data is removed; the public Count property also returns it;

occupancy: the number of buckets marked as collided, i.e. the number of buckets whose hash_coll has its sign bit set to 1. Every new collision increments occupancy; once it exceeds the threshold (loadsize) while count > 100, rehash() is triggered;

loadFactor: the load factor, i.e. the maximum ratio of stored key/value pairs to the total number of buckets. It can be specified in the constructor, with a legal range of 0.1 to 1 (the default is 1); at construction time loadFactor is multiplied by 0.72 (which the source describes as the optimal value);

loadsize: the load threshold computed from loadFactor and hashsize (the length of buckets); it is the expansion trigger;
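To put concrete numbers on these definitions: new Hashtable(100, 1.0f) gives this.loadFactor = 0.72, rawsize = 100 / 0.72 ≈ 138.9, hashsize = HashHelpers.GetPrime(138) (the next prime the helper supplies; 163, assuming the prime table in the reference source), and loadsize = (int)(0.72f * 163) = 117, so 117 entries fit before the first expansion.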

The constructor shows how these fields are initialized:

public Hashtable(int capacity, float loadFactor) {
    if (capacity < 0)
        throw new ArgumentOutOfRangeException("capacity", Environment.GetResourceString("ArgumentOutOfRange_NeedNonNegNum"));
    if (!(loadFactor >= 0.1f && loadFactor <= 1.0f))
        throw new ArgumentOutOfRangeException("loadFactor", Environment.GetResourceString("ArgumentOutOfRange_HashtableLoadFactor", .1, 1.0));
    Contract.EndContractBlock();

    // Based on perf work, .72 is the optimal load factor for this table.
    this.loadFactor = 0.72f * loadFactor;

    double rawsize = capacity / this.loadFactor;
    if (rawsize > Int32.MaxValue)
        throw new ArgumentException(Environment.GetResourceString("Arg_HTCapacityOverflow"));

    // Avoid awfully small sizes
    int hashsize = (rawsize > InitialSize) ? HashHelpers.GetPrime((int)rawsize) : InitialSize;
    buckets = new bucket[hashsize];

    loadsize = (int)(this.loadFactor * hashsize);
    isWriterInProgress = false;
    // Based on the current algorithm, loadsize must be less than hashsize.
    Contract.Assert(loadsize < hashsize, "Invalid hashtable loadsize!");
}

Hash Function and Collision Handling

The following piece of logic appears over and over in the Hashtable class; this instance is the lookup logic:

uint seed;
uint incr;
// Take a snapshot of buckets, in case another thread resizes table
bucket[] lbuckets = buckets;
uint hashcode = InitHash(key, lbuckets.Length, out seed, out incr);
int ntry = 0;

bucket b;
int bucketNumber = (int) (seed % (uint)lbuckets.Length);
do {
    b = lbuckets[bucketNumber];
    if (b.key == null) {
        return false;
    }
    if (((b.hash_coll & 0x7FFFFFFF) == hashcode) &&
        KeyEquals (b.key, key))
        return true;
    bucketNumber = (int) (((long)bucketNumber + incr) % (uint)lbuckets.Length);
} while (b.hash_coll < 0 && ++ntry < lbuckets.Length);
return false;

First, InitHash initializes seed and incr. The InitHash function is shown below; its comments are exceptionally clear. This is classic double hashing:

// 'InitHash' is basically an implementation of classic DoubleHashing (see http://en.wikipedia.org/wiki/Double_hashing)
//
// 1) The only 'correctness' requirement is that the 'increment' used to probe
//    a. Be non-zero
//    b. Be relatively prime to the table size 'hashSize'. (This is needed to insure you probe all entries in the table before you 'wrap' and visit entries already probed)
// 2) Because we choose table sizes to be primes, we just need to insure that the increment is 0 < incr < hashSize
//
// Thus this function would work: Incr = 1 + (seed % (hashSize-1))
//
// While this works well for 'uniformly distributed' keys, in practice, non-uniformity is common.
// In particular in practice we can see 'mostly sequential' where you get long clusters of keys that 'pack'.
// To avoid bad behavior you want it to be the case that the increment is 'large' even for 'small' values (because small
// values tend to happen more in practice). Thus we multiply 'seed' by a number that will make these small values
// bigger (and not hurt large values). We picked HashPrime (101) because it was prime, and if 'hashSize-1' is not a multiple of HashPrime
// (enforced in GetPrime), then incr has the potential of being every value from 1 to hashSize-1. The choice was largely arbitrary.
//
// Computes the hash function: H(key, i) = h1(key) + i*h2(key, hashSize).
// The out parameter seed is h1(key), while the out parameter
// incr is h2(key, hashSize). Callers of this function should
// add incr each time through a loop.
private uint InitHash(Object key, int hashsize, out uint seed, out uint incr) {
    // Hashcode must be positive. Also, we must not use the sign bit, since
    // that is used for the collision bit.
    uint hashcode = (uint) GetHash(key) & 0x7FFFFFFF;
    seed = (uint) hashcode;
    // Restriction: incr MUST be between 1 and hashsize - 1, inclusive for
    // the modular arithmetic to work correctly. This guarantees you'll
    // visit every bucket in the table exactly once within hashsize
    // iterations. Violate this and it'll cause obscure bugs forever.
    // If you change this calculation for h2(key), update putEntry too!
    incr = (uint)(1 + ((seed * HashPrime) % ((uint)hashsize - 1)));
    return hashcode;
}

From key and hashsize (the size of the buckets array) the initial hash value hashcode is computed; it is ANDed with 0x7FFFFFFF to ensure only the low 31 bits are used. Then incr is derived: hashcode is multiplied by a prime (HashPrime, 101) and reduced modulo (hashsize - 1), which keeps incr between 1 and (hashsize - 1).
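To make the probe sequence tangible, here is a standalone sketch (a hypothetical helper of mine, mirroring InitHash and the probe step outside the class; it assumes hashsize is prime, which Hashtable guarantees via HashHelpers.GetPrime):

using System.Collections.Generic;

static class DoubleHashDemo
{
    public static IEnumerable<int> ProbeSequence(int hashCode, int hashsize)
    {
        const uint HashPrime = 101;                                   // same constant as the source
        uint seed = (uint)hashCode & 0x7FFFFFFF;                      // h1(key)
        uint incr = 1 + ((seed * HashPrime) % ((uint)hashsize - 1));  // h2(key, hashSize), in [1, hashsize-1]

        int bucketNumber = (int)(seed % (uint)hashsize);
        for (int ntry = 0; ntry < hashsize; ntry++)
        {
            yield return bucketNumber;                                // visits every slot exactly once
            bucketNumber = (int)(((long)bucketNumber + incr) % (uint)hashsize);
        }
    }
}

Because hashsize is prime and 1 <= incr < hashsize, incr is coprime to the table size, so the sequence visits all hashsize slots before repeating.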

With seed and incr initialized, the lookup proceeds:

  1. Take seed modulo the array length (the first hash) to obtain the first bucketNumber;
  2. If the bucket at bucketNumber is empty, return false: not found;
  3. If the bucket at bucketNumber matches on both the hash (the low 31 bits of hash_coll) and the key, return true: found;
  4. Apply the second hash: advance by incr (modulo the array length) to get a new bucketNumber;
  5. If the bucket examined in this iteration had collided before (hash_coll < 0, i.e. its sign bit is 1) and the number of probes is still below the length of buckets, go back to step 2 and keep searching; once the loop condition fails, return false.
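A quick usage example of this lookup path (nothing beyond the public API): ContainsKey and the indexer's getter both run this probe loop, and the getter returns null rather than throwing for a missing key:

using System;
using System.Collections;

class LookupDemo
{
    static void Main()
    {
        var table = new Hashtable();
        table["answer"] = 42;

        Console.WriteLine(table.ContainsKey("answer"));  // True
        Console.WriteLine(table["answer"]);              // 42
        Console.WriteLine(table["missing"] == null);     // True: the getter returns null, it does not throw
    }
}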

Add, Remove, Update, and Lookup

With the hashing scheme covered, let's walk through the basic operations:

Add

public virtual void Add(Object key, Object value) {
    Insert(key, value, true);
}
private void Insert (Object key, Object nvalue, bool add) {
    if (key == null) {
        throw new ArgumentNullException("key", Environment.GetResourceString("ArgumentNull_Key"));
    }
    Contract.EndContractBlock();
    if (count >= loadsize) {
        expand();
    }
    else if (occupancy > loadsize && count > 100) {
        rehash();
    }

    uint seed;
    uint incr;
    // Assume we only have one thread writing concurrently. Modify
    // buckets to contain new data, as long as we insert in the right order.
    uint hashcode = InitHash(key, buckets.Length, out seed, out incr);
    int ntry = 0;
    int emptySlotNumber = -1; // We use the empty slot number to cache the first empty slot. We chose to reuse slots
                              // created by remove that have the collision bit set over using up new slots.
    int bucketNumber = (int) (seed % (uint)buckets.Length);
    do {
        // Set emptySlot number to current bucket if it is the first available bucket that we have seen
        // that once contained an entry and also has had a collision.
        // We need to search this entire collision chain because we have to ensure that there are no
        // duplicate entries in the table.
        if (emptySlotNumber == -1 && (buckets[bucketNumber].key == buckets) && (buckets[bucketNumber].hash_coll < 0)) //(((buckets[bucketNumber].hash_coll & unchecked(0x80000000))!=0)))
            emptySlotNumber = bucketNumber;

        // Insert the key/value pair into this bucket if this bucket is empty and has never contained an entry
        // OR
        // This bucket once contained an entry but there has never been a collision
        if ((buckets[bucketNumber].key == null) ||
            (buckets[bucketNumber].key == buckets && ((buckets[bucketNumber].hash_coll & unchecked(0x80000000)) == 0))) {

            // If we have found an available bucket that has never had a collision, but we've seen an available
            // bucket in the past that has the collision bit set, use the previous bucket instead
            if (emptySlotNumber != -1) // Reuse slot
                bucketNumber = emptySlotNumber;

            // We pretty much have to insert in this order. Don't set hash
            // code until the value & key are set appropriately.
#if !FEATURE_CORECLR
            Thread.BeginCriticalRegion();
#endif
            isWriterInProgress = true;
            buckets[bucketNumber].val = nvalue;
            buckets[bucketNumber].key = key;
            buckets[bucketNumber].hash_coll |= (int) hashcode;
            count++;
            UpdateVersion();
            isWriterInProgress = false;
#if !FEATURE_CORECLR
            Thread.EndCriticalRegion();
#endif

#if FEATURE_RANDOMIZED_STRING_HASHING
#if !FEATURE_CORECLR
            // coreclr has the randomized string hashing on by default so we don't need to resize at this point

            if (ntry > HashHelpers.HashCollisionThreshold && HashHelpers.IsWellKnownEqualityComparer(_keycomparer))
            {
                // PERF: We don't want to rehash if _keycomparer is already a RandomizedObjectEqualityComparer since in some
                // cases there may not be any strings in the hashtable and we wouldn't get any mixing.
                if (_keycomparer == null || !(_keycomparer is System.Collections.Generic.RandomizedObjectEqualityComparer))
                {
                    _keycomparer = HashHelpers.GetRandomizedEqualityComparer(_keycomparer);
                    rehash(buckets.Length, true);
                }
            }
#endif // !FEATURE_CORECLR
#endif // FEATURE_RANDOMIZED_STRING_HASHING

            return;
        }

        // The current bucket is in use
        // OR
        // it is available, but has had the collision bit set and we have already found an available bucket
        if (((buckets[bucketNumber].hash_coll & 0x7FFFFFFF) == hashcode) &&
            KeyEquals (buckets[bucketNumber].key, key)) {
            if (add) {
                throw new ArgumentException(Environment.GetResourceString("Argument_AddingDuplicate__", buckets[bucketNumber].key, key));
            }
#if !FEATURE_CORECLR
            Thread.BeginCriticalRegion();
#endif
            isWriterInProgress = true;
            buckets[bucketNumber].val = nvalue;
            UpdateVersion();
            isWriterInProgress = false;
#if !FEATURE_CORECLR
            Thread.EndCriticalRegion();
#endif

#if FEATURE_RANDOMIZED_STRING_HASHING
#if !FEATURE_CORECLR
            if (ntry > HashHelpers.HashCollisionThreshold && HashHelpers.IsWellKnownEqualityComparer(_keycomparer))
            {
                // PERF: We don't want to rehash if _keycomparer is already a RandomizedObjectEqualityComparer since in some
                // cases there may not be any strings in the hashtable and we wouldn't get any mixing.
                if (_keycomparer == null || !(_keycomparer is System.Collections.Generic.RandomizedObjectEqualityComparer))
                {
                    _keycomparer = HashHelpers.GetRandomizedEqualityComparer(_keycomparer);
                    rehash(buckets.Length, true);
                }
            }
#endif // !FEATURE_CORECLR
#endif
            return;
        }

        // The current bucket is full, and we have therefore collided. We need to set the collision bit
        // UNLESS
        // we have remembered an available slot previously.
        if (emptySlotNumber == -1) { // We don't need to set the collision bit here since we already have an empty slot
            if (buckets[bucketNumber].hash_coll >= 0) {
                buckets[bucketNumber].hash_coll |= unchecked((int)0x80000000);
                occupancy++;
            }
        }

        bucketNumber = (int) (((long)bucketNumber + incr) % (uint)buckets.Length);
    } while (++ntry < buckets.Length);

    // This code is here if and only if there were no buckets without a collision bit set in the entire table
    if (emptySlotNumber != -1)
    {
        // We pretty much have to insert in this order. Don't set hash
        // code until the value & key are set appropriately.
#if !FEATURE_CORECLR
        Thread.BeginCriticalRegion();
#endif
        isWriterInProgress = true;
        buckets[emptySlotNumber].val = nvalue;
        buckets[emptySlotNumber].key = key;
        buckets[emptySlotNumber].hash_coll |= (int) hashcode;
        count++;
        UpdateVersion();
        isWriterInProgress = false;
#if !FEATURE_CORECLR
        Thread.EndCriticalRegion();
#endif

#if FEATURE_RANDOMIZED_STRING_HASHING
#if !FEATURE_CORECLR
        if (buckets.Length > HashHelpers.HashCollisionThreshold && HashHelpers.IsWellKnownEqualityComparer(_keycomparer))
        {
            // PERF: We don't want to rehash if _keycomparer is already a RandomizedObjectEqualityComparer since in some
            // cases there may not be any strings in the hashtable and we wouldn't get any mixing.
            if (_keycomparer == null || !(_keycomparer is System.Collections.Generic.RandomizedObjectEqualityComparer))
            {
                _keycomparer = HashHelpers.GetRandomizedEqualityComparer(_keycomparer);
                rehash(buckets.Length, true);
            }
        }
#endif // !FEATURE_CORECLR
#endif
        return;
    }

    // If you see this assert, make sure load factor & count are reasonable.
    // Then verify that our double hash function (h2, described at top of file)
    // meets the requirements described above. You should never see this assert.
    Contract.Assert(false, "hash table insert failed! Load factor too high, or our double hashing function is incorrect.");
    throw new InvalidOperationException(Environment.GetResourceString("InvalidOperation_HashInsertFailed"));
}

Add calls Insert internally. The method is fairly long; before walking through it, note two points:

  1. There is a check buckets[bucketNumber].key == buckets, which looks odd but is a deliberate trick: when a bucket's data is removed, its collision bit must survive so that probing can continue past it. As the Remove code later shows, in that case the buckets array itself is assigned to key as a special marker;

  2. The insertion loop simultaneously looks for emptySlotNumber and for the target bucket; the former caches a potentially reusable bucket. Even after emptySlotNumber is found, the scan must continue until a bucket that has never collided is reached, to guarantee that the key being inserted does not already exist in the Hashtable.

The insertion process boils down to these main steps:

  1. If count >= loadsize, the utilization of buckets has hit the threshold; to preserve hashing performance, the table grows via expand();

  2. Otherwise, if occupancy > loadsize && count > 100, i.e. the number of collision bits exceeds the threshold and more than 100 key/value pairs are stored, the table is rehashed via rehash(). Note that the number of collision bits can exceed the number of entries: as shown later, removing a bucket's data leaves its collision bit in b.hash_coll set;

  3. Enter the probe loop, searching for key while caching the index of the first reusable empty bucket in emptySlotNumber. The loop condition is (++ntry < buckets.Length), so the number of iterations stays below the array length;

  4. Each iteration applies the following logic:

    1. If emptySlotNumber has not been found yet and the current bucket is empty but has collided (collision bit is 1), record bucketNumber in emptySlotNumber;
    2. If the current bucket is free (key is null, or key is buckets with the collision bit 0), insert the new entry there (or at the cached emptySlotNumber, if one exists), OR the hash value into the low 31 bits of this bucket's hash_coll, and return;
    3. If both the stored hash and the key match the lookup target, the key already exists: Add throws an exception in this case, while a plain update simply replaces the value and returns;
    4. At this point, if there is still no cached emptySlotNumber (meaning every bucket probed so far had collided) and the current bucket has not collided before, set its collision bit to 1 and increment occupancy;
    5. Advance bucketNumber by incr, take the modulo, and continue with the next iteration;
  5. After the loop, if an emptySlotNumber was found, store the entry in that bucket; if there is still nowhere to put the entry, an exception is thrown. The add/update distinction in step 4.3 is visible from user code, as the demo after this list shows.
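A small demo of that distinction (standard Hashtable API only):

using System;
using System.Collections;

class InsertDemo
{
    static void Main()
    {
        var table = new Hashtable();
        table.Add("k", 1);              // Insert(key, value, add: true)

        table["k"] = 2;                 // Insert(key, value, add: false): replaces the value
        Console.WriteLine(table["k"]);  // 2

        try {
            table.Add("k", 3);          // duplicate key with add == true
        }
        catch (ArgumentException) {
            Console.WriteLine("duplicate key rejected by Add");
        }
    }
}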

Next come expand and rehash:

private void expand() {
    int rawsize = HashHelpers.ExpandPrime(buckets.Length);
    rehash(rawsize, false);
}
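HashHelpers.ExpandPrime itself lives outside this file; going by the reference source, it roughly doubles the bucket count and rounds up to a prime, with an upper cap on the array length. A simplified, self-contained sketch of that behavior (the names below are mine, not the real helper):

using System;

static class PrimeSizing
{
    // Double the size, then round up to the next prime. The real GetPrime first
    // consults a precomputed prime table, and for computed primes also skips
    // values p where (p - 1) is a multiple of HashPrime (101), as the InitHash
    // comment mentions; the real ExpandPrime additionally caps the result.
    public static int ExpandPrimeSketch(int oldSize)
    {
        int candidate = 2 * oldSize;
        while (!IsPrime(candidate)) candidate++;
        return candidate;
    }

    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; (long)d * d <= n; d++)
            if (n % d == 0) return false;
        return true;
    }
}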
private void rehash() {
    rehash( buckets.Length, false );
}
private void rehash( int newsize, bool forceNewHashCode ) {

    // reset occupancy
    occupancy = 0;

    // Don't replace any internal state until we've finished adding to the
    // new bucket[]. This serves two purposes:
    //   1) Allow concurrent readers to see valid hashtable contents
    //      at all times
    //   2) Protect against an OutOfMemoryException while allocating this
    //      new bucket[].
    bucket[] newBuckets = new bucket[newsize];

    // rehash table into new buckets
    int nb;
    for (nb = 0; nb < buckets.Length; nb++) {
        bucket oldb = buckets[nb];
        if ((oldb.key != null) && (oldb.key != buckets)) {
            int hashcode = ((forceNewHashCode ? GetHash(oldb.key) : oldb.hash_coll) & 0x7FFFFFFF);
            putEntry(newBuckets, oldb.key, oldb.val, hashcode);
        }
    }

    // New bucket[] is good to go - replace buckets and other internal state.
#if !FEATURE_CORECLR
    Thread.BeginCriticalRegion();
#endif
    isWriterInProgress = true;
    buckets = newBuckets;
    loadsize = (int)(loadFactor * newsize);
    UpdateVersion();
    isWriterInProgress = false;
#if !FEATURE_CORECLR
    Thread.EndCriticalRegion();
#endif
    // minimum size of hashtable is 3 now and maximum loadFactor is 0.72 now.
    Contract.Assert(loadsize < newsize, "Our current implementation means this is not possible.");
    return;
}

When expanding, ExpandPrime first computes the new length for buckets (a bare rehash() keeps the current length); rehash then allocates the new array, copies the data into it, and finally swaps it in as the live array. Inserting into the new array is factored out into its own function:

private void putEntry (bucket[] newBuckets, Object key, Object nvalue, int hashcode)
{
    Contract.Assert(hashcode >= 0, "hashcode >= 0");  // make sure collision bit (sign bit) wasn't set.

    uint seed = (uint) hashcode;
    uint incr = (uint)(1 + ((seed * HashPrime) % ((uint)newBuckets.Length - 1)));
    int bucketNumber = (int) (seed % (uint)newBuckets.Length);
    do {
        if ((newBuckets[bucketNumber].key == null) || (newBuckets[bucketNumber].key == buckets)) {
            newBuckets[bucketNumber].val = nvalue;
            newBuckets[bucketNumber].key = key;
            newBuckets[bucketNumber].hash_coll |= hashcode;
            return;
        }

        if (newBuckets[bucketNumber].hash_coll >= 0) {
            newBuckets[bucketNumber].hash_coll |= unchecked((int)0x80000000);
            occupancy++;
        }
        bucketNumber = (int) (((long)bucketNumber + incr) % (uint)newBuckets.Length);
    } while (true);
}

Each entry is inserted by probing from its key's hash for an available bucket, setting the collision bit of every occupied bucket passed along the way. Because the array is freshly created, this loop, unlike the one in Insert, does not need to handle buckets whose collision bit is set but whose data has been cleared.

Remove

public virtual void Remove(Object key) {
    if (key == null) {
        throw new ArgumentNullException("key", Environment.GetResourceString("ArgumentNull_Key"));
    }
    Contract.EndContractBlock();
    Contract.Assert(!isWriterInProgress, "Race condition detected in usages of Hashtable - multiple threads appear to be writing to a Hashtable instance simultaneously! Don't do that - use Hashtable.Synchronized.");

    uint seed;
    uint incr;
    // Assuming only one concurrent writer, write directly into buckets.
    uint hashcode = InitHash(key, buckets.Length, out seed, out incr);
    int ntry = 0;

    bucket b;
    int bn = (int) (seed % (uint)buckets.Length);  // bucketNumber
    do {
        b = buckets[bn];
        if (((b.hash_coll & 0x7FFFFFFF) == hashcode) &&
            KeyEquals (b.key, key)) {
#if !FEATURE_CORECLR
            Thread.BeginCriticalRegion();
#endif
            isWriterInProgress = true;
            // Clear hash_coll field, then key, then value
            buckets[bn].hash_coll &= unchecked((int)0x80000000);
            if (buckets[bn].hash_coll != 0) {
                buckets[bn].key = buckets;
            }
            else {
                buckets[bn].key = null;
            }
            buckets[bn].val = null;  // Free object references sooner & simplify ContainsValue.
            count--;
            UpdateVersion();
            isWriterInProgress = false;
#if !FEATURE_CORECLR
            Thread.EndCriticalRegion();
#endif
            return;
        }
        bn = (int) (((long)bn + incr) % (uint)buckets.Length);
    } while (b.hash_coll < 0 && ++ntry < buckets.Length);

    //throw new ArgumentException(Environment.GetResourceString("Arg_RemoveArgNotFound"));
}

Remove likewise probes for the matching bucket. When the target is found, the bucket's value is set to null; if its collision bit is set, key is set to the buckets array (the special marker), otherwise key is set to null. The loop ends once a bucket that never collided is reached or the number of probes reaches the array length; if the key was not found, nothing happens at all (note the commented-out throw at the end).
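Two user-visible consequences follow: removing a nonexistent key is silently a no-op, and a successful removal releases the value reference right away. A short demo:

using System;
using System.Collections;

class RemoveDemo
{
    static void Main()
    {
        var table = new Hashtable();
        table.Add("a", 1);

        table.Remove("nope");   // not found: no exception, nothing happens
        table.Remove("a");      // found: the slot is cleared

        Console.WriteLine(table.Count);         // 0
        Console.WriteLine(table["a"] == null);  // True
    }
}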

Hashtable also defines:

private volatile bool isWriterInProgress;

Every write operation sets this flag to true before modifying the data and back to false once the write completes. Remove begins with an assertion on isWriterInProgress (Clear has one too): writing from one thread while removing from another is not allowed. To avoid that class of problem, use the synchronized version, SyncHashtable.

set

set {
    Insert(key, value, false);
}

Updating goes through Insert as well: if the key already exists, the value is replaced directly; if not, an available empty bucket is located and the entry is written there.

get

get {
    if (key == null) {
        throw new ArgumentNullException("key", Environment.GetResourceString("ArgumentNull_Key"));
    }
    Contract.EndContractBlock();

    uint seed;
    uint incr;

    // Take a snapshot of buckets, in case another thread does a resize
    bucket[] lbuckets = buckets;
    uint hashcode = InitHash(key, lbuckets.Length, out seed, out incr);
    int ntry = 0;

    bucket b;
    int bucketNumber = (int) (seed % (uint)lbuckets.Length);
    do
    {
        int currentversion;

        // A read operation on hashtable has three steps:
        //   (1) calculate the hash and find the slot number.
        //   (2) compare the hashcode, if equal, go to step 3. Otherwise end.
        //   (3) compare the key, if equal, go to step 4. Otherwise end.
        //   (4) return the value contained in the bucket.
        // After step 3 and before step 4, a writer can kick in and remove the old item and add a new one
        // in the same bucket. So in the reader we need to check if the hash table is modified during above steps.
        //
        // Writers (Insert, Remove, Clear) will set 'isWriterInProgress' flag before it starts modifying
        // the hashtable and will clear the flag when it is done. When the flag is cleared, the 'version'
        // will be increased. We will repeat the reading if a writer is in progress or done with the modification
        // during the read.
        //
        // Our memory model guarantees that if we pick up the change in bucket from another processor,
        // we will see the 'isWriterInProgress' flag to be true or 'version' is changed in the reader.
        //
        int spinCount = 0;
        do {
            // this is a volatile read; following memory accesses can not be moved ahead of it.
            currentversion = version;
            b = lbuckets[bucketNumber];

            // The contention between reader and writer shouldn't happen frequently.
            // But just in case this will burn CPU, yield the control of CPU if we spun a few times.
            // 8 is just a random number I pick.
            if ((++spinCount) % 8 == 0) {
                Thread.Sleep(1);   // 1 means we are yielding control to all threads, including low-priority ones.
            }
        } while (isWriterInProgress || (currentversion != version));

        if (b.key == null) {
            return null;
        }
        if (((b.hash_coll & 0x7FFFFFFF) == hashcode) &&
            KeyEquals (b.key, key))
            return b.val;
        bucketNumber = (int) (((long)bucketNumber + incr) % (uint)lbuckets.Length);
    } while (b.hash_coll < 0 && ++ntry < lbuckets.Length);
    return null;
}

Beyond the plain lookup logic, the getter does some extra work for concurrency. If, after a bucket has been read, isWriterInProgress || (currentversion != version) is observed, the bucket must be read again (bucket is a value type, so the assignment copies it), and the inner loop periodically yields the CPU.
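The pattern is essentially an optimistic read retried against a version counter. A stripped-down, generic sketch of the same idea (a hypothetical type of mine, not from the source; like Hashtable, it assumes a single writer and glosses over the finer memory-ordering details):

class OptimisticBox<T> where T : struct
{
    private volatile bool _writing;   // plays the role of isWriterInProgress
    private volatile int _version;    // plays the role of version
    private T _value;

    public void Write(T value)
    {
        _writing = true;              // readers that observe this will retry
        _value = value;
        _version++;
        _writing = false;
    }

    public T Read()
    {
        T snapshot;
        int v;
        do {
            v = _version;
            snapshot = _value;        // T is a struct, so this copies the data
        } while (_writing || v != _version);
        return snapshot;
    }
}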

Enumeration

Enumeration uses the IEnumerator-implementing helper class HashtableEnumerator, and the version field is checked along the way, so modifying the data during enumeration is forbidden.
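A Hashtable enumerates as DictionaryEntry values, and a modification mid-enumeration trips the version check:

using System;
using System.Collections;

class EnumDemo
{
    static void Main()
    {
        var table = new Hashtable();
        table.Add("a", 1);
        table.Add("b", 2);

        foreach (DictionaryEntry entry in table)
            Console.WriteLine($"{entry.Key} = {entry.Value}");

        try {
            foreach (DictionaryEntry entry in table)
                table.Remove(entry.Key);   // changes version under the enumerator
        }
        catch (InvalidOperationException) {
            Console.WriteLine("collection was modified during enumeration");
        }
    }
}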

Thread Safety

Hashtable itself is not thread-safe, but it offers a static method:

// Returns a thread-safe wrapper for a Hashtable.
//
[HostProtection(Synchronization=true)]
public static Hashtable Synchronized(Hashtable table) {
    if (table == null)
        throw new ArgumentNullException("table");
    Contract.EndContractBlock();
    return new SyncHashtable(table);
}

This returns a SyncHashtable object, which is thread-safe. SyncHashtable is a nested class of Hashtable and also derives from it; it overrides a number of Hashtable's members to make concurrent reads and writes safe.

SyncHashtable holds a Hashtable instance internally; write operations take a lock before touching the data, as the following code shows:

public override void Add(Object key, Object value)
{
    lock (_table.SyncRoot)
    {
        _table.Add(key, value);
    }
}
public override void Clear()
{
    lock (_table.SyncRoot)
    {
        _table.Clear();
    }
}
public override void Remove(Object key)
{
    lock (_table.SyncRoot)
    {
        _table.Remove(key);
    }
}
public override Object this[Object key] {
    get {
        return _table[key];
    }
    set {
        lock (_table.SyncRoot) {
            _table[key] = value;
        }
    }
}
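Note that the indexer's getter takes no lock: plain reads are already safe thanks to the retry scheme shown in get earlier. Enumeration, however, is still not synchronized and needs an external lock. A usage sketch:

using System;
using System.Collections;
using System.Threading.Tasks;

class SyncDemo
{
    static void Main()
    {
        Hashtable shared = Hashtable.Synchronized(new Hashtable());

        // Concurrent writers are now safe: each write locks the table's SyncRoot.
        Parallel.For(0, 1000, i => shared[i] = i * i);
        Console.WriteLine(shared.Count);   // 1000

        // Enumeration is still not thread-safe; lock around it manually.
        lock (shared.SyncRoot)
        {
            foreach (DictionaryEntry e in shared) { /* ... */ }
        }
    }
}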

REFERENCE

https://referencesource.microsoft.com/#mscorlib/system/collections/hashtable.cs