Problem
I have been working on a mod that renders a bitmap image in-memory based on the game map to a table before writing it out to a file. While performance was slow but acceptable with smaller image (and therefore table) sizes, I found that after adding 1049600 entries to a table, the next entry took over 11 seconds to write.Code: Select all
local p = game.create_profiler()
local t = {}
for i = 1, 1049599 do t[i] = 0 end
game.print{"", "1-", #t, ": ", p}
for i = #t+1, #t+4 do
p.reset()
t[i] = 0
game.print{"", i, ": ", p}
end
-- output:
1-1049599: Elapsed: 556.496840ms
1049600: Elapsed: 0.007879ms
1049601: Elapsed: 11443.948614ms
1049602: Elapsed: 0.008234ms
1049603: Elapsed: 0.075207ms
Code: Select all
local p = game.create_profiler()
local t = {}
for i = 1, 5000000 do
t[i] = 0
if i % 100000 == 0 then
game.print{"", i, ": ", p}
p.reset()
end
end
-- output:
100000: Elapsed: 46.207088ms
200000: Elapsed: 53.280971ms
300000: Elapsed: 91.903078ms
400000: Elapsed: 30.148098ms
500000: Elapsed: 35.163378ms
600000: Elapsed: 189.806780ms
700000: Elapsed: 34.790679ms
800000: Elapsed: 34.939359ms
900000: Elapsed: 36.173256ms
1000000: Elapsed: 38.712967ms
1100000: Elapsed: 12990.752463ms
1200000: Elapsed: 200.527076ms
1300000: Elapsed: 568.851723ms
1400000: Elapsed: 819.362545ms
1500000: Elapsed: 1064.082434ms
1600000: Elapsed: 1302.363555ms
1700000: Elapsed: 1572.051218ms
1800000: Elapsed: 1840.763217ms
1900000: Elapsed: 2099.557564ms
2000000: Elapsed: 2371.360799ms
2100000: Elapsed: 3335.998881ms
2200000: Elapsed: 39.499897ms
2300000: Elapsed: 41.462934ms
2400000: Elapsed: 40.821955ms
2500000: Elapsed: 43.024659ms
2600000: Elapsed: 42.432301ms
2700000: Elapsed: 42.355921ms
2800000: Elapsed: 45.198566ms
2900000: Elapsed: 44.559136ms
3000000: Elapsed: 48.675417ms
3100000: Elapsed: 47.267981ms
3200000: Elapsed: 48.517513ms
3300000: Elapsed: 48.437369ms
3400000: Elapsed: 51.772120ms
3500000: Elapsed: 49.738693ms
3600000: Elapsed: 53.296874ms
3700000: Elapsed: 55.244591ms
3800000: Elapsed: 54.951138ms
3900000: Elapsed: 54.545961ms
4000000: Elapsed: 55.934428ms
4100000: Elapsed: 58.980624ms
4200000: Elapsed: 1936.416306ms
4300000: Elapsed: 29.680170ms
4400000: Elapsed: 30.193556ms
4500000: Elapsed: 29.431090ms
4600000: Elapsed: 29.327514ms
4700000: Elapsed: 33.734657ms
4800000: Elapsed: 29.790708ms
4900000: Elapsed: 29.234059ms
5000000: Elapsed: 30.076491ms
Code: Select all
local function iterate(t)
local p = game.create_profiler()
local sum = 0
for i = 1, #t do sum = sum + t[i] end --[[ Using pairs/ipairs for iteration have similar performance ]]
p.stop()
game.print{"", "Iterated ", #t, " items: ", p}
end
local t = {}
for i = 1, 2000000 do t[i] = 1 end; iterate(t)
for i = 2000001, 2200000 do t[i] = 1 end; iterate(t)
--output:
Iterated 2000000 items: Duration: 26773.726862ms
Iterated 2200000 items: Duration: 544.336383ms
- What I expected is that there shouldn't be such a severe performance degradation in the 1-2.1 million size range when using tables with sequential integer keys.
- This is completely reproducible in a fresh savefile with no mods.
Investigation/Explanation
(Caveat lector: I'm not super-experienced with this and some or all of this explanation may be incorrect)I grabbed a copy of the Factorio Lua fork from https://github.com/Rseding91/Factorio-Lua/ and compiled it and was able to reproduce the slow performance there. After some profiling and adding some debug logging, I found that when the table reaches the problem size, almost all the time is spent following long node chains in the hash table.
Some background: Standard Lua tables are implemented internally using a combination of an array and a hash table. Sequential integer keys starting from 1 are stored in the array part, and other types of keys, potentially including larger integers when representing sparse arrays, are stored in the hash part. This behavior is described in more detail in the Performance Tips chapter of Lua Programming Gems.
In Factorio's fork of Lua, integer keys of 1024 or less are always stored in the array part of the table, even if sparse, and integers greater than 1024 are always stored in the hash part. This makes storing large sequences in tables inherently a little slower in Factorio Lua than in standard Lua, since almost all keys need to go through a hash table lookup, but it ought to be a fairly constant slow-down.
Lua also has two different approaches for hashing numbers that it uses depending on the platform: one based on re-interpreting a double's underlying storage as integers ("cast" hash), and one based on the frexp() function. Factorio uses the "frexp" hash on all platforms, except for Nintendo Switch. My understanding is that these differences are intended to make Lua script execution deterministic across multiple platforms, which is necessary for Factorio's multiplayer and replay functionality.
The problem is that the "frexp" hash function distributes integers across hash table buckets poorly compared to other hash functions, including the "cast" hash, and when the hash table size is 2^21 specifically, almost all keys end up in a very small number of buckets. I haven't dug into the math behind the hash function enough to understand why that's the case, but I have written a small program to evaluate the different hash functions at various table sizes to experimentally determine how effective they are. Note the 21st row in particular:
Code: Select all
frexp hash || cast hash
==================================++==================================
bits in-pos max avg it-ops || bits in-pos max avg it-ops
1 50.0% 2 2.0 1.50 || 1 50.0% 2 2.0 1.50
2 50.0% 2 2.0 1.50 || 2 75.0% 2 1.3 1.25
3 25.0% 4 4.0 2.50 || 3 87.5% 2 1.1 1.12
4 50.0% 2 2.0 1.50 || 4 93.8% 2 1.1 1.06
5 50.0% 2 2.0 1.50 || 5 96.9% 2 1.0 1.03
6 28.1% 4 3.6 2.31 || 6 98.4% 2 1.0 1.02
7 1.6% 64 64.0 32.50 || 7 99.2% 2 1.0 1.01
8 85.2% 2 1.2 1.15 || 8 99.6% 2 1.0 1.00
9 28.5% 4 3.5 2.29 || 9 99.8% 2 1.0 1.00
10 50.0% 2 2.0 1.50 || 10 99.9% 2 1.0 1.00
11 75.0% 2 1.3 1.25 || 11 75.0% 2 1.3 1.25
12 76.4% 3 1.3 1.27 || 12 81.2% 2 1.2 1.19
13 52.4% 4 1.9 1.62 || 13 78.1% 3 1.3 1.23
14 34.0% 5 2.9 2.26 || 14 82.4% 3 1.2 1.18
15 81.1% 3 1.2 1.19 || 15 81.8% 3 1.2 1.19
16 49.2% 5 2.0 1.82 || 16 81.5% 3 1.2 1.19
17 47.2% 7 2.1 2.04 || 17 81.4% 3 1.2 1.19
18 43.9% 10 2.3 2.21 || 18 81.7% 3 1.2 1.18
19 37.6% 13 2.7 2.51 || 19 81.7% 3 1.2 1.18
20 25.1% 21 4.0 3.48 || 20 81.7% 3 1.2 1.18
21 0.3% 512 341.6 235.04 || 21 81.6% 3 1.2 1.18
22 50.3% 12 2.0 1.99 || 22 70.0% 4 1.4 1.32
23 73.7% 5 1.4 1.31 || 23 55.9% 4 1.8 1.51
24 79.3% 4 1.3 1.21 || 24 50.0% 6 2.0 1.88
25 66.9% 3 1.5 1.34 || 25 31.2% 12 3.2 3.19
26 54.3% 5 1.8 1.54 || 26 18.7% 24 5.3 5.84
27 54.8% 6 1.8 1.74 || 27 12.5% 16 8.0 7.17
28 57.4% 4 1.7 1.62 || 28 14.8% 8 6.7 4.34
29 64.6% 4 1.5 1.39 || 29 25.4% 7 3.9 2.50
30 81.7% 3 1.2 1.18 || 30 37.4% 6 2.7 2.01
- bits: size of hashtable = 2^bits
- in-pos: what percentage of table entries are are stored at their "main" position in the hash table and require no Node chain traversal to get (ideally should be about 63% for large tables with a uniform hash)
- max: the longest node chain, or the most entries that collide to the same hash bucket
- avg: the average length of all Node chains in the table, ignoring buckets with no nodes
- it-ops: the average number of operations/comparisons needed to find an entry in the table (ideally should be about 1.5 for large tables with a uniform hash). This is the closest to a "score" for how good the hash function is.
This issue doesn't affect standard Lua when using tables with sequential integer keys because it is able to store all of the table entries internally in the array part of the table, so it doesn't hash the keys at all and doesn't encounter the collisions at this particular table size.