Some benchmark cleanup #3
mustiikhalil merged 3 commits into mustiikhalil:update-struct-pushing-to-buffer from
- Benchmark("structs") { benchmark in
-   let structCount = 1_000_000
+ Benchmark("Structs", configuration: kiloConfiguration) { benchmark in
+   let structCount = 1_000
Why is structCount 1_000 now?
Yeah, I should have commented on that, sorry. With 1M it would crash with what looks like OOM:
* thread #2, queue = 'com.apple.root.default-qos.cooperative', stop reason = Swift runtime failure: Not enough bits to represent the passed value
frame #0: 0x000000010012a47c FlatbuffersBenchmarks`Int.convertToPowerofTwo.getter [inlined] Swift runtime failure: Not enough bits to represent the passed value at <compiler-generated>:0 [opt]
* frame #1: 0x000000010012a47c FlatbuffersBenchmarks`Int.convertToPowerofTwo.getter [inlined] generic specialization <Swift.UInt32, Swift.Int> of Swift.UnsignedInteger< where τ_0_0: Swift.FixedWidthInteger>.init<τ_0_0 where τ_1_0: Swift.BinaryInteger>(τ_1_0) -> τ_0_0 at <compiler-generated>:0 [opt]
frame #2: 0x000000010012a47c FlatbuffersBenchmarks`Int.convertToPowerofTwo.getter(self=4294967296) at Int+extension.swift:31:13 [opt]
frame #3: 0x00000001001202c8 FlatbuffersBenchmarks`ByteBuffer.Storage.reallocate(size=80, writerSize=2147483576, alignment=8, self=0x00000001005c4b70) at ByteBuffer.swift:88:27 [opt]
frame #4: 0x0000000100120c98 FlatbuffersBenchmarks`closure #1 (Swift.UnsafeRawBufferPointer) -> () in FlatBuffers.ByteBuffer.push<τ_0_0 where τ_0_0: FlatBuffers.NativeStruct>(elements: Swift.Array<τ_0_0>) -> () at ByteBuffer.swift:371:16
frame #5: 0x0000000100129720 FlatbuffersBenchmarks`partial apply for closure #1 in ByteBuffer.push<A>(elements:) at <compiler-generated>:0 [opt]
frame #6: 0x0000000190348b6c libswiftCore.dylib`Swift.Array.withUnsafeBytes<τ_0_0>((Swift.UnsafeRawBufferPointer) throws -> τ_1_0) throws -> τ_1_0 + 352
frame #7: 0x0000000100126efc FlatbuffersBenchmarks`FlatBufferBuilder.createVector<A>(ofStructs:) [inlined] FlatBuffers.ByteBuffer.push<τ_0_0 where τ_0_0: FlatBuffers.NativeStruct>(elements=<unavailable>, self=FlatBuffers.ByteBuffer @ 0x000000016fe863e0) -> () at ByteBuffer.swift:246:14 [opt]
frame #8: 0x0000000100126ec0 FlatbuffersBenchmarks`FlatBufferBuilder.createVector<A>(structs=<unavailable>, self=FlatBuffers.FlatBufferBuilder @ 0x000000016fe863d8) at FlatBufferBuilder.swift:626:9 [opt]
frame #9: 0x0000000100130d80 FlatbuffersBenchmarks`closure #12 in closure #1 in variable initialization expression of benchmarks(benchmark=<unavailable>, array=5 values) at FlatbuffersBenchmarks.swift:153:25 [opt]
frame #10: 0x00000001000a51d0 FlatbuffersBenchmarks`BenchmarkExecutor.run(_:) at Benchmark.swift:344:13 [opt]
frame #11: 0x00000001000a51b8 FlatbuffersBenchmarks`BenchmarkExecutor.run(benchmark=0x00000001006b0d80, self=0x0000000100605480) at BenchmarkExecutor.swift:53:23 [opt]
frame #12: 0x00000001000c6988 FlatbuffersBenchmarks`BenchmarkRunner.run(self=Benchmark.BenchmarkRunner @ 0x0000000100b8c440) at BenchmarkRunner.swift:192:49 [opt]
frame #13: 0x00000001000c2f84 FlatbuffersBenchmarks`static BenchmarkRunnerHooks.main(self=0x00000001001ad048) at BenchmarkRunner.swift:92 [opt]
frame #14: 0x000000010015fd58 FlatbuffersBenchmarks`specialized thunk for @escaping @convention(thin) @async () -> () at <compiler-generated>:0 [opt]
(Previously the inner loop was a single iteration. When we moved up to kiloConfiguration we run 1K inner loops, so it gets to 1M total. But I guess the question here is really "what do you want to measure?": we are putting 1M vectors into a single fb here, so what is the intended benchmark?)
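Reading the top frames of the trace, the trap itself looks like an integer conversion rather than an allocation failure: convertToPowerofTwo is called with self=4294967296 (2^32), and frame #1 is a UInt32 initializer taking an Int, which cannot represent that value. A minimal sketch of that failure mode follows, assuming a 64-bit Int. The helper nextPowerOfTwo32 is hypothetical and is not the FlatBuffers implementation; it only illustrates how a checked conversion reports the overflow instead of trapping.

```swift
// Hypothetical helper, NOT the FlatBuffers code: rounds a positive value up to
// the next power of two using 32-bit arithmetic. Assumes a 64-bit Int.
func nextPowerOfTwo32(_ value: Int) -> Int? {
  guard value > 0 else { return 1 }
  // UInt32(value) would trap with "Not enough bits to represent the passed
  // value" when value > UInt32.max. UInt32(exactly:) returns nil instead.
  guard let narrowed = UInt32(exactly: value) else { return nil }
  // Classic bit-smearing: set every bit below the highest set bit of (value - 1).
  var n = narrowed - 1
  n |= n >> 1
  n |= n >> 2
  n |= n >> 4
  n |= n >> 8
  n |= n >> 16
  // Widen before adding 1, since n can be UInt32.max for inputs above 2^31.
  return Int(n) + 1
}

print(nextPowerOfTwo32(80) as Any)          // Optional(128)
print(nextPowerOfTwo32(4294967296) as Any)  // nil: 2^32 does not fit in UInt32
```

The second call uses the same value that appears as self in frame #2. Whether the real fix belongs in the size computation or in how large the benchmark lets the buffer grow is a separate question for this thread.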
Can we keep the million but reduce the number of iterations? Or clear the buffer after each iteration?
Clearing the buffer gave a runtime of 220 seconds (plus another 220 seconds for the single warmup iteration).
I will return it to how it was, with a single iteration; it still gives 10+ samples and a runtime of ~223 ms.
Structs
╒════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Metric │ p0 │ p25 │ p50 │ p75 │ p90 │ p99 │ p100 │ Samples │
╞════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ Malloc (total) │ 21 │ 21 │ 21 │ 21 │ 21 │ 21 │ 21 │ 13 │
├────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Memory (resident peak) (M) │ 155 │ 155 │ 155 │ 155 │ 156 │ 156 │ 156 │ 13 │
├────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Releases (K) │ 6000 │ 6000 │ 6000 │ 6000 │ 6000 │ 6000 │ 6000 │ 13 │
├────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (total CPU) (ms) │ 223 │ 223 │ 224 │ 225 │ 226 │ 228 │ 228 │ 13 │
├────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (wall clock) (ms) │ 224 │ 225 │ 225 │ 226 │ 226 │ 229 │ 229 │ 13 │
╘════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛
But in the code above, are we only measuring the time it takes to add 5 structs into the fb? Or are we measuring the total time it takes to add 5 structs into a buffer a million times?
That is 5 structs a million times into a buffer: the for _ in benchmark.scaledIterations { loop runs 1M times.
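To relate the Structs table to this answer, the per-sample totals can be divided by the inner iteration count. This is a rough sketch, assuming 1M inner iterations per sample as stated in this thread and using the p50 values from the table; perIteration is a hypothetical helper, not part of the benchmark package.

```swift
// Hypothetical helper: spreads a per-sample total over the inner iterations.
func perIteration(total: Double, iterations: Double) -> Double {
  total / iterations
}

// Assumed from the thread: each sample pushes 5 structs 1M times.
let innerIterations = 1_000_000.0

// p50 wall clock is 225 ms; convert to nanoseconds before dividing.
let wallClockNs = perIteration(total: 225.0 * 1_000_000.0, iterations: innerIterations)

// Releases are reported in thousands: 6000 K is 6_000_000 per sample.
let releases = perIteration(total: 6000.0 * 1_000.0, iterations: innerIterations)

print(wallClockNs)  // 225.0
print(releases)     // 6.0
```

Under that assumption, building one vector of 5 structs costs roughly 225 ns and about 6 releases at the median, so the table reports the total for the whole loop rather than the cost of a single push.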
Added an ARC metric and did some overall cleanup to use the built-in support for inner loops.