Skip to content

Commit dc1afe1

Browse files
Add xpdata.MapBuilder struct (#13617)
#### Description This PR adds a new `MapBuilder` struct to the `xpdata` package, which can be used to more efficiently create a `pcommon.Map` in receivers. The simplest way to insert data in a `Map` is to repeatedly call `Map.PutEmpty`, `Map.PutStr`, or other methods of the `Put` family. These methods all handle duplicate keys on a "last write wins" basis by calling `Map.Get` to find a potential existing value for the inserted key. Unfortunately, `Get` is rather slow, as it performs a linear scan of the `Map`'s key/value pairs. This means that the repeated `Put` method has quadratic complexity, and has significant overhead even in cases with a realistic number of keys. `MapBuilder` allows components to build up a list of key/value pairs, then turn them into a `Map` with very little overhead using the `UnsafeIntoMap` method. However, this has the caveat that, if used inappropriately, the final `Map` may contain duplicate keys, which breaks a basic invariant of the data structure and requirement of the OTLP protocol, with unpredictable consequences. I created [a benchmark](https://github.com/jade-guiton-dd/opentelemetry-collector/blob/549e107e852d16f299f4199816a6c20eb5a5ea32/pdata/pcommon/map_experiments_test.go#L75) to test this method, as well as a number of alternatives to optimize insertions. The benchmark was performed on realistic data containing duplicate keys, realistic data without duplicate keys, and worst-case data (200 keys, treated as if they may contain duplicate keys). Here are the basic results: - Using `MapBuilder.UnsafeIntoMap` is the fastest option in all three scenarios. In scenarios with duplicate keys, we perform an upfront sorting + neighbor-comparison deduplication step before calling the method. - Realistic data without duplicates: This results in **-49% runtime** compared to the `Put` version. - Realistic data with duplicates: Despite the extra work of the deduplication step, this still results in -11% runtime. This is the use case that I am interested in using. - Adding the data into a Go `map` and using the existing `Map.FromRaw` method, with code to reuse the `map` across iterations: - Worst-case data: Using `FromRaw` succeeds at reducing the worst-case complexity (-92% runtime compared to using `Put`), but manual sort-based deduplication + `MapBuilder.UnsafeIntoMap` still results in a further -56% runtime reduction. - Realistic data: Somewhat slower than `Put` (+3% runtime with duplicates / +29% runtime with no duplicates), and performs more memory allocations. - These numbers are significantly worse in the simpler (and more parallelizable) version where a new `map` is allocated each time. - The safer alternative `XIntoMap` methods, which check that the input keys are distinct in various ways, are all too slow to be useful in the "realistic, no duplicates" case, and are worse than `UnsafeIntoMap` in other cases. I would consider adding them only if `UnsafeIntoMap` is deemed too dangerous to even be in an experimental API like `xpdata`. - Changing `pcommon.Map` internals to use a sorted array or Go `map` representation in order to reduce the cost of `Put`/`Get` succeeds at reducing the worst-case time complexity as well, but is invariably slower than the `Put` version in the realistic scenarios. #### Testing I added a basic unit test for `MapBuilder`'s functionality.
1 parent 524b044 commit dc1afe1

File tree

3 files changed

+115
-0
lines changed

3 files changed

+115
-0
lines changed

.chloggen/xpdata-mapbuilder.yaml

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,25 @@
1+
# Use this changelog template to create an entry for release notes.
2+
3+
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
4+
change_type: 'enhancement'
5+
6+
# The name of the component, or a single word describing the area of concern, (e.g. otlpreceiver)
7+
component: xpdata
8+
9+
# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
10+
note: Add experimental MapBuilder struct to optimize pcommon.Map construction
11+
12+
# One or more tracking issues or pull requests related to the change
13+
issues: [13617]
14+
15+
# (Optional) One or more lines of additional information to render under the primary note.
16+
# These lines will be padded with 2 spaces and then inserted directly into the document.
17+
# Use pipe (|) for multiline entries.
18+
subtext:
19+
20+
# Optional: The change log or logs in which this entry should be included.
21+
# e.g. '[user]' or '[user, api]'
22+
# Include 'user' if the change is relevant to end users.
23+
# Include 'api' if there is a change to a library API.
24+
# Default: '[user]'
25+
change_logs: [api]

pdata/xpdata/map_builder.go

Lines changed: 54 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,54 @@
1+
// Copyright The OpenTelemetry Authors
2+
// SPDX-License-Identifier: Apache-2.0
3+
4+
package xpdata // import "go.opentelemetry.io/collector/pdata/xpdata"
5+
6+
import (
7+
"go.opentelemetry.io/collector/pdata/internal"
8+
otlpcommon "go.opentelemetry.io/collector/pdata/internal/data/protogen/common/v1"
9+
"go.opentelemetry.io/collector/pdata/pcommon"
10+
)
11+
12+
// MapBuilder is an experimental struct which can be used to create a pcommon.Map more efficiently
13+
// than by repeated use of the Put family of methods, which check for duplicate keys on every call
14+
// (a linear time operation).
15+
// A zero-initialized MapBuilder is ready for use.
16+
type MapBuilder struct {
17+
state internal.State
18+
pairs []otlpcommon.KeyValue
19+
}
20+
21+
// EnsureCapacity increases the capacity of this MapBuilder instance, if necessary,
22+
// to ensure that it can hold at least the number of elements specified by the capacity argument.
23+
func (mb *MapBuilder) EnsureCapacity(capacity int) {
24+
mb.state.AssertMutable()
25+
oldValues := mb.pairs
26+
if capacity <= cap(oldValues) {
27+
return
28+
}
29+
mb.pairs = make([]otlpcommon.KeyValue, len(oldValues), capacity)
30+
copy(mb.pairs, oldValues)
31+
}
32+
33+
func (mb *MapBuilder) getValue(i int) pcommon.Value {
34+
return pcommon.Value(internal.NewValue(&mb.pairs[i].Value, &mb.state))
35+
}
36+
37+
// AppendEmpty appends a key/value pair to the MapBuilder and return the inserted value.
38+
// This method does not check for duplicate keys and has an amortized constant time complexity.
39+
func (mb *MapBuilder) AppendEmpty(k string) pcommon.Value {
40+
mb.state.AssertMutable()
41+
mb.pairs = append(mb.pairs, otlpcommon.KeyValue{Key: k})
42+
return mb.getValue(len(mb.pairs) - 1)
43+
}
44+
45+
// UnsafeIntoMap transfers the contents of a MapBuilder into a Map, without checking for duplicate keys.
46+
// If the MapBuilder contains duplicate keys, the behavior of the resulting Map is unspecified.
47+
// This operation has constant time complexity and makes no allocations.
48+
// After this operation, the MapBuilder becomes read-only.
49+
func (mb *MapBuilder) UnsafeIntoMap(m pcommon.Map) {
50+
mb.state.AssertMutable()
51+
internal.GetMapState(internal.Map(m)).AssertMutable()
52+
mb.state = internal.StateReadOnly // to avoid modifying a Map later marked as ReadOnly through builder Values
53+
*internal.GetOrigMap(internal.Map(m)) = mb.pairs
54+
}

pdata/xpdata/map_builder_test.go

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,36 @@
1+
// Copyright The OpenTelemetry Authors
2+
// SPDX-License-Identifier: Apache-2.0
3+
4+
package xpdata_test
5+
6+
import (
7+
"testing"
8+
9+
"github.com/stretchr/testify/assert"
10+
11+
"go.opentelemetry.io/collector/pdata/pcommon"
12+
"go.opentelemetry.io/collector/pdata/xpdata"
13+
)
14+
15+
func TestMapBuilder(t *testing.T) {
16+
var mb xpdata.MapBuilder
17+
mb.EnsureCapacity(3)
18+
mb.AppendEmpty("key1").SetStr("val")
19+
mb.AppendEmpty("key2").SetInt(42)
20+
21+
m := pcommon.NewMap()
22+
mb.UnsafeIntoMap(m)
23+
24+
assert.Equal(t, 2, m.Len())
25+
val, ok := m.Get("key1")
26+
assert.True(t, ok && val.Type() == pcommon.ValueTypeStr && val.Str() == "val")
27+
val, ok = m.Get("key2")
28+
assert.True(t, ok && val.Type() == pcommon.ValueTypeInt && val.Int() == 42)
29+
30+
assert.Panics(t, func() {
31+
mb.AppendEmpty("key3") // mb should now be read-only
32+
})
33+
assert.NotPanics(t, func() {
34+
m.PutEmpty("key3") // m should still be mutable
35+
})
36+
}

0 commit comments

Comments
 (0)