Improve compatibility testing against Prometheus and TSDB. #758
Description
Let's start with the acceptance criteria for our compatibility tests:
- On each PR, I would like to know which versions of Prometheus Thanos supports.
But what do we mean by "supports"? We essentially have 2 points of contact:
- TSDB format (including very low level index and metadata scheme)
- HTTP API (`api/v1/flags`, `api/v1/config`, `api/v1/label/`, `api/v1/read`, `api/v1/snapshot`, etc.)
The storage format can change but is versioned (index and metadata separately, e.g. the index version changed somewhere between 2.0 and 2.2.1). The HTTP API should not change for v1, but things get added (e.g. api/v1/flags was added in v2.2.1, the snapshot endpoint was extended, etc.).
Goal: Support all minor Prometheus versions (e.g. 2.0, 2.2, ... 2.7, etc.). There are exceptions, for example broken Prometheus releases like 2.1.x. This means that we would like to test and support the tip of each minor version (e.g. 2.4.3 for 2.4).
How do we test this now?
Now (before PR #704 or #730 lands, whichever lands first), our method for testing compatibility is to run the following on CI:
SUPPORTED_PROM_VERSIONS ?=v2.2.1 v2.3.2 v2.4.3 v2.5.0
@for ver in $(SUPPORTED_PROM_VERSIONS); do \
THANOS_TEST_PROMETHEUS_PATH="prometheus-$$ver" THANOS_TEST_ALERTMANAGER_PATH="alertmanager-$(ALERTMANAGER_VERSION)" go test $(shell go list ./... | grep -v /vendor/ | grep -v /benchmark/); \
done
This runs ALL our tests with a different THANOS_TEST_PROMETHEUS_PATH value each time. This variable controls which Prometheus binary is used for our e2e tests (we have quite a few of them: all tests whose names end with the e2e suffix). These tests were fine for checking that we support the common points mentioned above.
The problems we see:
- Upgrading the TSDB Golang dependencies (like here) blocks our ability to use any advanced testing methods, like injecting blocks here or here. This is because, while new Prometheus versions are backward compatible with old TSDB format versions, old Prometheus versions are not forward compatible with the new format.
- We run ALL tests against different Prometheus versions using an external for loop. This means:
  - We might hit the Go test cache all the time, because a changed environment variable may not be seen by the caching logic, which can then assume the code is unchanged and reuse the cached result.
  - Something is wrong with signal handling, as some tests are green but should actually fail: https://circleci.com/gh/improbable-eng/thanos/1935?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link
- What if config/flags change in some Prometheus version?
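One possible way around the external loop is to iterate over the Prometheus versions inside the Go tests themselves, so that each version becomes a named subtest and the result does not depend on an environment variable set outside `go test`. The sketch below shows only the version-to-binary mapping. `prometheusBinaries` is a hypothetical helper, and the `prometheus-<version>` naming is taken from the Makefile snippet above. Independently of this, `go test -count=1` is the usual way to bypass cached test results.

```go
package main

import (
	"fmt"
	"strings"
)

// prometheusBinaries turns a space-separated version list (the same shape as
// SUPPORTED_PROM_VERSIONS) into binary names such as "prometheus-v2.2.1".
// Hypothetical helper for illustration only.
func prometheusBinaries(versions string) []string {
	var bins []string
	for _, v := range strings.Fields(versions) {
		bins = append(bins, "prometheus-"+v)
	}
	return bins
}

func main() {
	// In a real e2e test, this loop would live in a *_test.go file:
	//
	//   for _, bin := range prometheusBinaries(supportedVersions) {
	//       t.Run(bin, func(t *testing.T) { /* start bin, run assertions */ })
	//   }
	//
	// Here we only print what would be run.
	for _, bin := range prometheusBinaries("v2.2.1 v2.3.2 v2.4.3 v2.5.0") {
		fmt.Println("would run e2e subtest against", bin)
	}
}
```

With subtests, a failure is reported per version, and version-specific config or flag differences can be handled in one place inside the test setup.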
Note that upgrading TSDB and Prometheus dependencies is essential to stay up to date with fixes and recent optimizations. We reuse lots of packages.
Extra:
As a nice-to-have, we would like to make sure anyone can take a TSDB block from object storage, put it into Prometheus, and use it there. This means that we need to test whether a compactor-produced block is compatible with Prometheus and, if so, with which versions (aiming for just the latest is fine). How do we test that?