docs/NightStatisticsMonitoring.md
Thus, we decided to do it every night when (hopefully!) no one makes changes.
## How do we collect statistics?
To find the algorithm, refer to `StatisticsMonitoring.kt`. In short, it is based on `ContestEstimator.kt`, which runs test generation on the sample projects and then compiles the resulting tests. We repeat the whole process several times to reduce measurement error.
## Statistics monitoring usage
### Collecting statistics
To run statistics monitoring you have to specify the name of the JSON output file.
Input arguments: `<output json>`.
Output format: you get a JSON file, which contains an array of objects with statistics on each run.
More about each statistic: `Statistics.kt`.
More about monitoring settings: `MonitoringSettings.kt`.
Input example:
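For instance, with a hypothetical output file name:

```
stats.json
```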
Output example (the result of three runs during one night):
### Metadata and summarising
To get rid of measurement errors and get a general understanding of UnitTestBot efficiency, we average the statistics over the runs during one night.
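As a rough sketch (not the actual UnitTestBot code), the per-statistic averaging over one night's runs could look like this:

```python
from statistics import mean

def summarise(runs: list[dict]) -> dict:
    """Average each numeric statistic over one night's runs (hypothetical helper)."""
    # Assumes every run reports the same set of numeric statistics.
    return {key: mean(run[key] for run in runs) for key in runs[0]}
```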
Our main goal is to find code changes or run conditions related to reduced UnitTestBot performance. Thus, we collect metadata about each run: the commit hash, the UnitTestBot build number, and information about the environment (including JDK and build system versions, and other parameters).
The `insert_metadata.py` script is responsible for doing this. To run it you have to specify the following arguments.
Output example (an average for each statistic over the three runs followed by metadata):
```json
{
"classes_for_generation": 20.0,
  ...
}
```
### Aggregating
The `build_aggregated_data.py` script gathers the results for several nights: the summarised results for each night are put together into one array. You can specify the aggregation period. It is useful for visualising or finding statistical characteristics of UnitTestBot performance, e.g. the median or max/min values.
Please note that the `<input data dir>` must contain files named like `*-<timestamp>-<commit hash>.json`. You have (probably) already named them properly during summarisation.
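A minimal sketch of what such aggregation could look like, assuming the naming scheme above (the function and directory layout here are illustrative, not the actual script):

```python
import json
import re
from pathlib import Path

# Files are expected to be named *-<timestamp>-<commit hash>.json,
# e.g. data-main-2022-08-17-1660740407-abc123f.json (hypothetical name).
FILENAME_RE = re.compile(r".*-(\d+)-[0-9a-f]+\.json$")

def aggregate(data_dir: str, start: int, end: int) -> list:
    """Collect summarised night results whose timestamp falls in [start, end]."""
    results = []
    for path in sorted(Path(data_dir).glob("*.json")):
        match = FILENAME_RE.match(path.name)
        if match is None:
            continue  # skip files that do not follow the naming scheme
        if start <= int(match.group(1)) <= end:
            results.append(json.loads(path.read_text()))
    return results
```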
Output format: you get a JSON file, which contains an array of summarised results, one for each night in the specified period.
Input example:
```
./data aggregated_data.json 0 1660740407
```
Output example:
(You'll get an array of several summarised outputs without metadata. The following example is just one element of such an array.)
```json
[
{
    ...
  }
]
```
### Data storage structure
We store the collected statistics in our repository. You can find two special branches: `monitoring-data` and `monitoring-aggregated-data`.
The `monitoring-data` branch is a storage for raw statistics data as well as metadata.
The filename format: `data-<branch>-<yyyy>-<mm>-<dd>-<timestamp>-<short commit hash>.json`
The `monitoring-aggregated-data` branch is a storage for aggregated statistics. The aggregating period is set to one month by default.
The filename format: `aggregated-data-<yyyy>-<mm>-<dd>.json`
### Grafana (in process)
We can use [Grafana](https://monitoring.utbot.org) for more dynamic and detailed statistics visualisation. Grafana pulls data from our repository automatically via the GitHub API.