Commit 7dd7719

Rustam Sadykov committed: update docs
1 parent 4bacc05 commit 7dd7719

File tree

1 file changed: +33, -20 lines

docs/NightStatisticsMonitoring.md

Lines changed: 33 additions & 20 deletions
@@ -11,21 +11,21 @@ Thus, we decided to do it every night when (hopefully!) no one makes changes.

## How do we collect statistics?

- To find the algorithm you can refer to StatisticsMonitoring.kt. Shortly speaking, it is based on ContestEstimator.kt, which runs test generation on the sample projects and then compile the resulting tests. We repeat the whole process several times to reduce measurement error.
+ To find the algorithm, you can refer to `StatisticsMonitoring.kt`. In short, it is based on `ContestEstimator.kt`, which runs test generation on the sample projects and then compiles the resulting tests. We repeat the whole process several times to reduce measurement error.

## Statistics monitoring usage

### Collecting statistics

- To run statistics monitoring you have to specify the name of the json output file.
+ To run statistics monitoring you have to specify the name of the JSON output file.

Input arguments: `<output json>`.

- Output format: you get the json file, which contains an array of objects with statistics on each run.
+ Output format: you get the JSON file, which contains an array of objects with statistics on each run.
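
For a quick sanity check of such a file, here is a short illustrative snippet (not part of the repository): it assumes only the array-of-objects layout described above and a file named `stats.json`, which is an assumed name.

```python
# Illustrative only: peek into the <output json> file, which holds
# an array with one statistics object per run.
import json

with open("stats.json") as f:
    runs = json.load(f)

print(f"{len(runs)} runs collected")
print("statistics recorded for each run:", sorted(runs[0].keys()))
```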

- More about each statistic: Statistics.kt.
+ More about each statistic: `Statistics.kt`.

- More about monitoring settings: MonitoringSettings.kt.
+ More about monitoring settings: `MonitoringSettings.kt`.

Input example:

@@ -90,22 +90,27 @@ Output example (the result of three runs during one night):

### Metadata and summarising

- We can summarise collected statistics by averaging to get statistics more precisely.
+ To get rid of measurement errors and to get a general understanding of UnitTestBot efficiency, we average the statistics over the runs performed during one night.
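
As a rough illustration of this averaging step, here is a minimal sketch that assumes every recorded statistic is numeric and that the nightly runs live in a file named `stats.json`; the actual logic belongs to `insert_metadata.py` (described below) and may differ.

```python
# Minimal sketch: average every numeric statistic over the runs of one night.
import json

def summarise(runs: list[dict]) -> dict:
    return {key: sum(run[key] for run in runs) / len(runs) for key in runs[0]}

with open("stats.json") as f:
    nightly_runs = json.load(f)  # e.g. the three runs collected during one night

print(json.dumps(summarise(nightly_runs), indent=4))
```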

- In addition, we need more information about the environment and versions used to collect statistics for better interpretation and analysis.
+ Our main goal is to find code changes or run conditions related to reduced UnitTestBot performance. Thus, we collect metadata about each run: the commit hash, the UnitTestBot build number, and information about the environment (including JDK and build system versions, and other parameters).

- You can find script insert_metadata.py to do tasks described before.
+ The `insert_metadata.py` script is responsible for doing this. To run it, you have to specify the following arguments.

Input arguments: `<stats file> <output file> <commit hash> <build number>`.

- Output format: you get the json file, which contains object with summarised statistics and metadata.
+ Please note that the `<output file>` name must look like this:
+
+ `*-<timestamp>-<commit hash>.json`
+
+ Output format: you get the JSON file, containing summarised statistics and metadata.
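
To make the naming requirement concrete, here is a hypothetical helper (`parse_data_file_name` is not part of the scripts) that recovers the timestamp and short commit hash encoded in such a name; the aggregation step described below relies on exactly this encoding.

```python
# Hypothetical helper: split a *-<timestamp>-<commit hash>.json file name
# into its timestamp and commit hash parts.
from pathlib import Path

def parse_data_file_name(path: str) -> tuple[int, str]:
    *_, timestamp, commit_hash = Path(path).stem.split("-")
    return int(timestamp), commit_hash

print(parse_data_file_name("data-main-2022-08-17-1660740407-66a1aeb6.json"))
# (1660740407, '66a1aeb6')
```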

Input example:
```
stats.json data/data-main-2022-08-17-1660740407-66a1aeb6.json 66a1aeb6 2022.8
```

- Output example:
+ Output example (an average for each statistic over the three runs, followed by metadata):
```json
{
    "classes_for_generation": 20.0,
@@ -139,20 +144,26 @@ Output example:

### Aggregating

- Script build_aggregated_data.py creates a file with an array of statistics collected during specified period. It can be needed for visualisation or analyzing some statistics as max, min, median etc.
+ The `build_aggregated_data.py` script gathers the results for several nights: the summarised results for each night are put together into one array. You can specify the period for aggregating. This is useful for visualising the statistics or finding their characteristics, e.g. the median or max/min values.
+
+ To run aggregating, you should provide the following input.

Input arguments: `<input data dir> <output file> <timestamp from> <timestamp to>`.

- Required name format of file with data: `*-<timestamp>-<commit hash>.json`.
+ Please note that the `<input data dir>` must contain files named like `*-<timestamp>-<commit hash>.json`. You (probably) have already named them properly during summarisation.

- Output format: you get the json file, which contains an array of objects with statistics collected during specified period.
+ Output format: you get the JSON file, which contains an array of summarised results, one for each of the nights during the specified period.
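
A rough sketch of the aggregation idea, illustrative only and not the actual `build_aggregated_data.py`: it relies just on the argument list and the `*-<timestamp>-<commit hash>.json` naming convention described above.

```python
# Illustrative sketch: pick every summarised file whose encoded timestamp
# falls into the requested period and write the results out as one array.
import json
import sys
from pathlib import Path

data_dir, output_file, ts_from, ts_to = sys.argv[1:5]

aggregated = []
for path in sorted(Path(data_dir).glob("*.json")):
    timestamp = int(path.stem.split("-")[-2])  # *-<timestamp>-<commit hash>.json
    if int(ts_from) <= timestamp <= int(ts_to):
        with open(path) as f:
            aggregated.append(json.load(f))

with open(output_file, "w") as f:
    json.dump(aggregated, f, indent=4)
```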

Input example:
+
```
./data aggregated_data.json 0 1660740407
```

Output example:
+
+ (You'll get an array of several summarised outputs without metadata. The following example is just one element of such an array.)
+
```json
[
@@ -176,14 +187,16 @@ Output example:

### Datastorage structure

- Our repository is used as database for collected statistics.
+ We store the collected statistics in our repository. You can find two special branches: `monitoring-data` and `monitoring-aggregated-data`.
+
+ The `monitoring-data` branch is a storage for raw statistics data as well as metadata.
+
+ The filename format: `data-<branch>-<yyyy>-<mm>-<dd>-<timestamp>-<short commit hash>.json`

- There are 2 branches: monitoring-data, monitoring-aggregated-data.
+ The `monitoring-aggregated-data` branch is a storage for aggregated statistics. The aggregating period is set to one month by default.

- monitoring-data branch is used as a storage for raw collected statistics with metadata. Filename format: `data-<branch>-<yyyy>-<mm>-<dd>-<timestamp>-<short commit hash>.json`
+ The filename format: `aggregated-data-<yyyy>-<mm>-<dd>.json`
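
For illustration only, here is a hypothetical snippet that builds names matching these two conventions; the branch and commit values are borrowed from the input example earlier in this document, and the real files are produced by the monitoring scripts, not by this code.

```python
# Hypothetical example: build file names matching the two storage conventions above.
from datetime import datetime, timezone

now = datetime.now(timezone.utc)
branch, commit = "main", "66a1aeb6"  # values taken from the input example above

raw_name = f"data-{branch}-{now:%Y-%m-%d}-{int(now.timestamp())}-{commit}.json"
aggregated_name = f"aggregated-data-{now:%Y-%m-%d}.json"

print(raw_name)         # e.g. data-main-2022-08-17-1660740407-66a1aeb6.json
print(aggregated_name)  # e.g. aggregated-data-2022-08-17.json
```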

- monitoring-aggregated-data branch is used as a storage for aggregated statistics. Specified period is one month. Filename format: `aggregated-data-<yyyy>-<mm>-<dd>.json`
+ ### Grafana (in process)

- ### Grafana (TODO: in process)
- Also, we can use [Grafana](https://monitoring.utbot.org) for more dynamic and detailed statistics visualization.
- Grafana pulls data from our repository automatically by GitHub API.
+ We can use [Grafana](https://monitoring.utbot.org) for more dynamic and detailed statistics visualisation. Grafana pulls data from our repository automatically by means of the GitHub API.
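
As a hedged illustration of how such pulling could work, the snippet below lists the stored JSON files through the GitHub contents API; it assumes the data lives in the `UnitTestBot/UTBotJava` repository on the `monitoring-data` branch (adjust the owner, repository, and branch to the actual setup).

```python
# Illustrative only: list the stored statistics files via the GitHub contents API.
import json
import urllib.request

url = "https://api.github.com/repos/UnitTestBot/UTBotJava/contents/?ref=monitoring-data"
with urllib.request.urlopen(url) as response:
    entries = json.load(response)

for entry in entries:
    if entry["name"].endswith(".json"):
        print(entry["name"], entry["download_url"])
```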

0 commit comments
