# Night Statistics Monitoring
## What is the problem?
As UnitTestBot contributors, we'd like to constantly improve our product. There are many of us introducing code changes simultaneously; unfortunately, some changes, or combinations of them, may reduce plugin efficiency. To catch such regressions we need to monitor statistics on test generation performance.
## Why monitor nightly?
It would be great to collect statistics as soon as a contributor changes the code, but for a huge project it takes too long to run the monitoring system after each push to master.
Thus, we decided to do it every night when (hopefully!) no one makes changes.
## How do we collect statistics?
To find the algorithm, refer to StatisticsMonitoring.kt. In short, it is based on ContestEstimator.kt, which runs test generation on the sample projects and then compiles the resulting tests. We repeat the whole process several times to reduce measurement error.
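The collection loop can be sketched roughly as follows. This is only an illustration of the idea; the real implementation lives in StatisticsMonitoring.kt, and the function below is a hypothetical stand-in for one ContestEstimator pass:

```python
# A high-level sketch of the collection loop (the real implementation is
# StatisticsMonitoring.kt; the helper below is a hypothetical stand-in).
def generate_and_compile_tests():
    # Placeholder for one ContestEstimator pass: run test generation on the
    # sample projects, compile the resulting tests, and return the metrics.
    return {"total_coverage": 56.7}

def night_monitoring(n_runs=3):
    # Repeat the whole process several times to reduce measurement error.
    return [generate_and_compile_tests() for _ in range(n_runs)]

runs = night_monitoring()
print(f"collected statistics from {len(runs)} runs")
```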
## Statistics monitoring usage
### Collecting statistics
To run statistics monitoring, you have to specify the name of the JSON output file.
Input arguments: `<output json>`.
Output format: you get a JSON file containing an array of objects with statistics on each run.
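For illustration, such a file can be parsed like this (the field name here is an example only; see Statistics.kt for the real set of metrics):

```python
import json

# An illustrative stand-in for stats.json: an array with one object per run.
# The field name is an example; see Statistics.kt for the real metrics.
stats_json = """
[
  {"total_coverage": 56.1},
  {"total_coverage": 57.3},
  {"total_coverage": 56.7}
]
"""

runs = json.loads(stats_json)
print(f"{len(runs)} runs collected")
```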
More about each statistic: Statistics.kt.
More about monitoring settings: MonitoringSettings.kt.
Input example:
```
stats.json
```
Output example (the result of three runs during one night):
```json
[
  {
    ...
  },
  ...
]
```
### Metadata and summarising
We can summarise the collected statistics by averaging them across runs to obtain more precise values.
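The averaging step can be sketched like this (field names and values are illustrative, not the real metric set):

```python
from statistics import mean

# Three illustrative runs from one night (field names are examples only).
runs = [
    {"total_coverage": 56.1, "total_coverage_by_fuzzing": 41.2},
    {"total_coverage": 57.3, "total_coverage_by_fuzzing": 41.9},
    {"total_coverage": 56.7, "total_coverage_by_fuzzing": 41.6},
]

# Summarise by averaging every numeric field across the runs.
summary = {key: mean(run[key] for run in runs) for key in runs[0]}
print(summary)
```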
In addition, for better interpretation and analysis, we record information about the environment and tool versions used to collect the statistics.
You can use the insert_metadata.py script to perform both tasks described above.
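A minimal sketch of what such a metadata-gathering step might do (the real logic lives in insert_metadata.py; the commands and keys below are assumptions based on the output example):

```python
import os
import subprocess

def collect_metadata():
    """Gather environment info similar to what insert_metadata.py records.
    The exact keys and commands are assumptions, not the script's real API."""
    def run(cmd):
        try:
            out = subprocess.run(cmd, capture_output=True, text=True)
            # `java -version` prints to stderr on most JDKs, so fall back to it.
            return (out.stdout or out.stderr).strip()
        except FileNotFoundError:
            return "unknown"

    return {
        "java_version": run(["java", "-version"]),
        "gradle_version": run(["gradle", "--version"]),
        "JAVA_HOME": os.environ.get("JAVA_HOME", ""),
        "KOTLIN_HOME": os.environ.get("KOTLIN_HOME", ""),
        "PATH": os.environ.get("PATH", ""),
    }

metadata = collect_metadata()
print(sorted(metadata.keys()))
```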
Output example (fragment):

```json
...
    "java_version": "openjdk version \"1.8.0_322\"\r\nOpenJDK Runtime Environment Corretto-8.322.06.1 (build 1.8.0_322-b06)\r\nOpenJDK 64-Bit Server VM Corretto-8.322.06.1 (build 25.322-b06, mixed mode)\r\n",
    "gradle_version": "Gradle 7.4",
    "JAVA_HOME": "D:\\Java\\jdk",
    "KOTLIN_HOME": "D:\\Kotlin\\kotlinc",
    "PATH": "D:\\gradle-7.4\\bin;D:\\Java\\jre\\bin;"
    }
  }
}
```
### Aggregating
The build_aggregated_data.py script creates a file with an array of statistics collected during a specified period. This is useful for visualisation or for analysing statistics such as max, min, and median.
Required name format for input data files: `*-<timestamp>-<commit hash>.json`.
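Selecting files by period from that naming convention can be sketched as follows (a simplified stand-in for build_aggregated_data.py, assuming the `*-<timestamp>-<commit hash>.json` format):

```python
def in_period(filename, start_ts, end_ts):
    """Check whether a data file falls into [start_ts, end_ts], based on
    the `*-<timestamp>-<commit hash>.json` naming convention."""
    stem = filename.rsplit(".", 1)[0]      # drop the .json extension
    timestamp = int(stem.split("-")[-2])   # second-to-last field is the timestamp
    return start_ts <= timestamp <= end_ts

# Hypothetical data files following the naming convention.
files = [
    "data-main-2022-08-10-1660130407-abc1234.json",
    "data-main-2022-08-17-1660740407-def5678.json",
]
selected = [f for f in files if in_period(f, 0, 1660740407)]
print(selected)
```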
Output format: you get a JSON file containing an array of objects with statistics collected during the specified period.
Input example:
```
./data aggregated_data.json 0 1660740407
```
Output example:
```json
[
  {
    ...
    "total_coverage": 56.84739152087949,
    "total_coverage_by_fuzzing": 41.60749728061026,
    "total_coverage_by_concolic": 44.420096905766805,
    "timestamp": 1660740407
  }
]
```
### Data storage structure
Our repository is used as a database for the collected statistics.
There are two branches: `monitoring-data` and `monitoring-aggregated-data`.
The `monitoring-data` branch stores raw collected statistics with metadata. Filename format: `data-<branch>-<yyyy>-<mm>-<dd>-<timestamp>-<short commit hash>.json`
The `monitoring-aggregated-data` branch stores aggregated statistics. The aggregation period is one month. Filename format: `aggregated-data-<yyyy>-<mm>-<dd>.json`
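Building a raw-data filename from its parts can be sketched like this (a sketch only; the real naming is done by the monitoring scripts, and the branch and hash values below are examples):

```python
from datetime import datetime, timezone

def data_filename(branch, timestamp, short_hash):
    """Build a raw-data filename following the convention
    data-<branch>-<yyyy>-<mm>-<dd>-<timestamp>-<short commit hash>.json."""
    date = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return f"data-{branch}-{date:%Y-%m-%d}-{timestamp}-{short_hash}.json"

# Example values; 1660740407 corresponds to 2022-08-17 UTC.
print(data_filename("main", 1660740407, "abc1234"))
```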
### Grafana (TODO: in progress)
Also, we can use [Grafana](https://monitoring.utbot.org) for more dynamic and detailed statistics visualization.
Grafana pulls data from our repository automatically via the GitHub API.
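Fetching a stored data file through the GitHub API could look roughly like this. The repository coordinates and file path are assumptions based on the storage layout above, not a confirmed configuration:

```python
def contents_url(owner, repo, path, branch):
    """Build a GitHub contents-API URL for a file on a given branch
    (GET /repos/{owner}/{repo}/contents/{path}?ref={branch})."""
    return (f"https://api.github.com/repos/{owner}/{repo}"
            f"/contents/{path}?ref={branch}")

# Hypothetical coordinates; the actual repository and file names may differ.
url = contents_url("UnitTestBot", "UTBotJava",
                   "data-main-2022-08-17-1660740407-abc1234.json",
                   "monitoring-data")
print(url)
```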