Test | Category | TIC | FIC | iDFlakies-commit |
---|---|---|---|---|
org.fluentd.logger.TestFluentLogger.testReconnection | NDOD;UD;NOD | 5fd46383 | 5fd46383 | da14ec34 |
org.fluentd.logger.TestFluentLogger.testClose | NDOD;UD | 87e957ae | 87e957ae | da14ec34 |
Flaky tests are “tests that can intermittently pass or fail, even for the same code version.” (Luo et al. 2014)
“tests that cause spurious failures without any code changes, i.e., flaky tests” (Gruber et al. 2021)
“Tests that fail inconsistently, without changes to the code under test, are described as flaky.” (Parry et al. 2021)
“some test failures may not be due to the latest changes but due to non-determinism in the tests, popularly called flaky tests” (Bell et al. 2018)
“our goal is to distinguish between flaky failures and failures caused by regression” (Gruber et al. 2023a)
Is there any flaky test in our code base that we should fix?
Numerous research datasets available, many integrated in IDoFT (Lam 2020)
Is a test failure in CI due to a regression in the CUT or not?
Single dataset of flaky test failures within regression test history (Gruber et al. 2023b)
Requirements:
Options:
Slug (Module) | FIC Hash | Tests | Commits | Av. Commits/Test | Flaky Tests | Tests w/ Consistent Failures | Total Distinct Histories |
---|---|---|---|---|---|---|---|
fluent/fluent-logger-java | 5fd463 | 19 | 131 | 105.6 | 11 | 2 | 8.0x10^32 |
fluent/fluent-logger-java | 87e957 | 19 | 160 | 122.4 | 11 | 3 | 2.1x10^31 |
Commit Hash | Test Method | Distinct Results |
---|---|---|
5fd4638 | testNormal03 | 2 |
de2b9f4 | testNormal03 | 1 |
30a7221 | testNormal03 | 1 |
6aece14 | testNormal03 | 1 |
d1077ae | testNormal03 | 3 |
a7da917 | testNormal03 | 1 |
7f5eb6b | testNormal03 | 1 |
43869ca | testNormal03 | 2 |
a646dbf | testNormal03 | 2 |
2f3f8a2 | testNormal03 | 1 |
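Number of possible histories (product of the distinct results per commit; commits with a single result contribute a factor of 1):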
\(2 \times 3 \times 2 \times 2 = 24\)
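The count above can be reproduced with a short Python sketch. The per-commit values are copied from the table; the variable names are illustrative and not taken from the dataset's tooling.

```python
import math

# Distinct results of testNormal03 per commit, copied from the table above.
distinct_results = {
    "5fd4638": 2, "de2b9f4": 1, "30a7221": 1, "6aece14": 1, "d1077ae": 3,
    "a7da917": 1, "7f5eb6b": 1, "43869ca": 2, "a646dbf": 2, "2f3f8a2": 1,
}

# A history picks one observed result per commit, so the counts multiply.
possible_histories = math.prod(distinct_results.values())
print(possible_histories)  # 24
```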
Commit Hash | Verdict | Verdict Type | Message |
---|---|---|---|
5fd4638 | passed | . | . |
5fd4638 | failure | java.lang.AssertionError | expected:<10000> but was:<0> |
d1077ae | passed | . | . |
d1077ae | failure | java.lang.AssertionError | expected:<10000> but was:<3543> |
d1077ae | failure | java.lang.AssertionError | expected:<10000> but was:<2234> |
43869ca | passed | . | . |
43869ca | failure | java.lang.AssertionError | expected:<10000> but was:<2234> |
a646dbf | passed | . | . |
a646dbf | failure | java.lang.AssertionError | expected:<10000> but was:<0> |
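A regression test history can be derived from these results by picking one observed result per commit. The following Python sketch illustrates that, assuming the results at different commits combine independently (which the product count suggests). The function names `sample_history` and `all_histories` are hypothetical, and commits with a single observed result are left out for brevity.

```python
import itertools
import random

# Observed (verdict, message) results per commit, copied from the table above.
results = {
    "5fd4638": [("passed", None),
                ("failure", "expected:<10000> but was:<0>")],
    "d1077ae": [("passed", None),
                ("failure", "expected:<10000> but was:<3543>"),
                ("failure", "expected:<10000> but was:<2234>")],
    "43869ca": [("passed", None),
                ("failure", "expected:<10000> but was:<2234>")],
    "a646dbf": [("passed", None),
                ("failure", "expected:<10000> but was:<0>")],
}


def sample_history(results, rng):
    """Pick one observed result per commit, keeping the commit order."""
    return {commit: rng.choice(outcomes) for commit, outcomes in results.items()}


def all_histories(results):
    """Enumerate every combination of one observed result per commit."""
    commits = list(results)
    for combo in itertools.product(*(results[c] for c in commits)):
        yield dict(zip(commits, combo))


rng = random.Random(0)  # fixed seed so that a sampled history is reproducible
history = sample_history(results, rng)
n_histories = sum(1 for _ in all_histories(results))
print(n_histories)  # 24
```

Enumeration is feasible only for small cases like this one. For modules with very large numbers of distinct histories, sampling is the practical option.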
Slug (Module) | FIC Hash | Tests | Commits | Av. Commits/Test | Flaky Tests | Tests w/ Consistent Failures | Total Distinct Histories |
---|---|---|---|---|---|---|---|
TooTallNate/Java-WebSocket | 822d40 | 146 | 75 | 75.0 | 24 | 1 | 2.6x10^9 |
apereo/java-cas-client (cas-client-core) | 5e3655 | 157 | 65 | 61.7 | 3 | 2 | 1.0x10^7 |
eclipse-ee4j/tyrus (tests/e2e/standard-config) | ce3b8c | 185 | 16 | 16.0 | 12 | 0 | 261 |
feroult/yawp (yawp-testing/yawp-testing-appengine) | abae17 | 1 | 191 | 191.0 | 1 | 1 | 8 |
fluent/fluent-logger-java | 5fd463 | 19 | 131 | 105.6 | 11 | 2 | 8.0x10^32 |
fluent/fluent-logger-java | 87e957 | 19 | 160 | 122.4 | 11 | 3 | 2.1x10^31 |
javadelight/delight-nashorn-sandbox | d0d651 | 81 | 113 | 100.6 | 2 | 5 | 4.2x10^10 |
javadelight/delight-nashorn-sandbox | d19eee | 81 | 93 | 83.5 | 1 | 5 | 2.6x10^9 |
sonatype-nexus-community/nexus-repository-helm | 5517c8 | 18 | 32 | 32.0 | 0 | 0 | 18 |
spotify/helios (helios-services) | 023260 | 190 | 448 | 448.0 | 0 | 37 | 190 |
spotify/helios (helios-testing) | 78a864 | 43 | 474 | 474.0 | 0 | 7 | 43 |
main/master
testNormal03 from fluent-logger in commit 43b2c3d.
Considering only the verdict: non-flaky
Including the message: flaky
Verdict | Verdict Type | Message |
---|---|---|
failure | java.lang.AssertionError | expected:<10000> but was:<9339> |
failure | java.lang.AssertionError | expected:<10000> but was:<7726> |
failure | java.lang.AssertionError | expected:<10000> but was:<8166> |
failure | java.lang.AssertionError | expected:<10000> but was:<5235> |
failure | java.lang.AssertionError | expected:<10000> but was:<6180> |
failure | java.lang.AssertionError | expected:<10000> but was:<8818> |
failure | java.lang.AssertionError | expected:<10000> but was:<9630> |
failure | java.lang.AssertionError | expected:<10000> but was:<8801> |
failure | java.lang.AssertionError | expected:<10000> but was:<8067> |
failure | java.lang.AssertionError | expected:<10000> but was:<8507> |
failure | java.lang.AssertionError | expected:<10000> but was:<6533> |
failure | java.lang.AssertionError | expected:<10000> but was:<5308> |
failure | java.lang.AssertionError | expected:<10000> but was:<7450> |
failure | java.lang.AssertionError | expected:<10000> but was:<7889> |
failure | java.lang.AssertionError | expected:<10000> but was:<9343> |
failure | java.lang.AssertionError | expected:<10000> but was:<7490> |
failure | java.lang.AssertionError | expected:<10000> but was:<8353> |
failure | java.lang.AssertionError | expected:<10000> but was:<8815> |
failure | java.lang.AssertionError | expected:<10000> but was:<7697> |
failure | java.lang.AssertionError | expected:<10000> but was:<8965> |
failure | java.lang.AssertionError | expected:<10000> but was:<8459> |
failure | java.lang.AssertionError | expected:<10000> but was:<8326> |
failure | java.lang.AssertionError | expected:<10000> but was:<8372> |
failure | java.lang.AssertionError | expected:<10000> but was:<8292> |
failure | java.lang.AssertionError | expected:<10000> but was:<5553> |
failure | java.lang.AssertionError | expected:<10000> but was:<8938> |
failure | java.lang.AssertionError | expected:<10000> but was:<9669> |
failure | java.lang.AssertionError | expected:<10000> but was:<7566> |
failure | java.lang.AssertionError | expected:<10000> but was:<8791> |
failure | java.lang.AssertionError | expected:<10000> but was:<6360> |
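Whether these repeated runs count as flaky depends on what is compared. The sketch below illustrates this in Python, using the first three rows of the table. The helper `is_flaky` and its `key` parameter are illustrative assumptions, not part of the dataset's tooling.

```python
def is_flaky(runs, key):
    """Flaky at a commit if repeated runs yield more than one distinct key."""
    return len({key(run) for run in runs}) > 1


# (verdict, verdict type, message) for the first three rows of the table above.
runs = [
    ("failure", "java.lang.AssertionError", "expected:<10000> but was:<9339>"),
    ("failure", "java.lang.AssertionError", "expected:<10000> but was:<7726>"),
    ("failure", "java.lang.AssertionError", "expected:<10000> but was:<8166>"),
]

verdict_only = is_flaky(runs, key=lambda run: run[0])  # compare verdicts only
with_message = is_flaky(runs, key=lambda run: run)     # compare full result

print(verdict_only, with_message)  # False True
```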
testNormal01 from fluent-logger flags regression in commit 7046496
4924e54
Example: testNormal03 from fluent-logger
Commit Hash | Distinct results |
---|---|
167dee4 | 2 |
189337a | 3 |
2e67bc0 | 2 |
3268963 | 2 |
36ae754 | 2 |
37744e2 | 2 |
3ae1bbd | 2 |
43869ca | 2 |
43b2c3d | 30 |
4ecd3f2 | 2 |
58610c7 | 2 |
82b109d | 2 |
87e957a | 27 |
8d418ae | 2 |
8fe164f | 2 |
a061b9e | 2 |
abc5024 | 2 |
aef9865 | 2 |
b70b1f0 | 2 |
b97b239 | 2 |
cc7a1f8 | 3 |
cd9bae3 | 2 |
cfffb7e | 29 |
d608b06 | 2 |
ff26da1 | 2 |
Test repetitions, but along a commit range, for studying the effects of flaky tests on regression testing.