How can I avoid the test explosion problem?
To move from an abstract high-level scenario to concrete traffic scenarios that can be simulated, we have to define values for a large number of parameters, and these are not limited to the parameters of the abstract traffic scenario itself. One additional aspect is the coverage of the ODD (Operational Design Domain), which describes the specific conditions under which the vehicle is supposed to operate, for example road types or weather conditions.
So how many test scenarios will we need? If we try to simulate all parameter combinations (sometimes called “brute force”), we end up with an infeasibly large number of simulation runs. On the other hand, variation strategies based purely on random algorithms will not sufficiently cover the interesting and safety-critical corner cases. So we need a smarter approach that generates test scenarios in a way which still leads to a good level of confidence in the safety of the system.
Let’s look at an example and calculate the number of test scenarios needed to cover a certain number of parameters. If we optimistically assume only 100 parameters and only 3 different values per parameter, we will need 3^100, or roughly 5 × 10^47, test scenarios. If we then optimistically assume that each simulation takes one second, we arrive at an overall simulation time on the order of 10^40 years.
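The arithmetic behind this example can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and a year is assumed to have 365.25 days.

```python
# Back-of-the-envelope check of the brute-force estimate.
# Function names are illustrative; a year is assumed to be 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def combination_count(num_parameters: int, values_per_parameter: int) -> int:
    """Number of runs needed to cover every combination of parameter values."""
    return values_per_parameter ** num_parameters


def simulation_years(num_runs: int, seconds_per_run: float = 1.0) -> float:
    """Total sequential simulation time in years."""
    return num_runs * seconds_per_run / SECONDS_PER_YEAR


runs = combination_count(num_parameters=100, values_per_parameter=3)
years = simulation_years(runs)
print(f"{runs:.2e} runs, {years:.2e} years")
```

With 100 parameters and 3 values each, this gives roughly 5 × 10^47 runs and a sequential simulation time on the order of 10^40 years. Even massive parallelization would not change the conclusion.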
Because of the complexity described above, the goal of intelligent test scenario generation should be the following: achieve the required confidence in the safety of the system without simulating all possible parameter combinations. To achieve this goal, we reduce the number of tests by creating more of the interesting scenarios and fewer of the uninteresting ones. Of course, this immediately raises the question of what “interesting” means. In this context, it can have two meanings: scenarios which are very likely to happen, or scenarios which contain safety-critical situations. Conversely, we end up with fewer test cases that are both unlikely and not safety-critical. We achieve this distribution with a two-step approach, in which each step has its own test-end criteria.
The first step separates parameter ranges into manageable parts while following a given variation strategy, such as a probability distribution. This strategy gives us more of the likely scenarios and fewer of the unlikely ones. The basis for this process is the definition of multiple individual value ranges for each parameter. As an example, let’s assume a vehicle which is supposed to have a velocity between 0 km/h and 60 km/h. Similar to the concept of “Equivalence Classes” in ISO 26262, this value range can be sliced into individual ranges, for example 0-40 km/h, 40-50 km/h and 50-60 km/h. Subsequently, a probability is assigned to each of these individual value ranges. For each variant of a scenario, multiplying the probabilities of the selected ranges yields a resulting probability, and a user-defined threshold controls which variants actually get executed.
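A minimal sketch of this first step could look as follows. The velocity ranges are the ones from the example; the probabilities, the second parameter (`lead_distance_m`), the threshold value and all function names are assumptions made up for illustration. Multiplying the probabilities also implicitly treats the parameters as independent.

```python
from itertools import product
from math import prod

# Each parameter maps to a list of (value range, probability) pairs.
# The probabilities and the second parameter are hypothetical.
PARAMETER_RANGES = {
    "ego_velocity_kmh": [((0, 40), 0.6), ((40, 50), 0.3), ((50, 60), 0.1)],
    "lead_distance_m": [((5, 20), 0.2), ((20, 100), 0.8)],
}


def select_variants(parameter_ranges, threshold):
    """Return all range combinations whose joint probability reaches the threshold."""
    names = list(parameter_ranges)
    selected = []
    for combo in product(*(parameter_ranges[name] for name in names)):
        # Joint probability of a variant = product of its range probabilities.
        probability = prod(p for _, p in combo)
        if probability >= threshold:
            ranges = {name: value_range for name, (value_range, _) in zip(names, combo)}
            selected.append((ranges, probability))
    return selected


variants = select_variants(PARAMETER_RANGES, threshold=0.1)
for ranges, probability in variants:
    print(f"{probability:.2f}  {ranges}")
```

Of the six possible combinations in this toy setup, only the three with a joint probability of at least 0.1 are kept. Raising or lowering the threshold directly trades the number of executed scenarios against coverage of unlikely ones.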
To achieve a particularly high level of confidence in the behavior during safety-critical situations, the second variation step explores the parameter space using AI technology to detect weaknesses that are specific to the system under test (SUT). A weakness is formally defined by a weakness function that can be evaluated after each simulation run. One example of such a weakness function is based on the time-to-collision, where smaller values correspond to a more safety-critical situation (and therefore a higher weakness score). Weakness detection can therefore be described as an optimization problem with the goal of finding local and global maxima of the weakness function in an n-dimensional space (n being the number of different parameters).
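To make the optimization view more concrete, here is a deliberately simple sketch. It does not reproduce the AI technology mentioned above: a random-restart hill climber is swapped in as a stand-in search strategy, and the "simulation" is a toy formula (constant speeds, time-to-collision = gap / closing speed). All names, bounds and the scoring formula are assumptions for illustration.

```python
import random

# Hypothetical parameter bounds: ego speed (m/s), lead speed (m/s), initial gap (m).
BOUNDS = [(0.0, 16.7), (0.0, 16.7), (5.0, 100.0)]


def simulate_min_ttc(ego_speed, lead_speed, initial_gap):
    """Toy stand-in for a simulation run; returns the time-to-collision in seconds."""
    closing_speed = ego_speed - lead_speed
    if closing_speed <= 0:
        return float("inf")  # the ego vehicle never catches up
    return initial_gap / closing_speed


def weakness(params):
    """Map time-to-collision to a score in [0, 1]; smaller TTC gives a higher score."""
    return 1.0 / (1.0 + simulate_min_ttc(*params))


def clamp(value, low, high):
    return max(low, min(high, value))


def hill_climb(rng, steps=200, step_fraction=0.1):
    """Local search: accept a random neighbor whenever it is not worse."""
    current = [rng.uniform(low, high) for low, high in BOUNDS]
    best_score = weakness(current)
    for _ in range(steps):
        candidate = [
            clamp(value + rng.gauss(0.0, step_fraction * (high - low)), low, high)
            for value, (low, high) in zip(current, BOUNDS)
        ]
        score = weakness(candidate)
        if score >= best_score:
            current, best_score = candidate, score
    return current, best_score


def detect_weakness(restarts=10, seed=0):
    """Run several restarts to reduce the risk of stopping at a poor local maximum."""
    rng = random.Random(seed)
    return max((hill_climb(rng) for _ in range(restarts)), key=lambda r: r[1])


best_params, best_score = detect_weakness()
print(best_params, best_score)
```

In a real setup, `simulate_min_ttc` would be replaced by an actual simulation run of the SUT, which is why the number of evaluations per search matters far more than in this toy example.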
The outcome of this “Weakness Detection” step is either a test case that leads to critical or unwanted behavior of the SUT, or a statement, expressed in probabilistic terms, that no weakness was found.
This leads to fewer test cases for situations which are less critical (for example because all traffic participants are far away from each other) and therefore ensures that the total number of test cases does not explode.