In recent years, as advanced driving systems, autonomous driving, and other functions have been realized in automobiles, the software installed in ECUs has grown steadily larger. As a result, ECU-level software simulations tend to take a long time, and simulators with faster execution speed are needed to resolve this problem.
To achieve high-speed simulation, Renesas is developing the Renesas QEMU Environment for R-Car S4, a simulator for the R-Car S4 automotive system-on-chip (SoC) (hereinafter referred to as this simulator) that combines QEMU models with SystemC models. The QEMU framework by itself can simulate only one CPU architecture at a time. R-Car S4, however, has three types of CPU architectures (CA55, CR52, and G4MH), and this simulator needs to simulate all of them at the same time.
Therefore, each CPU core is implemented through a SystemC wrapper, as shown in Figure 1. In this way, QEMU models can be simulated in parallel as SystemC instances. The wrapper converts the C-language interfaces provided by QEMU into SystemC interfaces and controls the timing between QEMU and SystemC. As a result, this simulator can simulate the three types of CPUs simultaneously.
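The idea of running several CPU models side by side under one timing controller can be illustrated with a minimal sketch. It is not the actual wrapper: the class `CpuModel`, the function `simulate`, the fixed time quantum, and the instruction rates are all hypothetical, and a quantum-based synchronization scheme is assumed here only because it is a common way to keep independent models aligned in time.

```python
class CpuModel:
    """Hypothetical stand-in for one QEMU CPU model wrapped as an instance."""

    def __init__(self, name, insns_per_ns):
        self.name = name
        self.insns_per_ns = insns_per_ns  # assumed rate, illustration only
        self.local_time_ns = 0
        self.executed = 0

    def run_quantum(self, quantum_ns):
        # Let the model run ahead by one quantum of simulated time.
        self.executed += self.insns_per_ns * quantum_ns
        self.local_time_ns += quantum_ns


def simulate(cpus, quantum_ns, end_ns):
    """Advance every CPU model in lockstep, one quantum at a time."""
    global_time_ns = 0
    while global_time_ns < end_ns:
        for cpu in cpus:
            cpu.run_quantum(quantum_ns)
        global_time_ns += quantum_ns
        # Synchronization point: no model may drift from global time.
        assert all(c.local_time_ns == global_time_ns for c in cpus)
    return global_time_ns


if __name__ == "__main__":
    # Names mirror the three architectures; the rates are made up.
    cpus = [CpuModel("CA55", 2), CpuModel("CR52", 1), CpuModel("G4MH", 1)]
    simulate(cpus, quantum_ns=100, end_ns=1000)
    for cpu in cpus:
        print(cpu.name, cpu.executed, cpu.local_time_ns)
```

In this sketch the scheduler plays the role that the timing control plays in the wrapper: each model may run freely inside a quantum, but all of them meet at the same simulated time before the next quantum starts.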
Among these features, dynamic binary translation (DBT) is an important technology for accelerating the simulation speed of QEMU’s CPU models. DBT divides the code to be executed on QEMU’s CPU (guest code) into short code blocks. It then translates each guest code block into code that runs on the CPU executing this simulator (the host CPU), called host code, and caches the result. The execution procedure of DBT is shown below.
When guest code is executed, the series of instructions converted to translation blocks (TBs) runs on the host CPU by repeating steps (1) to (3). The conversion processes of (1) and (2) are performed only when a basic block (BB) is executed for the first time. If the guest code at an address has already been executed, steps (1) and (2) can be omitted because the next TB is already stored in the translation cache.
In addition, by connecting TBs with direct block chaining, shown in (3), two further steps are omitted: checking whether the guest code needs to be converted into a TB, and looking up the next TB in the translation cache.
In this way, DBT helps the CPUs to execute guest code faster by omitting conversion processes and using direct block chaining.
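The effect of the translation cache and direct block chaining can be seen in a small sketch. It is a toy model, not QEMU code: the guest program `GUEST_BLOCKS`, the `TranslationBlock` and `Dbt` classes, and the counters are hypothetical, and every block is assumed to have exactly one fixed successor so that chaining is always valid.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical guest program: block start address ->
# (number of guest instructions, address of the next block).
GUEST_BLOCKS = {
    0x100: (4, 0x200),
    0x200: (3, 0x100),
}


@dataclass
class TranslationBlock:
    pc: int
    n_insns: int
    next_pc: int
    chained: Optional["TranslationBlock"] = None  # direct chaining link


class Dbt:
    def __init__(self, guest_blocks):
        self.guest_blocks = guest_blocks
        self.cache = {}        # translation cache: guest pc -> TB
        self.translations = 0  # how often a block had to be translated
        self.lookups = 0       # how often the cache had to be consulted

    def lookup_or_translate(self, pc):
        self.lookups += 1
        tb = self.cache.get(pc)
        if tb is None:
            # First execution of this block: translate and cache it.
            n_insns, next_pc = self.guest_blocks[pc]
            tb = TranslationBlock(pc, n_insns, next_pc)
            self.cache[pc] = tb
            self.translations += 1
        return tb

    def run(self, start_pc, max_blocks):
        executed = 0
        prev = None
        pc = start_pc
        for _ in range(max_blocks):
            if prev is not None and prev.chained is not None:
                # Chained: jump straight to the next TB, no lookup.
                tb = prev.chained
            else:
                tb = self.lookup_or_translate(pc)
                if prev is not None:
                    prev.chained = tb  # patch the link for next time
            executed += tb.n_insns  # "execute" the host code of this TB
            pc = tb.next_pc
            prev = tb
        return executed


if __name__ == "__main__":
    dbt = Dbt(GUEST_BLOCKS)
    print(dbt.run(0x100, max_blocks=6), dbt.translations, dbt.lookups)
```

Running six blocks of this two-block loop translates each block only once and consults the cache only until both chaining links are in place; after that, execution moves from TB to TB without returning to the lookup path.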
The results of the CPU benchmark for the actual R-Car S4 and the Renesas QEMU Environment are shown in Figure 3. Figure 3 shows that the processing performance of this simulator, measured as the number of instructions executed, is approximately 0.8 instructions for every instruction executed on the actual device. In other words, on this benchmark the simulator runs at roughly 80% of the speed of the actual device.
This article gave an overview of the Renesas QEMU Environment, the simulator for R-Car S4, and its CPU performance. The simulator is based on QEMU 7.0 and will support QEMU 8.0. In addition, we will improve the simulation speed, including that of the CPU peripheral functions. Renesas will prepare next-generation platforms so that customers can use them for software development and system studies.