The authors developed a cycle-accurate performance model of the CVA6 RISC-V processor in Python to enable efficient architectural exploration and implementation of performance-enhancing features. The model achieved 99.2% accuracy on the CoreMark benchmark compared to the RTL implementation.
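The summary does not say how the accuracy figure is computed. One plausible reading, stated here as an assumption, is one minus the relative error between the cycle count predicted by the model and the cycle count measured on the RTL. The function name and the cycle counts below are hypothetical and chosen only to illustrate the arithmetic:

```python
def cycle_accuracy(model_cycles: int, rtl_cycles: int) -> float:
    """Return 1 - relative cycle-count error (assumed definition of accuracy)."""
    if rtl_cycles <= 0:
        raise ValueError("rtl_cycles must be positive")
    return 1.0 - abs(model_cycles - rtl_cycles) / rtl_cycles


# Hypothetical counts: the model predicts 992 cycles where the RTL takes 1000.
accuracy = cycle_accuracy(model_cycles=992, rtl_cycles=1000)
print(f"accuracy = {accuracy:.1%}")
```

Under this definition, a model that is off by 8 cycles in every 1000 would be reported as 99.2% accurate.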
Using the performance model, the authors designed and implemented a superscalar version of CVA6.
The model was instrumental in identifying and fixing performance bugs during the implementation phase. For example, the authors discovered an issue with the scoreboard management that was degrading performance on embedded systems. They addressed this by enhancing the scoreboard logic to better handle the limited resources.
The final superscalar CVA6 implementation achieved a 40% performance improvement on the CoreMark benchmark compared to the single-issue reference design, with an 11% increase in area. The authors also observed a 24% performance gain on the Dhrystone benchmark, validating the effectiveness of their model-driven approach.
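To put the CoreMark and area figures on a common footing, the two percentages can be combined into a performance-per-area ratio. This metric is not reported in the summary; the short sketch below only derives it from the 40% and 11% figures above, and the function name is hypothetical:

```python
def perf_per_area_gain(perf_gain_pct: float, area_increase_pct: float) -> float:
    """Ratio of new to old performance-per-area, given percentage changes."""
    speedup = 1.0 + perf_gain_pct / 100.0
    area_ratio = 1.0 + area_increase_pct / 100.0
    return speedup / area_ratio


# 40% CoreMark gain for an 11% area increase (figures from the text above).
ratio = perf_per_area_gain(40.0, 11.0)
print(f"performance per area changes by a factor of {ratio:.3f}")
```

With these inputs the ratio is 1.40 / 1.11, roughly 1.26, so performance per unit area improves by about 26% on CoreMark.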
The authors plan to further enhance the performance model by incorporating support for divisions, data caching, and instruction caching. They also intend to explore the impact of register renaming on the superscalar CVA6 design, as it could significantly improve performance on benchmarks with more Write-After-Write (WAW) hazards.
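The link between WAW hazards and register renaming can be illustrated with a toy example. The sketch below is not taken from the authors' model: the instruction encoding, the `waw_pairs` and `rename` helpers, and the two-instruction window are all assumptions made for illustration. It detects writes to the same destination register within a small window, then shows that giving each write a fresh physical register removes the hazard while preserving the true data dependency.

```python
from itertools import count

# Toy instruction: (destination register, tuple of source registers).
Instr = tuple[str, tuple[str, ...]]


def waw_pairs(instrs: list[Instr], window: int = 2) -> list[tuple[int, int]]:
    """Index pairs (i, j) within `window` consecutive instructions that write the same register."""
    pairs = []
    for i, (dest_i, _) in enumerate(instrs):
        for j in range(i + 1, min(i + window, len(instrs))):
            if instrs[j][0] == dest_i:
                pairs.append((i, j))
    return pairs


def rename(instrs: list[Instr]) -> list[Instr]:
    """Assign a fresh physical register to every destination (simplified renaming)."""
    mapping: dict[str, str] = {}  # architectural register -> latest physical register
    fresh = count()
    renamed = []
    for dest, srcs in instrs:
        # Sources read the most recent mapping; unmapped registers keep their name.
        new_srcs = tuple(mapping.get(s, s) for s in srcs)
        new_dest = f"p{next(fresh)}"
        mapping[dest] = new_dest
        renamed.append((new_dest, new_srcs))
    return renamed


program = [
    ("a0", ("a1", "a2")),  # a0 = a1 op a2
    ("a0", ("a3", "a4")),  # a0 = a3 op a4, WAW with the previous instruction
    ("a5", ("a0",)),       # a5 = f(a0), reads the second write
]

print(waw_pairs(program))          # hazards before renaming
print(waw_pairs(rename(program)))  # hazards after renaming
```

In this example the two writes to `a0` form one WAW pair before renaming and none afterwards, while the last instruction still reads the value produced by the second write. A real implementation would also need a bounded physical register file and a way to free registers, which this sketch omits.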
Source: arxiv.org