Thesis etd-04132021-181901
Thesis type
Master's thesis
Author
ALBICOCCHI, FRANCESCO
URN
etd-04132021-181901
Title
A RISC-V ISA Extension for speeding-up Post Quantum CRYSTALS Algorithms through HW Accelerators integrated in the Ariane Core Pipeline
Department
INFORMATION ENGINEERING
Degree programme
ELECTRONIC ENGINEERING
Supervisors
supervisor Prof. Fanucci, Luca
supervisor Ing. Nannipieri, Pietro
supervisor Ing. Di Matteo, Stefano
Keywords
- cryptography
- CRYSTALS
- Dilithium
- Kyber
- post-quantum
- risc-v
Exam session start date
30/04/2021
Availability
Not available
Release date
30/04/2091
Abstract
In recent years, digitalization has been advancing in many areas, such as Industry 4.0, automotive, and healthcare. This trend leads to increasingly complex Systems-on-Chip, because these systems require a continuous Internet connection to the cloud that must be supported efficiently, especially in mobile systems. Secure communication is an essential requirement in all these application domains, since it takes place over insecure channels such as the public 5G infrastructure. At present, the security of these connections relies on Public Key Cryptography, which uses a pair of keys: one public and one private. These algorithms are based on hard mathematical problems that are considered infeasible to solve, i.e. computing the solution would take far too much time. The security of the standard cryptographic algorithms is threatened by the advent of quantum computers, which can break these systems in polynomial time using Shor's algorithm. For this reason, in 2017 the National Institute of Standards and Technology started a standardization process, which reached its third round on July 22, 2020, in order to find one or more quantum-resistant public-key cryptographic algorithms. In 2019 Google announced that it had achieved "quantum supremacy" with a quantum processor. In the next few years, quantum computing is expected to become increasingly feasible and powerful, so cryptographic security is becoming a concrete problem. Among the post-quantum algorithms currently being developed, lattice-based cryptography offers a very good trade-off between security and efficiency; indeed, it is one of the favourite candidates to win the standardization competition. In this work, we selected the CRYSTALS lattice-based post-quantum algorithms Kyber and Dilithium, which are a Key Encapsulation Mechanism and a Digital Signature Scheme, respectively.
Post-Quantum Cryptography uses mathematical elements and operations that are usually not easy to implement on standard processors. This is a critical aspect, especially in low-power embedded devices, which have limited resources. For these reasons, there is increasing interest in the hardware acceleration of Post-Quantum Cryptography. Typically, hardware accelerators are implemented outside the core as memory-mapped devices. This approach requires considerable resources for the interconnection and also suffers from a high overhead due to the communication between the core and the accelerator. Embedding lighter and more specific hardware accelerators directly in the core architecture reduces the area overhead; moreover, the communication overhead is completely removed. However, in order to use the hardware inside the core pipeline, new assembly instructions have to be designed to drive the accelerators. RISC-V is well suited to this purpose, since it is a free and open instruction set architecture that provides a set of unused opcodes specifically reserved to encourage more specialized instruction set extensions. In particular, we used the CVA6 core (also known as Ariane), which is a 6-stage, single-issue, 64-bit, in-order CPU. After identifying the computational bottlenecks of the algorithms, we found three software functions that are suitable to be executed as assembly instructions. We then designed very small hardware accelerators that perform these tasks and integrated them directly in a new functional unit in the execution stage of the Ariane pipeline. Finally, we extended the RISC-V ISA with three new instructions that let the programmer use the hardware accelerators directly from software. A System-on-Chip has been implemented on the Xilinx ZCU106 evaluation board in order to run a set of tests and measure the speed-up obtained. Using the hardware accelerators developed in this work, we reached a maximum clock-cycle reduction of 33.51% for Kyber and 35.44% for Dilithium. The implemented functional unit is not in the critical path of the design, so the maximum achievable frequency has not been reduced. The speed-up obtained in terms of cycle count also yields an energy reduction of 10.9 mJ for Kyber and 35.3 mJ for Dilithium in the best cases. This is because the average power of the system remains almost the same, while the execution time decreases. The cost of all these optimizations is only a small increase in area occupation.
Compared to the core without our functional unit, the additional resources are 12 DSPs and an increase of 4.08% in LUTs and 0.50% in FFs.
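
As an illustration of how an instruction set extension can occupy the unused opcode space mentioned above, the following minimal sketch packs the fields of a RISC-V R-type instruction into a 32-bit word using the `custom-0` major opcode that the RISC-V specification reserves for custom extensions. The abstract does not state which opcodes or encodings the three new instructions use, so the function name `encode_r_type` and all field values below are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: this is NOT the encoding used in the thesis,
# which the abstract does not disclose.

# "custom-0" major opcode, reserved by the RISC-V spec for custom extensions.
CUSTOM_0 = 0b0001011


def encode_r_type(funct7, rs2, rs1, funct3, rd, opcode=CUSTOM_0):
    """Return the 32-bit encoding of a RISC-V R-type instruction."""
    # Fields listed from the most significant bits down, with their widths.
    fields = [
        ("funct7", funct7, 7),
        ("rs2", rs2, 5),
        ("rs1", rs1, 5),
        ("funct3", funct3, 3),
        ("rd", rd, 5),
        ("opcode", opcode, 7),
    ]
    word = 0
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


if __name__ == "__main__":
    # Hypothetical instruction with rd=x10, rs1=x11, rs2=x12, funct3=0, funct7=0.
    word = encode_r_type(funct7=0, rs2=12, rs1=11, funct3=0, rd=10)
    print(f"{word:#010x}")  # prints 0x00c5850b
```

Calling the same function with `opcode=0b0110011` and the same fields reproduces the standard encoding of `add a0, a1, a2` (`0x00c58533`), which shows that only the major opcode distinguishes this hypothetical custom instruction from a base-ISA one.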
File
File name | Size |
---|---|
Thesis not available. |