Key Concepts
This study introduces zkUCB, an algorithm that combines zero-knowledge succinct non-interactive arguments of knowledge (zk-SNARKs) with the Upper Confidence Bound (UCB) algorithm to enable privacy-preserving, verifiable decision-making in Multi-Armed Bandit (MAB) problems.
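For orientation, the plain UCB selection rule that zkUCB builds on can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the common UCB1 variant (empirical mean plus an exploration bonus of sqrt(2 ln t / n)), Bernoulli rewards, and a hypothetical helper name `ucb1`, none of which are specified in the summary.

```python
import math
import random


def ucb1(arm_means, rounds, seed=0):
    """Run UCB1 on Bernoulli arms (assumed setting, for illustration only).

    Returns the per-arm pull counts and the average reward per round.
    """
    rng = random.Random(seed)       # seeded so the run is reproducible
    k = len(arm_means)
    counts = [0] * k                # number of pulls per arm
    sums = [0.0] * k                # total reward per arm

    for t in range(1, rounds + 1):
        if t <= k:
            arm = t - 1             # play every arm once to initialise
        else:
            # Pick the arm with the largest upper confidence bound.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    return counts, sum(sums) / rounds


counts, avg_reward = ucb1([0.1, 0.5, 0.9], rounds=3000)
print(counts, avg_reward)
```

The logarithm and square root in the bonus term are the non-polynomial operations that have to be rewritten before the rule can be expressed inside a zk-SNARK circuit.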
Summary
This study sits at the intersection of reinforcement learning and data privacy, addressing the Multi-Armed Bandit (MAB) problem as solved by the Upper Confidence Bound (UCB) algorithm. The researchers introduce zkUCB, which uses zero-knowledge succinct non-interactive arguments of knowledge (zk-SNARKs) so that UCB decisions can be verified without revealing the training data or algorithmic parameters.
The key highlights of the study are:
Overcoming Challenges in Integrating zk-SNARKs with UCB:
- Addressing the inherent randomness of the UCB algorithm by incorporating a pseudorandom number generator.
- Converting the non-polynomial operations in UCB, such as logarithms and non-integer powers, into polynomial forms using piecewise linear approximation and Newton's method.
- Handling the gap between the floating-point numbers in UCB and the finite fields required by zk-SNARKs through a quantization process.
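The three conversions above can be sketched in integer-only arithmetic. This is an illustration under stated assumptions rather than the paper's construction: the 16 fractional bits, the power-of-two breakpoints for the logarithm, the UCB1 bonus sqrt(2 ln t / n), and all function names are hypothetical choices made here. Python's `math` module is used only to precompute the constant ln 2.

```python
import math

FRAC_BITS = 16                        # assumed number of fractional quantization bits
SCALE = 1 << FRAC_BITS
LN2_Q = round(math.log(2) * SCALE)    # ln 2 as a precomputed fixed-point constant


def quantize(x):
    """Map a real number to a fixed-point integer."""
    return round(x * SCALE)


def dequantize(q):
    """Map a fixed-point integer back to a float (for inspection only)."""
    return q / SCALE


def isqrt_newton(n):
    """Floor of sqrt(n) for an integer n >= 0, using Newton's iteration."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0
    x = 1 << ((n.bit_length() + 1) // 2)   # initial guess >= sqrt(n)
    while True:
        y = (x + n // x) // 2
        if y >= x:                          # no further decrease: converged
            return x
        x = y


def sqrt_q(q):
    """Fixed-point square root: sqrt(q / SCALE) * SCALE == sqrt(q * SCALE)."""
    return isqrt_newton(q * SCALE)


def ln_piecewise(t):
    """Piecewise linear ln(t) for an integer t >= 1, as a fixed-point value.

    Breakpoints sit at powers of two; between 2^k and 2^(k+1) the curve is
    replaced by the straight line through (2^k, k ln 2) and (2^(k+1), (k+1) ln 2).
    """
    if t < 1:
        raise ValueError("t must be >= 1")
    k = t.bit_length() - 1
    base = 1 << k
    return k * LN2_Q + (((t - base) * LN2_Q) >> k)


def ucb_index_q(mean_q, t, n):
    """Fixed-point UCB1 index: mean + sqrt(2 ln t / n)."""
    ratio_q = (2 * ln_piecewise(t)) // n
    return mean_q + sqrt_q(ratio_q)


approx = dequantize(ucb_index_q(quantize(0.5), t=100, n=10))
exact = 0.5 + math.sqrt(2 * math.log(100) / 10)
print(approx, exact)
```

With power-of-two breakpoints the linear pieces stay within roughly 0.06 of ln t, so the approximate index tracks the floating-point one closely. More fractional bits shrink the rounding error but enlarge the values the circuit has to represent.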
zkUCB Workflow:
- The setup phase generates a common reference string (crs) and a simulation trapdoor (td).
- In the proof generation stage, the prover computes a proof π using the secret input (witness) w, the statement ϕ, and the crs.
- In the verification phase, the verifier checks the proof π against the crs and the statement ϕ to decide whether to accept ϕ as true.
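In the usual zk-SNARK notation, the three phases above can be written as the following algorithm signatures. The names Setup, Prove, and Verify, the security parameter λ, the relation R, and the output bit b are standard conventions assumed here, not symbols taken from the summary.

```latex
\begin{align*}
(\mathsf{crs}, \mathsf{td}) &\leftarrow \mathsf{Setup}(1^{\lambda}, R) \\
\pi &\leftarrow \mathsf{Prove}(\mathsf{crs}, \phi, w) \\
b &\leftarrow \mathsf{Verify}(\mathsf{crs}, \phi, \pi), \qquad b \in \{0, 1\}
\end{align*}
```

Completeness requires that whenever $(\phi, w) \in R$, an honestly generated proof is accepted, i.e. $\mathsf{Verify}(\mathsf{crs}, \phi, \mathsf{Prove}(\mathsf{crs}, \phi, w)) = 1$. The trapdoor td plays no role in honest execution; it appears only in the zero-knowledge simulation argument.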
Experimental Evaluation and Analysis:
- Reward Comparison: zkUCB with appropriate quantization bits outperforms the standard UCB algorithm in terms of average reward.
- Time Efficiency: The setup, compilation, witness calculation, and proof generation times are minimally affected by the quantization bits, while the verification time scales linearly with the number of algorithm steps.
- Components Size: The sizes of the zkUCB components, including the witness, proving key, verifying key, and proof, are primarily influenced by the number of algorithm steps rather than the quantization bits.
The study demonstrates that zkUCB maintains a manageable scale even when dealing with large datasets, highlighting its potential for broad application and scalability in privacy-sensitive domains.
Statistics
The average reward generated during each round by UCB and zkUCB over 100 iterations.
The average time during various phases of zkUCB, including setup, compile, compute witness, and generate proof.
The average time for verifying the proof in zkUCB.
The average size of various components of zkUCB, including witness, proving key, verifying key, and proof.
Quotes
"zkUCB is carefully designed to safeguard the confidentiality of training data and algorithmic parameters, ensuring transparent UCB decision-making."
"zkUCB's proof size and verification time scale linearly with the execution steps of zkUCB, showcasing zkUCB's adept balance between data security and operational efficiency."