Our last article, The State of Zero-Knowledge Machine Learning, described how pioneers such as ezkl and Modulus Labs developed techniques for running machine learning models privately and verifiably. Here we describe our work with ezkl and how we're integrating their library.
A Vision for Trustless ML with On-Chain Rewards
Machine learning is expensive and time-consuming, but worth it: it informs all kinds of institutional decision-making, from computing credit scores to detecting heart murmurs. Competitive machine learning gives companies and institutions that lack the time or expertise to build a data science team a way to solve their data science problems: they post each problem as a challenge and offer a bounty to whoever solves it. By protecting models during a competition with zero-knowledge machine learning (zkML) and Web3 innovations such as streaming payments, Spectral turns competitive machine learning from an optimization contest into a perpetual source of revenue for modelers and businesses. Consider on-chain oracles such as Chainlink's price feeds, which supply raw data to smart contracts. An ML oracle like Spectral instead supplies a continuous stream of validated ML inferences.
For more information, please read our Vision for Trustless ML with On-Chain Rewards.
To help us implement zkML, we've been working closely with the zero-knowledge pioneers at ezkl.
How We’re Implementing zkML
Zero-knowledge proofs (ZKPs) are a way of mathematically proving that a statement is true, for example that you know a secret, without revealing anything beyond the fact that it is true. Many ZKPs work as a challenge and response: the verifier poses random challenges that only someone who really knows the secret can answer every time. A cheater might get lucky on one challenge, but the odds of passing every round shrink with each repetition until they fall below any practical threshold. Imagine a ring-shaped cave with a single entrance and a locked door at the back that opens only with a secret password. While the verifier waits outside, you walk in and take either the left or the right path. The verifier then comes to the entrance and calls out which path you must return by. If you know the password, you can always comply, passing through the door when needed. If you don't, you can only comply when you happened to pick the right path, which is half the time. After 20 successful rounds, the chance that you were merely guessing is less than one in a million, yet the verifier has learned nothing about the password itself.
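To make "proving you know a secret without revealing it" concrete, here is a minimal sketch of Schnorr's identification protocol, a classic commit/challenge/response proof of knowledge of a discrete logarithm (zero-knowledge against an honest verifier). It is not what ezkl uses internally, and the group parameters P, Q, and G are deliberately tiny toy values chosen for illustration; they offer no real security.

```python
import secrets

# Toy group parameters (hypothetical, far too small to be secure):
# P = 2*Q + 1 is a safe prime and G = 4 generates the subgroup of prime order Q.
P = 2039
Q = 1019
G = 4


def keygen():
    """Return the prover's secret x and the public value y = G^x mod P."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)


def commit():
    """Prover picks a fresh random nonce r and sends the commitment t = G^r mod P."""
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)


def respond(x, r, c):
    """Prover answers challenge c with s = r + c*x mod Q; the random r masks x."""
    return (r + c * x) % Q


def check(y, t, c, s):
    """Verifier accepts iff G^s == t * y^c (mod P)."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P


def run_round(x, y):
    """One honest commit/challenge/response round."""
    r, t = commit()
    c = secrets.randbelow(Q)  # verifier's random challenge
    return check(y, t, c, respond(x, r, c))


def cheat_round(y):
    """A prover without x guesses the challenge in advance and forges t to match it."""
    c_guess = secrets.randbelow(Q)
    s = secrets.randbelow(Q)
    t = (pow(G, s, P) * pow(y, -c_guess, P)) % P  # negative exponent needs Python 3.8+
    c = secrets.randbelow(Q)  # the real challenge, unknown to the cheater
    return check(y, t, c, s)  # passes only if c == c_guess


if __name__ == "__main__":
    x, y = keygen()
    print(all(run_round(x, y) for _ in range(20)))  # True: an honest prover always passes
    wins = sum(cheat_round(y) for _ in range(10_000))
    print(f"cheater passed {wins} of 10000 rounds")  # expected value 10000 / Q, about 10
```

A prover without x can only pass by guessing the challenge c before committing, which succeeds with probability 1/Q per round; repeating rounds drives that probability down further.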
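In Web3, ZKPs are often used to bundle many transactions into a single on-chain transaction, together with a succinct proof that every transaction in the bundle is valid; this scheme is known as a zk-rollup. The proof works almost like a wax seal, a traditional stamp of authenticity for the whole bundle.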
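The "many transactions, one on-chain footprint" idea can be illustrated with a Merkle root, one common way to commit to a whole batch with a single 32-byte value. This minimal sketch covers only that commitment step; the validity proof a real zk-rollup posts alongside it is not shown, and the transaction strings are hypothetical.

```python
import hashlib
from typing import List


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(transactions: List[bytes]) -> bytes:
    """Hash a batch of encoded transactions pairwise, level by level, down to one root."""
    if not transactions:
        raise ValueError("a batch needs at least one transaction")
    level = [sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


if __name__ == "__main__":
    batch = [f"transfer {i} from alice to bob".encode() for i in range(1000)]
    root = merkle_root(batch)
    print(len(root), root.hex())  # 32 bytes stand in for all 1000 transactions
```

Changing any single transaction changes the root, so one small value posted on-chain commits to the entire batch.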
Zero-knowledge machine learning (zkML) applies that concept to an entire machine learning (ML) model. At Spectral, zkML lets us prove that a given prediction came from a specific machine learning model; in other words, that it came from the model the modeler claims it did. There are two main players in the zkML game:
- The prover, who generates a proof that a given prediction is the output of a certain ML model
- The verifier, who checks that the proof is correct
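In practice the prover and verifier don't exchange messages in real time: the prover produces a proof artifact (with ezkl, a proof file) and the verifier checks it later. The sketch below shows that shape with a toy non-interactive Schnorr proof built with the Fiat-Shamir heuristic, hashing the claimed prediction into the challenge so the proof is bound to it. It only proves knowledge of a secret key x; proving that the prediction was actually computed by a model is the part ezkl's circuits add. The group parameters and the "prediction=..." statement format are hypothetical and insecure.

```python
import hashlib
import json
import secrets

# Toy Schnorr group (hypothetical, insecure): P = 2*Q + 1, G generates the order-Q subgroup.
P, Q, G = 2039, 1019, 4


def challenge(y: int, t: int, statement: str) -> int:
    """Fiat-Shamir: derive the verifier's challenge by hashing the public transcript."""
    data = f"{y}|{t}|{statement}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove(x: int, statement: str) -> dict:
    """Prover: build a self-contained proof object bound to `statement`."""
    y = pow(G, x, P)
    r = secrets.randbelow(Q)
    t = pow(G, r, P)
    c = challenge(y, t, statement)
    return {"y": y, "t": t, "s": (r + c * x) % Q, "statement": statement}


def verify(proof: dict, expected_y: int) -> bool:
    """Verifier: check the proof using only public data, with no interaction."""
    if proof["y"] != expected_y:
        return False
    c = challenge(proof["y"], proof["t"], proof["statement"])
    return pow(G, proof["s"], P) == (proof["t"] * pow(proof["y"], c, P)) % P


if __name__ == "__main__":
    x = secrets.randbelow(Q - 1) + 1  # prover's secret, never sent
    y = pow(G, x, P)                  # public value the verifier already trusts
    proof_file = json.dumps(prove(x, "prediction=0.87"))  # what gets shipped to the verifier
    print(verify(json.loads(proof_file), y))  # True
    forged = dict(json.loads(proof_file), statement="prediction=0.12")
    print(verify(forged, y))  # False, barring a 1-in-Q challenge collision
```

Because the challenge is derived from a hash instead of a live verifier, anyone holding the public value y can check the proof offline, which mirrors how a proof file is handed to the verifier.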
We've partnered with ezkl to generate and verify these proofs. ezkl lets modelers create ZKPs of predictions from ML models exported in the Open Neural Network Exchange (ONNX) format.
At a high level, the end-to-end ezkl workflow we use consists of:
1. Setting up ezkl
- Model Training: Begin by training an ML model. PyTorch is natively supported and the most extensively tested; ezkl has also recently added support for scikit-learn decision trees and random forests and for XGBoost models, which we're currently testing.
- Model Export: Convert the trained PyTorch model to the ONNX format with torch.onnx.export(), or use a library that converts trained non-PyTorch models to ONNX (e.g. Hummingbird).
- Settings Calibration: Generate a JSON settings file with ezkl.gen_settings() and fine-tune it with ezkl.calibrate_settings(). These settings determine how the model is quantized and laid out as a Halo2 circuit.
- Model Compilation: Invoke ezkl.compile_circuit() to compile the ONNX model into a format usable by ezkl.
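- Structured Reference String (SRS) Retrieval: Fetch the SRS, the public parameters the proving system needs (e.g. with ezkl.get_srs()), and make sure the verifier has access to the same SRS file. Because the SRS is typically produced by a public multi-party setup ceremony that stays secure as long as one participant was honest, no single party has to be trusted.
- ezkl Setup: Use ezkl.setup() to generate the proving and verifying keys, and make sure the verifier has access to the verifying key.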
2. Proof Generation
- Witness File Creation: Use ezkl.gen_witness() to create the witness file. It takes the model inputs, passes them through the zk-circuit, and records the circuit's outputs.
- Mock Run: Execute a mock run with ezkl.mock() to confirm that ezkl has been set up correctly; it checks that the witness satisfies the circuit's constraints without producing a full proof.
- Proof Creation: For any given model output, produce a proof using ezkl.prove(). Ensure the verifier receives this proof file.
3. Proof Verification
- The verifier checks the proof with ezkl.verify(), using the proof file, the settings file, the verifying key, and the SRS.
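Settings calibration matters because a zk-circuit computes over integers (finite-field elements), not floating-point numbers, so every weight and input has to be converted to fixed point at some scale. The pure-Python sketch below, which does not call ezkl, shows the trade-off on a hypothetical three-weight linear model: a larger scale gives a smaller approximation error. Our understanding is that ezkl's settings expose similar log2 scales for inputs and parameters, and that larger scales generally need a larger circuit and more proving time, which is the balance calibration tries to strike.

```python
# Hypothetical toy model: y = w . x + b, evaluated in floating point and in fixed point.
WEIGHTS = [0.731, -1.204, 0.058]
BIAS = 0.3
INPUT = [1.5, 0.25, -2.0]


def quantize(value: float, scale_bits: int) -> int:
    """Map a real number to the nearest integer multiple of 2**-scale_bits."""
    return round(value * (1 << scale_bits))


def float_forward(w, b, x):
    """Reference evaluation in floating point."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fixed_forward(w, b, x, scale_bits):
    """Integer-only evaluation, the kind of arithmetic a circuit can express."""
    qw = [quantize(wi, scale_bits) for wi in w]
    qx = [quantize(xi, scale_bits) for xi in x]
    # Each product carries scale 2**(2*scale_bits), so the bias is quantized at that scale too.
    acc = sum(a * c for a, c in zip(qw, qx)) + quantize(b, 2 * scale_bits)
    return acc / (1 << (2 * scale_bits))  # dequantize only to compare with the float result


if __name__ == "__main__":
    exact = float_forward(WEIGHTS, BIAS, INPUT)
    for bits in (2, 4, 8, 12):
        approx = fixed_forward(WEIGHTS, BIAS, INPUT, bits)
        print(f"scale 2^{bits}: {approx:.6f}  (error {abs(approx - exact):.2e})")
```

At 2 bits the toy model's output is off by roughly 0.15, while at 12 bits the error drops below 0.001.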
A Real Use Case for zkML
On October 17, 2023, Modulus Labs released the eighth chapter of its “How to put your AI on-chain” series. Working with zkML is tough: it adds compute time to each transaction, which means you need a really good reason to use it.
Here’s ours: Spectral’s new machine learning oracle uses zkML to keep machine learning competitions honest. Competitive machine learning platforms pit data scientists against one another, but, because there was no way to conceal a model and prove inferences were generated from that model, traditional competitions were limited in scope. zkML allows Spectral to keep models verifiable and private, which in turn allows us to unlock a powerful new form of competitive machine learning.