Implementation of an RDP privacy accountant and smooth sensitivity analysis for
the PATE framework. The underlying theory and supporting experiments appear in
"Scalable Private Learning with PATE" by Nicolas Papernot, Shuang Song, Ilya
Mironov, Ananth Raghunathan, Kunal Talwar, Ulfar Erlingsson (ICLR 2018,
https://arxiv.org/abs/1802.08908).

## Overview

The PATE ('Private Aggregation of Teacher Ensembles') framework was introduced
by Papernot et al. in "Semi-supervised Knowledge Transfer for Deep Learning from
Private Training Data" (ICLR 2017, https://arxiv.org/abs/1610.05755). The
framework enables model-agnostic training that provably provides [differential
privacy](https://en.wikipedia.org/wiki/Differential_privacy) guarantees for the
training dataset.

The framework consists of _teachers_, the _student_ model, and the _aggregator_. The
teachers are models trained on disjoint subsets of the sensitive training data. The student
model has access to an insensitive (e.g., public) unlabelled dataset, which is labelled by 
interacting with the ensemble of teachers via the _aggregator_. The aggregator tallies
the outputs of the teacher models and either forwards a (noisy) aggregate to the student or
refuses to answer.
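
As a rough illustration (not the implementation in this repository), the sketch below
shows a Confident-GNMax-style aggregator: it answers a query only when the noisy top
vote count clears a threshold, and otherwise refuses. The noise scales and the threshold
here are arbitrary placeholders, not recommended settings.

```python
import numpy as np

def noisy_aggregate(votes, sigma_threshold=150.0, sigma_answer=40.0, threshold=300.0):
    """Sketch of a Confident-GNMax-style aggregator for a single query.

    `votes` is a 1-D array of per-class teacher vote counts. All parameter
    values are placeholders.
    """
    votes = np.asarray(votes, dtype=float)
    # Noisy threshold check: only answer queries with a strong teacher consensus.
    if votes.max() + np.random.normal(scale=sigma_threshold) < threshold:
        return None  # Refuse to answer.
    # GNMax: add Gaussian noise to every vote count and return the argmax label.
    return int(np.argmax(votes + np.random.normal(scale=sigma_answer, size=votes.shape)))

# Example: 250 teachers voting over 10 classes.
label = noisy_aggregate(np.bincount(np.random.randint(10, size=250), minlength=10))
```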

Differential privacy is enforced by the aggregator. The privacy guarantees can be _data-independent_,
which means that they are solely a function of the aggregator's parameters. Alternatively, the privacy
analysis can be _data-dependent_, which allows for finer reasoning: under certain conditions on
the input distribution, the final privacy guarantees can be improved relative to the data-independent
analysis. Data-dependent privacy guarantees may themselves be a function of sensitive data, and
therefore publishing them requires its own sanitization procedure. In our case,
sanitization of data-dependent privacy guarantees proceeds via _smooth sensitivity_ analysis.
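
For reference, the standard notion underlying such sanitization (due to Nissim,
Raskhodnikova, and Smith) is the β-smooth sensitivity of a function f at a dataset x,
a smooth upper bound on its local sensitivity:

```latex
% \beta-smooth sensitivity of f at x: the largest local sensitivity LS_f(x')
% over all datasets x', discounted exponentially in the Hamming distance d(x, x').
S_{f,\beta}(x) \;=\; \max_{x'} \; LS_f(x') \, e^{-\beta\, d(x, x')}
```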

The common machinery used for all privacy analyses in this repository is Rényi
differential privacy, or RDP (see https://arxiv.org/abs/1702.07476).
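
For context, a mechanism M satisfies (α, ε)-RDP if the Rényi divergence of order α
between its output distributions on any two adjacent datasets is at most ε. The
Gaussian mechanism and the conversion back to (ε, δ)-DP both have simple closed forms:

```latex
% (\alpha, \varepsilon)-RDP: for all adjacent datasets D, D'
D_{\alpha}\!\left(M(D) \,\middle\|\, M(D')\right) \le \varepsilon

% Gaussian mechanism with L2-sensitivity \Delta and noise scale \sigma:
\varepsilon(\alpha) = \frac{\alpha \Delta^2}{2\sigma^2}

% Conversion: (\alpha, \varepsilon)-RDP implies
\left(\varepsilon + \frac{\log(1/\delta)}{\alpha - 1}, \; \delta\right)\text{-DP}
```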

This repository contains implementations of privacy accountants and smooth
sensitivity analysis for several data-independent and data-dependent mechanisms that together
comprise the PATE framework.
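
For intuition only, the following is a self-contained sketch of a data-independent RDP
accountant for GNMax; the actual accountants live in core.py, and its function names and
signatures may differ. The sketch uses the facts that a single GNMax query with noise
scale σ is (α, α/σ²)-RDP, that RDP composes additively over queries, and the RDP-to-(ε, δ)
conversion shown above.

```python
import math
import numpy as np

def gnmax_rdp_data_independent(sigma, orders):
    """Data-independent RDP of one GNMax query with Gaussian noise scale sigma.

    Changing one teacher moves two vote counts by 1 (L2 sensitivity sqrt(2)),
    so the Gaussian-mechanism bound gives epsilon(alpha) = alpha / sigma**2.
    """
    return np.asarray(orders, dtype=float) / (sigma ** 2)

def rdp_to_dp(orders, rdp, delta):
    """Convert cumulative RDP to an (epsilon, delta)-DP guarantee, minimizing over orders."""
    eps = np.asarray(rdp) - math.log(delta) / (np.asarray(orders) - 1)
    idx = int(np.argmin(eps))
    return eps[idx], orders[idx]

orders = np.arange(2, 200)        # Grid of Renyi orders to track (arbitrary choice).
sigma, num_queries = 40.0, 1000   # Placeholder aggregator parameters.
total_rdp = num_queries * gnmax_rdp_data_independent(sigma, orders)  # RDP adds up over queries.
epsilon, best_order = rdp_to_dp(orders, total_rdp, delta=1e-5)
print(f"(eps={epsilon:.2f}, delta=1e-05)-DP, achieved at order {best_order}")
```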


### Requirements

* Python, version ≥ 2.7
* absl (see [here](https://github.com/abseil/abseil-py), or just type `pip install absl-py`)
* numpy
* scipy
* sympy (for smooth sensitivity analysis)
* unittest (for testing)
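
Assuming a working Python environment, the third-party packages above can typically be
installed with pip (`unittest` ships with the Python standard library):

```bash
$ pip install absl-py numpy scipy sympy
```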


### Self-testing

To verify the installation run
```bash
$ python core_test.py
$ python smooth_sensitivity_test.py
```


## Files in this directory

*   core.py — RDP privacy accountant for several vote aggregators (GNMax,
    Threshold, Laplace).

*   smooth_sensitivity.py — Smooth sensitivity analysis for GNMax and
    Threshold mechanisms.

*   core_test.py and smooth_sensitivity_test.py — Unit tests for the
    files above.

## Contact information

You may direct comments to mironov@google.com and pull requests to @ilyamironov.