These Python scripts implement five differentially private data
sanitizers: SPA [2], EFPA [1], P-HPartition [1], EFPAG, and DCTG.
EFPAG and DCTG are variants of EFPA and of a Discrete Cosine Transform
(DCT) based perturbation technique, respectively. Both add Gaussian
instead of Laplacian noise to the coefficients in order to improve
accuracy. See the source code for details.
For more information, please refer to [1], which contains the
full description of the implemented schemes.
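The idea behind DCTG can be illustrated with a minimal sketch that rests on our own assumptions. It assumes that neighbouring histograms differ by one in a single bin. It keeps the first k coefficients of an orthonormal DCT-II and calibrates the noise with the classic Gaussian mechanism, which gives (epsilon, delta)-differential privacy for epsilon < 1. The function names and the parameters k and delta are hypothetical, and this is not the code shipped with these scripts, whose coefficient selection and noise calibration may differ. The sketch also targets a recent NumPy (default_rng needs 1.17 or later), not the versions listed under REQUIREMENTS.

```python
import math

import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix D, so that D.dot(x) are the DCT coefficients of x."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # bin index (columns)
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2.0 * n))
    d[0, :] /= np.sqrt(2.0)  # rescale the DC row so that D is orthonormal
    return d


def dct_gaussian_perturb(hist, k, epsilon, delta, rng=None):
    """Keep the first k DCT coefficients, add Gaussian noise, invert.

    Assumption: neighbouring histograms differ by 1 in a single bin. Since D
    is orthonormal, the L2 sensitivity of any subset of coefficients is at
    most 1. sigma follows the classic Gaussian mechanism (valid for epsilon < 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(hist, dtype=float)
    k = min(k, len(x))
    d = dct_matrix(len(x))
    coeffs = d.dot(x)
    sigma = math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
    noisy = np.zeros_like(coeffs)  # dropped coefficients stay zero
    noisy[:k] = coeffs[:k] + rng.normal(0.0, sigma, size=k)
    # The inverse of an orthonormal transform is its transpose.
    return d.T.dot(noisy)
```

For example, `dct_gaussian_perturb(hist, k=20, epsilon=0.5, delta=1e-5)` returns a perturbed histogram with the same number of bins. Building the matrix explicitly costs O(n^2) but keeps the sketch dependent on NumPy alone. For large histograms, `scipy.fft.dct(x, norm='ortho')` computes the same coefficients faster.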

REQUIREMENTS:
=============

Python (2.7.1), NumPy (2.0.0.dev_7297785_20111104)

The implementation was developed with the versions above. Other (in
particular, older) versions have not been tested and may not work properly.

INSTALL: 
========

Usage: ./setup.sh 

It compiles, in place, a Python extension module (lib/cutils_pymod.c) that
provides a Python interface to a small C library (lib/cutils.c). The result is
lib/cutils.so, which is used by lib/Clustering.py.

DESCRIPTION: 
============

Main.py:
-------------
This script illustrates the usage of three of the schemes:
EFPA (lib/EFPA.py) [1], SPA (lib/SPA.py) [2], and
P-HPartition (lib/Clustering.py) [1].

Usage: python Main.py
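The Fourier-based perturbation behind EFPA can be sketched as follows. This simplified version keeps a fixed number k of low-frequency DFT coefficients and perturbs their real and imaginary parts with Laplace noise. According to [1], EFPA also chooses the number of retained coefficients in a differentially private way, so this sketch is not lib/EFPA.py. The sensitivity bound in the docstring assumes that neighbouring histograms differ by one in a single bin. The function and parameter names are hypothetical.

```python
import math

import numpy as np


def fourier_laplace_perturb(hist, k, epsilon, rng=None):
    """Keep the first k rfft coefficients of hist, perturb them, invert.

    Assumption: neighbouring histograms differ by 1 in a single bin. With
    NumPy's unnormalised FFT such a change moves every coefficient by a unit
    complex number, so the real and imaginary parts of k coefficients change
    by at most sqrt(2) * k in L1 norm. That bound is the sensitivity used here.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(hist, dtype=float)
    n = len(x)
    coeffs = np.fft.rfft(x)  # n // 2 + 1 complex coefficients
    k = min(k, len(coeffs))
    scale = math.sqrt(2.0) * k / epsilon  # Laplace scale = L1 sensitivity / epsilon
    noisy = np.zeros_like(coeffs)
    noisy[:k] = (coeffs[:k]
                 + rng.laplace(0.0, scale, size=k)
                 + 1j * rng.laplace(0.0, scale, size=k))
    # Dropped coefficients stay zero; irfft returns a real signal of length n.
    return np.fft.irfft(noisy, n)
```

The inverse transform is post-processing and consumes no privacy budget, so under the stated assumption the output is epsilon-differentially private. A small k means less noise per bin but a coarser, smoother histogram.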

>>> INPUT:

All parameters can be edited inside the script files.

- Histograms (one histogram per file, one bin per line), located in
  ./datasets (Waitakere).
- Epsilon, the privacy parameter (default: 0.01).
- Maximum number of clusters generated by Clustering.py (default: 400).
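P-HPartition (lib/Clustering.py) groups bins into clusters, up to the maximum number given above. Given a partition, the step of releasing a noisy estimate for each cluster can be sketched as follows. The sketch assumes contiguous clusters, supplied as hypothetical (start, end) index pairs and chosen independently of the data. How lib/Clustering.py actually derives the partition, and how it splits epsilon between that step and the release, is described in [1] and not reproduced here.

```python
import numpy as np


def noisy_cluster_averages(hist, boundaries, epsilon, rng=None):
    """Publish each cluster's noisy total, spread evenly over its bins.

    boundaries: hypothetical list of (start, end) pairs that partition the
    bins into contiguous, non-empty clusters, fixed independently of the data.
    One individual changes one bin, hence one cluster total, by 1, so Laplace
    noise with scale 1 / epsilon on each total gives epsilon-DP.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(hist, dtype=float)
    parts = sorted(boundaries)
    if (parts[0][0] != 0 or parts[-1][1] != len(x)
            or any(e <= s for s, e in parts)
            or any(a[1] != b[0] for a, b in zip(parts, parts[1:]))):
        raise ValueError("boundaries must partition the bins into contiguous runs")
    out = np.empty_like(x)
    for start, end in parts:
        total = x[start:end].sum() + rng.laplace(0.0, 1.0 / epsilon)
        out[start:end] = total / (end - start)  # uniform estimate within the cluster
    return out
```

Averaging over a cluster divides the noise of its total among its bins, which pays off when the bins inside a cluster have similar counts.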

<<< OUTPUT (./output):

- perturbed histograms
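Reading and writing histograms in the one-bin-per-line format described above takes one NumPy call each. The helpers below are a sketch. They assume that the files in ./output use the same format as the inputs, which this README does not state. The function names and the six-decimal output format are arbitrary choices.

```python
import numpy as np


def load_histogram(path):
    """Read a histogram stored one bin per line (the ./datasets format)."""
    # ndmin=1 keeps a single-bin file as a 1-D array instead of a scalar.
    return np.loadtxt(path, ndmin=1)


def save_histogram(path, hist):
    """Write a (perturbed) histogram in the same one-bin-per-line format."""
    np.savetxt(path, np.asarray(hist, dtype=float), fmt="%.6f")
```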

Datasets:
=========

* Waitakere dataset:

A synthetic dataset generated from the census data of Waitakere (ca. 39 km x
29 km), an administrative region of New Zealand on the west side of Auckland.
The necessary census and topographical data are freely available at

http://www.stats.govt.nz/Census/2006CensusHomePageMeshblockDataset.aspx 

Note that this data was released in 2006; hence the meshblock boundaries in the
shapefile are described using NZMG (New Zealand Map Grid) coordinates. They can
be converted to WGS84 (standard lat/lon coordinates) using ogr2ogr from the
GDAL framework:

ogr2ogr -t_srs EPSG:4326 <output>.shp <inputfile>.shp

where the input file contains NZMG coordinates. The random population
(./datasets/waitakere/raw/Waitakere_random_population.txt) is generated by a
Perl script (./datasets/waitakere/raw/generate_population.pl): in each meshblock,
it generates as many random points as the real population count published by
the Census Bureau (./datasets/waitakere/raw/2006 mb dataset Auckland Region part 1.csv).
Each line in Waitakere_random_population.txt contains the random coordinates of
a single individual in NZMG format.
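This README does not say how generate_population.pl places points inside a meshblock. One common approach, shown here purely as an assumption, is rejection sampling. It draws uniform points in the meshblock's bounding box and keeps those that an even-odd (ray-casting) test places inside the boundary polygon, until the published count is reached. The polygon format (a list of (x, y) vertices, e.g. NZMG easting/northing pairs) and the function names are hypothetical.

```python
import random


def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count edges that straddle the horizontal line through y and cross
        # it to the right of the point (y1 != y2 is guaranteed here).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / float(y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_points_in_meshblock(polygon, count, rng=random):
    """Draw count uniform points inside polygon by rejection sampling."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    points = []
    while len(points) < count:
        # Uniform proposal over the bounding box; keep only points inside.
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    return points
```

Rejection sampling is simple and yields exactly uniform points, but the number of rejected draws grows as the polygon covers less of its bounding box.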

File format: each histogram is a vector obtained by converting the grid matrix
into a row vector, i.e., by concatenating consecutive rows.
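Concatenating consecutive rows is NumPy's default (C, row-major) order, so converting between the stored vector and the grid is a reshape. The grid dimensions are not stored in the file and must be known separately. The function names below are hypothetical.

```python
import numpy as np


def grid_to_vector(grid):
    """Concatenate consecutive rows of a 2-D grid into one vector (row-major)."""
    return np.asarray(grid).reshape(-1)


def vector_to_grid(vec, n_rows, n_cols):
    """Inverse of grid_to_vector; the grid dimensions must be known separately."""
    vec = np.asarray(vec)
    if vec.size != n_rows * n_cols:
        raise ValueError("vector length does not match n_rows * n_cols")
    return vec.reshape(n_rows, n_cols)
```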


[1] G. Acs, C. Castelluccia, R. Chen. Differentially Private Histogram Release
through Lossy Compression. IEEE International Conference on Data Mining (ICDM), 2012.


[2] V. Rastogi, S. Nath. Differentially Private Aggregation of Distributed
Time-Series with Transformation and Encryption. ACM SIGMOD International
Conference on Management of Data (SIGMOD), 2010.

@author: Gergely Acs <acs@crysys.hu> 
