Networking and cybersecurity research critically need publicly available, fresh and diverse user data, for data mining and for validation.
There are very few publicly available sources of user data because of privacy concerns. Institutions that collect user data are concerned that releasing it for research would jeopardize user privacy. Anonymization and differential privacy have emerged as two approaches to privacy-safe data sharing. Yet each has drawbacks that prevent wide adoption: anonymization can be successfully attacked by leveraging auxiliary data, and differential privacy may lose too much research utility by adding large noise to heavy-tailed data.
Our current research lies in two directions. First, we investigate new ways to share data safely, by introducing Commoner Privacy. Second, we have developed a framework called Critter@home, which empowers volunteer users to share their network traffic data with researchers in an anonymous, privacy-safe, aggregated manner.
Commoner Privacy is a data-processing approach that
fuzzes (omits, aggregates, or adds noise to) the outputs of queries
run over private data. It fuzzes only those output points where an individual's
contribution to the point is an outlier. By hiding outliers, our mechanism
hides the presence or absence of an individual in a dataset.
We propose one mechanism that achieves Commoner Privacy:
interactive k-anonymity. We also show that Commoner Privacy holds under query composition,
either via presampling or via query introspection.
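As a rough illustration of fuzzing by omission, the sketch below releases a count for an output point only when enough distinct individuals contribute to it. This is one simplified reading of the idea, not the mechanism from the paper: the function name `fuzzed_counts`, the threshold `k`, the sample records, and the rule "fewer than `k` contributors means the point is dominated by outliers" are all assumptions made for the example.

```python
from collections import defaultdict


def fuzzed_counts(records, k=3):
    """Count records per output key, omitting under-supported keys.

    records: iterable of (individual_id, output_key) pairs.
    A key is released only if at least k distinct individuals
    contributed to it (an assumed, simplified outlier test).
    """
    totals = defaultdict(int)
    contributors = defaultdict(set)
    for individual, key in records:
        totals[key] += 1
        contributors[key].add(individual)
    return {
        key: totals[key]
        for key in totals
        if len(contributors[key]) >= k
    }


# Hypothetical query: number of flows per destination port.
records = [
    ("u1", 80), ("u2", 80), ("u3", 80), ("u1", 80),
    ("u4", 443), ("u5", 443), ("u6", 443),
    ("u7", 31337),  # only one contributor, so this point is omitted
]
released = fuzzed_counts(records, k=3)
print(released)
```

Omission is only one of the three fuzzing options named above; aggregating the suppressed points into a catch-all bin or adding noise to them would follow the same structure.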
Critter@home is a continuously updated archive of content-rich network data,
contributed by volunteer users. Data contributors join the Critter overlay
whenever online, offering their data to interested researchers.
Privacy of data contributors is protected in multiple ways:
- Contributors have the option of hosting their own data locally, thus
retaining full control over it.
- Before data is stored, it is modified via a PPI-sanitization process to
replace all personal and private information (PPI).
- Data is always stored and transmitted in an encrypted format.
- No human apart from the contributor will ever access the raw, PPI-sanitized
data. Instead, researchers access data via a query system which only returns
aggregate features of the data.
- All contact with a contributor is at her discretion and is done via an
anonymizing network where contributor identities are hidden both from
researchers and the Internet at large.
- Contributors (if they so desire) can have full, fine-grained control over
their data at all times via policy settings.
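To make the PPI-sanitization step more concrete, here is a minimal sketch of replacing two kinds of identifiers (email addresses and IPv4 addresses) with keyed pseudonyms before text is stored. It is not Critter's actual sanitizer: the regular expressions, the HMAC-based pseudonyms, the names `sanitize` and `pseudonym`, and the sample key are all assumptions chosen for illustration, and a real sanitizer would need to cover many more categories of PPI.

```python
import hashlib
import hmac
import re

# Deliberately simple patterns; they are illustrative, not exhaustive.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def pseudonym(value, key, label):
    """Map a value to a stable token using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"<{label}:{digest[:8]}>"


def sanitize(text, key):
    """Replace email and IPv4 addresses in text with pseudonyms."""
    text = EMAIL.sub(lambda m: pseudonym(m.group(0), key, "email"), text)
    text = IPV4.sub(lambda m: pseudonym(m.group(0), key, "ip"), text)
    return text


# Hypothetical contributor-held key and log line.
key = b"contributor-local-secret"
line = "alice@example.com logged in from 192.168.1.10"
clean = sanitize(line, key)
print(clean)
```

Using a keyed hash rather than a fixed placeholder keeps repeated occurrences of the same value linkable within one contributor's data while the key stays with the contributor; whether that trade-off is acceptable depends on the threat model.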
Our work relies on the secure query framework, which uses Commoner Privacy.
This framework allows only queries about aggregate features of the data, such as
counts and distributions, and preserves user privacy by applying
k-anonymity and l-diversity principles.
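The two principles can be sketched as a filter over the groups of an aggregate query result. The example below assumes the simplest forms of both: a group satisfies k-anonymity when it holds at least `k` rows, and (distinct) l-diversity when its rows show at least `l` different values of a sensitive attribute. The function names, field names, thresholds, and sample rows are hypothetical and are not taken from the framework itself.

```python
def satisfies_k_and_l(rows, k, l, sensitive):
    """Check one output group against k-anonymity and distinct l-diversity."""
    distinct_sensitive = {row[sensitive] for row in rows}
    return len(rows) >= k and len(distinct_sensitive) >= l


def safe_histogram(rows, group_by, sensitive, k=3, l=2):
    """Return per-group counts, dropping groups that fail either check."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_by], []).append(row)
    return {
        name: len(members)
        for name, members in groups.items()
        if satisfies_k_and_l(members, k, l, sensitive)
    }


# Hypothetical rows: one per contributor.
rows = [
    {"user": "u1", "browser": "firefox", "site": "a.example"},
    {"user": "u2", "browser": "firefox", "site": "b.example"},
    {"user": "u3", "browser": "firefox", "site": "a.example"},
    {"user": "u4", "browser": "chrome", "site": "c.example"},
    {"user": "u5", "browser": "chrome", "site": "c.example"},
    {"user": "u6", "browser": "chrome", "site": "c.example"},
    {"user": "u7", "browser": "lynx", "site": "d.example"},
]
result = safe_histogram(rows, group_by="browser", sensitive="site")
print(result)
```

Here the `chrome` group is large enough but fails l-diversity because every row shares one site, and the `lynx` group fails k-anonymity because it has a single row.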
Software and Datasets
- Commoner Privacy And A Study On Network Traces, Xiyue Deng and Jelena Mirkovic, In Proceedings of the Annual Computer Security Applications Conference (ACSAC), 2017
- Critter: Content-Rich Traffic Trace Repository, V. Sharma, G. Bartlett and J. Mirkovic, In Proceedings of Workshop on Information Sharing and Collaborative Security (WISCS), 2014
This material is based upon work supported by the
National Science Foundation under grants #1224035 and #0914780. Any opinions,
findings, and conclusions or recommendations expressed in this material are
those of the authors and do not necessarily reflect the views of the
National Science Foundation.