CRAWDAD Wiki | Main / Sanitization
Sanitization

Suppose that the data to be collected contain a MAC address (a 48-bit globally unique identifier for a wireless device) and source and destination IP addresses. The MAC address may pose a risk to confidentiality, as may the IP addresses of any hosts that a user visits. The main goal of sanitization is to reduce these risks.

For the dartmouth/campus data set collected at Dartmouth College, the sanitization procedure is a program that parses the collected data, anonymizes them, and removes any data that are of no interest to wireless research.

The sanitization procedure consists of procedures common to all of our data sets and data-specific sanitization procedures.

The common sanitization procedure concentrates on mapping identifiers to corresponding sanitized identifiers. We try to maintain the same mapping across all of our data sets. We consider three identifiers: MAC addresses, IP addresses, and access point hostnames, as follows:

  1. how to sanitize MAC addresses
  2. how to sanitize IP addresses
  3. how to sanitize access point hostnames
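
As an illustration of a mapping that stays the same across data sets, here is a minimal sketch in Python. It is not the procedure used for the Dartmouth traces, which this page does not specify. It assumes a keyed hash (HMAC-SHA256) truncated to 48 bits, and the names sanitize_mac, normalize_mac, and the key value are hypothetical.

```python
import hashlib
import hmac


def normalize_mac(mac: str) -> str:
    """Return the MAC address as 12 lowercase hex digits, without separators."""
    digits = mac.replace(":", "").replace("-", "").replace(".", "").lower()
    if len(digits) != 12 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return digits


def sanitize_mac(mac: str, key: bytes) -> str:
    """Map a MAC address to a stable 48-bit pseudonym under a secret key.

    Assumption: a keyed hash stands in for whatever mapping a real
    sanitizer uses. The same key gives the same mapping in every data set.
    """
    digest = hmac.new(key, normalize_mac(mac).encode("ascii"), hashlib.sha256).digest()
    pseudonym = digest[:6]  # keep 6 bytes so the result still looks like a MAC
    return ":".join(f"{b:02x}" for b in pseudonym)


if __name__ == "__main__":
    key = b"example-secret-key"  # hypothetical; a real key must be kept private
    # The same device written two ways maps to the same sanitized identifier.
    print(sanitize_mac("00:11:22:AA:BB:CC", key))
    print(sanitize_mac("00-11-22-aa-bb-cc", key))
```

Because the mapping depends only on the key and the normalized address, a device seen in a syslog trace and in a tcpdump trace receives the same sanitized identifier, provided both are processed with the same key. Truncating the digest to 48 bits makes collisions between distinct addresses possible in principle, so a sanitizer that must guarantee a one-to-one mapping would need a stored lookup table instead.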

The data-specific sanitization procedures are applied to the following traces, which were collected using different collection mechanisms:

  1. syslog: how to sanitize syslog traces
  2. SNMP: how to sanitize SNMP traces
  3. tcpdump: how to sanitize tcpdump traces

[Feel free to add your sanitization experiences or other sanitization techniques!]

Page last modified on August 01, 2006, at 10:55 AM EST