CRAWDAD Wiki | Sanitization / IPAddresses
IPAddresses

At Dartmouth College, we anonymize IP addresses using a prefix-preserving anonymization algorithm (see Xu, J., Fan, J., Ammar, M., and Moon, S., On the Design and Performance of Prefix-Preserving IP Traffic Trace Anonymization, Proc. of 10th IEEE International Conference on Network Protocols (ICNP 2002), Paris, France, November 2002). The advantage of this algorithm is that prefix relationships survive anonymization: if two IP addresses share the same k-bit prefix, their anonymized addresses also share the same k-bit prefix. This information is useful for research purposes, such as determining which hosts are in the same subnet.
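The prefix-preserving property described above can be illustrated with a minimal Python sketch. It follows the general bit-by-bit construction (each output bit is the input bit XORed with a keyed pseudorandom function of the preceding bits), but it uses HMAC-SHA256 from the standard library as the pseudorandom function, which is an assumption made for illustration and not necessarily what the Dartmouth tools use. The names `anonymize_ipv4` and `common_prefix_len` and the example key are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Prefix-preserving anonymization of one IPv4 address (illustrative sketch)."""
    a = int(ipaddress.IPv4Address(addr))
    mask = 0
    for i in range(32):
        # The top i bits of the address (0 when i == 0).
        prefix = a >> (32 - i)
        # Include the prefix length so that prefixes of different lengths
        # with the same integer value produce different PRF inputs.
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        digest = hmac.new(key, msg, hashlib.sha256).digest()
        # Use one pseudorandom bit to decide whether to flip bit i.
        flip = digest[0] >> 7
        mask = (mask << 1) | flip
    return str(ipaddress.IPv4Address(a ^ mask))


def common_prefix_len(x: str, y: str) -> int:
    """Number of leading bits that two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(x)) ^ int(ipaddress.IPv4Address(y))
    return 32 - diff.bit_length()


if __name__ == "__main__":
    key = b"example-key"  # hypothetical key; keep the real one secret
    a, b = "192.168.1.10", "192.168.1.77"
    print(common_prefix_len(a, b))
    print(common_prefix_len(anonymize_ipv4(a, key), anonymize_ipv4(b, key)))
```

Because the flip decision for bit i depends only on the bits before it, two addresses that agree on their first k bits receive the same flips on those bits, so both printed prefix lengths are equal. Reusing the same key across traces keeps the mapping consistent between them.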

One of the most well-known anonymization tools is tcpdpriv (see Greg Minshall, http://ita.ee.lbl.gov/html/contrib/tcpdpriv.html). tcpdpriv anonymizes only IP and TCP/UDP headers and has limited functionality. For IP addresses, it offers five levels of anonymization. At level 0, IP addresses are mapped to integers. This policy is followed by NLANR for distribution of public traces (http://www.nlanr.net/). Level 1 maps the upper and lower 16 bits, separately, to integers. Level 2 maps each byte of the address separately; each byte map is independent. With level 50, if two of the original addresses were equal in the most significant N bits, then these two addresses will map to private addresses that are similarly equal in the most significant N bits. Level 99 leaves addresses unchanged. These levels of anonymization have two main drawbacks. First, they preserve subnet information only at a coarse granularity. Second, the mapping is inconsistent between traces. To overcome the second drawback, Peuhkuri, in his paper "A method to compress and anonymize packet traces" (Internet Measurement Workshop, San Francisco, California, USA: 2001, pages 257–261, 2001), addresses persistent anonymization of IP addresses across different packet traces. The proposed algorithm makes use of cryptography: the mapped address is produced by merging a part of the original address with a value encrypted with a key provided by the user.
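To make levels 0, 1, and 2 concrete, here is a rough Python sketch of table-based mappings of that kind. It is not tcpdpriv's code: the class names are hypothetical, and numbering values from 1 in order of first appearance, as well as returning tuples, are assumptions made for illustration.

```python
import ipaddress


class SequentialMap:
    """Assign 1, 2, 3, ... to values in order of first appearance."""

    def __init__(self):
        self._table = {}

    def __call__(self, value):
        # An existing entry is returned unchanged; a new value gets the next integer.
        return self._table.setdefault(value, len(self._table) + 1)


class TraceAnonymizer:
    """Per-trace mapping tables, loosely modeled on levels 0, 1, and 2."""

    def __init__(self):
        self._whole = SequentialMap()
        self._upper = SequentialMap()
        self._lower = SequentialMap()
        self._bytes = [SequentialMap() for _ in range(4)]

    def level0(self, addr):
        # The whole address is mapped to one integer.
        return self._whole(int(ipaddress.IPv4Address(addr)))

    def level1(self, addr):
        # The upper and lower 16 bits are mapped separately.
        a = int(ipaddress.IPv4Address(addr))
        return (self._upper(a >> 16), self._lower(a & 0xFFFF))

    def level2(self, addr):
        # Each byte position has its own independent map.
        packed = ipaddress.IPv4Address(addr).packed
        return tuple(m(b) for m, b in zip(self._bytes, packed))


if __name__ == "__main__":
    trace_a, trace_b = TraceAnonymizer(), TraceAnonymizer()
    print(trace_a.level0("10.0.0.1"), trace_a.level0("10.0.0.2"))
    # In a second trace the same host may appear first and get a different number.
    print(trace_b.level0("10.0.0.2"))
```

With tables like these, the value assigned to an address depends on the order in which addresses appear in a trace, which is one way the inconsistency between traces noted above can arise.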

AnonTool (http://crawdad.cs.dartmouth.edu/meta.php?name=tools/sanitize/generic/AnonTool) is a generic tool that offers flexible anonymization functionality. It implements both the prefix-preserving anonymization algorithm and the mappings used in tcpdpriv; in fact, AnonTool includes optimized versions of the mapping functions that make it faster than tcpdpriv. It also provides several other functions: replacing addresses with a constant value, encrypting them using AES or triple DES, and randomizing them. The added value of AnonTool is its simple API, which lets you write your own anonymization application that defines the anonymization of each field separately, according to the user's needs.
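The idea of choosing an anonymization function per field can be sketched in a few lines of Python. This is not AnonTool's API: every name below (`constant`, `randomize`, `keep`, `anonymize_record`, and the field names) is hypothetical, and encryption is left out because it would need a third-party cipher library.

```python
import secrets


def constant(value):
    """Return a function that replaces any field value with a fixed value."""
    def apply(_field_value):
        return value
    return apply


def randomize(n_bytes):
    """Return a function that replaces a field with n_bytes random bytes."""
    def apply(_field_value):
        return secrets.token_bytes(n_bytes)
    return apply


def keep(field_value):
    """Leave the field unchanged."""
    return field_value


def anonymize_record(record, policy):
    """Apply the per-field policy; fields without a policy entry are kept."""
    return {name: policy.get(name, keep)(value) for name, value in record.items()}


if __name__ == "__main__":
    record = {"src_ip": b"\x0a\x00\x00\x01", "dst_port": 80, "payload": b"secret"}
    policy = {"src_ip": randomize(4), "payload": constant(b"")}
    print(anonymize_record(record, policy))
```

Keeping the policy as a plain mapping from field name to function means a different trade-off between privacy and utility can be chosen for each field without touching the code that walks the records.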

Page last modified on September 25, 2006, at 08:25 AM EST