CRAWDAD metadata: ctu/personal (v. 2012-03-15)
This dataset contains 142 days of mobile phone records (aka Call Data Records)
and ground-truth movement description of Czech Ph.D. student Michal Ficek,
stored by his own mobile terminal in 2010-2011.
Note: This metadata was prepared by the CRAWDAD team and verified by the data set (or tool) authors. We have made every effort to ensure its accuracy, but urge all users to consider the metadata and data carefully and be sure that their use in research is consistent with the nature and limitations of the data. We welcome any corrections. This metadata was prepared based on the following reference(s):
CRAWDAD metadata structure
- [Data]
- [Dataset] ctu/personal (v. 2012-03-15)
- [Traceset] ctu/personal/mobile (v. 2012-03-15)
- [Tools]
- [Authors]
- [Author] Michal Ficek
- [Papers]
You can see more papers that use this dataset or tool at citeulike's 'crawdad' group with the tag ctu_personal. Please add more papers. Please also cite this data set using the following BibTeX entry (or cite one of the papers below).
@MISC{ctu-personal-2012-03-15,
author = {Michal Ficek},
title = {{CRAWDAD} data set ctu/personal (v. 2012-03-15)},
howpublished = {Downloaded from http://crawdad.cs.dartmouth.edu/ctu/personal},
month = mar,
year = 2012
}
- [Paper] ficek-intercall
- [Paper] ficek-spatial
[Dataset] ctu/personal (v. 2012-03-15)
| version | v. 2012-03-15 |
| changes | the initial version |
| bibtex |
@MISC{ctu-personal-2012-03-15,
author = {Michal Ficek},
title = {{CRAWDAD} data set ctu/personal (v. 2012-03-15)},
howpublished = {Downloaded from http://crawdad.cs.dartmouth.edu/ctu/personal},
month = mar,
year = 2012
}
|
| metadata last modified | 2012-03-15 |
| summary | This dataset contains 142 days of mobile phone records (aka Call Data Records) and ground-truth movement description of Czech Ph.D. student Michal Ficek, stored by his own mobile terminal in 2010-2011. |
| release date | 2012-03-15 |
| measurement start | 2010-08-16 |
| measurement end | 2011-02-06 |
| authors | Michal Ficek |
| web site | http://www.crawdad.org/ctu/personal |
| keyword | cellular network, location |
| measurement purposes | User Mobility Characterization, Usage Characterization, Positioning Systems, Social Network Analysis, Human Behavior Modeling, Localization |
| network type | GSM (Global System for Mobile Communications) |
| network type | cellular network |
| environment | This dataset contains 142 days of mobile phone records (also known as Call Data Records) and cell transitions (a ground-truth movement description) of Czech Ph.D. student Michal Ficek, stored by his own mobile terminal in 2010-2011. The dataset covers more than 99.99% of 142 days of mobile phone usage in the mobile networks of 8 different providers in 5 countries: the Czech Republic, the Slovak Republic, Germany, Austria and the USA. |
| network | The phone was serviced mostly by Vodafone Czech Republic, the user's home network, in the Czech Republic. The network providers abroad were: Orange (Slovakia), A1 Telekom (Austria), T-Mobile Deutschland, Vodafone D2 and O2 (Germany), and T-Mobile and AT&T (USA). |
| collection | The source of the data is the user's own Nokia E52 mobile phone. The publicly available LogExport application was used to record the time and type of communication events (voice, SMS, data). For recording cell transitions, the free CellTrack91 application was used. The coordinates of positions within the cells were obtained by translating the Cell-IDs to geographical coordinates through queries to the Google Location API, as described in our MASS paper. |
| sanitization | The Cell Global Identity of the cell the mobile phone is attached to is only partially anonymized. The Mobile Country Code (MCC) and the Mobile Network Code (MNC) keep their original values, so that the country in which the phone was present and the provider that serviced it can be distinguished. The Location Area Code (LAC) and the Cell-ID are anonymized, that is, renumbered according to the time of their first occurrence in the dataset. This approach does not limit the usefulness of the data, but it spares the mobile providers the concern of having their Cell-IDs exposed together with the approximate geographical coordinates of the cells. This geographical information, the longitude/latitude coordinates of a cell, is not anonymized and thus makes it possible to reconstruct a ground-truth movement trajectory of the mobile phone. |
| limitation | The spatial accuracy of the data is typical for a cellular network. It depends on the cell size and thus varies from tens to hundreds of meters in urban areas to several kilometers in rural areas. |
| hole | There are only three gaps in the data, when the cell-tracking application was turned off by accident: from 02-Oct-2010 22:42:06 to 03-Oct-2010 07:58:04, from 05-Oct-2010 15:08:42 to 05-Oct-2010 15:22:42, and from 09-Oct-2010 13:40:18 to 09-Oct-2010 15:49:32. Otherwise, the mobile phone was never switched off during the measurement period, except on board an airborne plane. |
| error | The positions within the cells were obtained by querying the Google Location API. In our MASS paper, we showed, by comparison with data obtained from a large, cooperating mobile network provider, that the accuracy of this approach is close to that of the cellular network operator's own approximation of the position inside a cell. |
| tracesets included | ctu/personal/mobile (v. 2012-03-15) |
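The three gaps listed in the "hole" field can be encoded so that analyses skip intervals in which no cell transitions were recorded. The following is a minimal Python sketch, not part of the dataset tooling; the names `GAPS`, `in_gap` and `total_gap_seconds` are hypothetical. The metadata does not say whether the gap times are UTC or local, so the sketch uses naive timestamps and leaves that choice to the user.

```python
from datetime import datetime

# Gap boundaries copied from the "hole" field of the metadata.
# ASSUMPTION: the time zone of these timestamps is not stated, so they are
# kept as naive datetimes; compare them only with timestamps in the same zone.
GAPS = [
    (datetime(2010, 10, 2, 22, 42, 6), datetime(2010, 10, 3, 7, 58, 4)),
    (datetime(2010, 10, 5, 15, 8, 42), datetime(2010, 10, 5, 15, 22, 42)),
    (datetime(2010, 10, 9, 13, 40, 18), datetime(2010, 10, 9, 15, 49, 32)),
]


def in_gap(timestamp):
    """Return True if timestamp lies inside one of the documented gaps."""
    return any(start <= timestamp <= end for start, end in GAPS)


def total_gap_seconds():
    """Return the summed length of the documented gaps in seconds."""
    return sum((end - start).total_seconds() for start, end in GAPS)
```

A record whose timestamp satisfies `in_gap` should be treated as having an unknown cell attachment rather than as evidence that the phone stayed in the last recorded cell.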
[Traceset] ctu/personal/mobile (v. 2012-03-15)
| version | v. 2012-03-15 |
| changes | the initial version. |
| bibtex |
@MISC{ctu-personal-mobile-2012-03-15,
author = {Michal Ficek},
title = {{CRAWDAD} trace set ctu/personal/mobile (v. 2012-03-15)},
howpublished = {Downloaded from http://crawdad.cs.dartmouth.edu/ctu/personal/mobile},
month = mar,
year = 2012
}
|
| metadata last modified | 2012-03-15 |
| summary | This traceset contains 142 days of mobile phone records (aka Call Data Records) and ground-truth movement description of Czech Ph.D. student Michal Ficek, stored by his own mobile terminal in 2010-2011. |
| release date | 2012-03-15 |
| measurement start | 2010-08-16 |
| measurement end | 2011-02-06 |
| measurement purposes | User Mobility Characterization, Usage Characterization, Positioning Systems, Social Network Analysis, Human Behavior Modeling, Localization |
| methodology | On a Nokia E52 mobile phone (firmware version 054.003) we ran the publicly available application LogExport 1.1 UTC (http://tinyhack.com/freewarelist/s603rd/2007/03/02/logexport/) to record both the time and the type of communication events. To record cell transitions, the free application CellTrack91 1.0.9 (http://www.afischer-online.de/sos/celltrack/) was used. Every week during the measurement period, the data from both applications were stored and the cell coordinates were obtained from the Google Location API. The mobile phone was always carried by the dataset author. |
| sanitization | The Cell Global Identity of the cell the mobile phone is attached to is only partially anonymized. The Location Area Code (LAC) and the Cell-ID are anonymized, that is, renumbered according to the time of their first occurrence in the dataset. The Mobile Country Code (MCC) and Mobile Network Code (MNC) remain intact and are not anonymized. |
| limitation | The spatial accuracy of the data is typical for a cellular network. It depends on the cell size and thus varies from tens to hundreds of meters in urban areas to several kilometers in rural areas. |
| hole | There are only three gaps in the data, when the cell-tracking application was turned off by accident: from 02-Oct-2010 22:42:06 to 03-Oct-2010 07:58:04, from 05-Oct-2010 15:08:42 to 05-Oct-2010 15:22:42, and from 09-Oct-2010 13:40:18 to 09-Oct-2010 15:49:32. Otherwise, the mobile phone was never switched off during the measurement period, except on board an airborne plane. |
| error | The positions within the cells were obtained by querying the Google Location API. In our MASS paper, we showed, by comparison with data obtained from a large, cooperating mobile network provider, that the accuracy of this approach is close to that of the cellular network operator's own approximation of the position inside a cell. |
| download url | Download (24KB gz) (MD5 Hash: be33b354956287a768fb5446594d5900) from US UK AU |
| download url | Download (320KB gz) (MD5 Hash: 6ce11990c64d107c7ef55c1c94eb223c) from US UK AU |
| parent data | ctu/personal (v. 2012-03-15) |
| traces included | ctu/personal/mobile/2010 (v. 2012-03-15) |
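The renumbering described in the sanitization field (identifiers replaced by numbers in the order of their first occurrence) can be illustrated with a short Python sketch. This is only an illustration of the general technique, not the author's actual anonymization script. The function name `renumber_by_first_occurrence`, the starting index of 1, and the sample identifiers are assumptions; the metadata also does not say whether LAC and Cell-ID were renumbered independently or as pairs.

```python
def renumber_by_first_occurrence(values, start=1):
    """Replace each distinct value by a counter assigned at its first appearance.

    ASSUMPTION: numbering starts at `start` (1 by default); the metadata does
    not state the first number used in the published trace.
    """
    mapping = {}
    renumbered = []
    for value in values:
        if value not in mapping:
            # First time this identifier is seen: give it the next free number.
            mapping[value] = start + len(mapping)
        renumbered.append(mapping[value])
    return renumbered, mapping


# Made-up identifiers in chronological order (not taken from the trace).
original_ids = [5012, 77, 5012, 9]
anonymized, mapping = renumber_by_first_occurrence(original_ids)
```

Because the mapping is a bijection on the observed identifiers, cell transitions and revisit patterns are preserved, while the operator's real LAC and Cell-ID values are not exposed.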
[Trace] ctu/personal/mobile/2010 (v. 2012-03-15)
| version | v. 2012-03-15 |
| changes | the initial version |
| bibtex |
@MISC{ctu-personal-mobile-2010-2012-03-15,
author = {Michal Ficek},
title = {{CRAWDAD} trace ctu/personal/mobile/2010 (v. 2012-03-15)},
howpublished = {Downloaded from http://crawdad.cs.dartmouth.edu/ctu/personal/mobile/2010},
month = mar,
year = 2012
}
|
| metadata last modified | 2012-03-15 |
| summary | This trace covers 142 days of mobile phone usage by Czech Ph.D. student Michal Ficek, stored by his own mobile terminal in 2010-2011. |
| derived | false |
| release date | 2012-03-15 |
| measurement start | 2010-08-16 |
| measurement end | 2011-02-06 |
| configuration | We used the application LogExport 1.1 running on a mobile phone Nokia E52 (fw 054.003). |
| format | The communication trace, ficek_personal_communication.csv, consists of
timestamped records for every voice, text message and data communication,
either outgoing or incoming.
The movement trace, ficek_personal_movement.csv, contains a timestamped list
with the full Cell Global Identity of the cell the phone was attached to (Mobile
Country Code, Mobile Network Code, Location Area Code, and Cell-ID), and the
approximate geographical coordinates of the corresponding cell tower (latitude,
longitude) in non-anonymized form.
Each file has 1 header row.
ficek_personal_communication.csv contains the following fields.
Fields 1-5: "YYYYMMDD","hhmmss (UTC+0)","Type","Direction","Duration".
- The time field "hhmmss" represents the GMT time.
- Type of communication is either "Voice", "SMS" or "Data".
- Communication direction in the "Direction" field is either "Outgoing" (call
made, SMS sent, Data session started), "Incoming" (call or SMS received), or
"Missed call".
- "Duration" field stores the duration in seconds of a call or a data session.
ficek_personal_movement.csv contains fields "YYYYMMDD","hhmmss (UTC+0)","MCC",
"MNC","LAC","CID","Latitude","Longitude","Timezone".
- The time field "hhmmss" represents the GMT time.
The other fields are self-explanatory. ("MCC" stands for the Mobile Country
Code, "MNC" for the Mobile Network Code, "LAC" for the Location Area Code,
"CID" for the Cell-ID.)
To get the local time, the "Timezone" field must be added to the UTC time. The
timezone field already contains the daylight saving time (DST) adjustment. If
MCC=0 and MNC=0, the mobile phone is at a place without signal coverage. If
Latitude and Longitude fields equal zero, the coordinates for the corresponding
cell are unknown. |
| sanitization | The phone numbers of parties communicating with the mobile phone are not present. The Cell Global Identity of the cell the mobile phone is attached to is partially anonymized. The Mobile Country Code (MCC) and Mobile Network Code (MNC) remain intact and are not anonymized. |
| limitation | The spatial accuracy of the data is typical for a cellular network. It depends
on the cell size and thus varies from tens to hundreds of meters in urban areas to
several kilometers in rural areas.
We are aware of two situations where the geographical coordinates of cells in
the data do not correspond to their actual coordinates.
1) Due to the nature of the coordinate-retrieval method, the coordinates of about 13
cells (out of approx. 3700 cells) were not found by the Google Location API
and thus are missing in the trace. Such records have the MCC, MNC, LAC and CID
fields filled, but their Longitude and Latitude fields are set to zero.
2) For a specific reason, all cells that cover the different subway stations in Prague,
the capital of the Czech Republic, share the same geographical coordinates
(50.074297, 14.428297). In fact, however, they are distributed all around Prague. |
| hole | There are only three gaps in the data, when the cell-tracking application was turned off by accident: from 02-Oct-2010 22:42:06 to 03-Oct-2010 07:58:04, from 05-Oct-2010 15:08:42 to 05-Oct-2010 15:22:42, and from 09-Oct-2010 13:40:18 to 09-Oct-2010 15:49:32. Otherwise, the mobile phone was never switched off during the measurement period, except on board an airborne plane. |
| error | The positions within the cells were obtained by querying the Google Location API. In our MASS paper, we showed, by comparison with data obtained from a large, cooperating mobile network provider, that the accuracy of this approach is close to that of the cellular network operator's own approximation of the position inside a cell. |
| parent data | ctu/personal/mobile (v. 2012-03-15) |
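The field description of ficek_personal_movement.csv given in the format field can be turned into a small reader. The sketch below is a hedged example rather than an official parser: the function name `read_movement`, the dictionary keys and the two sample rows are made up for illustration. Most importantly, the metadata does not state the unit of the "Timezone" field, so the sketch assumes hours and exposes this as the `tz_unit` parameter.

```python
import csv
import io
from datetime import datetime, timedelta

# Coordinates shared by all Prague subway cells, as stated in the limitation field.
PRAGUE_SUBWAY = (50.074297, 14.428297)


def read_movement(fileobj, tz_unit=timedelta(hours=1)):
    """Yield one dict per record of the movement trace.

    ASSUMPTION: "Timezone" is an offset in hours (tz_unit); the metadata only
    says it must be added to the UTC time and already includes DST.
    """
    reader = csv.reader(fileobj)
    next(reader)  # each file has exactly one header row
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        date, time, mcc, mnc, lac, cid, lat, lon, tz = row
        # zfill guards against a time field that lost its leading zeros.
        utc = datetime.strptime(date + time.zfill(6), "%Y%m%d%H%M%S")
        lat, lon = float(lat), float(lon)
        yield {
            "utc": utc,
            "local": utc + float(tz) * tz_unit,
            "cell": (int(mcc), int(mnc), int(lac), int(cid)),
            "position": (lat, lon),
            # MCC=0 and MNC=0 mark a place without signal coverage.
            "no_coverage": int(mcc) == 0 and int(mnc) == 0,
            # Zero latitude and longitude mark a cell with unknown coordinates.
            "coords_unknown": lat == 0 and lon == 0,
            # All Prague subway cells share one placeholder coordinate pair.
            "prague_subway": (lat, lon) == PRAGUE_SUBWAY,
        }


# Two made-up rows in the documented column order (not taken from the trace).
SAMPLE = io.StringIO(
    '"YYYYMMDD","hhmmss (UTC+0)","MCC","MNC","LAC","CID",'
    '"Latitude","Longitude","Timezone"\n'
    '"20100816","093000","230","3","1","1","50.074297","14.428297","2"\n'
    '"20100816","093500","0","0","0","0","0","0","2"\n'
)
records = list(read_movement(SAMPLE))
```

Records flagged `coords_unknown` or `prague_subway` are best excluded from distance or trajectory computations, since their coordinates do not reflect the real cell position. With the real file, `read_movement(open("ficek_personal_movement.csv", newline=""))` would be used in place of the sample.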
[Author] Michal Ficek
| email | michal.ficek@fel.cvut.cz |
| institution | Czech Technical University in Prague |
| department | Electrical Engineering |
| position | Ph.D. Student |
| address | Technicka 2, 166 27, Prague, Czech Republic |
| phone | 00420-606-842-803 |
| web site | http://www.rdc.cz/en/people/ficek |
| related data/tools | ctu/personal (v. 2012-03-15) |
[Paper] ficek-intercall
| category | inproceedings |
| authors | Michal Ficek Lukas Kencl |
| title | Inter-Call Mobility Model: A Spatio-temporal Refinement of Call Data Records Using a Gaussian Mixture Model |
| booktitle | Proceedings of the 31st Annual IEEE International Conference on Computer Communications (INFOCOM'2012) |
| address | Orlando, Florida, USA |
| download url | http://www.rdc.cz/download/publications/p469-ficek.pdf |
| month | March |
| year | 2012 |
| abstract | With global mobile phone penetration nearing 100%, cellular Call Data Records (CDRs) provide a large-scale and ubiquitous, but also sparse and skewed snapshot of human mobility. It may be difficult or inappropriate to reach strong conclusions about user movement based on such data without proper understanding of user movement between call records. Based on an analysis of a real-world trace, we propose a novel, probabilistic Inter-Call Mobility (ICM) model of users' position in between calls. The ICM model combines Gaussian mixtures to build a general, comprehensive spatio-temporal refinement of CDRs. We demonstrate that ICM model's application yields strikingly different conclusions to the existing models when applied to basic CDR analyses, such as user proximity probability. |
| publisher | IEEE |
| keywords | wireless |
| keywords | measurement |
| keywords | ctu_personal |
| related data/tools | ctu/personal |
[Paper] ficek-spatial
| category | inproceedings |
| authors | Michal Ficek Lukas Kencl |
| title | Spatial extension of the reality mining dataset |
| booktitle | IEEE 7th International Conference on Mobile Adhoc and Sensor Systems (MASS) 2010 |
| pages | 666-673 |
| year | 2010 |
| address | San Francisco, CA |
| download url | http://meltworks.org/MELT_Workshop/Program_files/ficek-kencl.pdf |
| month | November |
| publisher | IEEE |
| keywords | wireless |
| keywords | measurement |
| keywords | ctu_personal |
| related data/tools | ctu/personal |


