CRAWDAD metadata: ctu/personal (v. 2012-03-15)

This dataset contains 142 days of mobile phone records (aka Call Data Records) and ground-truth movement description of Czech Ph.D. student Michal Ficek, stored by his own mobile terminal in 2010-2011.

Note: This metadata was prepared by the CRAWDAD team and verified by the data set (or tool) authors. We have made every effort to ensure its accuracy, but urge all users to consider the metadata and data carefully and be sure that their use in research is consistent with the nature and limitations of the data. We welcome any corrections. This metadata was prepared based on the following reference(s):


    CRAWDAD metadata structure


    [Dataset] ctu/personal (v. 2012-03-15)

    version v. 2012-03-15
    changes
    the initial version
    bibtex
    @MISC{ctu-personal-2012-03-15,
      author = {Michal Ficek},
      title = {{CRAWDAD} data set ctu/personal (v. 2012-03-15)}, 
      howpublished = {Downloaded from http://crawdad.cs.dartmouth.edu/ctu/personal},
      month = mar,  
      year = 2012
    }
    					
    metadata last modified 2012-03-15
    summary
    This dataset contains 142 days of mobile phone records (aka Call Data Records)
    and ground-truth movement description of Czech Ph.D. student Michal Ficek, 
    stored by his own mobile terminal in 2010-2011.
    release date 2012-03-15
    measurement start 2010-08-16
    measurement end 2011-02-06
    authors Michal Ficek
    web site http://www.crawdad.org/ctu/personal
    keyword cellular network, location
    measurement purposes User Mobility Characterization
    Usage Characterization
    Positioning Systems
    Social Network Analysis
    Human Behavior Modeling
    Localization
    network type GSM (Global System for Mobile Communications)
    network type cellular network
    environment
    This dataset contains 142 days of mobile phone records (also known as Call Data
    Records) and cell transitions (a ground-truth movement description) of Czech
    Ph.D. student Michal Ficek, stored by his own mobile terminal in 2010-2011.
    The dataset covers more than 99.99% of 142 days of mobile phone usage in the
    networks of 8 different providers in 5 countries: the Czech Republic, the
    Slovak Republic, Germany, Austria and the USA.
    network
    The phone was served mostly by Vodafone Czech Republic, the user's home
    network, in the Czech Republic. The other network providers, in countries
    abroad, were: Orange (Slovakia), A1 Telekom (Austria), T-Mobile Deutschland,
    Vodafone D2 and O2 (Germany), and T-Mobile and AT&T (USA).
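    Because the MCC is kept in clear, the country of each record can be recovered
    from it. The following is a minimal Python sketch, assuming the MCC and MNC
    columns parse as integers. The MCC-to-country table uses the public ITU-T
    E.212 assignments for the five countries named above; it is not part of the
    dataset itself, and the helper name `country_of` is hypothetical.

```python
# MCC assignments (ITU-T E.212) for the five countries covered by the dataset.
# The USA is assigned a range of MCCs (310-316), so all of them are mapped.
MCC_TO_COUNTRY = {
    230: "Czech Republic",
    231: "Slovak Republic",
    232: "Austria",
    262: "Germany",
    **{mcc: "USA" for mcc in range(310, 317)},
}


def country_of(mcc: int, mnc: int) -> str:
    """Map a record's MCC/MNC to a country name (hypothetical helper)."""
    if mcc == 0 and mnc == 0:
        # Per the trace format notes, MCC=0 and MNC=0 marks a place without coverage.
        return "no coverage"
    return MCC_TO_COUNTRY.get(mcc, "unknown")


print(country_of(230, 3))  # Czech Republic
```

    Grouping records by `country_of(...)` gives per-country time or event counts;
    the MNC can then be used within a country to separate providers.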
    collection
    The source of the data is the user's own Nokia E52 mobile phone. The publicly
    available LogExport application was used to record the time and type of
    communication events (voice, SMS, data). The free CellTrack91 application was
    used to record cell transitions. The coordinates of positions within the cells
    were obtained by translating the Cell-IDs into geographical coordinates via
    queries to the Google Location API, as described in our MASS paper.
    sanitization
    The Cell Global Identity of the cell the mobile phone is attached to is only
    partially anonymized. The Mobile Country Code (MCC) and the Mobile Network
    Code (MNC) keep their original values, so that it is possible to tell in which
    country the mobile phone was present and which provider served it. The
    Location Area Code (LAC) and the Cell-ID are anonymized, i.e. renumbered
    according to the time of their first occurrence in the dataset. This approach
    does not restrict the use of the data, but it spares the mobile providers from
    having their Cell-IDs exposed together with the approximate geographical
    coordinates of the cells. This geographical information, the longitude/latitude
    coordinates of each cell, is not anonymized and thus makes it possible to
    reconstruct a ground-truth movement trajectory of the mobile phone.
    limitation
    The spatial accuracy of the data is typical for a cellular network. It depends
    on the cell size and thus varies from tens to hundreds of meters in urban areas
    to several kilometers in rural areas.
    hole
    There are only three gaps in the data, when the cell-tracking application was
    accidentally turned off: from 02-Oct-2010 22:42:06 to 03-Oct-2010 07:58:04, from
    05-Oct-2010 15:08:42 to 05-Oct-2010 15:22:42, and from 09-Oct-2010 13:40:18 to
    09-Oct-2010 15:49:32. Otherwise, the mobile phone was never switched off during
    the measurement period, except when on board an airborne plane.
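    When computing statistics over the movement trace, records that fall into
    these gaps can be flagged or excluded. A minimal Python sketch; it assumes the
    gap boundaries are given in UTC, like the timestamps in the trace files, which
    the metadata does not state explicitly. The names `GAPS` and `in_gap` are
    hypothetical.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc

# The three documented gaps in cell tracking.
# ASSUMPTION: boundaries are UTC, matching the trace timestamps.
GAPS = [
    (datetime(2010, 10, 2, 22, 42, 6, tzinfo=UTC),
     datetime(2010, 10, 3, 7, 58, 4, tzinfo=UTC)),
    (datetime(2010, 10, 5, 15, 8, 42, tzinfo=UTC),
     datetime(2010, 10, 5, 15, 22, 42, tzinfo=UTC)),
    (datetime(2010, 10, 9, 13, 40, 18, tzinfo=UTC),
     datetime(2010, 10, 9, 15, 49, 32, tzinfo=UTC)),
]


def in_gap(ts: datetime) -> bool:
    """True if the timezone-aware timestamp `ts` lies inside a documented gap."""
    return any(start <= ts <= end for start, end in GAPS)


# Total time not covered by cell tracking.
total_missing = sum((end - start for start, end in GAPS), timedelta())
print(total_missing)  # 11:39:12
```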
    error
    The positions within the cells were obtained by querying the Google Location
    API. In our MASS paper, by comparison with data obtained from a large,
    cooperating mobile network provider, we showed that the accuracy of this
    approach is close to that of the cellular network operator's own approximation
    of position inside a cell.
    tracesets included ctu/personal/mobile (v. 2012-03-15)

    [Traceset] ctu/personal/mobile (v. 2012-03-15)

    version v. 2012-03-15
    changes
    the initial version.
    bibtex
    @MISC{ctu-personal-mobile-2012-03-15,
      author = {Michal Ficek},
      title = {{CRAWDAD} trace set ctu/personal/mobile (v. 2012-03-15)}, 
      howpublished = {Downloaded from http://crawdad.cs.dartmouth.edu/ctu/personal/mobile},
      month = mar,  
      year = 2012
    }
    					
    metadata last modified 2012-03-15
    summary
    This traceset contains 142 days of mobile phone records (aka Call Data Records)
    and ground-truth movement description of Czech Ph.D. student Michal Ficek, 
    stored by his own mobile terminal in 2010-2011.
    release date 2012-03-15
    measurement start 2010-08-16
    measurement end 2011-02-06
    measurement purposes User Mobility Characterization
    Usage Characterization
    Positioning Systems
    Social Network Analysis
    Human Behavior Modeling
    Localization
    methodology
    On a Nokia E52 mobile phone (firmware version 054.003), we ran the publicly
    available application LogExport 1.1 UTC
    (http://tinyhack.com/freewarelist/s603rd/2007/03/02/logexport/) to record both
    the time and the type of communication events. Cell transitions were recorded
    with the free CellTrack91 1.0.9 application
    (http://www.afischer-online.de/sos/celltrack/). Every week during the
    measurement period, the data from both applications were saved, and the cell
    coordinates were obtained from the Google Location API. The mobile phone was
    always carried by the dataset author.
    sanitization
    The Cell Global Identity of the cell the mobile phone is attached to is only
    partially anonymized. The Location Area Code (LAC) and the Cell-ID are
    anonymized, i.e. renumbered according to the time of their first occurrence
    in the dataset. The Mobile Country Code (MCC) and the Mobile Network Code (MNC)
    remain intact and are not anonymized.
    limitation
    The spatial accuracy of the data is typical for a cellular network. It depends
    on the cell size and thus varies from tens to hundreds of meters in urban areas
    to several kilometers in rural areas.
    hole
    There are only three gaps in the data, when the cell-tracking application was
    accidentally turned off: from 02-Oct-2010 22:42:06 to 03-Oct-2010 07:58:04, from
    05-Oct-2010 15:08:42 to 05-Oct-2010 15:22:42, and from 09-Oct-2010 13:40:18 to
    09-Oct-2010 15:49:32. Otherwise, the mobile phone was never switched off during
    the measurement period, except when on board an airborne plane.
    error
    The positions within the cells were obtained by querying the Google Location
    API. In our MASS paper, by comparison with data obtained from a large,
    cooperating mobile network provider, we showed that the accuracy of this
    approach is close to that of the cellular network operator's own approximation
    of position inside a cell.
    download url Download (24KB gz)
    (MD5 hash: be33b354956287a768fb5446594d5900)
    download url Download (320KB gz)
    (MD5 hash: 6ce11990c64d107c7ef55c1c94eb223c)
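    Before use, a downloaded archive can be checked against the MD5 hash listed
    next to it. A minimal Python sketch using the standard `hashlib` module; the
    helper names are hypothetical, and the metadata does not give the archive file
    names. MD5 guards against corrupted downloads, not against deliberate
    tampering.

```python
import hashlib
import io
from pathlib import Path
from typing import BinaryIO


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 16) -> str:
    """Hex MD5 digest of a binary stream, read in chunks to bound memory use."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path: str, expected: str) -> bool:
    """Compare a downloaded file against the MD5 hash listed on the download page."""
    with Path(path).open("rb") as f:
        return md5_of_stream(f) == expected.strip().lower()


# Sanity check on an in-memory buffer (MD5 of b"abc" is a standard test vector).
print(md5_of_stream(io.BytesIO(b"abc")))  # 900150983cd24fb0d6963f7d28e17f72
```

    For example, `matches_published_md5("archive-24KB.gz",
    "be33b354956287a768fb5446594d5900")`, with a hypothetical local file name,
    should return `True` for an intact copy of the first archive.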
    parent data ctu/personal (v. 2012-03-15)
    traces included ctu/personal/mobile/2010 (v. 2012-03-15)

    [Trace] ctu/personal/mobile/2010 (v. 2012-03-15)

    version v. 2012-03-15
    changes
    the initial version
    bibtex
    @MISC{ctu-personal-mobile-2010-2012-03-15,
      author = {Michal Ficek},
      title = {{CRAWDAD} trace ctu/personal/mobile/2010 (v. 2012-03-15)}, 
      howpublished = {Downloaded from http://crawdad.cs.dartmouth.edu/ctu/personal/mobile/2010},
      month = mar,  
      year = 2012
    }
    					
    metadata last modified 2012-03-15
    summary
    This trace covers 142 days of mobile phone usage by Czech Ph.D. student Michal Ficek, stored by his own mobile terminal in 2010-2011.
    derived false
    release date 2012-03-15
    measurement start 2010-08-16
    measurement end 2011-02-06
    configuration
    We used the application LogExport 1.1 running on a mobile phone Nokia E52 (fw 
    054.003).
    format
    The communication trace, ficek_personal_communication.csv, consists of
    timestamped records of every voice call, text message and data communication,
    either outgoing or incoming.
    
    The movement trace, ficek_personal_movement.csv, contains a timestamped list
    of the full Cell Global Identity of the cell the phone was attached to (Mobile
    Country Code, Mobile Network Code, Location Area Code, and Cell-ID), together
    with the approximate geographical coordinates of the corresponding cell tower
    (longitude, latitude), which are not anonymized.
    
    Each file has 1 header row.
    
    ficek_personal_communication.csv contains the following fields.
    
    Fields 1-5: "YYYYMMDD","hhmmss (UTC+0)","Type","Direction","Duration".
    - The time field "hhmmss" represents the GMT time.
    - Type of communication is either "Voice", "SMS" or "Data". 
    - Communication direction in the "Direction" field is either "Outgoing" (call 
      made, SMS sent, Data session started), "Incoming" (call or SMS received), or 
      "Missed call".
    - "Duration" field stores the duration in seconds of a call or a data session. 
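    A minimal Python sketch for reading ficek_personal_communication.csv and
    totalling voice-call durations per direction. It relies only on the column
    order documented above and additionally assumes comma-separated values, that
    "hhmmss" may lack leading zeros, and that an empty "Duration" (e.g. for an
    SMS) means zero. The function names are hypothetical, and the sample rows are
    invented for illustration, not taken from the dataset.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, Iterable, Iterator, TextIO


def read_communications(f: TextIO) -> Iterator[Dict]:
    """Yield one dict per communication event from the CSV stream `f`."""
    reader = csv.reader(f)
    next(reader)  # skip the single header row
    for date, hms, kind, direction, duration in reader:
        # Date and time are UTC+0; zfill tolerates missing leading zeros.
        ts = datetime.strptime(date + hms.zfill(6), "%Y%m%d%H%M%S")
        yield {
            "time_utc": ts.replace(tzinfo=timezone.utc),
            "type": kind,            # "Voice", "SMS" or "Data"
            "direction": direction,  # "Outgoing", "Incoming" or "Missed call"
            "duration_s": int(duration) if duration.strip() else 0,
        }


def voice_seconds_by_direction(records: Iterable[Dict]) -> Counter:
    """Sum durations (in seconds) of voice events, keyed by direction."""
    totals: Counter = Counter()
    for rec in records:
        if rec["type"] == "Voice":
            totals[rec["direction"]] += rec["duration_s"]
    return totals


# Invented sample rows, laid out as documented above.
sample = io.StringIO(
    '"YYYYMMDD","hhmmss (UTC+0)","Type","Direction","Duration"\n'
    '"20100816","073015","Voice","Outgoing","125"\n'
    '"20100816","081200","SMS","Incoming",""\n'
    '"20100816","090000","Voice","Incoming","60"\n'
)
print(voice_seconds_by_direction(read_communications(sample)))
```

    Reading by position rather than by header text keeps the sketch working even
    if the exact header strings in the file differ slightly from the documentation.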
    
    ficek_personal_movement.csv contains fields "YYYYMMDD","hhmmss (UTC+0)","MCC",
    "MNC","LAC","CID","Latitude","Longitude","Timezone".
    - The time field "hhmmss" represents the GMT time.
      The other fields are self-explanatory. ("MCC" stands for the Mobile Country 
      Code, "MNC" for the Mobile Network Code, "LAC" for the Location Area Code,
      "CID" for the Cell-ID.)
    
    To get the local time, add the value of the "Timezone" field to the UTC time.
    The "Timezone" field already includes the daylight saving time (DST)
    adjustment. If MCC=0 and MNC=0, the mobile phone was in an area without signal
    coverage. If the Latitude and Longitude fields are both zero, the coordinates
    of the corresponding cell are unknown.
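    The rules above (local time = UTC time + "Timezone", MCC=MNC=0 for no
    coverage, zero coordinates for unknown cells) can be applied while reading
    ficek_personal_movement.csv. A minimal Python sketch relying on the documented
    column order. The metadata does not say how the "Timezone" value is written:
    `parse_offset` assumes a plain number of hours such as "1" or "-5" and would
    need adjusting if the file uses another notation. The class and function names
    and the sample row are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterator, TextIO


@dataclass
class CellRecord:
    time_utc: datetime
    mcc: int
    mnc: int
    lac: int  # anonymized (renumbered)
    cid: int  # anonymized (renumbered)
    lat: float
    lon: float
    utc_offset: timedelta  # "Timezone" field, DST already included

    @property
    def local_time(self) -> datetime:
        # Converting to a fixed-offset zone equals adding the offset to UTC.
        return self.time_utc.astimezone(timezone(self.utc_offset))

    @property
    def has_coverage(self) -> bool:
        return not (self.mcc == 0 and self.mnc == 0)

    @property
    def has_coordinates(self) -> bool:
        return not (self.lat == 0 and self.lon == 0)


def parse_offset(field: str) -> timedelta:
    # ASSUMPTION: "Timezone" holds an offset in hours, e.g. "1", "+2" or "-5".
    return timedelta(hours=float(field))


def _int(field: str) -> int:
    # ASSUMPTION: an empty field (e.g. LAC/CID without coverage) means 0.
    return int(field) if field.strip() else 0


def read_movements(f: TextIO) -> Iterator[CellRecord]:
    reader = csv.reader(f)
    next(reader)  # skip the single header row
    for date, hms, mcc, mnc, lac, cid, lat, lon, tz in reader:
        ts = datetime.strptime(date + hms.zfill(6), "%Y%m%d%H%M%S")
        yield CellRecord(
            time_utc=ts.replace(tzinfo=timezone.utc),
            mcc=_int(mcc), mnc=_int(mnc), lac=_int(lac), cid=_int(cid),
            lat=float(lat), lon=float(lon),
            utc_offset=parse_offset(tz),
        )


demo = io.StringIO(
    '"YYYYMMDD","hhmmss (UTC+0)","MCC","MNC","LAC","CID","Latitude","Longitude","Timezone"\n'
    '"20101231","233000","230","3","17","42","50.1","14.4","1"\n'
)
for rec in read_movements(demo):
    print(rec.local_time.isoformat(), rec.has_coverage, rec.has_coordinates)
    # 2011-01-01T00:30:00+01:00 True True
```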
    sanitization
    The phone numbers of the parties communicating with the mobile phone are not
    included. The Cell Global Identity of the cell the mobile phone is attached to
    is partially anonymized: the Mobile Country Code (MCC) and Mobile Network Code
    (MNC) remain intact and are not anonymized.
    limitation
    The spatial accuracy of the data is typical for a cellular network. It depends
    on the cell size and thus varies from tens to hundreds of meters in urban areas
    to several kilometers in rural areas.
    
    We are aware of two situations where the geographical coordinates of cells in
    the data do not correspond to their actual coordinates.
    
    1) Due to the nature of the cell-coordinate retrieval method, the coordinates
    of about 13 cells (out of approx. 3700 cells) were not found by the Google
    Location API and are thus missing from the trace. Such records have the MCC,
    MNC, LAC and CID fields filled in, but their Longitude and Latitude fields are
    set to zero.
                        
    2) For a specific reason, all cells covering the different subway stations in
    Prague, the capital of the Czech Republic, share the same geographical
    coordinates (50.074297, 14.428297), although these cells are in fact
    distributed all around Prague.
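    Both situations can be screened out before building a trajectory. The Python
    sketch below drops points with zero coordinates or the shared Prague subway
    position, then sums great-circle (haversine) distances between consecutive
    remaining cell positions. It is an illustration, not a method taken from the
    referenced papers; it assumes a spherical Earth of radius 6371 km, and because
    cell positions only approximate the phone's location (and removed subway
    segments are bridged by a straight line), the result is a rough path length.
    All names are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees

# Position shared by all Prague subway-station cells, per the note above.
PRAGUE_SUBWAY_PLACEHOLDER: Point = (50.074297, 14.428297)
EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def is_reliable(point: Point) -> bool:
    """False for unknown (0, 0) coordinates or the shared subway placeholder."""
    return point != (0.0, 0.0) and point != PRAGUE_SUBWAY_PLACEHOLDER


def haversine_km(p: Point, q: Point) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def path_length_km(points: Iterable[Point]) -> float:
    """Sum of distances between consecutive reliable cell positions."""
    kept: List[Point] = [p for p in points if is_reliable(p)]
    return sum(haversine_km(a, b) for a, b in zip(kept, kept[1:]))


# One degree of longitude at the equator.
print(round(haversine_km((0.0, 0.0), (0.0, 1.0)), 2))  # 111.19
```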
    hole
    There are only three gaps in the data, when the cell-tracking application was
    accidentally turned off: from 02-Oct-2010 22:42:06 to 03-Oct-2010 07:58:04, from
    05-Oct-2010 15:08:42 to 05-Oct-2010 15:22:42, and from 09-Oct-2010 13:40:18 to
    09-Oct-2010 15:49:32. Otherwise, the mobile phone was never switched off during
    the measurement period, except when on board an airborne plane.
    error
    The positions within the cells were obtained by querying the Google Location
    API. In our MASS paper, by comparison with data obtained from a large,
    cooperating mobile network provider, we showed that the accuracy of this
    approach is close to that of the cellular network operator's own approximation
    of position inside a cell.
    parent data ctu/personal/mobile (v. 2012-03-15)

    [Author] Michal Ficek

    email michal.ficek@fel.cvut.cz
    institution Czech Technical University in Prague
    department Electrical Engineering
    position Ph.D. Student
    address Technicka 2, 166 27, Prague, Czech Republic
    phone 00420-606-842-803
    web site http://www.rdc.cz/en/people/ficek
    related data/tools ctu/personal (v. 2012-03-15)

    [Paper] ficek-intercall

    category inproceedings
    authors Michal Ficek
    Lukas Kencl
    title Inter-Call Mobility Model: A Spatio-temporal Refinement of Call Data Records Using a Gaussian Mixture Model
    booktitle Proceedings of the 31st Annual IEEE International Conference on Computer Communications (INFOCOM'2012)
    address Orlando, Florida, USA
    download url http://www.rdc.cz/download/publications/p469-ficek.pdf
    month March
    year 2012
    abstract
    With global mobile phone penetration nearing 100%, cellular Call Data Records 
    (CDRs) provide a large-scale and ubiquitous, but also sparse and skewed 
    snapshot of human mobility. It may be difficult or inappropriate to reach 
    strong conclusions about user movement based on such data without proper 
    understanding of user movement between call records. Based on an analysis of a 
    real-world trace, we propose a novel, probabilistic Inter-Call Mobility (ICM) 
    model of users' position in between calls. The ICM model combines Gaussian 
    mixtures to build a general, comprehensive spatio-temporal refinement of CDRs. 
    We demonstrate that ICM model's application yields strikingly different 
    conclusions to the existing models when applied to basic CDR analyses, such as 
    user proximity probability.
    publisher IEEE
    keywords wireless
    keywords measurement
    keywords ctu_personal
    related data/tools ctu/personal

    [Paper] ficek-spatial

    category inproceedings
    authors Michal Ficek
    Lukas Kencl
    title Spatial extension of the reality mining dataset
    booktitle IEEE 7th International Conference on Mobile Adhoc and Sensor Systems (MASS) 2010
    pages 666-673
    year 2010
    address San Francisco, CA
    download url http://meltworks.org/MELT_Workshop/Program_files/ficek-kencl.pdf
    month November
    publisher IEEE
    keywords wireless
    keywords measurement
    keywords ctu_personal
    related data/tools ctu/personal