Evaluating Personal Search Workshop | ECIR2011, 18 April 2011, Dublin, Ireland

Guinness Storehouse - workshop venue

Datasets

Jinyoung Kim and Bruce Croft have created simulated desktop collections and user queries. Interested participants can use these resources prior to the workshop in various ways -- analyzing the characteristics of the data, evaluating various retrieval methods using the queries and result sets, suggesting a better data set for evaluating personal search, etc. Details on these resources follow.

Pseudo-desktop Collections & Queries

Pseudo-desktop collections [1] were built to contain the file types typical of desktop collections, such as emails, web pages (html) and office documents (pdf, doc and ppt), related to specific individuals. Documents were collected by filtering the W3C email collection and by using the Yahoo! web search API. Queries were generated by taking terms from each of the target documents; the generated queries were later validated against a separate set of hand-written queries.

  1. Kim, J. and Croft, W. B. (2009) Retrieval experiments using pseudo-desktop collections. In CIKM ’09, pp. 1297–1306. ACM.
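
As a rough illustration of generating a query by taking terms from a target document, the sketch below samples distinct terms with probability proportional to their frequency in the document. This is a simplified assumption, not the procedure from [1]: the function name generate_query, the tokenization rule, and the frequency weighting are all hypothetical choices made for this example.

```python
import random
import re
from collections import Counter


def generate_query(document_text, num_terms=3, seed=None):
    """Sample distinct query terms from one target document.

    Terms are drawn with probability proportional to their frequency
    in the document (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    # Simple lowercase alphanumeric tokenization (hypothetical choice).
    terms = re.findall(r"[a-z0-9]+", document_text.lower())
    if not terms:
        return []
    counts = Counter(terms)
    vocabulary = sorted(counts)
    weights = [counts[term] for term in vocabulary]
    # Never ask for more distinct terms than the document contains.
    wanted = min(num_terms, len(vocabulary))
    query = []
    while len(query) < wanted:
        term = rng.choices(vocabulary, weights=weights, k=1)[0]
        if term not in query:
            query.append(term)
    return query
```

Calling generate_query(text, num_terms=2, seed=0) on the text of a target document returns two distinct terms from that document; the document itself then serves as the known relevant item for the simulated query.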


Computer Science Collection & Queries with session logs

The computer science (CS) collection [1] was created for the evaluation of desktop search. Documents of various types were collected from many public sources in the Department of Computer Science, University of Massachusetts Amherst. Known-item queries were collected from members of the same department using the DocTrack game.

  1. Kim, J. and Croft, W. B. (2010) Ranking using multiple document types in desktop search. In SIGIR ’10, New York, NY, USA, 2010. ACM.
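
Because each known-item query has a single target document, one common way to score a retrieval method on such queries is mean reciprocal rank (MRR). The sketch below is a minimal example under that assumption; the function names and the input layout (a ranked list of document ids paired with the target id) are hypothetical and do not reflect the format of the distributed result sets.

```python
def reciprocal_rank(ranked_doc_ids, target_doc_id):
    """Return 1/rank of the target document, or 0.0 if it was not retrieved."""
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id == target_doc_id:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average the reciprocal rank over (ranked_doc_ids, target_doc_id) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    total = sum(reciprocal_rank(ranked, target) for ranked, target in runs)
    return total / len(runs)
```

A query whose target document is ranked first contributes 1.0, one whose target is ranked second contributes 0.5, and one whose target is missing from the ranking contributes 0.0.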


How To Get the Datasets

To obtain the datasets, please send a brief introduction and a statement agreeing to the terms of use below to Jinyoung Kim (jykim at cs dot umass dot edu). Once your identity is verified, you will be given links to download the datasets, along with detailed instructions on how to use them.


Terms of Use

The dataset may only be used for internal evaluations and research purposes, and must not be shared with other parties without prior written permission. Any use of the dataset, including publication of excerpts of the data, shall not infringe on the rights of any third party, including, but not limited to, the authors and publishers of the excerpts and any individuals named in the dataset. Publication of dataset excerpts that reveal an individual's identity is prohibited. This dataset is derived from TREC data, so there are access limitations. Please contact Jinyoung Kim (jykim at cs dot umass dot edu) to inquire about access to the dataset.


Disclaimer of Warranty

USER ACKNOWLEDGES AND AGREES THAT "DATA" RECEIVED ARE PROVIDED BY THE UNIVERSITY OF MASSACHUSETTS AMHERST AND OTHER CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR DATA CONTRIBUTORS BE LIABLE FOR SPECIAL, DIRECT, INDIRECT, CONSEQUENTIAL, PUNITIVE, INCIDENTAL OR OTHER DAMAGES, LOSSES, COSTS, CHARGES, CLAIMS, DEMANDS, FEES OR EXPENSES OF ANY NATURE OR KIND ARISING IN ANY WAY FROM THE FURNISHING OF OR USER’S USE OF THE DATA RECEIVED.