NEPOMUK Deliverable D4.1: Distributed Search System - Basic Infrastructure

Last modified by StephaneLauriere on 2007/01/15 00:45


D4.1: Distributed Search System - Basic Infrastructure
Publication date: 31/12/2006
Dissemination level: Public

Executive summary

The goal of this document is to describe the basic infrastructure of the Distributed Search System, which is realized by the Distributed Index (DI) component. It focuses on the functionality available in the first Nepomuk prototype, which will later be extended with additional operations to cover the mandatory functional requirements.

More specifically, the requirement of finding information on remote desktops raised the need for a component that performs distributed search within a group of Nepomuk users. DI is the component identified in the Nepomuk architecture for this task. The implemented component is based on P-Grid, a highly scalable structured overlay network, which was chosen after a number of state-of-the-art approaches had been investigated and evaluated for their ability to meet the functional and non-functional requirements of the Nepomuk case studies.

Section 2 presents the identified functional and non-functional requirements. They result from analyzing the case studies and from information gathered through a detailed questionnaire distributed to the responsible partners. In addition, commonly deployed P2P applications have been evaluated to extract further requirements for the targeted system. Meeting the complete set of non-functional requirements is, however, very challenging, since several of them conflict with each other and call for trade-offs.

Section 3 evaluates state-of-the-art solutions for distributed search. In particular, a number of P2P overlay networks have been designed to provide features such as scalability, fault tolerance and load balancing. Well-known approaches, which are briefly described, include Chord, Pastry, CAN, Gnutella, Edutella, RDFGrowth and Freenet. P-Grid, however, has certain advantages over them: its hybrid architecture combines unstructured and structured overlays; it preserves key order, which is essential for complex search operations such as range queries and similarity queries; and it adapts well to varying environment conditions, e.g., the churn rate. It has therefore been selected for implementing Nepomuk's distributed search task.
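Why key-order preservation matters for range queries can be illustrated with a minimal sketch. It assumes a toy key space of 8-bit integers and a fixed, hypothetical set of peer paths standing in for the leaves of P-Grid's virtual binary trie; the names `PEER_PATHS`, `range_query_peers` and the other helpers are illustrative only and do not reflect the actual P-Grid implementation, its API, or its order-preserving hash functions. Because keys keep their numeric order in the trie, a range query only reaches a contiguous run of peers. For contrast, the sketch also places keys with a uniform hash in the style of DHTs such as Chord, which scatters neighbouring values across unrelated peers.

```python
import hashlib

KEY_BITS = 8  # hypothetical: keys are integers in [0, 256)

# Hypothetical peer paths forming a complete, prefix-free partition of the
# key space, like the leaves of a binary trie: every 8-bit key starts with
# exactly one of these prefixes.
PEER_PATHS = ["00", "01", "10", "110", "111"]


def to_key(value: int) -> str:
    """Order-preserving mapping: the fixed-width binary form keeps numeric order."""
    return format(value, f"0{KEY_BITS}b")


def responsible_peer(value: int) -> str:
    """Return the path of the peer whose prefix matches the key of `value`."""
    key = to_key(value)
    return next(p for p in PEER_PATHS if key.startswith(p))


def path_interval(path: str) -> tuple[int, int]:
    """Smallest and largest key value covered by a path prefix."""
    free_bits = KEY_BITS - len(path)
    low = int(path, 2) << free_bits
    return low, low + (1 << free_bits) - 1


def range_query_peers(lo: int, hi: int) -> list[str]:
    """Peers whose key interval overlaps [lo, hi].

    Sorting prefix-free paths lexicographically yields key order, so the
    result is always a contiguous run of neighbouring trie leaves.
    """
    peers = []
    for path in sorted(PEER_PATHS):
        p_lo, p_hi = path_interval(path)
        if p_lo <= hi and lo <= p_hi:
            peers.append(path)
    return peers


def uniform_hash_peer(value: int) -> str:
    """Contrast: a uniform hash (SHA-1 here) destroys key order, so adjacent
    values are assigned to unrelated peers and a range query must ask all."""
    digest = hashlib.sha1(str(value).encode()).digest()
    key = format(digest[0], "08b")  # first 8 hash bits select the leaf
    return next(p for p in PEER_PATHS if key.startswith(p))


if __name__ == "__main__":
    print(range_query_peers(100, 200))   # contiguous peers covering the range
    print(responsible_peer(200))         # peer owning a single key
    print([uniform_hash_peer(v) for v in range(100, 105)])
```

With order preservation, the query for values 100 to 200 only contacts the peers `01`, `10` and `110`, which are adjacent in the trie; with uniform hashing, the peer owning each value is essentially random, so no subset of peers can be ruled out in advance.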

Section 4 describes the API of the DI component (defined in WSDL) and the functionality it provides, and discusses its interactions with the other Nepomuk components. Section 4.3 then provides important information on the core concepts, the architecture and the implementation of P-Grid. A preliminary evaluation, based on locally performed experiments, is also given. Section 4.5 investigates in more depth the nature and context of the exchanged social metadata. Storage, ontologies and instances are discussed in further detail, as identified by the personal workspace model developed in WP2000.

In summary, this document presents the complete procedure followed for distributed search: identifying the requirements, evaluating the existing solutions, defining the interactions and developing the first prototype. The preliminary results show promising performance for the Nepomuk case studies.

NEPOMUK - The Social Semantic Desktop - FP6-027705

Nepomuk Consortium 2006-2008