A major challenge in managing a personal dataspace is dealing with its heterogeneity. Heterogeneity relates to the data models and formats used to represent personal information. It also relates to the data sources in which that information is available and to the mechanisms available for data delivery (push/pull). Ideally, a PDSMS has a single, unified representation for all personal information that bridges the divide between different data models and data representations. This unified representation would enable queries that ignore which system stores the personal information, which format is used, and where the data is located. It should not require the semantic integration effort that traditional information integration approaches demand.
Unlike a DBMS, a PDSMS needs to support multiple data models at its core so that it accommodates as many types of data sources as possible in a natural way. The data models supported by a PDSMS fall into a hierarchy of expressive power. At the very top (most general) level of the hierarchy are collections of named resources, possibly with basic properties such as size, creation date, and type (e.g., JPEG image, MySQL database). Queries against this data model correspond to what a file system typically supports for its directories: match by name, find in a date range, sort by file size, and so forth. Below the top level, a PDSMS should support the bag-of-words data model, which implies that we should be able to pose keyword queries on any data source in the personal dataspace and hence gain some visibility into every source. The semi-structured labeled-graph data model comes one level below the bag-of-words model in the hierarchy. Whenever a data source supports some structure, we should be able to pose simple path or containment queries, or possibly more complex queries based on the semi-structured data model. The goal is that whenever there is a natural way of interpreting a path query on a participant, the query processor should attempt to follow that interpretation. There will be other data models in the hierarchy, including the relational model, XML with schema, RDF, and OWL (the Web Ontology Language). In such an environment, a key challenge is to find methods for interpreting queries in various languages on data sources that support only certain models. Specifically, how do we reformulate a query posed in a complex language on a source that supports a weaker data model? And conversely, how do we reformulate a query in a simple language on a source that supports a more expressive model and query language (e.g., a keyword query on a relational database)?
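The top three levels of the hierarchy, and the two directions of query reformulation, can be illustrated with a small sketch. This is only one possible reading under stated assumptions, not an actual PDSMS design: the names `Model`, `Source`, `keyword_query`, `path_query`, and `graph_source` are hypothetical, the sample sources are invented, and the rule for weakening a path query (require every path label as a keyword) is an assumed heuristic that may return false positives.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Model(IntEnum):
    """Hypothetical hierarchy levels, from most general to most expressive."""
    NAMED_RESOURCE = 0   # name plus basic properties
    BAG_OF_WORDS = 1     # supports keyword queries
    LABELED_GRAPH = 2    # supports path and containment queries


@dataclass
class Source:
    name: str
    model: Model
    words: set = field(default_factory=set)    # used from BAG_OF_WORDS up
    paths: dict = field(default_factory=dict)  # path tuple -> value, LABELED_GRAPH only


def keyword_query(source, keyword):
    """Keyword match; falls back to a name match on the weakest model."""
    kw = keyword.lower()
    if source.model >= Model.BAG_OF_WORDS:
        return kw in source.words
    return kw in source.name.lower()


def path_query(source, path):
    """Evaluate a path query, weakening it when the source lacks structure."""
    if source.model >= Model.LABELED_GRAPH:
        return path in source.paths
    # Assumed weakening: require every label on the path as a keyword.
    return all(keyword_query(source, label) for label in path)


def graph_source(name, paths):
    """Build a labeled-graph source; its bag of words is derived from its paths,
    so a simple keyword query also works on the more expressive model."""
    words = set()
    for path, value in paths.items():
        words.update(label.lower() for label in path)
        words.update(str(value).lower().split())
    return Source(name, Model.LABELED_GRAPH, words, paths)


# Invented sample sources, one per level of the hierarchy.
contacts = graph_source("contacts.xml", {("person", "name"): "Ada Lovelace"})
notes = Source("notes.txt", Model.BAG_OF_WORDS, {"person", "name", "meeting"})
photo = Source("holiday.jpg", Model.NAMED_RESOURCE)

for src in (contacts, notes, photo):
    print(src.name, keyword_query(src, "name"), path_query(src, ("person", "name")))
```

In this sketch every source answers every query type, but the precision of the answer depends on the level of the model the source supports: the structured source evaluates the path exactly, the text source approximates it with keywords, and the plain named resource can only match on its name.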
Topic revision: r1 - 24 Mar 2008 - 12:09:15 - JidongChen