The goal of Personal Information Management (PIM) is to offer easy access to, and manipulation of, all of the information on a person's desktop, with possible extension to mobile devices, personal information on the Web, or even all the information accessed during a person's lifetime. Although DBMS technology successfully resolved the physical and logical data independence problem for highly structured data, it is no coincidence that the problem remains unsolved for the highly heterogeneous data mix present in personal information.
Today we rarely have a situation in which all the data that needs to be managed fits nicely into a conventional relational database management system (DBMS). Rather, most of the data is authored independently of a DBMS and is not under its full control. This world of disparate, distributed and independently authored unstructured, semi-structured and structured data is termed a dataspace. A personal dataspace is the total of all personal information pertaining to a certain person. One crucial aspect of dataspace management systems is their need to provide a pay-as-you-go information integration framework that allows data to be integrated without defining a global schema.
Existing Information Management Solutions
The figure shows a design space of existing information management solutions along two dimensions: semantic integration requirements and update guarantees. The horizontal axis displays the requirements for semantic integration, while the vertical axis displays the degree of update guarantees provided by different systems.

Figure: Design space of existing information management solutions
A DBMS requires high semantic integration effort and full control of the data, but provides strong update guarantees (ACID). In contrast, a desktop search engine (DSE) requires neither semantic integration nor full control of the data. On the other hand, DSEs do not provide any update guarantees and do not allow structural information to be exploited for queries. Data warehouses are optimized for read-only access and require very high semantic integration effort. Traditional information integration systems (middle-left) require high semantic integration investments and vary in terms of their update guarantees. Some systems extend data warehouse and information integration technology: they extract information from desktop data sources into a repository and represent that information in a domain model (ontology). The domain model is a high-level mediated global schema over the personal information sources. This schema-first approach makes it hard to integrate information in a pay-as-you-go fashion, as required by a dataspace management system. Versioning systems (e.g., Subversion, Perforce) provide strong update guarantees but do not require semantic integration. File systems provide weaker update guarantees than versioning systems (e.g., recovery of metadata in journaling file systems). However, Windows Vista may offer some basic information management capabilities, covering functionality offered by file systems and DSEs.
The figure shows a large region of the design space, between the different extremes, that is not covered by current information management solutions. The abstraction of personal dataspaces therefore calls for a new kind of system that is able to support the entire personal dataspace of a user. We term this kind of system a Personal Dataspace Management System (PDSMS). A PDSMS fills that gap: it occupies the design space between the two extremes of high semantic integration (schema-first) and low semantic integration (no schema). Furthermore, a PDSMS occupies the middle ground between a read-only DSE (without any update guarantees) and a write-optimized DBMS (with strict ACID guarantees).
Overview of PDSMS
Unlike traditional information integration approaches, a PDSMS does not require semantic data integration before any data services are provided. Rather, a PDSMS follows a data co-existence approach in which tighter integration is performed in a pay-as-you-go fashion. Unlike a relational DBMS, a PDSMS does not assume full control of the data, but rather manages the complex dataspace of one's personal information.
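The pay-as-you-go idea can be illustrated with a minimal Python sketch. It is not the design of any actual PDSMS: the Item and Dataspace classes, the source labels and the field names are all hypothetical, chosen only to show that keyword search works before any integration effort, while structured queries improve as individual mappings are added.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One piece of personal data, left in the form its source produced."""
    source: str   # illustrative label, e.g. "filesystem" or "email"
    fields: dict  # source-specific attributes, kept as-is


@dataclass
class Dataspace:
    """Hypothetical container; there is no global schema up front."""
    items: list = field(default_factory=list)
    # (source, source-specific field) -> shared field name, added over time
    mappings: dict = field(default_factory=dict)

    def add(self, item):
        self.items.append(item)

    def keyword_search(self, term):
        """Available immediately, with no semantic integration at all."""
        term = term.lower()
        return [i for i in self.items
                if any(term in str(v).lower() for v in i.fields.values())]

    def add_mapping(self, source, local_field, shared_field):
        """Pay-as-you-go step: integrate one field of one source."""
        self.mappings[(source, local_field)] = shared_field

    def query(self, shared_field, value):
        """Structured query over whatever has been mapped so far."""
        hits = []
        for i in self.items:
            for name, v in i.fields.items():
                mapped = self.mappings.get((i.source, name))
                if mapped == shared_field and v == value:
                    hits.append(i)
                    break
        return hits


ds = Dataspace()
ds.add(Item("filesystem", {"owner": "alice", "name": "report.tex"}))
ds.add(Item("email", {"from": "alice", "subject": "report draft"}))

print(len(ds.keyword_search("report")))  # 2: both items match, no schema needed
ds.add_mapping("filesystem", "owner", "person")
print(len(ds.query("person", "alice")))  # 1: only the mapped source answers
ds.add_mapping("email", "from", "person")
print(len(ds.query("person", "alice")))  # 2: the second mapping adds the email
```

Each call to add_mapping is a small, optional investment; the data itself is never moved or reshaped, which is the co-existence aspect described above.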
Research Challenges in PDSMS
We discuss several research challenges encountered in building such a system, spanning the data model and query language, storage and indexing, search and query, and updating personal information:
A single and unified data model
Extended full-text indexing
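As a rough illustration of these two challenges, the Python sketch below represents folders, files, the structure inside a file, and emails with one node type, and builds an index over names and attributes as well as content. This is only one possible reading of "unified data model" and "extended full-text indexing"; the Node class, the build_index function and the sample data are hypothetical and do not describe an existing system.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Single node type for folders, files, file-internal structure, emails."""
    name: str
    attributes: dict = field(default_factory=dict)
    content: str = ""
    children: list = field(default_factory=list)


def build_index(root):
    """Map (kind, token) to the set of node paths containing that token.

    kind is "name", "content", or "attr:<key>", so a query can combine
    plain keywords with structural conditions.
    """
    index = defaultdict(set)

    def visit(node, path):
        here = f"{path}/{node.name}"
        for tok in node.name.lower().split():
            index[("name", tok)].add(here)
        for key, value in node.attributes.items():
            for tok in str(value).lower().split():
                index[(f"attr:{key}", tok)].add(here)
        for tok in node.content.lower().split():
            index[("content", tok)].add(here)
        for child in node.children:
            visit(child, here)

    visit(root, "")
    return index


# Sample data (made up): a LaTeX file with an inner section, and an email.
root = Node("home", children=[
    Node("papers", children=[
        Node("pim.tex", {"type": "latex"}, children=[
            Node("Introduction", content="personal dataspace systems"),
        ]),
    ]),
    Node("msg1", {"from": "alice"}, content="dataspace draft attached"),
])

index = build_index(root)

# Keyword only: matches inside the file's structure and in the email.
print(sorted(index[("content", "dataspace")]))
# Keyword plus a structural condition: only the email from alice.
print(sorted(index[("content", "dataspace")] & index[("attr:from", "alice")]))
```

The second query is the kind of request a plain desktop search engine cannot express, since it relies on structural information in addition to the text.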