The current generation of desktop search engines (DSEs) is an important first step toward a personal dataspace management system (PDSMS), but DSEs are restricted to read-only querying. A DSE does not offer any means to update the underlying data: data may only be changed by directly accessing the underlying data sources. For this reason, a DSE cannot offer any update guarantees, such as durability or consistency. A DBMS, on the other hand, provides strict transactional ACID guarantees, but demands a high price for them: full control of the data. In contrast to both approaches, a PDSMS occupies the middle ground between a read-only DSE (without any update capabilities) and a write-optimized DBMS (with strict ACID guarantees). The guarantees a PDSMS can give may vary according to the interfaces offered by the data sources it manages.
The development of PDSMS update mechanisms poses several challenges:
A PDSMS needs an update model that accounts for the fact that data may be updated independently via the APIs of the underlying data sources, bypassing the PDSMS. In this scenario, ACID guarantees are too strict, since the PDSMS may be notified of updates only after the fact. Nevertheless, we believe that classical database recovery techniques may be adapted to this setting to provide softer backup and recovery guarantees (e.g., all items updated more than 5 min ago may be recovered). The recovery mechanisms also have to work for dataspaces backed up by distributed instances of a PDSMS.
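To make the idea of a softer, time-bounded recovery guarantee concrete, here is a minimal sketch under assumptions of our own: the PDSMS learns about source updates only through after-the-fact notifications, and it makes them durable with a periodic backup. The class `SoftRecoveryLog` and its methods are hypothetical and not part of any existing PDSMS. If `maybe_backup` is invoked frequently (e.g., on every clock tick), each notified update is included in a backup within about one `backup_interval`, which mirrors the "updated more than 5 min ago" guarantee for `backup_interval=300`; anything newer may be lost after a crash.

```python
from dataclasses import dataclass, field


@dataclass
class SoftRecoveryLog:
    """Hypothetical sketch: updates reach the PDSMS as after-the-fact
    notifications and become durable only through a periodic backup."""
    backup_interval: float  # seconds between backups, e.g. 300 for 5 min
    pending: dict = field(default_factory=dict)    # item_id -> (timestamp, content), not yet durable
    backed_up: dict = field(default_factory=dict)  # item_id -> (timestamp, content), durable
    last_backup: float = 0.0

    def notify_update(self, item_id, content, timestamp):
        # The data source has already been changed; the PDSMS only learns about it now.
        self.pending[item_id] = (timestamp, content)

    def maybe_backup(self, now):
        # Flush all pending notifications once the backup interval has elapsed.
        if now - self.last_backup >= self.backup_interval:
            self.backed_up.update(self.pending)
            self.pending.clear()
            self.last_backup = now

    def recover(self, item_id):
        # After a crash, only the backed-up state survives.
        entry = self.backed_up.get(item_id)
        return None if entry is None else entry[1]
```

The trade-off is explicit in `backup_interval`: a shorter interval narrows the window of unrecoverable updates at the cost of more frequent backups.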
Updates to personal information may be performed via the API of a given data source or directly via the PDSMS' API. Therefore, the PDSMS must be architected to support and recognize updates made directly in the underlying data sources. Moreover, if updates are performed via the PDSMS' API, the PDSMS has to write the data back to the affected data sources. In that case, the PDSMS should decide in which subsystem(s) the updated data is best represented.
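One way a PDSMS might recognize updates that bypass it is to rescan a data source periodically and compare content fingerprints with the previous scan. The sketch below assumes that a source can be enumerated as a mapping from item identifiers to raw bytes; `snapshot`, `detect_changes`, and the routing table behind `choose_subsystem` (for write-back) are hypothetical illustrations, not an API described here. Where a source offers event-based change notification, that would avoid full rescans.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Content hash used to detect whether an item changed between scans."""
    return hashlib.sha256(content).hexdigest()


def snapshot(items: dict[str, bytes]) -> dict[str, str]:
    """Map each item identifier in one scan of a source to its fingerprint."""
    return {item_id: fingerprint(data) for item_id, data in items.items()}


def detect_changes(old: dict[str, str], new: dict[str, str]):
    """Classify differences between two snapshots of the same data source."""
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    modified = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, deleted, modified


# Hypothetical write-back routing: which subsystem should hold a new item.
ROUTES = {"email": "imap_server", "contact": "address_book"}


def choose_subsystem(item_type: str) -> str:
    """Pick the data source for an item created via the PDSMS API; default to files."""
    return ROUTES.get(item_type, "file_system")
```

For example, scanning `{"a.txt": b"hello", "b.txt": b"x"}` and later `{"a.txt": b"hello!", "c.txt": b"new"}` reports `c.txt` as added, `b.txt` as deleted, and `a.txt` as modified.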
In a relational DBMS, previous versions of a given tuple may be reconstructed from the database log (see, e.g., the time travel feature of Oracle). However, personal items are typically more heavyweight than relational tuples, as they may have medium to large content. An alternative to logging would be to keep an independent versioning subsystem (e.g., Subversion) to account for content versioning. We need to investigate how to integrate versioning into the update model, and also whether there are profitable interactions with the techniques chosen for recovery (e.g., logging).
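The log-based alternative can be illustrated with a minimal time-travel sketch: an append-only log of item versions, queried with binary search for the version visible at a given time. The `VersionLog` class is hypothetical, and for simplicity it stores a full copy of every version. That is exactly what becomes expensive for medium to large personal items; versioning tools such as Subversion store deltas between revisions instead.

```python
import bisect
from collections import defaultdict


class VersionLog:
    """Hypothetical append-only log of full item versions, keyed by item id."""

    def __init__(self):
        self._times = defaultdict(list)     # item_id -> sorted commit timestamps
        self._contents = defaultdict(list)  # item_id -> contents, parallel to _times

    def record(self, item_id, timestamp, content):
        times = self._times[item_id]
        if times and timestamp < times[-1]:
            raise ValueError("timestamps must be non-decreasing per item")
        times.append(timestamp)
        self._contents[item_id].append(content)

    def as_of(self, item_id, timestamp):
        """Return the version visible at `timestamp`, or None if the item did not exist yet."""
        times = self._times.get(item_id, [])
        idx = bisect.bisect_right(times, timestamp)
        return self._contents[item_id][idx - 1] if idx else None
```

Because `bisect_right` finds the last version committed at or before the requested time, a query between two commits returns the older version.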
A user may have several devices, e.g., a laptop, a PC, and a handheld. The PDSMS has to support personal dataspaces that are backed up by several distributed instances. The instances are connected into a single dataspace by means of a name server; data exchange is then performed in a peer-to-peer fashion. Note that distributed dataspaces also pose challenges related to security and privacy. We believe that these aspects are key to convincing end users to trust the services offered by a PDSMS implementation.
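A minimal in-process sketch of the name-server arrangement: the name server only maps a dataspace identifier to its member instances, and content is then fetched directly from a peer. `NameServer`, `Instance`, and the peer lookup order are assumptions made for illustration; networking, replication, and the security and privacy concerns mentioned above are deliberately omitted.

```python
class NameServer:
    """Hypothetical registry mapping a dataspace id to its member instances."""

    def __init__(self):
        self._members = {}

    def register(self, dataspace_id, instance):
        self._members.setdefault(dataspace_id, []).append(instance)

    def lookup(self, dataspace_id):
        # Return a copy so callers cannot modify the registry.
        return list(self._members.get(dataspace_id, []))


class Instance:
    """Hypothetical PDSMS instance running on one device."""

    def __init__(self, name, items=None):
        self.name = name
        self.items = dict(items or {})

    def fetch(self, item_id):
        # Served directly to the requesting peer; the name server is not involved.
        return self.items.get(item_id)

    def find(self, name_server, dataspace_id, item_id):
        # Try local data first, then ask each peer in registration order.
        if item_id in self.items:
            return self.items[item_id]
        for peer in name_server.lookup(dataspace_id):
            if peer is not self:
                content = peer.fetch(item_id)
                if content is not None:
                    return content
        return None
```

Keeping the name server to membership only means it never sees item content, which limits how much personal data passes through a central component.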
Topic revision: r1 - 24 Mar 2008 - 12:10:48 - JidongChen