Private Information Retrieval

Introduction

Retrieval of sensitive data from databases or web services, such as patent databases and medical databases, invokes concerns on user privacy exposure. The server’s ability of performing information inference is unfavorable to the users. The notion of Private Information Retrieval (PIR) was introduced in [CGKS95], which allows a user to retrieve an item from a server in possession of a database without revealing which item she is retrieving. More formally, we view the data as an n-bit string x from which the user wished to obtain the bit xi while keeping the index i private from the server.

PIR in Standard Model

PIR are originally introduced in information theoretic setting. This means that the queries asked by the user give no information whatsoever about i. In this setting, it is proved that there exist no better solutions than the trivial scheme which is for the server to return an entire copy of the database to the user. There are two ways to get around this problem: one is to make the server computationally bounded and the other is to assume that there are multiple non-cooperating servers each having a copy of the database. The assumption of multiple non-cooperating servers is not much practical, so we only research on the solution of computational PIR. The first computational single database PIR solution was constructed in [KO97]. In computational setting, the database is restricted to perform only polynomial-time computations so that the identity of i is only computationally hidden from the database. Naturally, the privacy in this case has to rely on some intractability assumption. Since then, PIR has emerged as an important cryptographic primitive and it turned out to be intimately connected to collision-resistant hash functions, oblivious transfer and public-key encryptions with additional properties. The surveys [Gas04, OS07] give an overview of many of the constructions and describe some of the connections of PIR to other primitives.

One challenge in PIR design is to reduce the communication complexity, which measures the number of bits transmitted between the user and the server per query. The upper bound of communication complexity is O(n), for the trivial scheme, while the lower bound is O(logn), since by all means the user has to provide an index. Another key performance metric of PIR schemes is the computation complexity at the server end. In [BIM00], it proved that in the standard PIR model, where the servers hold only the database, the server's computation for each retrieval is at least linear in the size of entire database, even if the user requires just one bit.

In the research work of PIR schemes in the standard model, the “linear in n” behavior of server-side computation seems satisfactory. So for a long time the computation cost for one retrieval stay at n∙Tmul(|N|), where Tmul is the cost of computing modular multiplication and |N| is the security parameter. Recently, [SC07] showed that single database PIR protocols, running on modern high-end hardware and networks, are mostly orders of magnitude slower than in the case of trivial (linear communication) transfer of the entire database to the client. This result stimulated the design of computationally efficient PIR scheme. Based on the worst-case hardness of some lattice problems which are conjectured to be hard, [GY07] designed a PIR protocol requiring the server to perform only O(n) modular additions and no multiplications per query. One drawback of the protocol is that a non-zero probability of error exists. In my opinion, novel PIR protocols with lower server-side per-bit computation requirements will be researched and designed in this research field.

PIR using Trusted Hardware

Different with the major body of PIR work focusing in the standard model, some PIR schemes using trusted hardware are also proposed. In [SS01], a single database PIR scheme using IBM cryptographic coprocessor (3DES engine can exceed 18megabytes/s) is designed, in which the coprocessor need to scan the entire data items for each retrieval and the main bottleneck is in DMA between the 3DES engine and the coprocessor’s internal RAM. Recently, the dominant component of hardware-based solutions [Aso04, IS05, WDDB06] use the periodic reshuffling of the database, performed by the secure hardware. The reshuffling period is determined by the size of the secure hardware’s tamper proof cache. For reshuffling, the operations performed by the secure hardware are encryptions, decryptions and communication with the host. With the assumption that the computation cost of encryption/decryption in secure hardware is negligible, [Aso04] requires O(n \sqrt n) operations for one reshuffle, [IS05] use Benes networks to decrease this overhead to O(nlogn) operations and [WDDB06] further reduce this overhead to O(n) operations. This approach is equal to running a client proxy inside the secure hardware, so the communication cost is just to send the index of data item to the server.

The main idea of dominant solution using trusted hardware is shown in Fig.1. Before affording retrieval service, the server host needs to shuffle and encrypt the entire data items using trusted hardware. We note the permutation π and symmetric key are only kept in hardware and not revealed to any other parties. When Alice sends the retrieval index i encrypted with K (established in SSL/TLS) to remote host equipped with a trusted hardware such as IBM4758 platform, the processing steps using hardware is as below:

PIR_fig1.JPG

Fig.1 PIR solution using trusted hardware

(1) Decrypts the index i and compute π-1(i);

(2) If the data of π-1(i)-th item is not in hardware buffer: retrieve π-1(i)-th item from the shuffled database, decrypt it and put it in buffer.

(3) If the data of π-1(i)-th item is in hardware buffer: randomly select an unretrieved item from the shuffled database, decrypt it and put it in buffer.

(4) For each time the hardware buffer is full, it needs to run a reshuffle protocol to make the database reshuffled using a new permutation, and the shuffled data items are also encrypted by new symmetric key.

One drawback of hardware-based PIR schemes is the inefficiency of secure hardware because it is orders of magnitude slower than main CPUs and the buffer is also very limited. Another main bottleneck is DMA between the encryption/decryption engine and the hardware’s internal RAM. So the frequent shuffling process using hardware is very costly. Moreover, when running the shuffling protocol, the server will disrupt the retrieval service. I believe efficient protocols are likely to access the secure hardware just sparsely.

PIR using Resident Delegate

This is an extended idea in our ongoing work based on PIR using hardware. The goal is to remove redundant communication cost between client and server. To achieve this goal, we can migrate the client’s retrieving codes to the server-side or establish a trusted application process (we call it “resident delegate”) on behalf of the client. The approach of establishing trusted process running on an untrusted OS can be supported by the excellent work of Fudan teams in Daoli project.

In this model, we can choose an existing PIR schemes which have least computation cost at server-side, without considering how much of the communication cost. It’s natural to find the trivial PIR solution which sends the entire database to the client will become one of our best choices. The trivial PIR scheme has the least computation cost O(1), notwithstanding the communication cost of O(n), so in this model the retrieving process is similar with the local data retrieving (see Fig.2). The resident delegate needs to realize the retrieving function and management of the entire data items. If retrieving data items from a complicated database server managing massive data, this trivial approach will not be practical unless the user plans to realize a complicated database management system. So the trivial solution using resident delegate only suit for the scenarios in which the data items should be easily managed, such as LDAP server storing certificates.

PIR_fig2.JPG

Fig.2 Trivial solution using resident delegate

But for a retrieval from DBMS which ingeniously organizes massive data records, we can not make use of the resident delegate to control the entire database because it is not a trivial work. It’s reasonable to assume that the delegate retrieves data items by calling the retrieval function of DBMS. Similar with the PIR solution using trusted hardware, we can realize the shuffle and encryption process using resident delegate to take the place of the trusted hardware. One of the advantages in this substitution is that it can use the main CPU and RAM to achieve more efficient computation. But how to reduce the times of shuffling and avoid the encryption/decryption of entire database is one of the challenges we meet.

In recent hot research on outsourced database and SaaS? (storage as a service), is it possible to design new organization and storage pattern of data to make the PIR service more practical? It’s also in our consideration.

References

[Aso04] D. Asonov. Querying Databases Privately: A New Approach to Private Information Retrieval. Springer Verlag, 2004.

[BIM00] Amos Beimel, Yuval Ishai and Tal Malkin. Reducing the servers' computation in private information retrieval: PIR with preprocessing. In Crypto’00, pages 55-73.

[CGKS95] B. Chor, O. Goldreich, E. Kushilevitz, and M. Sudan. Private information retrieval. In Proc. of the 36th Annu. IEEE Symp. on Foundations of Computer Science, pages 41–51, 1995

[Gas04] William Gasarch. A survey on private information retrieval. The Bulletin of the European Association for Theoretical Computer Science, Computational Complexity Column, Vol 82, February 2004, pages 72-107

[GY07] William Gasarch and Arkady Yerukhimovich. Computationally Inexpensive cPIR. 2007 (unpublished)

[IS05] A. Iliev and S. W. Smith. Protecting client privacy with trusted computing at the server. IEEE Security and Privacy, 3(2):20–28, 2005.

[KO97] E. Kushilevitz and R. Ostrovsky. Replication is not needed: Single database, computationally-private information retrieval. In Proc. of the 38th Annu. IEEE Symp. on Foundations of Computer Science, pages 364–373, 1997

[OS07] Rafail Ostrovsky and William E. Skeith. A survey of single-database PIR: Techniques and Applications. In PKC 2007

[SC07] Radu Sion, Bogdan Carbunar. On the Computational Practicality of Private Information Retrieval, In Network and Distributed System Security Symposium (NDSS 2007)

[SS01] S. W. Smith and D. Safford. Practical server privacy with secure coprocessors. IBM Systems Journal, 40(3):683 695, 2001.

[WDDB06] S.Wang, X. Ding, R. Deng, and F. Bao. Private information retrieval using trusted hardware. In 11th European Symposium On Research In Computer Security (ESORICS), 2006.


Comments

 
Topic attachments
I Attachment Action Size Date Who Comment
jpgJPG PIR_fig1.JPG manage 26.0 K 10 Mar 2008 - 03:05 Main.Zhangj1  
jpgJPG PIR_fig2.JPG manage 20.2 K 10 Mar 2008 - 03:05 Main.Zhangj1  
Topic revision: r2 - 09 May 2008 - 13:36:19 - TWikiAdminUser
 
This site is powered by the TWiki collaboration platformCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback