Review

This paper considers the problem of supporting true physical data independence. It describes a mechanism, called GMAPs (generalized multi-level access paths), with which one can describe the storage schema of the data relative to a logical schema. Essentially, GMAPs describe the storage schema as views over the logical schema. Using GMAPs, one can describe structures such as secondary indexes on relations, nested indexes, collection-based indexes, and structures implementing field replication.
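To make the "storage structure as a view" idea concrete, here is a minimal sketch in Python. It is not the paper's GMAP definition language; the relation Emp(name, dept, salary) and the function name are assumptions chosen only for illustration. It shows a secondary index on dept expressed as nothing more than a view (a projection keyed by dept) over the logical relation.

```python
from collections import defaultdict

# Hypothetical logical relation Emp(name, dept, salary); not taken from the paper.
emp = [
    ("alice", "db", 120),
    ("bob", "os", 100),
    ("carol", "db", 110),
]

def secondary_index_on_dept(rows):
    """Define a storage structure as a view over Emp.

    Conceptually: SELECT dept, name FROM Emp, organized for lookup by dept.
    """
    index = defaultdict(list)
    for name, dept, _salary in rows:
        index[dept].append(name)
    return dict(index)

# The stored structure is fully determined by the logical data plus the view definition.
dept_index = secondary_index_on_dept(emp)
```

Because the index is defined purely in terms of the logical schema, the logical schema itself never has to mention how the data is stored.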

The paper then considers the problem of query optimization when the storage is described by GMAPs. Essentially, the problem reduces to that of answering queries using views, because a query over the logical schema must be translated into an expression over the views that describe the storage. This is the first paper to describe an extension of System R-style optimization that is able to exploit materialized views that are not mentioned in the query. As such, the algorithm is also useful for query optimization when materialized views are available.
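The translation step can be illustrated with a small Python sketch. This is only an assumed toy setting, not the paper's optimizer: the relation Emp(name, dept, salary), the two stored views, and the function names are all hypothetical. The same query is evaluated once over the logical relation and once over the stored views alone, and the two plans return the same answer.

```python
# Hypothetical logical relation Emp(name, dept, salary); assumed not to be stored directly.
EMP = [("alice", "db", 120), ("bob", "os", 100), ("carol", "db", 110)]

# Stored structures, each defined as a view over Emp (illustrative names).
by_name = {name: (dept, salary) for name, dept, salary in EMP}  # primary store keyed by name
by_dept = {}                                                     # secondary index on dept
for name, dept, _salary in EMP:
    by_dept.setdefault(dept, []).append(name)

def query_logical(dept):
    """The query as written over the logical schema: salaries of employees in dept."""
    return sorted(salary for _name, d, salary in EMP if d == dept)

def query_rewritten(dept):
    """An equivalent plan that touches only the stored views.

    Probe the index by_dept, then join the resulting names with by_name.
    """
    return sorted(by_name[name][1] for name in by_dept.get(dept, []))
```

An optimizer in this setting would enumerate rewritings such as query_rewritten and choose among them by estimated cost; the sketch shows only that a rewriting over the views can be equivalent to the original query.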

A related paper by Chaudhuri et al. [2] also considers the problem of query optimization in the presence of materialized views. In that paper, the authors consider multi-set semantics for queries, which slightly simplifies the ways in which views can interact to answer a query. Later papers have considered the theoretical underpinnings of the problem of answering queries using views and have studied it in the context of data integration systems.

I think that the idea behind GMAPs will become even more important as relational database systems are used to store non-relational data. For example, several groups have recently considered storing XML data (and, more generally, semi-structured data) in relational databases. In this context, there is a clear mismatch between the logical model of the data and its storage schema. GMAPs provide a method for enabling relational databases to perform efficiently in the presence of such a mismatch.