AbstractComputing the natural join of a set of relations is an important operational in relational database systems. The ordering of joins determines to a large extent the computation time of the join. Since the number of possible orderings could be very large, query optimizers first reduce the search space by using various heuristics and then try to select an optimal ordering of joins. Avoiding Cartesian products is a common heuristic for reducing the search space, but it cannot guarantee optimal ordering in its search space, because the cheapest Cartesian-product-free (CPF, for short) ordering could be significantly worse than an optimal non-CPF ordering by a factor of an arbitrarily large number. In this paper, we use programs consisting of joins, semijoins, and projections for computing the join of some relations, and we introduce a novel algorithm that derives programs from CPF orderings of joins. We show that there exists a CPF ordering from which our algorithm derives a program whose cost is within a constant factor of the cost of an optimal ordering. Thus, our result demonstrates the effectiveness of avoiding Cartesian products as a heuristic for restricting the search space of orderings of joins.
The abstract is also available as a LaTeX file, a DVI file, or a PostScript file.
Preliminary versionA preliminary version of these results was presented in: Shinichi Morishita. Avoiding cartesian products in programs for multiple joins. In Proceedings of the Eleventh ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 368-379, San Diego, California, 2-4 June 1992.
Categories and Subject Descriptors: F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems -- sequencing and scheduling; H.2.4 [Database Management]: Systems -- query processing; I.2.8 [Artificial Intelligence]: Problem Solving, Control Methods, and Search -- plan execution, formation generation
General Terms: Algorithms, Database Theory, Languages
Additional Key Words and Phrases: Cartesian product, database scheme, join expression tree, join strategy, optimality, semijoin
Selected papers that cite this one
- Walter Hower. Revisiting global constraint satisfaction. Information Processing Letters, 66(1):41-48, 15 April 1998.
Selected references
- Catriel Beeri, Ronald Fagin, David Maier, and Mihalis Yannakakis. On the desirability of acyclic database schemes. Journal of the ACM, 30(3):479-513, July 1983.
- Philip A. Bernstein and Nathan Goodman. Power of natural semijoins. SIAM Journal on Computing, 10(4):751-771, November 1981.
- Ashok K. Chandra and Philip M. Merlin. Optimal implementation of conjunctive queries in relational data bases. In Conference Record of the Ninth Annual ACM Symposium on Theory of Computing, pages 77-90, Boulder, Colorado, 2-4 May 1977.
- Nathan Goodman and Oded Shmueli. The tree projection theorem and relational query processing. Journal of Computer and System Sciences, 28(1):60-79, February 1984.
- Yehoshua Sagiv and Oded Shmueli. Solving queries by tree projections. ACM Transactions on Database Systems, 18(3):487-511, September 1993.
- Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, and Thomas G. Price. Access path selection in a relational database management system. In Philip A. Bernstein, editor, Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data, pages 23-34, Boston, Massachusetts, 30 May-1 June 1979. ACM, New York.
- Arun N. Swami. Optimization of large join queries: Combining heuristic and combinatorial techniques. In James Clifford and Bruce G. Lindsay and David Maier, editors, Proceedings of the 1989 ACM SIGMOD International Conference on Management of Data, pages 367-376, Portland, Oregon, 31 May-2 June 1989. SIGMOD Record 18(2), June 1989.
- Arun N. Swami and Anoop Gupta. Optimization of large join queries. In Haran Boral and Per Eke Larson, editors, Proceedings of the 1988 ACM SIGMOD International Conference on Management of Data, pages 8-17, Chicago, Illinois,, 1-3 June 1988. SIGMOD Record 17(3), September 1988.
- Y. C. Tay. On the optimality of strategies for multiple joins. Journal of the ACM, 40(5):1067-1086, November 1993.
- Eugene Wong and Karel Youssefi. Decomposition -- a strategy for query processing. ACM Transactions on Database Systems, 1(3):223-241, September 1976.