Abstract

Caching and prefetching are important mechanisms for speeding up access time to data on secondary storage. Recent work in competitive online algorithms has uncovered several promising new algorithms for caching. In this paper, we apply a form of the competitive philosophy for the first time to the problem of prefetching to develop an optimal universal prefetcher in terms of fault rate, with particular applications to large-scale databases and hypertext systems. Our prediction algorithms for prefetching are novel in that they are based on data compression techniques that are both theoretically optimal and good in practice. Intuitively, in order to compress data effectively, one must be able to predict future data well, and thus good data compressors should also predict well for purposes of prefetching. We show for powerful models such as Markov sources and mth-order Markov sources that the page fault rates incurred by our prefetching algorithms are optimal in the limit for almost all sequences of page requests.
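To make the compression-as-prediction intuition concrete, here is a minimal sketch of a first-order Markov predictor used as a prefetcher: it counts observed page-to-page transitions and prefetches the k pages most frequently requested after the current one. This is an illustrative toy, not the paper's data-compression-based algorithm; the class name and interface are assumptions for the example.

```python
from collections import Counter, defaultdict

class MarkovPrefetcher:
    """Toy first-order Markov predictor for prefetching.

    Counts page-to-page transitions seen so far and, on each request,
    suggests the k pages most often requested next. Illustrates the
    prediction-for-prefetching idea only; it is not the compression-based
    prefetcher developed in the paper.
    """

    def __init__(self, k=1):
        self.k = k                           # number of pages to prefetch
        self.counts = defaultdict(Counter)   # counts[p][q] = times q followed p
        self.prev = None                     # previously requested page

    def access(self, page):
        """Record a page request; return up to k pages to prefetch next."""
        if self.prev is not None:
            self.counts[self.prev][page] += 1
        self.prev = page
        # Prefetch the k most likely successors of the current page.
        return [q for q, _ in self.counts[page].most_common(self.k)]
```

On a strictly alternating request sequence such as a, b, a, b, ..., the predictor quickly learns to prefetch the opposite page after each request.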
Preliminary version

A preliminary version of these results was presented in: Jeffrey Scott Vitter and P. Krishnan. Optimal prefetching via data compression (extended abstract). In 32nd Annual Symposium on Foundations of Computer Science, pages 121-130, San Juan, Puerto Rico, 1-4 October 1991. IEEE.
Categories and Subject Descriptors: D.4.2 [Operating Systems]: Storage Management; D.4.8 [Operating Systems]: Performance -- stochastic analysis; E.4 [Coding and Information Theory] -- data compaction and compression; I.2.6 [Artificial Intelligence]: Learning
General Terms: Algorithms, design, performance, theory
Additional Key Words and Phrases: Caching, competitive analysis, databases, data compression, fault rate, hypertext, Markov source, prediction, prefetching, secondary storage, universal prefetcher
Selected references
- Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36(4):929-965, October 1989.
- Raymond Board and Leonard Pitt. On the necessity of Occam algorithms. In Proceedings of the Twenty Second Annual ACM Symposium on Theory of Computing, pages 54-63, Baltimore, Maryland, 14-16 May 1990.
- Allan Borodin, Sandy Irani, Prabhakar Raghavan, and Baruch Schieber. Competitive paging with locality of reference (preliminary version). In Proceedings of the Twenty Third Annual ACM Symposium on Theory of Computing, pages 249-259, New Orleans, Louisiana, 6-8 May 1991.
- Amos Fiat, Richard M. Karp, Michael Luby, Lyle A. McGeoch, Daniel D. Sleator, and Neal E. Young. Competitive paging algorithms. Journal of Algorithms, 12(4):685-699, December 1991.
- Sandy Irani, Anna R. Karlin, and Steven Phillips. Strongly competitive algorithms for paging with locality of reference. In Proceedings of the Third Annual ACM-SIAM Symposium on Discrete Algorithms, pages 228-236, Orlando, Florida, 27-29 January 1992.
- Anna R. Karlin, Steven J. Phillips, and Prabhakar Raghavan. Markov paging (extended abstract). In 33rd Annual Symposium on Foundations of Computer Science, pages 208-217, Pittsburgh, Pennsylvania, 24-27 October 1992. IEEE.
- Michael J. Kearns and Robert E. Schapire. Efficient distribution-free learning of probabilistic concepts (extended abstract). In 31st Annual Symposium on Foundations of Computer Science, volume I, pages 382-391, St. Louis, Missouri, 22-24 October 1990. IEEE.
- P. Krishnan and Jeffrey Scott Vitter. Optimal prediction for prefetching in the worst case. In Proceedings of the Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 392-401, Arlington, Virginia, 23-25 January 1994.
- Daniel D. Sleator and Robert E. Tarjan. Amortized efficiency of list update and paging rules. Communications of the ACM, 28(2):202-208, February 1985.
- Ian H. Witten, Radford M. Neal, and John G. Cleary. Arithmetic coding for data compression. Communications of the ACM, 30(6):520-540, June 1987.