Consider a multiprocessor with k identical processors executing parallel programs that consist of n processes. Let $T_s(P)$ and $T_d(P)$ denote the execution times of the program P under optimal static and optimal dynamic allocation, respectively, i.e., under allocations giving minimal execution time. We derive a general and explicit formula for the maximal execution time ratio $g(n, k) = \max_P T_s(P)/T_d(P)$, where the maximum is taken over all programs P consisting of n processes. Any interprocess dependency structure is allowed for the programs P, provided that it does not cause deadlock. Overhead for synchronization and reallocation is neglected. We establish basic properties of the function g(n, k), from which we obtain a global description of the function. Plots of g(n, k) are included. The results are obtained by investigating a mathematical formulation of the problem, and the mathematical tools involved are essentially those of elementary combinatorics. The formula is a combinatorial function applied to certain extremal matrices corresponding to extremal programs. It is mathematically complicated but can be computed rapidly for reasonable n and k, in contrast to the problems of finding optimal allocations, which are NP-complete.
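
To make the ratio $T_s(P)/T_d(P)$ concrete, the following Python sketch evaluates it for one small program by exhaustive search. It assumes a hypothetical phase model that is not taken from the paper: a program is a sequence of barrier-separated phases, each process active in a phase performs one unit of work, and reallocation is free. The helper names `static_time` and `dynamic_time` are illustrative only. The sketch computes the ratio for a single program, not the maximum over all programs that defines g(n, k), and its k**n search is practical only for tiny n.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy model (an illustrative assumption, not the paper's formulation):
# a program is a list of phases separated by barriers, and each phase is the set
# of process indices that each perform one unit of work in that phase.


def dynamic_time(program, k: int) -> Fraction:
    """T_d(P) with free reallocation on k processors: a phase with r active
    unit-work processes lasts max(1, r/k), the optimal preemptive makespan."""
    total = Fraction(0)
    for phase in program:
        r = len(phase)
        if r:
            total += max(Fraction(1), Fraction(r, k))
    return total


def static_time(program, n: int, k: int) -> Fraction:
    """T_s(P): minimum over all k**n fixed process-to-processor maps.
    A processor runs its active processes one after another, so each phase
    lasts as long as the busiest processor's load."""
    best = None
    for alloc in product(range(k), repeat=n):
        total = 0
        for phase in program:
            loads = [0] * k
            for p in phase:
                loads[alloc[p]] += 1
            total += max(loads)
        if best is None or total < best:
            best = total
    return Fraction(best)


# Every pair of the 4 processes is active together in exactly one phase.
program = [{0, 1}, {2, 3}, {0, 2}, {1, 3}, {0, 3}, {1, 2}]
t_s = static_time(program, n=4, k=2)
t_d = dynamic_time(program, k=2)
print(t_s, t_d, t_s / t_d)  # prints: 8 6 4/3
```

Here any static allocation places at least two pairs of processes on a common processor, and each of those pairs' phases then lasts two time units instead of one.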

For the execution of an arbitrary parallel program P, consisting of a set of processes with any executable interprocess dependency structure, we consider two alternative multiprocessors. The first multiprocessor has q processors and allocates parallel programs dynamically, i.e., processes may be reallocated from one processor to another. The second employs cluster allocation with k clusters and u processors in each cluster; here, processes may be reallocated only within a cluster. Let $T_d(P, q)$ and $T_c(P, k, u)$ be the execution times of the parallel program P with optimal allocations on the first and the second multiprocessor, respectively. We derive a formula for the program-independent performance function $G(k, u, q) = \sup_P T_c(P, k, u)/T_d(P, q)$. Hence, with optimal allocations, the execution of P can never take more than a factor G(k, u, q) longer on the second multiprocessor than on the first, and there exist programs showing that the bound is sharp. The supremum is taken over all parallel programs consisting of any number of processes. Only the overhead for synchronization and reallocation is neglected. We further present a tight bound that exploits a priori knowledge of the class of parallel programs intended for the multiprocessors, thus resulting in a sharper bound. The function g(n, k, u, q) is the same ratio maximized over all parallel programs consisting of n processes. The functions G and g can be used in various ways to obtain tight performance bounds, aiding in multiprocessor architecture decisions.
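
As an illustration of why restricting reallocation to clusters can cost time, the following Python sketch compares $T_c(P, k, u)$ and $T_d(P, q)$ for one small program by exhaustive search over process-to-cluster maps. It relies on a hypothetical phase model (an assumption, not the paper's formulation): barrier-separated phases, one unit of work per active process, and free reallocation, so that a phase with r active processes on m processors lasts max(1, r/m). All helper names are illustrative, and the result is the ratio for one program, not the supremum G(k, u, q).

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical toy model (an illustrative assumption, not the paper's formulation):
# a program is a list of phases separated by barriers, and each phase is the set
# of process indices that each perform one unit of work in that phase.


def phase_time(active: int, processors: int) -> Fraction:
    """Length of a phase in which `active` unit-work processes share `processors`
    processors with free reallocation: max(1, active/processors), or 0 if idle."""
    if active == 0:
        return Fraction(0)
    return max(Fraction(1), Fraction(active, processors))


def dynamic_time(program, q: int) -> Fraction:
    """T_d(P, q): every one of the q processors may run any process."""
    return sum((phase_time(len(phase), q) for phase in program), Fraction(0))


def cluster_time(program, n: int, k: int, u: int) -> Fraction:
    """T_c(P, k, u): minimum over all k**n process-to-cluster maps; within a
    cluster, its u processors share the cluster's processes dynamically."""
    best = None
    for alloc in product(range(k), repeat=n):
        total = Fraction(0)
        for phase in program:
            counts = [0] * k
            for p in phase:
                counts[alloc[p]] += 1
            # the phase ends when the slowest cluster finishes
            total += max(phase_time(c, u) for c in counts)
        if best is None or total < best:
            best = total
    return best


# Every 3-subset of 6 processes forms one phase (20 phases in total).
program = [set(c) for c in combinations(range(6), 3)]
t_c = cluster_time(program, n=6, k=2, u=2)
t_d = dynamic_time(program, q=4)
print(t_c, t_d, t_c / t_d)  # prints: 21 20 21/20
```

With the six processes split three and three between the two clusters, two of the twenty phases fall entirely inside one cluster and take 3/2 time units instead of 1, which gives 21 against 20.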

3. Optimal worst case formulas comparing cache memory associativity
Lennerstad, Håkan

In this paper we derive a worst case formula comparing the number of cache hits for two different cache memories. From this formula, various other bounds for cache memory performance may be derived. Consider an arbitrary program P that is to be executed on a computer with two alternative cache memories. The first cache is set-associative or direct-mapped. It has k sets and u blocks in each set; this is called a (k, u)-cache. The other is a fully associative cache with q blocks, a (1, q)-cache. We derive an explicit formula for the ratio of the number of cache hits h(P, k, u) for a (k, u)-cache to the number of hits h(P, 1, q) for a (1, q)-cache, for a worst case program P. We assume that the mappings of the program variables to the cache blocks are optimal. The formula quantifies the ratio $\inf_P h(P, k, u)/h(P, 1, q)$, where the infimum is taken over all programs P with n variables. The formula is a function of the parameters n, k, u, and q only. Note that computing the quantity h(P, k, u) is NP-hard. We assume the commonly used LRU (least recently used) replacement policy, that each variable can be stored in one memory block, and that each variable is free to be mapped to any set. Since the bound is decreasing in the parameter n, it is an optimal bound for all programs with at most n variables. The formula for cache hits allows us to derive optimal bounds comparing the access times for cache memories. The formula also gives bounds (these are not optimal, however) for any other replacement policy, for direct-mapped versus set-associative caches, and for programs with variables larger than the cache memory blocks.
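
The hit counts in this ratio can be illustrated by brute force for a tiny program. The Python sketch below models a program only by its trace of variable accesses (an assumption made for illustration), simulates LRU replacement within each set, and maximizes over all k**n variable-to-set mappings. This exhaustive search is feasible only for very small n, in line with the NP-hardness of computing h(P, k, u). The function names are hypothetical helpers, not taken from the paper, and the sketch evaluates the ratio for one trace only, not the infimum over all programs.

```python
from collections import OrderedDict
from fractions import Fraction
from itertools import product


def lru_hits(trace, mapping, k: int, u: int) -> int:
    """Hits of a (k, u)-cache on `trace`: k sets of u blocks, LRU within each set.
    mapping[v] is the set that variable v is mapped to; one variable per block."""
    sets = [OrderedDict() for _ in range(k)]
    hits = 0
    for v in trace:
        blocks = sets[mapping[v]]
        if v in blocks:
            hits += 1
            blocks.move_to_end(v)           # v becomes most recently used
        else:
            if len(blocks) == u:
                blocks.popitem(last=False)  # evict the least recently used
            blocks[v] = None
    return hits


def set_assoc_hits(trace, n: int, k: int, u: int) -> int:
    """h(P, k, u): best hit count over all k**n variable-to-set mappings."""
    return max(lru_hits(trace, m, k, u) for m in product(range(k), repeat=n))


def fully_assoc_hits(trace, n: int, q: int) -> int:
    """h(P, 1, q): a single set, so the mapping is irrelevant."""
    return lru_hits(trace, [0] * n, 1, q)


# Variables 0..2; the pairs (0, 1), (1, 2), (0, 2) are used alternately in turn.
trace = [0, 1, 0, 1, 1, 2, 1, 2, 0, 2, 0, 2]
h_set = set_assoc_hits(trace, n=3, k=2, u=1)
h_full = fully_assoc_hits(trace, n=3, q=2)
print(h_set, h_full, Fraction(h_set, h_full))  # prints: 7 8 7/8
```

For this trace, every mapping of three variables to two single-block sets forces some pair to share a set, so the direct-mapped (2, 1)-cache obtains at most 7 hits against 8 for the (1, 2)-cache.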