# Dynamic Reconfiguration: Architectures and Algorithms

Also observe that Algorithm 2. Note also that the data communicated in Step 3 of Algorithm 2. Problem 2. We now illustrate the use of neighbor localization in an algorithm to construct a list of input elements in sorted order. Chain Sorting. Let be N elements from a totally ordered set. The object of chain sorting is to string these elements in increasing or decreasing order in a list.

The standard sorting problem outputs the information as an array of elements in sorted order. A chain sorted list is a step shy of a sorted array: ranking a chain sorted list and relocating its elements produces a sorted array. In this section, we design an algorithm to chain sort N numbers on an R-Mesh. Note that each input takes log N bits, that is, a value from the set $\{0, 1, \ldots, N-1\}$. The algorithm has three stages; see Figure 2. The first stage strings together the inputs of each value in order of their indices. This guarantees a stable chain sorting algorithm.

The second stage constructs a list L of elements of V in ascending order. The third stage concatenates the lists of the first stage in the order specified by L. This concatenated list strings all the input elements in the stably sorted order. The key to the first two stages is neighbor localization. The chain sorting algorithm assumes the neighbor localization algorithm to output not just a list of neighbors, but also to flag the first element of this list, as noted in the paragraph following Theorem 2.

That is, if and then appears before in the final output. Stability is important in extending sorting algorithms for small-sized inputs to work for larger-sized inputs. For Stages 2 and 3, processor in Steps 1 and 2 of Stage 2. Perform neighbor localization on row 0 to construct list L of active processors in ascending order of their column indices. Stage 1: Step 1 flags all processors of row that hold inputs of value after the broadcast.

Step 2 strings these inputs together as list and each have value and if Notice that if two inputs then the definition of neighbor localization ensures that precedes in list this property of will ensure that the chain sorting is stable. Step 3 of Stage 1 simply moves each list to row 0. Stage 2: At the start of this stage, processor is flagged to indicate heads its list as a result of neighbor localization in if element is also flagged to indicate if Stage 1.

Similarly, processor In Step 1, each processor obtains a is the last element of pointer to the first element of list if it exists. Similarly in Step 2, each processor obtains a pointer to the last element of list if it exists. Note that since input values are drawn from the set the rows of the R-Mesh suffice for the communications in Steps 1 and 2.

Step 3 flags processor as active, if list exists; that is, if value appears in the input. The neighbor localization of Step 4 now simply strings the input values (active processors) in the sorted order. Observe that since enough room exists for all input values. Stage 3: Suppose that value follows in list L; that is, after the next higher value in the input is Here the R-Mesh concatenates list to the end of list Figure 2. In Step 2, processor receives a pointer to the first element of list from processor which it holds in recall that processor obtained a pointer to the first element of in Step 1 of Stage 2.

In Step 3, processor uses to point the last element of list to the first element of list processor obtained a pointer to the last element of in Step 2 of Stage 2. Since the neighbor localization of Stage 2 lists the values in the correct order, the output is chain sorted. Since the neighbor localizations of Stage 1 create lists in order of input indices, the sorting is stable.
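The three stages can be sketched sequentially as follows. This is a minimal sketch, not the R-Mesh algorithm itself: it assumes the inputs are integers in `range(N)`, replaces each neighbor localization by a plain loop, and uses hypothetical names (`chain_sort`, `walk`, `first`, `last`, `succ`) that do not appear in the source.

```python
def chain_sort(values):
    """Return (head, succ) describing a stable chain-sorted linked list.

    values[i] is the input held at index i; each value lies in range(len(values)).
    succ[i] is the index of the element that follows element i, or None.
    """
    n = len(values)
    # Stage 1: for each value v, string the inputs equal to v in index order.
    first = [None] * n   # first[v]: index of the first input with value v
    last = [None] * n    # last[v]: index of the last input with value v
    succ = [None] * n
    for i, v in enumerate(values):
        if first[v] is None:
            first[v] = i
        else:
            succ[last[v]] = i
        last[v] = i
    # Stage 2: list the values that actually occur, in ascending order.
    present = [v for v in range(n) if first[v] is not None]
    # Stage 3: point the last element of each list to the first of the next.
    for v, w in zip(present, present[1:]):
        succ[last[v]] = first[w]
    head = first[present[0]] if present else None
    return head, succ


def walk(head, succ):
    """Follow the chain from head and return the visited indices."""
    order = []
    while head is not None:
        order.append(head)
        head = succ[head]
    return order
```

For the input `[2, 0, 2, 1, 0]`, walking the chain visits indices 1, 4, 3, 0, 2: equal values keep their input order, which is the stability property discussed above.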

Since each step runs in constant time, so does the entire algorithm. Initially, each processor of row 0 holds an input. In this section we present an algorithm for this problem, in the process illustrating a simple way to generate sub-R-Meshes of an R-Mesh. The mesh topology readily lends itself to a decomposition into sub-meshes. In this decomposition, neighboring processors of each sub-mesh have to be neighbors in the underlying mesh as well. An R-Mesh, however, can also decompose into sub-meshes whose neighboring processors need not be neighbors in the underlying mesh topology.

This ability is crucial to the following maximum finding algorithm. Maximum Finding. Let be N elements drawn from a totally ordered set. The maximum finding problem is to find the largest of the inputs. Since inputs can be distinguished by their indices, no loss of generality arises in assuming that all inputs are distinct. We first develop a fast, but inefficient, algorithm for maximum finding, which, in turn, leads to a much more efficient algorithm.

Initially, processors of a row hold the inputs. Proof outline: For $0 \le i < N$, let processor $(0, i)$ hold input $a_i$. First broadcast $a_i$ to all processors of column $i$. Next, use a row broadcast from processor $(i, i)$ to send $a_i$ to all processors of row $i$. At this point, processor $(i, j)$ holds inputs $a_i$ and $a_j$ and sets a flag to 1 if and only if $a_i < a_j$. Input $a_i$ is the maximum if and only if each of the flags in row $i$ is 0; that is, their OR is 0.
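The all-pairs comparison in this proof outline can be simulated sequentially. The sketch below is illustrative only: the symbols are lost in the source, so the indexing of processors as `(i, j)`, the direction of the comparison, and the name `max_by_all_pairs` are assumptions.

```python
def max_by_all_pairs(a):
    """Return the index of the maximum of distinct inputs a[0..N-1]."""
    n = len(a)
    # flag[i][j] models processor (i, j), which sees a[i] (row broadcast)
    # and a[j] (column broadcast) and sets its flag when a[i] < a[j].
    flag = [[a[i] < a[j] for j in range(n)] for i in range(n)]
    for i in range(n):
        # The OR of the flags in row i is 0 exactly when a[i] is the maximum.
        if not any(flag[i]):
            return i
    return None  # only reached for empty input
```

The sketch uses N² comparisons, which mirrors why this fast algorithm is described as inefficient in its use of processors.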

At the other extreme, the binary tree algorithm of Theorem 2. We now generalize these results to an R-Mesh, for any algorithm running on an Let each processor in the top row of the R-Mesh hold an input. Output: Processor 0,0 holds begin if then use Lemma 2. Determine using Lemma 2. Let processor hold 2. Recursively find the maximum of the elements in the sub-R-Mesh formed by columns of the original R-Mesh. The top row of this sub-R-Mesh holds the local maxima, and Step 3 recursively finds their maximum.

Since Steps 1 and 2 run in constant time, the number of levels of recursion determines the running time of the algorithm. With denote the time to find the maximum of N elements on an let R-Mesh using Algorithm 2. From Lemma 2. Sub-R-Mesh Generation. Step 1 of Algorithm 2. By creating buses, an R-Mesh can also decompose into sub-meshes whose neighboring processors are not neighbors in the underlying mesh topology. With this ability, we can treat inputs as though they are in adjacent columns of the R-Mesh, a prerequisite for running the algorithm recursively.

Note also that insisting that the inputs be in actually adjacent columns necessitates moving the local maxima to contiguous columns, which would cost additional time. A major drawback of this approach to embedding (mapping vertices to processors and edges to buses) is that the embedded graph must be of degree at most 4. Sections 2. We employ this approach in an algorithm to rank a linked list.

List Ranking. Let L be a list of N elements. The rank of an element of list L is the number of elements preceding it in L. For example, the first and last elements of L have ranks 0 and N - 1, respectively. List ranking determines the ranks of all elements of a list. It is an important procedure, useful in converting a list into an array, a much more regular structure. Relocating the elements of a list according to their ranks transforms a list into an array with elements in the order of the original list.

Let list L contain elements in some order. Consider list element $\ell_i$ with successor $\ell_j$, and let $r_i$ and $r_j$ denote their ranks in the list. Our approach to list ranking hinges on the rather obvious observation that $r_j = r_i + 1$. That is, the rank of an element can be computed by incrementing the rank of its predecessor (if any). On the R-Mesh, this strategy takes the following form. For each element, assign a bundle of N row buses, indexed to correspond to the N possible ranks of that element. Connect row bus $k$ of the bundle of $\ell_i$ to row bus $k + 1$ of the bundle of $\ell_j$ by a column bus. This configuration guarantees that a signal on bus $k$ of $\ell_i$ will also traverse bus $k + 1$ of $\ell_j$. Since the precise value of $r_i$ is not known a priori, the algorithm must be prepared to handle all possible values of $r_i$. In other words, the R-Mesh connects these buses for each $k$. The R-Mesh configures its buses in this manner for each element of the list and its successor (if any).

By configuring buses as described above and then sending a signal on bus 0 of the bundle of the first element, the R-Mesh guarantees that for any element of the list, bus $k$ of its bundle receives the signal if and only if the rank of that element is $k$; see Figure 2. Observe that this method generalizes the stair step approach to adding bits in Section 2. That algorithm steps down one row for each 1 in the input. The algorithm here steps down one row (from one bundle to the next) for each element of the list. In both cases, a signal reaches a given row if the count (number of 1s, list rank) is Algorithm 2. Pointer points if is the last element of L; otherwise, to the successor, of For processor holds element and pointer Processor also holds a flag that is set to 1 if and only if is the first element of L.

For processor broadcasts and where to all processors in columns 2. This ensures that each row resp. Processors satisfying the if condition simply connect the horizontal and vertical buses passing through them. In general, this signal reaches horizontal bus if and only if the rank of is Therefore, Step 4 finds the correct rank of each element. Since each step runs in constant time, we have the following result. Initially, N processors of a row hold the list. With techniques similar to those of Section 2. Graph Distance Embedding. The main idea of the list ranking algorithm is to suitably embed the list in the R-Mesh.

This embedding takes the form of a bus whose shape resembles the list and whose path through the R-Mesh points to the answer much like the algorithms of Section 2. Notice that the embedding does not break a bus at each vertex of the list, yet captures the notion of succession in the list. It does this by snaking different segments of the bus corresponding to different vertices of the list through different portions of the R-Mesh each corresponding to a position in the list.
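To make the bundle idea concrete, the following sketch traces the signal through a dictionary of bus links. It is a sequential illustration under stated assumptions: `succ`, `first`, `link`, and `list_rank` are hypothetical names, and the `while` loop stands in for a signal that, on the R-Mesh, travels the whole bus in one step.

```python
def list_rank(succ, first):
    """Rank list elements by tracing a signal through simulated bus bundles.

    succ[e] is the successor of element e (None for the last element);
    first is the index of the first element. Bundle e has one bus per
    possible rank k in range(n).
    """
    n = len(succ)
    # link[(e, k)] is the bus reached from bus k of bundle e: bus k + 1 of
    # the successor's bundle. Links are built for every k because ranks
    # are not known in advance.
    link = {}
    for e in range(n):
        if succ[e] is not None:
            for k in range(n - 1):
                link[(e, k)] = (succ[e], k + 1)
    # Send a signal on bus 0 of the first element's bundle and follow it.
    rank = [None] * n
    bus = (first, 0)
    while bus is not None:
        e, k = bus
        rank[e] = k          # bundle e sees the signal on bus k: rank is k
        bus = link.get(bus)
    return rank
```

For the list 2 → 0 → 3 → 1, encoded as `succ = [3, None, 0, 1]` with `first = 2`, the signal steps down one bus per element and yields ranks `[1, 3, 0, 2]`.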

This approach extends to distance-related problems in other graphs as well. Besides the approach used in this section and earlier in Section 2. In this section, we use the R-Mesh to embed connectivity information of a graph. Moreover, unlike the previous two sections, we now place no restrictions on the graph itself. We illustrate this technique through an algorithm for connectivity. Let be an N-vertex graph. For each row (column) of the R-Mesh, construct a bus spanning the entire row (column). Let the bus Suppose vertex connects by an edge to in row represent vertex vertex and by another edge to vertex The algorithm connects row using the column bus. Processor of its ports.

Processor signal in Step 2. Similarly, the step places the E and W ports together, creating row buses. The if part of the statement causes processor to connect buses in row and column The clause in the condition of Step 1 ensures that buses in row and column are always connected. As noted earlier, this bus configuration connects buses in rows and if and only if vertices and have a path between them; a formal proof is left as an exercise Problem 2.

In Step 2, the source indicates its presence on the bus of its row; note that since the processor has configured its ports together, it could write on any port. The destination processor similarly looks for the signal to arrive at any of its ports. The bus configuration guarantees that it receives the signal if and only if the two vertices are connected in the graph; accordingly, the processor sets the flag in Step 3. Initially, the processors hold the corresponding bits of the adjacency matrix of the graph. Connectivity Embedding.
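A sequential sketch of this bus configuration follows. It assumes an undirected graph given as a symmetric 0/1 adjacency matrix, models each fused bus as a union-find component, and uses hypothetical names (`connected`, `adj`, `s`, `t`); the union-find structure is a stand-in for the electrical fusing of buses, not part of the source algorithm.

```python
def connected(adj, s, t):
    """Return True if vertices s and t are connected in the undirected graph adj."""
    n = len(adj)
    # Buses 0..n-1 are row buses, buses n..2n-1 are column buses.
    parent = list(range(2 * n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(n):
            # Processor (i, j) fuses row bus i with column bus j when i == j
            # or when the adjacency matrix has an edge between i and j.
            if i == j or adj[i][j]:
                parent[find(i)] = find(n + j)
    # A signal written on row s is seen on row t iff the two buses are fused.
    return find(s) == find(t)
```

Because a whole row bus stands for a vertex, a vertex of any degree is handled the same way as one of degree 1.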

Compared to the graph embedding of Section 2. Second, the embedding applies to arbitrary graphs including directed graphs as explained in Chapter 3. The embedding of Section 2. It exploits the fact that the graph is a list and performs the embedding to facilitate quick computation of the rank by establishing buses for all N possible ranks of an element. Another noteworthy point about the method in this section is that it uses an entire row bus to represent a vertex and an entire column to connect row buses vertices.

Consequently, the bounded degree of R-Mesh processors does not curtail the degree of the embedded graph. For example, Algorithm 2. We illustrate this technique in an algorithm for adding N numbers on an R-Mesh. Data Dissection and Collection. On the other hand, some algorithms exploit specific knowledge of the input elements.

For instance, the problem of sorting N O(log N)-bit integers (integer sorting) has an O(N)-time sequential solution, whereas comparison-based sorting (which makes no assumptions about the length or representation of the inputs) requires Ω(N log N) time. As a result, resource bounds for R-Mesh algorithms often depend on both the number and size of the inputs. For example, the chain sorting algorithm of Section 2.

Conversely, after having obtained a dissected output, the R-Mesh has to collect the pieces and reconstruct the output in the conventional word format. In this section, we illustrate some ideas on data dissection and collection; a more detailed discussion of the topic appears in Chapter 4. We also present some algorithms for them.

The standard unary and binary representations are well known. The unary representation of a non-negative integer $x$ usually consists of a string with the rightmost $x$ bits set to 1. For our discussion we will represent $x$ with $x + 1$ ones so that the representation of each non-negative integer (including 0) has at least one 1. More formally, for integer $x$ with $0 \le x < N$, the unary representation of $x$ is $b_{N-1} b_{N-2} \cdots b_0$, where $b_i = 1$ if and only if $i \le x$, for each $0 \le i < N$. For example, if $N = 5$ then the unary representations of 0, 1, 2, 3, 4 are 00001, 00011, 00111, 01111, 11111, respectively.

We use the term word format to denote the conventional non-distributed binary representation with all bits local to a single processor. This gives a method for adding a number represented in Figure 2. We now consider the problem of dividing an integer by 2; that is, computing for a given integer Once again, it is useful to represent in distributed unary.
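The unary convention and the division by 2 can be illustrated with a few lines of code. This is a sketch under assumptions: the value x is taken to be represented by x + 1 ones in an n-bit vector `b` with `b[i] == 1` exactly when `i <= x`, and the function names are hypothetical. Under that convention, bit i of the representation of ⌊x/2⌋ is simply bit 2i of the representation of x.

```python
def to_unary(x, n):
    """Return bits b[0..n-1] with b[i] = 1 iff i <= x (so x + 1 ones)."""
    if not 0 <= x < n:
        raise ValueError("x must satisfy 0 <= x < n")
    return [1 if i <= x else 0 for i in range(n)]


def from_unary(b):
    """Invert to_unary: the value is one less than the number of ones."""
    return sum(b) - 1


def halve_unary(b):
    """Return the unary bits of floor(x / 2): output bit i copies input bit 2i."""
    n = len(b)
    return [b[2 * i] if 2 * i < n else 0 for i in range(n)]
```

Each output bit depends on a single input bit, which suggests why a fixed data movement pattern suffices for this operation.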

Both input and output are in distributed unary format in the leftmost column of the R-Mesh. Collecting Distributed Binary Bits into a Word. Specifically, let be a integer in distributed binary representation. The aim here is to collect the value of into a single processor. A simple method is to use the binary tree algorithm in Illustration 1 of Section 1.

We describe a constant time method here.

Dynamic Reconfiguration: Architectures and Algorithms, by Ramachandran Vaidyanathan and Jerry Trahan, offers a comprehensive treatment of dynamically reconfigurable computer architectures and algorithms.

The input is distributed over a row of the R-Mesh. Proof outline: Without loss of generality, let processor where the bit of the input. The idea is for processor hold to compare with the bit in the binary representation of for all then Checking this condition amounts If to determining the AND of bits in a row of the R-Mesh, which, by Theorem 2.

Adding N integers. We now apply the ideas developed so far to design a constant time algorithm for adding N integers with The inputs and output are in the conventional word format. Let the binary representation of S be The algorithm must ultimately compute bits for each In what follows, we derive an expression for these output bits in terms of where and Key to this expression input bits is a recurrence relation for the carry generated by the addition of bits of the input numbers , in terms of the carry the first and the bits of the inputs This leads to a method in which a carry, given the carry and input sub-R-Mesh generates the bits for The R-Mesh cascades these carry generators for each to produce all the carries needed to compute the sum bits We now detail the various parts of the algorithm.

An Expression for the Sum Bits: For sum, as follows: For Clearly, let define the local denote the carry due to bits Therefore the final sum bit at position for is mod 2. Key to computing the bits of the sum are the quantities and depends only on the bits of the inputs, can depend on While of the inputs. The novelty of the following solution bits to the problem is in the use of the bus to propagate the information to construct the value of from bits Computing the Carry Bits: Observe first that since for equals the most significant the binary representation of and log N bits of the final sum S.

Next, use Lemma 2. Sub-R-Mesh using and the input bits is responsible for computing for all as explained above. By definition, obtains from see Figure 2. This carry For generation scheme starts signals from the W port of processor 0,0 of as and from the N port of each processor of that These signals flow seamlessly across the sub-R-Meshes, has as the algorithms implied by Lemmas 2. Generating the Sum: The input is in the word format.

Within sub-R-Mesh a processor simply extracts the required bit from the input. To of the final sum for sub-R-Mesh extracts obtain bit from column N see Figure 2.

The available R-Mesh suffices for this conversion. Let denote the value of the first bits of S. The last log N bits of S equal the final carry, which is in distributed unary format; the R-Mesh can easily convert these to the word format as well. Computing gives the result in word format. Initially, the inputs are in the first N processors of a row of the R-Mesh. Function Decomposition. The principal idea in the preceding result is the cascading of several small R-Mesh solutions (additions and divisions by 2, in this case) to solve a larger problem (carry propagation, in this case).
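The cascade of additions and divisions by 2 can be written out sequentially. The sketch below assumes non-negative inputs that each fit in `bits` bits; the names `add_by_columns`, `local`, and `carry` are illustrative and do not come from the source.

```python
def add_by_columns(nums, bits):
    """Add non-negative integers that each fit in `bits` bits, column by column."""
    carry = 0
    sum_bits = []
    for j in range(bits):
        # Local sum for position j: incoming carry plus bit j of every input.
        local = carry + sum((x >> j) & 1 for x in nums)
        sum_bits.append(local % 2)   # sum bit at position j
        carry = local // 2           # carry into position j + 1
    # The final carry supplies the most significant bits of the sum.
    return carry * (1 << bits) + sum(b << j for j, b in enumerate(sum_bits))
```

Each loop iteration corresponds to one carry generator: an addition followed by a division by 2, with the remainder kept as the sum bit.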

Indeed, some of the algorithms discussed so far in this chapter can be expressed conveniently in terms of function decomposition. Simply cascade functions for where if the input bit is 0, and if the bit is 1 see Figure 2. The function decomposition technique often relies on the ability of the R-Mesh to embed a function in the form of a bus from each point in the domain to its image in the range. Functions computed by concatenating other functions can themselves serve as building blocks for more complex functions.
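As a small illustration of cascading functions, the sketch below composes one function per input bit. The exact functions are elided in the source, so the choice here (identity for a 0 bit, increment modulo a constant for a 1 bit) is an assumption, as are the names `cascade` and `step`.

```python
def cascade(bits, modulus):
    """Compose one function per input bit and apply the result to 0."""
    def step(bit):
        # Identity for a 0 bit, increment modulo `modulus` for a 1 bit.
        return (lambda x: x) if bit == 0 else (lambda x: (x + 1) % modulus)

    value = 0
    for f in map(step, bits):
        value = f(value)
    return value
```

With `modulus=2` the composed function computes the parity (exclusive OR) of the bits; a larger modulus gives the count of 1s modulo that value.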

Each such building block must, however, work in one step by providing a path to the solution, after the initial phase to configure buses. Although the literature on dynamic reconfiguration abounds with models, including variations of the R-Mesh, we will adopt the R-Mesh in most of this book for the following reasons. Simplicity: With the popularity of the mesh as an interconnection topology, the R-Mesh is simple to describe and understand. Indeed the definition of Section 2.

Of these, most published results use the R-Mesh sometimes with minor variations as the model of computation. Universality: Although the R-Mesh as defined in Section 2. Chapter 3 discusses models with these abilities. Assume that processor holds input bit 2. For Theorem 2. Running such an algorithm on an N-processor platform involves assigning vertices of the tree to processors. Therefore, the labeling of vertices plays a vital role in determining the communication requirements of the algorithm. Notice in this figure that if processors and with communicate in a step, then for all processor is not involved in a communication during that step.

This permits a pair of processors to use a segment of the bus exclusively. As a result, the algorithm runs optimally in steps. Modify it to run with two data movement steps. For each 2. Design a constant time algorithm to move all N elements to row 0. Let be a bijection. For each of the following cases, design the fastest possible algorithm for permutation routing with respect to a P includes any one processor from each column. Is this bound tight? Consider the following partial function such that for all if is an integer; otherwise, is not needs defined. Consider an R-Mesh in which processor for to communicate with processor for each which is defined.

Find a lower bound on the order of data movement in constant time. Assume that the inputs are initially in row 0. Given an input sequence input bits of the algorithm , the automaton goes to the state indicative of their sum. Conversely, running an input stream of N symbols through the automaton of Figure 2. How does your solution relate if at all to the solution to Problem 2. Express the size of the R-Mesh in terms of the number of states, input symbols, and output symbols in the finite automaton.

For 2. The Reconfigurable Mesh: A Primer 63 2. A linear bus is defined to be acyclic if its connected component in the configuration graph is acyclic. Consider an acyclic linear bus between end points L and R. This bus is oriented if and only if each processor on the bus knows which of the at most two processors adjacent to it on the bus is closer to L. Prove that the general neighbor localization problem see definition in Problem 2. Why is it important for the linear bus to be acyclic and oriented for solving neighbor localization?

What sized R-Mesh would you use? What changes, if any, are needed How will this change the 2. Use this observation to prove that for any constant and R-Mesh can find the maximum of N integers in constant time. The prefix maxima of this array is the sequence where for Adapt the algorithm of Lemma 2. R-Mesh can find the 2. Prove that a distributed sub-R-Mesh with congestion C can operate as an R-Mesh in which each broadcast on the bus requires at most C steps. What assumptions, if any, are needed about the embedding?

Characterize sub-R-Meshes in this category. Unless specified otherwise, all edges could be active in a step. If a model steps, then can run an arbitrary communication step of is said to admit a emulation of in in a host graph a An embedding of a guest graph consists of i a mapping of vertices and ii a mapping of each edge of to vertices of of to a path possibly of length 0 between vertices and of The dilation of the embedding is the length of the longest path of to which an edge of is mapped.

(b) Prove that an R-Mesh admits a 1-step emulation of an N-vertex balanced binary tree. Again use the smallest R-Mesh possible. Modify the algorithm to determine the first element. The new algorithm must run within the same resource bounds as the original one. Your algorithm must run in constant time on an R-Mesh. View the list as an N-vertex, (N - 1)-edge directed graph, and let the diagonal processors of the R-Mesh represent these vertices. Your embedding should generate a single bus that traverses the diagonal processors (vertices) in their order in the list.

2. For a constant time solution, use an R-Mesh. Will the algorithms of Section 2. For let be integers. Suppose function can be embedded in an R-Mesh. The two-dimensional R-Mesh model of Section 2. Many of the techniques presented in Section 2. Here we point to some of the sources for these techniques.

Subsequently, other results were developed for various routing scenarios [59, , , , , , , , ]. In their paper on adding bits, Nakano and Wada [] provided a chronological listing of results for this problem. In a later work, Bertossi and Mei  improved on the result of Corollary 2. Although we presented the exclusive OR algorithm as a corollary to the modulo addition result, Wang et al. Vaidyanathan [] applied neighbor localization, a generalization of the bus-splitting technique of Miller et al.

The neighbor localization problem itself was first defined by Hagerup []. The chain sorting algorithm of Section 2. Hagerup [] provided randomized PRAM algorithms for chain sorting. The underlying idea of the maximum finding algorithm of Section 2. The more general result of Theorem 2. Olariu et al. Similar results also exist for other reconfigurable models [, ]; Problems 2. Hayashi et al. It is particularly relevant toward separating the capabilities of different R-Mesh versions. Chapter 9 explores this issue; bibliographic notes at the end of Chapter 9 provide references on this topic.

Wang and Chen [] proposed the connectivity approach of Section 2. Chen et al. Jang and Prasanna [] and Bertossi and Mei  detailed the various number representation formats and methods to convert among them; the names of the representations used in this book are different from theirs, however. Schuster [], Trahan et al. Dharmasena and Vaidyanathan [81, 82, 83] discussed binary trees and their labeling Problem 2. Leighton [], among others, described details of the concept of embedding one graph in another Problems 2. This model suffices for most of the discussion in this book.

Also, other reconfigurable models often provide a different and sometimes equivalent view of dynamic reconfiguration than the R-Mesh that can help in understanding dynamic reconfiguration. This chapter deals with these variations on the R-Mesh idea. Then we briefly describe other reconfigurable models, restricting our discussion to their salient features; subsequent chapters will discuss some of these models further.

Although this chapter introduces the reader to several dynamically reconfigurable models, the coverage is not comprehensive. Bibliographic notes at the end of the chapter point to some of the models not discussed here and to references for further reading. What is the cost of permitting the model to have buses of arbitrary shape? What is the implication of using processors of very small word size? Does an extension of the R-Mesh idea to higher dimensions add to its computing capability?

These questions raise important issues that impact the cost, power, and implementability of reconfigurable models. Specifically, we discuss five additional facets to the reconfigurable mesh:

1. Restrictions on the structure of the bus,
2. Bit-model (constant word size) R-Mesh,
3. Concurrent and exclusive bus accesses,
4. Higher dimensional reconfigurable meshes, and
5. Directed buses.

The first two represent restrictions of the R-Mesh model of Chapter 2, while the rest enhance the model. These considerations are, for the most part, independent of each other; for example, it is possible to have a reconfigurable mesh capable of concurrent writes, with or without directed buses.

Chapter 9 uses simulations to relate some of these models and conventional models of parallel computation such as the PRAM. In this section we describe derivatives of this model that place restrictions on the structure of buses. Section 3. If an algorithm for a problem of size N is designed to run in T steps on a model with P(N) processors, then for M represents the slowdown factor resulting from the use of fewer processors, and F(N, M) represents an overhead. All known methods for scaling arbitrary algorithms on the unrestricted R-Mesh have an overhead, F(N, M), that depends on N.

As discussed below, restricted R-Meshes fare better in this respect. An HVR-Mesh permits only the port partitions and within each processor. This requires a bus in the HVR-Mesh to lie entirely within a row or a column, so that a bus is representable as a horizontal or vertical line see Figure 3. This restriction severely curtails the power of the model. Moreover, algorithms for it scale with optimal constant overhead. Figure 3. Note that the LR-Mesh allows a bus to be a simple cycle as shown in the southeast corner of Figure 3.

In fact, all algorithms connectivity Algorithm 2. As stated, the algorithm for list ranking Algorithm 2. That is, the FR-Mesh can simulate any unrestricted R-Mesh step in constant time, although with a polynomial blowup in the number of processors. Whether the LR-Mesh can solve this problem in constant time is a longstanding open problem. Algorithms on the FR-Mesh also scale well. Although the overhead, F(N, M), is not constant, it is independent of the problem size N and depends only on the available machine size M.

Tree R-Mesh. Unlike the restrictions described so far, no set of allowed port partitions adequately describes the Tree R-Mesh; in fact, it is possible for a Tree R-Mesh to have processors with all fifteen possible port partitions. In fact, it is possible to prove that every Tree R-Mesh algorithm can run on an LR-Mesh without loss of speed or efficiency.

Therefore, all Tree R-Mesh algorithms also scale optimally. Individually, the LR-Mesh is not known to have a constant time solution to some graph problems that the FR-Mesh can solve in constant time. On the other hand, the LR-Mesh can solve in constant time many other problems using far fewer processors than the FR-Mesh. From this point of view it is sometimes useful to think of R-Mesh restrictions as algorithmic rather than hardware restrictions. Since many algorithms use processor addresses as data for instance in chain sorting , it is also customary to assume the bus width to be the same as the processor word size; this ensures constant time movement of one word of data.

This model is called the word model. We now describe a restricted version of the R-Mesh called the bit model.

The bit model of the R-Mesh restricts processor word size and bus width to be constants. The bit model is useful in designing special purpose hardware for a small suite of applications. It has also been suggested as a basis for designing asynchronous reconfigurable circuits. Impact on Algorithm Design. Constant word size and bus width have important implications for designing algorithms on the bit-model R-Mesh.

In the following, we assume the bit model to have a nonconstant number of processors. In constant time, it can only distinguish among a constant number of sets of processors such as those within a constant distance on the underlying mesh, or those flagged by a constant number of bits. Self addresses: One of the most common assumptions about any model of parallel computation is for each processor to have its own address. In a word-model R-Mesh, a processor can use this information in many ways such as determining its position relative to the diagonal or R-Mesh border, or identifying itself to a neighbor in neighbor localization.

Clearly, assuming knowledge of processor self-addresses is not valid for the bit model. For example, if the algorithm requires all processors on the main diagonal (those with equal row and column indices) to execute a step, then it can do so by checking a hardwired flag. Algorithm form: In the word model, a non-constant time algorithm can assume a recursive or iterative form. Iterative algorithms on the bit model must ensure that the terminating condition can be evaluated in constant time.

Global Bus Configurations: Among the data movement algorithms of Chapter 2, row and column broadcasts are clearly possible on the bit model. Routing an arbitrary permutation poses a problem, however, as the index of the destination processor would be too large for the bit model. The binary tree algorithm Algorithm 2.

This is because a processor requires its address and the iteration number (both non-constant) to determine its function during the iteration. On the other hand, the algorithm for finding the OR of N bits runs on the bit model. Indeed, the bit-model R-Mesh can run all instances of neighbor localization in which neighbors (that may be arbitrarily far apart on the underlying one-dimensional R-Mesh) communicate constant sized data rather than processor indices.

Similarly, the maximum finding algorithm implied by Lemma 2. The algorithm of Theorem 2. These algorithms use the input to configure a bus that carries a 1-bit signal to the answer. The bit model can also run connectivity. For example, chain sorting requires an output involving log N-bit pointers.

For these problems, a solution may be possible if the output is represented in distributed format see Section 2. The Power of Bits. Relationship to Other Restrictions. Since the word-size of an R-Mesh is independent of the restrictions in Section 3. Of these, the ERCW model has found little, if any, application in dynamic reconfiguration, so we will not consider it further.

In most implementations of a bus, there is little advantage to restricting bus reads to be exclusive. We will not discuss the EREW model further until Chapter 9 that deals with the relative powers of different models. In the CRCW model, there is no restriction on the number of simultaneous accesses to the bus. When several processors attempt to write simultaneously to a bus, a write rule defines the bus value that a processor reading from the bus would obtain.

We will consider the following write rules, which are well known in the context of the PRAM. Therefore, this rule only permits a reader to detect whether a concurrent write has occurred. When writers are all writing the same value, then this value is the bus value; otherwise, the bus value is the collision symbol. Note that an arbitrary writer could succeed, possibly a different one each time the algorithm is executed.

Therefore, algorithms using this rule should work for any choice of writer. We will employ these rules, however, as algorithmic conveniences, using other write rules to simulate them on reconfigurable models. In Section 3. Such simulations permit algorithm design on the R-Mesh using the most convenient write rule, without getting caught up in details of their implementation. In some situations, this approach can even speed up the resulting algorithm (see Problem 8). Relationship to Other Models.
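The write rules discussed above can be made concrete with a small sketch. The extracted text has lost the rule names, so the names below (COMMON, COLLISION, COLLISION+, ARBITRARY, PRIORITY) are assumed from common PRAM usage; `resolve_bus_value` and `COLLISION_SYMBOL` are hypothetical names, and the choice that PRIORITY favors the smallest port index is likewise an assumption.

```python
import random

COLLISION_SYMBOL = "#"  # hypothetical stand-in for the collision symbol


def resolve_bus_value(rule, writes):
    """Return the value a reader obtains from one bus in one step.

    writes: list of (port_index, value) pairs, one per writer on the bus.
    Rule names are assumed from PRAM literature, not taken from the text.
    """
    if not writes:
        return None  # nothing was written to this bus
    values = [value for _, value in writes]
    if rule == "COMMON":
        # All concurrent writers must agree on the value.
        if len(set(values)) != 1:
            raise ValueError("COMMON requires all writers to write the same value")
        return values[0]
    if rule == "COLLISION":
        # A reader can only detect that a concurrent write occurred.
        return values[0] if len(writes) == 1 else COLLISION_SYMBOL
    if rule == "COLLISION+":
        # Same value from all writers survives; otherwise the collision symbol.
        return values[0] if len(set(values)) == 1 else COLLISION_SYMBOL
    if rule == "ARBITRARY":
        # Any writer may succeed, possibly a different one on each run.
        return random.choice(values)
    if rule == "PRIORITY":
        # Assumption: the writer with the smallest port index wins.
        return min(writes, key=lambda write: write[0])[1]
    raise ValueError(f"unknown write rule: {rule}")
```

An algorithm written for the ARBITRARY rule should produce a correct result whichever value `random.choice` happens to return.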

That is, each restriction can access its buses exclusively or concurrently. Therefore, this rule applies to the bit model as well. For R-Mesh has an underlying sional mesh. If is a constant, then the Partitioning, titioning its set of Communication, and Computation phases see Section 2. Universality of the Two-Dimensional R-Mesh.

For most cases, the size of the simulating R-Mesh is optimal. We now present a part of this result for running three-dimensional R-Mesh algorithms on a two-dimensional R-Mesh; Section 9. For the most part we will use a two-dimensional R-Mesh, occasionally expressing ideas in three dimensions before converting them back to two dimensions.

Priority Simulation. Part (a) of this figure shows a sample bus with 9 writing ports. Although the example shows only one bus with multiple writes, the underlying ideas apply independently to all such buses. For clarity, Figure 3. The first iteration, that is, iteration 0, solves priority based on only the most significant digit of the ports. The Local Priority Problem.

Given a digit position and a subset of the ports of the bus, consider the smallest value that this digit takes among the ports in the subset. The local priority problem is to find the set of all ports in the subset whose digit has this smallest value. For the example of Figure 3. For digit 1 (the center digit), the smallest value is 0.

The Role of Local Priority. Subsequently, we will derive a solution to the local priority problem itself. The initial set of active ports consists of all writing ports. Each iteration solves the local priority problem for its digit index and the current active set to determine a new set, which becomes the set of active ports for the next iteration (if any). Since we obtain the following result. LEMMA 3. Solving Local Priority. We now establish that an R-Mesh can solve the local priority problem in constant time.

Specifically, the input to the problem at hand is an index where and a set of active ports. Therefore, for a given instance of the local priority problem, and are fixed. The output of the algorithm is the set described by Equations 3. Assume that indicates membership in set holds all necessary information about each sub-R-Mesh including the bus configuration and active port indices. This is one aspect in which using a three-dimensional simulating R-Mesh simplifies the algorithm design.

Step 1: Configure each sub-R-Mesh exactly as the simulated R-Mesh.

Step 2: In each sub-R-Mesh, each active port writes a signal to its bus if and only if its digit has the value associated with that sub-R-Mesh. The purpose of this write is to indicate the presence of at least one active port whose digit has that value. Concurrent writes in this step use the same-valued signal and pose no problem on the COMMON CRCW R-Mesh.

Step 3: Each active port of each sub-R-Mesh reads from its bus. Thus, a port flags itself as an element of the output set if and only if its digit has the smallest value for which a signal was written. Figure 3.

All active ports on all four sub-R-Meshes read from the bus. Ports 1 and 3 are members of the output set. Since the digits of ports 5 and 9 are both larger than the minimum digit, ports 5 and 9 are not elements of it. That is, LEMMA 3. We now use the example of Figure 3. For Iteration 2 see Figure 3. Iteration 0 selects the subset all of whose ports have a 0 as the most significant digit (digit 2). As explained earlier, Iteration 1 see Figure 3. Putting it All Together. As noted at the start of the algorithm, we restricted our discussion to only one bus of the simulated R-Mesh. This restriction is without loss of generality, as we now explain.
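The iterative use of local priority can be sketched sequentially as follows. This is a minimal illustration, not the constant-time R-Mesh algorithm: the list `signal` stands in for the one-bus-per-digit-value signaling across sub-R-Meshes, and the function names, the `base` parameter, and the convention that the smallest port index has the highest priority are assumptions made for the example.

```python
def digit(port, position, base):
    """Digit of `port` at `position` (0 = least significant) in the given base."""
    return (port // base ** position) % base


def local_priority(active, position, base):
    """Keep the active ports whose digit at `position` is smallest."""
    # One signal slot per digit value, standing in for one sub-R-Mesh per value.
    signal = [False] * base
    for port in active:
        # Concurrent writers all write the same signal, as under a COMMON rule.
        signal[digit(port, position, base)] = True
    smallest = signal.index(True)  # lowest digit value that received a signal
    return {port for port in active if digit(port, position, base) == smallest}


def priority_winner(writers, num_digits, base):
    """Resolve priority among a non-empty set of distinct port indices."""
    active = set(writers)
    # Iterate from the most significant digit down to the least significant.
    for position in range(num_digits - 1, -1, -1):
        active = local_priority(active, position, base)
    (winner,) = active  # distinct indices leave exactly one survivor
    return winner
```

With `num_digits` digits per port index, the loop runs `num_digits` iterations, each corresponding to one constant-time local priority step on the R-Mesh.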

Consider the bus of the simulated R-Mesh that we used in our description of the algorithm, together with the corresponding buses of the sub-R-Meshes. It is easy to verify that the algorithm uses a port of a sub-R-Mesh only if it traverses one of these buses. Since the algorithm uses only the copies of this bus and the edges in the third dimension among them, the algorithm readily extends to all buses of the simulated R-Mesh. From Lemmas 3. Converting to Two Dimensions. With Theorem 3. The dimensionality of the R-Mesh alters only the topological properties of the underlying mesh and, therefore, is independent of word size and bus access rules.

In other words, a write to a bus can be read from any port incident on the bus. The variant of the R-Mesh that we describe here, the Directed R-Mesh (DR-Mesh), allows control of the direction of information flow in each segment of a bus. For example, in the non-directed bus of Figure 3. On a directed bus Figure 3. The six unshaded circles have no directed path from the source of the information. On the other hand, some points that receive the information have multiple paths from the source.

Much of the motivation for the directed model stems from the observation that, in practice, fiber-optic buses allow directional propagation of data and electronic buses with active components on them are directed. The directed model also admits elegant solutions to some problems, notably for directed graphs, and offers some theoretical insight. To make the definition of a DR-Mesh precise, we will assign directions to the ports of processors.

An incoming port is only permitted to bring information into a processor and an outgoing port takes information out of a processor. Each processor of a two-dimensional DR-Mesh has four incoming ports and four outgoing ports, with adjacent ports externally connected by directed external edges as shown in Figure 3.

Within the same block of the port partition, all incoming ports connect to all outgoing ports. One block allows information written to or entering from its incoming port to travel to its outgoing port, but not vice versa. In another block, information arriving at the incoming port travels out of the processor through both outgoing ports. A block with two incoming ports presents a more interesting situation. If information arrives at both incoming ports as a result of a single write (that reaches them by different paths), then we treat this as an exclusive write at the outgoing port.

Consequently, the DR-Mesh passes this information on to all output ports in the block (there is only one in our example). In the CRCW model, an additional case can arise in which a port could receive information from multiple sources, including a write to the port itself. Here, the concurrent write rule see Section 3. Suppose that this processor assumes the partition shown in Figure 3. Let the processor write values to some of its ports, with no writes to the remaining ports.


Consequently, the value leaving the port, resulting from the values that reach it, is the collision symbol. The write rule that the DR-Mesh uses to resolve concurrent writes determines this combination. It is generally assumed that the value read is the value leaving the port. Problem 3. To give an example in which directionality of the buses plays a critical role in an algorithm, we now present a DR-Mesh algorithm for solving the reachability problem, the counterpart of connectivity for directed graphs.

Consider a directed graph and two of its vertices, a source and a destination. Given the adjacency matrix of the graph and these two vertices, the problem is to determine whether the graph has a directed path from the source to the destination. As in the connectivity algorithm Algorithm 2. The DR-Mesh configures its processors so that each column has two buses (one in each direction) running the entire length of the column, and each row has a bus directed leftward from its diagonal element and a bus directed rightward from its diagonal element see Figure 3.

Note that all vertices of the graph could simultaneously check their incoming ports to determine whether they are reachable from the source vertex. THEOREM 3. Relationship to Other R-Mesh Variants. The ideas of word size and higher dimensions are completely independent of directed buses. Concurrent bus access is impacted by directedness as two ports on the same bus may receive information from different sets of sources. In fact, the reachability algorithm runs on a directed FR-Mesh.
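For checking small instances by hand, the answer that the reachability algorithm delivers can be reproduced sequentially. The sketch below is an ordinary breadth-first search over the adjacency matrix, not the DR-Mesh algorithm itself; it only mirrors the idea that a signal injected at the source reaches exactly the vertices that have a directed path from it. The name `reachable_from` is hypothetical.

```python
from collections import deque


def reachable_from(adjacency, source):
    """Return a list whose entry v is True iff vertex v is reachable from source.

    adjacency[u][v] is truthy iff the graph has a directed edge from u to v.
    """
    n = len(adjacency)
    reached = [False] * n
    reached[source] = True  # the source trivially receives its own signal
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        for v in range(n):
            # Propagate the signal along each directed edge leaving u.
            if adjacency[u][v] and not reached[v]:
                reached[v] = True
                frontier.append(v)
    return reached
```

The single list returned corresponds to the observation that every vertex can learn at once whether it is reachable from the source.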

On the contrary, we limit the discussion here to representatives from some important classes of reconfigurable models. The bibliographic notes at the end of this chapter point to many other models. The Reconfigurable Network (RN) consists of a set of processors connected by external edges according to some underlying connected graph. Each processor can internally partition its ports (that connect it to its neighbors in the underlying graph) to form buses as in the R-Mesh.
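How local port partitions on an arbitrary underlying graph give rise to buses can be illustrated with a union-find sketch. The representation is an assumption made for this example: a port is the pair `(processor, neighbor)`, each external edge joins the two ports facing each other, and each processor lists its partition as blocks of neighbors whose ports it fuses. All function names are hypothetical.

```python
def find(parent, x):
    """Return the representative of x, compressing the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    """Merge the sets containing a and b."""
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a != root_b:
        parent[root_a] = root_b


def buses(edges, partitions):
    """Group ports into buses.

    edges: list of (u, v) pairs of the underlying graph.
    partitions: dict mapping a processor to a list of blocks; each block is a
        list of neighbors whose ports that processor connects internally.
    """
    parent = {}
    for u, v in edges:
        parent[(u, v)] = (u, v)  # port of u facing v
        parent[(v, u)] = (v, u)  # port of v facing u
    for u, v in edges:
        union(parent, (u, v), (v, u))  # an external edge joins two ports
    for processor, blocks in partitions.items():
        for block in blocks:
            for neighbor in block[1:]:
                # Ports in the same block are fused inside the processor.
                union(parent, (processor, block[0]), (processor, neighbor))
    groups = {}
    for port in parent:
        groups.setdefault(find(parent, port), set()).add(port)
    return list(groups.values())
```

For a three-processor path 0, 1, 2, fusing both ports of processor 1 yields a single bus, while leaving them separate yields two.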

Indeed, the R-Mesh is a special case of the RN in which the underlying graph is a mesh. Like the R-Mesh, the RN has variants based on word size, bus shapes, bus accesses, and directedness; indeed, many of the R-Mesh variants described in Section 3. Problems 3. Some R-Mesh solutions, on the other hand, use all or a large number of processors simultaneously.

Section 9. A typical one-dimensional optical model has the structure shown in Figure 3. It consists of a linear arrangement of processors connected to a U-shaped optical bus structure comprising a data bus and buses for addressing. Information traverses this bundle of buses in one direction. Processors write to the buses at the transmitting segment and read from the receiving segment. We describe these features below. Information Pipelining. Let the optical delay on the bus between any pair of adjacent processors be the same. Consider the case where a processor writes to the bus at some time. The next processor can accurately ascertain that this write will not reach its own write port until one inter-processor delay later. The next processor can therefore write to the bus simultaneously with the first, provided that the written signal (a stream of optical pulses) is shorter than this delay.

Measuring time in units of one optical pulse, this means that a processor can write a message no longer, in bits, than the number of optical pulses that fit within the inter-processor delay. This idea readily extends to allow all processors to simultaneously and independently write to the bus. If each write is short enough to not overlap with the write from the previous processor, then multiple signals can travel in tandem along the bus. This ability to pipeline data endows optical models with large data movement capacity see Figure 3. Coincident Pulse Addressing. The second feature of optical models that distinguishes them from other models is coincident pulse addressing.
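The pipelining condition can be checked with a short timing sketch. It assumes an idealized bus on which all processors write at time 0 and signals travel past the remaining processors toward an observation point just beyond the last one; the function names and the parameters `delay` and `message_duration` are hypothetical.

```python
def arrival_windows(num_processors, delay, message_duration):
    """Arrival intervals at a point just past the last processor.

    Every processor starts writing at time 0. Processor i is assumed to be
    (num_processors - i) bus segments away from the observation point.
    """
    windows = []
    for i in range(num_processors):
        hops = num_processors - i
        start = hops * delay
        windows.append((start, start + message_duration))
    return windows


def messages_overlap(windows):
    """True if any two consecutive arrival intervals overlap in time."""
    ordered = sorted(windows)
    return any(
        previous_end > next_start
        for (_, previous_end), (next_start, _) in zip(ordered, ordered[1:])
    )
```

With a delay of 10 time units, messages of duration 8 arrive in tandem without overlap, whereas messages of duration 12 collide with those of the neighboring processor.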

This addressing scheme uses two buses, the reference and select buses, that carry independent pulses. When these pulses simultaneously reach a processor at its receiving segment (that is, coincide in time at the processor), then the processor reads from the data bus at that point in time. We now illustrate the use of coincident pulse addressing through a constant time algorithm to add N bits.

Before we can proceed with the algorithm, we must introduce a new feature to the optical model described so far. The model has the ability not shown in Figure 3.


Each of the fixed and programmable delays is of the same duration. Let each processor hold an input bit; for the illustration in Figure 3. The objective is to add the input bits. Each processor with a 1 as input introduces a delay in its transmitting segment of the select bus. Consequently, the select pulse lags behind the reference pulse by one delay unit for each 1 in the input when it reaches the U-turn between the transmitting and receiving segments.

In the receiving segment, the select pulse gains on the reference pulse, by one delay unit per processor, due to the fixed delays in the reference and data buses. It coincides with the reference pulse at a processor determined by the number of 1s in the input, and this processor reads the signal issued by processor 0. Conversely, the index of the processor that receives the signal determines the sum of the input bits. Figure 3. In general, optical models employ pipelining and the coincident pulse technique, often simultaneously, to solve many complex problems. Some models possess additional features, such as the ability to segment the bus.
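The timing argument behind this addition can be traced with a small simulation. It is a sketch under simplifying assumptions: read points are numbered by how many fixed delays the reference pulse has passed (so there are N + 1 idealized read points for N input bits), delays are integers, and the mapping from read point to actual processor index is left out because the extracted text does not preserve it. The function name and the `unit` parameter are hypothetical.

```python
def coincident_pulse_sum(bits, unit=1):
    """Return the read point at which the select and reference pulses coincide.

    Under the assumptions stated above, that read point equals the number of
    1s among the input bits.
    """
    # Transmitting segment: each 1-input inserts one programmable delay, so the
    # select pulse reaches the U-turn this far behind the reference pulse.
    lag = unit * sum(1 for bit in bits if bit == 1)
    for read_point in range(len(bits) + 1):
        # Receiving segment: the reference pulse is held back by one fixed
        # delay per read point passed; the select pulse is not delayed further.
        reference_arrival = read_point * unit
        select_arrival = lag
        if reference_arrival == select_arrival:
            return read_point
    raise AssertionError("unreachable: the lag is at most len(bits) units")
```

Because every processor sets its delay independently and the pulses traverse the bus once, the whole computation corresponds to a single bus cycle on the optical model.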

Higher dimensional optical models also exist. Though reconfigurable models and FPGA-based systems have evolved relatively independently, there is common ground. In models such as the R-Mesh, a global configuration could result from a set of relatively independent, local configurations, made on the basis of local data. A typical FPGA, in contrast, requires reconfiguration information to be supplied to it from an external source. This makes the configuration speeds of FPGAs pin-limited. By loading hardware specific to each phase, the FPGA can often perform the entire computation faster than with a single static configuration, even accounting for the time cost of reconfiguration in an FPGA.

This obstructs finely adapting hardware resources to a given problem instance. Self-reconfiguration is neither pin-limited nor restricted to using only compile-time information. More details appear in Chapters 8 and 9. For a problem of size N, a polynomially bounded instance of a model has a number of elements (processors, wires, gates, etc.) that is polynomial in N. In the following definitions, we loosely use the name of a model to refer to a polynomially bounded instance of that model. One model is as powerful as another if, for every problem that the second model can solve, there is an instance of the first that can solve an instance of the problem as fast as the second. The first model is more powerful (and the second less powerful) if, in addition, an instance of the first can solve at least one problem faster than the second. Figure 3.

The well-known PRAM model is as powerful as (and often more powerful than) conventional point-to-point network models such as meshes, hypercubes, etc. Indeed, for reconfigurable models, concurrent writing ability does not add to their power. Unlike the PRAM, where concurrent writing ability adds to the power, reconfigurable models with the ability to read concurrently can emulate concurrent writing ability. Rather surprisingly, the ability to fuse buses is more powerful than the ability to segment buses.

Indeed, if the model is permitted to fuse buses, then the ability to segment adds nothing to the power. The LR-Mesh that restricts its buses to be linear is provably more powerful than models with only segmenting ability. Because of data pipelining, optical models restrict their buses to be linear and acyclic. The data pipelining itself allows for algorithms to run efficiently, but does not contribute to the power of the model.

Therefore, optical models also occupy the same place as the LR-Mesh in the power hierarchy. Directed models occupy the next tier in the hierarchy. In fact, it can be shown that the directed and non-directed LR-Meshes are of the same power. It should be noted that though optical models use unidirectional buses, they are really not directed models as the buses cannot select the direction of information flow in parts of the bus. Assuming the standard set of constant-time arithmetic and logical operations (including addition, subtraction, multiplication, division, bitwise AND, OR, NOT, Exclusive OR), the word model is only as powerful as its bit-model counterpart.

Similarly, beyond two dimensions, the dimensionality of the model does not contribute to its power. Although reconfigurable models are, in general, more powerful than conventional ones, their power is not unlimited. They will give the reader a better handle on the underlying reconfiguration ideas in the remaining chapters that deal with more narrowly focused topics. Two buses are different if the corresponding verse all components of the configuration graph see page 20 of the LR-Mesh are different. Your algorithm should flag processors in row if and only if the input number has value 3.

Will the simulation work for other concurrent write rules? Derive an expression for the R-Mesh.

This R-Mesh size is based on a general simulation algorithm. Determine the port partitions needed to effect these connections. How many partitions are possible if no two incoming ports can be in the same block of the partition? Under this rule, a single write arriving at a port by different paths is considered a concurrent write. Consider a DR-Mesh in which reads are entry consistent; that is, a port reads the value entering the port. What variant of the RMBM would you use? Hint: Use Problems 3. It has three data input bits, three data output bits, one state input bit, and a rotation output bit.

The inputs to the shift switch are produced by a processor that controls the switch. In general, a shift switch has input and output bits and shift elements. Each switching and rotation element can be configured to connect its input(s) to output(s) in various ways. The contents of the 1-bit state buffer determine the internal configurations of the switching and rotation elements. Depending on the state, the shift switch either connects its inputs to its outputs unshifted or has the effect of shifting the input by one bit.
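The behavior described for the shift switch can be modeled functionally as below. Several details are assumptions, since the extracted text does not preserve them: state 0 is taken as the pass-through state and state 1 as the shifting state, the shift moves bits toward higher positions, the vacated position receives 0, and the rotation output is 0 when no shift occurs. The names `shift_switch` and `rotate_once` are hypothetical.

```python
def shift_switch(inputs, state):
    """Return (outputs, rotation_bit) for one shift switch.

    inputs: list of data bits; state: 0 (pass through) or 1 (shift by one).
    """
    if state == 0:
        # Assumed pass-through state: input i goes to output i.
        return list(inputs), 0
    # Assumed shifting state: input i goes to output i + 1; the bit pushed
    # off the end appears on the rotation output.
    return [0] + list(inputs[:-1]), inputs[-1]


def rotate_once(inputs):
    """Feed the rotation bit back into the vacated position to get a rotate."""
    outputs, rotation_bit = shift_switch(inputs, 1)
    outputs[0] = rotation_bit
    return outputs
```

Chaining several such switches, each controlled by its own state bit, shifts the input by the number of switches whose state is 1.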

The rotation bit could be fed back to achieve a rotate. How long would your method require?
