Journal article
Processor-efficient sparse matrix-vector multiplication
Computers & mathematics with applications (1987), Vol.48(3), pp.589-608
2004
DOI: 10.1016/j.camwa.2003.06.009
Abstract
The matrix-vector multiplication operation is the kernel of most numerical algorithms.Typically, matrix-vector multiplication algorithms exploit the sparsity of a matrix, either to reduce the time taken or the memory used. In the case of parallel implementations of numerical algorithms, the sparsity of a matrix can also be used to reduce the number of processing elements (PEs) required. For example, in Leiserson's systolic array algorithm for matrix-vector multiplication the nonzeros in the matrix are partitioned into bands and each band is fed into a different PE. Similarly, in Melhem's algorithm, the nonzeros are partitioned into stripes and each stripe is fed into a distinct PE. However, in many practical applications, such as the numerical solution of partial differential equations, a matrix is much sparser than is indicated by its bandwidth or by its stripe width.
A new measure of sparsity of a matrix called
staircase width is introduced. A
staircase in a matrix
M = (m
i,j)n×n
is a set of matrix positions
S = {(i, j) | m
i,j ≠ 0, such that no position in
S is strictly to the northeast of another position in
S. The
staircase cover of a matrix is a partition of the positions of nonzero entries in the matrix into staircases and the
staircase width of a matrix is the size of a smallest staircase cover of the matrix. The staircase width of a matrix is never larger than the bandwidth or the stripe width of a matrix. Moreover, it is significantly smaller than either for some commonly occurring types of matrices such as arrowhead matrices. Experiments with sparse matrices derived from a variety of engineering problems suggest that, in practice, the staircase width of a matrix is about half the stripe width of the matrix. An efficient algorithm to compute a minimum width staircase cover of a matrix is presented. Staircase covers are used as the basis of a new parallel algorithm to compute a matrix-vector product on a linear systolic array of PEs. This algorithm uses as many PEs as the staircase width of the matrix, and therefore, saves on the number of PEs relative to algorithms of Leiserson and Melhem. It is shown that despite using fewer PEs, the algorithm uses no more time steps than Leiserson's or Melhem's algorithm.
Details
- Title: Subtitle
- Processor-efficient sparse matrix-vector multiplication
- Creators
- L.S Heath - University of IowaC.J Ribbens - University of IowaS.V Pemmaraju - University of Iowa
- Resource Type
- Journal article
- Publication Details
- Computers & mathematics with applications (1987), Vol.48(3), pp.589-608
- DOI
- 10.1016/j.camwa.2003.06.009
- ISSN
- 0898-1221
- eISSN
- 1873-7668
- Publisher
- Elsevier Ltd
- Language
- English
- Date published
- 2004
- Academic Unit
- Computer Science
- Record Identifier
- 9984259437302771
Metrics
15 Record Views