# EETE OCTOBER 2012

DESIGN & PRODUCTS: DIGITAL SIGNAL PROCESSING (Electronic Engineering Times Europe, October 2012, page 20, www.electronics-eetimes.com)

Example 4. SC3900 code to compute a0(ei - hf) and a1(ei - hf).

## Inv(detA) calculation

inv(detA) is calculated using the following equation:

$$\frac{1}{\det A} = \frac{\overline{\det A}}{|\det A|^{2}}$$

The determinant norm is used to determine the scale factor and to scale detA. The 1/detA calculation can be performed efficiently using the dedicated SC3900 reciprocal instructions. Scaling is performed using the instruction CLB.LFT.X, which counts the leading bits, and ASH.LFT.X, which performs an arithmetic shift left, in order to scale the determinant to the range 0.5 to 1. The reciprocal instructions RECIP and MACM.SU.X perform the inversion efficiently and give a throughput of two inversions per cycle.

Example 5. SC3900 code to scale det and calculate inv_det.

The last stage is to multiply each cofactor value by inv_det in order to get the invA value.

## Calculating N matrices in parallel

To improve core utilization, we can calculate N matrices in parallel. The matrices are ordered as shown in Example 6.

Example 6. SC3900 code for N matrices inversion.

The N matrices are ordered as a three-dimensional array, where the last dimension is the matrix number. We can assume that N is a multiple of four. Thus, we can calculate four matrix inversions in parallel, and the code can be unrolled by four. The instruction LD.4L, which loads four 32-bit elements, can be used to load four complex elements. The SC3900 has wide data buses that enable two 256-bit loads in parallel with the four DMU instructions in a single cycle. Thus, reading data from memory is done seamlessly, without affecting the DMU utilization.

Example 7. SC3900 code after loop unrolling.

## Hermitian matrix inversion

In many cases, the A matrix is Hermitian. One example is the minimum mean square error (MMSE) equalizer in a 4x4 LTE MIMO receiver. A Hermitian matrix (the complex counterpart of a symmetric matrix) is a square matrix with complex entries that is equal to its own conjugate transpose, that is:

$$A = A^{H}, \qquad a_{ij} = \overline{a_{ji}}$$

The inverse of a Hermitian matrix is Hermitian as well. In this case, only the upper part must be calculated, and the diagonal is real. We can achieve significant optimization by taking advantage of these attributes. For a 4x4 matrix, only ten elements need to be calculated instead of sixteen. Furthermore, the determinant of a Hermitian matrix is real, so its inversion is straightforward.

## Further optimization

Further optimization is possible if we need to solve the linear equation y = Ax. In this case, inv(A) is not required. Instead of multiplying each cofactor element by inv(detA), calculate (transpose(C) * y) and then multiply the result by inv(detA). The following equation can be used:

$$x = \frac{1}{\det A}\left(C^{T} y\right)$$
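The idea in "Further optimization" can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the article's SC3900 code: it uses plain floating-point complex arithmetic instead of scaled fixed-point, the helper names (`minor`, `det`, `cofactors`, `solve_via_cofactors`) are hypothetical, and the sample 4x4 Hermitian matrix and vector are made-up values.

```python
def minor(m, row, col):
    """Return m with one row and one column removed."""
    return [[v for j, v in enumerate(r) if j != col]
            for i, r in enumerate(m) if i != row]


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j))
               for j in range(len(m)))


def cofactors(m):
    """Cofactor matrix C with C[i][j] = (-1)^(i+j) * det(minor(m, i, j))."""
    n = len(m)
    return [[(-1) ** (i + j) * det(minor(m, i, j)) for j in range(n)]
            for i in range(n)]


def solve_via_cofactors(a, y):
    """Solve y = A x as x = inv(detA) * (transpose(C) * y), without forming inv(A)."""
    n = len(a)
    c = cofactors(a)
    # transpose(C) * y: entry i sums C[j][i] * y[j] over j.
    ct_y = [sum(c[j][i] * y[j] for j in range(n)) for i in range(n)]
    inv_det = 1 / det(a)
    # Only n values are scaled by inv_det, instead of all n*n cofactors.
    return [inv_det * v for v in ct_y]


# Hypothetical Hermitian test matrix (equal to its own conjugate transpose).
a = [[4 + 0j, 1 + 1j, 0j, 2 - 1j],
     [1 - 1j, 5 + 0j, 1 + 2j, 0j],
     [0j, 1 - 2j, 6 + 0j, 1 + 1j],
     [2 + 1j, 0j, 1 - 1j, 7 + 0j]]
y = [1 + 0j, 2 - 1j, 0 + 3j, -1 + 1j]

x = solve_via_cofactors(a, y)

# Check the result by substituting x back into y = A x.
residual = max(abs(sum(a[i][j] * x[j] for j in range(4)) - y[i])
               for i in range(4))
```

For a 4x4 system this scales four values by inv(detA) rather than sixteen cofactors. Because the sample matrix is Hermitian, `det(a)` also comes out real up to rounding error, which matches the remark in the Hermitian section.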
