
ELSEVIER Mathematics and Computers in Simulation 38 (1995) 283-292


Real-time simulation of power system stability using parallel digital signal processors

M. Lavoie a,*, V. Qué-Do b, J.L. Houle c, J. Davidson d

a École de Technologie Supérieure, 4750 Henri-Julien, Montréal, Que, Canada H2T 2C8
b Institut de Recherche d'Hydro-Québec, Varennes, Que, Canada J3X 1S1
c École Polytechnique de Montréal, Montréal, Que, Canada H3C 3A7
d Université du Québec à Montréal, Montréal, Que, Canada H2T 2C8

Abstract

An electrical power system usually includes machines with their regulators and controls linked through power lines, transformers, capacitor banks, reactors, and loads. In stability analysis, two distinct strategies are possible for solving the differential equations and network equations describing such a system: a simultaneous approach and a partitioned approach. For real-time numerical simulation, the partitioned approach is easier to distribute over parallel processors [2,11] and is therefore more appropriate in this case. In order to demonstrate the feasibility of the real-time simulation of power system stability, this paper presents an implementation of the partitioned approach using parallel digital signal processors (DSP). A network of typical size consisting of 8 machines and 50 buses is simulated. The synchronous machine model has been developed using the block diagram implementation technique. Two algorithms applied to the solution of the network algebraic equations are described and their performances are evaluated on different processors.

Keywords: Numerical algorithms; Parallel processing; Real-time simulation; Transient stability; Numerical stability; Digital signal processors

1. Power system stability equations

An electrical network usually includes machines with their regulators and controls linked through power lines, transformers, capacitor banks, reactors, and loads. In an electrical network, the dynamic behavior of machine regulators and control systems (static compensators, dc-ac converters) is represented by the following non-linear differential equations:

$$\frac{dx}{dt} = F(x, u), \qquad y = G(x, u) \tag{1}$$

* Corresponding author.

0378-4754/95/$09.50 © 1995 Elsevier Science B.V. All rights reserved SSDI 0378-4754(94)00038-L


where x is the state variable vector, u is the input vector, y is the output vector and F and G are generally non-linear functions.

Linearization around operating points produces the following system of linear differential equations:

$$\frac{dx}{dt} = Ax + Bu(V), \qquad y = Cx + Du(V) \tag{2}$$

where V is the vector of node voltages, A and C are state matrices, and B and D are the input coefficients.

The linking network is modeled by

$$YV = I(x, V), \tag{3}$$

where Y is the admittance matrix and I the vector of incoming node currents.

2. Simultaneous vs partitioned approach

The solution of (2) and (3) by digital computers requires that (2) be discretized and then integrated using an algorithm of the form [1]:

$$x_k = x_{k-1} + \frac{h}{2}\left[A(x_k + x_{k-1}) + B\left(u_k(V_k) + u_{k-1}(V_{k-1})\right)\right], \tag{4}$$

where h is the integration step and k is the time step such that t = kh. The discretization transforms the differential equations into a set of simultaneous algebraic equations. Two distinct strategies can then be applied to solve (2) and (3): a simultaneous approach and a partitioned approach.
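Because x_k appears on both sides of (4), the trapezoidal rule is implicit and has to be solved for x_k at every step. The following is a minimal sketch for a scalar system dx/dt = a x + b u, not the authors' implementation; the names `trapezoidal_step` and `simulate` and the numerical values in the usage note are assumptions made for illustration.

```python
def trapezoidal_step(x_prev, u_prev, u_now, a, b, h):
    """One step of the trapezoidal rule (4) for the scalar system dx/dt = a*x + b*u.

    The rule is implicit in x_k, so it is rearranged as
    (1 - h*a/2) * x_k = (1 + h*a/2) * x_prev + (h*b/2) * (u_prev + u_now).
    """
    rhs = (1.0 + 0.5 * h * a) * x_prev + 0.5 * h * b * (u_prev + u_now)
    return rhs / (1.0 - 0.5 * h * a)


def simulate(a, b, u, x0, h, steps):
    """Integrate with a constant input u and return the whole trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(trapezoidal_step(xs[-1], u, u, a, b, h))
    return xs
```

For example, `simulate(-1.0, 1.0, 1.0, 0.0, 0.01, 2000)` integrates a stable first-order system driven by a unit input; the trajectory settles at the steady state x = -b u / a = 1.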

In the simultaneous approach, (3) and (4) are combined and solved as a single set of simultaneous equations. The advantage of this method over the partitioned approach is that no delays are introduced between the solutions of (2) and (3). The solutions are therefore generally more accurate and numerically more stable. The main disadvantages are the complexity of the algorithms considering the large number of equations involved and the difficulty in distributing the equations among parallel processors.

In the partitioned approach, (2) and (3) are kept distinct and solved independently. Here, the results of (2) are the inputs of (3) and the results of (3) are used by (2). The advantages are that (2) and (3) may be solved during the same time period by parallel processors. Furthermore, machines of different nodes are considered uncoupled and they can be modeled using additional parallel processors. The main disadvantages are that the numerical solutions may be less accurate and the algorithm may be numerically less stable due to the delays introduced by alternately solving (2) and (3). These disadvantages become less significant as the time step is reduced. The computation load is thus increased but parallel processing ultimately produces a net gain in speed.
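The effect of the one-step delay can be illustrated on a toy problem. The sketch below is only an illustration under stated assumptions: a single machine state x obeying dx/dt = a x + b V, a single bus with y V = c x, and hypothetical coefficients a, b, c, y that do not come from the test network. The network solve uses the state of the previous step, the machine solve then holds V fixed over the step, and the error with respect to the exact coupled solution shrinks as h is reduced.

```python
import math


def partitioned_run(h, t_end=1.0, a=-1.0, b=0.5, c=1.0, y=1.0, x0=1.0):
    """Alternate a network solve and a machine solve, with a one-step delay."""
    x = x0
    for _ in range(round(t_end / h)):
        # Network solve, toy version of (3): uses the machine state of the previous step.
        v = c * x / y
        # Machine solve, trapezoidal rule (4) with v held fixed over the step.
        x = ((1.0 + 0.5 * h * a) * x + h * b * v) / (1.0 - 0.5 * h * a)
    return x


def exact(t_end=1.0, a=-1.0, b=0.5, c=1.0, y=1.0, x0=1.0):
    """Closed-form solution of the coupled toy system dx/dt = (a + b*c/y) * x."""
    return x0 * math.exp((a + b * c / y) * t_end)
```

Comparing `abs(partitioned_run(0.01) - exact())` with `abs(partitioned_run(0.001) - exact())` shows the delay error falling roughly in proportion to the time step in this toy case.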


This paper presents the real-time simulation of power system stability using the partitioned approach implemented in parallel digital signal processors.

3. Methodology

A subnetwork composed of generating machines and a transmission system is modeled using interconnected blocks. As shown in Fig. 1, each machine block receives the input current Imi and produces a voltage Vmi. DSP #2, representing the transmission network, receives currents from the machines in DSP #1 and computes the bus voltages according to (3). Imi is then given by

$$I_{mi} = \frac{V_{mi} - V_{bi}}{jX_i} \tag{5}$$

where Vmi is the internal voltage of machine i, Imi is the current of machine i injected into the transmission network, Vbi is the voltage of bus i, Xi is the machine armature reactance and i indicates the machine index.
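As a worked example of (5), the sketch below computes the injected current from complex phasors. It assumes that the denominator of (5) is the purely reactive impedance jX of the armature; the function name `machine_current` and the per-unit values in the usage note are hypothetical.

```python
def machine_current(v_machine, v_bus, x_armature):
    """Current injected by a machine into its bus, per (5).

    v_machine and v_bus are complex phasors; x_armature is a real reactance.
    Assumption: the series impedance is purely reactive, i.e. j * x_armature.
    """
    return (v_machine - v_bus) / (1j * x_armature)
```

With an internal voltage of 1.05 p.u., a bus voltage of 1.0 p.u. and a reactance of 0.25 p.u., `machine_current(1.05 + 0j, 1.0 + 0j, 0.25)` has a magnitude of 0.2 p.u. and lags the voltage difference by 90 degrees.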

The start-up procedure of the real-time stability simulation is as follows:

• A steady-state load flow is pre-calculated off-line. This establishes the initial bus voltages as well as the active and reactive power (Pi and Qi) of each machine.

• An initial load equivalent to Pi, Qi is then connected to each machine not yet connected to the network. Voltage regulator and governor references are set such that the machine will provide the necessary power (Pi, Qi) and the desired terminal voltage Vbi. Numerical procedures are used to bring all machines to the steady-state condition quickly.

• All machines are abruptly disconnected from the load (Pi, Qi) and switched to the network. Transients should not occur since the current from each machine remains constant, as if it were still connected to the load (Pi, Qi), and the terminal voltage remains Vbi.

• When all system components have reached their steady-state condition, switching can be applied to the network. Switching is produced by suddenly changing one element in the Y matrix.

Fig. 1. Real-time implementation to simulate the power system stability of a network. (Block diagram: DSP #1 and DSP #2, the latter holding the solution of YV = I of the transmission network.)


4. Real-time implementation

The real-time implementation of a test-case subnetwork including 8 machines and 50 buses requires two DSPs: one to solve (3) for the 50 buses of the test transmission system and the other to simulate 8 complete machine models. The bus voltage equations are solved using Gaussian elimination accelerated by the sparse matrix technique [1,3,7]. Each machine is modeled according to the block diagram of its various components: generator, exciter, stabilizer, turbine, and governor. Import-export routines allow an exchange between the two DSPs of the machine currents Imi and the bus voltages Vbi at every time step.

This prototype demonstrates the feasibility of the real-time simulation of power system stability on a typical network and provides an experimental workbench as well as expertise for solving the stability problem for much larger networks. The algorithm was first developed and implemented on a dual TMS320C30 DSP processor. The printed circuit-board is a single-slot 6U VME motherboard containing 4 Mbytes of global memory accessible by the two DSPs and through the VME bus. Each DSP operates at 33 MHz and has access to 64K words of shared multi-access global RAM and 8K words of local RAM.

In phase two, the binary code was ported on a dual TMS320C40 digital signal processor configuration with the following characteristics:

• single-slot 6U full size VME board;
• two 40-MHz TMS320C40 DSPs at 40 MOPS and 80 MFLOPS of computing power;
• up to 5 Mbytes of zero-wait-state SRAM and 32 Mbytes of shared multi-access global DRAM;
• 6 high-speed interconnection parallel ports;
• data transfer rate up to 20 Mbytes/sec per parallel port.

This two-phase strategy was adopted in order to take advantage of the stable development support available on the C30 hardware. These tools are currently being ported to the C40 hardware. A great deal of work has been necessary to bring the C30 software support to the C40 environment.

5. Network algorithms

The network matrix used by the real-time system originates from a network declaration file in ST600, the off-line stability simulation program used by Hydro-Québec. This program has the capability to produce a MATLAB-compatible file containing the admittance values of the matrix to be solved. This compatibility has allowed continuous comparisons of the results produced by the real-time algorithms and those produced by MATLAB.

This test network was chosen because it is considered to be representative of typical networks found in the Hydro-Québec power system. It consists of 50 buses and 8 synchronous machines. Fig. 2 represents the distribution of the 166 non-zero (nz) elements of the test matrix as produced by MATLAB.

The algorithms used for solving the network equations (3) are adaptations of those described in [8] to complex matrices. The performance results were measured on the processors listed in Table 1 and reflect the speedup obtained after application of basic sparse-matrix techniques.


Fig. 2. Element distribution in test admittance matrix (50 × 50 sparsity plot, nz = 166).

The two algorithms are modified Gaussian elimination (MGE) followed by backward substitution, and lower and upper (LU) decomposition followed by forward and backward substitutions. It is well known that computing speed is often gained at the expense of memory space. Therefore, considering the limited dimensions of the matrices and the available memory, no attempt was made to save storage through economical matrix storage techniques. Instead, speed was the prevailing goal. Portability and a short development cycle being the main concerns, all the programs mentioned in this paper were written in the C programming language.

5.1. Modified Gaussian Elimination

Solving YV = I by the MGE technique consists in transforming the admittance matrix Y into the upper triangular form of Fig. 3. This transformation is achieved by eliminating all elements to the left of the diagonal. The algorithm is shown in Fig. 4, where the multiplication factor Yji/Yii is applied to both (6) and (7).

In a matrix as sparse as the one shown in Fig. 2, the computation time required by the MGE transformation can be greatly reduced if mathematical operations are strictly limited to the non-zero complex elements of the lower triangular part of the matrix (58 or 4.7% in our test case). Once the upper triangular form of the admittance matrix has been obtained, the solution for V is obtained by backward substitution (8) and (9). The performance presented in this paper was obtained by skipping computations for most of the zero elements in the Y matrix. In the case where all elements are computed, the MGE technique proved to be 30 times slower.
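The idea of restricting the arithmetic to non-zero elements can be sketched as follows. This is a simplified illustration, not the authors' C code: it keeps dense storage, tests each entry against zero instead of tracking the last non-zero positions, and does no pivoting, which assumes non-zero diagonal pivots. The name `mge_solve` is hypothetical.

```python
def mge_solve(Y, I):
    """Solve Y V = I by Gaussian elimination that skips zero entries.

    Y is a list of rows of complex numbers, I a list of complex numbers.
    Both inputs are copied, so the caller's data is left untouched.
    """
    n = len(Y)
    Y = [row[:] for row in Y]
    I = I[:]
    for i in range(n):
        # Columns right of the diagonal where the pivot row is non-zero.
        pivot_cols = [k for k in range(i + 1, n) if Y[i][k] != 0]
        for j in range(i + 1, n):
            if Y[j][i] == 0:
                continue  # nothing to eliminate on this row
            factor = Y[j][i] / Y[i][i]
            for k in pivot_cols:
                Y[j][k] -= factor * Y[i][k]
            Y[j][i] = 0
            I[j] -= factor * I[i]
    # Backward substitution on the upper triangular system.
    V = [0j] * n
    for i in range(n - 1, -1, -1):
        s = I[i]
        for k in range(i + 1, n):
            if Y[i][k] != 0:
                s -= Y[i][k] * V[k]
        V[i] = s / Y[i][i]
    return V
```

On a tridiagonal 3 × 3 complex matrix, for instance, the inner loop runs only over the one or two non-zero entries of each pivot row, which is where the time saving on a sparse admittance matrix comes from.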

5.2. Lower upper matrix decomposition

The LU decomposition technique consists in transforming the admittance matrix Y into lower and upper triangular matrices, as shown in Fig. 5.

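$$
Y = \begin{bmatrix}
Y_{11} & Y_{12} & \cdots & Y_{1n} \\
0 & Y_{22} & \cdots & Y_{2n} \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & Y_{nn}
\end{bmatrix}
$$

Fig. 3. Admittance matrix after Gaussian elimination.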


/* Gaussian elimination */
for (i = 1 to n in steps of 1)
BEGIN for
    for (j = i + 1 to n in steps of 1)
        if (module Yji ≠ 0) then
        BEGIN if
            lpnz = position of last pivot row non-zero element
            lnz = position of last nz element on current row
            if (lpnz > lnz) lnz = lpnz
            /* the factor Yji / Yii is taken before row j is modified */
            for (k = i to lnz in steps of 1)
            BEGIN for
                Yjk = Yjk - Yik (Yji / Yii)        (6)
            END for
            Ij = Ij - Ii (Yji / Yii)               (7)
        END if
END for

/* Backward substitution */
Vn = In / Ynn                                      (8)
Vi = (Ii - Σ_{k=i+1..n} Yik Vk) / Yii, for i = n - 1 down to 1        (9)

Fig. 4. Modified Gaussian elimination algorithm.

Mathematically, this transformation consists in expressing the equation YV = I as (LU)V = L(UV) = I. The solution comes from applying a forward substitution to solve $z = UV = L^{-1}I$ and finally a backward substitution to solve $V = U^{-1}z$. The algorithm of the LU decomposition is shown in Fig. 6. The LU algorithm, including the decomposition, required 400 milliseconds on a SPARC ELC and 200 milliseconds on a SPARC 10, as shown in Table 1. Attempts were made to speed up the algorithm. Introducing conditional statements in any of the loops of the algorithm produced no noticeable speedups.
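The factorization and the two substitutions can be sketched as follows. This is a dense, unpivoted illustration with a unit diagonal on L, offered under the assumption of non-zero pivots rather than as the authors' implementation; the names `lu_decompose` and `lu_solve` are hypothetical.

```python
def lu_decompose(Y):
    """Factor Y = L U with a unit diagonal on L (no pivoting)."""
    n = len(Y)
    L = [[0j] * n for _ in range(n)]
    U = [[0j] * n for _ in range(n)]
    for i in range(n):
        L[i][i] = 1
        for j in range(i):
            # Entries of row i to the left of the diagonal go into L.
            s = sum(L[i][k] * U[k][j] for k in range(j))
            L[i][j] = (Y[i][j] - s) / U[j][j]
        for j in range(i, n):
            # Diagonal and right-hand entries of row i go into U.
            s = sum(L[i][k] * U[k][j] for k in range(i))
            U[i][j] = Y[i][j] - s
    return L, U


def lu_solve(L, U, I):
    """Solve (L U) V = I by forward, then backward substitution."""
    n = len(I)
    z = [0j] * n
    for i in range(n):  # forward: L z = I, with L[i][i] == 1
        z[i] = I[i] - sum(L[i][j] * z[j] for j in range(i))
    V = [0j] * n
    for i in range(n - 1, -1, -1):  # backward: U V = z
        V[i] = (z[i] - sum(U[i][j] * V[j] for j in range(i + 1, n))) / U[i][i]
    return V
```

Once L and U are available, `lu_solve` can be called repeatedly for new current vectors I without refactoring, which is the usual attraction of the method when Y does not change.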

5.3. Discussion

LU decomposition is generally considered to be the fastest algorithm available for solving linear systems [8,11]. The results of Table 1, in which MGE is faster than LU, therefore require some explanation.

The MATLAB program was used to obtain the L and U matrices of the test matrix. Fig. 7 shows the distribution of the nz elements in the upper triangular matrix (U) of our test case. The U matrix contains 325 nz elements. The lower triangular matrix (L) (not shown here) contains 308 nz elements. The original matrix contained only 166 nz elements.

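$$
L = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
L_{21} & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
L_{n1} & L_{n2} & \cdots & 1
\end{bmatrix},
\qquad
U = \begin{bmatrix}
U_{11} & U_{12} & \cdots & U_{1n} \\
0 & U_{22} & \cdots & U_{2n} \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & U_{nn}
\end{bmatrix}
$$

Fig. 5. Lower (L) and upper (U) triangular matrices.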


/* LU decomposition */
for (i = 1 to n in steps of 1)
BEGIN for
    for (j = 1 to i - 1 in steps of 1)
        Lij = (Yij - Σ_{k=1..j-1} Lik Ukj) / Ujj        (10)
    for (j = i to n in steps of 1)
        Uij = Yij - Σ_{k=1..i-1} Lik Ukj                (11)
END for

/* Forward substitution (Lii = 1, see Fig. 5) */
z1 = I1 / L11
zi = Ii - Σ_{j=1..i-1} Lij zj, for i = 2 to n

/* Backward substitution */
Vn = zn / Unn
Vi = (zi - Σ_{j=i+1..n} Uij Vj) / Uii, for i = n - 1 down to 1

Fig. 6. LU decomposition algorithm.

In the application of the column-oriented MGE algorithm, each element of a row is to be multiplied by a pivot element and subtracted from the previous value of the element.

In complex arithmetic, this elimination process requires 8 multiplications, 4 additions and 2 subtractions for each element of the row to the left of the diagonal. Fig. 8 shows a tracing of the 203 nz elements manipulated by the MGE algorithm. To obtain this figure, the MGE

Table 1
Execution times in milliseconds

                    MGE (MGEc, MGEr)                       LU
CPU model           Initial     Initial     Reduced       Initial     Initial     Reduced
                    matrix      matrix      matrix        matrix      matrix      matrix
                    with        w/o         with          w/o         with        with
                    indices     indices     indices       indices     indices     indices

PC/486 (33 MHz)     44          -           12            354         140         50
SPARC ELC           44          55          17            400         128         44
SPARC IPX           40          -           11            -           107         38
SPARC 10            16          18          4.5           200         46          16
TMS320C30           38          40          14            700         160         60
TMS320C40           19          19          7.7           -           -           -


Fig. 7. Element distribution in the upper triangular factor of the test matrix (nz = 325).

algorithm inserted a 1 in a tracing matrix each time it encountered an nz element. This tracing was then written to a MATLAB-compatible matrix file. The matrix tracing file was then retrieved by MATLAB. Fig. 8 shows the result.

From Fig. 8, we can estimate that in this particular case, MGE executed a minimum of 203 × 14 = 2,842 operations in the elimination phase of the algorithm to produce the upper triangular matrix necessary for the backward substitution process. To get to the same point, LU had to execute decomposition followed by forward substitution, or roughly 30,000 operations, which accounts for a ratio of approximately 10.
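The arithmetic behind this estimate can be checked in a few lines. The per-element counts (8 multiplications, 4 additions, 2 subtractions) and the rough figure of 30,000 operations for LU are taken from the text; the function and constant names are hypothetical.

```python
# Real operations per complex element update, as counted in the text.
MULTIPLICATIONS = 8
ADDITIONS = 4
SUBTRACTIONS = 2
OPS_PER_ELEMENT = MULTIPLICATIONS + ADDITIONS + SUBTRACTIONS


def mge_elimination_ops(nz_touched):
    """Minimum operation count for the MGE elimination phase."""
    return nz_touched * OPS_PER_ELEMENT


mge_ops = mge_elimination_ops(203)  # 203 elements traced in Fig. 8
lu_ops = 30_000                     # rough figure quoted in the text
ratio = lu_ops / mge_ops
```

Here `mge_ops` evaluates to 2842 and `ratio` to a little over 10, consistent with the order of magnitude quoted above.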

6. Synchronous machine modeling

The synchronous machine algorithm simulated in DSP #1 fully models the generator, excitation system, stabilizer, turbine, penstock, and governor. It has been extensively described in [4], so only the general block diagram is presented in Fig. 9. Each part of the machine is modeled according to its block diagram using a library of basic blocks, such as adders, multipliers, transfer functions, non-linear functions, etc. This principle allows the implementation of almost any kind of control system based on its block diagram (machine, protection system, static compensator, etc.). Basically, the functional principle of a synchronous machine is as follows: the generator produces a terminal voltage according to the field voltage and the stator current. The excitation system regulates the terminal voltage close to the reference voltage. The excitation system also has an input from the stabilizer, which has the function of temporarily reducing (or increasing) the terminal voltage to compensate for sudden power changes until the governor has had time to react. The governor adjusts the gate opening to regulate the speed and the power to their reference values. Finally, the turbine algorithm models the mechanical swing equation by adjusting its speed to the difference between the electrical and the mechanical torques.

Fig. 8. Tracing of elements manipulated by MGE (nz = 203).

Fig. 9. General block diagram of the machine model. (Labels legible in the figure: reference power, gate opening, governor, speed, stabilizer, reference voltage, excitation system, power, field voltage, voltage.)
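The block-library principle can be sketched with three basic blocks wired into a swing equation. This is an illustration only: the per-unit form 2H dω/dt = Tm - Te, the inertia constant H, the class names and the numerical values are assumptions, and the actual machine model is the one described in [4].

```python
class Adder:
    """Signed sum of its inputs, e.g. signs (+1, -1) gives a difference."""
    def __init__(self, signs):
        self.signs = signs

    def step(self, *inputs):
        return sum(s * u for s, u in zip(self.signs, inputs))


class Gain:
    """Multiplies its input by a constant."""
    def __init__(self, k):
        self.k = k

    def step(self, u):
        return self.k * u


class Integrator:
    """1/s block discretized with the trapezoidal rule."""
    def __init__(self, h, y0=0.0):
        self.h = h
        self.y = y0
        self.u_prev = None

    def step(self, u):
        if self.u_prev is None:
            self.u_prev = u  # no history yet: assume a constant input
        self.y += 0.5 * self.h * (u + self.u_prev)
        self.u_prev = u
        return self.y


def simulate_speed(t_mech, t_elec, inertia_h, h, steps, speed0=1.0):
    """Swing equation built from blocks: d(speed)/dt = (Tm - Te) / (2H)."""
    torque_error = Adder((+1.0, -1.0))
    accel = Gain(1.0 / (2.0 * inertia_h))
    speed = Integrator(h, y0=speed0)
    w = speed0
    for _ in range(steps):
        w = speed.step(accel.step(torque_error.step(t_mech, t_elec)))
    return w
```

With a constant torque imbalance of 0.1 p.u. and H = 4 s, `simulate_speed(1.0, 0.9, 4.0, 0.001, 1000)` gives a speed that has risen by 0.0125 p.u. after one second. Other control systems would be assembled by wiring the same kinds of blocks differently.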

7. Test results

The accuracy of the results obtained by the MGE and LU algorithms was compared to that of the results obtained with MATLAB. Both algorithms produced the same results to within 0.001% (1/10000) on each of the 50 complex voltage results.

Table 1 shows the average values obtained from 1000 consecutive executions of the MGE algorithm and 100 consecutive executions of the LU algorithm to solve our 50-bus test matrix. Execution times on the TMS320C40 are shown to take less than half of a full 60 Hz cycle, or less than 8.33 milliseconds. This is slightly faster than real-time for the simulation of the power system's transient stability behavior.

The execution time for each synchronous machine (not shown in Table 1) is 250 µs on the TMS320C30. The 2 ms required to produce the 8 machine currents is far below the time required for solving YV = I (3).
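The timing budget can be verified with a short calculation. The 60 Hz frequency, the 250 µs per machine and the execution times used in the usage note come from the text and Table 1; the function names are hypothetical.

```python
def half_cycle_ms(freq_hz=60.0):
    """Duration of half a power-frequency cycle, in milliseconds."""
    return 1000.0 / (2.0 * freq_hz)


def machines_time_ms(n_machines=8, per_machine_us=250.0):
    """Total time to evaluate all machine models on one DSP."""
    return n_machines * per_machine_us / 1000.0


def fits_half_cycle(exec_ms, freq_hz=60.0):
    """True if an execution time is shorter than half a cycle."""
    return exec_ms < half_cycle_ms(freq_hz)
```

Half a 60 Hz cycle lasts about 8.33 ms, so the 7.7 ms reported for the TMS320C40 fits the budget, whereas the 14 ms reported for the TMS320C30 does not; the 8 machine models together take 2 ms.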

8. Conclusion

In this paper, the results of a real-time simulation of the power system's transient stability behavior have been presented. The partitioned approach was implemented in parallel digital signal processors. Preliminary results indicate that the real-time simulation of such systems using commercially available hardware is feasible for typical networks of 50 nodes.

Future efforts will be directed towards increasing the number of machines solved by one DSP; the CPU processing power is available, so only the implementation remains to be done. While this paper was being written, new and more general accelerating schemes were proposed. In the near future, new algorithms will be thoroughly investigated on a variety of matrices. Specifically, tests on the IEEE 118-bus matrix are planned. Eventually, parallel algorithms will be sought that allow the real-time simulation of power system stability using parallel digital signal processors.

Acknowledgements

The authors are very grateful to IREQ for its support of this project.

References

[1] F.L. Alvarado, Parallel solution of transient problems by trapezoidal integration, IEEE Trans. Power App. Syst. (1979) 1080-1090.

[2] J.S. Chai and A. Bose, Bottlenecks in parallel algorithms for power system stability analysis, IEEE/PES 1992 Winter Meeting (1992) 1-7.

[3] J.S. Chai, N. Zhu, A. Bose, and D. Tylavsky, Parallel Newton type methods for power system stability using local and shared memory multiprocessors, IEEE/PES 1991 Winter Meeting (1991) 1-7.

[4] V.Q. Do and A.O. Barry, A real-time model of the synchronous machine based on digital signal processors, IEEE/PES 1992 Winter Meeting (1992) 1-7.

[5] A. Edstrom and K. Walve, Fast power system simulator for transient stability and long term dynamics, CIGRE, Proceedings of the 33rd Session, International Conference on Large High Voltage Electric Systems, Vol. 2, Paris, France (1990) 1-6.

[6] A. Edstrom, K. Walve and G. Andersson, Models and algorithms for a fast power system simulator, Tenth Power Systems Computation Conference, Graz, Austria (1990) 735-741.

[7] M. LaScala, A. Bose, D.J. Tylavsky and J.S. Chai, A highly parallel method for transient stability analysis, IEEE Trans. Power Systems 5 (1990) 1439-1446.

[8] W.H. Press, S.A. Teukolsky, W.T. Vetterling and B.P. Flannery, Numerical Recipes in C: The Art of Scientific Computing (Cambridge Univ. Press, Cambridge, 2nd Ed., 1992).

[9] H. Taoka and S. Abe, Fast solution method of sparse network equations with an array processor, IEEE/PICA, May 6-10, San Francisco, California (1985) 215-221.

[10] H. Taoka, I. Iyoda, H. Noguchi, N. Sato and T. Nakazawa, Real-time digital simulator for power system analysis on a hypercube computer, IEEE/PES 1991 Winter Meeting (1991) 1-7.

[11] D.J. Tylavsky et al., Parallel Processing in Power Systems Computation, An IEEE committee report by a task force of the computer and analytical methods subcommittee of the power systems engineering committee, paper 91 SM 503-3 PWRS, 1991 IEEE/PES Summer Meeting (1991) 1-9.

[12] A. Valette, F. Lafrance, S. Lefebvre and L. Radakovitz, ST600: Programme de stabilité - manuel d'utilisation, Institut de recherche d'Hydro-Québec, Varennes, Québec, Canada, January 1987.
