MSTP Routine Troubleshooting Manual (Issue 1)

Upload: deintkap-airport

Post on 03-Apr-2015

Contents

Chapter 1 Hardware Faults
1.1 NE Off-management and Service Interruption Caused by the Replacement of O4CSD Board
1.2 NE Off-management Caused by S320 NCP Fault
1.3 2M Service of a Site’s S360 Equipment Reports for AIS Alarm
1.4 CSB Board Malfunction
1.5 All Optical Boards in Self-loop Report LOF or LOS
1.6 Optical Board Failure Leads to B1 Error
1.7 CSC16x16 Board Malfunction in Power-up

Chapter 2 Performance Faults
2.1 Optical Board Causes B1 Error Codes

Chapter 3 Data Configuration Faults
3.1 SEC Board Reports LFD and VC12 Extensible Markup Mismatching
3.2 Service Commissioning of TGE2B-E Board in an Office Fails
3.3 AU not Configured with Service in 10G Optical Board of ZXMP S390 Reports AU-AIS Alarm

Chapter 4 Power Faults
4.1 Service Boards in Some Slots Report Channel Alarm
4.2 Some Boards’ Service Failure

Chapter 5 Protection Faults
5.1 MS Switching Causes Temporary Break in Service
5.2 Timeslot Configuration Confusion Causes Path Protection Configuration Failure
5.3 Cross-connect Board Failure Causes MS-ring Switching Unsuccessful
5.4 LP16 Board Failure Causes MS Protective Switching Unsuccessful
5.5 ZXMP S360’s OL1 Board Fault Causes Path Ring Switching Failure
5.6 S360 MS Switching Causes Part Services Unstable

Chapter 6 NM Faults
6.1 E300 NM Alerts “Database Disconnected”
6.2 T31 NM’s Client Program Cannot Start up Normally
6.3 E300 NM S320 NEs’ Board Indicator Lights Cannot Flash

Chapter 7 ECC Faults
7.1 Board Reset with Telnet
7.2 NCP Board Fault Causes ECC Failure

Chapter 8 Clock Sync Faults
8.1 Clock Configuration Error Causes Unstable Clock

Chapter 9 ASON Faults
9.1 Call Connection Cannot Reply 1—Insufficient Bandwidth
9.2 Call Connection Cannot Reply 2—Restriction of Route Policy

Chapter 10 Interconnection Faults
10.1 10G Optical Boards of ZTE S390 Equipment and Marconi MSH64 Equipment Fails in Interconnection
10.2 C2 Byte Causes ATM Service Interconnection Failure

Chapter 1 Hardware Faults

1.1 NE Off-management and Service Interruption Caused by the Replacement of O4CSD Board

Fault Description

The Network Management (NM) software detects Regenerator Section (RS) and Multiplex Section (MS) error codes (bit errors) on the O4CSD board of a site’s S320 equipment. The fault is traced to a failure of the O4CSD board.

After the O4CSD board is replaced on site, the network element (NE) goes off management and the service is interrupted. When the former O4CSD board is reinserted, NE monitoring and the service both recover. However, the RS and MS error codes persist.

Cause Analysis

The O4CSD version configured in the NM software is inconsistent with the version of the replacement board.

The replacement O4CSD board is faulty.

Troubleshooting

1. Confirm and collect the following information on site:

The PCB hardware version of the former O4CSD board is 20010900, and its software version is 20040716. In the NM software, the hardware version is configured as 100 throughout. The NCP software version is 20050728.

The PCB version of the replacement O4CSD board is 20040900, and its software version is 20060905. A board of this version requires the hardware version in the NM software to be configured as 200 throughout, and the NCP software version to be 20061027 or later.

Based on the above, the fault is caused by an NCP software version that is too old and by an NM configuration that does not meet the requirements of the new O4CSD board version. As a result, the equipment operates abnormally.

2. Reinsert the former O4CSD board and confirm that NE monitoring recovers.

3. Upgrade the NCP board software to version 20061027.

4. Change the hardware version setting of the O4CSD board to 200 in the NM software.

5. Pull out the former O4CSD board, and then insert the new board.

Conclusion

When replacing an O4CSD board or NCP board, check whether the version of the new board is consistent with that of the board in use on site, and whether the NM configuration and the NCP software version meet the requirements of the new board.
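
The version check in this case can be sketched in Python. This is only an illustration: `BoardRequirement`, `REQUIREMENTS` and `precheck` are hypothetical names, the table holds only the one requirement stated in this case, and the sketch assumes that the date-style version numbers can be compared as integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoardRequirement:
    nm_hw_version: int    # hardware version that must be configured in the NM software
    min_ncp_version: int  # earliest acceptable NCP software version (date-style number)


# Hypothetical lookup table; the single entry comes from the case above.
REQUIREMENTS = {
    20040900: BoardRequirement(nm_hw_version=200, min_ncp_version=20061027),
}


def precheck(pcb_version, nm_hw_version, ncp_version):
    """Return the list of mismatches to fix before inserting the new board."""
    req = REQUIREMENTS.get(pcb_version)
    if req is None:
        return [f"no requirement recorded for PCB version {pcb_version}"]
    problems = []
    if nm_hw_version != req.nm_hw_version:
        problems.append(f"set NM hardware version to {req.nm_hw_version}")
    if ncp_version < req.min_ncp_version:
        problems.append(f"upgrade NCP software to {req.min_ncp_version} or later")
    return problems


# State found on site in this case: NM configured as 100, NCP at 20050728.
print(precheck(20040900, nm_hw_version=100, ncp_version=20050728))
```

An empty list means the new board can be inserted; each entry otherwise corresponds to one of the corrective steps in the troubleshooting procedure.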

1.2 NE Off-management Caused by S320 NCP Fault

Fault Description

An NE of a site’s S320 equipment goes off management, and all the NEs downstream of it go off management as well. However, the service of each off-management NE is normal. The neighbor NEs on the ring report no LOS or MS-RDI alarms.

Cause Analysis

The neighbor NEs on the ring report no LOS or MS-RDI alarms, which indicates that the optical path is normal.

A simultaneous failure of multiple optical boards is unlikely to be the cause of the off-management situation. Therefore, the fault is provisionally judged to be an NCP board crash or an NCP board failure.

Troubleshooting

1. Reset the NCP board remotely.

Since this NE is off management, its NCP board cannot be reset directly from the NM software. However, since no DCC connection failure is reported, the NCP board can be reset remotely as follows.

(1) Telnet to any neighbor NE of the off-management NE.

Command format: telnet <IP address of the NE>

(2) Check the optical port connections with the if command.

Command format: if -a

(3) Reset the NCP board of the faulty NE remotely with the resetpeerncp command.

Command format: resetpeerncp 6 1

Command description: 6 is the slot number of the board connected to the faulty NE; 1 is the number of the optical port connected to the faulty NE.

2. If the fault cannot be cleared remotely, handle it on site.

(1) Pull out and reinsert the NCP board on site to see if the problem is solved. If not, proceed to the next step.

(2) Re-initialize the NCP board to see if the problem is solved. If not, the NCP board has failed. Proceed to the next step.

(3) Replace the NCP board. The problem is solved.
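
The remote reset sequence in step 1 can be assembled with a small helper. This is a sketch under stated assumptions: `remote_reset_commands` is a hypothetical function that only builds the three command strings quoted above and does not open any connection, and 192.0.2.10 is a documentation placeholder, not a real NE address.

```python
import ipaddress


def remote_reset_commands(neighbor_ip, slot, port):
    """Build the command sequence for resetting a peer NCP via a neighbor NE."""
    ipaddress.ip_address(neighbor_ip)  # raises ValueError on a malformed address
    return [
        f"telnet {neighbor_ip}",        # (1) log in to the neighbor NE
        "if -a",                        # (2) check the optical port connections
        f"resetpeerncp {slot} {port}",  # (3) reset the NCP of the faulty NE
    ]


# Slot 6 and port 1 are the values used in the example above.
for command in remote_reset_commands("192.0.2.10", slot=6, port=1):
    print(command)
```

Validating the address before building the list avoids typing a mistyped IP into a live session; the slot and port values must still be read from the output of the if command.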

Conclusion

When an NE suddenly goes off management and the upstream NE reports no alarm on the corresponding optical path, first try to reset the NCP board remotely. If that fails, re-initialize the NCP board on site or replace it.

1.3 2M Service of a Site’s S360 Equipment Reports for AIS Alarm

Fault Description

All 2M services on the 2# EP1 board of a site’s S360 equipment report AIS, down time and remote defect indication. The related channels on the ring also report AIS and down time. The corresponding services at the central site report AIS and remote defect indication.

Cause Analysis

This fault is caused by a blocked path. The possible reasons are:

The 2# EP1 board of the site is faulty.

The slot configuration is abnormal.

The cross-connect board or optical board of a site through which the service passes has failed.

Troubleshooting

1. Reset the 2# EP1 board. If the fault persists, proceed to the next step.

2. Using the whole-network service report of the NM software, check whether the service slot configuration is normal.

3. Upload the timeslots of all services passing through the NE and compare them to see whether the service data in the NM software and in the NE’s NCP is consistent.

4. Drop the service to a tributary board hop by hop using dichotomy (halving the suspect segment each time) until the fault is located between two sites.

5. Switch over the cross-connect (DXC) boards of the two sites in turn to see if the problem is solved. If not, proceed to the next step.

6. Perform a terminal-side loopback on the optical port of the site where the service is dropped. If the alarm disappears, the fault is in the optical board of this site.

7. Replace this optical board on site and restore the previous data. The problem is solved.

Conclusion

For path troubleshooting, dropping services by dichotomy or using AU loopback is effective. It quickly narrows the problem down to one segment and saves considerable time.
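
The dichotomy method is a binary search over the sites along the service path. The sketch below is illustrative: `locate_faulty_span` and the `drop_is_clean` callback are hypothetical names, and the code assumes a single fault, so that a service dropped at any site before the fault is clean and at any site after it is alarmed.

```python
def locate_faulty_span(sites, drop_is_clean):
    """Return the two adjacent sites between which the fault lies.

    sites: sites in path order; the first is assumed clean and the last alarmed.
    drop_is_clean: callable(site) -> True if a service dropped there is alarm-free.
    """
    lo, hi = 0, len(sites) - 1  # invariant: clean at lo, alarmed at hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if drop_is_clean(sites[mid]):
            lo = mid   # fault is further along the path
        else:
            hi = mid   # fault is at or before this site
    return sites[lo], sites[hi]


# Hypothetical six-site path with the fault between C and D.
path = ["A", "B", "C", "D", "E", "F"]
clean_sites = {"A", "B", "C"}
print(locate_faulty_span(path, lambda site: site in clean_sites))
```

Each test drop halves the suspect segment, so a path of n sites needs about log2(n) drops instead of one per hop.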

1.4 CSB Board Malfunction

Fault Description

A ZXMP-S360 device malfunctions. Its CSB cross-connect board, fitted with a TCS4 module, passes the power-on self-test (POST). Then the red and green indicators go out simultaneously and stay off.

Cause Analysis

The red and green indicators going out simultaneously after POST indicates that the time division module of the CSB board fails to detect the clock signal sent by the clock board. The root cause is a crystal oscillator failure on the clock board.

Troubleshooting

1. The preliminary judgment is a failure of the CSB board. Replace the CSB board fitted with the time division module. The problem persists.

2. Replace it with a CSC board (with or without the time division module) to see if it works normally.

3. In the test, the CSB board works normally without the time division module.

4. Debug using the CSB board without the time division module. The devices connected at both ends of the equipment report Loss of Frame (LOF), and the optical board of the local end also reports LOF when self-looped. The fault is therefore judged to be in the clock board.

5. Replace the clock board. The problem is solved. Insert the time division module into the CSB board, and the board works normally.

Conclusion

Only a CSB board fitted with the time division module performs this software check. If the cross-connect board works normally without the time division module, or a CSC board works normally (with or without the time division module), use the alarm behavior of the time division module to locate the fault in the clock board. Doing so also removes a hidden fault from the equipment.

1.5 All Optical Boards in Self-loop Report LOF or LOS

Fault Description

All optical boards of an NE report LOF and LOS. The alarms persist after the optical boards are self-looped.

Cause Analysis

All optical boards report LOS or LOF alarms, and the alarms persist even after the optical boards are self-looped. It is unlikely that all the optical boards are damaged.

First, check the NCP board, since a faulty NCP board may be reporting the alarms in error.

Then, check whether the clock board is faulty. A faulty clock board leaves the whole system without a usable framing clock, so the signals transmitted by the optical boards cannot be framed.

Troubleshooting

1. Replace the NCP board to see if the problem is solved.

2. If not, replace the clock board and the problem is solved.

Conclusion

When self-looped optical boards still report alarms, the cause may be the optical boards themselves, the NCP board or the clock board.

1.6 Optical Board Failure Leads to B1 Error

Fault Description

ZTE’s ZXMP-S360 equipment is deployed in a local transmission network. The network consists of three ZXMP-S360 NEs forming an unprotected chain. The transmission rate is 2.5 Gbit/s. The network architecture is shown in Figure 1-1. The central office is located at NE A.

Figure 1-1 Network Architecture

The optical fiber connections are shown in Figure 1-1. 2M services run between the NEs.

Query the performance monitoring data in the NM software.

Checking NE A:

The service with NE B shows a large number of lower order errors (V5 BBE) in the tributary.

The service with NE C shows a large number of lower order errors (V5 BBE) in the tributary.

Some B1 BBEs are detected on the 5# OI16 line in each 15-minute performance period.

LP16 shows B2 BBE and B3 BBE errors.

Checking NE B:

There are no errors on the 5# OI16 and 11# OI16 lines.

10# LP16 and 13# LP16 show B2 FEBBE and B3 FEBBE errors.

The service with NE A shows a large number of V5 FEBBE in the tributary.

The service with NE C is normal.

Checking NE C:

The service with NE A shows a large number of V5 FEBBE in the tributary.

Cause Analysis

Analyze the line performance data first. Three kinds of error codes are monitored through overhead bytes on the line: B1, B2 and B3. Each monitors the quality of the route between its own start point and end point.

B1 monitors only the regenerator section (RS) between two sites, and B1 error codes are terminated within the RS. That is, B1 error codes between NE A and NE B are not passed on to NE C.

B2 monitors only the multiplex section (MS) between two sites, and B2 error codes are terminated within the MS. NE A and NE B are ADM type NEs, so B2 error codes are not passed on to NE C.

B3 monitors only the higher order path between two sites. The route monitored by B3 contains the routes monitored by B2 and B1, and the route monitored by B2 contains the route monitored by B1. Since the services of the same AU are dropped at NE B and NE C, B3 error codes generated between NE A and NE B are not passed on to NE C.

Based on the data in the Fault Description, the error codes occur between NE A and NE B. However, it is still hard to tell whether the cause is a receive failure at NE A or a transmit failure at NE B.

Troubleshooting

Locate the fault by eliminating the sites one by one.

1. Measure the received optical power at NE A to see if it is normal (the receiver sensitivity is -18 dBm for short-distance boards and -28 dBm for long-distance boards).

2. Self-loop the 5# optical board of the local site, NE A. If the error codes persist at the local site, the fault is in NE A.

3. Replace the 5# optical board of NE A. If the error codes disappear across the whole network, the problem is solved.
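
The optical power check in step 1 amounts to comparing the measured value with the receiver sensitivity. The sketch below assumes only the two sensitivity figures quoted above; `rx_power_ok` and its `margin_db` parameter are hypothetical, and receiver overload at high input power is not checked.

```python
# Receiver sensitivities quoted in step 1, in dBm.
SENSITIVITY_DBM = {"short": -18.0, "long": -28.0}


def rx_power_ok(rx_dbm, board_type, margin_db=0.0):
    """True if the received power is at or above sensitivity plus a safety margin."""
    return rx_dbm >= SENSITIVITY_DBM[board_type] + margin_db


# -20 dBm is enough for a long-distance board but not for a short-distance one.
print(rx_power_ok(-20.0, "long"), rx_power_ok(-20.0, "short"))
```

If the check passes and B1 error codes are still present, the optical path is ruled out and attention moves to the optical boards, as in steps 2 and 3.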

Conclusion

If there are B1 error codes, the fault lies between two adjacent points. If the optical power is normal, the fault is in an optical board. Set the B2, B3 and V5 error codes aside at first. If the problem persists after the B1 error codes are cleared, handle the B2, B3 and V5 error codes in turn.

Since the routes monitored by B2, B3 and V5 contain the route monitored by B1, B1 error codes can be assumed to cause B2, B3 and V5 error codes. There are exceptions, such as errors confined to the RS overhead, which produce B1 error codes without B2 or B3. However, this situation is very rare.

Make sure you fully understand how B1, B2, B3 and V5 error codes are generated and how they relate to one another. Normally, the generation of error codes is related to the corresponding optical board. The fault should therefore be located step by step, analyzing layer by layer according to the error generation mechanism of each layer. In this case, the fault is finally located in the optical board.
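
The layer-by-layer order described above can be written down as a small lookup. This is an illustrative sketch: `first_layer_to_fix` is a hypothetical helper that, given the set of error types observed, returns the lowest layer, which should be handled first.

```python
# Monitoring layers ordered from the shortest span to the longest.
LAYER_ORDER = ["B1", "B2", "B3", "V5"]
LAYER_SCOPE = {
    "B1": "regenerator section",
    "B2": "multiplex section",
    "B3": "higher order path",
    "V5": "lower order path",
}


def first_layer_to_fix(observed):
    """Return the lowest layer that shows errors, or None if there are none."""
    for layer in LAYER_ORDER:
        if layer in observed:
            return layer
    return None


# NE A in this case shows all four error types, so B1 is handled first.
layer = first_layer_to_fix({"V5", "B3", "B2", "B1"})
print(layer, LAYER_SCOPE[layer])
```

Once the lowest layer is clean, the function is simply applied again to whatever error types remain.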

Note:

Check the performance values in the NM software periodically, and handle error codes as soon as they are detected. Otherwise, once the error codes reach a certain level, they affect normal service reception. In severe cases, the service may be interrupted.

1.7 CSC16x16 Board Malfunction in Power-up

Fault Description

ZTE’s ZXMP-S360 equipment is deployed in a local transmission network. The network consists of six ZXMP-S360 NEs forming a path protection ring. The transmission rate is 2.5 Gbit/s. The network architecture is shown in Figure 1-2. The central office is located at NE A.

Figure 1-2 Network Architecture

The optical fiber connections are shown in Figure 1-2.

The original CSC8x8 board is upgraded to CSC16x16. After the NM software is configured correctly, the CSC16x16 board fails to operate after power-up. The NOM and ALM indicators stay on. The NM software raises a board type mismatch alarm.

Cause Analysis

The CSC board fails to function. The possible reasons are as follows:

The NM software configuration is incorrect.

The version of the NCP program is too old.

The CSC board is not seated properly.

The TCS board is not seated properly.

The CSC board is faulty.

Troubleshooting

1. Confirm the versions. The NCP main program is v1.00.023, and the hardware and software versions of the CSC16x16 are consistent with this NCP version.

2. Check the NM software configuration. It is correct.

3. Pull out and reinsert the CSC board. The fault persists.

4. Pull out the TCS 16x16 module fitted on the CSC board and reinsert it. The fault disappears.

Conclusion

The earliest NCP version that matches the CSC16x16 is V1.00.001, so NCP program V1.00.023 matches the CSC16x16 without problems. If the NM software configuration is correct, check whether the CSC16x16 is seated firmly and whether the board is faulty.

Note:

When replacing the CSC board, make sure that the TCS time division board mounted on it is seated firmly. Otherwise, the board may not function normally.

Chapter 2 Performance Faults

2.1 Optical Board Causes B1 Error Codes

Fault Description

ZTE’s ZXMP-S360 equipment is deployed in a local transmission network. The network consists of four ZXMP-S360 NEs forming an MS protection ring. The transmission rate is 2.5 Gbit/s. The network architecture is shown in Figure 2-1. The central office is located at NE A.

Figure 2-1 Network Architecture

The optical fiber connections are as follows:

The 11#OI16 board of NE A connects to the 5#OI16 board of NE B, and the 11#OI16 board of NE B connects to the 5#OI16 board of NE C. There are 2M services between the NEs.

Querying the performance monitoring data in the NM software shows B1 BBE error codes on the 5#OI16 board of NE B, about a dozen in each 15-minute performance period. However, the 11#OI16 board of NE A performs normally and shows no error codes.

Cause Analysis

A fault that shows only B1 errors is relatively easy to handle. B1 error codes have the following possible causes:

The optical board at the transmitting end malfunctions, producing error codes in the transmitted signal.

The optical board at the receiving end malfunctions, producing error codes during processing even though it receives a normal signal.

The optical path is faulty. The optical power at the receiving end is too low and falls below the receiver sensitivity, which leads to B1 error codes.

The clock fails.

Troubleshooting

Since the ring is an MS-ring, MS switching between NE A and NE B is performed through the NM software before troubleshooting, so that the service is not affected.

Once the service has switched normally, switch over the clock board. The fault persists.

Check whether the received optical power is normal. If it is not, check whether the internal and external connections on the ODF rack are loose. If the ODF connections are not the cause, check the optical interface inside the optical board. Though this situation is very rare, it needs to be checked.

After ruling out external optical power as the cause through the checks above, replace the 11#OI16 board of NE A and observe the performance values to see whether B1 error codes still occur. In this case they no longer occur, so the fault is located in the 11#OI16 board of NE A.

Conclusion

The cause analysis lists four possible causes. In normal use, it is very rare for the received optical power to drop suddenly below the sensitivity of the optical board, so the third cause is the rarest. Between the optical board at the transmitting end and the one at the receiving end, most faults occur at the transmitting end. Therefore, start troubleshooting at the transmitting end.

Chapter 3 Data Configuration Faults

3.1 SEC Board Reports LFD and VC12 Extensible Markup Mismatching

Fault Description

During the initial commissioning of ZXMP S390 equipment in an office, after the inter-site service from the SEC board to the SFE8 board is configured, the SEC board raises the VCG Loss Of Frame Delineation (LFD) and VC12 Extensible Signal Markup Mismatching alarms.

Cause Analysis

LFD: The LFD alarm is reported when the GFP frame header cannot be locked (the receiver is in the search or pre-sync state). Once the receiver is in the locked state, the alarm disappears. These alarms are caused by inconsistent encapsulation protocols at the two ends. Therefore, both ends should use GFP encapsulation. If a V1.0 SFE board is used at one end, it should be upgraded to V2.0.

Troubleshooting

The software version of the SFE8 board on site is V2.0. Change its encapsulation protocol to GFP. The problem is solved.

Conclusion

When this fault occurs, first check whether the encapsulation protocols of the Ethernet boards at the two ends are consistent. If not, set both ends to the same encapsulation protocol.

Note:

The encapsulation mode of the SFE board in S330 equipment is determined by the logic software of the board. That is, boards using GFP and boards using HDLC run different logic software. The encapsulation protocol configured in use must match the logic software of the board.
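
The consistency check recommended in the conclusion is a simple comparison of the two ends. The following sketch assumes the protocol names are available as plain strings such as GFP or HDLC; `encapsulation_mismatch` is a hypothetical helper, not an NM software function.

```python
def encapsulation_mismatch(local_protocol, remote_protocol):
    """Return a message if the two ends use different encapsulation, else None."""
    local = local_protocol.strip().upper()
    remote = remote_protocol.strip().upper()
    if local == remote:
        return None
    return f"encapsulation mismatch: local {local}, remote {remote}"


# One end on HDLC and the other on GFP reproduces the situation in this case.
print(encapsulation_mismatch("HDLC", "GFP"))
```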

3.2 Service Commissioning of TGE2B-E Board in an Office Fails

Fault Description

ZXMP S390 equipment is deployed in an office, mainly for 1000M Ethernet transparent transmission services. Recently, a new batch of devices arrived for network expansion. A 1000M Ethernet transparent transmission service needs to be commissioned between the new S390 sites and the old sites, but commissioning fails on site. The newly delivered devices and boards should have no hardware faults. According to the on-site maintenance personnel, when the TGE2B-E board (hardware version: B030801) in the new equipment is replaced with an older TGE2B-E board (hardware version: B030300), the service is commissioned successfully.

Cause Analysis

Analysis shows that the problem lies not in the board software version or in version matching, but in the interconnection settings between TGE2B-E boards of different hardware versions. The main difference between the two hardware versions is that B030801 supports the standard LCAS protocol, whereas B030300 does not. Therefore, when a service is set up between boards of the two hardware versions, the LCAS function must not be enabled; otherwise, service setup fails. Once the field data configuration is corrected accordingly, the problem is solved.

Troubleshooting

Change the NM software configuration: disable the LCAS function at both ends. The service becomes normal, and the problem is solved.

Conclusion

The LCAS setting must be consistent at both ends: either enabled at both ends or disabled at both ends.
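
The rule can be expressed as a lookup on the hardware versions of the two boards. This sketch uses only the two versions named in this case; the `LCAS_SUPPORT` table and the `lcas_setting` function are hypothetical names introduced for illustration.

```python
# LCAS support per TGE2B-E hardware version, as stated in the cause analysis.
LCAS_SUPPORT = {"B030801": True, "B030300": False}


def lcas_setting(hw_version_a, hw_version_b):
    """Return the LCAS setting to apply at BOTH ends of the service.

    LCAS may be enabled only when both boards support it; otherwise it
    has to be disabled at both ends.
    """
    return LCAS_SUPPORT[hw_version_a] and LCAS_SUPPORT[hw_version_b]


# New board against old board: LCAS must be disabled at both ends.
print(lcas_setting("B030801", "B030300"))
```

Returning one value for both ends makes it impossible to configure the two ends differently by accident.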

3.3 AU not Configured with Service in 10G Optical Board of ZXMP S390 Reports AU-AIS Alarm

Fault Description

The 10G optical boards of site A3 and site B3 in an office report AU-AIS alarms. However, the AU channels reporting these alarms carry no configured service and act as the MS protection AU channels.

The equipment data is compared with the NM database, and the two are consistent.

Cause Analysis

Restore the field data and check the history alarms and the NM settings. The Idle AU Detection Setting item under the Alarm menu in the field NM data is found to be enabled, which causes AUs with no configured service to report the AU-AIS alarm.

In SNCI mode, the system sends AU-AIS on idle channels by default. For instance, suppose that site A and site B are interconnected and site A sends AU-AIS. If idle AU channel detection is enabled at site B, site B detects the AU-AIS and reports it.

Troubleshooting

Deselect the Idle AU Detection Setting item in the NM software to clear the alarm.

Conclusion

An AU channel with no configured service reports AU-AIS because the Idle AU Detection Setting function is enabled in the NM software.
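
When reviewing such alarms, it helps to separate AU-AIS on idle AUs from AU-AIS on AUs that carry service. The sketch below assumes the alarm list is available as (AU number, alarm name) pairs; `split_au_ais` and the sample data are hypothetical.

```python
def split_au_ais(alarms, aus_with_service):
    """Split AU-AIS alarms into those on service AUs and those on idle AUs.

    alarms: iterable of (au_number, alarm_name) pairs
    aus_with_service: set of AU numbers that carry a configured service
    """
    on_service, on_idle = [], []
    for au, name in alarms:
        if name != "AU-AIS":
            continue  # other alarm types are out of scope here
        (on_service if au in aus_with_service else on_idle).append(au)
    return on_service, on_idle


# Hypothetical data: AU 1 and AU 2 carry service, AU 33 is an idle protection AU.
sample = [(1, "AU-AIS"), (33, "AU-AIS"), (2, "LOS")]
print(split_au_ais(sample, {1, 2}))
```

AU-AIS on idle AUs points to the Idle AU Detection Setting, whereas AU-AIS on service AUs needs normal fault handling.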

Chapter 4 Power Faults

4.1 Service Boards in Some Slots Report Channel Alarm

Fault Description

The service boards in some slots report channel alarms or pointer loss alarms.

Take the expanded subrack as an example. It holds the boards shown in Figure 4-1.

Figure 4-1 Boards Inserted in Expanded Subrack

The service configuration is as follows: each AUG of the 7# OL4 optical board is dropped to two EP1 boards, with the 32nd tributary of the second EP1 board left unused. The OL4 optical board has four AUGs in total. Ideally, eight EP1 boards are needed to drop all the services without occupying time division resources.

However, the services that the first AUG of the 7# OL4 optical board drops to the two EP1 boards in slots 22# and 23# constantly report the TU12 alarm indication signal and TU12 loss of pointer alarms. The services on the tributary boards in the other slots are all normal.
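
The board count in the service configuration follows from SDH multiplexing arithmetic: one AUG carries 3 x 7 x 3 = 63 TU12 channels. The sketch below assumes 32 tributaries per EP1 board, which is inferred from the reference to the 32nd tributary rather than stated outright; `ep1_boards_needed` is a hypothetical helper.

```python
import math

TU12_PER_AUG = 3 * 7 * 3  # TUG-3 x TUG-2 x TU-12 in one VC-4
E1_PER_EP1 = 32           # assumption inferred from the text


def ep1_boards_needed(aug_count):
    """Return (total EP1 boards, unused tributaries per AUG)."""
    boards_per_aug = math.ceil(TU12_PER_AUG / E1_PER_EP1)
    unused = boards_per_aug * E1_PER_EP1 - TU12_PER_AUG
    return aug_count * boards_per_aug, unused


# An OL4 board has four AUGs.
print(ep1_boards_needed(4))
```

With these figures, four AUGs need eight EP1 boards and one tributary per AUG stays unused, which matches the configuration described above.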

Cause Analysis

The subrack holds two power boards. One of them has an abnormally low output current, so instead of supplying power it acts as a load. As a result, the power supplied to the boards in certain slots is too low, and those boards cannot work normally.

Troubleshooting

Pull out the power clock board whose output is too low, or replace it.

Conclusion

The stability of the output current of the power clock boards affects the normal operation of all boards. Therefore, when more than one board malfunctions, first check the operating status of the power clock boards.

4.2 Some Boards’ Service Failure

Fault Description

The S360 equipment holds many boards but only one power board, which may lead to service failure on some boards.

Cause Analysis

The subrack holds many boards. Although the boards can run, the power supply and voltage are insufficient, so some chips with high power consumption on certain boards cannot work normally.

Troubleshooting

Configure the equipment with dual power clock boards.

Conclusion

When many boards are inserted in the subrack, the S360 equipment

should be configured with dual power clock boards.

Chapter 5 Protection Faults

5.1 MS Switching Causes Temporary Break in Service

Fault Description

During networking of S360 devices, an alarm is inserted from NE A toward NE B to test Multiplex Section (MS) switching. The transmission service is interrupted for 7 seconds. When the alarm is inserted from NE B toward NE A, switching is normal.

Cause Analysis

Reading the board versions through the NM software reveals differences. Refer to the table below for details.

Board    NE A              NE B
NCP      0X200106061030    0X200103051000
LP16     0X200204301640    0X200107021742
CSC      0X200107131058    0X200107131058

Judging from the board versions, this fault is presumed to be caused by inconsistent board software versions between NE A and NE B.

In addition, the software versions of NE A’s NCP board and LP16 board are too far apart in time, so the new LP16 software is not consistent with the old NCP software.
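
The version comparison can be sketched as follows. The sketch assumes that the digits after the 0X prefix are a build timestamp in YYYYMMDDhhmm form, which the reference to software time suggests but the manual does not state; `build_time` and `mismatched_boards` are hypothetical helpers.

```python
from datetime import datetime

# Board versions from the table above.
VERSIONS = {
    "NE A": {"NCP": "0X200106061030", "LP16": "0X200204301640", "CSC": "0X200107131058"},
    "NE B": {"NCP": "0X200103051000", "LP16": "0X200107021742", "CSC": "0X200107131058"},
}


def build_time(version):
    """Parse the digits after the 0X prefix as YYYYMMDDhhmm (an assumption)."""
    return datetime.strptime(version[2:], "%Y%m%d%H%M")


def mismatched_boards(versions):
    """List the boards whose version differs between NE A and NE B."""
    ne_a, ne_b = versions["NE A"], versions["NE B"]
    return [board for board in ne_a if ne_a[board] != ne_b[board]]


# Gap between the LP16 and NCP software on NE A.
gap = build_time(VERSIONS["NE A"]["LP16"]) - build_time(VERSIONS["NE A"]["NCP"])
print(mismatched_boards(VERSIONS), gap.days)
```

Under this reading, the NCP and LP16 versions differ between the two NEs, and the LP16 software on NE A is close to eleven months newer than its NCP software.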

Troubleshooting

Upgrade the software of the NCP, LP16 and CSC boards. The fault disappears.

Conclusion

A final software version was released before production of the S360 device stopped. Boards running older versions in the existing network should be upgraded to this final version wherever possible, to avoid faults caused by excessive version differences between the boards of a device.

5.2 Timeslot Configuration Confusion Causes Path Protection Configuration Failure

Fault Description

At the initial stage of a GSM engineering project, the SDH timeslot configuration of many BSCs at the equipment room side did not follow the standard (that is, Ts1 to Ts16 correspond to the 1st to 16th E1 of the ET1 tributary board respectively). Many timeslots are dropped in a disorderly way, as shown in Figure 5-1.


Figure 5-1 Network Timeslot Configuration

However, after the network is reconstructed as an MS-ring, path protection
cannot be configured, because the timeslots are dropped in a confused way
in the two directions and the same timeslots are reused in direction A and
direction B.

The E1 cabling at the BTS side and at the BSC DDF side is already complete,
and the sites are in commercial service. Path protection therefore has to
be configured in the shortest possible time and without redoing the E1
cabling.


Cause Analysis

Provided that the E1 ports at the BSC side and the BTS side remain
unchanged, path protection can be configured by adjusting only the timeslot
configuration at the BSC side and the BTS side.

Because the network is already in commercial service, the reconfiguration
must be completed as quickly as possible to shorten the service
interruption. Therefore, first upload the timeslot configuration from the
NCP of every NE to be protected, so that the data in the NE NCP and in the
NM software are consistent. Then export the related service report to work
out the port information, and match each BSC-side port to its BTS-side port
one to one. Configure the timeslots in off-line mode. Afterwards, export
the related service report again and compare it with the former port
information to check that they are consistent.
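
The consistency check described above can be automated once the service reports are exported. The following sketch assumes a hypothetical report reduced to one record per E1 service (BSC-side port, BTS site, BTS-side port, timeslot); the real export format of the NM software is not covered here, and all names are illustrative.

```python
from typing import NamedTuple

class Service(NamedTuple):
    bsc_port: int   # E1 port on the BSC-side tributary board
    bts_site: str   # name of the BTS-side NE
    bts_port: int   # E1 port on the BTS-side tributary board
    timeslot: int   # timeslot (Ts) carrying the service on the ring

def renumber(services):
    """Assign Ts n to BSC-side E1 n, leaving every physical port unchanged."""
    return [s._replace(timeslot=s.bsc_port) for s in services]

def port_pairs(services):
    """Port-to-port relationships, ignoring which timeslot carries them."""
    return {(s.bsc_port, s.bts_site, s.bts_port) for s in services}

def verify(before, after):
    """Check a reconfiguration: same port pairs, no timeslot used twice."""
    same_ports = port_pairs(before) == port_pairs(after)
    slots = [s.timeslot for s in after]
    return same_ports and len(slots) == len(set(slots))

# Hypothetical "before" report with confused timeslots.
before = [
    Service(1, "BTS-1", 1, 5),
    Service(2, "BTS-1", 2, 1),
    Service(3, "BTS-2", 1, 9),
]
after = renumber(before)
print(verify(before, after))
```

Comparing port pairs rather than whole records is deliberate: the timeslots are expected to change, while the physical E1 ports must not.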

Note:

Delivering the new timeslot configuration to the NEs may cause a momentary
service interruption.

Troubleshooting

1. Count the sites working normally over the ring and observe the
performance of the ring network, including the 15-minute and 24-hour
analog/digital performance, to confirm that the network meets the
requirements for configuring path protection.

2. Check that the optical fiber connections of the network are correct.

3. Upload the NCP configuration of all sites over the ring, to ensure that
the data is up to date.

4. Export the current timeslot and port configuration, that is, the report
in “related service query”, and save it.

5. Set all NEs over the ring to off-line (for this step, the configuration
can be checked outside the equipment room).

6. Configure the BSC-side SDH according to the one-to-one correspondence
between timeslots and ports, as shown in Figure 5-2.


Figure 5-2 Network Timeslot Configuration after Reconfiguration

7. At each BTS side, drop the corresponding timeslot from both directions
to the corresponding port, and configure all other timeslots as
pass-through. This completes the path protection configuration.

8. After configuration, export the service report and check that the
modified port information is consistent with the previous port
information.

9. Set the NEs over the ring to on-line.


10. Download the updated slot configuration to all NCP boards over

the ring.

11. The path protection configuration is completed.

12. Check the operation status of sites with BSC engineers to assure

that the path protection has been configured.

13. In direction A and then in direction B, insert MS-AIS step by step on
the optical path connected to the BSC-side SDH, to verify that the path
protection works.

Conclusion

1. Pay attention to the timeslot configuration mode and method from the
initial stage of an engineering project. Check and patrol periodically, so
that problems are detected and handled in time.

2. When the timeslot configuration is confused, this method can serve as a
reference.

3. Fully understand the relationship between ports and timeslots, as well
as the configuration method of path protection.


5.3 Cross-connect Board Failure Causes MS-ring Switching Unsuccessful

Fault Description

A network consists of three ZTE ZXMP S360 NEs forming a 2.5G MS protection
ring, as shown in Figure 5-3.

Figure 5-3 Network Structure (ring of NE A, NE B and NE C; on each NE the
ring fibers terminate on the boards in slots 5# and 11#)

The 5# OI16 of NE A reports B1 error, and generates B2 error

simultaneously.

The 10# LP16 of NE B reports the remote end B2 error.

NE A and NE B report MS protection switching event simultaneously.

However, the service from point A to point B is interrupted.

Cause Analysis

Is the MS protection switching normal? Check the MS protection switching
status of site A and site B. Choose Maintenance>Diagnosis>Protective
Switching to query the status of site A’s 5# OI16 and site B’s 11# OI16.
Both show “Auto-switching completed, waiting for recovery”, which proves
that the MS switching event has occurred. The switching status of site C
shows no request.

Is the MS protection configured correctly, and does it start up normally?
Query register 0x50009 (2-byte) of the 7# LP16 of each site; all display
“0100”. This proves that the sites are configured with the two-fiber
bidirectional MS protection protocol and that the protocol is started.
Query 0x40000 (3-byte) of the 7# LP16; the three bytes are the
east-direction APS ID, the west-direction APS ID and the APS ID of this
node. The configuration is correct. Query 0x40005 (1-byte) of the 7# LP16:
site A displays 01, site B displays 00 and site C displays 04. This means
that switching occurs in the west direction at site A, in the east
direction at site B, and not at all at site C, so the switching is
successful. Query register address 3000 (4-byte) of site A’s LP16 board to
check the K1K2 bytes received in the west and east directions, and query
3001c (4-byte) to check the transmitted K1K2 bytes. If all are normal,
check the K1K2 bytes of site B and site C. If there is no error, the MS
protection is normal.

A board fault may affect the last eight AU channels, which are used for
protection, so that the service fails after switching. This can only be
judged by loopback. First check the direction of the service flow between
point A and point B during switching. Suppose that the service between
point A and point B is added and dropped at the 2# EP1. During the
switching between A and B, the service flow from point A is: 2# EP1 of
point A → 5# OI16 AU1 of point A → (switching) 11# OI16 AU9 of point A →
5# OI16 AU9 of point C → 11# OI16 AU9 of point C → 5# OI16 AU9 of point B →
(switching) 11# OI16 AU1 of point B → 2# EP1 of point B. The switching
process is then complete.

After this analysis, the corresponding loopback operations can be
performed.
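
The section-by-section loopback that follows from this analysis amounts to walking the switched path outward from point A until the BER test first fails. A minimal sketch, assuming the loopback results are recorded by hand as (loopback point, test passed) pairs; the function name and data layout are illustrative only, and the sample results mirror the two loopbacks reported for this case.

```python
def locate_fault(results):
    """results: (loopback_point, passed) pairs ordered from nearest to farthest.

    Returns (last_good, first_bad) bracketing the faulty segment, or None
    if every loopback passed.
    """
    last_good = None
    for point, passed in results:
        if not passed:
            return last_good, point
        last_good = point
    return None

# Loopback points along the switched path, as seen from point A.
results = [
    ("5# OI16 AU9 of point B, line side", True),
    ("11# OI16 of point B, terminal side", False),
]
print(locate_fault(results))
```

The fault lies between the last loopback that passes and the first one that fails; here both points are inside site B, which narrows the search to the boards of that site.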

Troubleshooting

1. Loopback operation: connect a BER tester to a tributary of the 2# EP1 at
point A, and then loop back the AUs mentioned in the cause analysis section
by section to locate the faulty board.

2. Loop back at the line side of AU9 of the 5# OI16 in site B; the service
is normal.

3. Loop back at the terminal side of the 11# OI16 in site B; the service
fails. The fault is located in site B.

4. The faulty board might be the CSC, the 10# LP16 or the 4# LP16. Since
the CSC board has a standby board, switch the service to the standby
cross-connect board. The service then recovers.

Conclusion

When MS protection switching fails, the fault is usually sought in the LP16
board or the optical board, and the cross-connect board is overlooked.
Sometimes a change of approach locates the fault faster.

5.4 LP16 Board Failure Causes MS Protective Switching Unsuccessful

Fault Description

The network consists of five ZXMP S360 NEs forming a 2.5G MS protection
ring, and the service is normal.

During network operation, the service is interrupted when the optical fiber
is disconnected. The MS switching is unsuccessful.

Cause Analysis

The configuration of the MS is abnormal.

The MS APS does not start up normally.

LP16 board failure

Optical board failure

Cross-connect board failure


Troubleshooting

1. Check the configuration of the MS; it is normal.

2. Check whether the MS APS starts up normally. This can only be judged
from the register. Read address 50009 of the ROM register of the LP16 board
in slot 7, with the read length set to 2 bytes. The value read is 0100,
which indicates that the MS configuration is correct and that the protocol
can start up normally (for the LP16F board, read address a0009).

3. Then read the K1K2 bytes of each site. Read address 30000 of the ROM
register of the LP16 board in slot 7, with a read length of 4 bytes. The
four bytes are, in order, the K1 byte received in the east direction, the
K2 byte received in the east direction, the K1 byte received in the west
direction, and the K2 byte received in the west direction. In normal
operation, the first four bits of each K1 byte and the last four bits of
each K2 byte should all be 0. At one site, however, the first four bits of
the K1 byte and the last four bits of the K2 byte are not all 0. Therefore,
the 7# LP16 board of this site is suspected.

4. Replace the 7# LP16 board of this site as follows:

(1) Disconnect the 5# optical fiber of this site.

(2) Suspend the APS protocol of the two sites whose optical fiber is
disconnected.

(3) Set the 77777 register of the cross-connect boards of the two sites to
01.

(4) Pull out the 7# LP16 board and replace it.

(5) When the new LP16 board works normally, start the APS protocol of the
two sites.

(6) Set the 77777 register of the cross-connect boards of the two sites to
00.

(7) Reconnect the optical fiber.
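
The K1/K2 check in step 3 is a simple bit test. The sketch below assumes that the four bytes read from the register arrive in the order east K1, east K2, west K1, west K2, and that the "first four bits" are the most significant ones (bit 1 is the most significant bit in SDH numbering). How the bytes are obtained from the NM software is outside its scope, and the sample values are hypothetical.

```python
def k1k2_idle(raw: bytes) -> bool:
    """True if the upper nibble of each K1 and the lower nibble of each K2 are 0."""
    if len(raw) != 4:
        raise ValueError("expected 4 bytes: east K1, east K2, west K1, west K2")
    east_k1, east_k2, west_k1, west_k2 = raw
    k1_ok = all(k1 >> 4 == 0 for k1 in (east_k1, west_k1))
    k2_ok = all(k2 & 0x0F == 0 for k2 in (east_k2, west_k2))
    return k1_ok and k2_ok

print(k1k2_idle(bytes.fromhex("02100310")))  # no request bits set in either direction
print(k1k2_idle(bytes.fromhex("B2180310")))  # east K1 and K2 carry non-zero bits
```

A site for which this check fails while no fiber is cut and no switching is requested is a candidate for the LP16 replacement procedure above.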

Conclusion

Register queries are used frequently during troubleshooting and are a very
effective tool.

5.5 ZXMP S360’s OL1 Board Fault Causes Path Ring Switching Failure

Fault Description

A local transmission network is built with ZTE ZXMP S360 equipment. The
whole network consists of four ZXMP S360 NEs forming a path protection
ring with a transmission rate of 155 Mbit/s.

The network structure is shown in Figure 5-4, and the central office is
located at NE A.


Figure 5-4 Network Structure

Optical fiber connections: the 10# OL1 of NE A is connected to the 7# OL1
of NE B, and the 10# OL1 of NE B is connected to the 7# OL1 of NE C.

Service configuration: there are 2M services from NE B, C and D to NE A.
The B-to-A working path runs from the 7# of NE B to the 10# of NE A, and
the protection path runs over link B-C-D to the 7# optical board of NE A.
The services of the three NEs are in the same AU.

One day, the fiber between NE A and NE B is disconnected, and all 2M
services from NE B to NE A are interrupted.

Cause Analysis

1. For a path ring protection failure, first judge whether the service
configuration is correct. The protection path is checked and found to be
normal. To ensure that the NM data and the NE data are consistent, the
timeslots are re-delivered to each site on the protection path, yet the
fault remains.

2. Check the timeslot configuration: the protection timeslot from NE B to
NE A passes straight through NE C and NE D. Since the service of NE C also
passes straight through NE D, the pass-through function of NE D is judged
to be normal. To confirm this, reconfigure the NE A-to-NE B service to NE
C; the service is normal, which proves that the pass-through at NE D is
normal.

3. With NE D ruled out, concentrate on NE C and NE B. Since the local
service of NE C is normal and only the pass-through service fails, the
fault may lie in the EP1 board, cross-connect board or 7# OL1 board of NE
C, or in the EP1 board, cross-connect board or 10# OL1 board of NE B
itself. Since it is the service of NE B that is broken, check and operate
NE B first.

4. To locate the faulty site, perform a terminal-side AU loopback on the
10# optical board of NE B; the tributary board alarm of NE B still exists.
Perform a terminal-side loopback on the 7# optical board of NE C; the alarm
at NE A for NE B’s service disappears. The fault is therefore in NE B.


Troubleshooting

1. First, switch the cross-connect board of NE B, and the problem

still exists.

2. Then, replace the EP1 board, and the problem still does not

disappear.

3. Finally, replace the OL1 board and the fault disappears. In this

way, the fault is judged to be in the 10# OL1 board of NE B.

Conclusion

For a service break caused by protection switching, first check whether the
protection configuration and data are correct. Then locate the fault point
and analyze the cause.

Narrow down the fault location through methods such as switching back or
changing the configuration.

5.6 S360 MS Switching Causes Part Services Unstable

Fault Description

Figure 5-5 shows a 2.5G MS-ring consisting of six S360 NEs. When no MS
switching is in effect, all services are normal, and the service boards and
optical boards report no abnormal alarm or performance.


One day, during an MS switching test over the ring, some services suffer a
short break every 3 to 5 seconds after switching between NE D and NE E.
After switching back, the services recover.

Figure 5-5 Network Structure

Cause Analysis

The services are broken only when protection switching is in effect, so the
problem lies in the protection channel. It can be analyzed from the
following two aspects:

NE D or NE E’s cross-connect board fault

NE D or NE E’s LP16 board fault

Troubleshooting

In an STM-16 two-fiber MS protection (MSP) ring consisting of S360
equipment, the services go through the working channel when there is no
switching. The LP16 board near the cross-connect board processes the
services of AU 1# to 8#.

After MS switching occurs, some services pass through the protection
channel, and the LP16 board used for protection comes into use (that is,
the LP16 board farther from the cross-connect board).
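
The relationship between working and protection channels can be written down directly. The sketch below assumes the usual split of a two-fiber STM-16 MS ring, with AU 1 to 8 as working channels and AU 9 to 16 as protection channels, so that working AU n is carried on protection AU n+8 after switching (consistent with the AU1 to AU9 example in section 5.3). The names are illustrative.

```python
WORKING_AUS = range(1, 9)   # AU 1..8 carry traffic in normal operation
PROTECTION_OFFSET = 8       # working AU n is switched onto AU n + 8

def protection_au(working_au: int) -> int:
    """Protection channel that carries a given working AU after switching."""
    if working_au not in WORKING_AUS:
        raise ValueError(f"AU {working_au} is not a working channel")
    return working_au + PROTECTION_OFFSET

print({au: protection_au(au) for au in WORKING_AUS})
```

Services that break only in the switched state are therefore those mapped onto AU 9 to 16, which is why the boards handling the protection channels are examined.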

The troubleshooting therefore covers the cross-connect boards and the LP16
boards away from the cross-connect board:

1. Switch between the master and standby cross-connect boards of the above
NEs in the NM software. The master and standby cross-connect boards of all
sites are verified as normal.

2. The maintenance personnel provide an important clue: the air
conditioning at NE E’s site malfunctioned earlier (it has since been
repaired), so the LP16 hardware may have been damaged by high temperature.
After the LP16 board is replaced, some services passing through the
protection channel are still interrupted in the switching state, but only
about once every 2 minutes, a slight improvement. Since the problem is not
solved, the LP16 board at NE D’s site is also suspected.

3. In the equipment room of NE D, the temperature is found to be somewhat
high. Check the fan and the dust screen: the dust screen is heavily clogged
with dust, and the fan does not rotate although its power is on. The fan
failure is suspected to be caused by high temperature or heavy dust. Cool
the equipment with a fan, and the three fans in the fan subrack start
working. When the temperature returns to normal, the fault still exists.

4. Replace the 24# LP16 board, and the fault disappears. The fault of the
24# LP16 board was probably caused by high temperature.

Conclusion

Check the operating environment of the equipment periodically. If the air
conditioning in the equipment room fails, repair it immediately.

Clean the dust screen of the equipment periodically and check that the fans
work normally. If a fan is faulty, replace it in time.


Chapter 6 NM Faults

6.1 E300 NM Alerts “Database Disconnected”

Fault Description

While the E300 V3.18R2 NM software is running, the computer and the NM
software restart because of a sudden power failure, and login from the NM
client then fails. The detailed information table displays “Database
disconnected”.

Check the NM processes: the database service process, dbsvr.exe, is not
started or disappears shortly after startup.

On the dbman tool page, execute 3 and then 1 to check the operation status
of each database. The status of the config. database is suspend, and the
object status of the config. database is unknown, as shown in Figure 6-1.


Figure 6-1 NM Process Page

Cause Analysis

1. Analysis of the log files shows that the config. database is marked as
suspect in the Sybase database, so the dbsvr.exe process cannot start up
normally.

The relevant section of ..\db\dbsvr.log shows that the dbsvr.exe process
keeps restarting and fails every time.

1 2008/10/14 13:35:07 20480
ZXONM E300 for NT DBSVR V3.18 R2P08a
COPYRIGHT(C) 2001-2007
2 2008/10/14 13:35:07 20481 [dbserver]
dbserver thread starts up
3 2008/10/14 13:35:07 20482 [dbserver]
dbserver thread exits
4 2008/10/14 13:35:07 28674 [dbserver]
DBSVR exits
1 2008/10/14 13:35:22 20480
ZXONM E300 for NT DBSVR V3.18 R2P08a
COPYRIGHT(C) 2001-2007
2 2008/10/14 13:35:22 20481 [dbserver]
dbserver thread starts up
3 2008/10/14 13:35:22 20482 [dbserver]
dbserver thread exits
4 2008/10/14 13:35:22 28674 [dbserver]
DBSVR exits

2. The relevant section of ..\db\dboperate_error.log shows that recovery of
the config. database is attempted first and fails. After that, the log
keeps reporting that the config. database is marked as suspect.

Database 'TransDB' has not been recovered yet - please wait and try again.

Database 'TransDB' cannot be opened. An earlier attempt at recovery marked
it 'suspect'. Check the SQL Server errorlog for information as to the
cause.

3. The relevant section of ..\sybase\ASE-12_5\install\SQL_ZXONM.log
contains no online status for TransDB, but does contain the recovery
attempt and the suspect record described above.

Record of normal startup and online status:

00:00000:00001:2007/06/19 09:09:56.44 server Recovering database 'TransDB'.
00:00000:00001:2007/06/19 09:09:56.44 server Redo pass of recovery has processed 1 committed and 0 aborted transactions.
00:00000:00001:2007/06/19 09:09:56.51 server Checking external objects.
00:00000:00001:2007/06/19 09:09:56.52 server The transaction log in the database 'TransDB' will use I/O size of 2 Kb.
00:00000:00001:2007/06/19 09:09:56.52 server Database 'TransDB' is now online.

Record of recovery attempt and suspended status:

00:00000:00001:2008/10/14 08:16:24.31 server Database 'TransDB' has not been recovered yet - please wait and retry.
00:00000:00001:2008/10/14 10:11:30.89 server Database 'TransDB' cannot be opened. An earlier attempt at recovery marked it 'suspect'. Check the SQL Server errorlog for information as to the cause.

4. To sum up, the config. database in the Sybase database is damaged by the
sudden power-down and marked as suspect. The cause is that, in the NM
software of E300 V3.18R2 and later versions, the internal mode for writing
data into Sybase (the in-database mode) was changed to asynchronous. This
raises the efficiency of writing to the database, but greatly increases the
risk that the database becomes suspect if power is lost suddenly while data
is being written.

5. In sync mode, the NM software is inefficient when processing large
amounts of data, which typically shows up as database error reports and
loss of history data. In the rare case when the daily volume of history
data written to the database is constantly huge, operations such as the
database wrap and storage transfer carried out every six hours take up so
much CPU and memory that the NM software cannot operate. This consequence
is also serious.
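
The log patterns described above can be checked mechanically. The sketch below is written against the excerpts quoted in this section only: it assumes that every dbsvr.log record starts with a line of the form "sequence date time code" and that the Sybase errorlog reports the database states with the quoted wording. Function names and the 60-second threshold are illustrative.

```python
import re
from datetime import datetime

# Two start/exit cycles taken from the dbsvr.log excerpt in this section.
SAMPLE_DBSVR = """\
1 2008/10/14 13:35:07 20480
ZXONM E300 for NT DBSVR V3.18 R2P08a
2 2008/10/14 13:35:07 20481 [dbserver]
dbserver thread starts up
3 2008/10/14 13:35:07 20482 [dbserver]
dbserver thread exits
4 2008/10/14 13:35:07 28674 [dbserver]
DBSVR exits
1 2008/10/14 13:35:22 20480
ZXONM E300 for NT DBSVR V3.18 R2P08a
2 2008/10/14 13:35:22 20481 [dbserver]
dbserver thread starts up
3 2008/10/14 13:35:22 20482 [dbserver]
dbserver thread exits
4 2008/10/14 13:35:22 28674 [dbserver]
DBSVR exits
"""

SAMPLE_ERRORLOG = (
    "00:00000:00001:2008/10/14 10:11:30.89 server Database 'TransDB' cannot be "
    "opened. An earlier attempt at recovery marked it 'suspect'.\n"
)

RECORD = re.compile(r"^\d+ (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \d+")

def startup_times(dbsvr_log: str):
    """Timestamps of every 'dbserver thread starts up' record."""
    times, stamp = [], None
    for line in dbsvr_log.splitlines():
        line = line.strip()
        match = RECORD.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
        elif line == "dbserver thread starts up" and stamp is not None:
            times.append(stamp)
    return times

def is_restart_loop(dbsvr_log: str, max_gap_s: float = 60.0) -> bool:
    """True if dbsvr.exe started at least twice, in quick succession, and exited each time."""
    starts = startup_times(dbsvr_log)
    exits = dbsvr_log.count("DBSVR exits")
    gaps = [(b - a).total_seconds() for a, b in zip(starts, starts[1:])]
    return len(starts) >= 2 and exits >= len(starts) and all(g <= max_gap_s for g in gaps)

def database_state(errorlog: str, name: str) -> str:
    """Last state reported for a database: 'online', 'suspect' or 'unknown'."""
    state = "unknown"
    for line in errorlog.splitlines():
        if f"'{name}'" not in line:
            continue
        if "is now online" in line:
            state = "online"
        elif "marked it 'suspect'" in line:
            state = "suspect"
    return state

print(is_restart_loop(SAMPLE_DBSVR), database_state(SAMPLE_ERRORLOG, "TransDB"))
```

A restart loop in dbsvr.log together with a suspect state for TransDB in the Sybase errorlog corresponds to the fault pattern analyzed in this case.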

Troubleshooting

1. Ensure a reliable power supply for the NM computer and prevent sudden
power failure while the NM software is running. If the power supply cannot
be guaranteed, change the in-database mode for sites that do not have a
large amount of history data or have low requirements on history data.

2. If the problem occurs, a temporary solution can be adopted: use the
dbman tool to restore the database in use to a readable state, and then
re-activate the database.

3. Re-install the NM software (upload the data once the latest backup data
is ready or more recent data has been recovered).

Use the dbman tool to handle the Sybase database being marked as suspect.
The steps for recovering the data and then re-creating the database are as
follows:

(1) Set the database to "bypass recovery" status.

1>sp_configure "allow updates",1
2>go
1>use master
2>go

(2) Set the database to "readable" status.

1>update sysdatabases set status=-32768
2>where name="database_name"
3>go
1>shutdown with nowait
2>go

(3) Restart Sybase with the dbman tool (execute 2 and then 1, Start
dataserver). Then use the dbman tool (execute 4 and then 1, Backup
database) to back up the data of the current NM software; backing up the
history data is optional.

(4) Re-create the database with the dbman tool (execute 3->3 for Drop
database, then 3->2 for Create database). After the database is created,
restore the backup data with the dbman tool.

Conclusion

Be aware of this problem whenever the NM software of E300 V3.18 or a later
version is used in a project. The solution above recovers the data of the
NM software in use and avoids reinstalling the NM software.

6.2 T31 NM’s Client Program Cannot Start up Normally

Fault Description

After the T31 client program is installed at a maintenance terminal,
starting the client does not bring up the login page. The program is not
loaded.

Cause Analysis

1. Check Task Manager of the operating system (OS): no client program is
being loaded and no related java process is found. This indicates that the
client program is not executed by the OS and exits abnormally at once.

2. Close some running programs of the OS, such as antivirus software and
the firewall, and then start the client. The client program then sometimes
starts up normally.

Troubleshooting

1. Check the hardware configuration of the computer. If it has a
single-core CPU, 1 GB of memory and an integrated video card, the problem
might be caused by insufficient memory. Modify the configuration of the T31
client program to verify this.

2. Open the file \ums\clnt\bin\run.bat and search for ‘set
JVM_MX=-Xmx512m’. If this value is too large, the client demands more
memory than the computer can provide, and a computer with insufficient
memory and a low hardware configuration may not run the program normally.
Reduce the value to suit the computer.
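
Changing the heap limit is a one-line edit. The sketch below rewrites the -Xmx value on the "set JVM_MX=" line of a batch file's text; the variable name comes from the run.bat line quoted above, while the 256 MB target value and the function name are assumptions chosen for illustration, not a vendor recommendation.

```python
import re

# Matches the value on a line such as: set JVM_MX=-Xmx512m
JVM_MX_LINE = re.compile(
    r"^([ \t]*set[ \t]+JVM_MX=-Xmx)\d+[mg](?=[ \t]*\r?$)",
    re.IGNORECASE | re.MULTILINE,
)

def set_max_heap(bat_text: str, megabytes: int) -> str:
    """Return bat_text with the JVM_MX heap limit replaced by <megabytes>m."""
    new_text, count = JVM_MX_LINE.subn(
        lambda m: f"{m.group(1)}{megabytes}m", bat_text
    )
    if count == 0:
        raise ValueError("no 'set JVM_MX=-Xmx...' line found")
    return new_text

sample = "@echo off\nset JVM_MX=-Xmx512m\n"
print(set_max_heap(sample, 256))
```

Keep a copy of the original run.bat before editing, so that the shipped value can be restored if the smaller heap proves insufficient.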

Conclusion

1. The T31 NM software requires a high hardware configuration. Even when
only the client is run, memory may be insufficient and the program may not
run normally.

2. Some other software may also prevent the client program from running
normally. Try to avoid installing programs such as firewall and antivirus
software on the OS of the computer.


6.3 E300 NM S320 NEs’ Board Indicator Lights Cannot Flash

Fault Description

In the E300 NM software, all board indicator lights of some S320 NEs are
shown grey in the board view, whereas in the board view of other S320 NEs
they flash slowly in green as normal.

Cause Analysis

1. NCP board down or S port blocked

2. Abnormal working of the NM software

3. Old NCP board version

Troubleshooting

1. Run the S port communication test from the NM software; the S port of
every board tests normal. An NCP board failure or a blocked S port is ruled
out.

2. The board performance checked through the NM software is normal, and the
indicator lights of other NEs are displayed normally. Abnormal working of
the NM software is ruled out.

3. Check the version update history of the network. The E300 NM software
was upgraded from version 3.16 to version 3.18, but version 3.18 needs a
new NCP version to support flashing indicator lights in the NE board view.
The NCP version is found to be old and does not support the indicator light
function.

4. Once the version problem is confirmed, there are two solutions: (1)
leave things unchanged, since the indicator light function does not affect
normal maintenance; (2) upgrade the NCP of the S320 equipment to the same
version as the other NCPs. The second solution is recommended, as it
restores the indicator light function.

Conclusion

After upgrading the NM software, check whether the new NM software
functions can be used in the current network. Once a problem is found,
check it in time and confirm whether NEs running old versions need to be
upgraded.


Chapter 7 ECC Faults

7.1 Board Reset with Telnet

Fault Description

The IP address and ID of the newly deployed ZXMP S320 equipment are set in
advance at the central equipment room, before service commissioning starts.
The engineering construction team then installs the equipment at the
subordinate sites and connects the optical fiber. In this way, opening a
new service only requires setting data at the central site, with no need
for testing at the subordinate sites.

However, when the service is to be opened after the construction team has
finished the installation and returned, the newly created NE can be pinged
successfully from the NM software, yet the NCP time cannot be acquired.

Cause Analysis

Since the NE can be pinged, the IP address configuration is correct.

Check whether the ID setting is correct. After inquiry, the ID setting is
confirmed to be correct.

Check whether the NCP state is normal by connecting to the NCP board
through Telnet.


Troubleshooting

1. Telnet to the NCP board and execute the resetmcu 1 command to reset the
NCP board.

2. Acquire the NCP time from the NM software; it is acquired normally. The
NCP can be monitored normally.

Conclusion

To judge whether a device supports the resetmcu command, Telnet to the NCP
board and run the Help command. If resetmcu is listed, the equipment
supports it; if not, the equipment does not support it.

7.2 NCP Board Fault Causes ECC Failure

Fault Description

Sites A and B use ZXMP S360 equipment. Sites C, D and E use ZXMP S320
equipment. Site A is the access NE, as shown in Figure 7-1. The NM software
at site A can monitor all other sites except site B.


Figure 7-1 Fault Analysis

Cause Analysis

1. Telnet to the NCP board at site A and check the connection status of the
ports. The route in the optical direction of site B is already established.

2. The IP address of site B can be pinged from the NCP board of site A, but
not from the NM computer.

3. Connect a laptop at site B; the whole network can be monitored normally
from there. Site B is therefore judged to be normal, and the fault is at
site A.

4. Telnet to the NCP board of site A and check the ECC route. It is normal,
so the optical board is not faulty.

5. Finally, the fault is located in the NCP board of site A.


Troubleshooting

Reset the NCP board of site A and the problem is solved. Keep

observing. If there is any problem again, replace the board.

Conclusion

Be familiar with the ECC-related commands and their usage, and judge the
problem from several aspects before resolving it.


Chapter 8 Clock Sync Faults

8.1 Clock Configuration Error Causes Unstable Clock

Fault Description

ZXMP S360 equipment is used to form a chain network, as shown in Figure
8-1. NE A and NE I are equipped with external clocks. Since the equipment
was commissioned, the clocks have always been unstable, with sudden AU
pointer justification (PJ) events. When the link is broken, the clocks of
some sites lose lock.

Figure 8-1 Network Structure

Cause Analysis

AU pointer adjustments may be caused by a clock sync problem. The general
sequence for handling clock sync faults is: check whether there are B1 or
B2 errors; check whether only TU pointer adjustments are present; handle
the AU pointer adjustments; switch the optical receiving direction; replace
the clock board.

The clock board takes the external clock or the extracted line clock as the
input of its phase-locking circuit for phase comparison. Therefore, the
quality of the board’s crystal oscillator affects the quality of the clock.

Troubleshooting

1. Go to site I and check the clock configuration of each site.

Clock setting at site I: first, the clock extracted from site H; then, the
external clock.

Clock setting at site A: first, the external clock; then, the clock
extracted from site B.

Clock setting at the other sites: line clocks extracted from both sides.

2. Read the clock status of each site; all are locked. However, sites B, C
and D extract the clock from the direction of site A, site H extracts the
clock of site I, and sites A and I use their external clocks. After
analysis, the clock instability is attributed to a configuration error.

3. It is confirmed that site A’s clock is a level-3 clock rather than a
G.811 clock, and that its clock level is lower than that of site I.
Therefore, site I uses its own external clock, and the S1 byte is not
enabled.


Note:

Configuring the S1 byte is not limited to ring networks. Here, line clocks
are extracted from both sides because the S1 byte is not enabled.

4. Enable the S1 byte and leave the clock settings unchanged. Use the clock extracted from the equipment adjacent to site A (a G.811 clock in this case) as the external clock of site A.

After the modification, all NEs in the network synchronize to site A, and the external clock of site I serves as the standby clock. When the external clock of site A fails and site A enters auto-oscillation, site A sends S1:0B. After receiving it, site I starts the clock switching and sends S1:04.
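The switching behavior described here follows from comparing the quality levels carried in the S1 byte. A minimal Python sketch, assuming the standard ITU-T G.707 meaning of the low four bits of S1 (0x02 for G.811, 0x04 and 0x08 for G.812, 0x0B for the equipment clock, 0x0F for do-not-use). The function and source names are hypothetical and are not NM software commands:

```python
# SSM codes carried in bits 5-8 of the S1 byte (ITU-T G.707).
# A lower rank means better quality. Ranking 0x00 ("quality unknown")
# just above "do not use" is a simplifying assumption of this sketch.
SSM_RANK = {
    0x02: 1,  # G.811 primary reference clock
    0x04: 2,  # G.812 transit
    0x08: 3,  # G.812 local
    0x0B: 4,  # SDH equipment clock
    0x00: 5,  # quality unknown
    0x0F: 6,  # do not use for synchronization
}


def select_source(candidates):
    """Pick the best usable source from {name: received S1 value}.

    Ties are broken by the order in which the sources are listed, which
    stands in for the configured priority. Returns None if no source
    is usable.
    """
    best_name, best_rank = None, SSM_RANK[0x0F]
    for name, ssm in candidates.items():
        rank = SSM_RANK.get(ssm & 0x0F, SSM_RANK[0x0F])
        if rank < best_rank:
            best_name, best_rank = name, rank
    return best_name


# Site I before and after the external clock of site A fails.
normal = select_source({"line_from_A": 0x02, "external": 0x04})
after_failure = select_source({"line_from_A": 0x0B, "external": 0x04})
```

With these inputs, site I follows the line clock while site A advertises G.811 quality and falls back to its own external clock once the line advertises 0x0B.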

Conclusion

For ZTE SDH series equipment, note the following points:

Do not configure the internal clock if possible.

If the internal clock is not configured, the equipment enters 24-hour auto-oscillation; if the internal clock is configured, the equipment switches to the internal clock directly. Besides, it is difficult to switch from the internal clock to another clock. For instance, suppose external clock 1, external clock 2, and the internal clock are configured. External clock 1 and external clock 2 can switch to each other. However, to switch from the internal clock back to an external clock, the NM software must resend the command.

Adopt SSM

Currently, ZTE’s 10G equipment supports only the ITU-T standard mode. ZXMP S360 equipment supports both the ITU-T standard mode and the self-defined mode. For a ring network consisting of 10G and ZXMP S360 equipment, only the ITU-T standard mode can be adopted.

SSM protection cannot be formed in the following two situations:

(1) A ring network with two external clocks connected for active/standby protection.

(2) A ring network whose access clock is the internal clock.

Chapter 9 ASON Faults

9.1 Call Connection Cannot Reply 1—Insufficient Bandwidth

Fault Description

In a network, a line section is interrupted during a cut-over. After the optical fiber is broken, some services fail. The route view of the affected call shows that its protection connection has not been established.

Cause Analysis

1. Check the connection status of the call whose set-up failed. No abnormal attribute configuration or restriction policy is found.

2. Check the TE resources of the line. An NE at the broken fiber has only two STM-16 line resources. One of the optical directions shows a total bandwidth of 16 AU4s and an idle bandwidth of 0, which indicates that all the bandwidth is occupied.

3. Because of the insufficient bandwidth, other services cannot set up new call connections over this line.

Troubleshooting

1. Check the fully occupied line resource and identify the service paths passing through it.

2. Check whether these service paths can take another route, and free as many AU4 resources as possible.

3. Manually optimize the rerouting and designate the path of the call to recover the interrupted connection. Alternatively, select the service and send the recovery start command to recover it.

Conclusion

1. When configuring a Mesh network, balance the services so that a large number of services do not occupy the resources of a single line. Otherwise, when a line is disconnected, there are not enough idle timeslots to allocate for connection recovery.

2. When a service is interrupted, check whether the lines it passes through have idle bandwidth for a newly established connection. If the bandwidth is insufficient, adjust some connections manually and recover the service with priority. It is suggested to use no more than 60% of the network resources, so that the remaining resources are reserved for recovery.
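The 60% guideline can be checked per line with simple arithmetic. A minimal Python sketch, assuming each line is described by its total and occupied AU4 counts (an STM-16 line carries 16 AU4s). The link names and the data layout are hypothetical and do not come from the NM software:

```python
def overloaded_links(links, limit=0.60):
    """Return (name, utilization, idle AU4s) for links above the limit.

    links maps a link name to a (total_au4, used_au4) tuple.
    """
    report = []
    for name, (total, used) in sorted(links.items()):
        utilization = used / total
        if utilization > limit:
            report.append((name, utilization, total - used))
    return report


links = {
    "NE1-NE2": (16, 16),  # fully occupied, as in the case described here
    "NE2-NE3": (16, 8),   # half occupied, within the guideline
    "NE1-NE3": (16, 10),  # above the guideline
}
for name, utilization, idle in overloaded_links(links):
    print(f"{name}: {utilization:.1%} used, {idle} AU4 idle")
```

Any line that appears in the report is a candidate for moving services to another route before a fiber cut forces a recovery.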

9.2 Call Connection Cannot Reply 2—Restriction of Route Policy

Fault Description

For a 1+1 SNCP protected service in a Mesh network, after an optical path is broken, the protection route is shown as interrupted and the recovery connection for protection cannot be set up, as shown in Figure 9-1. In the figure, the protection route is the broken route marked in red.

Figure 9-1 1+1 SNCP Service Protection

Cause Analysis

1. Check the call attribute settings of the SNCP service and the TE links that the protection connection passes through. The route policy set for this call has both the “node irrelevant” and “link irrelevant” items selected.

2. The route policy of the service excludes the interrupted TE link. The only remaining protection route would have to pass through a node on the working path, so it does not satisfy “node irrelevant”. Therefore, when calculating the route, the control plane finds no route that satisfies the conditions for this protection connection.
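The effect of the “node irrelevant” constraint can be illustrated with a small route search. A minimal Python sketch, assuming a hypothetical five-node topology (the actual topology of Figure 9-1 is not reproduced here) and a plain breadth-first search rather than the control plane's real algorithm:

```python
from collections import deque


def find_route(graph, src, dst, banned_nodes=(), banned_links=()):
    """Breadth-first search for a route that avoids banned nodes and links."""
    banned_nodes = set(banned_nodes) - {src, dst}
    banned_links = {frozenset(link) for link in banned_links}
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in graph[node]:
            if nxt in seen or nxt in banned_nodes:
                continue
            if frozenset((node, nxt)) in banned_links:
                continue
            seen.add(nxt)
            queue.append(path + [nxt])
    return None


# Hypothetical topology: working path A-B-C, protection path A-D-C,
# and the protection link D-C is broken.
graph = {
    "A": ["B", "D"],
    "B": ["A", "C", "D", "F"],
    "C": ["B", "D", "F"],
    "D": ["A", "B", "C"],
    "F": ["B", "C"],
}
working = ["A", "B", "C"]
working_links = list(zip(working, working[1:]))
broken_links = [("D", "C")]

# "link irrelevant" and "node irrelevant" both selected.
strict = find_route(graph, "A", "C",
                    banned_nodes=working[1:-1],
                    banned_links=working_links + broken_links)
# Only "link irrelevant" selected: sharing node B is now allowed.
relaxed = find_route(graph, "A", "C",
                     banned_links=working_links + broken_links)
```

In this topology the strict search finds no route, while the relaxed search finds one that reuses node B without reusing any working link. This matches the trade-off in the case: deselecting “node irrelevant” makes recovery possible at the cost of a shared node.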

Troubleshooting

1. Edit the route policy of this 1+1 SNCP service and deselect the “node irrelevant” item. Then optimize the rerouting of the protection connection so that it is set up automatically.

2. Alternatively, keep the present route policy unchanged and wait for the line and the protection route to recover.

Conclusion

1. When configuring a 1+1 SNCP service, take the routes in the existing network into consideration, including the working and protection routes and the possible recovery routes. If a section of the optical path is broken, the working or protection connection should be able to find a third independent route as its recovery route.

2. If “node irrelevant” or “link irrelevant” is not selected when configuring a 1+1 SNCP service, the working and protection connections may pass through the same node or link during recovery, which risks interrupting both at the same time.

3. It is suggested to set the “reply” (revert) attribute when configuring a 1+1 SNCP service, so that the service automatically returns to the initial working or protection route after the fault on that route is cleared.

Chapter 10 Interconnection Faults

10.1 10G Optical Boards of ZTE S390 Equipment and Marconi MSH64 Equipment Fail in Interconnection

Fault Description

In a network, the 10G optical board of ZTE S390 equipment is interconnected with the 10G optical board of Marconi MSH equipment. The interconnection still fails after several rounds of debugging with the NM software. The ZTE equipment reports an LOF alarm, and the Marconi equipment reports a large number of RS bit errors.

Cause Analysis

Test the 10G optical board of the ZTE S390 equipment with an SDH analyzer (ONT50). No problem is found: in both port self-loop mode and VC4 timeslot self-loop mode, the meter reports no alarm and the error count is 0.

Test the 10G optical board of the Marconi MSH equipment. With a loopback set on the board, the meter reports a problem and an alarm is always present. Capturing the signal frame transmitted by the equipment shows that the A2 byte is inconsistent with the international standard. Refer to Figure 10-1 and Figure 10-2.

Figure 10-1 SDH Frame Overhead 1

Figure 10-2 SDH Frame Overhead 2

Hint:

SDH standard’s explanation on A1 and A2 bytes:

The framing bytes mark the starting point of a frame, so that the receiving end stays frame-synchronized with the transmitting end. The first step in receiving an SDH bit stream is to correctly separate each STM-N frame from the received signal. That is, the receiver first locates the start of each STM-N frame and then identifies the positions of the overhead and payload in each frame. The A1 and A2 bytes perform this framing function. Through them, the receiving end locates and separates the STM-N frames from the information flow, and then finds a VC in the frame through the pointer.

How does the receiving end locate the frame through the A1 and A2 bytes? A1 and A2 have fixed bit patterns: A1 is 11110110 (F6H) and A2 is 00101000 (28H). The receiving end checks each byte in the signal flow. When 3N consecutive A1 (F6H) bytes are followed by 3N consecutive A2 (28H) bytes (an STM-1 frame has three A1 bytes and three A2 bytes), it judges that it has received the start of an STM-N frame. By locating the starting point of each frame, the receiving end separates the successive frames.
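The search for the framing pattern can be sketched in a few lines. A minimal Python sketch, assuming a byte-aligned capture (real receivers search bit by bit) and the STM-N frame size of 9 rows by 270 x N columns; the function names are hypothetical:

```python
A1, A2 = 0xF6, 0x28


def framing_pattern(n: int = 1) -> bytes:
    """3N A1 bytes followed by 3N A2 bytes for an STM-N signal."""
    return bytes([A1]) * (3 * n) + bytes([A2]) * (3 * n)


def find_frame_start(stream: bytes, n: int = 1) -> int:
    """Return the offset of the first framing pattern, or -1 if none."""
    return stream.find(framing_pattern(n))


def check_alignment(stream: bytes, start: int, n: int = 1) -> bool:
    """Confirm that the pattern repeats exactly one frame later."""
    pattern = framing_pattern(n)
    nxt = start + 9 * 270 * n
    return stream[nxt:nxt + len(pattern)] == pattern
```

A transmitter that sends a nonstandard A2 value, as in this case, never produces the pattern, so `find_frame_start` returns -1 and the receiver cannot align.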

If correct A1 and A2 bytes are not received for five or more consecutive frames (625 μs), that is, if the framing bytes cannot be recognized in five successive frames, the receiving end loses frame alignment, enters the out-of-frame (OOF) state, and generates the related alarm. If the OOF state lasts for 3 ms, the receiving end enters the loss of frame (LOF) state and the equipment generates an LOF alarm. It then sends an AIS signal in the downstream direction, and the whole service is interrupted. In the LOF state, if the receiving end receives correct A1 and A2 bytes for more than 1 ms, the equipment returns to the normal in-frame (IF) state.
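These transitions form a small state machine. A minimal Python sketch using the thresholds quoted here (five frames, 3 ms, 1 ms), assuming one call per received frame. Leaving OOF after a single good frame is a simplification of this sketch, and the class name is hypothetical:

```python
FRAME_US = 125  # one SDH frame lasts 125 microseconds


class FrameAligner:
    """Tracks IF / OOF / LOF from a per-frame 'framing bytes OK' flag."""

    def __init__(self):
        self.state = "IF"
        self.bad_frames = 0  # consecutive frames with wrong A1/A2
        self.oof_us = 0      # time spent in OOF
        self.good_us = 0     # time with correct framing while in LOF

    def on_frame(self, framing_ok: bool) -> str:
        if self.state == "IF":
            self.bad_frames = 0 if framing_ok else self.bad_frames + 1
            if self.bad_frames >= 5:       # 5 frames = 625 us
                self.state, self.oof_us = "OOF", 0
        elif self.state == "OOF":
            if framing_ok:
                self.state, self.bad_frames = "IF", 0
            else:
                self.oof_us += FRAME_US
                if self.oof_us >= 3000:    # OOF has lasted 3 ms
                    self.state, self.good_us = "LOF", 0
        else:  # LOF
            self.good_us = self.good_us + FRAME_US if framing_ok else 0
            if self.good_us > 1000:        # correct framing for over 1 ms
                self.state, self.bad_frames = "IF", 0
        return self.state
```

Fed with a stream whose A2 byte is always wrong, the aligner never leaves LOF, which is the behavior the ZTE board shows in this interconnection.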

Since the A2 value of the Marconi MSH equipment’s 10G optical board does not follow the standard, the board cannot be interconnected with other manufacturers’ equipment. The A2 value is a fixed bit pattern prescribed in the standard, and no manufacturer is allowed to change it at will. ZTE therefore cannot adjust its equipment to match Marconi’s setting, and the 10G optical boards of the ZTE S390 equipment and the Marconi MSH equipment cannot be interconnected successfully.

Troubleshooting

There is no solution at present. Since the A2 value of the Marconi MSH equipment’s 10G optical board does not follow the standard, the board cannot be interconnected with other manufacturers’ equipment.

10.2 C2 Byte Causes ATM Service Interconnection Failure

Fault Description

The former network (consisting of three sets of ZTE S600 V2 equipment) and one set of ZXMP S360 equipment form a 622M ring network, as shown in Figure 10-3. Among them, one set of ZXMP S320 equipment is connected with the ATM equipment through a 155M optical path. The gateway NE is the ZXMP S360, which is interconnected with Huawei’s transmission equipment through 155M optical boards, and Huawei’s equipment is connected with the ATM equipment at the other end.

The whole service channel is 622M and is passed straight through. The service operates normally.

Figure 10-3 Former Network Structure

After the gateway is replaced from the ZXMP S360 with ZXMP S380 equipment, the OL1 optical port of the ZXMP S380 equipment reports the VC4-RDI alarm, and the ATM data service fails, as shown in Figure 10-4.

Figure 10-4 Network Structure after Adjustment

Cause Analysis

The way the ZXMP S380 equipment processes the C2 value is the most likely cause.

The fault may also be caused by inconsistent C2 values between the interconnected transmission equipment of the two manufacturers. When the ATM data equipment detects a C2 value that is neither 0x13 nor 0x01, it deems the signal invalid.

When the ZXMP S360 equipment is interconnected with Huawei’s equipment, the C2 value received by Huawei’s equipment is 0x01. Therefore, as long as the C2 value transmitted by ZTE’s equipment is 0x01, the service works.

Common optical boards process the C2 value as follows:

The OL1 optical board of ZXMP S360 equipment terminates and regenerates the C2 value. The transmitted value is 0x01 by default.

The O4CSD optical board of ZXMP S320 equipment terminates and regenerates the C2 value. The transmitted value is 0x02 by default.

The OIB1 optical board of ZXMP S320 equipment terminates and regenerates the C2 value. The transmitted value is 0x01 by default.

The OL1 and OL4 optical boards of ZXMP S380 equipment pass the C2 value through.

Therefore, interconnection through the ZXMP S360 equipment works. The failure after the gateway NE is replaced with the ZXMP S380 equipment occurs because the C2 value transmitted by the ZXMP S320 equipment is 0x02 and the ZXMP S380 passes it through unchanged, so it is inconsistent with the C2 value of the interconnected equipment at the other end.

The 2500E equipment reports the VC4-RDI alarm because Huawei’s equipment passes the higher order overhead through, so the remote alarm sent from the ATM data equipment side reaches it. When Huawei’s equipment terminates the C2 value, there is no alarm.
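The analysis can be replayed by following the C2 value through the boards on the path. A minimal Python sketch, assuming the board behavior and default values listed in this section and assuming that the far-end equipment passes the higher order overhead through; the table and function names are hypothetical:

```python
ATM_ACCEPTED_C2 = {0x13, 0x01}  # values the ATM data equipment accepts

# A default value means the board terminates and regenerates C2;
# None means the board passes the received C2 through.
BOARD_C2 = {
    ("ZXMP S360", "OL1"): 0x01,
    ("ZXMP S320", "O4CSD"): 0x02,
    ("ZXMP S320", "OIB1"): 0x01,
    ("ZXMP S380", "OL1"): None,
    ("ZXMP S380", "OL4"): None,
}


def c2_at_far_end(source_c2, boards):
    """Follow the C2 value through a chain of boards toward the far end."""
    c2 = source_c2
    for board in boards:
        regenerated = BOARD_C2[board]
        if regenerated is not None:
            c2 = regenerated
    return c2


def atm_accepts(c2):
    return c2 in ATM_ACCEPTED_C2


# C2 = 0x02 arrives from the ring in both cases.
via_s360 = c2_at_far_end(0x02, [("ZXMP S360", "OL1")])
via_s380 = c2_at_far_end(0x02, [("ZXMP S380", "OL1")])
```

With the ZXMP S360 as gateway the far end sees 0x01, which the ATM equipment accepts. With the ZXMP S380 the value 0x02 arrives unchanged and is rejected.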

Troubleshooting

1. At the OL1 optical port of the ZXMP S380 equipment, set a line-side loopback toward the ATM data equipment on the Huawei side. The service does not recover.

2. Set a loopback toward the ATM equipment at Huawei’s equipment, and the service recovers. Here, Huawei’s optical port passes the higher order overhead through.

3. Set the higher order overhead of Huawei’s interconnecting optical board to terminated. The RDI alarm on the OL1 board of the ZTE ZXMP S380 equipment disappears, yet the service still fails. The terminal loopback at the Huawei equipment side also fails.

4. The C2 byte transmitted by the ZXMP S380 equipment is 0x02, as read at Huawei’s equipment. The C2 value transmitted by Huawei’s equipment is 0x13, as read at the 2500E equipment.

5. To recover the service, interconnect the ATM service through the OL1 board of ZXMP S360 equipment. That is, connect the OL1 board of the II-model equipment in a chain between the ZXMP S380 equipment and Huawei’s equipment.

Conclusion

The transmission equipment detects the trace ID mismatch alarm (J0, J1, J2) according to the following rules:

1. If an expected value is configured in the NM software, the transmission equipment checks the received value against it.

2. If no expected value is configured in the NM software, or the configured expected value is deleted, the board treats the expected value as a wildcard. That is, no received value is deemed a trace ID mismatch.

Note:

Since it is a gross command, the delete command is not shown separately. The overhead setting command simply contains no trace ID item.

The rules for setting J0, J1, and J2 on the boards are as follows:

Adopt the 16-byte frame format of E.164.

Set as UNITRANS by default.

Input “0DH 0AH” for the last two positions.

Fill other vacant positions with “20H” (space).

The settings of the NM software exclude the check value.

The transmit value and expected value set in the NM software should be no longer than 13 characters.
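These rules can be read as a 16-byte frame made of one check byte, 13 characters, and the two closing bytes 0DH 0AH. A minimal Python sketch under that reading, building only the 15 bytes after the check byte (the check value is left to the board, as the NM settings exclude it). The function names are hypothetical:

```python
def build_trace_id(text: str = "UNITRANS") -> bytes:
    """Return the 15 bytes that follow the check byte of the 16-byte frame."""
    data = text.encode("ascii")
    if len(data) > 13:
        raise ValueError("trace ID must be no longer than 13 characters")
    # Pad with spaces (20H), then close with 0DH 0AH.
    return data.ljust(13, b"\x20") + b"\x0d\x0a"


def trace_mismatch(expected, received: bytes) -> bool:
    """Detection rule: with no expected value, nothing is a mismatch."""
    if expected is None:
        return False
    return build_trace_id(expected) != received
```

For example, `trace_mismatch(None, received)` is always false, which corresponds to the case where the expected value is not configured or has been deleted.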

Explanation on C2 byte:

For time division and crossing, the transmit value of C2 byte is

fixed as 02.

For ET1, TT1, and the data board using TU11 and TU12, the

transmit value of C2 byte is fixed as 02.

For ET3, TT3, and VC4, the transmit value of C2 is fixed as 02,

and VC3’s C2 value is fixed as 04.

For ET4, the transmit value of C2 is 0x12.

For the data board using VC4, the transmit value of C2 is 0x16

(HDLC/PPP encapsulated), 0x18 (LAPS encapsulated), or

0x1B (GFP encapsulated).

For the ATM board, the transmit value of C2 is 0x13.
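When reading a C2 value from a board during interconnection tests, a lookup table built from the list in this section saves time. A minimal Python sketch; the table contains only the values stated here, and the function name is hypothetical:

```python
# Transmitted C2 values as listed in this section.
C2_SOURCES = {
    0x02: "time division and crossing; ET1, TT1, TU11/TU12 data boards; ET3, TT3, VC4",
    0x04: "VC3",
    0x12: "ET4",
    0x13: "ATM board",
    0x16: "VC4 data board, HDLC/PPP encapsulated",
    0x18: "VC4 data board, LAPS encapsulated",
    0x1B: "VC4 data board, GFP encapsulated",
}


def describe_c2(value: int) -> str:
    """Map a received C2 value to the board types that transmit it."""
    return C2_SOURCES.get(value, "not listed in this manual")
```

A value that is not in the table, such as the default 0x01 regenerated by some optical boards, is reported as not listed rather than guessed.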