iub troubleshooting.pptx

INITIAL Iub TROUBLESHOOTING OVERVIEW

Upload: nicholas-wilson

Post on 25-Dec-2015


TRANSCRIPT

Page 1: IUB TROUBLESHOOTING.pptx

INITIAL Iub TROUBLESHOOTING OVERVIEW

Page 2: IUB TROUBLESHOOTING.pptx

Agenda

• Objective

• CPP O&M Concepts

• Protocols

• O&M Client Services

• Counters Overview

• Performance Management

• Iub over ATM

• Initial Counters

• Iub Analysis

• Fail After Admission

• IP Iub Throughput

• Questions?

Page 3: IUB TROUBLESHOOTING.pptx

OBJECTIVE

The main idea is to introduce the transport engineer to the basic concepts of troubleshooting on the Iub interface by presenting initial counters and KPIs that can help identify which area needs further investigation.

Based on these conclusions, network optimization services can be performed.

Page 4: IUB TROUBLESHOOTING.pptx

CPP O&M CONCEPTS

Moshell is a suite of tools for O&M of CPP-based nodes.

CPP is the Connectivity Packet Platform, on which the following nodes are based: RNC, RBS, MGW, RXI.

Information collected by CPP counters every 15 minutes is stored in XML files (ROP files).

This information is read and stored in an SQL database on a daily basis.

Page 5: IUB TROUBLESHOOTING.pptx

Protocols used for accessing these services:

› http

› unsecure protocols (unencrypted): telnet, ftp, iiop

› secure protocols (encrypted): ssh, sftp, ssliop

PROTOCOLS

Figure 1 - Protocols. Diagram labels:

› Node side: OSE shell (COLI), File system, MIB (CM: Configuration Mgmt, FM: Fault Mgmt, PM: Performance Mgmt)

› Protocols and ports: HTTP (80), FTP (21) / SFTP (22), TELNET (23) / SSH (22), IIOP (56834) / SSL IOP (56836)

› Transport: TCP/IP over Ethernet or IP over ATM; RS232

› Other labels: MoShell, Hyper Terminal, Scanners

Page 6: IUB TROUBLESHOOTING.pptx

The O&M client services

› Configuration Service (CS): Read and change configuration data; configuration data is stored in the MO attributes

› Alarm Service (AS): Retrieve the list of alarms currently active on each MO

› Notification Service (NS): Subscribe and receive notifications from the node, informing about parameter/alarm changes in the MOs

› Inventory Service (IS): Get a list of all HW and SW defined in the node

› Log Service (LS): Save a log of certain events such as changes in the configuration, alarms raising and ceasing, node and board restarts

› Performance Measurement (PM): Set up counter measurements (scanners); counter values are stored in MO pm-attributes and output to an XML file every 15 minutes.

Page 7: IUB TROUBLESHOOTING.pptx

COUNTERS OVERVIEW

Counter types:

• Peg: a counter that is increased by 1 at each occurrence of a specific activity.

• Gauge: a counter that can be increased or decreased depending on the activity in the system.

• Accumulator: a counter that is increased by the value of a sample. It indicates the total sum of all sample values taken during a certain time. The name of an accumulator counter begins either with pmSum or pmSumOfSamp.

• Scan: a counter that is increased by 1 each time the corresponding accumulator counter is increased. It indicates how many samples have been read.

• Probability Density Function (PDF): a list of range counters. If a sampled value falls within a certain range, the counter for that range is increased.
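As a minimal sketch of how an accumulator counter and its scan counter are typically combined, the average sample value over a period can be read as accumulator divided by scan. The function name and the sample values below are hypothetical and are not taken from any node or tool.

```python
from typing import Optional


def average_from_counters(accumulator: float, scan: int) -> Optional[float]:
    """Average sample value: accumulator (sum of samples) / scan (number of samples)."""
    if scan == 0:
        # No samples were read during the period, so no average exists.
        return None
    return accumulator / scan


# Hypothetical samples taken during one period.
samples = [12, 15, 9, 20]
accumulator = sum(samples)   # what an accumulator counter would hold
scan = len(samples)          # what the matching scan counter would hold

print(average_from_counters(accumulator, scan))  # 14.0
```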

Page 8: IUB TROUBLESHOOTING.pptx

COUNTERS OVERVIEW

Counter Reset Behavior

Counter values are either reset at the end of each ROP period or accumulated up to the counter limit.

For a counter that is not reset after the ROP period, the increment during a ROP period is the difference between the values reported in two consecutive ROPs.
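The difference rule for non-reset counters can be sketched as below. The readings are hypothetical, and the wrap-around handling is an assumption: the slide only says that values accumulate up to the counter limit, not what happens when the limit is reached.

```python
def rop_increments(readings, counter_limit=None):
    """Per-ROP increments from consecutive readings of a counter that is not reset.

    counter_limit is a hypothetical maximum value; if given, a negative
    difference is treated as a wrap from the limit back to zero (assumption).
    """
    increments = []
    for previous, current in zip(readings, readings[1:]):
        delta = current - previous
        if delta < 0 and counter_limit is not None:
            delta += counter_limit + 1
        increments.append(delta)
    return increments


# Hypothetical readings from four consecutive ROP files.
print(rop_increments([100, 130, 175, 175]))  # [30, 45, 0]
```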

Counter Classification

Counters can be grouped by NE Type:

RNC

RXI

RBS

Or by area of interest:

Radio Network – RNC specific counters

Radio Network – RBS specific counters

Transport Network counters

Page 9: IUB TROUBLESHOOTING.pptx

Iub over ATM

Figure 3 - Iub configuration example. Diagram labels:

› RNC: locally terminated AAL2 connections, locally terminated AAL2 signalling

› RBS 1 (hub, AAL2 switching): forwarded AAL2 signalling, AAL2 switched connections

› RBS 11, RBS 12, RBS 2

› Shared PVCs for AAL2 signalling towards RBS1, RBS12 and RBS 13

› Shared AAL2 paths for AAL2 connections set up towards RBS1, RBS11, RBS12

› Legend: PVCs for Q.2630; AAL2 paths; PVCs for NBAP, Node Synch; PVC termination marker; AAL2 switching cluster; ATM/SDH Transport Network; AAL2 Access Point

Note: O&M (Mub) PVCs omitted for simplicity.

Page 10: IUB TROUBLESHOOTING.pptx

Iub over ATM

AAL2 CAC and resource usage:

AAL2 connection admission control (CAC) is executed before a new AAL2 connection is set up in the system.

AAL2 connections in UTRAN are always initiated by the RNC.

The RNC reserves a CID and the relevant bandwidth, and forwards the establish request message through the AP. The message contains the allocated CID, the traffic descriptors and QoS.

Page 11: IUB TROUBLESHOOTING.pptx

Iub over ATM

CID

Because of standardization constraints, no more than 248 AAL2 connections can be simultaneously established on a single AAL2 path. More than 248 connections can be established between two adjacent nodes only if more than one AAL2 path is configured.

When an AAL2 connection is allocated on an AAL2 path, a Channel Identifier (CID) is reserved and assigned by the node that is originating or forwarding the AAL2 connection request.

Figure 4 – AAL2 Connections table
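The 248-connection limit per path translates directly into a minimum number of AAL2 paths for a given number of simultaneous connections. The sketch below covers CID availability only and ignores bandwidth; the function name is illustrative.

```python
import math

MAX_CIDS_PER_PATH = 248  # limit per AAL2 path stated on the slide


def aal2_paths_needed(simultaneous_connections: int) -> int:
    """Minimum number of AAL2 paths so that every connection can get a CID."""
    if simultaneous_connections < 0:
        raise ValueError("number of connections cannot be negative")
    return math.ceil(simultaneous_connections / MAX_CIDS_PER_PATH)


print(aal2_paths_needed(248))  # 1
print(aal2_paths_needed(249))  # 2
print(aal2_paths_needed(600))  # 3
```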

Page 12: IUB TROUBLESHOOTING.pptx

Iub over ATM

In particular, the AAL2 path capacity assumed by CAC is equal to:

› the configured PCR, for CBR AAL2 paths

› the configured MCR, for UBR+ AAL2 paths

› zero, for UBR AAL2 paths
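The capacity rule above can be written as a small lookup. This is only a sketch: the function name, the string labels for the service categories and the example rates in cells per second are assumptions made for illustration.

```python
def cac_path_capacity(service_category: str, pcr: int = 0, mcr: int = 0) -> int:
    """Capacity that CAC assumes for an AAL2 path, by ATM service category.

    pcr and mcr are the configured peak and minimum cell rates
    (units assumed to be cells per second).
    """
    category = service_category.upper()
    if category == "CBR":
        return pcr
    if category == "UBR+":
        return mcr
    if category == "UBR":
        return 0
    raise ValueError(f"unknown service category: {service_category}")


# Hypothetical configured rates.
print(cac_path_capacity("CBR", pcr=4000))             # 4000
print(cac_path_capacity("UBR+", pcr=4000, mcr=1000))  # 1000
print(cac_path_capacity("UBR", pcr=4000))             # 0
```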

Flow Control:

The Flow Control function dynamically adapts the transmission rate of Best Effort services to the available Iub bandwidth by reducing the transmission rate during Iub congestion.

Page 13: IUB TROUBLESHOOTING.pptx

Initial counter check

The following counters are recommended for an initial investigation, as they give clues on whether the source of the problem lies in the transport network.

An increasing number of unsuccessful local or remote AAL2 connections indicates where potential problems exist: at the NodeB, the RXI or the RNC. The best counters to observe are the ‘OutConns’ counters, viewed at the AAL2 Access Points in the RNC looking towards the RXI/NodeB and at the AAL2 Access Points in the RXI looking towards the NodeBs.

› Aal2Ap pmUnSuccOutConnsLocalQosClassA/B/C/D

› Aal2Ap pmUnSuccInConnsLocalQosClassA/B/C/D

› Aal2Ap pmUnSuccOutConnsRemoteQosClassA/B/C/D

› Aal2Ap pmUnSuccInConnsRemoteQosClassA/B/C/D

Page 14: IUB TROUBLESHOOTING.pptx

Initial counter check

The following counters show the BW utilization:

› VclTp, VplTp, Atmport: pmBwUtilizationRx, pmBwUtilizationTx

To check ATM link utilization:

› VclTp, VplTp, Atmport: pmTransmittedAtmCells, pmReceivedAtmCells

To show the number of RRC/RAB establishment failures after admission:

› Utrancell: pmNoFailedAfterAdm

Page 15: IUB TROUBLESHOOTING.pptx

Initial counter check

To check for congestion in the control plane:

› Iub interface: UniSaalTp pmNoOfLocalCongestions; NbapCommon pmNoOfDiscardedNbapMessages; Iublink pmTotalTimeIublinkCongestedDl

› Iu/Iur interface: NniSaalTp pmNoOfLocalCongestions

To check for interface availability:

› Iub interface: UniSaalTp pmLinkInServiceTime

› Iu/Iur interface: NniSaalTp pmLinkInServiceTime

Page 16: IUB TROUBLESHOOTING.pptx

Initial counter check

The following counter shows whether Iub bandwidth is limiting HS services, measured in %:

› IubDataStreams pmCapAllocIubHsLimitingRatioSpi<xx>

Note: if the value is > 75%, the cause could be Iub capacity or radio limitations.

To see HS frame loss:

› IubDataStreams pmHsDataFramesLostSpi<XX>

› IubDataStreams pmHsDataFramesReceivedSpi<XX>

To check ATM link quality:

› Aal2PathVccTp, Aal5TpVccTp, VpcTp: pmBwLostCells, pmFwLostCells

Page 17: IUB TROUBLESHOOTING.pptx

Initial counter check

Check the physical layer quality of the transmission link:

› ImaLink: pmSesIma, pmSesImaFe, pmUasIma, pmUasImaFe

› ImaGroup: pmGrUasIma

› E1PhyspathTerm, E1Ttp, E3PhysPathterm: pmEs, pmSes, pmUas

› Os155SpiTtp: pmMsEs, pmMsSes, pmMsUas, pmMsBbe

› Vc12Ttp, Vc4Ttp: pmVcEs, pmVcSes, pmVcUas

Page 18: IUB TROUBLESHOOTING.pptx

Iub analysis

The following flowchart summarises an Iub link analysis procedure based on examination of the AAL2 setup failure rate.

Strict Admission Traffic

› No AAL2 Setup Failure: OK

› AAL2 Setup Failure, Local: lack of CID or lack of BW. Create more Class A VCs.

› AAL2 Setup Failure, Remote: bad TN quality. Check physical layer quality.

Best Effort Traffic

› No AAL2 Setup Failure: check Flow Control counters.

› AAL2 Setup Failure, Local: lack of CID. Create more Class B&C VCs.

› AAL2 Setup Failure, Remote: bad TN quality. Check physical layer quality.
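One possible reading of the flowchart is sketched below as a lookup from traffic type and failure side to the suggested next step. The branch structure is an interpretation of the flowchart labels, and the function name and argument values are hypothetical.

```python
def aal2_next_step(traffic: str, setup_failure: bool, side: str = "") -> str:
    """Suggested next step for an Iub link, following one reading of the flowchart.

    traffic: "strict" (Strict Admission) or "best_effort" (Best Effort).
    side: "local" or "remote"; only used when setup_failure is True.
    """
    if traffic not in ("strict", "best_effort"):
        raise ValueError(f"unknown traffic type: {traffic}")
    if not setup_failure:
        return "OK" if traffic == "strict" else "Check Flow Control counters"
    if side == "local":
        # Local failures point at missing CIDs (or bandwidth for strict admission).
        if traffic == "strict":
            return "Create more Class A VCs"
        return "Create more Class B&C VCs"
    if side == "remote":
        # Remote failures point at transport network quality.
        return "Check physical layer quality"
    raise ValueError("side must be 'local' or 'remote' when setup_failure is True")


print(aal2_next_step("strict", True, "local"))  # Create more Class A VCs
print(aal2_next_step("best_effort", False))     # Check Flow Control counters
```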

Page 19: IUB TROUBLESHOOTING.pptx

AAL2 Setup Failure Rate

The following KPIs and Aal2Ap counters are suggested to monitor the AAL2 setup failure rate on an Iub link.

Counters

› Aal2Ap::pmUnSuccOutConnsLocalQoSClass<x> (A/B/C/D): Number of unsuccessful attempts to allocate AAL2 resources during establishment of outgoing connections on this Access Point (AP), caused by rejects in Connection Admission Control (CAC).

› Aal2Ap::pmUnSuccOutConnsRemoteQoSClass<x> (A/B/C/D): Number of unsuccessful establishments of outgoing connections on this AAL2 Access Point (AP).

› Aal2Ap::pmSuccOutConnsRemoteQosClass<x> (A/B/C/D): Number of successful establishments of outgoing connections on this AAL2 Access Point (AP).

Page 20: IUB TROUBLESHOOTING.pptx

AAL2 Setup Failure Rate KPIs

\[
\mathrm{AAL2\_Fail\_Rate\_Local\_ClassA}\,[\%] = \frac{\mathrm{pmUnSuccOutConnsLocalQoSClassA}}{\mathrm{pmSuccOutConnsRemoteQoSClassA} + \mathrm{pmUnSuccOutConnsLocalQoSClassA} + \mathrm{pmUnSuccOutConnsRemoteQoSClassA}} \times 100\%
\]

\[
\mathrm{AAL2\_Fail\_Rate\_Remote\_ClassA}\,[\%] = \frac{\mathrm{pmUnSuccOutConnsRemoteQoSClassA}}{\mathrm{pmSuccOutConnsRemoteQoSClassA} + \mathrm{pmUnSuccOutConnsRemoteQoSClassA}} \times 100\%
\]

Similar formulae can be used for Class B & Class C.

The AAL2_Fail_Rate_Local_ClassA KPI signals possible problems in the Iub section between the RNC and the next connected node (NodeB or RXI).

The AAL2_Fail_Rate_Remote_ClassA KPI signals possible problems in the Iub section downstream of any intermediate RXI.
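A minimal sketch of the two Class A KPIs follows. It reads the local rate as local rejects over all outgoing attempts, and the remote rate as remote failures over the attempts that passed local admission. The counter values are hypothetical and the function name is illustrative.

```python
def aal2_fail_rates(succ_remote: int, unsucc_local: int, unsucc_remote: int):
    """Return (local_rate, remote_rate) in percent for one QoS class.

    succ_remote   -> pmSuccOutConnsRemoteQoSClass<x>
    unsucc_local  -> pmUnSuccOutConnsLocalQoSClass<x>
    unsucc_remote -> pmUnSuccOutConnsRemoteQoSClass<x>
    A rate is None when its denominator is zero (no attempts in the period).
    """
    local_den = succ_remote + unsucc_local + unsucc_remote
    remote_den = succ_remote + unsucc_remote
    local = 100.0 * unsucc_local / local_den if local_den else None
    remote = 100.0 * unsucc_remote / remote_den if remote_den else None
    return local, remote


# Hypothetical Class A counter values for one ROP.
local, remote = aal2_fail_rates(succ_remote=900, unsucc_local=50, unsucc_remote=50)
print(f"local {local:.2f}%  remote {remote:.2f}%")  # local 5.00%  remote 5.26%
```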

Page 21: IUB TROUBLESHOOTING.pptx

CID Utilization Estimate

This is a crude method of calculating the number of CIDs in use, as it does not distinguish between traffic types.

There is a second method using Erlang counters, which is not demonstrated in this presentation.

Counters

› Aal2Ap::pmExisTransConns: Number of existing connections for the Access Point (AP) transiting this node. Gauge counter.

› Aal2Ap::pmExisOrigConns: Number of existing connections for the Access Point (AP) originating in this node. Gauge counter.

› Aal2Ap::pmExisTermConns: Number of existing connections for the Access Point (AP) terminating in this node. Gauge counter.

Page 22: IUB TROUBLESHOOTING.pptx

CID Utilization Estimate

KPI

\[
\mathrm{Average\_No\_Connections} = \frac{\mathrm{pmExisOrigConns} + \mathrm{pmExisTermConns} + \mathrm{pmExisTransConns}}{n}
\]

where n is the number of paths per AAL2 Access Point.

Note: if the RXI is a pure AAL2 switching node, the pmExisOrigConns and pmExisTermConns counters can be discounted, as there can be no originated or terminated connections in the node, only transiting connections.

This method of CID calculation gives a basic estimate of CID utilization.

In a typical Iub link with one VC (normally vc39) defined for Strict Admission traffic and one VC (normally vc50) defined for Best Effort traffic, n = 2, so the division by 2 in the formula averages the total number of used CIDs over both traffic types. For example, if the counters return a total of 360, it is not known whether this is 180 CIDs in both ClassA and ClassB&C, or perhaps 240 in ClassA and 120 in ClassB&C. If it is the latter, VC expansion is needed, as the maximum number of CIDs allowed per path (248) is being reached.
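The worked example with 360 connections over two paths can be reproduced with a short sketch. Expressing the average as a percentage of the 248-CID limit is an added illustration, not part of the KPI itself, and the split of the 360 connections across the three counters is hypothetical.

```python
MAX_CIDS_PER_PATH = 248  # limit per AAL2 path stated earlier in the deck


def average_connections_per_path(orig: int, term: int, trans: int, n_paths: int) -> float:
    """Average number of connections per AAL2 path on one Access Point."""
    if n_paths <= 0:
        raise ValueError("n_paths must be positive")
    return (orig + term + trans) / n_paths


# Hypothetical gauge values adding up to the 360 connections used in the text.
avg = average_connections_per_path(orig=360, term=0, trans=0, n_paths=2)
print(avg)                                       # 180.0
print(round(100 * avg / MAX_CIDS_PER_PATH, 1))   # 72.6
```

The average hides the per-path split, which is why 240 connections on the Class A path would already be close to the limit even though the average looks moderate.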

Page 23: IUB TROUBLESHOOTING.pptx

BW Utilization Estimate

Bandwidth utilization can be measured per VP and also per VC using counters.

To monitor Best Effort VC utilization, it is better to use the ‘Flow Control’ methodology.

Counters

› VplTp::pmTransmittedAtmCells: Number of transmitted ATM cells. This counter is incremented for each transmitted ATM cell. Peg counter.

› VplTp::pmReceivedAtmCells: Number of received ATM cells. This counter is incremented for each received ATM cell. Peg counter.

KPIs

\[
\mathrm{AAL2\_VP\_Utilisation\_Tx}\,[\%] = \frac{\mathrm{VplTp{::}pmTransmittedAtmCells}}{\mathrm{Meas\_Length}(s) \times \mathrm{egressAtmPCR}} \times 100\%
\]

\[
\mathrm{AAL2\_VP\_Utilisation\_Rx}\,[\%] = \frac{\mathrm{VplTp{::}pmReceivedAtmCells}}{\mathrm{Meas\_Length}(s) \times \mathrm{ingressAtmPCR}} \times 100\%
\]
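A minimal sketch of the VP utilisation KPI follows, assuming the PCR is configured in cells per second and the measurement length is one 15-minute ROP (900 s). The cell count and PCR value are hypothetical.

```python
def vp_utilisation_percent(atm_cells: int, meas_length_s: float, pcr_cells_per_s: float) -> float:
    """Cells counted in the period as a percentage of what the PCR allows.

    atm_cells: pmTransmittedAtmCells (Tx) or pmReceivedAtmCells (Rx).
    pcr_cells_per_s: egressAtmPCR (Tx) or ingressAtmPCR (Rx), assumed in cells/s.
    """
    if meas_length_s <= 0 or pcr_cells_per_s <= 0:
        raise ValueError("measurement length and PCR must be positive")
    return 100.0 * atm_cells / (meas_length_s * pcr_cells_per_s)


# Hypothetical values: 1,800,000 cells in a 900 s ROP on a 4000 cells/s VP.
print(vp_utilisation_percent(1_800_000, 900, 4000))  # 50.0
```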

Page 24: IUB TROUBLESHOOTING.pptx

Physical Layer Quality

TN quality

Several counters are available to monitor the availability and the quality of physical and IMA terminations in CPP nodes.

Errored Seconds (ES): seconds with block errors during the PM interval. These counters are incremented for each second in which one or more blocks with one or more errors are received.

Severely Errored Seconds (SES): seconds during available time that have a severe bit error rate.

Unavailable Seconds (UAS): the accumulated unavailable time in seconds during the interval. Unavailable time starts when 10 consecutive SES are detected, and ends when 10 consecutive non-SES are detected. These counters are incremented for each second of unavailable time.
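The entry and exit rule for unavailable time can be sketched as a small state machine over per-second SES flags. Two points are assumptions here, following the usual ITU-T convention rather than anything stated on the slide: the 10 SES that trigger unavailability are counted as unavailable, and the 10 non-SES that end it are counted as available. How a CPP node counts these boundary seconds is not confirmed by the source.

```python
def count_unavailable_seconds(ses_flags, window=10):
    """Count unavailable seconds from a list of per-second SES flags (True = SES)."""
    unavailable = [False] * len(ses_flags)
    in_unavailable = False
    run = 0  # consecutive seconds that argue for leaving the current state
    for i, ses in enumerate(ses_flags):
        if not in_unavailable:
            run = run + 1 if ses else 0
            if run == window:
                # Enter unavailable time; the triggering seconds count as unavailable.
                in_unavailable = True
                for j in range(i - window + 1, i + 1):
                    unavailable[j] = True
                run = 0
        else:
            unavailable[i] = True
            run = run + 1 if not ses else 0
            if run == window:
                # Leave unavailable time; the clearing seconds count as available.
                in_unavailable = False
                for j in range(i - window + 1, i + 1):
                    unavailable[j] = False
                run = 0
    return sum(unavailable)


# Hypothetical trace: 12 SES, then 15 clean seconds.
print(count_unavailable_seconds([True] * 12 + [False] * 15))  # 12
```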

Page 25: IUB TROUBLESHOOTING.pptx

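Flow Control

HSDPA Congestion KPIs:

\[
\mathrm{HSFrameLossRatio} = \frac{\mathrm{pmHsDataFramesLostSpi}\langle xx\rangle}{\mathrm{pmHsDataFramesReceivedSpi}\langle xx\rangle + \mathrm{pmHsDataFramesLostSpi}\langle xx\rangle} \times 100\%
\]

High frame loss indicates potential congestion problems. <xx> = the supported SPI (Scheduling Priority Indicator).

\[
\mathrm{HSFrameDelayDistribution} = \mathrm{pmHsDataFrameDelayIubSpi}\langle xx\rangle
\]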

This counter indicates the percentage of times where Iub congestion has occurred per SPI (Scheduling Priority Indicator).

Experience has shown that on highly loaded Iub links this counter can reach values of about 65–75%.
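A short sketch of the HS frame loss ratio evaluated per SPI is shown below. The dictionary layout and the counter values are hypothetical; only the ratio itself (lost frames over lost plus received frames) comes from the KPI definition.

```python
from typing import Optional


def hs_frame_loss_ratio(lost: int, received: int) -> Optional[float]:
    """HS frame loss ratio in percent: lost / (received + lost) * 100."""
    total = lost + received
    if total == 0:
        return None  # no HS data frames in the period
    return 100.0 * lost / total


# Hypothetical counter values per SPI: {spi: (frames lost, frames received)}.
per_spi = {0: (30, 970), 1: (5, 995)}
for spi, (lost, received) in per_spi.items():
    print(spi, hs_frame_loss_ratio(lost, received))
# 0 3.0
# 1 0.5
```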

Page 26: IUB TROUBLESHOOTING.pptx

Flow Control

Low HS Throughput Site Analysis Study Case

Counters were extracted and graphs plotted to illustrate the HS Frame Loss Ratio and HSLimitIub KPIs over time.

Page 27: IUB TROUBLESHOOTING.pptx

Flow Control

Examining the resulting KPI graphs, it was evident that the channel normally reserved for ClassA traffic (vc39) was experiencing abnormally high bandwidth utilization.

The ClassB&C traffic channels (vc50 & vc51) were experiencing abnormally low utilization (next slide).

Page 28: IUB TROUBLESHOOTING.pptx

Flow Control

Page 29: IUB TROUBLESHOOTING.pptx

Flow Control

Enhanced Uplink Congestion KPIs

\[
\mathrm{Eul\_Frame\_Loss\_Ratio} = \frac{\mathrm{pmEdchDataFramesLost}}{\mathrm{pmEdchDataFramesReceived} + \mathrm{pmEdchDataFramesLost}} \times 100\%
\]

High frame loss indicates potential congestion problems.

\[
\mathrm{Eul\_Frame\_Delay\_Distribution} = \mathrm{pmEdchDataFrameDelayIub}
\]

This counter is difficult to post-process, so it is recommended only for troubleshooting rather than for performance monitoring.

Page 30: IUB TROUBLESHOOTING.pptx

Failure After Admission

What is ‘Failure After Admission’?

‘Failure After Admission’ refers to an RRC/RAB setup failure that occurs after the user has been admitted to the network.

Admission to the network occurs when the user successfully completes an initial RRC Connection Setup request.

An RRC failure after the initial admission can occur, for example, when a user on an existing call tries to upswitch to a higher rate and the upswitch cannot be achieved due to lack of resources (Radio or Transport). The user would perceive this as a slow connection.

A RAB setup failure, on the other hand, would be perceived by the user as a failure to set up a call.

Page 31: IUB TROUBLESHOOTING.pptx

Failure After Admission

In general, high ‘Failure After Admission’ occurrences are mainly due to:

› Transport Network: lack of BW/CIDs, or

› Radio Network: lack of Channel Element availability.

‘Failure After Admission’ Study Case

The study case follows this procedure:

› Identification of a problem site, by extraction of the pmNoFailedAfterAdm counter.

› AAL2 Setup Failure Rate counter retrieval and KPI calculation.

› Graphical analysis to establish the correlation between both.
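The study case establishes the correlation graphically. As a complementary sketch, and not the method used in the study, a Pearson correlation coefficient between the two per-ROP series gives a numeric check. The sample series below are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a constant series has no defined correlation
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-ROP values for one site.
failed_after_adm = [0, 5, 40, 120, 300, 80, 10]        # pmNoFailedAfterAdm
aal2_fail_rate = [0.0, 1.0, 8.0, 25.0, 60.0, 15.0, 2.0]  # AAL2 setup failure rate, %

r = pearson(failed_after_adm, aal2_fail_rate)
print(f"correlation: {r:.3f}")
```

A coefficient close to 1 would support the reading that the failures after admission follow the AAL2 setup failures in time.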

Page 32: IUB TROUBLESHOOTING.pptx

Graphical Analysis

[Figure: FCAN05A pmNoFailedAfterAdm vs Time. X-axis: Time, 2009-04-24 00:00 to 2009-04-26 21:45. Y-axis: pmNoFailedAfterAdm, 0 to 350.]

[Figure: AAL2 Setup Failure Rate vs Time. X-axis: Time, 2009/4/24 to 2009/4/26 21:45. Y-axis: Rate %, 0.00 to 70.00. Series: ClassA, ClassB, ClassC.]

Page 33: IUB TROUBLESHOOTING.pptx

IP Iub Throughput

The client should define a user throughput threshold in order to identify the average bandwidth target to be delivered per user.

This threshold should then be compared with the actual average customer throughput, as defined below:

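THROUGHPUT PER USER:

This formula calculates the average bit rate per user on the Iub interface.

\[
\mathrm{AvUserThrHs}\,[\mathrm{kbit/s}] = \frac{\sum_{\mathrm{Cells}} \mathrm{pmDlTrafficVolumePsIntHs}}{\mathrm{Meas\_Length}(s)} \times \frac{1}{\sum_{\mathrm{Cells}} \mathrm{AvNrHsUsersPerCell}}
\]

\[
\mathrm{AvNrHsUsersPerCell} = \frac{\mathrm{pmSumBestPsHsAdchRabEstablish}}{\mathrm{pmSamplesBestPsHsAdchRabEstablish}} + \frac{\mathrm{pmSumBestPsStreamHsRabEst}}{\mathrm{pmSamplesBestPsStreamHsRabEst}} + \frac{\mathrm{pmSumBestPsEulRabEstablish}}{\mathrm{pmSamplesBestPsEulRabEstablish}}
\]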
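A minimal sketch of the throughput-per-user calculation follows. It assumes that the three sum/samples ratios are added to give the average number of HS users per cell and that pmDlTrafficVolumePsIntHs is reported in kbit; both are readings of the KPI, not confirmed by the slide. The per-cell values are hypothetical.

```python
from typing import Optional

# (sum counter, samples counter) pairs that make up AvNrHsUsersPerCell.
USER_COUNTER_PAIRS = [
    ("pmSumBestPsHsAdchRabEstablish", "pmSamplesBestPsHsAdchRabEstablish"),
    ("pmSumBestPsStreamHsRabEst", "pmSamplesBestPsStreamHsRabEst"),
    ("pmSumBestPsEulRabEstablish", "pmSamplesBestPsEulRabEstablish"),
]


def av_nr_hs_users_per_cell(cell: dict) -> float:
    """Average number of HS users in one cell (sum of the three averages, assumed)."""
    total = 0.0
    for sum_name, samples_name in USER_COUNTER_PAIRS:
        samples = cell.get(samples_name, 0)
        if samples:
            total += cell.get(sum_name, 0) / samples
    return total


def av_user_thr_hs(cells: list, meas_length_s: float) -> Optional[float]:
    """Average HS throughput per user in kbit/s over all given cells."""
    users = sum(av_nr_hs_users_per_cell(cell) for cell in cells)
    if users == 0 or meas_length_s <= 0:
        return None
    volume_kbit = sum(cell.get("pmDlTrafficVolumePsIntHs", 0) for cell in cells)
    return volume_kbit / meas_length_s / users


# Hypothetical single cell over one 900 s ROP.
cell = {
    "pmDlTrafficVolumePsIntHs": 900_000,
    "pmSumBestPsHsAdchRabEstablish": 1800,
    "pmSamplesBestPsHsAdchRabEstablish": 900,
}
print(av_user_thr_hs([cell], 900))  # 500.0
```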

Page 34: IUB TROUBLESHOOTING.pptx

IP Iub Throughput

If the throughput per user is below the defined threshold, it should be determined whether it has been limited by ‘Flow Control’. This can be done using the Iub congestion counter:

\[
\mathrm{HSLimitIub} = \mathrm{pmCapAllocIubHsLimitingRatioSpi}\langle xx\rangle
\]

Another indication that the transport network is overloaded is the frame loss counter, which should present values below 2%:

\[
\mathrm{HSFrameLossRatio} = \frac{\mathrm{pmHsDataFramesLostSpi}\langle xx\rangle}{\mathrm{pmHsDataFramesReceivedSpi}\langle xx\rangle + \mathrm{pmHsDataFramesLostSpi}\langle xx\rangle} \times 100\%
\]

If frame loss counter returns low values, and Iub presents no limitation
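The checks on this slide can be sketched as a small triage function. The 2% frame loss limit comes from the text above; the 75% limit for HSLimitIub is borrowed from the initial counter check slide and is treated here as an adjustable assumption. The function name, argument names and messages are hypothetical.

```python
FRAME_LOSS_LIMIT_PCT = 2.0  # frame loss should stay below 2%


def iub_throughput_check(user_thr_kbps, target_kbps, hs_limit_iub_pct,
                         frame_loss_pct, hs_limit_threshold_pct=75.0):
    """Return a list of findings for one site and period."""
    if user_thr_kbps >= target_kbps:
        return ["throughput per user meets the target"]
    findings = []
    if hs_limit_iub_pct > hs_limit_threshold_pct:
        findings.append("Iub flow control is limiting HS throughput")
    if frame_loss_pct >= FRAME_LOSS_LIMIT_PCT:
        findings.append("HS frame loss indicates transport overload")
    if not findings:
        findings.append("no Iub limitation indicated by these two counters")
    return findings


# Hypothetical KPI values for one site.
print(iub_throughput_check(400, 500, hs_limit_iub_pct=80, frame_loss_pct=1.0))
# ['Iub flow control is limiting HS throughput']
```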

Page 35: IUB TROUBLESHOOTING.pptx

IP Iub Throughput

RNC Iub throughput monitoring KPIs:

Average Iub throughput:

\[
\mathrm{IUB\_THR}\,[\mathrm{kbit/s}] = \frac{\mathrm{pmSumCapacity}}{\mathrm{pmSamplesCapacity}}
\]

Average Iub throughput regulated within ROP:

\[
\mathrm{IUB\_THR\_REG}\,[\mathrm{kbit/s}] = \frac{\mathrm{pmSumCapacityRegulation}}{\mathrm{pmSamplesCapacityRegulation}}
\]

Periods of Iub throughput limitation:

\[
\mathrm{DURATION\_REG\_THR\_IUB}\,[\mathrm{sec}] = \mathrm{pmTotalTimeCapacityRegulated}
\]

Observing these KPIs makes it possible to understand when low performance is due to an internal RNC limitation rather than to the transport network.

Page 36: IUB TROUBLESHOOTING.pptx

IP IUB EVALUATION FLOWCHART