mb replace


Upload: ashokjcb

Post on 30-Nov-2015


Page 1: MB Replace

Doc Rev -006

README FIRST
This AP has been updated to include commands for systems running "Cluster-Mode" (C-Mode) ONTAP.

● The login name for C-Mode systems is "admin", not "root".
● The ONTAP version and mode are listed in your dispatch!
● C-Mode has two console command shells, clustershell and nodeshell. The default shell is clustershell. You are in the clustershell if the console prompt includes a double colon (::) prior to the ">" sign, Ex: netapp01::>
● To switch from clustershell to nodeshell, enter 'run local' at the ::> prompt; the double colons (::) are then removed from the prompt. To exit nodeshell, enter 'exit' or Ctrl-D.
● From clustershell, nodeshell commands can be entered by prefacing the 7-Mode command with "run local". Ex: netapp01::> run local sysconfig -v
● Note: not all 7-Mode commands are supported in C-Mode.

README FIRST
This AP has been updated to include additional commands and procedures for a system configured with NSE (NetApp Storage Encryption) disks.

● NSE is supported in DOT 8.1 and higher, 7-Mode only. No Cluster-Mode support at this time.
● The DOT version and mode are listed in your dispatch!
● The "badging" on the NSE disk canister is embossed, as compared to standard disk badging. See picture >> here
● Presently all NSE systems are HA configured, and all disks in all shelves must be NSE disks.

AP doc rev is at top of page. If using a hard copy for a secure site, be sure to print all the linked documents in this AP.

Link to Statement of Volatility by Platform: http://support.netapp.com/info/web/ECMP1132988.html

Known Bugs/Issues - Bug Table and Notes Below

Bug: TSB-1110-04 (Note 1)
Description: TSB-1110-04 is an internal Bulletin: When the disk reassign is performed on the partner (HA takeover), the giveback (GB) must be performed immediately, and a takeover/giveback (TO/GB) from the repaired node is then required to sync the system-IDs.
First Fixed Release: See Note 1

Bug: 590488 (Note 2)
Description: In "disruptive" MB w/NVMEM replacements, a TO/GB from the repaired node is req'd.
First Fixed Release: See Note 2

Bug: 489060 (Note 3)
Description: NDMP, Qtree-SnapMirror, Vol-SnapMirror or SnapVault processes can hang TO/GB.
First Fixed Release: See Note 3

No "Failed" disks can exist in the target node in a HA config or the disk reassign will not execute. The AP covers this.

Bug Notes:

1. In some versions of ONTAP, when the 'disk reassign' command is executed from the partner, ONTAP may print out a warning that states 2 things. (i) The giveback must be done right away - IF a GB will not be performed immediately, the disk reassign needs to be postponed. (ii) A second TO/GB should be performed from the repaired node. This is covered in the AP. (TSB-1110-04) The console warning reads:
   disk reassign: A giveback must be done immediately following a reassign of partner disks. After the partner node becomes operational, do a takeover and giveback of this node to complete the disk reassign process. Do you want to continue (y/n)?

2. IF this system has a partner AND the partner did NOT take over this controller, it is still necessary to sync the new system-IDs by executing a TO/GB from the repaired node, although no console message is displayed. This is covered in the AP.

3. The AP will cover asking the customer if they are running these processes. If so, there is a link on how to disable them.
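The prompt rules in the C-Mode README box can be sketched in Python as below. This is only an illustration for anyone post-processing a captured console log; the function name and the returned labels are hypothetical and are not part of ONTAP or of this AP.

```python
def shell_from_prompt(prompt: str) -> str:
    """Guess the console shell from a prompt string (illustrative helper only)."""
    p = prompt.strip()
    if p.startswith("LOADER"):
        # PROM prompt, e.g. LOADER-A> or LOADER-B>
        return "loader"
    if p.endswith("::>"):
        # A double colon before ">" means clustershell (C-Mode default shell)
        return "clustershell"
    if p.endswith(">"):
        # No double colon: nodeshell (or a 7-Mode system prompt)
        return "nodeshell"
    return "unknown"


if __name__ == "__main__":
    for sample in ("netapp01::> ", "netapp01> ", "LOADER-A>"):
        print(repr(sample), "->", shell_from_prompt(sample))
```

The check for LOADER comes first because the LOADER prompt also ends in ">".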

Processor Controller Module (PCM) Replacement for the FAS2040 For NetApp Authorized Service Engineers

Page 2: MB Replace

SECTION OUTLINE of a FAS2040 Appliance Processor Controller Module (PCM) Replacement
This procedure will take 60-90 minutes.

I. Appliance / PCM Visual Checks
II. Node Pre-Checks
III. Node State Check and Shutdown Procedure
IV. Capture the Current System Configuration
V. Remove the cables and extract the PCM
VI. Move the battery, SFPs - Exchange the CF Cards
VII. Partially Reinsert the Replacement PCM and Reconnect the cables
VIII. Set date and time on the RTC
IX. Verify Battery Status
X. Run Diagnostics (20-30 min)
XI. Verify FC Adapter Configuration
XII. Capture new System-ID on replacement Controller
XIII. Disk Reassign
XIV. Boot PROM Variable Checks
XV. Boot the Operating System - 'cf giveback' if applicable
XVI. NetApp Storage Encryption (NSE) System?
XVII. Controller Reg., Enable options, Submit logs, Part Return

I. FAS2040: Appliance / PCM Visual Checks

Step  Action Description

1  Visually verify that you are working on the correct model, and READ the STOP box and the other note boxes below. The FAS2040 Appliance has 1 or 2 Processor Controller Module(s) (PCM) integrated into a 12-bay shelf.

2  Continue with Section I on next page.

STOP !!
The NVMEM LED on the faceplate will start flashing when power is removed from the controller if the system is "waiting for giveback", or if the system was not shut down properly (uncommitted data). Follow the steps in Section V carefully.

[Figures on this page: Fig 1, Fig 2, Fig 3; one is labeled "Rear View". Callouts: FAS Model Number; 2u; AC; PS-1 AC Switch; PS-2 AC Switch; PCM slots A and B; IOIOI Console Port; BMC Port; Fibre Channel Ports: 0a, 0b; SAS Adapter 0d; Ethernet Ports: e0a, e0b, e0c, e0d; one orange thumbscrew and cam handle to extract each PCM.]

Figure notes:
● HA (Active-Active) Configurations: 2 PCMs (A & B).
● Non-HA Configurations: 1 PCM in the bottom slot.
● Each Controller Module (A or B slot) has its own System Serial Number.
● Controller activity LEDs: If an LED flashes GREEN, that controller is online.
● The " ! " LED on the front is ON when hardware failures are detected or if controller failover is disabled.
● The Status "!" LED on a PCM will be ON if the PCM is faulted or if HA is disabled.

Page 1 of 21

Page 3: MB Replace

I. FAS2040: Appliance / PCM Visual Checks (cont.)

Step  Action Description

3  Fig 4: FAS2040 PCM

Notes:
1. This Action Plan covers controllers running ONTAP 7-Mode or Cluster-Mode.
2. The procedure will take 60-90 minutes, or 90-120 minutes if the system has NSE disks.
3. Note the Caution on NVMEM LEDs in Section V.
4. This Action Plan needs to be followed in step order.
5. FC port configuration, disk list and the system date are captured prior to removing the original Controller.
6. The Compact Flash (CF) card needs to be moved from the original PCM to the replacement PCM.
7. System variables (date-time, disk reassignment and FC port configuration) must be verified before rebooting the system.
8. If a HA configuration and ONTAP 8, the console may report that you "must perform a final 'cf takeover' and 'cf giveback' from the partner node" (the node that was repaired) to complete the 'disk reassign' process. Follow the new steps in the 'Disk Reassign' and 'Boot the OS' sections carefully.

II. FAS2040: Node Pre-Checks

Step  Action Description

1  Verify that the "Order Reference" 8xxxxxxxxx number on the RMA packing slip is the same as the Part Request (PREQ) number listed in your dispatch notes.

2  Adhere to anti-static precautions. (A paper ESD strap is included inside the RMA box if you don't have your own.)

3  Remove the replacement PCM from the anti-static bag and examine the housing and connector for damage.

4  Go to Section III "Node State Check and Shutdown" on next page.

Page 2 of 21

Page 4: MB Replace

III. FAS2040: Node State Check and Shutdown Procedure

Step  Action Description

1  Always capture the node's console output to a text file, ex: "NetApp-dispatch-num.txt", even if using the end-user's computer. To review the Job Aid on how to connect to the console (IOIOI) port and serial emulator options, click > Console Attach Aid

NOTE  Chassis Check: To see if two controllers are installed, reference HA figures here > HA Figs
      Visual Chassis Checks
      FRONT: Look for an Amber Status ( ! ) LED, Fig 5a, then observe which Activity LED is flashing and which is OFF. A controller whose Activity LED is not flashing is not running Data ONTAP, or that controller is not installed.
      REAR: Look for the controller that has the Status ( ! ) LED ON, Fig 5b. Both could be on; verify which Activity LED is not flashing. Continue with the console response checks in step 2.

STOP!  WARNING for HA configurations: If the failure has caused a HA failover, you may have been dispatched on the surviving controller's serial number, not the failed one.

2  Check the state of the node by viewing the console port responses from (each) controller if HA (Active-Active) configuration. A HA config requires two controller assemblies installed in the same physical chassis. Detailed messages here>

NOTE  The "LOADER" prompt will include -A if attached to the top controller or -B if attached to the bottom controller.
NOTE  HA-config Status Command: After logging in, "cf status" will display the state of the HA. Example of >> cf status cmd

3  Appliance Check, Non-HA Controller Configuration: If the console response is "login" or "password" or the <system prompt>, the end-user will have to issue a 'halt' on the system for proper shutdown. Work with NGS if you have questions.

4  Dual Controller Configuration
   a) If both controllers are UP and online: the end-user will have to issue a cf takeover from the partner node if controller failover is active, or halt the node if controller failover is disabled. Work with NGS if you have questions.

5  If the console response is "LOADER-A|B>", go to Section IV.

6  Continue with Section III on next page.

[Fig 5a: Front OPS LEDs. Fig 5b: Controller Fault ( ! ) LED on Rear. Callouts: AC Power; PCM slots A and B.]
● The " ! " LED is ON when hardware failures are detected, if a controller failover has occurred, or if HA is disabled.
● Controller Activity LEDs: If an LED actively flashes GREEN, that controller is online; in the figure, "A" is online. "A" is the top PCM, "B" is the bottom PCM.
● In Fig 5b, the Fault ( ! ) LED on the "B" PCM is ON; the one on "A" (top) is OFF.

Page 3 of 21
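The console checks in Section III amount to a small decision table. The Python sketch below restates that table for illustration only; the function name, its arguments and the returned strings are hypothetical, and the end-user (with NGS where needed) still makes the actual call on site.

```python
def next_action(console_response: str, ha_config: bool,
                failover_enabled: bool = True) -> str:
    """Map a console response to the Section III action (illustrative only)."""
    text = console_response.strip()
    if text.startswith("LOADER"):
        # Node is already halted at the PROM prompt
        return "go to Section IV"
    if "waiting for giveback" in text.lower():
        # Partner has taken over; handled in step 7 of Section III
        return "follow step 7 (Ctrl-C, then answer y to halt)"
    # Otherwise the node is up: login, password or the system prompt
    if not ha_config:
        return "end-user issues 'halt'"
    if failover_enabled:
        return "end-user issues 'cf takeover' from the partner node"
    return "end-user issues 'halt'"


if __name__ == "__main__":
    print(next_action("LOADER-B>", ha_config=True))
    print(next_action("login:", ha_config=False))
```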

Page 5: MB Replace

III. FAS2040: Node State Check and Shutdown Procedure (cont.)

Step  Action Description

7  If the console response is "Waiting for giveback…", follow steps 7a-7c. If the console response is LOADER, skip to the next Section.
   a) At the "Waiting for giveback…" prompt, enter: Ctrl-C
   b) At the message "Do you wish to halt this node rather than wait [y/n]?" enter: y
   c) The system then drops to the LOADER-A|B> prompt.

   Console example:
      Waiting for giveback...(Press Ctrl-C to abort wait)
      ^C
      This node was previously declared dead.
      .....
      The HA partner is currently operational and in takeover mode.
      .....
      .....
      Do you wish to halt this node rather than wait [y/n]? y
      System halting...
      LOADER-A>

   Step 7: Hitting Enter displays the "Waiting for giveback" prompt.
   Step 7a): Enter: CTRL-C. The lines that follow give information on partner status.
   Step 7b): Answer y to halt the node.

IV. FAS2040: Capture the Current System Configuration

Step  Action Description

NOTE  Confirm the "console" output is being saved to a text file. It will be needed later in this action plan.

1  IF Cluster-Mode, continue with step 1a; otherwise skip to step 2.
   a) After the target system drops to the LOADER-A|B> prompt, log in to the partner and check if the auto-giveback option is enabled by entering the following command. You can copy-n-paste the command syntax.
      Cluster-Mode (Run in clustershell)
      cluster::> sto fa show -node local -fields auto-giveback
      IF enabled, C-Mode will show:
      node           auto-giveback
      -------------- -------------
      Node-B         true
   b) Disable the auto-giveback option from the partner node if it is enabled. (copy-n-paste)
      Cluster-Mode (Run in clustershell)
      cluster::> sto fa modify -node local -auto-giveback false

2  The date and time are stored in the system PROM in Greenwich Mean Time (GMT), also known as Coordinated Universal Time (UTC). At the LOADER> prompt, enter: show date
   Record on paper the system's GMT time and the local time to determine the number of hours (and minutes) the local time is ahead of or behind GMT.
      LOADER-A> show date
      Current date & time is: 06/12/2011 15:59:10

3  Enter: printenv
   This command displays (and captures) all boot environmental variables.
      LOADER-A> printenv
   An example of a "printenv" output is here > printenv-C-no-V-MC.pdf

4  Continue with Section IV on next page.

Page 4 of 21
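The GMT-versus-local bookkeeping in step 2 of Section IV is simple arithmetic. As a hedged illustration, the Python sketch below parses the 'show date' line from the captured console text and computes how far local time is ahead of or behind GMT. The helper names are hypothetical, and the mm/dd/yyyy reading of the date is an assumption based on the 'set date mm/dd/yyyy' syntax used later in this AP.

```python
from datetime import datetime


def parse_loader_date(line: str) -> datetime:
    """Parse 'Current date & time is: MM/DD/YYYY HH:MM:SS' from a console log."""
    stamp = line.split("is:", 1)[1].strip()
    return datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")


def local_offset_hours(gmt: datetime, local: datetime) -> float:
    """Hours that local time is ahead of (+) or behind (-) GMT."""
    return (local - gmt).total_seconds() / 3600


if __name__ == "__main__":
    gmt = parse_loader_date("Current date & time is: 06/12/2011 15:59:10")
    # Hypothetical wall-clock reading taken on site at the same moment
    local = datetime(2011, 6, 12, 8, 59, 10)
    print("Local time offset from GMT (hours):", local_offset_hours(gmt, local))
```

Recording the offset rather than only the local time makes it possible to convert a wall-clock reading back to GMT when the RTC on the replacement PCM is set.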

Page 6: MB Replace

IV. FAS2040: Capture the Current System Configuration (cont.)

Step  Action Description

5  From the LOADER-A|B> prompt, enter autoboot to initiate a PROM bootstrap.
   a) When this message appears: "Press CTRL-C for Boot Menu", press CTRL-C (^C) to load the "Boot Menu". After about 30-40 seconds, the "Maintenance menu" will appear.
   b) For ONTAP 7.x, ONTAP 8.0.x 7-Mode and ONTAP 8.1 (7-Mode, C-Mode), refer to the first menu example below and enter 5 for "Maintenance mode boot". For ONTAP 8.0.x Cluster-Mode only, refer to the second menu example and enter maint.
   c) If asked "Continue with boot?" Answer: y

NOTE  If the original MB fails to boot to the Maintenance menu due to an error, skip to Section V.

6  From the *> prompt enter: fcadmin config to log the configuration of the integrated FC host adapters.
   a) Check if the "0a" and "0b" adapter ports are configured as a "target". If so, this will need to be verified later.

7  Continue with Section IV on next page.

Console example: ONTAP 7.x, 8.0.x 7-Mode and ONTAP 8.1 (7-Mode, C-Mode)

   LOADER-A> autoboot
   Loading X86_64/freebsd/image1/kernel:0x100000/3375736 0x538280/3221872
   .....
   Copyright (C) 1992-2010 NetApp.
   All rights reserved.
   *******************************
   *                             *
   * Press Ctrl-C for Boot Menu. *
   *                             *
   *******************************
   ^CBoot Menu will be available.

   Please choose one of the following:

   (1) Normal Boot.
   (2) Boot without /etc/rc.
   (3) Change password.
   (4) Clean configuration and initialize all disks.
   (5) Maintenance mode boot.
   (6) Update flash from backup config.
   (7) Install new software first.
   (8) Reboot node.
   Selection (1-8)? 5

   You have selected the maintenance boot option:
   .....
   .....
   In a High Availablity configuration, you MUST ensure that the
   partner node is (and remains) down, or that takeover is manually
   disabled on the partner node, because High Availability
   software is not started or fully enabled in Maintenance mode.

   FAILURE TO DO SO CAN RESULT IN YOUR FILESYSTEMS BEING DESTROYED

   NOTE: It is okay to use 'show/status' sub-commands such as
   'disk show or aggr status' in Maintenance mode while the partner is up
   Continue with boot? yes
   .....
   .....
   *>

   Step 5: Enter: autoboot
   Step 5a): Wait for the "Press Ctrl-C for Boot Menu" message, then hit ^C (CTRL-C).
   Step 5b): Enter: 5
   Step 5c): If this node has a partner node, the High Availability message will be displayed. Answer: y to the "Continue with boot?" question.
   The *> prompt is the maintenance mode console prompt.

Console example: ONTAP 8.0.x Cluster-Mode Only

   LOADER-A> autoboot
   Loading x86_64/freebsd/image2/kernel:....0x100000/3386664 0x53b000/3222096 0x84da50/1190096
   .....
   NetApp Data ONTAP 8.0.1 Cluster-Mode
   Copyright (C) 1992-2010 NetApp.
   All rights reserved.
   *******************************
   *                             *
   * Press Ctrl-C for Boot Menu. *
   *                             *
   *******************************
   ^CBoot Menu will be available.

   How would you like to continue booting?

   (normal)             Normally
   (install)            Install new software first
   (password [<user>])  Change user password
   (setup)              Run setup first
   (init)               Initialize disks and create flexvol
   (maint)              Boot into maintenance mode
   (syncflash)          Update flash from backup config
   (reboot)             Reboot node
   Please make a selection: maint
   .....
   .....
   In a High Availablity configuration, you MUST ensure that the
   partner node is (and remains) down, or that takeover is manually
   disabled on the partner node, because High Availability
   software is not started or fully enabled in Maintenance mode.

   FAILURE TO DO SO CAN RESULT IN YOUR FILESYSTEMS BEING DESTROYED

   NOTE: It is okay to use 'show/status' sub-commands such as
   'disk show or aggr status' in Maintenance mode while the partner is up
   .....
   *>

   Step 5: Enter: autoboot
   Step 5a): Wait for the "Press Ctrl-C for Boot Menu" message, then hit ^C (CTRL-C).
   Step 5b): Enter: maint
   The *> prompt is the maintenance mode console prompt.

Page 5 of 21
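Which Boot Menu entry to type in step 5b depends only on the ONTAP version and mode from the dispatch. A minimal Python sketch of that rule follows; the function name is hypothetical, and it assumes the version is given as a dotted string such as "8.0.1".

```python
def maintenance_selection(ontap_version: str, cluster_mode: bool) -> str:
    """Return the Boot Menu entry that starts Maintenance mode (illustrative)."""
    if cluster_mode and ontap_version.startswith("8.0"):
        # ONTAP 8.0.x Cluster-Mode uses the keyword menu
        return "maint"
    # ONTAP 7.x, 8.0.x 7-Mode and 8.1 (7-Mode or C-Mode) use the numbered menu
    return "5"


if __name__ == "__main__":
    print(maintenance_selection("8.0.1", cluster_mode=True))
    print(maintenance_selection("8.1", cluster_mode=True))
```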

Example Only

   *> fcadmin config
   Local Adapter  Type    State       Status
   ---------------------------------------------------
   0a             target  CONFIGURED  offline
   0b             target  CONFIGURED  offline

STEP 6): Enter: fcadmin config
STEP 6a): Log all the adapters listed as "target" adapters. In our example, adapters 0a and 0b are targets.
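If the console capture is saved to a text file, the "target" adapters from step 6a can also be pulled out mechanically. The Python sketch below is an illustration that assumes whitespace-separated columns as in the example above; the function name is hypothetical.

```python
from typing import List


def target_adapters(fcadmin_output: str) -> List[str]:
    """List adapters whose Type column reads 'target' in 'fcadmin config' output."""
    targets = []
    for line in fcadmin_output.splitlines():
        fields = line.split()
        # Data rows look like: <adapter> <type> <state> <status>
        if len(fields) >= 2 and fields[1].lower() == "target":
            targets.append(fields[0])
    return targets


if __name__ == "__main__":
    captured = """\
Local Adapter  Type    State       Status
---------------------------------------------------
0a             target  CONFIGURED  offline
0b             target  CONFIGURED  offline
"""
    print(target_adapters(captured))
```

Keeping this list makes the later FC adapter verification a direct comparison instead of a re-read of the log.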

Page 7: MB Replace

IV. FAS2040: Capture the Current System Configuration (cont.)

Step  Action Description

8  Enter: disk_list to capture disk models.

9  Enter: storage show disk -p to capture multipathing information.

10  Next, from the *> prompt enter: disk show -v to view which SAS and FC Adapter ports are driving disks. See Text Box STEP 10.

NOTE  The "disk show -v" sample output below is abbreviated console output. If DOT 8.x, a HOME column is also listed with OWNER for each disk, which displays the node's system name (and system-ID). After the controller is replaced, it is necessary to confirm that each SAS/FC Adapter port is seeing its storage.

11  Take note of all the "unique" Adapter port numbers displayed. See Text Box STEP 11. In this example, SAS Adapters 0c, 0d and FC Adapters 0a, 0b are displayed.

12  At the *> prompt enter: halt (after PROM initialization the console will display the LOADER-A|B> prompt)

13  Go to Section V, "Remove the cables and extract the PCM" on next page.

Example Only

   *> disk show -v
   Local System ID: 122217803

   DISK      OWNER               POOL   SERIAL NUMBER         HOME
   --------  ------------------  -----  --------------------  ------------------
   0c.00.0   tsst-2 (142217816)  Pool0  3LM17RW900009750Q6SF  tsst-2 (142217816)
   0c.00.1   tsst-2 (142217816)  Pool0  3LM1623E00009750Q7YT  tsst-2 (142217816)
   0c.00.10  tsst-2 (142217816)  Pool0  3LM18TSE00009751QMQU  tsst-2 (142217816)
   0c.00.4   tsst-2 (142217816)  Pool0  3LM19W4J00009751QPD1  tsst-2 (142217816)
   0c.00.6   tsst-2 (142217816)  Pool0  3LM185P700009750FB2K  tsst-2 (142217816)
   0c.00.9   tsst-2 (142217816)  Pool0  3LM194HA000097510H4Q  tsst-2 (142217816)
   0c.00.7   tsst-2 (142217816)  Pool0  3LM1BQTG00009752XK3D  tsst-2 (142217816)
   0c.00.5   tsst-2 (142217816)  Pool0  3LM1BQLG00009801JB5M  tsst-2 (142217816)
   .....
   .....
   0a.41     tsst-2 (142217816)  Pool0  JLVT29GC              tsst-2 (142217816)
   0a.43     tsst-2 (142217816)  Pool0  JLVT7BUC              tsst-2 (142217816)
   0a.33     tsst-2 (142217816)  Pool0  JLVS4EHC              tsst-2 (142217816)
   .....
   .....
   0b.21     tsst-1 (122217803)  Pool0  JLVT0KDC              tsst-1 (122217803)
   0b.18     tsst-1 (122217803)  Pool0  JLVT2HZC              tsst-1 (122217803)
   0b.28     tsst-1 (122217803)  Pool0  JLVS585C              tsst-1 (122217803)
   ....
   ....
   0d.01.11  tsst-1 (122217803)  Pool0  9QJ75555              tsst-1 (122217803)
   0d.01.0   tsst-1 (122217803)  Pool0  9QJ756DN              tsst-1 (122217803)
   0d.01.3   tsst-1 (122217803)  Pool0  9QJ758RZ              tsst-1 (122217803)
   0d.01.7   tsst-1 (122217803)  Pool0  9QJ754ST              tsst-1 (122217803)
   0d.01.6   tsst-1 (122217803)  Pool0  9QJ75925              tsst-1 (122217803)
   0d.01.10  tsst-1 (122217803)  Pool0  9QJ74TQG              tsst-1 (122217803)
   0d.01.9   tsst-1 (122217803)  Pool0  9QJ758NQ              tsst-1 (122217803)
   0d.01.5   tsst-1 (122217803)  Pool0  9QJ74VNZ              tsst-1 (122217803)
   .....
   .....
   *>
   *> halt

A typical listing will display many more disks and FC/SAS adapters than this partial listing.

STEP 10: The disk show -v command prints out the System ID of the Local System (122217803). It also prints the owner of each disk under the HOME heading, which lists the node's system name. This system's name is tsst-1 and it owns disks 0b.21, 0b.18, 0b.28, 0d.01.11, 0d.01.0, etc.

NOTE: Partner-owned disks are intermixed in the output. The partner hostname is 'tsst-2' and its System ID is (142217816).

STEP 11: Under the DISK heading, all SAS & FC Adapters are listed. In this example, SAS adapters 0c and 0d and FC adapters 0a and 0b are seen, but typically there are more. After the controller is replaced, confirm that the same adapters are listed, meaning there is an active SAS/FC path to the disks.

Step 12: Enter halt to exit to the LOADER-A|B> prompt.

Page 6 of 21
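The "unique adapter" bookkeeping in steps 10 and 11 can be sketched in Python for a saved console capture. This is an illustration under the assumption that each disk row starts with the disk name followed by "owner (system-ID)", as in the sample output; the function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Disk row: "<adapter>.<...>  <owner> (<system-id>) ..."
ROW = re.compile(r"^(\S+)\s+(\S+)\s+\((\d+)\)")


def summarize_disk_show(output: str) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (sorted unique adapters, disks grouped by owner name)."""
    adapters = set()
    by_owner = defaultdict(list)
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # header, separator, "....." or prompt lines
        disk, owner, _system_id = match.groups()
        adapters.add(disk.split(".")[0])  # "0c.00.10" -> "0c"
        by_owner[owner].append(disk)
    return sorted(adapters), dict(by_owner)


def missing_adapters(before: List[str], after: List[str]) -> List[str]:
    """Adapters seen before the replacement but not after it."""
    return sorted(set(before) - set(after))
```

Running `summarize_disk_show` on the capture taken before the swap and again on the replacement controller, then passing both adapter lists to `missing_adapters`, gives a quick check that every SAS/FC path is still seeing its storage.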

Page 8: MB Replace

V. FAS2040: Remove the cables and extract the PCM

Step  Action Description

NOTE  If TWO PCMs are installed, DO NOT shut off the power supplies to replace the controller card, BUT DO shut off both power supplies if only ONE PCM is installed.

1  On the node to be serviced, loosen the orange thumbscrew, ref Fig 3. Pull down on the cam lever and slide the PCM towards you halfway out of the chassis.

   STOP! and READ this CAUTION

   HA (Active-Active) Configuration*: If the NVMEM Status LED starts flashing (ref Page-1, Fig 3) when the PCM is extracted from the chassis:
   (i) Confirm from the end-user or NGS that the partner controller had a clean takeover. If so, or if this controller was "waiting for giveback", the flashing LED can be ignored.
   (ii) If the takeover was not successful, the flashing LED indicates uncommitted customer data - Contact NGS.

   Non-HA Configuration*: If the NVMEM Status LED is flashing, the system was not 'halted' properly:
   (i) Ask the end-user if the controller was properly "halted". If not, re-insert the controller and, if the system does not autoboot, enter: bye at the LOADER-A|B> prompt. If the system boots to the login prompt, log in and then enter: halt to properly shut down. Engage NGS if you have questions.

   * The node configuration should have been determined by following Section III.

2  Before proceeding further, resolve the state of the NVMEM LED, if applicable, by following the caution above.

3  Label each cable connector with its port number and then unplug the cabling from the connector.

4  Pull the cam handle downward and slide the controller module out of the system.

VI. FAS2040: Move the battery, SFPs - Exchange the CF Cards

Step  Action Description

1  Remove each SFP/GBIC installed in the Ethernet and FC ports of the original Controller Module, one at a time, and fully insert each one into the same port location in the replacement Module. (Do not mix them up!)

2  Remove the top cover on the old PCM to expose the NVMEM battery. Disconnect the cable and remove the battery.

STOP  Some replacement PCMs have the NVMEM battery pre-installed. If one is installed, remove it and place it in the defective PCM, as it is most often too discharged to complete the part replacement process.

3  Insert the original battery into the replacement PCM. If one exists in it, move it to the old PCM. Connect the battery cable - Fig 6.

4  Turn the PCM upside down to reveal the Compact Flash (CF) cover.

5  Slide the CF cover up, carefully slide the CF card from its connector and mark it with an "O" for original. Ref Fig 7. The NetApp label is on the top side.

6  Exchange the CF cards between the PCMs. The one marked "O" should now be in the replacement PCM.

7  Go to Section VII, "Partially Reinsert the Replacement PCM and Reconnect the cables" on next page.

[Fig 6: FAS2040 PCM. Battery and cable connector. Press the tab to release the connector. The battery is held in the module by Velcro tape. The connector is next to the heat sink, which may be hot; let it cool.]

[Fig 7: FAS2040 PCM Bottom View. Slide the CF cover to expose the CF card. Slide the CF card to disengage it from the connector.]

Page 7 of 21

Page 9: MB Replace

VII. FAS2040: Partially Reinsert the Replacement PCM and Reconnect the cables

Step  Action Description

1  Partially insert the PCM into the slot so that the cables can be attached - DO NOT engage the backplane yet.

2  Cables: Fully insert each cable that was removed into its proper port until it clicks in. Test by pulling on them. Especially the FC and SAS ports!

VIII. FAS2040: Set date and time on the RTC

Step  Action Description

1  Re-attach the laptop to the console port and capture the display output, even if using the end user's computer.

2  Fully insert the PCM into the slot, raise the cam lever and secure it with the orange thumbscrew.

3  IMMEDIATELY after the console message "Starting AUTOBOOT press Ctrl-C to abort…" is displayed, press the Ctrl-C (^C) key a couple of times to abort the autoboot. See the console output example below.

      Phoenix TrustedCore(tm) Server
      Copyright 1985-2006 Phoenix Technologies Ltd.
      .......
      .......
      Portions Copyright (C) 2002-2008 NetApp
      CPU Type: Intel(R) Xeon(R) CPU L5410 @ 2.33GHz
      Starting AUTOBOOT press Ctrl-C to abort...
      Loading x86_64/freebsd/image1/kernel:....0x100000/3386728 0x53b000/3222096 0x84da50/1190096
      Autoboot of PRIMARY image aborted by user.
      LOADER-A>

   STEP 3: Press "CTRL-C" at the "Starting AUTOBOOT" message.
   "....." = Deleted lines to save space. The prompt example is from the top controller.

4  IF you miss the window to abort the autoboot, look for this message: "Press CTRL-C for boot menu" and complete steps 4a-4c; otherwise, if at the "LOADER" prompt, skip to step 5.
   a. Immediately press ^C (CTRL-C) to access the "Boot menu".
   b. If the 'System ID mismatch' warning message below is displayed, answer: y
   c. Next, drop to the LOADER prompt from the Boot Menu by following the linked process > here

      .......
      .......
      *******************************
      *                             *
      * Press Ctrl-C for Boot Menu. *
      *                             *
      *******************************
      ^C
      Boot Menu will be available.
      Restoring /var from /cfcard/x86/freebsd/varfs.tgz
      WARNING: System id mismatch. This usually occurs when replacing CF or NVRAM cards!
      Override system id? {y|n} [n] y

   STEP 4a: Press "CTRL-C"
   STEP 4b: Enter: y

5  Continue with Section VIII on next page.

Page 8 of 21

Page 10: MB Replace

VIII. FAS2040: Set date and time on the RTC (cont.)

Step  Action Description

6  At the LOADER-A|B> prompt enter: show date to display the date and time in GMT on the new PCM.

      LOADER-A> show date
      Current date & time is: 10/14/2010 16:36:50
      LOADER-A>

   Time is displayed in 24hr mode.

7  The original motherboard's GMT time and local time should have been recorded in Section IV. If you don't have them, you can obtain the GMT time from the partner node, another NetApp appliance or any Unix server using: date -u (The "-u" option displays the time in GMT/UTC.) The new motherboard's Real Time Clock (RTC) must be set within 2 minutes of the time displayed (which is GMT time) for users to be able to re-connect to this appliance.

NOTE  Detailed instructions for another method of obtaining the time in GMT and setting the date and time are here> RTC Check

8  To set the time, issue: set time hh:mm:ss
   Set the time in GMT using 24-hour format - Do not set the time to local time.

NOTE  If this maintenance period spans the midnight hour in GMT time, the DATE will also need to be set.

9  To change the date, issue: set date mm/dd/yyyy (mm = 2-digit month, dd = 2-digit day, yyyy = 4-digit year)

10  If the date or time was changed, issue: show date again to verify that the GMT date and time are correct.

11  Go to Section IX, "Verify Battery Status" on next page.

Page 9 of 21
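The 2-minute tolerance on the RTC can be checked with a small comparison once both times are in hand. The Python sketch below is illustrative only; the function name is hypothetical, and the reference time is assumed to come from a trusted GMT source such as the partner node or 'date -u' on a Unix server.

```python
from datetime import datetime, timedelta


def rtc_within_tolerance(rtc_gmt: datetime, reference_gmt: datetime,
                         tolerance: timedelta = timedelta(minutes=2)) -> bool:
    """True if the RTC reading is within the tolerance of the reference GMT time."""
    return abs(rtc_gmt - reference_gmt) <= tolerance


if __name__ == "__main__":
    rtc = datetime(2010, 10, 14, 16, 36, 50)        # from 'show date'
    reference = datetime(2010, 10, 14, 16, 38, 0)   # hypothetical 'date -u' reading
    print("RTC OK:", rtc_within_tolerance(rtc, reference))
```

Comparing full datetimes rather than clock times alone also catches the case where the maintenance window crosses midnight GMT and the date needs to be set as well.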

Page 11: MB Replace

IX. FAS2040: Verify Battery Status

Step  Action Description

1  At the LOADER-A|B> prompt, enter: ^G (Ctrl-G) to enter the BMC shell. The prompt changes to: "bmc shell ->".

      LOADER-A> ^G (CTRL-G)
      === OEMCLP v1.0.0 BMC v1.5 ===
      bmc shell ->

   STEP 1: Enter: (CTRL-G) to enter the BMC shell.

2  Check the NVMEM battery. At the bmc shell -> prompt:
   a) Enter: priv set advanced to change the prompt to "bmc shell * ->". (Has asterisk)
   b) Enter: battery show to display the NVMEM battery status.
   c) Confirm "status" indicates "ready", "charging" or "full".
   d) Enter: exit to return to the "LOADER-A|B>" prompt.

      bmc shell -> priv set advanced
      Warning: These advanced commands are potentially dangerous; use them
      only when directed to do so by Network Appliance personnel.
      bmc shell*->
      bmc shell*-> battery show
      chemistry        :LION
      device-name      :bq20z80
      expected-load-mw :81
      id               :27100010
      manufacturer     :AVT
      manufacture-date :4/9/2007
      rev_cell         :2
      rev_firmware     :200
      rev_hardware     :F0
      serial           :03dc
      status           :charging
      test-capacity    :disabled
      bmc shell*->
      bmc shell*-> exit
      Press ^G to enter BMC command shell
      LOADER-A>

   STEP 2a): Set advanced shell.
   STEP 2b): Display the battery specs. The text above displays if the battery is "detected". If nothing is displayed, make sure the battery connector is fully seated. If there is still no output, call NGS; otherwise continue.
   STEP 2c): "status" must indicate "ready", "charging" or "full".
   STEP 2d): Enter: exit to return to the "LOADER-A|B>" prompt.

3  Go to Section X, "Run Diagnostics" on next page.

Page 10 of 21
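The pass/fail rule in step 2c can be expressed as a short check over the captured 'battery show' text. The Python sketch below assumes one "name :value" pair per line, as in the console capture; the helper names are hypothetical.

```python
from typing import Optional

ACCEPTABLE_STATES = {"ready", "charging", "full"}


def battery_status(battery_show_output: str) -> Optional[str]:
    """Return the value of the 'status' field, or None if no field is found."""
    for line in battery_show_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "status":
            return value.strip()
    return None  # no output: battery not detected or connector not seated


def battery_ok(battery_show_output: str) -> bool:
    """True if the reported status is one of the acceptable states in step 2c."""
    return battery_status(battery_show_output) in ACCEPTABLE_STATES
```

A `None` result corresponds to the "nothing is displayed" case, where the battery connector should be reseated before calling NGS.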

Page 12: MB Replace

X.Step

1 Test the Replacement Tray with diagnostics by entering boot_diags at the "LOADER-A|B>" prompt.2

3 If asked "OK to run NVMEM diagnostic (yes/no)?" Answer: yes

4 Continue with Section X on next page.


FAS2040: Run Diagnostics (20-30 minutes) Action Description

In the Diagnostic Menu, enter: run mb mem cf-card (These diagnostics are basic confidence tests of the new motherboard, memory, and CompactFlash.)

LOADER-A> boot_diags
Loading X86_ELF/diag/diag.krn:..0x200000/12629600 0xe0b660/4226832 0x1213570/8 Entry at 0x00200000
Starting program at 0x00200000
Copyright (c) 1992-2009 NetApp.
hat_fill_pd_pae: page already VALID addr 0xd8000000
Diagnostic Monitor
version: 5.4.3
built: Tue Oct 6 15:00:13 PDT 2009
--------------------------------------
all      Run all system diagnostics
mb       FAS2040 motherboard diagnostic
mem      Main memory diagnostic
cf-card  CompactFlash controller diagnostic
stress   System wide stress diagnostic
Commands:
Config (print a list of configured PCI devices)
Default (restore all options to default settings)
Exit (exit diagnostics)
Help (print this commands list)
Options (print current option settings)
Version (print the diagnostic version)
Run <diag ... diag> (run selected diagnostics)
Options:
Count <number> (loop selected diagnostic(s) (number) of passes)
Loop <yes|no> (loop selected diagnostic(s))
Status <yes|no> (print status messages)
Stop <yes|no> (stop-on-error / keep running)
Xtnd <yes|no> (extended tests / regular tests)
Mchk <auto|off|on|halt> (machine check control)
Cpu <0|1> (default cpu)
Seed <number> (random seed (0:use machine generated number))
Enter Diag, Command or Option:
Bad input! At prompt type help to see commands menu.
Enter Diag, Command or Option: run mb mem cf-card
WARNING! Do not run the NVMEM diagnostic immediately after a system crash or if there is a possibility that log data is stored. Run only on new boards, or after a normal system shutdown, or if there is no chance of preserving customer data.
OK to run NVMEM diagnostic (yes/no)? yes

NOTE: New RUN Command options


STEP 3: Enter: yes

STEP 2: Enter: run mb mem cf-card

STEP 1: Enter: boot_diags

Page 13: MB Replace

X.Step

5

NOTE: Text box information below.

6 Continue with Section X on next page.

FAS2040: Run Diagnostics (cont.) Action Description

The test output below includes only the test suite summary lines. Check that all of them show PASSED. If any shows FAILED, scroll back through your test output to see which test FAILED and call NGS to report the test failure. Read all the

FAS2040 Motherboard Diagnostic
------------------------------
Performing comprehensive motherboard diagnostic
.....
Performing comprehensive GBE test on e0d
.....
****** Comprehensive GBE test ................... PASSED
Performing comprehensive GBE test on e0c
.....
****** Comprehensive GBE test ................... PASSED
Performing comprehensive GBE test on e0b
.....
****** Comprehensive GBE test ................... PASSED
Performing comprehensive GBE test on e0a
.....
****** Comprehensive GBE test ................... PASSED
Performing comprehensive BGE test on e0P
.....
****** Comprehensive BGE test ................... PASSED
Testing FCAL card on channel 0a
Performing comprehensive FCAL test on channel 0a
.....
****** Comprehensive FCAL test .................. PASSED
Testing FCAL card on channel 0b
Performing comprehensive FCAL test on channel 0b
.....
****** Comprehensive FCAL test .................. PASSED
ONBOARD SAS present: Slot 0 58 Dual Channels [Lsi Rev 0x8]
Testing SAS card on channel 0c
Performing comprehensive SAS test on channel 0c
.....
****** Comprehensive SAS test ................... PASSED
Testing SAS card on channel 0d
.....
****** Comprehensive SAS test ................... PASSED
Internal loopback test ...................... PASSED
Link test(xtnd only) ........................ SKIPPED
****** Comprehensive IB test .................... PASSED
Performing comprehensive BMC test
.....
****** Comprehensive BMC Test ................... PASSED
.....
Environmental check, subsystem: SES ......... PASSED
****** Comprehensive mb test .................... PASSED

Confirm: The GbE test passes for all 4 onboard Ethernet ports (e0a-e0d).

Confirm: Both FCAL tests pass for ports 0a and 0b, and both onboard SAS tests pass for ports 0c and 0d.

Confirm: The Comprehensive mb test "PASSED"


Confirm: The BMC and SES tests passed

If an FC test fails, remove the cable from that port (if attached) and rerun the "mb" test only.

DIAGNOSTIC RESULTS CONFIRMATION CHECKS Verify all tests state: PASSED or SKIPPED. No test on the next 2 pages should indicate FAILED. If one does, STOP and call NGS! Note: See the text box on the FC test below.
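When the diagnostic console log has been saved to a file, the PASSED / SKIPPED / FAILED check can be approximated with a short script. The sketch below is illustrative only: it assumes each result line ends with a run of dots followed by the result keyword, as in the sample output, and the FAILED line in the example is invented purely to exercise the check.

```python
import re

# A result line ends with a run of dots followed by the result keyword.
RESULT = re.compile(r"\.{3,}\s*(PASSED|SKIPPED|FAILED)\b")


def summarize(log_text):
    """Count result keywords and collect every line reported as FAILED."""
    counts = {"PASSED": 0, "SKIPPED": 0, "FAILED": 0}
    failed_lines = []
    for line in log_text.splitlines():
        match = RESULT.search(line)
        if match is None:
            continue  # progress lines such as "Performing ... ....." carry no result
        counts[match.group(1)] += 1
        if match.group(1) == "FAILED":
            failed_lines.append(line.strip())
    return counts, failed_lines


example_log = """Performing comprehensive GBE test on e0d
.....
****** Comprehensive GBE test ................... PASSED
Link test(xtnd only) ........................ SKIPPED
****** Comprehensive FCAL test .................. FAILED
"""

counts, failed = summarize(example_log)
print(counts)
print(failed)
```

Any entry in the failed list means the procedure stops and NGS is called; a script like this only helps locate the line in a long log and does not replace reading the output.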

Page 14: MB Replace

X.Step

7 In text box 'STEP 7' below, verify all the memory was discovered: FAS2040 ~ 4GB

8

9 Go to Section XI, "Verify FC Adapter Configuration" on next page.


FAS2040: Run Diagnostics (cont.) Action Description

If all tests show PASSED or SKIPPED, enter: exit to exit the main diagnostic menu. If any test is listed as FAILED, report the failure to NGS.

Testing : 3188 MB (start=10c00000, end=d8000000)
Total Memory Size : 3456 MB
Main Memory Diagnostic
----------------------
Performing comprehensive main memory test
.....
****** Comprehensive Memory test ................ PASSED
CompactFlash Diagnostic
------------------------
.....
****** Comprehensive CompactFlash test .......... PASSED
Pass = 1, Current date = Saturday Jul 15 09:46:39 2011
--- Completed pass 1.
Enter Diag, Command or Option: exit
AMI BIOS8 Modular BIOS
Copyright (C) 1985-2009, American Megatrends, Inc. All Rights Reserved
.....
CPU Type: Intel(R) Xeon(R) CPU @ 1.66GHz
LOADER-A>

Note: The Comprehensive Memory test and the Comprehensive CompactFlash test both show "PASSED".

Note: Test Suite Complete message

STEP 8: Enter: exit to exit the Diags. The system will display many messages. After about 10-20 seconds, it will drop to the LOADER-A|B> prompt.

STEP 7: Note: PLEASE CONFIRM FAS2040 should total ~4GB

Page 15: MB Replace

XI.Step

1 Boot into maintenance mode by following the steps here.

a) If a 'System ID mismatch' warning message is displayed due to the new Controller Module, answer: y

NOTE

Under NO CIRCUMSTANCES bypass the system halt to "boot" the system on an NVMEM battery voltage issue.

2

a)

NOTE

b) Enter: fcadmin config to confirm the changed FC Adapters are displaying as PENDING: (target) ports.

3 If any FC cables were disconnected from adapters '0a' or '0b' due to a boot issue, firmly reconnect them now. They must click in.

4 Go to Section XII, "Capture new System-ID on replacement Controller" on next page.

STOP: If the system reports the battery voltage is too low or a critical failure, do NOT proceed. Do NOT bypass the system stop. Engage NGS for assistance.

FAS2040: Verify FC Adapter Configuration Action Description

If the replacement PCM fails to boot to the Maintenance menu, confirm the original Boot Device (CF Card) was moved from the original MB to the replacement. Engage NGS for assistance.

Review the fcadmin config output from Section IV. If any onboard Adapters (0a, 0b) were configured as "target" verify they are still configured by entering: fcadmin config If one or more adapters need to be set as a "target" follow steps 2a-2b. If all are OK, skip to step 3.

For each Adapter to be configured as a target enter: fcadmin config -t target <HA> Issue one command per adapter. This example configures Adapter ports '0a' and '0b' as targets:

If an adapter that needs to be changed to a target is listed as "online", it must be taken offline before it can be changed. Issue: fcadmin offline <HA>

......
......
WARNING: System id mismatch. This usually occurs when replacing CF or NVRAM cards!
Override system id? {y|n} [n] y
.....

STEP 1a): Enter: y

nvram: Need to update primary image on flash from version 49 to 2
nvram: Need to update secondary image on flash from version 49 to 2
Updating nvram firmware, battery is off.
The system will automatically reboot when the update is complete.

NOTE: If the NVRAM FW is down rev, an auto-update will start and the controller will reboot.

*> fcadmin config
Local Adapter Type State Status
---------------------------------------------------
0a initiator CONFIGURED offline
0b initiator CONFIGURED offline
*> fcadmin config -t target 0a
Tue Oct 28 07:19:05 GMT [fci.config.state:info]: Fibre channel initiator adapter 0a is in the PENDING (target) state.
A reboot is required for the new adapter configuration to take effect.
*> fcadmin config -t target 0b
...
Fibre channel initiator adapter 0b is in the PENDING (target) state.
A reboot is required for the new adapter configuration to take effect.
*> fcadmin config
Local Adapter Type State Status
---------------------------------------------------
0a initiator PENDING (target) offline
0b initiator PENDING (target) offline

Example Only STEP 2: List the FC Adapter configuration

STEP 2a: Examples to configure a port to be a target.

STEP 2b: Enter: fcadmin config to confirm each target port is shown as PENDING
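The confirmation in step 2b amounts to reading the State column for each adapter that should become a target. A minimal sketch of that check against a saved capture follows. It assumes one adapter per row in the column order shown in the example (Adapter, Type, State, Status), and the helper names are hypothetical.

```python
import re

# Adapter, Type, State (may contain a space, e.g. "PENDING (target)"), Status.
ROW = re.compile(r"^\s*(\d[a-z])\s+(\S+)\s+(.+?)\s+(\S+)\s*$")


def adapter_states(config_text):
    """Map each adapter name (e.g. '0a') to its State column."""
    states = {}
    for line in config_text.splitlines():
        match = ROW.match(line)
        if match:
            states[match.group(1)] = match.group(3)
    return states


def not_pending_target(config_text, wanted):
    """Return the wanted adapters that are not yet in the PENDING (target) state."""
    states = adapter_states(config_text)
    return sorted(a for a in wanted if states.get(a) != "PENDING (target)")


before = """Local Adapter Type State Status
---------------------------------------------------
0a initiator CONFIGURED offline
0b initiator CONFIGURED offline
"""
after = """Local Adapter Type State Status
---------------------------------------------------
0a initiator PENDING (target) offline
0b initiator PENDING (target) offline
"""

print(not_pending_target(before, ["0a", "0b"]))
print(not_pending_target(after, ["0a", "0b"]))
```

An empty list means every listed adapter shows PENDING (target). The list of adapters to check comes from the fcadmin config output captured in Section IV.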

Page 16: MB Replace

XII.Step

NOTE

123

4

5 Go to Section XIII, "Disk Reassign" on next page.

FAS2040: Capture new System-ID on replacement Controller Action Description

FAS2040 systems have NVMEM integrated into the controller, so when the controller is replaced, the disks need to be reassigned to the new System-ID.

Compare the new system ID to the old system ID. The old system-ID is on the disk show -v output that was captured in Section IV. For DOT 8, always use the HOME column, not OWNER.

Enter: disk_list to force some disk I/O for the primary and secondary path check in the next step.

Enter: storage show disk -p to confirm all adapters list a PRIMARY and SECONDARY path. No? Re-check cable seating.

Enter: disk show -v This shows disk ownership by system-ID. Confirm all disks are listed as originally captured. NOTE: The primary and secondary path, to the SAS and FC Adapter under the Disk heading, can reverse.

*> disk show -v
Local System ID: 1743755272
DISK      OWNER               POOL   SERIAL NUMBER         HOME
--------  ------------------  -----  --------------------  ------------------
0c.00.0   tsst-2 (142217816)  Pool0  3LM17RW900009750Q6SF  tsst-2 (142217816)
0c.00.1   tsst-2 (142217816)  Pool0  3LM1623E00009750Q7YT  tsst-2 (142217816)
.....
.....
0a.41     tsst-2 (142217816)  Pool0  JLVT29GC              tsst-2 (142217816)
0a.43     tsst-2 (142217816)  Pool0  JLVT7BUC              tsst-2 (142217816)
.....
0b.21     tsst-1 (122217803)  Pool0  JLVT0KDC              tsst-1 (122217803)
0b.18     tsst-1 (122217803)  Pool0  JLVT2HZC              tsst-1 (122217803)
.....
.....
0d.01.6   tsst-1 (122217803)  Pool0  9QJ75925              tsst-1 (122217803)
0d.01.10  tsst-1 (122217803)  Pool0  9QJ74TQG              tsst-1 (122217803)
0d.01.9   tsst-1 (122217803)  Pool0  9QJ758NQ              tsst-1 (122217803)
0d.01.5   tsst-1 (122217803)  Pool0  9QJ74VNZ              tsst-1 (122217803)

In this example, the local System ID for the new Controller is 1743755272. The old MB System ID was 122217803 (disk show -v from Section IV). The disks need to be reassigned to the local System ID. Example Only
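Pulling the two system IDs out of a console log by hand is error-prone, so the comparison above can be sketched as a small helper. This is an illustration only, under stated assumptions: the capture has one disk per line, the last parenthesized number on a row is the HOME system ID, and the function names are hypothetical. The old system ID still has to come from the Section IV capture.

```python
import re


def local_system_id(disk_show_text):
    """Return the 'Local System ID' value from `disk show -v` output, or None."""
    match = re.search(r"Local System ID:\s*(\d+)", disk_show_text)
    return match.group(1) if match else None


def disks_with_home(disk_show_text, system_id):
    """List disks whose last parenthesized ID (the HOME column) equals system_id."""
    disks = []
    for line in disk_show_text.splitlines():
        ids = re.findall(r"\((\d+)\)", line)
        if ids and ids[-1] == system_id:
            disks.append(line.split()[0])
    return disks


def reassign_command(old_id, new_id):
    """Format the disk reassign command after basic sanity checks."""
    if not (old_id.isdigit() and new_id.isdigit()):
        raise ValueError("system IDs must be numeric")
    if old_id == new_id:
        raise ValueError("old and new system IDs are identical")
    return f"disk reassign -s {old_id} -d {new_id}"


capture = """Local System ID: 1743755272
0c.00.0 tsst-2 (142217816) Pool0 3LM17RW900009750Q6SF tsst-2 (142217816)
0b.21 tsst-1 (122217803) Pool0 JLVT0KDC tsst-1 (122217803)
0d.01.6 tsst-1 (122217803) Pool0 9QJ75925 tsst-1 (122217803)
"""

old_id = "122217803"  # taken from the Section IV capture in this example
new_id = local_system_id(capture)
print(disks_with_home(capture, old_id))
print(reassign_command(old_id, new_id))
```

The printed command is only a string to compare against what is typed at the console; the script does not run anything on the controller.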


Page 17: MB Replace

XIII.Step

1

NOTE The disk reassignment process takes several seconds and a message is printed for each disk that is reassigned.

A.

A1

A2

A3 Enter: priv set advanced at the prompt for the following command to work. The prompt will include " * ".

A4

second takeover/giveback from the "target" (repaired) node must be executed later in this AP. Ref TSB-1110-04

to cancel the 'disk reassignment' and follow the steps >> here. If re-dispatched for the disk reassign, start at the beginning of this Procedure-A.

A5

A6

A7 IF Cluster-Mode, enter: exit to return to clustershell.

A8 Continue with step 2 on next page.

STOP If the console messages stated the giveback must be completed immediately, do not enter any other commands on the partner node until "after" the disk ownership on the down node is verified and the giveback is completed.

STOP

IF the following highlighted console message is displayed: 1) The giveback cannot be postponed and 2) A

Engage the customer and ask: “Are there any Windows applications running that would inhibit a ‘cf giveback’ at this time? (open CIFS sessions)" If the customer states the giveback cannot be performed now, answer n

If the giveback can be performed now, enter: y and continue. The next console message confirms the disk ownership update to the new system-ID. Enter: y to the question.

At the console prompt enter: disk reassign -s <old_system_ID> -d <new_system_ID> Cut and paste the old and new System IDs from the console Log.

FAS2040: Disk Reassign Action Description

Follow procedure-A if the node was successfully taken over by its partner.

Follow procedure-B on next page if this node is a single controller configuration or the partner did NOT takeover.

Execute the "A" steps on the partner node.

Log in to the PARTNER node (7-Mode=root, C-Mode=admin). Engage end-user for password. If this is a C-Mode system, enter: run local to enter the nodeshell.

NOTE The partner console prompt must have the word "(takeover)" in it. If not, verify with end-user or NGS that the takeover did NOT occur. If it did not, use Method B.

On the partner node, enter: partner aggr status -f to make sure there are no failed disks in the system, as they will need to be replaced before the disk reassignment and giveback. Inform the customer to open a support case if there are "failed" disks.

Disk ownership will be updated on all disks previously belonging to Filer with sysid 122217803.
Would you like to continue (y/n)? y
Enter: y

Command Example Only: partner(takeover)*> disk reassign -s 122217803 -d 1743755272

A console message will be displayed for each disk changing ownership (System ID)

disk reassign: A giveback must be done immediately following a reassign of partner disks. After the partner node becomes operational, do a takeover and giveback of this node to complete the disk reassign process. Do you want to continue (y/n)?


DOT 7 and DOT 8 7-Mode partner(takeover)>

Cluster-Mode cluster::> run local partner(takeover)>

Cluster-Mode partner(takeover)> exit logout cluster::>

Page 18: MB Replace

XIII.Step Action Description

B.

B1

B2

B3 Continue with step 2.

2

3 At the maintenance mode prompt: " * >", enter: halt to exit to LOADER-A|B.

4 Go to Section XIV, "Boot PROM Variable Checks" on next page.

From the console port on the "target" controller on which you replaced the MB (in maintenance mode): Enter: disk show -v to display the disks reassigned to the new System ID.

STOP! BEFORE the "giveback" is executed, you must verify that the system-id listed for this node's disks under "HOME" (use OWNER if the HOME column does not exist) and the new "Local System ID" are the same. If not, confirm the correct system-ids were entered on the 'disk reassign' command. If there are problems, do NOT proceed; call NGS for assistance.

FAS2040: Disk Reassign (cont.)

Single Controller configuration or the partner did NOT takeover. Execute the "B" steps from Maintenance mode on the replacement Controller.

At the maintenance mode " * > " prompt enter: disk reassign -s <old_system_ID> -d <new_system_ID> Cut and paste the old and new System IDs. Enter: y to the question "Would you like to continue (y/n)?"

*> disk show -v
Local System ID: 1743755272
DISK      OWNER               POOL   SERIAL NUMBER         HOME
--------  ------------------  -----  --------------------  ------------------
0c.00.0   tsst-2 (142217816)  Pool0  3LM17RW900009750Q6SF  tsst-2 (142217816)
0c.00.1   tsst-2 (142217816)  Pool0  3LM1623E00009750Q7YT  tsst-2 (142217816)
.....
.....
0a.41     tsst-2 (142217816)  Pool0  JLVT29GC              tsst-2 (142217816)
0a.43     tsst-2 (142217816)  Pool0  JLVT7BUC              tsst-2 (142217816)
.....
0b.21     ------------------  Pool0  JLVT0KDC              (1743755272)
0b.18     ------------------  Pool0  JLVT2HZC              (1743755272)
.....
.....
0d.01.6   ------------------  Pool0  9QJ75925              (1743755272)
0d.01.10  ------------------  Pool0  9QJ74TQG              (1743755272)
0d.01.9   ------------------  Pool0  9QJ758NQ              (1743755272)
0d.01.5   ------------------  Pool0  9QJ74VNZ              (1743755272)

Example Only

The new local System ID for the Controller is 1743755272. The owner name (tsst-1) may or may not be shown. But those disks should reflect the new local System ID.
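The verification required before the giveback can be sketched as a check over a saved disk show -v capture: no row may still carry the old system ID, and at least one disk must now show the new Local System ID. This is a hedged illustration, not part of the AP. It assumes one disk per line with the last parenthesized number being HOME (or OWNER where HOME is absent), and the function name is hypothetical.

```python
import re


def check_reassign(disk_show_text, old_id):
    """Classify disks after a reassign: still on old_id, or moved to the local ID."""
    match = re.search(r"Local System ID:\s*(\d+)", disk_show_text)
    if match is None:
        raise ValueError("no 'Local System ID' line found in the capture")
    local_id = match.group(1)
    stale, moved = [], []
    for line in disk_show_text.splitlines():
        ids = re.findall(r"\((\d+)\)", line)
        if not ids:
            continue  # header, separator, or elided rows
        disk = line.split()[0]
        if old_id in ids:
            stale.append(disk)   # old system ID still present on this row
        elif ids[-1] == local_id:
            moved.append(disk)   # last ID on the row is the new local ID
    return {"local_id": local_id, "stale": stale, "moved": moved,
            "ok": not stale and bool(moved)}


after_reassign = """Local System ID: 1743755272
0c.00.0 tsst-2 (142217816) Pool0 3LM17RW900009750Q6SF tsst-2 (142217816)
0b.18 ------------------ Pool0 JLVT2HZC (1743755272)
0d.01.6 ------------------ Pool0 9QJ75925 (1743755272)
"""

result = check_reassign(after_reassign, "122217803")
print(result)
```

Partner disks (142217816 in the example) are ignored by design. If "ok" is false, the procedure's instruction stands: do not proceed, and call NGS.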

*> disk reassign -s 122217803 -d 1743755272
Disk ownership will be updated on all disks previously belonging to Filer with sysid 122217803.
Would you like to continue (y/n)? y
Enter: y

A console message will be displayed for each disk changing ownership (System ID)


Example Only

Page 19: MB Replace

XIV.Step Action Description

1

2 Go to Section XV, "Boot the Operating System - 'giveback' if applicable" on next page.


FAS2040: Boot PROM Variable Checks

IF the ONTAP version is earlier than 8.0.2 (ONTAP 8.1 and later are not affected), unset the variable bootarg.init.wipeclean. (copy-and-paste)

LOADER-A> unsetenv bootarg.init.wipeclean
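The version condition above is a plain numeric comparison on the release number, which is easy to get wrong when comparing strings by eye (for example "8.0.10" versus "8.0.2"). The sketch below shows one way to do it. It assumes the version string from the dispatch starts with dotted numbers, ignores any patch suffix such as "P3", and implements only the "earlier than 8.0.2" rule exactly as written. The function names are hypothetical.

```python
import re


def parse_version(version_text):
    """Turn a leading dotted version such as '8.0.1P3' into a 3-part integer tuple."""
    match = re.match(r"\s*(\d+)\.(\d+)(?:\.(\d+))?", version_text)
    if match is None:
        raise ValueError(f"unrecognized version: {version_text!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def needs_wipeclean_unset(version_text):
    """True when the release is earlier than 8.0.2, per the step above."""
    return parse_version(version_text) < (8, 0, 2)


for version in ["7.3.5.1", "8.0.1P3", "8.0.2", "8.1"]:
    print(version, needs_wipeclean_unset(version))
```

Release-candidate or patch suffixes are deliberately ignored here, so a borderline string such as an 8.0.2 pre-release should be checked with NGS rather than trusted to this comparison.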


Page 20: MB Replace

XV.Step Action Description

1 At the LOADER-A|B prompt, enter autoboot to boot ONTAP.

2

a) If the system booted up to a "login>" prompt, example below, go to step 3.

b)

3

4 Log in to the PARTNER node (7-Mode=root, Cluster-Mode=admin). Engage end-user for password.

5

6 7-Mode Only: Ask the customer if there are any heavy NDMP, SnapMirror or SnapVault processes running. If yes, they should be disabled due to bug 489060. The procedure to disable these processes is >> here.

7 Enter the proper controller giveback command(s) based on the mode running as follows:

8 Continue with Section XV on next page.

A system boot to the login prompt is the result of one of two possibilities: IF this system has a partner controller, but the partner did not takeover, go to the STOP under Step 10 on next page. IF the system is stand-alone (non-HA head), skip to Section XVI.

IF Cluster-Mode, enter: run local to switch to the nodeshell. Enter: cf status to confirm the repaired node is ready for a "giveback". May have to wait a couple minutes for the NVMEMs to synchronize.


FAS2040: Boot the Operating System - 'giveback' if applicable

After the console stops printing messages, press the <enter> key.

If the system booted up to a "Waiting for giveback>" prompt (press the <enter> key) , example below, the node was part of an HA configuration and was taken over by its partner. Go to step 4.

DOT 7 and DOT 8 7-Mode

partner(takeover)> cf giveback

Cluster-Mode (exit nodeshell first)
nodeshell> exit
cluster::> storage failover giveback -fromnode local

"partner" is the controller name for the partner head. "<xyz>" is the controller name for the remediated head.

partner(takeover)> cf status partner has taken over <xyz>. <xyz> is ready for giveback. ..... partner(takeover)>
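The readiness check on the cf status output is a wait-and-recheck loop: look for "ready for giveback" and, if it is absent, wait and look again. A minimal sketch follows. The fetch_status callable is a hypothetical stand-in for however the console text is obtained, and the attempt count and delay are assumed values, not figures from this AP.

```python
import time


def ready_for_giveback(status_text):
    """True when the cf status text reports the node as ready for giveback."""
    return "ready for giveback" in status_text.lower()


def wait_until_ready(fetch_status, attempts=5, delay_s=60, sleep=time.sleep):
    """Re-check the status up to `attempts` times, sleeping between checks."""
    for attempt in range(attempts):
        if ready_for_giveback(fetch_status()):
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False


# Demonstration with canned console text instead of a live console.
canned = iter([
    "partner has taken over xyz.",
    "partner has taken over xyz. xyz is ready for giveback.",
])
print(wait_until_ready(lambda: next(canned), attempts=3, delay_s=0,
                       sleep=lambda seconds: None))
```

Passing the sleep function as a parameter keeps the loop testable without real delays. If the loop returns False, the guidance elsewhere in this section applies: contact NGS rather than forcing the giveback.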

Loading X86_64/freebsd/image1/kernel:0x100000/3375736 0x538280/3221872
.....
.....
*******************************
* Press Ctrl-C for Boot Menu. *
*******************************
.....
.....
.....
login:

These are typical bootstrap console messages. If the node is operating in stand-alone mode, you should eventually get a "login" prompt when you hit <enter>.

.... Many typical system startup messages removed for clarity

Phoenix TrustedCore(tm) Server
.....
.....
*******************************
* Press Ctrl-C for Boot Menu. *
*******************************
Chelsio T3 RDMA Driver - version 0.1
.....
Waiting for giveback...(Press Ctrl-C to abort wait)

NOTE 2.1: If you see this message, this node is part of an HA configuration and the partner node took over for it. If the "controller giveback" fails due to partner "not ready", wait 5 minutes for the NVMEMs to synchronize. If the giveback fails due to "open CIFS sessions", failed disks or for any other reason, contact NGS.

"...." = Deleted lines to save space

Page 21: MB Replace

XV.Step Action Description

9

a)

Click > ONTAP 8 failover show to see examples of output. Issues? Call NGS.

10

STOP

a)

11 From the "repaired node", execute a takeover using the proper command below to sync the sys-IDs.

a)

b)

c)

(i)

Click > ONTAP 8 failover show to see examples of output. Issues? Call NGS.

12 Continue with Section XV on next page.

Wait! 90 seconds for 7-Mode or 2 minutes for Cluster-Mode, then check giveback status by entering the appropriate command below for the ONTAP mode in use. For 7-Mode, look for "Failover enabled"; for Cluster-Mode, follow step 9(a).

For ONTAP Cluster-Mode, storage failover show should not show any "partial" givebacks. If there are any, wait another 60 seconds and recheck. Some system loads may take several minutes to complete the giveback.

For ONTAP Cluster-Mode, storage failover show should not show any "partial" givebacks. If there are any, wait another 60 seconds and recheck. Some large systems may take up to 10 minutes to complete.

FAS2040: Boot the Operating System - 'giveback' if applicable (cont.)

IF the partner printed out the following highlighted message after the "disk reassign" command was executed, go to Step 11. If no message, skip to Step 13. (Ref Internal TSB-1110-04).

IF this system has a partner controller, but the partner did not takeover (disks were assigned in maintenance mode), continue with step 10a. The console message below is not displayed when disks are reassigned in maintenance mode. (Ref TSB-1209-02)

Log in to the repaired node and re-enable "controller failover" using the proper command syntax below (copy-and-paste). Enter: y to all questions.

Wait! 60 seconds for 7-Mode or 90 seconds for Cluster-Mode after takeover reports complete, then check takeover status by entering the appropriate command shown for the specified ONTAP mode.

After the appropriate wait period in step 12(a), and once cf status reports "Ready for giveback", enter the proper "giveback" command below. This is the final synchronization of the system-IDs across the HA pair.

Wait again! This time 90 seconds for 7-Mode or 3 minutes for Cluster-Mode, then check giveback status by entering the appropriate command below for the ONTAP mode in use. For 7-Mode, look for "Failover enabled"; for Cluster-Mode, follow step (i).
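The wait periods quoted in this section differ by step and by ONTAP mode, so it can help to keep them in one lookup when timing the procedure. The table below only restates the figures given in the text. The step labels and the function name are hypothetical names chosen for this sketch.

```python
# Wait periods in seconds, keyed by (step label, ONTAP mode), as stated in the text.
WAIT_SECONDS = {
    ("first_giveback", "7-mode"): 90,
    ("first_giveback", "cluster-mode"): 120,
    ("takeover", "7-mode"): 60,
    ("takeover", "cluster-mode"): 90,
    ("second_giveback", "7-mode"): 90,
    ("second_giveback", "cluster-mode"): 180,
}


def wait_for(step, mode):
    """Return the wait period in seconds for a step label and ONTAP mode."""
    key = (step, mode.lower())
    if key not in WAIT_SECONDS:
        raise KeyError(f"no wait period defined for {key}")
    return WAIT_SECONDS[key]


print(wait_for("takeover", "Cluster-Mode"))
print(wait_for("second_giveback", "7-Mode"))
```

These are minimum waits before the first status check; the text also allows for further rechecks when storage failover show still reports a partial giveback.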

disk reassign: A giveback must be done immediately following a reassign of partner disks. After the partner node becomes operational, do a takeover and giveback of this node to complete the disk reassign process.

DOT 7 and DOT 8 7-Mode target> cf enable

Cluster-Mode (run from clustershell ) (1st cmd is for 2-node clusters ONLY, 2nd cmd is for 3 or more node clusters)

cluster::> cluster ha modify -configured true
OR
cluster::> storage failover modify -node local -enabled true

Cluster-Mode cluster::> storage failover show

DOT 7 and DOT 8 7-Mode target(takeover)> cf status Controller Failover enabled, XYZ is up. Follow step (i).

DOT 7 and DOT 8 7-Mode target> cf takeover

Cluster-Mode cluster::> storage failover takeover -bynode local

DOT 7 and DOT 8 7-Mode target(takeover)> cf status

Cluster-Mode cluster::> run local cf status

Cluster-Mode cluster::> storage failover show

7-Mode target(takeover)> cf status Controller Failover enabled, XYZ is up. Follow step (i).

7-Mode target(takeover)> cf giveback

Cluster-Mode cluster::> storage failover giveback -fromnode local

Page 22: MB Replace

XV.Step Action Description

13 If Cluster-Mode: Follow steps (a-b) below; otherwise skip to Section XVI.

a)

Example of output here> net int show

b)

Example of output here> net int show

XVI. FAS2040: NetApp Storage Encryption (NSE) System? Step Action Description

STOP

1 Enter: key_manager setup on the target (repaired) node to update the boot PROM variables and to regenerate the key for the new system-ID. Follow these additional steps here.

a) Next, log in to the partner node and enter: key_manager setup to update its boot parameters.

XVII. Step

NOTE

1 Ask the end-user whether "AutoSupport" is in use. If YES, perform step 1(a). If NO, perform step 1(b).

a)

b) If ASUP is disabled: Call NGS CSR and provide the new MB serial number so they can register it as the new system s/n.

2 IF NDMP, SnapMirror or SnapVault options were disabled, enable them now. Refer to page 2 of doc >> here

3 Ask the customer whether Operations Manager is in use. If so, can they still access the controllers? If not, see bug >> 583160

4 C-Mode Only: Re-enable "auto-giveback" options if they were disabled on either node. C-Mode command here

5 Email the console log with the NetApp Reference Number in the Subject Line to [email protected]

6 Place the defective part in the antistatic bag and seal the box.

7 Follow the return shipping instructions on the box to ship the part(s) back to NetApp’s RMA processing center. If the shipping label is missing, see the process to obtain a shipping label here: Missing Shipping Label?

8 Verify with the customer that the system is OK and, if working with NGS, ask them if it is OK to be released.

9 Close dispatch per Rules of Engagement.

FAS2040: Controller registration, Enable options, Submit logs and Part Return Action Description

Service entitlements break when the MB is swapped because the new motherboard changes the system serial number.

ASUP system: Request end-user to send NetApp an ASUP Message from the target (repaired) node so the configuration setup can be verified and the new system serial number can be registered by NGS. If the target system is not UP, send ASUP from its partner. Use the corresponding command for the version of ONTAP running. Enter your dispatch's 7-digit FSO number (begins with 5).

FAS2040: Boot the Operating System - 'giveback' if applicable (cont.)

At the clustershell prompt, enter the command below to list the logical interfaces that are not on their home server and port.

If any interfaces are listed as "false" in the above command, enter the command below to revert them back to their home port. Issues? Call NGS.

IF this system is using NSE disks continue with step 1, otherwise skip to Section XVII.

DOT 7 and DOT 8 7-Mode filer> options autosupport.doit 5xxxxxx

Cluster-Mode cluster::> invoke * -type all -message 5xxxxxx


Cluster-Mode cluster::> net int show -is-home false

Cluster-Mode cluster::> net int revert *