1 - EMC Isilon Customer Troubleshooting Guide: OneFS Upgrade Failures
We appreciate your help in improving this document. Submit your feedback at http://bit.ly/isi-docfeedback.
Abstract
This guide helps you troubleshoot OneFS upgrade failures and error
messages received during upgrades.
September 15, 2015
EMC ISILON CUSTOMER TROUBLESHOOTING GUIDE
ONEFS UPGRADE FAILURES
2 - EMC Isilon Customer Troubleshooting Guide: OneFS Upgrade Failures
Contents and overview
Note: Follow all of these steps, in order, until you reach a resolution.
1. Follow these steps.
   Before you begin (Page 3)
   Best practices and useful information (Page 4)
2. Perform troubleshooting steps in order.
   Start Troubleshooting (Page 5)
   Nodes did not all come back online (Page 8)
   Simultaneous Upgrade (Page 11)
   Rolling Upgrade (Page 12)
3. Appendices
   Appendix A: If you need further assistance
   Appendix B: How to use this flow chart
3 - EMC Isilon Customer Troubleshooting Guide: OneFS Upgrade Failures
Configure logging through SSH
We recommend configuring screen logging to log all session input and output during your troubleshooting session. This log
file can be shared with EMC Isilon Technical Support if you require assistance at any point during troubleshooting.
Note: The screen session capability does not work in OneFS 7.1.0.6 and 7.1.1.2. If you are running either of these versions,
please configure logging using your local SSH client's logging feature.
1. Open an SSH connection to the cluster and log in using the root account. Note: If the cluster is in compliance mode, use
the compadmin account to log in. All compadmin commands must be preceded by the sudo prefix.
2. Change the directory to /ifs/data/Isilon_Support by running:
cd /ifs/data/Isilon_Support
3. Run the following command to capture all input and output of the session:
screen -L
This will create a file called screenlog.0 that will be appended to during your session.
4. Perform troubleshooting.
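The version note above can be turned into a small check before you choose a logging method. The following is a minimal sketch, not part of OneFS: the function name logging_method is hypothetical, and the version string is assumed to be typed in by hand (for example, after reading it from the cluster).

```shell
# logging_method VERSION
# Hypothetical helper: prints which logging approach to use for a given
# OneFS version, based only on the two releases this guide lists as affected.
logging_method() {
    case "$1" in
        7.1.0.6|7.1.1.2) echo "ssh-client" ;;  # screen does not work here
        *)               echo "screen" ;;      # use: screen -L
    esac
}

logging_method "7.1.1.2"   # prints: ssh-client
logging_method "7.2.0.0"   # prints: screen
```

If the result is "screen", continue with steps 2 and 3 above; otherwise enable logging in your local SSH client instead.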
Before you begin
CAUTION! If the node, subnet, or pool you are working on goes down during the course of
troubleshooting and you do not have any other way to connect to the cluster, you could
experience data unavailability.
Therefore, make sure you have more than one way to connect to the cluster before you
start this troubleshooting process. The best method is to have a serial cable available.
That way, if you are unable to connect through the network, you will still be able to
connect to the cluster physically.
For specific requirements and instructions for making a physical connection to the
cluster, see article 16744 on the EMC Online Support site.
Before you begin troubleshooting, confirm that you can either connect through another
subnet or pool, or that you have physical access to the cluster.
4 - EMC Isilon Customer Troubleshooting Guide: OneFS Upgrade Failures
Most upgrade problems occur during rolling upgrades that are initiated
from the OneFS web administration interface.
For best results, do the following:
- Use the command-line interface (CLI) to perform upgrades.
- Initiate the upgrade from the highest-numbered node in the cluster,
  unless the highest-numbered node is an Accelerator.
  If the highest-numbered node is an Accelerator, then initiate the
  upgrade from node 1.
Use the command-line interface
It is best to initiate the upgrade from the command-line interface. The
CLI displays more detailed information than the web interface, and is not
reliant on the WebUI services running in order to function. You can also
launch a screen session, which enables you to resume from where you
left off if you get disconnected.
Initiate the upgrade from the highest-numbered node
The node that you initiate the upgrade from is called the "master node."
During an upgrade, each node is upgraded and rebooted in turn, in
ascending numerical order, starting with the lowest-numbered node.
When the master node is the highest-numbered node, the upgrade
starts with node 1, and the last node to be rebooted is the master node.
The system should always upgrade and reboot the master node last,
regardless of which numbered node it is, but this does not always
happen. Sometimes, when the master node is not the highest-numbered
node, the system starts upgrading with node 1 as usual, but when it
reaches the master node, it upgrades and reboots that node in its
numerical order. This stops the upgrade process because, after it is
rebooted, the master node can no longer tell the rest of the nodes to
upgrade. Therefore, you should always initiate the upgrade from the
highest-numbered node in the cluster (unless, as stated above, the
highest-numbered node is an Accelerator; in this case, you should
initiate the upgrade from node 1).
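The rule for choosing the node to initiate the upgrade from can be sketched as a small function. This is an illustration only: the name upgrade_start_node and its two arguments (the highest node number, and yes or no for whether that node is an Accelerator) are assumptions, and you supply both values yourself.

```shell
# upgrade_start_node HIGHEST_NODE_NUMBER IS_ACCELERATOR
# Hypothetical helper: prints the node number to initiate the upgrade from.
# IS_ACCELERATOR is "yes" if the highest-numbered node is an Accelerator.
upgrade_start_node() {
    highest="$1"
    is_accelerator="$2"
    if [ "$is_accelerator" = "yes" ]; then
        echo 1            # Accelerator is highest: start from node 1
    else
        echo "$highest"   # otherwise start from the highest-numbered node
    fi
}

upgrade_start_node 6 no    # prints: 6
upgrade_start_node 6 yes   # prints: 1
```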
Best practices and useful information
Introduction: This page explains why upgrades often fail
and how to prevent upgrade problems in the
future.
5 - EMC Isilon Customer Troubleshooting Guide: OneFS Upgrade Failures
Troubleshooting
Analysis
Introduction: Start troubleshooting here. If you need help understanding the flow chart
conventions used in this guide, see Appendix B: How to use this flow chart.
Note: Most upgrade problems occur during rolling upgrades that are initiated from the OneFS web
administration interface. Therefore, we will use the command-line interface exclusively to troubleshoot
your issue and get your upgrade restarted. For more information, see "Best practices and useful
information" on page 4.
Start
1. If you have not done so already, log in to the cluster and configure logging through
SSH, as described on page 3.
2. Did the upgrade fail with a specific error displayed on the screen?
   No: Go to Page 6.
   Yes: Follow the prompts and onscreen instructions, then continue with step 3.
3. Can the upgrade be completed successfully now?
   Yes: End troubleshooting.
   No: Go to Page 6.
6 - EMC Isilon Customer Troubleshooting Guide: OneFS Upgrade Failures
Troubleshooting, continued
Analysis, continued (Page 6)
You could have arrived here from:
Page 5 - Analysis
Page 8 - Nodes did not all come back online
1. Run the following command to see which nodes were successfully upgraded:
isi_for_array -s "uname -a"
The output provides a list of all the nodes and indicates which version of
OneFS each is running. For an example of the output, see Appendix C.
Note: If a node did not fully reboot or is down, it will not show up. Also, if
the upgrade was a rolling upgrade, an error might appear stating that a node did
not come back online.
2. After running the command, do you see this error?
ERROR Client connected from an unprivileged port number 50230. Refusing the connection
[Errno 54] RPC session disconnected
   No: Go to Page 7.
   Yes: Install a patch as described in the following article:
   OneFS: After a failed or paused upgrade, commands sent from
   nodes that are not yet upgraded might fail, article 198906.
   Then continue troubleshooting (go to Page 7).
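To compare versions at a glance, the command output can be reduced to one "node version" pair per line. The sketch below is an assumption-laden illustration: the function name node_versions is hypothetical, and it assumes each node's line starts with "<node>: Isilon OneFS <hostname> v<version>", as in the Appendix C example.

```shell
# node_versions
# Hypothetical filter: reads isi_for_array -s "uname -a" output on stdin and
# prints "<node> <version>" for each line that matches the assumed layout.
node_versions() {
    awk '$2 == "Isilon" && $3 == "OneFS" { sub(/:$/, "", $1); print $1, $5 }'
}

# Sample line taken from the start of the Appendix C example output.
printf '%s\n' 'cluster-1: Isilon OneFS cluster-1 v7.0.2.5 Isilon OneFS v7.0.2.5' | node_versions
# prints: cluster-1 v7.0.2.5
```

On a cluster, the equivalent call would be: isi_for_array -s "uname -a" | node_versions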
7 - EMC Isilon Customer Troubleshooting Guide: OneFS Upgrade Failures
Troubleshooting, continued
Analysis, continued (Page 7)
You could have arrived here from:
Page 6 - Analysis, continued
1. Using the output of the isi_for_array -s "uname -a" command from Page 6, are all the nodes
running the new version of OneFS?
   No: Go to Page 9.
   Yes: Continue with step 2.
2. Run the following command:
isi status -q
In the output, look at the Health DASR column to see if
any nodes report -D- (Down). For an example of the
output, see Appendix D.
3. Do any nodes report as down? A down node means that it failed to join the cluster
following the upgrade.
   Yes: Go to Page 8.
   No: End troubleshooting.
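Scanning the Health DASR column by eye is error-prone on large clusters. The following sketch lists the IDs of nodes whose DASR field contains a D. The function name down_nodes is hypothetical, and the filter assumes the pipe-separated row layout shown in Appendix D (ID, IP address, DASR, and so on); the sample row with a down node is made up for illustration.

```shell
# down_nodes
# Hypothetical filter: reads isi status -q output on stdin and prints the ID
# of every node row whose third pipe-separated field (DASR) contains "D".
down_nodes() {
    awk -F'|' '$1 ~ /^ *[0-9]+ *$/ && $3 ~ /D/ { gsub(/ /, "", $1); print $1 }'
}

# Two sample rows: node 1 needs attention, node 2 is down (made-up data).
printf '%s\n' \
    '  1|192.168.146.128|-A-- | 396K| 828K| 1.2M| 144M/ 2.8G(  5%)|    (No SSDs)' \
    '  2|192.168.146.129|-D-- |  49K| 3.2M| 3.2M| 145M/ 2.8G(  5%)|    (No SSDs)' \
    | down_nodes
# prints: 2
```

On a cluster, the equivalent call would be: isi status -q | down_nodes. Empty output means no node reports as down.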
8 - EMC Isilon Customer Troubleshooting Guide: OneFS Upgrade Failures
Troubleshooting, continued
Nodes did not all come back online (Page 8)
You could have arrived here from:
Page 7 - Analysis, continued
1. Has it been at least 15 minutes since the nodes rebooted as part of the upgrade?
   No: Wait 15 minutes, then go back to Page 6.
   Yes: Continue with step 2.
2. Note the page number that you are currently on.
Upload log files and contact Isilon Technical
Support, as instructed in Appendix A.
9 - EMC Isilon Customer Troubleshooting Guide: OneFS Upgrade Failures
Troubleshooting, continued
Analysis, continued (Page 9)
You could have arrived here from:
Page 7 - Analysis, continued
1. Did you follow the steps in the "Planning an Upgrade" and "Completing pre-upgrade tasks"
sections of the OneFS Upgrade Planning and Process Guide before beginning the upgrade?
   Yes: Go to Page 10.
   No: Follow the steps in the "Planning an Upgrade" and the "Completing pre-upgrade tasks"
   sections of the OneFS Upgrade Planning and Process Guide, then go to Page 10.
10 - EMC Isilon Customer Troubleshooting Guide: OneFS Upgrade Failures
Troubleshooting, continued
Analysis, continued (Page 10)
You could have arrived here from:
Page 9 - Analysis, continued
1. Did you perform a simultaneous upgrade or a rolling upgrade?
   Simultaneous: Go to Page 11.
   Rolling: Go to Page 12.
11 - EMC Isilon Customer Troubleshooting Guide: OneFS Upgrade Failures
Troubleshooting, continued
Simultaneous upgrade (Page 11)
You could have arrived here from:
Page 10 - Analysis, continued
1. Run the following command:
isi status -q
In the output, look at the Health DASR column to see if
any nodes report -D- (Down). For an example of the
output, see Appendix D.
2. Do any nodes report as down? A down node means that it failed to join the cluster
following the upgrade.
   No: Go to Page 14.
   Yes: Note the page number that you are currently on.
   Upload log files and contact Isilon Technical
   Support, as instructed in Appendix A.
12 - EMC Isilon Customer Troubleshooting Guide: OneFS Upgrade Failures
Troubleshooting, continued
Rolling upgrade (Page 12)
You could have arrived here from:
Page 10 - Analysis, continued
1. Run the following command to determine which nodes did not get upgraded:
isi_for_array -s "uname -a"
The output provides a list of all the nodes and indicates which version of
OneFS each is running. For an example of the output, see Appendix C.
2. For each node that did not get upgraded, run the following command to check
that node's /var/log/messages file to see if there are errors with a timestamp
that occurred during the upgrade. In the command, replace <YYYY-MM-DD>
with the date of the upgrade:
grep '^<YYYY-MM-DD>' /var/log/update_engine*
For example:
grep '^2015-04-15' /var/log/update_engine*
3. Are there errors on a node that did not get upgraded?
   No: Go to Page 14.
   Yes: Continue with step 4.
4. Is the following error present?
Unable to claim upgrade daemon on one or more nodes.
   Yes: Go to Page 13.
   No: Go to Page 14.
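A mistyped date makes the grep command on this page return nothing, which is easy to mistake for "no errors". The sketch below wraps the same grep in a date-format check. The function name lines_from_date is hypothetical; the grep pattern and log path are the ones given on this page.

```shell
# lines_from_date YYYY-MM-DD FILE...
# Hypothetical wrapper: rejects a malformed date, then prints every line in
# the given files that starts with that date.
lines_from_date() {
    day="$1"
    shift
    case "$day" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
        *) echo "expected YYYY-MM-DD, got: $day" >&2; return 2 ;;
    esac
    grep "^$day" "$@"
}

# On a node that did not get upgraded, the call would look like:
#   lines_from_date 2015-04-15 /var/log/update_engine*
```

The check only validates the shape of the date, not whether it is a real calendar day.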
13 - EMC Isilon Customer Troubleshooting Guide: OneFS Upgrade Failures
Troubleshooting, continued
Rolling upgrade, continued (Page 13)
You could have arrived here from:
Page 12 - Rolling upgrade, continued
1. Run the following command:
isi services -a isi_upgrade_d
2. Is the service enabled or disabled?
   Enabled: You are still in the middle of an upgrade and unable
   to proceed. Disable the service by running the following
   command, then go to Page 14:
   isi services -a isi_upgrade_d disable
   Disabled: Note the page number that you are currently on.
   Upload log files and contact Isilon Technical
   Support, as instructed in Appendix A.
14 - EMC Isilon Customer Troubleshooting Guide: OneFS Upgrade Failures
Troubleshooting, continued
Restart the upgrade (Page 14)
You could have arrived here from:
Page 11 - Simultaneous upgrade
Page 12 - Rolling upgrade, continued
Page 13 - Rolling upgrade, continued
1. Open an SSH connection to the highest-numbered node in the cluster,
and log in using the root account.
2. Open a screen session by running the following command, where <session name> is a name that
you provide. Record the name in case you need to use it later. The screen session enables you to
easily reconnect to the upgrade process if the session gets disconnected during the upgrade.
screen -S <session name>
If you get disconnected, you can use the following command to reconnect:
screen -x <session name>
Note: If you are running OneFS 7.1.1.2 or 7.1.0.6, skip this step. The screen session feature
does not work in OneFS 7.1.1.2 or 7.1.0.6.
3. Restart the upgrade by running one of the following commands:
For a rolling upgrade:
isi update --rolling
For a simultaneous upgrade:
isi update
4. Did the upgrade restart?
   Yes: Wait for the upgrade to complete, then go to Page 15.
   No: Note the page number that you are currently on.
   Upload log files and contact Isilon Technical
   Support, as instructed in Appendix A.
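The choice between the two restart commands on this page can be written down as a lookup, which helps avoid restarting with the wrong upgrade type. This is a sketch only: the function name restart_command is hypothetical, and it prints the command rather than running it.

```shell
# restart_command UPGRADE_TYPE
# Hypothetical helper: prints the restart command given on this page for
# "rolling" or "simultaneous"; any other value is rejected.
restart_command() {
    case "$1" in
        rolling)      echo "isi update --rolling" ;;
        simultaneous) echo "isi update" ;;
        *) echo "unknown upgrade type: $1" >&2; return 1 ;;
    esac
}

restart_command rolling        # prints: isi update --rolling
restart_command simultaneous   # prints: isi update
```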
15 - EMC Isilon Customer Troubleshooting Guide: OneFS Upgrade Failures
Troubleshooting, continued
Restart the upgrade, continued (Page 15)
You could have arrived here from:
Page 14 - Restart the upgrade, continued
1. Run the following command to determine whether
any more nodes were upgraded:
isi_for_array -s "uname -a"
The output provides a list of all the nodes and indicates which version of
OneFS each is running. For an example of the output, see Appendix C.
2. Have all of the nodes been upgraded?
   Yes: Go to Page 16.
   No: Note the page number that you are currently on.
   Upload log files and contact Isilon Technical
   Support, as instructed in Appendix A.
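To answer "have all of the nodes been upgraded?" without reading every line, the output can be filtered for nodes that are not yet on the target version. The sketch below makes the same layout assumption as the Appendix C example ("<node>: Isilon OneFS <hostname> v<version>"); the function name not_upgraded and the target version v7.1.1.4 used in the sample are hypothetical.

```shell
# not_upgraded TARGET_VERSION
# Hypothetical filter: reads isi_for_array -s "uname -a" output on stdin and
# prints each node whose reported version differs from TARGET_VERSION.
not_upgraded() {
    target="$1"
    awk -v target="$target" \
        '$2 == "Isilon" && $3 == "OneFS" && $5 != target { sub(/:$/, "", $1); print $1 }'
}

# Made-up two-node sample: only cluster-1 is still on the old version.
printf '%s\n' \
    'cluster-1: Isilon OneFS cluster-1 v7.0.2.5 Isilon OneFS v7.0.2.5' \
    'cluster-2: Isilon OneFS cluster-2 v7.1.1.4 Isilon OneFS v7.1.1.4' \
    | not_upgraded v7.1.1.4
# prints: cluster-1
```

Empty output means every node that responded is on the target version. Nodes that are down do not appear in the command output at all, so they are not reported by this filter either.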
16 - EMC Isilon Customer Troubleshooting Guide: OneFS Upgrade Failures
Troubleshooting, continued
Post-upgrade checks (Page 16)
You could have arrived here from:
Page 15 - Restart the upgrade, continued
1. Run the following command:
isi status -q
In the output, look at the Health DASR column to see if
any nodes report -D- (Down). For an example of the
output, see Appendix D.
2. Do any nodes report as down? A down node means that it failed to join the cluster
following the upgrade.
   Yes: Go to Page 17.
   No: End troubleshooting.
17 - EMC Isilon Customer Troubleshooting Guide: OneFS Upgrade Failures
Troubleshooting, continued
Nodes did not all join the cluster (Page 17)
You could have arrived here from:
Page 16 - Post-upgrade checks
Note: If you want to determine root cause, please contact Isilon Technical
Support before continuing. If you do not want root cause analysis,
then continue.
Reboot each down node as follows:
1. If possible, use a serial console to connect to the node.
Otherwise, log in to the node by using SSH.
For instructions about connecting through a serial console, see
article 16744 on the EMC Online Support site.
2. After you are connected to the node, run the following command
to reboot the node:
shutdown -r now
3. Wait for the rebooted nodes to come back online.
4. Run the following command:
isi status -q
In the output, look at the Health DASR column to see if any nodes
report -D- (Down). For an example of the output, see Appendix D.
5. Do any nodes report as down? A down node means that it failed to join the cluster
following the reboot.
   No: End troubleshooting.
   Yes: Note the page number that you are currently on.
   Upload log files and contact Isilon Technical
   Support, as instructed in Appendix A.
18 - EMC Isilon Customer Troubleshooting Guide: OneFS Upgrade Failures
Appendix A: If you need further assistance
Contact EMC Isilon Technical Support
If you need to contact Isilon Technical Support during troubleshooting, reference the page or step that you need help on.
This information and the log file will help Isilon Technical Support staff resolve your case more quickly.
Upload node log files and the screen log file to EMC Isilon Technical Support
1. When troubleshooting is complete, type exit to end your screen session.
2. Gather and upload the node log set and include the SSH screen log file by using the command appropriate for your
method of uploading files. If you are not sure which method to use, then use FTP.
ESRS:
isi_gather_info --esrs --local-only -f /ifs/data/Isilon_Support/screenlog.0
FTP:
isi_gather_info --ftp --local-only -f /ifs/data/Isilon_Support/screenlog.0
HTTP:
isi_gather_info --http --local-only -f /ifs/data/Isilon_Support/screenlog.0
SMTP:
isi_gather_info --email --local-only -f /ifs/data/Isilon_Support/screenlog.0
SupportIQ:
Copy and paste the following command.
Note: When you copy and paste the command into the command-line interface, it will appear on multiple lines (exactly
as it appears on the page), but when you press Enter the command will run as it should.
isi_gather_info --local-only -f /ifs/data/Isilon_Support/screenlog.0 --noupload \
--symlink /var/crash/SupportIQ/upload/ftp
3. If you receive a message that the upload was unsuccessful, refer to article 16759 on the EMC Online Support site for
directions for uploading files over FTP.
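The upload commands in step 2 differ only in a few flags, so the choice can be sketched as a lookup keyed on the upload method. This is an illustration, not an EMC tool: the function name gather_command and the method keywords are hypothetical, the command strings are copied from step 2, and the function only prints the command. Following the guidance in step 2, an unrecognized method falls back to FTP.

```shell
# gather_command METHOD
# Hypothetical helper: prints the isi_gather_info command from step 2 for
# esrs, ftp, http, smtp, or supportiq. Anything else falls back to FTP.
gather_command() {
    log="/ifs/data/Isilon_Support/screenlog.0"
    case "$1" in
        esrs) echo "isi_gather_info --esrs --local-only -f $log" ;;
        http) echo "isi_gather_info --http --local-only -f $log" ;;
        smtp) echo "isi_gather_info --email --local-only -f $log" ;;
        supportiq)
            echo "isi_gather_info --local-only -f $log --noupload --symlink /var/crash/SupportIQ/upload/ftp" ;;
        *)    echo "isi_gather_info --ftp --local-only -f $log" ;;
    esac
}

gather_command smtp
# prints: isi_gather_info --email --local-only -f /ifs/data/Isilon_Support/screenlog.0
```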
19 - EMC Isilon Customer Troubleshooting Guide: OneFS Upgrade Failures
Appendix B: How to use this flow chart
Introduction: Describes what the section helps you to accomplish.
You could have arrived here from: Page # - Page title
Page #
Decision diamond: Yes / No
Process step
Process step with command:
command xyz
Optional process step
Go to Page #
End point
Directional arrows indicate the path through the process flow.
Note: Provides context and additional information. Sometimes a note is
linked to a process step with a colored dot.
CAUTION! Caution boxes warn that a particular step needs to be performed with
great care, to prevent serious consequences.
Document Shape: Calls out supporting documentation for a process step. When possible,
these shapes contain links to the reference document. Sometimes linked to a process step
with a colored dot.
20 - EMC Isilon Customer Troubleshooting Guide: OneFS Upgrade Failures
Appendix C: Output of the isi_for_array -s "uname -a" command
You could have arrived here from:
Page 6 - Analysis, continued
Page 12 - Rolling upgrade, continued
Page 15 - Restart the upgrade, continued
Example output for
isi_for_array -s "uname -a"
cluster-1: Isilon OneFS cluster-1 v7.0.2.5 Isilon OneFS v7.0.2.5
B_7_0_2_216(RELEASE): 0x7000250005000D8:Mon Nov 25 20:16:16 PST 2013
[email protected]:/build/mnt/obj.RELEASE/build/mnt/src/sys/
IQ.amd64.release amd64
21 - EMC Isilon Customer Troubleshooting Guide: OneFS Upgrade Failures
Appendix D: Output of the isi status -q command
Example output for
isi status -q
Cluster Name: mycluster
Cluster Health: [ ATTN ]
Cluster Storage: HDD SSD
Size: 11G (23G Raw) 0 (0 Raw)
VHS Size: 11G
Used: 573M (5%) 0 (n/a)
Avail: 11G (95%) 0 (n/a)
Health Throughput (bps) HDD Storage SSD Storage
ID |IP Address |DASR | In Out Total| Used / Size |Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
1|192.168.146.128|-A-- | 396K| 828K| 1.2M| 144M/ 2.8G( 5%)| (No SSDs)
2|192.168.146.129|OK | 49K| 3.2M| 3.2M| 145M/ 2.8G( 5%)| (No SSDs)
3|192.168.146.130|OK | 3.5K| 162K| 165K| 142M/ 2.8G( 5%)| (No SSDs)
4|192.168.146.131|OK | 49K| 356K| 405K| 143M/ 2.8G( 5%)| (No SSDs)
-------------------+-----+-----+-----+-----+-----------------+-----------------
Cluster Totals: | 498K| 4.5M| 5.0M| 573M/ 11G( 5%)| (No SSDs)
Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only
You could have arrived here from:
Page 7 - Analysis, continued
Page 11 - Simultaneous upgrade
Page 16 - Post-upgrade checks
Page 17 - Nodes did not all join the cluster
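The Health Fields legend in the example output can be applied mechanically to a DASR value. The sketch below is illustrative only: the function name describe_health is hypothetical, and it assumes a node's DASR field is either OK or a string in which each condition is flagged by its letter (D, A, S, or R), as in the -A-- value in the example.

```shell
# describe_health DASR_VALUE
# Hypothetical helper: expands the letters in a DASR field using the
# Health Fields legend; prints OK if none of the letters is present.
describe_health() {
    flags="$1"
    out=""
    case "$flags" in *D*) out="$out Down" ;; esac
    case "$flags" in *A*) out="$out Attention" ;; esac
    case "$flags" in *S*) out="$out Smartfailed" ;; esac
    case "$flags" in *R*) out="$out Read-Only" ;; esac
    if [ -z "$out" ]; then
        echo "OK"
    else
        echo "${out# }"   # drop the leading space
    fi
}

describe_health "-A--"   # prints: Attention
describe_health "OK"     # prints: OK
```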
© 2011 - 2013 EMC Corporation. All Rights Reserved. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
EMC², EMC, and the EMC logo are registered trademarks or trademarks of EMC Corporation in the United States and other countries.
All other trademarks used herein are the property of their respective owners.