
Isilon OneFS Version 7.2.0.0 - 7.2.0.4

Release Notes

Copyright © 2015 EMC Isilon. All rights reserved. Published in USA.

Published October 1, 2015

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

The information in this publication is provided as is. EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

EMC², EMC, and the EMC logo are registered trademarks or trademarks of EMC Corporation in the United States and other countries. All other trademarks used herein are the property of their respective owners.

For the most up-to-date regulatory document for your product line, go to EMC Online Support (https://support.emc.com).

EMC Corporation
Hopkinton, Massachusetts 01748-9103
1-508-435-1000 In North America 1-866-464-7381
www.EMC.com


CONTENTS

Chapter 1    OneFS Release Notes
    OneFS 7.2.0 Release notes

Chapter 2    Upgrading OneFS
    Target Code
    Supported upgrade paths

Chapter 3    New features, software support, logging, and controls
    New and changed in OneFS 7.2.0 - Highlights
        Authentication
        Cluster configuration
        File system
        Hardware
        HDFS
        Networking
        NFS
        OneFS API
        Security
        SMB
    New and changed in OneFS 7.2.0.4
        Authentication
        File system
        Hardware
        HDFS
        Networking
        Security
        Upgrade and installation
    New and changed in OneFS 7.2.0.3 (Target Code)
        Cluster configuration
        Hardware
        HDFS
        Networking
        Security
        SMB
    New and changed in OneFS 7.2.0.2
        Antivirus
        Authentication
        Cluster configuration
        File system
        HDFS
        Security
    New and changed in OneFS 7.2.0.1
        Authentication
        Cluster configuration
        Diagnostic tools
        File transfer
        HDFS
        Security
        SmartLock
        SmartQuotas
        SMB

Chapter 4    New hardware and firmware support
    New hardware and firmware support in OneFS 7.2.0.4
    New hardware and firmware support in OneFS 7.2.0.3 (Target Code)
    New hardware and firmware support in OneFS 7.2.0.2
    New hardware and firmware support in OneFS 7.2.0.1
    New hardware and firmware support in OneFS 7.2.0.0

Chapter 5    Resolved issues
    Resolved in OneFS 7.2.0.4
        Antivirus
        Authentication
        Backup, recovery, and snapshots
        Cluster configuration
        Diagnostic tools
        Events, alerts, and cluster monitoring
        File system
        Hardware
        HDFS
        Migration
        Networking
        NFS
        OneFS API
        OneFS web administration interface
        SmartQuotas
        SMB
    Resolved in OneFS 7.2.0.3 (Target Code)
        Antivirus
        Authentication
        Backup, recovery, and snapshots
        Cluster configuration
        Diagnostic tools
        Events, alerts, and cluster monitoring
        File system
        File transfer
        Hardware
        HDFS
        Job engine
        Migration
        Networking
        NFS
        SmartLock
        SmartQuotas
        SMB
        Upgrade and installation
    Resolved in OneFS 7.2.0.2
        Antivirus
        Authentication
        Backup, recovery, and snapshots
        Cluster configuration
        Diagnostic tools
        Events, alerts, and cluster monitoring
        File system
        Hardware
        Job engine
        Migration
        Networking
        NFS
        OneFS web administration interface
        SmartLock
        SMB
        Upgrade and installation
        Virtual plug-ins
    Resolved in OneFS 7.2.0.1
        Antivirus
        Authentication
        Backup, recovery, and snapshots
        Cluster configuration
        Command-line interface
        Events, alerts, and cluster monitoring
        File system
        Hardware
        HDFS
        Job engine
        Migration
        Networking
        NFS
        OneFS API
        OneFS web administration interface
        SmartLock
        SmartQuotas
        SMB
        Virtual plug-ins
    Resolved in OneFS 7.2.0.0
        Antivirus
        Authentication
        Backup, recovery, and snapshots
        Cluster configuration
        Events, alerts, and cluster monitoring
        File system
        File transfer
        Hardware
        HDFS
        Job engine
        Migration
        Networking
        NFS
        OneFS web administration interface
        SmartLock
        SmartQuotas
        SMB
        Upgrade and installation
        Virtual plug-ins

Chapter 6    Isilon ETAs and ESAs related to this release
    ETAs related to OneFS 7.2.0
    ESAs related to OneFS 7.2.0

Chapter 7    OneFS patches included in this release
    Patches included in OneFS 7.2.0.4
    Patches included in OneFS 7.2.0.3 (Target Code)
    Patches included in OneFS 7.2.0.2
    Patches included in OneFS 7.2.0.1

Chapter 8    Known issues
    Target Code known issues
    Antivirus
    Authentication
    Backup, recovery, and snapshots
    Cluster configuration
    Command-line interface
    Diagnostic tools
    Events, alerts, and cluster monitoring
    File system
    File transfer
    Hardware
    HDFS
    iSCSI
    Job engine
    Migration
    Networking
    NFS
    OneFS API
    OneFS web administration interface
    Security
    SmartQuotas
    SMB
    Upgrade and installation
    Virtual plug-ins

Chapter 9    OneFS Release Resources
    OneFS information and documentation
    Functional areas in the OneFS release notes
    Where to go for support
    Provide feedback about this document

CHAPTER 1

OneFS Release Notes

The OneFS release notes contain information about new features, changes in functionality, issues that are resolved, support for new hardware and firmware, and known issues and limitations in the Isilon OneFS 7.2.0 operating system.

• OneFS 7.2.0 Release notes

OneFS 7.2.0 Release notes

The OneFS 7.2.0 release notes contain descriptions of all of the enhancements, functionality changes, new features, support for hardware, support for firmware, and resolved issues that are included in the release.

• OneFS 7.2.0.4 released: October 1, 2015 (General Availability)

• OneFS 7.2.0.3 released: July 22, 2015 (Target Code)

• OneFS 7.2.0.2 released: May 6, 2015

• OneFS 7.2.0.1 released: February 18, 2015

• OneFS 7.2.0.0 released: November 20, 2014

The new features, functionality changes, resolved issues, and known issues listed in the release notes are categorized by functional area. For a list of the functional areas used to categorize the release notes and a brief description of what each functional area typically contains, see the Functional areas in the OneFS release notes section in the OneFS release resources section at the end of this document.

For a list of available OneFS releases and information about target code releases and general availability (GA) releases, see Current Isilon Software Releases (https://support.emc.com/docu46145) on the EMC Online Support site.

CHAPTER 2

Upgrading OneFS

OneFS upgrades comprise a full operating system upgrade and require that the Isilon cluster be rebooted. To help ensure that the version of OneFS to which you upgrade contains all of the resolved issues included in the version you are upgrading from, upgrades are supported only from designated previous releases of OneFS.

Before upgrading OneFS, review the Supported upgrade paths section of this document to verify that the cluster can be upgraded from your current version of OneFS directly to this release.

See the OneFS Upgrade Planning and Process Guide on the EMC Online Support site for detailed upgrade instructions and additional upgrade information.

To download the installer for this maintenance release, see the OneFS Downloads page (https://support.emc.com/downloads/15209_Isilon-OneFS) on the EMC Online Support site.

• Target Code
• Supported upgrade paths

Target Code

OneFS 7.2.0.3 is the current 7.2.0.x target code version. A OneFS release is designated as Target Code after it satisfies specific criteria, which include production time in the field, deployments across all supported node platforms, and additional quality metrics. For information about upgrading to OneFS Target Code, see Upgrading to OneFS Target Code on the Isilon EMC Community Network (ECN) pages.

Supported upgrade paths

To ensure that the version of OneFS you are upgrading to contains all of the bug fixes included in the version of OneFS you are upgrading from, upgrades are supported only from specified versions of OneFS. If the cluster is not running a supported version of OneFS, contact EMC Isilon Technical Support before attempting an upgrade.

Upgrade resources

For more information about simultaneous and rolling upgrades and other important details about the OneFS upgrade process, see the OneFS Upgrades - Isilon Info Hub.

Upgrades to OneFS 7.2.0.4 (General Availability)

Simultaneous upgrades to OneFS 7.2.0.4 are supported from the following OneFS versions:

• OneFS 7.2.0 through OneFS 7.2.0.3





• OneFS 7.0 (7.0.0.0)

Rolling upgrades to OneFS 7.2.0.4 are supported from the following OneFS versions:




Upgrades to OneFS 7.2.0.3 (Target Code)

Simultaneous upgrades to OneFS 7.2.0.3 are supported from the following OneFS versions:






• OneFS 7.0 (7.0.0.0)

Rolling upgrades to OneFS 7.2.0.3 are supported from the following OneFS versions:





https://community.emc.com/docs/DOC-44653





CHAPTER 3

New features, software support, logging, and controls

This section contains descriptions of new features, new software support, new protocol and protocol version support, additional logging, and new controls such as command-line options and sysctl parameters.

New features enable you to perform tasks or implement configurations that were previously unavailable.

These new features include:

• New software and protocol support

• Updated software and protocol version support

• New logging

• New controls such as command options, sysctl parameters, and OneFS web administration controls

Functionality changes include modifications and enhancements to OneFS that enable you to perform preexisting tasks in new ways, or that improve underlying OneFS functionality or performance. These changes also include removing support for deprecated protocols and software.

The functionality changes documented in the release notes include:

• Changes to the formatting or syntax of a pre-existing command

• Changes to underlying code to improve performance

• Updates to integrated OneFS components such as OpenSSL and GNU bash

• Changes to enable functionality in the OneFS web administration interface that was previously available only from the command-line interface

• Changes to remove support for old and deprecated protocols

• New and changed in OneFS 7.2.0 - Highlights
• New and changed in OneFS 7.2.0.4
• New and changed in OneFS 7.2.0.3 (Target Code)
• New and changed in OneFS 7.2.0.2
• New and changed in OneFS 7.2.0.1

New and changed in OneFS 7.2.0 - Highlights

Authentication

Improved usability for MIT Kerberos

The MIT Kerberos authentication method has been completely revamped to make it consistent with the other authentication methods. You can now manage Kerberos authentication through a Kerberos provider, similar to the Active Directory provider. A Kerberos provider can be included in various access zones similar to the other providers.

OneFS defaults to LDAP paged search

OneFS now defaults to LDAP paged search if both paged search and Virtual List View (VLV) are supported. If paged search is not supported and VLV is enabled on the LDAP server, OneFS will use VLV when returning the results from a search.

Note

In most cases, bind-dn and bind-password must be enabled in order to use VLV.
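The selection rule described above can be sketched as a small decision function. This is only an illustration of the stated rule, not OneFS code: the function name, parameter names, and return strings are hypothetical, and the fallback when neither mechanism is available is an assumption because the release notes do not describe that case.

```python
def choose_ldap_search_mode(paged_search_supported: bool, vlv_enabled: bool) -> str:
    """Pick a search mode following the rule stated in the release notes."""
    if paged_search_supported:
        # Paged search is the default whenever it is supported,
        # including when VLV is supported as well.
        return "paged"
    if vlv_enabled:
        # VLV is used only when paged search is unavailable.
        return "vlv"
    # Assumption: the source does not say what happens when neither
    # mechanism is available, so a plain search is used as a placeholder.
    return "unpaged"


print(choose_ldap_search_mode(True, True))
print(choose_ldap_search_mode(False, True))
```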

Cluster configuration

New protection policy

To ensure that node pools made up of new Isilon HD400 nodes can maintain a data protection level that meets EMC Isilon guidelines for mean time to data loss (MTTDL), OneFS offers a new requested protection option, +3d:1n1d (3 drives or 1 node and 1 drive). This setting ensures that data remains protected in the event of three simultaneous drive failures, or the simultaneous failure of one drive and one node. This protection policy can also be applied to node pools that do not contain HD400 nodes.
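As a rough illustration of what +3d:1n1d tolerates, the sketch below encodes only the two failure combinations named above. It is a simplification under stated assumptions: the function name is hypothetical, failed drives are counted outside any failed node, and it does not model how OneFS actually evaluates protection.

```python
def survives_3d_1n1d(failed_drives: int, failed_nodes: int) -> bool:
    """Return True if the simultaneous failures fit within +3d:1n1d.

    Assumption: failed_drives counts drives that are not inside a failed node.
    """
    if failed_drives < 0 or failed_nodes < 0:
        raise ValueError("failure counts cannot be negative")
    if failed_nodes == 0:
        # Up to three simultaneous drive failures are tolerated.
        return failed_drives <= 3
    if failed_nodes == 1:
        # One node failure plus at most one additional drive failure.
        return failed_drives <= 1
    # More than one simultaneous node failure is outside this policy.
    return False


print(survives_3d_1n1d(3, 0))
print(survives_3d_1n1d(2, 1))
```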

Suggested protection

OneFS now includes a function to calculate a recommended protection level based on cluster configuration. This capability is available only on new clusters. Clusters upgraded to OneFS 7.2 do not have this capability. Although you can specify a different requested protection on a node pool, the suggested protection level strikes the best balance between data protection and storage efficiency. In addition, as you add nodes to your Isilon cluster, OneFS continually evaluates the protection level and alerts you if the cluster falls below the suggested protection level.

Node equivalency

OneFS now enables nodes of different generations to be compatible based on certain criteria and constraints. You can specify compatibilities between Isilon S200 and similarly configured Isilon S210 nodes, and between X400 and similarly configured X410 nodes. Nodes must have compatible RAM amounts and identical HDD and SSD configurations. Compatibilities allow newer generation nodes to be joined to existing node pools made up of older generation nodes. After you add three or more newer generation nodes, you can delete the compatibility so that OneFS can autoprovision the new nodes into their own node pools. This enables you to take advantage of the speed and efficiency characteristics of the newer node types in their own node pools.
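The compatibility constraints above can be expressed as a simple check. This is a sketch under assumptions, not the OneFS implementation: the NodeSpec type and its fields are hypothetical, and because the release notes do not define what makes RAM amounts "compatible", that rule is passed in by the caller.

```python
from dataclasses import dataclass
from typing import Callable

# Model pairings named in the release notes.
COMPATIBLE_MODELS = {frozenset({"S200", "S210"}), frozenset({"X400", "X410"})}


@dataclass(frozen=True)
class NodeSpec:
    model: str
    hdd_config: str  # hypothetical free-form description of the HDD layout
    ssd_config: str  # hypothetical free-form description of the SSD layout
    ram_gb: int


def can_define_compatibility(
    older: NodeSpec,
    newer: NodeSpec,
    ram_compatible: Callable[[int, int], bool],
) -> bool:
    """Apply the three constraints stated in the text."""
    if frozenset({older.model, newer.model}) not in COMPATIBLE_MODELS:
        return False
    if (older.hdd_config, older.ssd_config) != (newer.hdd_config, newer.ssd_config):
        return False
    # Assumption: the RAM rule is not specified in the source.
    return ram_compatible(older.ram_gb, newer.ram_gb)


s200 = NodeSpec("S200", "24xHDD", "0xSSD", 96)
s210 = NodeSpec("S210", "24xHDD", "0xSSD", 128)
print(can_define_compatibility(s200, s210, lambda a, b: True))
```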

Zone-aware ID mapping

OneFS now supports management of ID mapping rules for each access zone. ID mapping associates Windows identifiers to UNIX identifiers to provide consistent access control across file sharing protocols within an access zone.

File system

L3 cache stores metadata only on archive platforms

For Isilon NL400 and HD400 nodes that contain SSDs, L3 cache is enabled by default and cannot be disabled. In addition, L3 cache stores only metadata in SSDs on archive platforms, which feature mostly data writing events. By storing metadata only, L3 cache optimizes the performance of write-based operations.

Hardware

Automatic drive firmware updates

OneFS now supports automatic drive firmware updates for new and replacement drives. This is enabled through drive support packages.

Improved InfiniBand stability

The stability of back-end connections to the cluster has been improved by addressing a number of issues that were encountered when one or more InfiniBand switches were rebooted. In some cases, the issues that were addressed occurred if one or more InfiniBand switches were rebooted manually. In other cases, one or more InfiniBand switches rebooted unexpectedly due to an issue such as a memory leak or a race condition. If any of these issues occurred, the affected nodes typically lost connectivity to the cluster and, in some cases, had to be manually rebooted in order to reestablish a connection.

HDFS

Increased Hadoop support

• OneFS now supports additional Hadoop distributions including Cloudera CDH5, Hortonworks Data Platform 2.1, and Apache Hadoop 2.4.

• WebHDFS now supports Kerberos authentication. Users connecting to the EMC Isilon cluster through a WebHDFS interface can be authenticated with Kerberos.

• HDFS supports secure impersonation through proxy users that impersonate other users with Kerberos credentials to perform Hadoop jobs on HDFS data.

• OneFS now supports an Ambari agent that allows you to monitor the status of HDFS services in each access zone through an external Ambari interface.

Networking

Source-based routing

OneFS now supports source-based routing, which selects which gateway to direct outgoing client traffic through based on the source IP address in each packet header.
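The idea behind source-based routing can be illustrated with a lookup keyed on the packet's source address rather than its destination. The sketch below is conceptual only: the subnets and gateway addresses are hypothetical documentation ranges, and the table and function names are not part of OneFS.

```python
import ipaddress
from typing import Optional

# Hypothetical mapping from a source subnet to the gateway for that subnet.
GATEWAYS = {
    ipaddress.ip_network("192.0.2.0/24"): ipaddress.ip_address("192.0.2.1"),
    ipaddress.ip_network("198.51.100.0/24"): ipaddress.ip_address("198.51.100.1"),
}


def select_gateway(source_ip: str) -> Optional[ipaddress.IPv4Address]:
    """Choose the outgoing gateway from the packet's source IP address."""
    address = ipaddress.ip_address(source_ip)
    for network, gateway in GATEWAYS.items():
        if address in network:
            return gateway
    # No matching subnet: a real system would fall back to a default route.
    return None


print(select_gateway("198.51.100.7"))
```

Keying the decision on the source address means that replies leave through the gateway of the subnet the node address belongs to, regardless of where the client is located.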


NFS

NFS service improvements

OneFS incorporates a number of improvements to the NFS service, including support of NFS v4 and NFS v3 (NFS v2 is no longer supported). Other improvements include moving the service from the operating system kernel into userspace for increased reliability; supporting audit features for NFS events; incorporating access zone support for NFS clients; autobalancing across all nodes to achieve performance parity and ensure continuous service; and the ability to create aliases to simplify client connections to NFS exports.

OneFS API

RESTful interface for object storage

OneFS introduces Isilon Swift, an object storage application for Isilon clusters based on the object storage API provided by OpenStack Swift. The Swift RESTful API, an HTTP-based protocol, allows Swift clients to send object storage requests directly to the Isilon cluster. Accounts, containers, and objects that form the basis of the object storage can be accessed through the NFS, SMB, FTP, and RAN protocols in addition to the Swift RESTful API. The following Swift RESTful API calls are supported: GET, PUT, POST, HEAD, DELETE, and COPY.
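A client-side sketch of the supported call set might look like the following. It builds a request object without sending it. The /v1/account/container/object path layout follows the general OpenStack Swift convention; whether Isilon Swift uses the same prefix, and the host and port shown, are assumptions for illustration only.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

# Calls listed as supported in the release notes.
SUPPORTED_SWIFT_METHODS = {"GET", "PUT", "POST", "HEAD", "DELETE", "COPY"}


def build_swift_request(
    method: str,
    base_url: str,
    account: str,
    container: str,
    obj: Optional[str] = None,
) -> Request:
    """Build (but do not send) a Swift-style request for a supported method."""
    method = method.upper()
    if method not in SUPPORTED_SWIFT_METHODS:
        raise ValueError(f"unsupported Swift call: {method}")
    parts = [account, container] + ([obj] if obj else [])
    # Assumption: OpenStack-style /v1/<account>/<container>/<object> paths.
    path = "/".join(quote(part, safe="") for part in parts)
    return Request(f"{base_url.rstrip('/')}/v1/{path}", method=method)


# Hypothetical endpoint and names.
request = build_swift_request(
    "head", "https://cluster.example.com:8080", "AUTH_demo", "photos", "cat.jpg"
)
print(request.get_method(), request.full_url)
```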

SecurityTelnet_d support disabled on upgrade

Telnet service, which was removed in OneFS 7.0.0, will stop functioning on upgradeto 7.2.0. SSH should be used for all shell access.

SMB

Support for SMB2 symbolic links

Beginning in OneFS 7.2.0, OneFS natively supports translation of SMB2 symbolic links. This change might affect the behavior of SMB2 symbolic links in environments that rely on them. For more information, see article 193808 on the EMC Online Support site.

New and changed in OneFS 7.2.0.4

Authentication

New and changed in OneFS 7.2.0.4 ID

A user that attempts to connect to the cluster over SSH, through the OneFS API, or through a serial cable, can no longer be authenticated on clusters running in compliance mode if any of the following identifiers are assigned to the user as either the user's primary ID or as a supplemental ID:

l UID: 0

l SID: S-1-22-1-0

156600

The message that is logged in the /var/log/lsassd.log file when a trusted Active Directory domain is offline now includes the name of the domain that cannot be reached. In the example below, <domain_name> is the name of the domain that is offline:

[lsass] Domain '<domain_name>' is offline

151058

File system


If you run the stat command to view information about a file, the Snapshot ID of the file is now included in the output. This information appears in the st_snapid field.

147333

Hardware


Wear life thresholds were added for the system area on the following Sunset Cove Plus SSD drive models:

l HGST HUSMM1620ASS200





The addition of these thresholds enables OneFS to generate alerts and log events if the wear life of the system area on these SSD drive models reaches 88 percent (warn), 89 percent (critical), or 90 percent (smartfail).

156892
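The three wear life thresholds described above can be read as a simple classification. The sketch below is illustrative only; the wear_life_state function and the "ok" state are hypothetical names, and only the 88, 89, and 90 percent values come from this release note.

```python
# Thresholds from the release note, checked from most to least severe.
THRESHOLDS = [(90, "smartfail"), (89, "critical"), (88, "warn")]


def wear_life_state(percent_used: float) -> str:
    """Map a system-area wear life percentage to the state it would trigger."""
    for limit, state in THRESHOLDS:
        if percent_used >= limit:
            return state
    return "ok"  # hypothetical label for "below every threshold"
```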

New control: Options were added to the isi_dsp_install command to enable you to display the version number of the most recently installed drive support package (DSP) or to display a list of previously installed DSPs. To display the version number of the most recently installed DSP, run the following command:

isi_dsp_install --latest

Output similar to the following is displayed:

2015-06-22 15:02:21 || Drive_Support_v1.7.tar

To display a list of previously installed DSPs, run the following command:

isi_dsp_install --history

Output similar to the following is displayed:

2015-06-22 15:00:20 || Drive_Support_v1.5.tar
2015-06-22 15:01:36 || Drive_Support_v1.6.tar
2015-06-22 15:02:21 || Drive_Support_v1.7.tar

154222
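If you capture the history output for scripting, the "timestamp || package" layout shown above is straightforward to parse. The following is a minimal sketch that assumes the output format matches the sample exactly; parse_dsp_history is a hypothetical helper, not part of OneFS.

```python
from datetime import datetime


def parse_dsp_history(output: str):
    """Parse 'YYYY-MM-DD HH:MM:SS || package' lines into (datetime, package) pairs."""
    entries = []
    for line in output.splitlines():
        if "||" not in line:
            continue  # skip blank lines or anything that is not a history entry
        stamp, package = (part.strip() for part in line.split("||", 1))
        entries.append((datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), package))
    return entries
```

The most recently installed package is then the entry with the latest timestamp, for example max(parse_dsp_history(text))[1].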




The error that appears if you run the isi_dmilog command on a platform that does not support the command was changed from

dmilog functions not supported on this platform

to

dmilog functions not supported on this platform - please consult 'isi_hwmon -h'

For more information about the isi_hwmon command, see article 199270 on the EMC Online Support site.

150724

HDFS


Support for Ambari 2.0.2 was added.

157860

1.7.0_IBM HDFS was added to the list of supported Ambari servers.

154873

Networking


The default network flow control setting for Isilon nodes that contain Intel network interface cards (ixgbe NICs) was changed. The default flow control setting is now 1.

The ixgbe NIC can receive pause frames but does not send pause frames. This configuration is consistent with Isilon nodes that contain Chelsio NICs.

Note

Ethernet flow control in a full-duplex physical link provides a mechanism that will allow an interface or switch to request a short pause in frame transmission from a sender by issuing a media access control (MAC) control message and PAUSE specification as described in the 802.3x full-duplex supplement standard.

151707

Security


On clusters running in compliance mode, you can no longer run the su command to assume the privileges of a user with root-level (UID=0) access to the cluster. If you attempt to run the su command to assume the privileges of a user with root-level privileges, the following message appears on the console:

su: UID 0 denied by compliance mode

Note

Due to this change in behavior, beginning in OneFS 7.2.0.4, clusters running in compliance mode cannot be reimaged by running the sudo isi_reimage command.

157417

The network time protocol (NTP) service was updated to version 4.2.8p1. For more information, see ESA-2015-154 on the EMC Online Support site.

154655

The version of OpenSSL that is installed on the cluster was updated to version 0.9.8zg.

145892

Upgrade and installation


Beginning in OneFS 7.1.0, the file in which cluster configuration information is stored was changed from a plain text file (gconfig) to a database file (isi_gconfig.db). In conjunction with this change, the maximum allowed size of the configuration information for an SMB share was limited to 8192 bytes (8 KB). Beginning in OneFS 7.2.0.4, the OneFS pre-upgrade check checks the size of the configuration information for each SMB share prior to upgrading the cluster, and the cluster is prevented from being upgraded if the configuration size is greater than 8 KB.

The pre-upgrade check can be run alone, or as part of the upgrade process. In either case, if the configuration size of an SMB share exceeds the maximum size allowed, a message similar to the following appears on the console during the pre-upgrade check:

Error: The 'share_name' share has too many access permissions and it cannot be upgraded.
The suggested resolution for this issue is:
1. Remove those users from the share permissions.
2. Add those users to a group.
3. Add that group to the share permissions.
4. Retry the upgrade.

If the pre-upgrade check detects that the configuration size of an SMB share exceeds the maximum size allowed when it is running as part of the default upgrade process, the pre-upgrade check portion of the upgrade completes, but the OneFS upgrade is not started, and a message similar to the following appears on the console and in the SMB upgrade log file located in the /ifs/.ifsvar/tmp directory:

Error: The 'share_name' share has too many access permissions and it cannot be upgraded.
The suggested resolution for this issue is:
1. Remove those users from the share permissions.
2. Add those users to a group.
3. Add that group to the share permissions.
4. Retry the upgrade.

Under these conditions, the upgrade process cannot be completed until the SMB share configuration information is reduced in size. In most cases, this can be accomplished by following the resolution suggested during the pre-upgrade check.

156585





If you encounter this limitation and cannot reduce the size of the SMB configuration information by following these steps, contact EMC Isilon Technical Support for assistance.

Note

Prior to the addition of this check, if the configuration size of an SMB share on a cluster that was being upgraded to OneFS 7.1.0 or later exceeded the maximum size allowed, some of the share information might not have been preserved during the upgrade process, and an error similar to the following might have appeared in the /var/log/isi_gconfig_d.log file:

Update error: value for key 'share_name' has size (12324) greater than max allowed value size (8192)
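To find out which shares were affected on a cluster that was upgraded before this check existed, you could search the log for the error shown above. The sketch below assumes the message wording matches the sample exactly; the regular expression and the find_oversized_keys helper are illustrative and not part of OneFS.

```python
import re

# Pattern derived from the sample message above; the exact wording is assumed.
OVERSIZE_RE = re.compile(
    r"value for key '(?P<key>[^']+)' has size \((?P<size>\d+)\) "
    r"greater than max allowed value size \((?P<limit>\d+)\)"
)


def find_oversized_keys(log_text: str):
    """Return (key, size, limit) for each oversize error found in the log text."""
    return [
        (m.group("key"), int(m.group("size")), int(m.group("limit")))
        for m in OVERSIZE_RE.finditer(log_text)
    ]
```

Each result reports how far the stored value exceeded the 8192-byte limit, which indicates how many share permissions need to be consolidated into groups.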

Although the isi pkg command was not intended to be used to install a drive support package (DSP), it was possible to install a DSP by running the isi pkg command. If a DSP was installed using the isi pkg command, the cluster might have exhibited unexpected behavior until the DSP was removed.

Beginning in OneFS 7.2.0.4, if you attempt to install a DSP using the isi pkg command, the installation fails and a message similar to the following appears in the /var/log/isi_pkg log file:

Package <PACKAGE NAME> must be installed with isi_dsp_install.

153429

New and changed in OneFS 7.2.0.3 (Target Code)

Cluster configuration


The output from the sysctl efs.bam.disk_pool_db command now shows the equivalence_id for pool groups. The ID helps Isilon Technical Support to identify internal data structure values when troubleshooting issues related to storage pools.

150558

More detailed logging was added to help diagnose issues that occur when SmartPools are upgraded during a OneFS upgrade and to help diagnose issues that occur after running the smartpools-upgrade command.

Note

This new information appears in the /var/log/messages file.

149686

Hardware


A new version of the QLogic BXE driver was incorporated into this release.

152083




Adds a check to the OneFS software event 400120001 to detect boot drives that are missing mirror components.

145967

Improves the node format command so that the progress of the node format operation is reported in percentage complete. Prior to this change, dots were displayed on the console until the operation was complete.

142241

Removed redundant requests for a node's sensor data from the isi_hw_status command, to improve the response time on A100, S210, X410, and HD400 nodes.

142147

HDFS


Support for Ambari 2.0.1 and 2.1.0 was added.

153925

Support for the HDFS truncate remote procedure call was added.

143461


Networking


Support for PTR record lookup for SmartConnect zone member addresses was added.

149662

New control: The following parameters were added to the isi networks command:

l --disable-dns-tcp-support

l --enable-dns-tcp-support

The --enable-dns-tcp-support parameter can be used to enable TCP support for SmartConnect; the --disable-dns-tcp-support parameter can be used to disable it. By default, TCP support is enabled and SmartConnect works as expected. If TCP support is disabled, SmartConnect does not listen for TCP connections on the DNS port (53), and clients that attempt a DNS query over TCP receive a connection refused error.

145012
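One way to confirm from a client whether SmartConnect is accepting TCP connections on the DNS port is to attempt a plain TCP connection. This is a minimal sketch, not an EMC-provided tool; the dns_tcp_reachable function is hypothetical, and a successful connection shows only that something is listening on the port, not that DNS queries are answered.

```python
import socket


def dns_tcp_reachable(host: str, port: int = 53, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused" (TCP support disabled) as well as timeouts.
        return False
```

Pass the SmartConnect service IP address as host. A result of False is consistent with the connection refused behavior described above when TCP support is disabled, although a firewall or an unreachable address produces the same result.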

Security


The version of Apache that is installed on the cluster was updated to version 2.2.29. For more information, see ESA-2015-093 on the EMC Online Support site.

136994




SMB


Improves logging operations performed by the SRVSVC process, as follows:

l The default logging level for the srvsvc process was changed from WARNING to INFO.

l The user name and domain name for the user performing an action is logged in the /var/log/srvsvcd.log file, in addition to the SID.

l The action of modifying or deleting an SMB share via the Microsoft Management Console (MMC) snap-in is logged in the /var/log/srvsvcd.log file, including the user name.

An example of the new logging output appears below, where <USER SID info> is the name and SID of the user and <SMB_SHARE_NAME> is the name of the share:

Log level changed to INFO
DOMAIN_NAME\USER_NAME <USER SID info> set info on share SMB_SHARE_NAME
DOMAIN_NAME\USER_NAME <USER SID info> deleted share SMB_SHARE_NAME

149826,149776

Adds support for the SMB2_CREATE_QUERY_ON_DISK_ID (QFid) SMB CREATE Request value.

Note

Support for the SMB 2 QFid SMB CREATE Request value allows a file opened from an SMB share to be temporarily cached on an SMB 2 client, reducing some network traffic associated with opening and closing the file.

149777

New and changed in OneFS 7.2.0.2


Antivirus


The MCP virus_scan parameter was added to the isi_rpc_d configuration file.

142083

Authentication


The number and type of actions that are logged when a machine password change triggers a configuration update were increased. Beginning in OneFS 7.2.0.2, if a machine password is updated, the following activities are logged:

l The time at which an lsass thread starts the machine password update

l The result of the attempt to update the password on a domain controller

l The result of the LDAP confirmation of the password version

l The result of updating the /ifs/.ifsvar/pstore.gc file

l The success or failure of the password update attempt

138759



Logging was added to help identify issues that are caused by applying restrictive permissions to the /usr/share/zoneinfo directory or its subdirectories.

Note

It is possible to apply permissions to the /usr/share/zoneinfo directory and its subdirectories that will prevent the isi_papi_d process from reading necessary files. If the isi_papi_d process cannot access these files, the OneFS web administration interface cannot start, and lines similar to the following appear in the /var/log/messages file:

/boot/kernel.amd64/kernel:[kern_sig.c:3349](pid 10953="isi_papi_d")(tid=100317) Stack trace:
/boot/kernel.amd64/kernel: Stack:--------------------------------------------------
/boot/kernel.amd64/kernel:/lib/libc.so.7:strlcpy+0x15
/boot/kernel.amd64/kernel:/usr/lib/libisi_config.so.1:arr_dev_type_parse+0xb23
/boot/kernel.amd64/kernel:/usr/lib/libisi_config.so.1:_arr_config_load_from_impl+0x174
/boot/kernel.amd64/kernel:/usr/lib/libisi_platform_api.so.1:_ZN24cluster_identity_handler8http_getERK7requestR8response+0x39
/boot/kernel.amd64/kernel:/usr/lib/libisi_rest_server.so.1:_ZN11uri_handler19execute_http_methodERK7requestR8response+0x57d
/boot/kernel.amd64/kernel:/usr/lib/libisi_rest_server.so.1:_ZN11uri_manager15execute_requestER7requestR8response+0x100
/boot/kernel.amd64/kernel:/usr/lib/libisi_rest_server.so.1:_ZN14request_thread7processEP12fcgi_request+0xbd
/boot/kernel.amd64/kernel:/usr/lib/libisi_rest_server.so.1:_ZN14request_thread6on_runEv+0x1b
/boot/kernel.amd64/kernel:/lib/libthr.so.3:_pthread_getprio+0x15d
/boot/kernel.amd64/kernel:--------------------------------------------------
/boot/kernel.amd64/kernel: pid 10953 (isi_papi_d), uid 1: exited on signal 11 (core dumped)

138729

File system


New control: The --reserved option was added to the isi get command, and the isi get command was modified so that it runs only on specific, reserved logical inodes (LINs) when the command is run with both the --reserved option and the -L option.

141959

Logging similar to the following was added to the /var/log/messages file if the NVRAM journal cannot be read:

Bad type: 0

139667

Logging was added to improve diagnosis of issues that can occur if a necessary OneFS python file fails to load. If this condition is encountered, a message similar to the following appears in the /var/log/messages file where <python_file> is the name of the python file that failed to load:

python: Failed to import isi.app.lib.cluster in <python_file>

Note

In addition to the messages described above, if you run the isi stat command or the isi events list -w command, a bad marshal error appears on the console. If you encounter the issue that this new logging is intended to help diagnose, contact EMC Isilon Technical Support for assistance. For more information about this issue, see article 197403 on the EMC Online Support site.

138733

HDFS



Support for the getEZForPath and checkAccess HDFS RPC calls was added.

Note

In previous versions of OneFS, if an HDFS client sent a request to the HDFS server that contained one of these RPC calls, the call failed, and messages similar to the following were returned to the client:

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException): Unknown rpc: getEZForPath

and

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException): Unknown rpc: checkAccess

142558,140040

Support for Ambari version 1.7.0 was added.

140051




Security


The version of GNU bash installed on the cluster was updated to version 4.1.17. For more information, see ESA-2014-146 on the EMC Online Support site.

143337

User input that is passed to a command line is now escaped using quotation marks. For more information, see ESA-2015-112 on the EMC Online Support site.

140931

An update was applied to address a denial of service vulnerability in Apache HTTP Server. For more information, see ESA-2015-093 on the EMC Online Support site.

137884


Authentication


Adds the ability to enable Telnet on the cluster. For more information, see article 198100 on the EMC Online Support site.

137111

Adds a setting to the OneFS registry that enables you to configure the maximum amount of memory that can be allocated to the lsass process.

Note

Without this setting, the maximum amount of memory that can be allocated to the lsass process is set to a default of 512 MB. If the system approaches that limit, LDAP connections are closed, and the following lines appear in the lsassd.log file:

Error code 40286 occurred during attempt 0 of a ldap search. Retrying.
Error code 40286 occurred during attempt 1 of a ldap search. Retrying.
Error code 40286 occurred during attempt 2 of a ldap search. Retrying.
Error code 40286 occurred during attempt 1 of a ldap search. Retrying.
Error code 40286 occurred during attempt 1 of a ldap search. Retrying.

Work with EMC Isilon Support to determine whether you need to configure the amount of memory allocated to the lsass process. The memory limit must be at least 512 MB, and no more than 1024 MB. If the memory limit is set outside that range, the system will restore the default value of 512 MB.

For more information, see article 195564 on the EMC Online Support site.

134439
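The range rule in the note above (512 MB to 1024 MB, with out-of-range values reverting to the 512 MB default) can be expressed as a small function. This sketch only illustrates the documented rule; effective_lsass_limit_mb is a hypothetical name, and the actual registry setting is described in article 195564.

```python
DEFAULT_LIMIT_MB = 512   # default restored when the setting is out of range
MIN_LIMIT_MB = 512       # lowest accepted value
MAX_LIMIT_MB = 1024      # highest accepted value


def effective_lsass_limit_mb(requested_mb: int) -> int:
    """Return the limit that takes effect for a requested lsass memory limit."""
    if MIN_LIMIT_MB <= requested_mb <= MAX_LIMIT_MB:
        return requested_mb
    return DEFAULT_LIMIT_MB
```

Note that, under this rule, a value that is too high is not clamped to 1024 MB; it falls back to 512 MB.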





https://support.emc.com/kb/202878





Updates the time zone database that OneFS relies on when you configure the cluster time zone to Time Zone Data v. 2014h. This database is made available by the Internet Assigned Numbers Authority (IANA).

135492

Diagnostic tools


New control: The following options were added to the isi_gather_info command:

l --dump and --cores to collect the associated files for diagnosis.

l --no-cores and --no-dumps if the associated files are not needed.

l --clean-all, --clean-cores, and --clean-dumps to delete the associated files from the /var/crash directory after successful compression of the package.

Note

dump refers to files that are logged when the node stops responding, and core refers to files that are logged when the node unexpectedly restarts.

135226

File transfer


The throughput calculation performed by the vsftpd process was improved so that the total throughput perceived by FTP clients is more precisely controlled by configuring the local_max_rate option in the /etc/mcp/templates/vsftpd.conf file.

Note

Prior to implementing this fix, after configuring the local_max_rate option, the total throughput perceived by FTP clients was lower than expected.

134432

HDFS


Support for Cloudera 5.2 was added.

138484




Security


The version of OpenSSL installed on the cluster was updated to 0.9.8zc.

137904

The versions of the Network Time Protocol daemon (NTPD) and Apache were updated as follows:

l The version of Apache that is installed on the cluster was updated from 2.2.21 to 2.2.25

l The version of NTPD that is installed on the cluster was updated from 4.2.4p4 to 4.2.6p5

137895

The version of ConnectEMC installed on the cluster was updated from version 3.2.0.4 to 3.2.0.6. This upgrade changes the behavior of the ConnectEMC component so that it no longer uses an internal version of OpenSSL and instead relies on the version of OpenSSL installed on the Isilon cluster. For more information, see ESA-2015-038 on the EMC Online Support site.

134760

SmartLock


Adds commands to the sudoers file, which is a file that defines which commands a user with sudo privileges is permitted to run. These additional commands enable EMC Isilon Technical Support staff to troubleshoot clusters that are in compliance mode.

133285

SmartQuotas


New control: The efs.quota.allow_remote_root sysctl parameter was added to allow a root user who is connected to the cluster remotely to make changes to files and directories within a SmartQuota domain, even if those changes would exceed or further exceed the quota domain’s hard threshold.

For more information about sysctls, see article 89232 on the EMC Online Support site.

131283

SMB


New control: The following option was added to the gconfig file:

registry.Services.lwio.Parameters.Drivers.onefs.FileAttributeEncryptedIgnored

If this option is enabled, Windows offline encrypted files will be synchronized in unencrypted format when an affected user reconnects to the cluster. To enable this option, run the following command:

isi_gconfig registry.Services.lwio.Parameters.Drivers.onefs.FileAttributeEncryptedIgnored=1

136296

New control: The SMB 1 maximum buffer size can now be configured to meet the requirements of Kazeon applications.

To configure the SMB 1 maximum buffer size:

1. Open an SSH connection on any node in the cluster and log on using the root account.

2. Run the following command from the command line, where <max_buffer> is the desired maximum buffer size:

isi_gconfig registry.Services.lwio.Parameters.Drivers.srv.MaxBufferSizeSMB1=<max_buffer>

Note

For optimal interoperability with Kazeon, the maximum buffer size should be set to 16644.

134448



CHAPTER 4

New hardware and firmware support

The following sections list new support for hardware and firmware revisions that was added in the specified OneFS releases.

l New hardware and firmware support in OneFS 7.2.0.4

l New hardware and firmware support in OneFS 7.2.0.3 (Target Code)

l New hardware and firmware support in OneFS 7.2.0.2

l New hardware and firmware support in OneFS 7.2.0.1

l New hardware and firmware support in OneFS 7.2.0.0


New hardware and firmware support in OneFS 7.2.0.4

| Hardware | Model Number | Drive Type | Compatible Nodes | Firmware |
| --- | --- | --- | --- | --- |
| Hardware: Support for SMART iSATA SG9SLM3B8GBM11ISI 8 GB boot flash drives was added. (156892) | SG9SLM3B8GBM11ISI | Boot flash | IQ108NL, NL400, S200, X200, X400 | Ver7.02k or Ver7.02w |

New hardware and firmware support in OneFS 7.2.0.3 (Target Code)

| Hardware | Model Number | Drive Type | Compatible Nodes | Firmware |
| --- | --- | --- | --- | --- |
| Adds support for the Sunset Cove Plus 800 GB drives with D252 firmware. (146915) | HGST HUSMM1680ASS205 | SED SSD | HD400, NL400, X200, X400, X410 | D252 |
| Added support for D254 firmware, which is installed on HGST Ultrastar 800 GB drives. (146915) | HGST HUSMM8080ASS205 | SED SSD | S210, X200, X400, X410, NL400 | D254 |
| Fixes upgrade path from MKAOA580 firmware, which is installed on 3 TB drives. (146915) | HGST HUA723030ALA640 | HDD | X200, X400, NL400, IQ 108NL, IQ 108000X | MKAOA580 |




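New hardware and firmware support in OneFS 7.2.0.2

| Hardware | Model Number | Drive Type | Compatible Nodes | Firmware |
| --- | --- | --- | --- | --- |
| Adds support for Sunset Cove 800 GB SED SSDs with D252 firmware. (144840) | HGST HUSMM1680ASS205 | SED SSD | X200, X400, NL400, X410, HD400 | D252 |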






New hardware and firmware support in OneFS 7.2.0.1

| Hardware | Model Number | Drive Type | Compatible Nodes | Firmware |
| --- | --- | --- | --- | --- |
| Adds support for Sunset Cove Plus 1.6 TB drives with A204 firmware. (134055) | HGST HUSMM1616ASS200 | SSD | S200, S210, X200, X400, X410 | A204 |
| Adds support for A204 firmware for Sunset Cove Plus 800 GB drives. (134055) | HGST HUSMM1680ASS200 | SSD | S200, S210, X200, X400, X410, NL400 | A204 |
| | HGST HUSMM1640ASS200 | SSD | S200, S210, X200, X400, X410 | A204 |
| | HGST HUSMM1620ASS200 | SSD | S200, S210, X200, X400, X410 | A204 |
| Adds support for new 32 GB Smart Modular boot flash drives with an A19 controller. (134055) | SHMST6D032GHM11EMC 118000100 | Boot SSD | X410, S210, HD400 | S8FM08.0 |
| Adds support for 1EZ firmware, which is installed on HGST 6 TB drives. (134055) | HGST HUS726060ALA640 | HDD | HD400, NL400 | 1EZ |
| Adds support for A006 firmware, which is installed on 3 TB Seagate Mantaray SEDs. (134055) | ST33000652SS | HDD | X200, X400, NL400 | A006 |
| Adds support for firmware revision MFAOABW0, which is installed on 4 TB Hitachi Mars-K Plus SATA drives. (134055) | HGST HUS724040ALA640 | SATA | X200, X400, NL400, IQ72000X, IQ72NL | MFAOABW0 |
| Adds support for firmware revision MF8OABW0, which is installed on 3 TB Hitachi Mars-K Plus SATA drives. (134055) | HGST HUS724030ALA640 | SATA | X200, X400, NL400, IQ108NL | MFAOABW0 |
| Adds support for firmware revision MF6OABW0, which is installed on 2 TB Hitachi Mars-K Plus SATA drives. (134055) | HGST HUS724020ALA640 | SATA | X400, NL400 | MFAOABW0 |

New hardware and firmware support in OneFS 7.2.0.0

No new hardware or firmware support was added in this release.



CHAPTER 5

Resolved issues

This section contains the following topics:

l Resolved in OneFS 7.2.0.4

l Resolved in OneFS 7.2.0.3 (Target Code)

l Resolved in OneFS 7.2.0.2

l Resolved in OneFS 7.2.0.1

l Resolved in OneFS 7.2.0.0


Resolved in OneFS 7.2.0.4

Antivirus

Antivirus issues resolved in OneFS 7.2.0.4 ID

The OneFS web administration interface did not list any files in the Detected Threats section of the Antivirus > Reports page if any ASCII special characters, for example an ampersand (&), were in the path name of any infected file.

153117

The OneFS antivirus client could not connect to some ICAP servers if the ICAP URL that you configured on the cluster was not in the following format:

icap://<hostname>:<port>/avscan

144726
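On releases that still have this limitation, it can help to check a URL against the icap://<hostname>:<port>/avscan format before configuring it. The following sketch uses only the Python standard library; is_expected_icap_url is a hypothetical helper, and the host name and port in the usage example are illustrative.

```python
from urllib.parse import urlsplit


def is_expected_icap_url(url: str) -> bool:
    """Return True if url has the form icap://<hostname>:<port>/avscan."""
    parts = urlsplit(url)
    try:
        port = parts.port  # raises ValueError if the port is not a valid number
    except ValueError:
        return False
    return (
        parts.scheme == "icap"
        and bool(parts.hostname)
        and port is not None
        and parts.path == "/avscan"
        and not parts.query
        and not parts.fragment
    )
```

For example, is_expected_icap_url("icap://av.example.com:1344/avscan") returns True, while a URL that omits the port or uses a different path returns False.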

Authentication

Authentication issues resolved in OneFS 7.2.0.4 ID

A local user who did not have root privileges could not change their password by running the UNIX passwd command. As a result, if an affected user’s password expired, they were unable to log on to the cluster until the password was reset through another method.

155570

If an SMB client sent a request to apply an invalid security identifier (SID) to a file or directory on the cluster, the cluster returned a STATUS_IO_TIMEOUT response. Depending on the application that was used to send the request, a message similar to the following might have appeared on the client:

The specified network name is no longer available

154257

If the cluster was not joined to a Microsoft Active Directory (AD) domain, and you attempted to change the access control list (ACL) of a file on the cluster from a Windows client, the operation failed, and a message similar to the following appeared on the client:

The program cannot open required dialog box because it cannot determine whether the computer named "10.0.1.1" is joined to a domain. Close this message, and try again.

Under these conditions, ACLs could only be modified through the OneFS command-line interface.

150915

Backup, recovery, and snapshots

Backup, recovery, and snapshots issues resolved in OneFS 7.2.0.4 ID

While synchronizing data between source and target clusters in compliance mode, if the file flags applied to a file on the source cluster differed from the file flags assigned to the file on the target cluster, SyncIQ attempted to update the file attributes of WORM committed files on the target cluster even if the retention date for those files had not yet passed. As a result, the synchronization failed. If this issue occurred, lines similar to the following appeared in the /var/log/messages file:

bam_ads_setmode error: 30
Local error : syncattr error for <path_to_WORM_file>:chfal

157106

During an initial SyncIQ data replication, Access Control Lists (ACLs) applied to symbolic links, pipes, block devices, and character devices were not replicated from the SyncIQ source cluster to the SyncIQ target cluster. As a result, following an initial synchronization, applications and users were prevented from accessing these file system objects and were also prevented from accessing files and directories on the cluster through symbolic links.

155965

When performing an NDMP restore, OneFS verifies the end of the data stream by detecting two consecutive blocks of zeroes. In rare cases, in OneFS 7.2.0.0 through 7.2.0.3, if the second block of zeroes was stored in a different buffer than the first block of zeroes, OneFS did not read the second block of zeroes from the other buffer, and instead read the data that followed the first block of zeroes. If this occurred, the restore operation was immediately stopped, and data that was in the process of being restored might have been incompletely restored.

This issue did not occur if the RESTORE_OPTIONS NDMP environment variable was set to 1, specifying that a single-threaded restore operation be performed.

Note

By default, restore operations are multi-threaded.

155782
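The end-of-stream check described in this issue depends on recognizing two consecutive zero blocks even when they arrive in different buffers. The sketch below illustrates that idea only; it is not the OneFS NDMP code. The 512-byte block size and the find_end_of_stream function are assumptions made for the example.

```python
BLOCK_SIZE = 512  # assumed block size, for illustration only


def find_end_of_stream(buffers, block_size=BLOCK_SIZE):
    """Return the stream offset where two consecutive all-zero blocks begin, or None."""
    zero_block = bytes(block_size)
    pending = b""              # unprocessed bytes carried across buffer boundaries
    offset = 0                 # stream offset of the first byte in `pending`
    previous_was_zero = False  # state is kept between buffers, not reset per buffer
    for chunk in buffers:
        pending += chunk
        while len(pending) >= block_size:
            block, pending = pending[:block_size], pending[block_size:]
            if block == zero_block:
                if previous_was_zero:
                    return offset - block_size  # start of the first zero block
                previous_was_zero = True
            else:
                previous_was_zero = False
            offset += block_size
    return None
```

Because previous_was_zero and pending persist from one buffer to the next, the marker is found regardless of where the buffer boundary falls, which is the property the fix restores.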

If you attempted to run a SyncIQ job from OneFS 5.5 to OneFS 7.x and the job did not have a valid policy ID, the job stopped without dispatching a failure message, and an error similar to the following appeared in the /var/log/messages file:

Stack: -------------------------------------------------
/lib/libc.so.7:__sys_kill+0xc
/usr/lib/libisi_util.so.1:isi_assert_halt+0xa0
/usr/lib/libisi_migrate_private.so.2:get_lmap_name+0x54
/usr/bin/isi_migr_sworker:work_init_callback+0xacd
/usr/bin/isi_migr_sworker:old_work_init4_callback+0x16f
/usr/lib/libisi_migrate_private.so.2:generic_msg_unpack+0x8bc
/usr/lib/libisi_migrate_private.so.2:migr_process+0x2f1
/usr/bin/isi_migr_sworker:main+0xafa
/usr/bin/isi_migr_sworker:_start+0x8c
--------------------------------------------------
/boot/kernel.amd64/kernel: pid 24302 (isi_migr_sworker), uid 0: exited on signal 6 (core dumped)

Note

Starting in OneFS 7.2.0.4, the following message will appear in the /var/log/isi_migrate.log file:

Source version unsupported. 'sync_id' must contain a valid policy id.

154830

If the force_interface option was enabled on a SyncIQ policy, the SyncIQ scheduler process, isi_migr_sched, leaked memory. If this occurred, scheduled policies stopped running, and the following message appeared in the /var/log/isi_migrate.log file:

Cannot allocate memory

154326

If you set the BACKUP_OPTIONS NDMP environment variable to a value of 7 to run incremental, token-based backups, OneFS created entries in the dumpdates file for all levels of backup, rather than creating dumpdates entries only for level 10, incremental backups. As a result, NDMP snapshots never expired.

154311

If you used the snapshot-based incremental backup feature during a backup operation and if multiple snapshots were created between backups, the feature might have failed to recognize that data had changed during the backup procedure. As a result, some changed files were not backed up.

For more information, see ETA 203815 on the EMC Online Support site.

154269

If you configured SyncIQ policies to run when source files were modified by setting the Run Job option to Whenever the source is modified, a memory leak could have occurred in the SyncIQ scheduler (isi_migr_sched). If this issue occurred, new SyncIQ jobs were not scheduled, some data was unavailable, and a message similar to the following appeared in the /var/log/isi_migrate.log file:

Could not allocate parser read buffer: Cannot allocate memory

154259

If you performed an NDMP direct access recovery (DAR) or selective restore on an Isilon cluster, OneFS assigned ownership of the restored directories to the root account. Because clusters in compliance mode do not have a root account, the restored directories were inaccessible on clusters in compliance mode, unless the compadmin user was logged on to the compliance cluster.

154250

Although multiple IPv4 and/or IPv6 addresses were defined, NDMP listened to only one IPv4 and/or one IPv6 IP address. For example:

l If a node had multiple IPv4 addresses defined, NDMP listened to only one IPv4 address.

l If a node had multiple IPv6 addresses defined, NDMP listened to only one IPv6 address.

l If a node had both IPv4 addresses and IPv6 addresses defined, NDMP listened to only one IPv4 address and only one IPv6 address.

154248

During a snapshot-based incremental backup, a Write Once Read Many (WORM) file might have been backed up as a regular file. If this occurred, and the files were restored, the files were restored as regular files, and they could have been modified after they were restored.

154246

If the isi_ndmp_d process was stopped, the NDMP process ID file was still locked by one or more NDMP child processes. As a result, the mcp process could not restart the isi_ndmp_d process, and no new NDMP connections could be established. If this occurred, a Failed to spawn NDMP daemon message appeared in the /var/log/isi_ndmp_d.log file.

154244

If you queried for the date on which a SyncIQ policy would next be run using the next_run OneFS API property, the date and time that was returned was incorrect.

154211

Resolved issues




During Data Management Application (DMA) polling, if no tape was loaded in a backup drive, OneFS set the drive to unbuffered mode. As a result, if a non-Isilon backup initiator did not set the tape drive to buffered mode before starting a backup-to-tape, the backup-to-tape performance by non-Isilon initiators might have been very slow.

Note

This was not an issue if the tape drive was used only by Isilon backup accelerators.

153451

While a SyncIQ policy was running, if a SyncIQ primary worker (pworker) process on the source cluster sent a list of directories to delete to a secondary worker (sworker) on the target cluster, and then the pworker process unexpectedly stopped, the pworker's work range was transferred to another pworker. The other pworker then sent the list of directories to another sworker. This action resulted in two sworker processes on the target cluster trying to delete the same directory at the same time. If this issue occurred, the SyncIQ job stopped, and lines similar to the following appeared in the /var/log/messages file:

/boot/kernel.amd64/kernel:[kern_sig.c:3349](pid 70="isi_migr_sworker")(tid=2) Stack trace:
/boot/kernel.amd64/kernel: Stack:--------------------------------------------------
/boot/kernel.amd64/kernel: /usr/bin/isi_migr_sworker:move_dirents+0x1b6
/boot/kernel.amd64/kernel: /usr/bin/isi_migr_sworker:delete_lin+0x279
/boot/kernel.amd64/kernel: /usr/bin/isi_migr_sworker:delete_lin_callback+0x143
/boot/kernel.amd64/kernel: /usr/lib/libisi_migrate_private.so.2:generic_msg_unpack+0x8bc
/boot/kernel.amd64/kernel: /usr/lib/libisi_migrate_private.so.2:migr_process+0x2f1
/boot/kernel.amd64/kernel: /usr/bin/isi_migr_sworker:main+0xa18
/boot/kernel.amd64/kernel: /usr/bin/isi_migr_sworker:_start+0x8c
/boot/kernel.amd64/kernel: --------------------------------------------------
/boot/kernel.amd64/kernel: pid 70 (isi_migr_sworker), uid 0: exited on signal 10 (core dumped)

Additionally, the following error might have appeared in the /var/log/isi_migrate.log file:

Error : Unable to open lin 0:Invalid argument: Invalid argument from remove_entry_from_parent (utils.c:1516) from remove_single_entry (utils.c:1595) from remove_all_parent_dirents (utils.c:1680) from delete_lin (stf_transfer.c:784)

153446

If a SyncIQ policy designated a target directory that was nested within the SyncIQ target directory of a pre-existing policy, an error occurred during SyncIQ protection domain creation which caused the SyncIQ policy's protection domain to be incomplete. If this occurred, the following message appeared in the /var/log/isi_migrate.log file:

create_domain: failed to ifs_domain_add

In addition, if you ran the isi domain list -lw command, the Type field for the affected SyncIQ target was marked Incomplete.

153444




If you ran a full SyncIQ data replication to a target directory that contained a large number of files that no longer existed in the source directory, it was possible for the process that removes extra files from a target directory to conflict with the process that created the domain for the target directory. If this occurred, the SyncIQ job failed and had to be restarted.

153437

If the --skip_bb_hash option of an initial SyncIQ policy was set to no (the default setting), and if a SyncIQ file split work item was split between pworkers, the pworker that was handling the file split work item might have attempted to transfer data that had already been transferred to the target cluster. If this occurred, the isi_migr_pworker process repeatedly restarted and the SyncIQ policy failed. In addition, the following lines appeared in the /var/log/messages file:

isi_migrate[45328]: isi_migr_pworker: *** FAILED ASSERTION cur_len != 0 @ /usr/src/isilon/bin/isi_migrate/pworker/handle_dir.c:463:
/boot/kernel.amd64/kernel: [kern_sig.c:3376](pid 45328="isi_migr_pworker")(tid=100957) Stack trace:
/boot/kernel.amd64/kernel: Stack:--------------------------------------------------
/boot/kernel.amd64/kernel:/lib/libc.so.7:__sys_kill+0xc
/boot/kernel.amd64/kernel/usr/lib/libisi_util.so.1:isi_assert_halt+0xa0
/boot/kernel.amd64/kernel:/usr/bin/isi_migr_pworker:migr_continue_file+0x1507
/boot/kernel.amd64/kernel:/usr/bin/isi_migr_pworker:migr_continue_generic_file+0x9a
/boot/kernel.amd64/kernel:/usr/bin/isi_migr_pworker:migr_continue_work+0x70
/boot/kernel.amd64/kernel:/usr/lib/libisi_migrate_private.so.2:migr_process+0xf
/boot/kernel.amd64/kernel:/usr/bin/isi_migr_pworker:main+0x606
/boot/kernel.amd64/kernel:/usr/bin/isi_migr_pworker:_start+0x8c
/boot/kernel.amd64/kernel:--------------------------------------------------
/boot/kernel.amd64/kernel: pid 45328 (isi_migr_pworker), uid 0: exited on signal 6 (core dumped)

153377

If a SyncIQ job was interrupted during the change compute deletion phase (STF_PHASE_CC_DIR_DEL), the Logical Inodes (LINs) could have been incorrectly removed from the SyncIQ job work list. If this occurred, the SyncIQ job failed, and messages similar to the following appeared in the /var/log/isi_migrate.log file:

Unable to update metadata (inode changes) information for Lin …
Operation failed while trying to detect all deleted lins in …

150613

If you viewed the details of a snapshot alias in the OneFS web administration interface, the Most Recent Snapshot Name was always No value, and the Most Recent Snapshot ID was always 0.

145938

If you started a restartable backup with a user snapshot, after the backup was completed and the BRE context was removed, the expiration time of the snapshot was changed. As a result, the snapshot might have been deleted prematurely.

144427




Cluster configuration issues resolved in OneFS 7.2.0.4 ID

If you ran the isi_ntp_config command to exclude a particular node from contacting an external Network Time Protocol (NTP) server, subsequent attempts to exclude another node failed, and, after running the command to exclude another node, a message similar to the following appeared on the console:

'str' object has no attribute 'gettext'

As a result, only one node could be excluded from contacting an external NTP server.

154322

Diagnostic tools

Diagnostic tools issues resolved in OneFS 7.2.0.4 ID

Because the following ESRS log files were not listed in the newsyslog.conf file (a configuration file that manages log file rotation), over time the files could have grown in size and could have filled the /var partition:

/var/log/GWExt.log
/var/log/GWExtHTTPS.log

Note

If the /var partition on a node in the cluster is 90% full, OneFS logs an event warning that a full /var partition can lead to system stability issues. Depending on how the cluster is configured, an alert might also be issued for this event.

154107
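The 90% threshold mentioned in the note above can be checked with a short Python sketch. The function names are hypothetical, and the calculation is only an approximation of how OneFS evaluates the partition, which these release notes do not describe.

```python
import shutil

THRESHOLD_PERCENT = 90  # threshold quoted in the note above


def percent_used(total, used):
    """Return used space as a percentage of total (0.0 when total is zero)."""
    return 100.0 * used / total if total else 0.0


def partition_is_nearly_full(path="/var", threshold=THRESHOLD_PERCENT):
    """Return True if the file system that holds `path` is at or above the threshold."""
    usage = shutil.disk_usage(path)
    return percent_used(usage.total, usage.used) >= threshold


if __name__ == "__main__":
    print(partition_is_nearly_full("/var"))
```

Because shutil.disk_usage reports raw totals, the result can differ slightly from the percentage shown by tools that account for reserved blocks.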

When EMC Secure Remote Services (ESRS) was configured on the cluster, the ESRS process automatically selected the first available IP address, rather than selecting an IP address from an IP address pool in the System access zone. Since only the System zone allows a user SSH access for remote management, if the selected IP address was not in the System access zone, EMC Isilon Support could not monitor the cluster remotely.

153455

Events, alerts, and cluster monitoring

Events, alerts, and cluster monitoring issues resolved in OneFS 7.2.0.4 ID

Because isi_rest_server, a component of the Platform API, did not check for the correct error codes when interacting with the OneFS auditing system's queue producer library (QPL), if configuration auditing was enabled and there was an error in the QPL, the error was not handled correctly. If this issue occurred, it might have prevented system configuration changes from being audited.

156400

If auditing is enabled, the audit filter waits for a response from the queue producer library (QPL) before sending audit events to the auditing process (isi_audit_d). In OneFS 7.2.0.0 through 7.2.0.3, if the QPL became disconnected from the auditing process, isi_audit_d, while the auditing process was waiting for a response, the QPL failed to send a response to the auditing process. If this occurred, auditing events continued to collect in the auditing process until the queue became full. If the auditing process queue became full, processes related to events that were being audited (for example, processes related to file system protocols and configuration changes) might have stopped working. Depending on which related processes were affected, various cluster operations could have been disrupted by this issue. For example, if configuration auditing was enabled, you might have been prevented from making configuration changes through the OneFS web administration interface.

156398

Under some circumstances, multiple isi_papi_d process threads might have called the same code at the same time. If this occurred, the isi_papi_d process might have unexpectedly restarted.

154324

If file system auditing was enabled and you configured the system to audit events in which a user renamed a file, if the user renamed the file from a Mac client connected to the cluster through a virtual private network (VPN), the complete path to the file was not always captured in the audit log. If this occurred, applications that relied on the file paths in the audit logs might have been adversely affected. Beginning in OneFS 7.2.0.4, if a user attempts to rename a file and the complete file path to the renamed file is not captured in the audit log, the file is not renamed and an error appears in the audit log.

153463

Only the root user was permitted to run the isi_audit_viewer command. This limitation prevented other users (including users with sudo privileges) from viewing configuration audit logs and protocol audit logs on the cluster.

153439

If you enabled auditing on the cluster, only nodes that had the primary external interface (em0) configured could communicate with the Common Event Enabler (CEE) server, even if a secondary interface, such as em1, was configured and active on the node. As a result, the audit logs from these nodes were not collected on the CEE server.

153432

If you configured OneFS to send syslog messages to a remote syslog server, the HOSTNAME of the cluster was not included in the messages. The absence of the HOSTNAME entry made it difficult to distinguish messages sent from multiple clusters to the same syslog server.

153417

Because the OneFS auditing system did not correctly convert a POSIX path with multiple path separators (/) into a Microsoft UNC path, if NFS protocol auditing was enabled, incorrect paths could have been recorded in the audit log and applications that rely on the information in the audit log might have been adversely affected.

150920
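The path conversion described above can be illustrated with a small Python sketch. This is not the OneFS implementation: the function name posix_to_unc and the default server and share values are hypothetical placeholders, and the sketch only shows why repeated separators must be collapsed before the path is rewritten.

```python
def posix_to_unc(posix_path, server="cluster", share="ifs"):
    """Convert a POSIX path to a UNC-style path, collapsing repeated separators.

    `server` and `share` are hypothetical placeholders, not OneFS defaults.
    """
    # Splitting on "/" and dropping empty parts collapses runs such as "//".
    parts = [part for part in posix_path.split("/") if part]
    return "\\\\" + "\\".join([server, share] + parts)


if __name__ == "__main__":
    print(posix_to_unc("/data//projects///report.txt"))
```

A converter that replaces each "/" with a backslash without first collapsing repeated separators would emit empty path components, which is the kind of incorrect path that the issue above describes.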

If file system protocol auditing was enabled and a client opened a parent directory and then opened a subdirectory or file within the parent directory, the auditing system might have incorrectly appended the subdirectory or file path to the parent directory path. If this occurred, the incorrect path might have caused an error in the auditing process and file system protocol events that were in the process of being logged might not have been captured. If the incorrect path was logged, applications that relied on file paths in the audit log might have been adversely affected.

150918



File system

File system issues resolved in OneFS 7.2.0.4 ID

If a node ran for more than 497 days without being rebooted, an issue that affected the OneFS journal buffer sometimes disrupted the drive sync operation. If this issue occurred, OneFS reported that the journal was full and, as a result, resources that were waiting for a response from the journal entered a deadlock state. Any cluster that contains a node that has run for more than 497 consecutive days with no downtime might unexpectedly reboot as a result of this issue. For more information, see ETA 202452 on the EMC Online Support site.

158417
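The release notes do not explain where the 497-day figure comes from. One common source of such a threshold is a 32-bit counter that is incremented 100 times per second; the arithmetic below checks that this assumption produces a matching number. The counter width and tick rate are assumptions for illustration only and are not confirmed by this document.

```python
TICKS_PER_SECOND = 100   # assumed tick rate, not stated in the release notes
COUNTER_BITS = 32        # assumed counter width, not stated in the release notes
SECONDS_PER_DAY = 86_400


def rollover_days(bits=COUNTER_BITS, hz=TICKS_PER_SECOND):
    """Return how many days pass before a counter of `bits` bits wraps at `hz` ticks per second."""
    return (2 ** bits) / hz / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"{rollover_days():.1f} days")
```

Under these assumptions the counter wraps after roughly 497.1 days, which is consistent with the uptime quoted above.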

If a node ran for eight months or longer without a reboot and the node's internal clock rolled over, the universal memory allocator (UMA) processed an invalid value, which prevented the UMA from reclaiming any of the memory it had allocated. If this issue occurred, the affected node might have run out of memory, causing the node to unexpectedly reboot.

157489

On a compliance mode cluster, if either the retention period or the DOS ReadOnly flag that was applied to a file on a SyncIQ source cluster was changed after the initial synchronization, subsequent incremental SyncIQ jobs failed, and messages similar to the following appeared in the /var/log/messages file, where <path> was the path to the file on the target cluster:

Local error : syncattr error for <path>: Readonly file system

This issue occurred because, under these conditions, an unnecessary chown command was also sent to the target cluster.

156270

If you installed the drive support package (DSP) 1.5 firmware update on a cluster that contained a node with solid-state drives (SSDs) that were configured for use as L3 cache, the node might have rebooted unexpectedly. If a node rebooted for this reason, messages similar to the following appeared in the /var/log/messages file:

Stack: --------------------------------------------------
kernel:sched_switch+0x125
kernel:mi_switch+0x12e
kernel:sleepq_wait+0x3a
kernel:_sleep+0x37a
efs.ko:l3_mgmt_drive_state+0x9bd
efs.ko:drv_change_drive_state+0x178
efs.ko:drv_down_drive_prepare+0x1c2
efs.ko:drv_down_drive+0x81
efs.ko:drv_unmount_drive+0x176
efs.ko:drv_modify_drive_state_down+0x1d4
efs.ko:ifs_modify_drive_state+0x35a
efs.ko:_sys_ifs_modify
cpuid = 28
Panic occurred in module efs.ko loaded at 0xffffff87bde5a000:

154266

If OneFS was not mounted on a node and you ran the isi_flush --l3-full command on that node, the node restarted unexpectedly and messages similar to the following appeared in the /var/log/messages file:

Stack: --------------------------------------------------
kernel:trap_fatal+0x9f
kernel:trap_pfault+0x386
kernel:trap+0x303
efs.ko:mgmt_finish_super+0x4e
efs.ko:l3_mgmt_nuke+0x70
efs.ko:sysctl_l3_nuke+0xcb
kernel:sysctl_root+0x132
kernel:userland_sysctl+0x18f
kernel:__sysctl+0xa9
kernel:isi_syscall+0x39
kernel:syscall+0x28b
--------------------------------------------------

154264

If you attempted to smartfail multiple nodes that were holding user locks, the lock was held by LK client entries but not present in lock failover (LKF) entries. As a result of this inconsistency, future lock attempts failed, and a manual release of the lock was required to grant the desired access.

153436

If you exceeded the number of recommended snapshots on a cluster, nodes in the cluster might have rebooted unexpectedly. If this issue occurred, lines similar to the following appeared in the /var/log/messages file:

/boot/kernel.amd64/kernel: Stack:--------------------------------------------------
/boot/kernel.amd64/kernel:kernel:isi_assert_halt+0x42
/boot/kernel.amd64/kernel:efs.ko:pset_resize+0x107
/boot/kernel.amd64/kernel:efs.ko:pset_add+0x50
/boot/kernel.amd64/kernel:efs.ko:bam_data_lock_get_impl+0x1c8
/boot/kernel.amd64/kernel:efs.ko:bam_data_lock_get+0x2b
/boot/kernel.amd64/kernel:efs.ko:ifm_read_op_init+0xa8
/boot/kernel.amd64/kernel:efs.ko:bam_mark_file_data+0xfd
/boot/kernel.amd64/kernel:efs.ko:ifs_mark_file_data+0x373
/boot/kernel.amd64/kernel:efs.ko:_sys_ifs_mark_file_data+0x166
/boot/kernel.amd64/kernel:kernel:isi_syscall+0x53
/boot/kernel.amd64/kernel:kernel:syscall+0x1db
/boot/kernel.amd64/kernel:-------------------------

152660

If you ran a SmartPools job on a file with an alternate data stream (ADS), the job sometimes failed, and continued to fail, even if the job was manually started. If the SmartPools job failed for this reason, the SmartPools process eventually stopped running scheduled jobs, and this might have caused node pools to become full, degrading cluster performance. If this occurred, the SmartPools job reported an error similar to the following in the job history report:

Node 6: pctl2_set_expattr failed: No such file or directory

151619

In some environments, where there was a heavy workload on the cluster, a node could run out of reserved kernel threads. This condition could have caused the node to restart unexpectedly. If this issue occurred, client connectivity to that node was interrupted, and lines similar to the following appeared in the /var/log/messages file:

panic @ time 1422835686.820, thread 0xffffff0248243000: ktp: No reserved threads left
cpuid = 6
Panic occurred in module efs.ko loaded at 0xffffff87b7c84000:
Stack: --------------------------------------------------
efs.ko:ktp_assign_reserve+0x29f
efs.ko:dfq_reassign_cb+0x9b
kernel:_sx_xlock_hard+0x276
kernel:_sx_xlock+0x4f
efs.ko:lki_unlock_impl+0x306
efs.ko:lk_unlock+0xbe
efs.ko:bam_put_delete_lock_by_lin+0x36
efs.ko:_bam_free_free_store+0x34
efs.ko:dfq_service_thread+0x139
efs.ko:kt_main+0x83
kernel:fork_exit+0x7f

143399

Hardware

Hardware issues resolved in OneFS 7.2.0.4 ID

In rare cases, a failing dual in-line memory module (DIMM) caused a burst of correctable error correcting code (ECC) errors. If this burst of errors was extreme (for example, if it occurred tens of thousands of times per hour), the performance of the node and the cluster might have been degraded. If this issue occurred, a message similar to the following appeared tens of thousands of times per hour in the /var/log/messages file and on the console:

RDIMM P1-DIMM1A (cpu 0, channel 0, dimm 0) non-fatal (correctable) ECC error

This issue continued until the DIMM was replaced.

156345
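To gauge how often the message above appears, the matching lines can be counted per DIMM. The following Python sketch assumes that log lines contain the message in the form shown in the sample above; the regular expression and the function name count_ecc_errors are illustrative and might need adjusting for other message variants.

```python
import re
from collections import Counter

# Pattern modeled on the sample message in the note above.
ECC_PATTERN = re.compile(
    r"RDIMM (?P<dimm>\S+) \(cpu \d+, channel \d+, dimm \d+\) "
    r"non-fatal \(correctable\) ECC error"
)


def count_ecc_errors(lines):
    """Return a Counter that maps each DIMM label to its number of ECC messages."""
    counts = Counter()
    for line in lines:
        match = ECC_PATTERN.search(line)
        if match:
            counts[match.group("dimm")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "RDIMM P1-DIMM1A (cpu 0, channel 0, dimm 0) non-fatal (correctable) ECC error",
        "unrelated line",
    ]
    print(count_ecc_errors(sample))
```

Passing an open /var/log/messages file handle as `lines` gives a per-DIMM total for the whole file.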

If the hardware abstraction layer (HAL) could not detect the network interface card (NIC) in an Isilon node, the HAL assigned an empty string to the related nic name attribute in the lni.xml file, instead of returning an empty list. As a result, when the flexnet configuration file (flx_config.xml) was updated with this information, the related <nic-name> element in the flx_config.xml file was also empty. The empty element was an invalid entry in the file and it rendered the flx_config.xml file unusable by the node. Because an updated flx_config.xml file is propagated to all nodes in the cluster, this issue could have caused all nodes in the cluster to have a flx_config.xml file with invalid entries. If this occurred, client connections to the cluster might have been disrupted until the unusable flx_config.xml file was replaced.

155333
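The empty <nic-name> condition described above can be detected with a short Python sketch. Only the element name comes from the note; the surrounding XML structure in the example is hypothetical, because the real flx_config.xml schema is not shown in these release notes.

```python
import xml.etree.ElementTree as ET


def empty_elements(xml_text, tag="nic-name"):
    """Return every element named `tag` whose text is missing or only whitespace."""
    root = ET.fromstring(xml_text)
    return [element for element in root.iter(tag) if not (element.text or "").strip()]


if __name__ == "__main__":
    # Hypothetical document structure, used only to exercise the check.
    sample = (
        "<config>"
        "<iface><nic-name></nic-name></iface>"
        "<iface><nic-name>em0</nic-name></iface>"
        "</config>"
    )
    print(len(empty_elements(sample)))
```

A nonzero count indicates at least one element of the kind that the issue above describes as invalid.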

If you ran the isi firmware status command on a cluster that contained S210 nodes with common form factor power supply units (PSUs) that had part number 071-000-022-00, and if firmware package version 9.3.1 or later was not installed on the cluster, messages similar to the following appeared on the console:

CFFPS1_Blastoff CFFPS 09.05 2,7
CFFPS1_Blastoff_DC CFFPS <CFFPS1_Blastoff_DC> 2,7
CFFPS1_Optimus CFFPS <CFFPS1_Optimus_Acbel> 2,7
CFFPS2_Blastoff CFFPS 09.05 2,7
CFFPS2_Blastoff_DC CFFPS <CFFPS2_Blastoff_DC> 2,7
CFFPS2_Optimus CFFPS <CFFPS2_Optimus_Acbel> 2,7

This issue occurred because earlier versions of OneFS and earlier versions of the firmware package did not recognize PSU part number 071-000-022-00.

Note

This issue can be resolved in earlier versions of OneFS 7.2.0.x by installing firmware package version 9.3.1 or later.

154596



If a node with a LOX NVRAM card was unable to communicate with the NVRAM card because the NVRAM card controller was unexpectedly reset, the cluster became unresponsive to all client requests and data on the cluster was unavailable until the affected node was rebooted.

Note

Beginning in OneFS 7.2.0.4, if this issue is encountered, the affected node will be rebooted automatically to prevent the cluster from becoming unresponsive.

153693

HDFS

HDFS issues resolved in OneFS 7.2.0.4 ID

Because OneFS treated query strings from WebHDFS clients as case-sensitive, some valid queries or operations might have failed. For example, OneFS expected operations such as GETFILESTATUS to be upper case, while Boolean arguments and strings were expected to be lower case. As a result, queries similar to the following might have failed because GetFileStatus is entered in mixed case:

http://isilon_ip:8082/webhdfs/v1/?op=GetFileStatus&user.name=root

156921
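On releases that still treat the operation name as case-sensitive, a client could normalize the op parameter before sending the request. The following Python sketch is one way to do that with the standard library; the function name normalize_op is hypothetical, and the URL is the example from the note above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_op(url):
    """Return `url` with the value of its `op` query parameter upper-cased."""
    parts = urlsplit(url)
    query = [
        (key, value.upper() if key.lower() == "op" else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(normalize_op(
        "http://isilon_ip:8082/webhdfs/v1/?op=GetFileStatus&user.name=root"
    ))
```

Only the operation name is changed; other parameters, such as user.name, are passed through with their original case.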

If multiple threads attempted to simultaneously update the stored list of blocked IP addresses, the HDFS service restarted and client sessions were disconnected. The service was automatically restored after a few seconds.

156306
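The class of problem described above, several threads updating one shared list, is commonly avoided by serializing updates with a lock. The Python sketch below is a generic illustration of that technique; it is not the HDFS service's code, and the class and function names are hypothetical.

```python
import threading


class BlockedAddresses:
    """A shared set of addresses whose updates are serialized by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._addresses = set()

    def add(self, address):
        # Only one thread at a time can modify the set.
        with self._lock:
            self._addresses.add(address)

    def snapshot(self):
        # Return a sorted copy so callers never see a partial update.
        with self._lock:
            return sorted(self._addresses)


def demo(workers=8, per_worker=100):
    """Add addresses from several threads and return how many were stored."""
    blocked = BlockedAddresses()

    def work(worker_id):
        for index in range(per_worker):
            blocked.add(f"10.0.{worker_id}.{index}")

    threads = [threading.Thread(target=work, args=(w,)) for w in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return len(blocked.snapshot())


if __name__ == "__main__":
    print(demo())
```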

Because the WebHDFS CREATE operation does not explicitly instruct the system to create parent directories, if OneFS received a WebHDFS request to create a file or directory within a parent directory that did not yet exist, the request failed. Beginning in OneFS 7.2.0.4, OneFS will automatically create parent directories if it receives a WebHDFS create request that requires them.

154404
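On earlier releases, a client could create the parent directory explicitly before creating the file. The Python sketch below only builds the two request URLs, using the standard WebHDFS MKDIRS and CREATE operations; the helper names, the example path, and the default port 8082 (taken from the WebHDFS example elsewhere in these notes) are assumptions, and no request is sent.

```python
import posixpath
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, params=None, port=8082):
    """Build a WebHDFS URL for `path` with the given operation and parameters."""
    query = urlencode({"op": op, **(params or {})})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def create_with_parents(host, file_path, user):
    """Return the (method, URL) pairs to create the parent directory, then the file."""
    parent = posixpath.dirname(file_path)
    params = {"user.name": user}
    return [
        ("PUT", webhdfs_url(host, parent, "MKDIRS", params)),
        ("PUT", webhdfs_url(host, file_path, "CREATE", params)),
    ]


if __name__ == "__main__":
    for method, url in create_with_parents("isilon_ip", "/ifs/data/new/file.txt", "root"):
        print(method, url)
```

Issuing the MKDIRS request first means the CREATE request no longer depends on the server creating missing parent directories.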

Migration

Migration issues resolved in OneFS 7.2.0.4 ID

If you restarted a full or incremental isi_vol_copy migration three or more times, and if a specific file was in the process of being copied to the target cluster each time the isi_vol_copy migration was restarted, the file was not successfully copied to the target cluster.

Note

You might still encounter this issue if you restart an isi_vol_copy migration of a single, large file three or more times.

154335



Networking

Networking issues resolved in OneFS 7.2.0.4 ID

If an X410, S210, or HD400 node was configured to communicate through a 10 GigE network interface card that was using the Broadcom NetXtreme Ethernet (BXE) driver, the node could have encountered an issue where the output of the ifconfig command reported no carrier for the link. Toggling the interface up and down did not resolve the issue and the node had to be rebooted to reestablish the link.

154455

In some cases, the Mellanox InfiniBand driver waited for a hardware status register to be cleared, which caused the driver to enter a read and retry loop. If the retry loop timed out, the driver attempted to print out a significant amount of system data three times. Since printing the system data output was enabled by default, and because there was a significant amount of data to be processed, the driver eventually triggered several Software Watchdog time outs. After five of these time outs, the software watchdog rebooted the affected node and the following lines appeared in the /var/log/messages file:

Consecutive swatchdog state warnings: 5
Opt-in swatchdog state warnings: 5
Memory pressure swatchdog warnings: 0
Majority of swatchdog warnings by opt-in threads!
panic @ time 1394782550.534, thread 0xffffff06ebc8a000: Software watchdog timed out
cpuid = 3
Panic occurred in module kernel loaded at 0xffffffff80200000:
Stack: --------------------------------------------------
kernel:isi_swatchdog_panic+0x15
kernel:isi_swatchdog_hardclock+0x1ea
kernel:hardclock_cpu+0xd9
kernel:lapic_handle_timer+0x15c
kernel:spinlock_exit+0x32
kernel:putcons+0x3e
kernel:putchar+0x7a
kernel:kvprintf+0xa3b
kernel:__vprintf+0x5b
kernel:printf+0x70
kernel:_fmt_flush+0x3d
kernel:fmt_append+0x47
kernel:fmt_print_num+0x1f7
kernel:fmt_vprint+0x302
kernel:fmt_print+0x5f
mthca.ko:_mthca_mst_dump+0xc7
mthca.ko:mthca_print_mst_dump+0x56
mthca.ko:check_time+0x1c
mthca.ko:mthca_cmd_poll+0x105
mthca.ko:mthca_cmd_box+0x65
mthca.ko:mthca_MAD_IFC+0x1cd
mthca.ko:mthca_query_port+0x107
kernel:port_active_handler+0x31
kernel:sysctl_root+0xd6
kernel:userland_sysctl+0x15c
kernel:__sysctl+0xa9
kernel:isi_syscall+0x53
kernel:syscall+0x1db
--------------------------------------------------

Note

Beginning in OneFS 7.2.0.4, the system data is not printed by default, allowing the read and retry loop to complete more quickly and minimizing the chance of software watchdog time-out events.

153425

If Source Based Routing (SBR) was enabled on the cluster, client connections that were handled by SBR were disconnected if the MAC address (ARP entry) for the relevant subnet gateway expired. This issue occurred because nodes in the cluster did not send an ARP request to refresh the MAC address and, as a result, attempted to send network traffic to an incorrect destination MAC address for the gateway.

Note

The default expiration time for an ARP entry is 10 minutes.

150647

In rare cases, a race condition between the networking service and the SmartConnect service caused the SmartConnect service IP to be assigned to a node before the network addresses were updated in the IP pool. If this issue occurred, connection requests to the cluster failed until the dynamic IP addresses in all network pools were manually rebalanced by running the isi networks command with the --sc-rebalance-all option.

148736

The error messages that are logged if the flx_config.xml file cannot be read or loaded were updated to facilitate diagnosis of the issue. Beginning in OneFS 7.2.0.4, if the flx_config.xml file cannot be read or loaded (for example, if the file cannot be read because a node's network interface card is not accessible), lines similar to the following might appear in the /var/log/messages file and the /var/log/isi_flexnet_d.log file:

isi_smartconnect[15482]: Error processing subnet in flexnet config: 7
isi_smartconnect[15482]: parameter member iface-class of member nonexistant
isi_smartconnect[15482]: /ifs/.ifsvar/modules/flexnet/flx_config.xml is corrupt (configuration errno 7: [/ifs/.ifsvar/modules/flexnet/flx_config.xml] parameter 'member iface-class' of 'member' nonexistant)
isi_smartconnect[15482]: Corrupt config found on /ifs
isi_smartconnect[15482]: Unable to load FlexNet configurations.

141789

NFS

NFS issues resolved in OneFS 7.2.0.4 ID

If an NFS operation failed because the NFSv3 client that attempted to perform the operation did not have adequate access permissions and then the same NFSv3 client sent a request for file system information, the NFS server unexpectedly restarted and an error message similar to the following was logged in the /var/log/nfs.log file:

[lwio] ASSERTION FAILED: Expression = (0), Message = 'Got access denied on stat-only open!'

156109

If all of the following conditions were met, users connected to an NFS export received Permission denied errors when they attempted to access file system objects to which they should have had access:

• The --map-lookup-uid option was enabled (set to yes) for the affected NFS export.

• The group owner of the affected file system object was one of the user's supplemental groups rather than the user's primary group.

• The cluster-side lookup for the user's supplemental groups failed.

This issue occurred because, when the lookup for the user's UID failed, OneFS did not correctly apply supplemental group permissions to the user. As a result, the user was denied access to the file system object.

154927

If an NFSv3 or NFSv4 client attempted to move a subdirectory from one directory to another within a parent directory to which a directory SmartQuota was applied, the subdirectory could not be moved and messages similar to the following appeared on the console:

cannot move `directory_name1' to a subdirectory of itself, `directory_name2'

OR

cannot move `directory_name1' to `directory_name2': Input/output error

This issue occurred even if the efs.quota.dir_rename_errno sysctl parameter was set to 18.

Note

For more information about setting the efs.quota.dir_rename_errno sysctl to a value of 18, see article 90185 on the EMC Online Support site.

For more information about configuring sysctl parameters in OneFS, see article 89232 on the EMC Online Support site.

154910

In environments with NFS export rules that referenced hundreds of unresolvable hostnames, the isi nfs exports list --verbose command consumed too many reserved privileged socket connections when it was interacting with the isi_netgroup_d process. As a result, commands that used isi_rdo for intra-node communications (for example, isi_gather_info or isi_for_array) failed to complete for a few seconds. If this occurred, a message similar to the following appeared on the console:

isi_rdo: [Errno 13] TCPTransport.bind_to_reserveport: Unable to bind to privileged port.

153457

If an NFS client attempted to send an NLM asynchronous request to lock a file and received an error in response to the request, a socket was opened but was not closed. Over time, it was possible for the maximum number of open sockets to be reached. If this occurred, processes could not open new sockets on the affected node. As a result, affected nodes might have been slow to respond to file lock requests, or lock requests sent to an affected node might have timed out. If lock requests timed out, NFS clients could have been prevented from accessing files or applications on the cluster.

153453

If NFSv4 clients mounted NFS exports on the cluster through NFS aliases, it was possible to encounter a race condition that caused the NFS service to unexpectedly restart. This issue was more likely to occur when many NFSv4 clients were simultaneously mounting exports through NFS aliases. If this race condition was encountered, the NFS service on the affected node unexpectedly restarted, NFS clients connected to the node might have been disconnected, some NFS clients might have been prevented from mounting an export, and the following lines appeared in the /var/log/messages file:

/lib/libc.so.7:thr_kill+0xc
/usr/likewise/lib/lwio-driver/nfs.so:NfsAssertionFailed+0xa4
/usr/likewise/lib/lwio-driver/nfs.so:Nfs4OpenOwnerAddOpen+0x112
/usr/likewise/lib/lwio-driver/nfs.so:NfsProtoNfs4ProcOpen+0x2567
/usr/likewise/lib/lwio-driver/nfs.so:NfsProtoNfs4ProcCompound+0x5fe
/usr/likewise/lib/lwio-driver/nfs.so:NfsProtoNfs4Dispatch+0x43a
/usr/likewise/lib/lwio-driver/nfs.so:NfsProtoNfs4CallDispatch+0x3e
/usr/likewise/lib/liblwbase.so.0:SparkMain+0xb7

152337, 151697

If the Deny permission to modify files with DOS read-only attribute over Windows File Sharing (SMB) ACL policy option was enabled, files to which the DOS READ-ONLY flag was applied might have appeared writeable to NFS clients. As a result, a process on an NFS client might have attempted to write a change to a read-only file. If this occurred, the write to the file might have been rejected by the NFS server without sending an error to the client, or a permissions error might have appeared on the client when the file was closed or when the system attempted to move the file's data to persistent storage.

150347

Although the correct ACLs were assigned to a file (for example, std_delete or modify), NFSv3 and NFSv4 clients could not delete, edit, or move the file unless the delete_child permission was set on the parent directory. For more information, see ETA 204898 on the EMC Online Support site.

149743

OneFS API

OneFS API issues resolved in OneFS 7.2.0.4 ID

In OneFS, a numeric request ID is included in API client requests that are generated by a script or application that relies on the isi.rest Python module to communicate with the OneFS API. After generating 1431 request IDs, the formula that was used to generate the API request ID generated an ID of zero, which is an invalid value. As a result, the next API request failed. The impact of the failed request depended on how the application or script that sent the request was designed to handle this type of failure. If the request was retried, a new request ID was generated and the request succeeded.

157487
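Because a retried request succeeded, a script running on an affected release could tolerate the failure by retrying once. The Python sketch below shows a generic retry wrapper; the function call_with_retry and the callable passed to it are hypothetical and do not use or describe the isi.rest module's actual interface.

```python
def call_with_retry(send, attempts=2):
    """Call `send()` up to `attempts` times and return the first successful result."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return send()
        except Exception as error:  # a real client would catch a narrower error type
            last_error = error
    # Every attempt failed, so surface the most recent error to the caller.
    raise last_error


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_request():
        # Hypothetical stand-in for an API call whose first attempt fails.
        state["calls"] += 1
        if state["calls"] == 1:
            raise RuntimeError("request failed")
        return "ok"

    print(call_with_retry(flaky_request))
```

Keeping the retry count small avoids masking failures that are unrelated to the request ID issue.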

OneFS web administration interface

OneFS web administration interface issues resolved in OneFS 7.2.0.4 ID

In the OneFS web administration interface, if the path to the shared directory for an SMB share was long enough to exceed the width of the SMB shares page, the shared directory Edit link was sometimes not visible.

Note

The Edit link was accessible if you used the Tab key to move to the link.

144423




SmartQuotas

SmartQuotas issues resolved in OneFS 7.2.0.4 ID

If you edited the usage limits of an existing directory quota in the OneFS web administration interface, the Show Available Space as: Size of hard threshold and Size of cluster options were missing from the Set a hard limit section. This issue occurred if you chose the Size of cluster option when you created the directory quota with a hard limit.

154331

If a SmartQuota threshold was exceeded and then files were moved or deleted to correct the issue, an alert was sometimes sent after the issue was corrected, even though the threshold was no longer exceeded. If this occurred, a false alert similar to the following was generated, where /ifs/<path> was the path of the directory that temporarily exceeded the configured threshold:

Your root quota under /ifs/<path> has been exceeded. Your quota is 12 TB, and 6.7 TB is in use. You must delete files to bring usage below 12 TB before you can create or modify files. Please clean up and free some disk space.

149570

SMB

SMB issues resolved in OneFS 7.2.0.4 ID

If an SMB share on the cluster was configured with the Impersonate Guest security setting set to Always, and if a large number of SMB sessions to the share were being opened and closed, an extra cred file was opened for each SMB session. However, when the SMB session ended, the extra cred file was not correctly closed and, over time, it was possible for the number of open cred files to reach the maximum number of open files allowed. If this occurred, new SMB sessions to the affected node could not be established, and messages similar to the following appeared in the /var/log/lwiod.log file:

Failed to accept connection due to too many open files

157030

If you used Microsoft Management Console (MMC) to configure an SMB share on the cluster from a Windows client and the file path to the share was invalid (for example, if the file path did not exist on the cluster), the share was not created but no error was returned to the Windows client.

Beginning in OneFS 7.2.0.4, if you attempt to create an SMB share with an invalid file path through MMC, the following error appears on the client:

The device or directory does not exist.

155057

Due to a race condition that could occur when multiple SMB 1 sessions were being opened on the same connection, the lwio process sometimes unexpectedly restarted. If the process restarted, SMB clients connected to the affected node were disconnected from the cluster.

154962

If SMB auditing was enabled and you set the --max-cached-messages parameter to 0 (zero) to disable message caching, the SMB client session and negotiate requests that were waiting to be audited might have prevented new SMB session and negotiate requests from being processed. If this occurred, SMB clients might have been prevented from establishing new connections to the cluster until the backlog of audit messages was processed.

Note

Beginning in OneFS 7.2.0.4, if you set the --max-cached-messages parameter to 0 to disable message caching, and the Common Event Enabler (CEE) server becomes unavailable, some audit messages that have not yet been logged might be discarded. This behavior prevents a backlog of requests from disrupting SMB client requests and connections.

154271

If a symbolic link was migrated from a Microsoft Windows client to the cluster, and the tool that was migrating the data attempted to update the attributes of a symbolic link that had already been migrated, the attributes could not be updated, and the migration of the symbolic link failed. For example, if you attempted to migrate data to an Isilon cluster using the EMCopy tool, and the data contained a symbolic link, the symbolic link was initially migrated but EMCopy could not apply attributes to the symbolic link, and an error similar to the following appeared on the EMCopy client:

ERROR (5) : \path_to_target\symbolic_link -> Unable to set access time

In addition, if the EMCopy tool attempted to retry the failed operation, the retry failed and an error similar to the following appeared on the EMCopy client:

ERROR (4392) : \path_to_target\symbolic_link -> Unable to open, Failed after 1 retries.

153972

If you attempted to migrate a directory symbolic link from a Microsoft Windows client to an Isilon cluster, OneFS returned a response to the Windows client indicating that the operation was not supported, and the symbolic link was not migrated. Depending on the application that was being used to migrate the data, error messages might have appeared on the client. For example, if you attempted to migrate data to an Isilon cluster using the EMCopy tool, the symbolic links were not migrated, and an error similar to the following appeared on the EMCopy client:

ERROR (50) : \path_to_target -> symbolic_link : symlink creation failure

153366

Under some circumstances, after an SMB2 client attempted to access a file on the cluster through a symbolic link, OneFS returned an ESYMLINKSMB2 error (an internal error that is not seen on the client). If this error was returned, the symbolic link was resolved; however, some kernel memory that was allocated in order to complete the process of resolving the symbolic link was not deallocated after the link was resolved. As a result, over time a node's kernel processes might have run out of memory to allocate. If this occurred, the affected node rebooted unexpectedly, and messages similar to the following appeared in the /var/log/messages file on the affected node:

/boot/kernel.amd64/kernel: Pageout daemon can't find enough free pages. System running low on memory. Check for memory pigs

152404


If you queried the contents of a directory in an SMB share from a Microsoft Windows command prompt, and if you included the search string *.* (asterisk-dot-asterisk) immediately after other search characters in the query (for example, dir do*.*), the search results did not include the expected files or directories.

This issue occurred because OneFS treated the dot as a character rather than as a wildcard.

Note

Searches with only the *.* string listed the entire contents of the directory, as expected.

149841
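The difference between treating the dot as a literal character and treating a trailing *.* as a wildcard can be sketched with Python's fnmatch module. This is a simplified illustration, not the OneFS or Windows matching code: `literal_dot_match` and `dos_style_match` are hypothetical helpers, the file names are invented, and real Windows matching has further rules (such as case-insensitivity) that are ignored here.

```python
from fnmatch import fnmatchcase


def literal_dot_match(name, pattern):
    """Glob match in which '.' must match a literal dot in the name."""
    return fnmatchcase(name, pattern)


def dos_style_match(name, pattern):
    """Simplified DOS-style match: a trailing '*.*' behaves like '*'."""
    if pattern.endswith("*.*"):
        pattern = pattern[:-3] + "*"
    return fnmatchcase(name, pattern)


names = ["docs", "download.txt", "data"]
literal = [n for n in names if literal_dot_match(n, "do*.*")]
dos_style = [n for n in names if dos_style_match(n, "do*.*")]
print(literal)
print(dos_style)
```

With the literal interpretation, a directory named docs is skipped because its name contains no dot, which mirrors the missing search results described above.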

Resolved in OneFS 7.2.0.3 (Target Code)

Antivirus


If you configured an antivirus scan of a directory in the OneFS web administration interface or from the command-line interface, the forward slash (/) at the end of the designated path was removed from the search string. As a result, the antivirus scanner might have scanned more directories than expected. For example, if the file system included both an /ifs/data directory and an /ifs/data2 directory, and if you configured the antivirus scanner to scan the /ifs/data/ directory, because the forward slash (/) was not included in the path, the antivirus scanner would have scanned both the /ifs/data directory and the /ifs/data2 directory.

149763
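The effect of losing the trailing slash can be reproduced with a plain string-prefix comparison. The sketch below is only an illustration of that effect; it does not show how the OneFS antivirus scanner selects paths, and `matches_prefix` and `matches_directory` are hypothetical helpers.

```python
def matches_prefix(path, scan_root):
    """Plain string-prefix check, as when the trailing slash has been dropped."""
    return path.startswith(scan_root)


def matches_directory(path, scan_root):
    """Directory-aware check: match the root itself or anything beneath it."""
    root = scan_root.rstrip("/")
    return path == root or path.startswith(root + "/")


paths = ["/ifs/data/report.txt", "/ifs/data2/report.txt"]
loose = [p for p in paths if matches_prefix(p, "/ifs/data")]
strict = [p for p in paths if matches_directory(p, "/ifs/data/")]
print(loose)
print(strict)
```

The prefix check selects files under both /ifs/data and /ifs/data2, while the directory-aware check selects only files under /ifs/data.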

Authentication


Due to a file descriptor (FD) leak that occurred when SMB clients listed files and directories within an SMB share, it was possible for OneFS to eventually run out of available file descriptors. If this occurred, an ACCESS_DENIED or STATUS_TOO_MANY_OPENED_FILES response was sent to SMB clients that attempted to establish a new connection to the cluster or SMB clients that were connected to the cluster that attempted to view or open files. As a result, new SMB connections could not be established, and SMB clients that were connected to the cluster could not view, list, or open files. If this issue occurred, messages similar to the following appeared on the Dashboard > Event summary page of the OneFS web administration interface, and in the command-line interface when you ran the isi events list -w | grep -i descriptor command:

System is running out of file descriptors

In addition, messages similar to the following appeared in the /var/log/lwiod.log file:

Could not create socket: Too many open files
Failed to accept connection due to too many open files

152809

In environments that relied on Kerberos authentication, if a machine password was changed while there were many active SMB connections to the cluster, a race condition could have taken place. If this occurred, the lwio process restarted unexpectedly, and lines similar to the following appeared in the /var/log/messages file:

Stack: --------------------------------------------------
/usr/lib/libkrb5.so.3:krb5_copy_principal+0x33
/usr/lib/kt_isi_pstore.so:krb5_pktd_get_next+0xe6
/usr/lib/libkrb5.so.3:krb5_dyn_get_next+0x5e
/usr/lib/libkrb5.so.3:krb5_rd_req_decoded_opt+0x4a4
/usr/lib/libkrb5.so.3:krb5_rd_req_decoded+0x1d
/usr/lib/libkrb5.so.3:krb5_rd_req+0xc1
/usr/lib/libgssapi_krb5.so.2:krb5_gss_accept_sec_context+0x8fd
/usr/lib/libgssapi_krb5.so.2:gss_accept_sec_context+0x22c
/usr/lib/libgssapi_krb5.so.2:spnego_g/boot/kernel.amd64/kernel: ss_accept_sec_context+0x3d6
/usr/lib/libgssapi_krb5.so.2:gss_accept_sec_context+0x22c
/usr/likewise/lib/lwio-driver/srv.so:SrvGssContinueNegotiate+0x2c5
/usr/likewise/lib/lwio-driver/srv.so:SrvGssNegotiate+0xd3
/usr/likewise/lib/lwio-driver/srv.so:SrvProcessSessionSetup_SMB_V2+0x6c6
/usr/likewise/lib/lwio-driver/srv.so:SrvProtocolExecute_SMB_V2+0x1324
/usr/likewise/lib/lwio-driver/srv.so:SrvProtocolExecuteInternal+0x51b
/usr/likewise/lib/lwio-driver/srv.so:SrvProtocolExecuteWorkItemCallback+0x28
/usr/likewise/lib/liblwbase.so.0:WorkThread+0x1f7
/lib/libthr.so.3:_pthread_getprio+0x15d
--------------------------------------------------

149810

If an LDAP server was configured to handle Virtual List View (VLV) search instead of paged search, and if LDAP users were listed, a memory leak occurred when returning more than one page of information. If users were listed a sufficiently large number of times, the lsass process could run out of memory and restart unexpectedly. As a result, SMB users could not be authenticated for the several seconds it took for the lsass process to restart.

149797

Microsoft Active Directory (AD) users in trusted domains were allowed a higher level of access to EMC Isilon clusters by default if RFC 2307 was enabled on the cluster, and if Windows Services for UNIX (SFU) was not configured on the trusted domain.

149795

If the lsassd process was not able to resolve user and group IDs, a message was logged to the /var/log/messages file. In rare and extreme cases, excessive logging could decrease the wear life of the boot disks on the affected node. If this occurred, lines similar to the following appeared in the /var/log/messages file:

Failed to map token token={UID:10116, GID:100, GROUPS={GID:100, GID:20042}, zone id=-1 }: Failed to lookup uid 10116: LW_ERROR_NO_SUCH_USER

149769

If you configured public key SSH authentication on a cluster running OneFS 7.1.1.2 through OneFS 7.1.1.5 or OneFS 7.2.0.1 through OneFS 7.2.0.2, and then you upgraded to OneFS 7.2.0.x, the root user could no longer log in to the cluster through SSH without entering their password.

138180



A secondary worker process incorrectly attempted to remove extended user attributes from a WORM-committed file before updating the file retention date. As a result, incremental SyncIQ jobs failed and error messages similar to the following appeared in the /var/log/isi_migrate.log file, where <ATTR> was the name of the specific attribute:

Error : Failed to delete user attribute <ATTR>: Read-only file system

154102

Lock contention was reduced by changing the lock type that the SyncIQ coordinator uses when reading the siq-policies.gc file from an exclusive lock to a shared lock.

149818
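Why a shared lock reduces contention among readers can be illustrated with POSIX advisory locks. The release notes do not describe the locking mechanism that SyncIQ uses, so this sketch uses fcntl.flock on a temporary file purely as a stand-in; `try_lock` is a hypothetical helper, and the example assumes a Unix-like system with a local temporary directory.

```python
import fcntl
import os
import tempfile


def try_lock(fd, mode):
    """Try to take a non-blocking flock; return True if it was granted."""
    try:
        fcntl.flock(fd, mode | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


# A temporary file stands in for a file that several readers open at once.
handle, path = tempfile.mkstemp()
os.close(handle)

reader_a = os.open(path, os.O_RDWR)
reader_b = os.open(path, os.O_RDWR)

# Two independent readers can hold shared locks at the same time.
both_shared = try_lock(reader_a, fcntl.LOCK_SH) and try_lock(reader_b, fcntl.LOCK_SH)
fcntl.flock(reader_b, fcntl.LOCK_UN)

# While reader_a still holds its shared lock, an exclusive request is refused.
exclusive_granted = try_lock(reader_b, fcntl.LOCK_EX)

for fd in (reader_a, reader_b):
    os.close(fd)
os.remove(path)
print(both_shared, exclusive_granted)
```

Readers that only need a consistent view of a file do not block one another under shared locks, whereas exclusive locks serialize every access.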

During a SyncIQ job, if the rm command that was run during the cleanup process of the temporary working directory on the target cluster exited with an error, the SyncIQ policy went into an infinite loop, and data could not be synced to the cluster. If this occurred, a message similar to the following appeared in the /var/log/isi_migrate.log file:

Unable to cleanup tmp working directory, error is …

149771

If you configured or displayed a SyncIQ performance rule in the OneFS web administration interface, the bandwidth limit was described as kilobytes per second (KB/sec). This output did not match the kilobits per second (kbps) value seen in the command-line interface. The web interface and command-line interface now show the bandwidth limit value measured in kilobits per second.

149668
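The size of the mismatch between the two units is a factor of eight, as the following arithmetic sketch shows. The function names and the 8000 kbps value are illustrative assumptions, not values taken from OneFS.

```python
BITS_PER_BYTE = 8


def kbps_to_kilobytes_per_sec(kbps):
    """Convert kilobits per second to kilobytes per second."""
    return kbps / BITS_PER_BYTE


def kilobytes_per_sec_to_kbps(kilobytes_per_sec):
    """Convert kilobytes per second to kilobits per second."""
    return kilobytes_per_sec * BITS_PER_BYTE


limit_kbps = 8000  # illustrative bandwidth limit, in kilobits per second
as_kilobytes = kbps_to_kilobytes_per_sec(limit_kbps)
print(limit_kbps, "kbps =", as_kilobytes, "KB/sec")
```

A limit read as kilobytes per second when it is actually enforced in kilobits per second would therefore appear eight times more generous than it is.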

SyncIQ consumed excessive amounts of CPU during the phase when SyncIQ was listing the contents of snapshot directories. This caused SyncIQ policies to take longer to complete.

148431

If the Deny permission to modify files with a DOS read-only attribute over both UNIX (NFS) and Windows File Sharing (SMB) ACL policy option was enabled on the cluster, SyncIQ jobs failed when SyncIQ attempted to synchronize a file or a folder to which the DOS read-only attribute was applied. If a SyncIQ job failed for this reason, an Operation not permitted error message appeared in the /var/log/isi_migrate.log file.

147200

If there was a group change on the source cluster while a SyncIQ job was in the process of starting, the SyncIQ scheduler might have stopped unexpectedly and then automatically restarted. If this issue occurred, lines similar to the following appeared in the /var/log/messages file:

Stack: --------------------------------------------------
/lib/libc.so.7:__sys_kill+0xc
/usr/lib/libisi_util.so.1:isi_assert_halt+0xa0
/usr/lib/libisi_migrate.so.2:siq_job_summary_save_new+0x200
/usr/bin/isi_migr_sched:sched_main_node_work+0xf3f
/usr/bin/isi_migr_sched:main+0xf13
/usr/bin/isi_migr_sched:_start+0x8c
--------------------------------------------------

146395

During a SyncIQ job, in certain cases the target sworker did not acknowledge completing some tasks. Furthermore, if a SyncIQ job was very large, a source pworker could have accumulated a large number of unacknowledged tasks and then waited for the target worker to acknowledge work that was already completed. If this occurred, the SyncIQ job would run indefinitely.

142966

If a directory was renamed to a path that had been excluded from a SyncIQ job, the SyncIQ state information for the directory and its children remained stored. However, the directory and its children were removed from the SyncIQ target. Any future changes that were made to the directory or its children were treated as changes to included paths. If this occurred, a SyncIQ target error similar to the following appeared in the /var/log/isi_migrate.log file:

Error : Unable to open Lin <LIN>: No such file or directory

If all directories that had been excluded from the SyncIQ job were removed in an incremental SyncIQ job, that incremental SyncIQ job could have failed while trying to delete an excluded directory. If this occurred, an error similar to the following appeared in the /var/log/messages or /var/log/isi_migrate.log files:

FAILED ASSERTION found == true

141584

All SyncIQ System B-Trees were protected at 8x mirrored, unnecessarily consuming disk space.

Note

Beginning in OneFS 7.2.0.3, the protection policy for SyncIQ System B-Trees is set to the system disk pool default, which enhances SyncIQ performance. If you want to change the default protection policy for SyncIQ System B-Trees, contact EMC Isilon Technical Support.

141176

If SyncIQ encountered an issue when processing an alternate data stream for a directory, an incorrect directory path appeared in the error message that was logged in the /var/log/isi_migrate.log file.

132233



When preformatted drives were added to a node, the drives were not properly repurposed for the pool that they were being added to. If this issue occurred, data was not written to the drive, the drive remained unprovisioned until it was reformatted, and messages similar to the following were logged in the /var/log/messages file:

isi_drive_repurpose_d[6008]: STORAGE drive (devid:x, lnum:y, bay:z) is not part of any DiskPool. Skipping this drive.

150040

If the isi_cpool_rd driver was enabled and the FILE_OPEN_REPARSE_POINT flag was also enabled, then, if an SMB client attempted to open a symbolic link, the symbolic link was inaccessible, and the following error appeared on the console:

STATUS_STOPPED_ON_SYMLINK

149010

If a file on the cluster was deleted or modified, and the most recent snapshot of that file was deleted, any changes to SmartPools policies might have silently failed to propagate to some snapshot files.

147958

Available space remaining on SSDs that are deployed as L3 cache was incorrectly reported in the OneFS web administration interface.

141931

Diagnostic tools


When you selected Help > Help on This Page or Help > Online Help from the General Settings page of the web administration interface, a page appeared with the following message:

Not Found
The requested URL /onefs/help/GUID-E395ABA6-B63A-4F40-8281-3574CCF6C8B1.html was not found on this server.

Note

This issue did not affect the SNMP Monitoring and SupportIQ general settings pages.

146846

If you ran the isi_gather_info command with the --ftp-proxy-port and --save-only options or with the --ftp-proxy and --save-only options, the specified FTP proxy port or FTP proxy host values were not saved. As a result, the desired FTP proxy settings had to be specified each time the isi_gather_info command was run.

142784

If you ran the isi_gather_info command on a node that was encountering back-end network issues, the operation timed out after 3 minutes, and a message similar to the following appeared on the console:

isi_rdo: [Errno 60] Operation timed out
isi_gather_info: FAILED to make required directories on 1 nodes.

75677


If you ran the isi statistics client or isi statistics heat command with the --csv option, the following error appeared instead of the statistics data:

unsupported operand type(s) for %: 'NoneType' and 'tuple'

153565

The stated storage capacities for /, /var, and /var/crash were reported 8 times too high in the OneFS statistics system. This sometimes caused incorrect capacity sizes to appear in the web administration interface, SNMP queries, or in Platform API-enabled applications.

151651

The following event message did not automatically clear after the boot drive was replaced:

Drive at Internal <drive_location> wear_life threshold exceeded: xx (Threshold: xx). Please schedule drive replacement.

150730

If memory allocated to the clusterwide event log monitoring process (isi_celog_monitor) became very fragmented, the isi_celog_monitor process stopped performing any work. As a result, no new events were recorded, alerts regarding detected events were not sent, and messages similar to the following were repeatedly logged in the /var/log/isi_celog_monitor.log file:

isi_celog_monitor[5723:MainThread:ceutil:92]ERROR: MemoryError
isi_celog_monitor[5723:MainThread:ceutil:89]ERROR: Exception in serve_forever()

Note

Allocated memory is considered fragmented when it is not stored in contiguous blocks. Memory allocated to the CELOG process is more likely to become fragmented in environments with frequent configuration changes and in which many CELOG events are being generated.

150625

If the CELOG notification master node went down, delivery of event notifications stopped until the down node returned to service or until the CELOG notification subsystem (isi_celog_notification) was restarted, at which point the subsystem would elect a new notification master with the updated group information.

149682

If phase 2 of an FSAnalyze job took longer than 100 minutes to complete, the job sometimes stopped progressing, might have progressed very slowly, or might have failed and then resumed. This issue occurred because, during phase 2, the FSAnalyze job updated an SQLite index, and while the job was updating this index, it could not handle other job engine requests, which prevented the job from progressing. In addition, if, while the SQLite index was being created, the number of requests waiting to be handled grew to more than 100 (the maximum allowed), the job was terminated and then resumed from a point before the 100 minutes had elapsed.

147009

The isi_papi_d process did not properly handle CELOG events that referenced a path name that contained special characters or multibyte characters. If this issue occurred, a message similar to the following appeared in the /var/log/isi_papi_d.log file:

isi_papi_d[37840]: [0x80a403500]: ERROR Event 5.705 specifier parse error:"enforcement": "advisory", "domain": "directory /ifs/data/\xe8\xa9\xa6\xe9\xa8\x93&dios","name": "exceeded", "val": 0.0, "devid": 0,"lnn": 0}

If the cluster was being monitored by an InsightIQ server, this issue might also have resulted in a lost connection between the InsightIQ server and the cluster.

144742
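The escaped bytes in the sample log message are the UTF-8 encoding of two multibyte characters followed by ASCII text. The short sketch below decodes them to show why the byte count and the character count of such a path name differ; it only illustrates the encoding and says nothing about how isi_papi_d parses events.

```python
# Directory name bytes as they appear, escaped, in the sample log message.
raw_name = b"\xe8\xa9\xa6\xe9\xa8\x93&dios"

# Each of the first two characters occupies three bytes in UTF-8.
decoded = raw_name.decode("utf-8")

byte_count = len(raw_name)
char_count = len(decoded)
print(byte_count, char_count)
```

Code that assumes one byte per character, or that does not escape characters such as the ampersand, can mis-handle names like this one.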

The physIfaces object identifier (OID) was incorrectly named in the ISILON-TRAP-MIB.txt file, available in the General Settings > SNMP Monitoring tab of the OneFS web administration interface. As a result, it was not always possible to monitor the cluster through SNMP.

144382

Protocol event logging in the /var/log/audit_protocol.log file always showed a value of 0 bytes written for a write event, and close events did not have bytes written or bytes read fields.

138957

If the snmpd process failed to load the /etc/mcp/sys/lni.xml file, /etc/ifs/local.xml file, or /etc/ifs/array.xml file, a memory leak could occur. A memory leak in the snmpd process could have caused SNMP monitoring to be interrupted until the snmpd process was manually stopped and then restarted.

138691

Temporary SQLite files were created in the /var/tmp directory more frequently than was necessary. Because writes to the /var partition can decrease the wear life of boot disks on an affected node, an index was added to the /ifs/.ifsvar/db/celog/events.db SQLite database file to reduce the frequency with which these files are written to the /var/tmp directory.

135108

If you attempted to use InsightIQ 3.1.x to monitor a cluster, disk statistics were not collected because the Platform API disk statistics query returned an error. As a result, InsightIQ could not be used to collect drive statistics from the cluster.

129187

File system


If SMB2 symbolic link translation was disabled on the cluster by running the following command:

isi_gconfig registry.Services.lwio.Parameters.Drivers.onefs.SMB2Symlinks=0

symbolic links to directories might have failed and an error similar to the following might have appeared on the client:

The symbolic link cannot be followed because its type is disabled.

150833

If L3 was enabled in a cluster environment using Self-Encrypting Drives (SED) that previously had it disabled, the SSDs were smartfailed but not re-added as L3 devices. As a result, if you ran the isi_devices command, it was possible to see that the SSDs never automatically transitioned from the [REPLACE] back to the [PREPARING] state, and false drive replacement alerts were generated.

149778

If you copy configuration files while the isi_mcp process is running, by design, the MD5 command will validate the files in question. If two files with the same file name were copied almost simultaneously, and the second file was started, the MD5 process on the first file could have been truncated. As a result, an infinite loop occurred whereby the isi_mcp child process would stop responding. In the example below, 93.0 was the CPU usage, and the process was running for more than 6400 minutes (106 hours).

isi_for_array -s 'ps auwxxxHl | grep isi_mcp | grep -vi grep'
4284 93.0 0.0 55744 8176 ?? R 2Mar14 6425:30.28 isi_mcp: child (isi_mcp)

149759

Isilon A100 nodes might have restarted unexpectedly during a group change, resulting in data unavailability. If this issue occurred, lines similar to the following appeared in the /var/log/messages file:

Software Watchdog failed on CPU 1 (82353: kt: rtxn_split [-])
Stack: --------------------------------------------------
kernel:isi_hash_resize+0x31f
efs.ko:lki_handle_async_reacquire+0x262
efs.ko:lki_group_change_commit+0x727
efs.ko:lk_group_change_commit_initiator+0x32
efs.ko:rtxn_sync_locks_done+0x12e
efs.ko:rtxn_split+0x4e9
efs.ko:rtxn_split_courtship_thread+0x388
efs.ko:kt_main+0x83
kernel:fork_exit+0x77
--------------------------------------------------

149687

Due to a race condition that could occur while file metadata was being upgraded following an upgrade from OneFS 6.5.5.x to OneFS 7.2.0.x, a node might have unexpectedly restarted. If this issue occurred, the following lines appeared in the /var/log/messages file on the affected node:

panic @ time 1406566983.500, thread 0xffffff07b80ae560: Assertion Failure
Stack: --------------------------------------------------
kernel:isi_assert_halt+0x42
efs.ko:ifm_di_get_current_protection+0x61
efs.ko:ifm_get_parity_flag+0x33
efs.ko:bam_read_block+0x5f
efs.ko:bam_read_range+0xd8
efs.ko:bam_read+0x613
efs.ko:bam_read_uio+0x36
efs.ko:bam_coal_read_wantlock+0x37a
efs.ko:ifs_vnop_wrapunlocked_read+0x2c6
nfsserver.ko:nfsvno_read+0x58b
nfsserver.ko:nfsrvd_read+0x55c
nfsserver.ko:nfsrvd_dorpc+0x4d3
nfsserver.ko:nfs_proc+0x243
nfsserver.ko:nfssvc_program+0x7b1
krpc.ko:svc_run_internal+0x3c6
krpc.ko:svc_thread_start+0xa
kernel:fork_exit+0x7f
--------------------------------------------------
*** FAILED ASSERTION ifm_di_getinodeversion(dip) == 6 @ /build/mnt/src/sys/ifs/ifm/ifm_dinode.c:397: ifm_di_get_current_protection: wrong inode

149669


It was possible for a race condition between the group change and the deadlock probe (a mechanism that attempts to detect and correct deadlock conditions) to cause a node to restart unexpectedly.

149667

If a cluster had run for more than 248.5 consecutive days, an issue that affected the OneFS journal buffer could sometimes disrupt the drive sync operation. When this issue occurred, OneFS reported that the journal was full, and as a result, resources that were waiting for a response from the journal entered a deadlocked state. When the journal was in this state, nodes that were affected rebooted to clear the deadlock. In addition, a message similar to the following appeared in the /var/log/messages file:

/boot/kernel.amd64/kernel:efs.ko:rbm_buf_timelock_panic_all_cb+0xd0

148960
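The release notes do not state why the threshold is 248.5 days. One common source of that figure, offered here only as an assumption and not as the confirmed cause, is a signed 32-bit counter that advances 100 times per second. The arithmetic can be checked as follows; the tick rate is the assumed value.

```python
TICKS_PER_SECOND = 100   # assumption: one timer tick every 10 ms
MAX_SIGNED_32 = 2 ** 31  # first value a signed 32-bit counter cannot hold
SECONDS_PER_DAY = 86_400

# Days of continuous uptime before such a counter would wrap.
days_until_wrap = MAX_SIGNED_32 / TICKS_PER_SECOND / SECONDS_PER_DAY
print(round(days_until_wrap, 2))
```

Under that assumption the counter wraps after roughly 248.55 days of uptime, which is consistent with the threshold quoted for this issue.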

Under rare circumstances, the lock subsystem did not drain fast enough, causing an assertion failure. When this issue occurred, the node restarted, and the following stack was logged to the /var/log/messages file:

Stack: --------------------------------------------------
kernel:isi_assert_halt+0x2e
kernel:lki_lazy_drain+0xf76
kernel:_lki_split_drain_locks+0xa8
kernel:kt_main+0x15e
kernel:fork_exit+0x75
--------------------------------------------------
<3>*** FAILED ASSERTION must_drain ==> !pool->lazy_queue_size || !li->mounted @ /b/mnt/src/sys/ifs/lock/lk_initiator.c:13270: lki_lazy_drain_pool on LK_DOMAIN_DATALOCK took 302454934. lazy queue 1870 -> 11. li->llw_count = 0, iter_count=11087431 chk_space_time = 0, chk_space_iters = 0 llw_time = 880073 llw_iters = 2503 reject_drain_time = 1550050 reject_drain_iters = 1 yield_time = 282713930 yield_iters = 11084926 shrink_lazy_queue_count = 11087431

148123

If an SMB client changed the letter case of the name of a file or directory stored on the cluster, the file or directory's ctime (change time) value was not updated. As a result, the affected file or directory was not backed up during incremental backups.

147606

If SmartCache write caching was enabled and if clients were performing synchronous writes to the cluster, it was possible to encounter a runtime assert that caused an affected node to unexpectedly restart. If this issue occurred, lines similar to the following appeared in the /var/log/messages file:

Stack: --------------------------------------------------
kernel:cregion_issue_write+0xdcb
kernel:_cregion_write+0x1f5
kernel:cregion_write+0x24
kernel:cregion_flush+0xf6
kernel:coalescer_flush_overlapping+0x219
kernel:coalescer_flush_local_overlap+0x275
kernel:bam_coal_flush_local_overlap+0x2d
--------------------------------------------------

146541

While an initial SyncIQ job was running, the target root directory and its contents remained in a read-write state instead of read-only until the SyncIQ job completed. As a result, files could be deleted or modified on the target cluster.

145714


SNMP monitoring with Nagios failed when using an Isilon-specific Nagios configuration file. The following error appeared in Nagios when querying the cluster:

External command error: Timeout: No Response from <IPaddress>:161

144278

In rare cases, an SMB client released its lease on a file before OneFS received a request to release the lease. If this occurred, the lwio process restarted unexpectedly, SMB clients connected to the affected node were disconnected, and lines similar to the following appeared in the /var/log/messages file:

Stack: --------------------------------------------------
lib/libc.so.7:thr_kill+0xc
/usr/likewise/lib/liblwiocommon.so.0:LwIoAssertionFailed+0x9f
/usr/likewise/lib/lwio-driver/onefs.so:OnefsOplockBreakFillBuffer_inlock+0xbf
/usr/likewise/lib/lwio-driver/onefs.so:OnefsOplockComplete_inlock+0x7e
/usr/likewise/lib/lwio-driver/onefs.so:OnefsOplockBreakToRH+0x187
/usr/lib/libisi_ecs.so.1:oplocks_event_dispatcher+0xf3
/usr/likewise/lib/lwio-driver/onefs.so:OnefsOplockChannelRead+0x8c
/usr/likewise/lib/liblwbase.so.0:EventThread+0x333
/usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xec
/lib/libthr.so.3:_pthread_getprio+0x15d
--------------------------------------------------

139833

File transfer

File transfer issues resolved in OneFS 7.2.0.3 ID

If a client was connected to the cluster through vsftpd and ran the ls or dir commands for directories that contained more than 100,000 files, the vsftpd process reached its memory limit, and a memory allocation error occurred. As a result, the files in the affected directories could not be listed.

149665

Hardware


The isi firmware status command did not report the firmware version of the Mellanox IB/NVRAM card. This issue affected the S200, X200, X400, and NL400 series nodes.

150725

The LED on the chassis turned solid red for a drive prior to completion of the smartfail process. As a result, the drive might have been replaced prematurely, possibly causing data loss.

145348

If you installed a new drive support package (DSP) on a node that already had a DSP installed and you then attempted to update a drive whose update was included only in the new DSP, the fwupdate command did not update the drive unless either the isi_drive_d process or the affected node was restarted. If this issue occurred, and you ran the isi devices -a fwupdate command before restarting the isi_drive_d process or the node, the following error appeared on the console:

'fwupdate' action complete, 0 drives updated, 0 updates failed

145268

If you attempted to install a node firmware package that did not have support for the Chassis Management Controller (CMC) component on a node that contained a CMC (for example, an S210, X210, X410, NL410, or HD400 node), the installation failed and an unhandled exception error similar to the following appeared on the console:

FAILED : Unhandled exception in safe.id.cmc ('empty_fw_object' object has no attribute 'update')

Note

Beginning in OneFS 7.2.0.3, if the preceding conditions exist, the following message appears on the console, where <partnumber> is the part number of the CMC:

FW archive does not have support for PN <partnumber>

144708

The isi_sasphymon process could potentially close a valid 0 file descriptor. If this issue occurred, any drive associated with the file descriptor would no longer be monitored by the isi_sasphymon process. This issue would also cause excessive logging in the /var/log/isi_sasphymon.log file similar to the following:

isi_sasphymon[3979]: Can't get SCSI Log Sense page 0x18 from Bay 2 - scan 6
isi_sasphymon[3979]: cam_get_inquiry: error from cam_send_ccb: 9
isi_sasphymon[3979]: scsi_get_info: error from scsi_get_inquiry

143042

If you ran the isi_reformat_node command on a node containing self-

encrypting drives (SEDs), sometimes the SEDs could not be released fromownership, and when the node rebooted, the unreleased SEDs came up in aSED_ERROR state.

Note

Beginning in OneFS 7.2.0.3, if you run the isi_reformat_node command on a node containing self-encrypting drives (SEDs) that cannot be released from ownership, the following messages appear on the console, where <affected drives> is a list of the affected drives:

isi_wipe_disk has failed in isi_reformat_node
Failed to wipe the following drives:
<affected drives>
Opening zsh to allow user to revert these drives using the '/usr/bin/isi_hwtools/isi_sed revert' command. To continue with the reformat, enter 'exit' in the shell.

Note

If the reformat process continues without reverting the listed drives, it is likely they will be in a SED_ERROR state on the next node boot.

141983



After replacing boot flash drives in a node and running the gmirror status command, the correct number of active components was displayed but a status of DEGRADED was incorrectly returned for some components in the output. In the example below, the keystore and mfg mirrors were affected:

Name                   Status    Components
mirror/root0           COMPLETE  ad7p4 ad4p4
mirror/keystore        DEGRADED  ad7p11 ad4p12
mirror/var-crash       COMPLETE  ad7p10
mirror/mfg             DEGRADED  ad7p9 ad4p10
mirror/journal-backup  COMPLETE  ad7p8 ad4p8
mirror/var1            COMPLETE  ad7p7 ad4p7
mirror/var0            COMPLETE  ad7p6 ad4p6
mirror/root1           COMPLETE  ad7p5 ad4p51

Although the operation of the node was unaffected, the incorrect Status sometimes led to unnecessary service calls for hardware exchanges.

128304
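When reviewing output in the layout shown in this entry, the Status column can be filtered with a short script to list the mirrors reported as DEGRADED. The following is a minimal sketch, not an Isilon tool: the function name degraded_mirrors is hypothetical, and the sample text assumes one mirror per line with Name, Status, and Components columns.

```python
def degraded_mirrors(status_output):
    """Return the names of mirrors whose Status column reads DEGRADED."""
    names = []
    for line in status_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 2 or fields[0] == "Name":
            continue
        if fields[1] == "DEGRADED":
            names.append(fields[0])
    return names


# Sample text assumed to follow the layout shown in this entry.
SAMPLE = """\
Name Status Components
mirror/root0 COMPLETE ad7p4 ad4p4
mirror/keystore DEGRADED ad7p11 ad4p12
mirror/var-crash COMPLETE ad7p10
mirror/mfg DEGRADED ad7p9 ad4p10
"""

print(degraded_mirrors(SAMPLE))
```

Because the status described in this entry was reported incorrectly, a result from a script like this is a prompt for further checks rather than proof of a hardware fault.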

HDFS


If the maximum number of HDFS client connections to the cluster was reached, all worker threads remained busy during processing. As a result, no further cluster connections could be established, namenode remote procedure calls (RPCs) were queued for long periods of time, and the HDFS server incorrectly appeared to be unavailable.

154175

If you tried to change ownership of files or directories through the WebHDFS REST API by setting only the owning user or the owning group of a file or directory (but not both), an exception error similar to the following might have appeared in the command-line interface:

"RemoteException":{ "exception" : "SecurityException","javaClassName": "java.lang.SecurityException","message" : "Failed to get id rec: 1:" }}

Additionally, Ambari 2.1 might have failed to install Hortonworks Data Platform 2.3 through the WebHDFS REST API.

153786

The datanode port that HDFS listens on was changed from 1021 to 585 to avoid conflicts with other processes that might have been listening on the same port.

152933

If the maximum number of HDFS client connections to the cluster was reached, all worker threads remained busy during processing. As a result, no further cluster connections could be established, namenode remote procedure calls (RPCs) were queued for long periods of time, and the HDFS server incorrectly appeared to be unavailable.

147723




The isi_hdfs_d process no longer unnecessarily logs the following message to the /var/log/isi_hdfs_d.log file:

RPC getDatanodeReport raised exception: Could not parse 'GetDatanodeReport'

146753

When Kerberos authentication was used with HDFS, the isi_hdfs_d process could eventually run out of memory and unexpectedly stop. If this issue occurred, an isi_hdfs_d.core file was created in the /var/log/crash/ directory, and the following lines appeared in the /var/log/messages file:

isi_hdfs_d: isi_hdfs_d: *** FAILED ASSERTION cv->members @ s11n.c:137: oom
[kern_sig.c:3376](pid 27685="isi_hdfs_d")(tid=102752) Stack trace:
Stack: --------------------------------------------------
/lib/libc.so.7:thr_kill+0xc
/usr/lib/libisi_util.so.1:isi_assert_halt+0xa0
/boot/kernel.amd64/kernel: /usr/bin/isi_hdfs_d:file_status_array_append+0x9b
/boot/kernel.amd64/kernel: /usr/bin/isi_hdfs_d:util_make_directory_listing+0x90d
/boot/kernel.amd64/kernel: /usr/bin/isi_hdfs_d:_rpc2_getListing_ap_2_0_2+0xbf
/boot/kernel.amd64/kernel: /usr/bin/isi_hdfs_d:rpc_ver2_2_execute+0x21c
/boot/kernel.amd64/kernel: /usr/bin/isi_hdfs_d:_asyncrpctask+0x3a
/boot/kernel.amd64/kernel: /usr/bin/isi_hdfs_d:_workerthr+0x257
/boot/kernel.amd64/kernel: /lib/libthr.so.3:_pthread_getprio+0x15d
/boot/kernel.amd64/kernel: --------------------------------------------------

146026

Java class names were not included for remote exceptions in WebHDFS. The exclusion of Java class names might have caused unexpected errors, similar to the following, when creating and writing a file through WebHDFS:

mkdir: The requested file or directory does not exist in the filesystem.

142056

If a Hadoop client tried to export data in Hive to a directory that already existed, and the client did not have permissions on the directory to make the change, the mkdir command failed. If the mkdir command failed, an error similar to the following appeared on the client:

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.CopyTask

In addition, the following line appeared in the /var/log/isi_hdfs_d.log file on the node:

pfs_mkdir_p failed in mkdirs with unusual errno: Operation not permitted

142049

The Ambari server sent a check_host command instead of a host_check command. If this issue occurred, the following message was logged to the /var/log/isi_hdfs_d.log file:

Ambari: Tried to access an undefined component name, which is most likely unsupported: check_host

139269


Job engine

Job engine issues resolved in OneFS 7.2.0.3 ID

If you tried to start a PermissionRepair job from the Cluster Management > Job Operations > Job Types > Start Job dialog, and you set the Repair Type to Clone: copy permissions from the chosen path to all files and directories or Inherit: recursively apply an ACL, the Template File or Directory field did not appear. As a result, you could not configure a PermissionRepair job to perform a Clone-type or Inherit-type repair.

154094

If a MediaScan job detected an ECC error in a file’s data, the job did not properly restripe the file away from the ECC error. As a result, the file was underprotected, and was at risk for data loss if further damage occurred to the data (for example, if a device containing a copy of the data failed). If this issue occurred, a message similar to the following appeared in the /var/log/isi_job_d.log file:

mark_lin_for_repair:1331: Marking for repair: 1:0001:0003::HEAD

148016

In the web administration interface, the Edit Job Type Details page for jobs that had a schedule set to Every Sunday at 12:00am displayed Close and Edit Job Type buttons instead of Cancel and Save Changes buttons.

144692

Migration


During a full or incremental migration, if midfile checkpoints were enabled or if the WINDOW_MAX_SIZE environment variable was set to a value greater than 0, an error similar to the following appeared in the /var/log/isi_vol_copy.log file and on the console, and the migration had to be restarted from the beginning:

createleaves() - ./file19: not found on tape first = 11988, curfile.ino = 19619

149816

During an incremental migration through the isi_vol_copy utility, if a socket file needed to be extracted or migrated, the migration failed and an error similar to the following appeared on the console:

./f2: cannot create file: Operation not supported

149815

If you renamed or deleted a directory on the source cluster prior to performing an incremental migration, and if you then created a hard link file with the original name of the deleted or renamed directory, the incremental migration failed. If this occurred, errors similar to the following appeared in the /var/log/isi_vol_copy.log file and also on the console:

[INFO] [isi_vol_copy stdout]: Error:
[INFO] [isi_vol_copy stdout]: Failed to create hardlink ./HL_PREFIX_DIR4->./DIR4: err:Operation not permitted[1]
…
[INFO] [isi_vol_copy stdout]: *** FAILED ASSERTION !"fixupentrytype()" @ /b/mnt/src/isilon/lib/isi_emctar/updated.c:431:

149814




If the isi_vol_copy_vnx tool was used to migrate data from a VNX array to a OneFS cluster, and if the data contained any NULL SIDs, the migration process stopped, and a message similar to the following appeared in the /var/log/messages file:

/boot/kernel.amd64/kernel:[bam_acl.c:190](pid 83648="isi_vol_copy_vnx")(tid=101308) ifs_verify_acl: Failed verifying security_ace on lin:1:02df:da06. Ace#3. An ACE cannot have a NULL identity type.

149760

Networking


S210 and X410 nodes that were configured to communicate through a 10 GigE network interface card that was using the BXE driver, and that were also configured to use aggregate interfaces with the link aggregation control protocol (LACP), experienced connectivity issues with those interfaces if the node was rebooted or if the MTU on those interfaces was reconfigured.

150883,152083

If you performed an extended link flapping test on a node containing a Chelsio network interface card (NIC), the NIC eventually became unresponsive, and had to be manually disabled and then re-enabled before it resumed normal operations. While the NIC was unresponsive, external clients could not communicate with the node; however, because the node’s back-end communication was unaffected, data on the node was still available to clients connected to the cluster through other nodes.

149767

If the cluster contained X410, S210, or HD400 nodes that had BXE 10 GigE NIC cards and any external network subnets connected to the cluster were set to 9000 MTU, an error similar to the following appeared in the /var/log/messages file, and the affected nodes rebooted:

ERROR: mbuf alloc fail for fp[01] rx chain (55)

For more information, see ETA 200096 on the EMC Online Support site.

148695,152083

A memory leak in the networking process, isi_flexnet_d, might have caused the process to stop running, and could have damaged the /etc/ifs/flx_config.xml file. If the file was damaged, all clients could have lost their connections to the cluster.

141822

NFS


Because OneFS 7.2.0 and later returned 64-bit NFS cookies, some older, 32-bit NFS clients were unable to correctly handle read directory (readdir) and extended read directory (readdirplus) responses from OneFS. In some cases, the affected 32-bit clients became unresponsive, and in other cases, the clients could not view all of the directories in an NFS export. In the latter cases, the client could typically view the current directory (".") and its parent directory ("..").

For more information, see ETA 205085 on the EMC Online Support site.

153737

Because NFSv3 Kerberos authentication requires all NFS procedure calls to use RPCSEC_GSS authentication, some older Linux clients (for example, RHEL 5 clients) that started the FSINFO procedure call with AUTH_NULL authentication before attempting the FSINFO procedure call with RPCSEC_GSS authentication were prevented from mounting an NFS export if the export was configured with the Kerberos V5 (krb5) security type. Newer clients that started the FSINFO procedure call with RPCSEC_GSS were not affected.

151582

If the lsass process was not running when NFS configuration information was refreshed on the cluster, it was possible for empty netgroups to be propagated to some or all of the cluster nodes. If this issue occurred, NFS clients were unable to mount NFS exports.

149781

If you created a hard link that contained a colon (:) from an NFSv3 client, the colon and any characters that followed it were removed from the hard link name. As a result, the hard link on the cluster did not have the correct name.

If removing the colon and following characters resulted in changing the hard link name to a file name that was already in use in the destination directory on the cluster, a file name conflict resulted, and a “File exists” error appeared on the NFS client.

148001
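The name conflict described in this entry can be illustrated with a short sketch. It only mimics the reported behavior (dropping the first colon and everything after it) to show how two distinct requested names collapse onto one; the function names are hypothetical and this is not OneFS code.

```python
def truncate_at_colon(name):
    """Mimic the faulty behavior: drop the first colon and everything after it."""
    return name.split(":", 1)[0]


def find_conflicts(requested_names):
    """Map each truncated name to the requested names that collapse onto it."""
    seen = {}
    for name in requested_names:
        seen.setdefault(truncate_at_colon(name), []).append(name)
    # Keep only truncated names claimed by more than one requested name.
    return {short: names for short, names in seen.items() if len(names) > 1}


print(find_conflicts(["report", "report:v2", "notes:draft"]))
```

In this example, "report" and "report:v2" both map to "report", which is the situation in which the client saw a "File exists" error.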

If a client held a read lock on a file and an NFS4 client checked the lock status of the file, the response from the cluster incorrectly reported that the original client was holding a write lock on the file. This issue might have caused the program that the NFS client was using to work improperly.

147638

If an NFS client attempted to list a file or directory at the root of an NFS export mount point directory that began with two dots (for example, /mnt/nfs_export/..my_folder) and the requested file or directory did not exist, OneFS returned the contents of the NFS export instead of a “file not found” error message.

147404

A memory leak in the isi_papi_d process might have caused an out-of-memory error when running isi nfs exports commands.

145209

Because the nfs and onefs_nfs drivers (and the flt_audit_nfs driver, if you enabled protocol auditing) share the same process ID, if one of these drivers failed to start, the MCP process did not always detect the failure and did not always restart the stopped drivers.

144485

On the NFS Export Details page, if you added a secondary group for either the Map Root User or the Map Non Root User, the value field did not display until you refreshed the web administration interface page.

142343

If the NFS server shut down in the middle of an NFS export refresh, it was possible for an NFS resolver thread to be in use when the NFS server was attempting to shut down. If this issue occurred, a core file might have been created, and lines similar to the following appeared in the /var/log/messages file:

Stack: --------------------------------------------------
/lib/libthr.so.3:_umtx_op_err+0xa
/usr/likewise/lib/liblwbase.so.0:WaiterSleep+0xe0
/usr/likewise/lib/liblwbase.so.0:LwRtlMvarTake+0x69
/usr/likewise/lib/lwio-driver/nfs.so:NfsLockMvar+0x19
/usr/likewise/lib/lwio-driver/nfs.so:NfsExportManagerResolveCallback+0x5f8
/usr/likewise/lib/liblwbase.so.0:SparkWorkItem+0x56
/usr/likewise/lib/liblwbase.so.0:WorkThread+0x256
/usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xee
/lib/libthr.so.3:_pthread_getprio+0x15d
-------------------------------------------------

142296

It was possible for two NFS threads to create a race condition when the threads were inserting NFS export information into the hash table. This race condition could damage the hash table, causing the NFS process to restart. When this race condition occurred, lines similar to the following appeared in the /var/log/messages file:

/boot/kernel.amd64/kernel: [kern_sig.c:3376](pid 7997="nfs")(tid=100859) Stack trace:
/boot/kernel.amd64/kernel: Stack: --------------------------------------------------
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase_nothr.so.0:HashLookup+0x31
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase_nothr.so.0:LwRtlHashTableInsert+0x5a
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase_nothr.so.0:LwRtlHashTableResize+0xaf
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase_nothr.so.0:LwRtlHashTableResizeAndInsert+0x2e
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase_nothr.so.0:LwRtlHashMapInsert+0x6f
/boot/kernel.amd64/kernel: /usr/likewise/lib/lwio-driver/nfs.so:NfsExportManagerResolveCallback+0x66
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase.so.0:SparkWorkItem+0x563
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase.so.0:WorkThread+0x256
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xee
/boot/kernel.amd64/kernel: /lib/libthr.so.3:_pthread_getprio+0x15d
/boot/kernel.amd64/kernel: --------------------------------------------------
/boot/kernel.amd64/kernel: pid 7997 (nfs), uid 0: exited on signal 11 (core dumped)

139673

If there was a group change in the cluster, it was possible that the NFS server would not shut down after a set period of time. After the set period of time elapsed, the NFS server was forcefully signaled to stop. When the NFS server was forcefully stopped, a core file was created and lines similar to the following appeared in the /var/log/messages file:

Stack: --------------------------------------------------
/lib/libc.so.7:_kevent+0xc
/usr/likewise/lib/liblwbase.so.0:EventThread+0x964
/usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xee
/lib/libthr.so.3:_pthread_getprio+0x15d
--------------------------------------------------

131197


SmartLock

SmartLock issues resolved in OneFS 7.2.0.3 ID

If the compadmin user on a compliance mode cluster ran the sudo isi_gather_info command, the command successfully gathered all of the expected files on the local node, but was unable to gather all of the expected files on remote nodes. This issue occurred because some files on the cluster can be read only by the root user, and the sudo command did not enable the compadmin user to run commands as root on remote nodes.

139167

SmartQuotas


If you configured a storage quota on a directory with a pathname that contained a single multibyte character, and if a quota notification email was sent for that directory, the multibyte character in the pathname that appeared in the quota notification email was replaced with an incorrect character, such as a question mark.

149758

If you changed a quota's soft or hard limit through the web administration interface, the Enforced parameter changed from Yes to No, making the quota accounting-only. Any usage limit that was set was not enforced.

148807

If a quota was created with a hard, soft, or advisory threshold that included a decimal point (for example, isi quota quotas create --hard-threshold=4.5T), the operation failed, and a message similar to the following appeared on the console:

Unknown suffix '.5T'; expected one of ['b', 'K', 'M', 'G', 'T', 'P', 'B', 'KB', 'MB','GB', 'TB', 'PB']

145943
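The failure described in this entry comes down to splitting a threshold such as 4.5T into a number and a unit suffix. The following is a minimal sketch of a parser that accepts a decimal point; it is not the OneFS implementation. The function name parse_size is hypothetical, and the use of binary multipliers (1 K = 1024 bytes) is an assumption made for the example.

```python
from decimal import Decimal

# Binary multipliers are an assumption made for this sketch.
_UNITS = {"b": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}


def parse_size(text):
    """Convert a string such as '4.5T' or '10GB' to a whole number of bytes."""
    text = text.strip()
    # Split the numeric part (digits and a decimal point) from the suffix.
    index = 0
    while index < len(text) and (text[index].isdigit() or text[index] == "."):
        index += 1
    number, suffix = text[:index], text[index:]
    if not number or number == "." or number.count(".") > 1:
        raise ValueError("invalid number in %r" % text)
    # Accept 'T' and 'TB' alike; a bare 'B' or no suffix means bytes.
    key = suffix[:-1] if len(suffix) == 2 and suffix.endswith("B") else suffix
    if key in ("", "B"):
        key = "b"
    if key not in _UNITS:
        raise ValueError("unknown suffix %r" % suffix)
    # Decimal avoids binary floating-point rounding in the fractional part.
    return int(Decimal(number) * _UNITS[key])


print(parse_size("4.5T"))
```

Consuming the decimal point as part of the number, rather than treating everything after the first non-digit as the suffix, is what prevents a suffix such as '.5T' from being reported as unknown.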

In the web administration interface, after clicking View details for a quota on the Quotas & Usage page, the %Used value under Usage Limits did not always correctly match the percentage value displayed under %Used in the top summary row for the quota.

123355

SMB


If you created an SMB share and then created a single user or group with run-as-root permissions to the share, the user or group could not be deleted, and the user or group’s run-as-root permission could not be modified. If you attempted to delete the user or group, the command appeared to successfully complete; however, the user or group was not deleted. If you attempted to modify the user or group’s permissions, the command appeared to successfully complete; however, the original permissions entry was not removed, and an additional entry, with the modified permissions, was added to the share. In the example below, the domain admins group displays the duplicate entries created when the group’s run-as-root permission was modified:

Account                Account Type  Run as Root  Permission Type  Permission
-----------------------------------------------------------------------------
EXAMPLE\domain admins  group         True         allow            full
EXAMPLE\domain users   group         False        allow            change
EXAMPLE\domain admins  group         False        allow            full

146616

SMB clients were unable to display alternate data stream information for files on the cluster that contained alternate data streams.

153666

During an upgrade to OneFS 7.2.0.x, an upgrade script did not properly interpret an empty string value for the HostAcl parameter in the /ifs/.ifsvar/main_config.gc file. This caused SMB shares to be inaccessible after the upgrade was complete, and as a result, the SMB shares had to be re-created. If this occurred, output similar to the following appeared after running the isi_gconfig registry.Services.lwio.Parameters.Drivers.srv.HostAcl command:

registry.Services.lwio.Parameters.Drivers.srv.HostAcl (char**) = [ "" ]

150658

If the OneFS file system quota was exceeded, an incorrect STATUS_QUOTA_EXCEEDED error was returned during SMB1 and SMB2 write operations instead of STATUS_DISK_FULL. As a result, the client ignored the error and write requests continued, but were not applied, because they were over quota. Any binary files, such as PST files, would become unusable.

149811

In OneFS 7.2.0.x clusters, the SMB2 connection was sending invalid share flags. As a result, if the inheritable-path ACL was set while creating a share, files on the cluster that were accessed through UNC path hyperlinks in Microsoft Outlook emails failed to open.

149796

If you ran the isi statistics client command to view information about some SMB1 and SMB2 read and write operations (for example, the namespace_write operation), the word UNKNOWN appeared in the UserName column instead of a valid user name. As a result, if you ran scripts to filter read/write operations per user, the scripts did not work correctly.

149683

If you attempted to override the default Windows ACL settings that were applied to an SMB share by adding custom ACLs to the /ifs/.ifsvar/smb/isi-share-default-acl/ template directory, the overrides were not implemented. As a result, actual access permissions on the SMB share did not match expected results.

149664

If the FILE_OPEN_REPARSE_POINT flag was enabled, and an SMB client opened an alternate data stream (ADS) through a symbolic link, the ADS was inaccessible, and the following error appeared on the console:

STATUS_STOPPED_ON_SYMLINK

148734

If you ran the EMCopy application to migrate data containing symbolic links to the cluster, the SMB process unexpectedly restarted because of an lwio process assertion failure. When the SMB process restarted, clients were disconnected from the cluster and the following error message appeared in the /var/log/lwiod.log file:

ASSERTION FAILED: Expression = (pFcb->bIsDirectory == bIsDirectory)

In addition, the following lines appeared in the /var/log/messages file:

/lib/libc.so.7:thr_kill+0xc
/usr/likewise/lib/liblwiocommon.so.0:LwIoAssertionFailed+0xa4
/usr/likewise/lib/lwio-driver/onefs.so:OnefsCreateFCB+0x896
/usr/likewise/lib/lwio-driver/onefs.so:OnefsCreateFileCcb+0x3b0
/usr/likewise/lib/lwio-driver/onefs.so:OnefsCreateInternal+0x90e
/usr/likewise/lib/lwio-driver/onefs.so:OnefsCreate+0x28d
/usr/likewise/lib/lwio-driver/onefs.so:OnefsProcessIrpContext+0x12b
/usr/likewise/lib/liblwbase.so.0:CompatWorkItem+0x16
/usr/likewise/lib/liblwbase.so.0:WorkThread+0x256
/usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xee
/lib/libthr.so.3:_pthread_getprio+0x15d

On the client, EMCopy might have displayed the following error message:

ERROR (50) : \\TARGET\symlink -> folder:symlink creation failure

145612


Upgrade and installation issues resolved in OneFS 7.2.0.3 ID

If the Disable access logging option was set in the OneFS web administration interface, and then you upgraded your cluster from OneFS 6.5.x to OneFS 7.x, the apache2 service failed to start, and an error similar to the following appeared repeatedly in the /var/log/isi_mcp file:

FAILED on action list 'start': action 1/1
SERVICE apache2 (pid=3840) returned exit status 1

As a result, client access to HTTP was denied.

149812

If you attempted to upgrade a SmartPools database that was not successfully upgraded due to empty node pools, an error similar to the following appeared in the OneFS web administration interface and on the console:

Storage Pool Settings Changes Failed
The edit to the existing storage pool settings did not save due to the following error:
Changing settings disallowed until SmartPools DB is fully upgraded

As a result, the upgrade did not complete.

149695

Because OneFS 7.2.0.x does not support file pool policy names that begin with a number, if you upgraded from OneFS 6.5.5.x (a version that supported file pool policies with names that began with a number) and any of your preexisting file pool policies began with a number, SmartPools jobs failed following the upgrade, and file pool policies could not be created or modified.

Beginning in OneFS 7.2.0.3, a pre-upgrade check halts an upgrade if the cluster configuration being upgraded contains file pool policies that begin with a number.

149684




Antivirus


If you attempted to scan an infected file from the OneFS web administration interface, and if the file name or the path name where the file was located contained the apostrophe (') character, the web interface displayed an HTTP 500 Internal Server Error page, and an error similar to the following appeared in the /var/log/webware-errors/ file:

File "/usr/local/share/webware/WebKit/HTTPContent.py", line 105, in _respond
  self.handleAction(action)
File "webui/Is2CorePage.py", line 80, in handleAction
  Page.handleAction(self, action)
File "/usr/local/share/webware/WebKit/HTTPContent.py", line 213, in handleAction
  getattr(self, action)()
File "webui/AVScanDetectedThreats.py", line 138, in rescan
  self.jsonRet['error'] = '%s %s' % (str(e), ACTION_STATE_ERROR)
SystemError: 'finally' pops bad exception

141960

If the job that was running an antivirus scan policy was terminated, either by another process or due to a software failure, the antivirus scan policy continued to be listed as running in the OneFS web administration interface, and the job could not be manually cancelled or cleared from the list of running jobs. The correct status of the policy was displayed when viewed from the command-line interface.

141954

Because some antivirus scan reporting fields accepted invalid characters from SQLite queries, running or completed antivirus scan policies were not listed in the OneFS web administration interface, and messages similar to the following appeared in the webware_webui.log file, where <policy_ID> was the ID of the affected policy:

OperationalError: unrecognized token: "<policy_ID>"

138754

Under some circumstances (for example, if Antivirus scan was not correctly configured), messages regarding the isi_avscan_d process were repeatedly logged in the /var/log/isi_avscan_d.log file.

Note

Because repeated logging to the /var partition can adversely affect the wear life of a node's boot flash drives, to reduce logging under the previously described circumstances, if a large number of duplicate messages are logged within a short period of time, some of the messages are suppressed and a message similar to the following appears in the /var/log/isi_avscan_d.log file:

isi_avscan_d[1764]: Suppressed 152 similar messages!

135097
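The suppression behavior described in the preceding note can be sketched as a small rate limiter. This is an illustration only and not the isi_avscan_d implementation: the class name, the limit of messages passed through, and the length of the time window are all hypothetical, because the release notes do not state the actual thresholds.

```python
class DuplicateSuppressor:
    """Pass through the first `limit` copies of a message per window; count the rest."""

    def __init__(self, limit=3, window=60.0):
        self.limit = limit      # hypothetical number of copies written per run
        self.window = window    # hypothetical window length in seconds
        self._last = None
        self._start = 0.0
        self._count = 0

    def filter(self, message, now):
        """Return the lines to write for `message` arriving at time `now`."""
        out = []
        same_run = message == self._last and now - self._start < self.window
        if not same_run:
            # A new message or an expired window ends the run: report what was dropped.
            out.extend(self._flush())
            self._last, self._start, self._count = message, now, 0
        self._count += 1
        if self._count <= self.limit:
            out.append(message)
        return out

    def _flush(self):
        suppressed = self._count - self.limit
        if suppressed > 0:
            return ["Suppressed %d similar messages!" % suppressed]
        return []


log = DuplicateSuppressor(limit=2, window=60.0)
for second in range(4):
    print(log.filter("scan failed", now=float(second)))
print(log.filter("scan ok", now=4.0))
```

Passing the timestamp in explicitly keeps the sketch deterministic; a real logger would read the clock itself.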


Resolved in OneFS 7.2.0.2 71

Authentication


If Microsoft Security Bulletin MS15-027 was installed on a Microsoft Active Directory server that authenticated SMB clients that were accessing an Isilon cluster, and if the server used the NTLMSSP challenge-response protocol, the SMB clients could not be authenticated. As a result, SMB clients could not access data on the cluster.

For more information, see article 199379 on the EMC Online Support site.

147221

If you configured HDFS with Kerberos authentication, WebHDFS requests sent to access zones other than the System Zone were not correctly authenticated and the client that sent the request received the following message:

503 Service Temporarily Unavailable

145590

If an LDAP provider returned a UID or a GID that was greater than 4294967295 (the maximum value that can be assigned to an unsigned 32-bit integer), an incorrect UID or GID was assigned to the associated user or group. This issue could have affected a user’s ability to access data on the cluster.

Note

Beginning in OneFS 7.2.0.2, if an LDAP provider returns a UID or a GID that is greater than 4294967295, affected users will not be authenticated, and a No such user error will be returned. Additional logging was also added to the /var/log/lsassd.log file to help identify these issues.

144002
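The 4294967295 limit in this entry is the largest value an unsigned 32-bit integer can hold. The release notes do not say how the incorrect UID or GID was derived; the sketch below assumes simple truncation to the low 32 bits, which is one common way an out-of-range value turns into a different, valid-looking ID. The function names and the sample UID are hypothetical.

```python
UINT32_MAX = 0xFFFFFFFF  # 4294967295


def fits_uint32(value):
    """Return True if value can be stored in an unsigned 32-bit integer."""
    return 0 <= value <= UINT32_MAX


def truncate_to_uint32(value):
    """Show what survives if only the low 32 bits of value are kept."""
    return value & UINT32_MAX


ldap_uid = 4294968296  # hypothetical out-of-range UID returned by a directory
print(fits_uint32(ldap_uid), truncate_to_uint32(ldap_uid))
```

Under that assumption, the out-of-range UID would silently become UID 1000, which may belong to a different user. Rejecting the value up front, as the range check does, avoids that ambiguity.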

If the selective authentication setting was enabled for a Windows trusted domain, and if a user who was a member of the domain was assigned to a group to which the ISI_PRIV_LOGIN_SSH or ISI_PRIV_LOGIN_PAPI role-based access privilege was assigned, the user was denied access to the cluster when attempting to log in through an SSH connection or through the OneFS web administration interface. This issue occurred because the selective authentication setting prevented OneFS from resolving the user’s group membership.

142088

If a DNS server became unavailable while the lsass process was sending RPC requests to a domain controller, the lsass process might have restarted unexpectedly. If this issue occurred, authentication services were temporarily unavailable, and a message similar to the following appeared in the /var/log/messages file:

Stack: --------------------------------------------------
/usr/likewise/lib/liblsaonefs.stat.so:LsaOnefsGetIpv4Address+0x9
/usr/likewise/lib/liblsaonefs.stat.so+0xee4:0x807315ee4
/usr/likewise/lib/liblsaserverstats.so.0:LsaSrvStatisticsRelease+0x82
/usr/likewise/lib/lsa-provider/ad_open.so:AD_NetLookupObjectSidsByNames+0x3bc
/usr/likewise/lib/lsa-provider/ad_open.so:AD_NetLookupObjectSidByName+0x1b1
/usr/likewise/lib/lsa-provider/ad_open.so:LsaDmConnectDomain+0x205
/usr/likewise/lib/lsa-provider/ad_open.so:LsaDmWrapNetLookupObjectSidByName+0x76
/usr/likewise/lib/lsa-provider/ad_open.so:LsaDmEngineGetDomainNameWithDiscovery+0x6a5
/usr/likewise/lib/lsa-provider/ad_open.so:AD_ServicesDomainWithDiscovery+0x79
/usr/likewise/lib/lsa-provider/ad_open.so:AD_AuthenticateUserEx+0x418
/usr/likewise/lib/liblsaserverapi.so.0:LsaSrvAuthenticateUserExInternal+0x436
/usr/likewise/lib/liblsaserverapi.so.0:LsaSrvAuthenticateUserEx+0x4be
/usr/likewise/lib/libntlmserver.so.0:NtlmValidateResponse+0xeb1
/usr/likewise/lib/libntlmserver.so.0:NtlmServerAcceptSecurityContext+0x10a
/usr/likewise/lib/libntlmserver.so.0:NtlmSrvIpcAcceptSecurityContext+0x325
/usr/likewise/lib/liblwmsg.so.0:lwmsg_peer_assoc_call_worker+0x20
/usr/likewise/lib/liblwbase.so.0:CompatWorkItem+0x16
/usr/likewise/lib/liblwbase.so.0:WorkThread+0x256
/usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xec
/lib/libthr.so.3:_pthread_getprio+0x15d
--------------------------------------------------

142073

If an LDAP or NIS provider attempted to authenticate a user with a user ID (UID) of 4294967295, the isi_papi_d process unexpectedly restarted, and lines similar to the following appeared in the /var/log/messages file:

/usr/lib/libisi_persona.so.1:persona_get_type+0x1
/usr/lib/libisi_auth_cpp.so.1:_ZN4auth15json_to_personaERKN4Json5ValueERKNS_14lsa_connectionERKSs+0xc08
/usr/lib/libisi_auth_cpp.so.1:_ZN4auth15persona_to_jsonERKNS_7personaERKNS_14lsa_connectionEb+0x62
/usr/lib/libisi_platform_api.so.1:_ZN4auth15sec_obj_to_jsonERKNS_7sec_objERKNS_14lsa_connectionEbb+0x178
/usr/lib/libisi_platform_api.so.1:_ZN18auth_users_handler8http_getERK7requestR8response+0x4c4
/usr/lib/libisi_rest_server.so.1:_ZN11uri_handler19execute_http_methodERK7requestR8response+0x56e
/usr/lib/libisi_rest_server.so.1:_ZN11uri_manager15execute_requestER7requestR8response+0x100
/usr/lib/libisi_rest_server.so.1:_ZN14request_thread7processEP12fcgi_request+0x112
/usr/lib/libisi_rest_server.so.1:_ZN14request_thread6on_runEv+0x1b
/lib/libthr.so.3:_pthread_getprio+0x15d

141947

If a machine password was changed by a node while the lwreg process on another node was refreshing that node's lsass configuration, the lsass process on the second node could have cached both the old and new machine passwords. If this occurred, the lsass process unexpectedly restarted, and clients connected to the affected node could not be authenticated. In addition, lines similar to the following appeared in the /var/log/messages file:

/lib/libc.so.7:thr_kill+0xc
/usr/likewise/lib/lsa-provider/ad_open.so:LsaPcachepEnsurePasswordInfoAndLock+0x9b6
/usr/likewise/lib/lsa-provider/ad_open.so:LsaPcacheGetMachineAccountInfoA+0x28
/usr/likewise/lib/lsa-provider/ad_open.so:AD_MachineCredentialsCacheInitialize+0x38
/usr/likewise/lib/lsa-provider/ad_open.so:AD_Activate+0x9d5
/usr/likewise/lib/lsa-provider/ad_open.so:LsaAdProviderStateCreate+0xb22
/usr/likewise/lib/lsa-provider/ad_open.so:AD_RefreshConfigurationCallback+0x792
/usr/likewise/lib/liblsaserverapi.so.0:LsaSrvRefreshConfiguration+0x432
/usr/likewise/lib/lw-svcm/lsass.so:LsaSvcmRefresh+0x209
/usr/likewise/lib/liblwbase.so.0:RefreshWorkItem+0x24
/usr/likewise/lib/liblwbase.so.0:WorkThread+0x256
/usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xec
/lib/libthr.so.3:_pthread_getprio+0x15d

141940

If a cluster that was joined to a Microsoft Active Directory (AD) domain was also configured with an IPv6 subnet, and if the AD domain controller was configured to use an IPv6 address, the netlogon process on the cluster repeatedly restarted and members of the Windows AD domain could not be authenticated to the cluster. If the netlogon process restarted as a result of this issue, Windows clients might have received an Access Denied error when attempting to access SMB shares on the cluster, or they might have received a Logon failure: unknown user name or bad password message when attempting to log on to the cluster. In addition, the following lines appeared in the /var/log/messages file:

Stack: -------------------------------------------------- /lib/libc.so.7:thr_kill+0xc /lib/libc.so.7:__assert+0x35 /usr/likewise/lib/libnetlogon_isidcchooser.so:IsiDCChooseDc+0xbb3 /usr/likewise/lib/lw-svcm/netlogon.so:LWNetChooseDc+0x27 /usr/likewise/lib/lw-svcm/netlogon.so:LWNetSrvPingCLdapArray+0x1187 /usr/likewise/lib/lw-svcm/netlogon.so:LWNetSrvGetDCNameDiscoverInternal+0x72a /usr/likewise/lib/lw-svcm/netlogon.so:LWNetSrvGetDCNameDiscover+0x111 /usr/likewise/lib/lw-svcm/netlogon.so:LWNetSrvGetDCName+0xb20 /usr/likewise/lib/lw-svcm/netlogon.so:LWNetSrvIpcGetDCName+0x4f /usr/likewise/lib/liblwmsg.so.0:lwmsg_peer_assoc_call_worker+0x20 /usr/likewise/lib/liblwbase.so.0:CompatWorkItem+0x16 /usr/likewise/lib/liblwbase.so.0:WorkThread+0x256 /usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xec /lib/libthr.so.3:_pthread_getprio+0x15d --------------------------------------------------

140851

An issue sometimes occurred that prevented OneFS from retrieving Service Principal Name (SPN) keys from the cluster's machine password configuration file, pstore.gc. If this issue occurred, authentication requests failed with an Access Denied error, and continued to fail until the lwio process restarted.

139654

If the isi_vol_copy_vnx utility, the PermissionsRepair job, SyncIQ, or the isi_restill utility attempted to replicate an access control entry (ACE) that contained a security identifier (SID) with a subauthority of 4294967295, the utility or job failed. If this occurred, lines similar to the following appeared in the /var/log/messages file:

Stack: --------------------------------------------------
/boot/kernel.amd64/kernel: /usr/lib/libisi_persona.so.1:persona_len+0x1
/boot/kernel.amd64/kernel: /usr/lib/libisi_acl.so.1:cleanup_sd+0x506
/boot/kernel.amd64/kernel: /usr/lib/libisi_acl.so.1:sd_from_text+0x1f1

138738
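The subauthority value named in this issue, 4294967295, is the largest value that fits in an unsigned 32-bit field (2^32 - 1). The following sketch is illustrative only and is not OneFS code: it assumes the standard S-R-A-S1-S2... string form of a SID, and the function names and sample SIDs are hypothetical. It shows one way to flag SIDs that carry this boundary value before replicating ACEs.

```python
# Illustrative sketch, not OneFS code. Function names and sample SIDs are hypothetical.
UINT32_MAX = 2**32 - 1  # 4294967295, the largest unsigned 32-bit value


def parse_sid(sid):
    """Split a SID string such as 'S-1-5-32-544' into (revision, authority, subauthorities)."""
    parts = sid.split("-")
    if len(parts) < 3 or parts[0] != "S":
        raise ValueError("not a SID string: %r" % sid)
    revision = int(parts[1])
    authority = int(parts[2])
    subauthorities = [int(p) for p in parts[3:]]
    for value in subauthorities:
        # Each subauthority is stored as an unsigned 32-bit integer.
        if not 0 <= value <= UINT32_MAX:
            raise ValueError("subauthority out of 32-bit range: %d" % value)
    return revision, authority, subauthorities


def has_max_subauthority(sid):
    """Return True if any subauthority equals the 32-bit maximum."""
    return UINT32_MAX in parse_sid(sid)[2]
```

For example, has_max_subauthority("S-1-5-21-4294967295-1000") is true, while has_max_subauthority("S-1-5-32-544") is false.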

Although an LDAP or NIS file provider was configured with a list of unfindable users through the --unfindable-users option of the isi auth create or isi auth modify command, a user's groups were still queried for through the LDAP or NIS provider.

137897




If an update to Microsoft Active Directory (AD) succeeded, but the subsequent LDAP query for the new password failed, OneFS did not update the cluster's machine password configuration file, pstore.gc. As a result, there was a mismatch between the machine password registered with Active Directory and the machine password being used by the cluster, and clients attempting to connect to the cluster could not be authenticated.

137743



During a parallel restore operation, if only a portion of the restore operation's file data write was written to disk, the remaining file data from that write could have been discarded. Because a restore operation writes a maximum of 1 MB of data at a time, it was extremely unlikely that only a portion of the data would be written to disk.

142339

Under some circumstances, the NDMP process might have failed to correctly account for the number of isi_ndmp_d instances running on a node, and the number of running instances might have exceeded the maximum number allowed. In some cases, the running instances might have consumed all available resources, causing a node to unexpectedly reboot, and the running NDMP job to fail. If this issue occurred, clients connected to the node were disconnected, and lines similar to the following appeared in the /var/log/messages file:

/boot/kernel.amd64/kernel: pid 56071(isi_ndmp_d), uid 0 inumber 2111 on /tmp/ufp: out of inodes
isi_ndmp_d[56071]: ufp copy error: failed to open destination for /tmp/ufp/isi_ndmp_d/4675/gc ==> /tmp/ufp/isi_ndmp_d/.56071.tmp/gc: No space left on device
isi_ndmp_d[56071]: ufp error: Failed to initialise failpoints for isi_ndmp_d/56071

142075

If a snapshot's expiration time was extended or changed to zero (indicating that the snapshot never expires) while the snapshot was being deleted, the isi_snapshot_d process could have missed the expiration change, and, as a result, the snapshot might have been deleted.

142072

If the --skip_bb_hash option of a SyncIQ policy was set to no (the default setting) and if a SyncIQ file split work item was split between pworkers, it was possible for the pworker that was handling the file split work item to attempt to transfer data that had already been transferred to the target cluster. If this occurred, the isi_migr_pworker process repeatedly restarted and the SyncIQ policy failed. In addition, the following lines appeared in the /var/log/messages file:

isi_migrate[45328]: isi_migr_pworker: *** FAILED ASSERTION cur_len != 0 @ /usr/src/isilon/bin/isi_migrate/pworker/handle_dir.c:463:
/boot/kernel.amd64/kernel: [kern_sig.c:3376](pid 45328="isi_migr_pworker")(tid=100957) Stack trace:
/boot/kernel.amd64/kernel: Stack: --------------------------------------------------
/boot/kernel.amd64/kernel: /lib/libc.so.7:__sys_kill+0xc
/boot/kernel.amd64/kernel: /usr/lib/libisi_util.so.1:isi_assert_halt+0xa0
/boot/kernel.amd64/kernel: /usr/bin/isi_migr_pworker:migr_continue_file+0x1507
/boot/kernel.amd64/kernel: /usr/bin/isi_migr_pworker:migr_continue_generic_file+0x9a
/boot/kernel.amd64/kernel: /usr/bin/isi_migr_pworker:migr_continue_work+0x70
/boot/kernel.amd64/kernel: /usr/lib/libisi_migrate_private.so.2:migr_process+0xf1
/boot/kernel.amd64/kernel: /usr/bin/isi_migr_pworker:main+0x606
/boot/kernel.amd64/kernel: /usr/bin/isi_migr_pworker:_start+0x8c
/boot/kernel.amd64/kernel: --------------------------------------------------
/boot/kernel.amd64/kernel: pid 45328 (isi_migr_pworker), uid 0: exited on signal 6 (core dumped)

142058

If a Collect job had not been run for a long time, snapshots were not processed, and, over time, they accumulated. As a result, it took longer than expected to delete files associated with a large number of accumulated snapshots.

141968

It was possible for a successful DomainMark job to leave a SyncIQ domain or a SnapRevert domain incomplete. If this occurred, the SnapRevert job (which might run during the SyncIQ Prepare Resync job phase) failed, and the following status message appeared in the SyncIQ job report:

Snapshot restore domain is not ready (unrunnable)

141935

In the OneFS web administration interface, the View Details hyperlink on the Data Protection > SnapshotIQ > Snapshot Schedules page displayed only one line of the snapshot schedule settings. As a result, the full details of the schedule were not available unless the user's mouse hovered outside of the browser window.

141933

Although configuring an NDMP backup job with both the BACKUP_FILE_LIST environment variable and the BACKUP_MODE=SNAPSHOT environment variable negated the effect of setting the BACKUP_MODE=SNAPSHOT environment variable (faster incremental backups), it was possible to configure a job with both environment variables. Beginning in OneFS 7.2.0.1, if you configure both environment variables, the job does not run, and the following message appears on the Data Management Application (DMA), on the console, and in the /var/log/ndmp_debug.log file:

File list and backup_mode(snapshot)is not supported

141928

Under normal circumstances, the retention period applied to WORM-committed files might differ between SyncIQ source and target clusters. However, if the retention period applied to a file on a SyncIQ source cluster ended on an earlier date than the retention period applied to the related file on the target cluster, incremental SyncIQ jobs failed, and messages similar to the following were logged in the /var/log/messages file, where <path> is the path to the file on the target cluster:

Local error : syncattr error for <path>: Read-only file system

This issue occurred because the SyncIQ process attempted to decrease the retention period of a WORM-committed file, which is not permitted.

Beginning in OneFS 7.2.0.2, if the retention date applied to a file on the source cluster predates the retention date on the target cluster, no attempt is made to update the retention date on the target cluster during synchronization.

138935
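The rule described for OneFS 7.2.0.2 amounts to never moving a target file's retention date earlier. The sketch below only illustrates that rule and is not SyncIQ code; the function name and the sample dates are hypothetical.

```python
# Illustrative sketch of a "never decrease retention" rule; not SyncIQ code.
from datetime import date


def retention_to_apply(source_retention, target_retention):
    """Return the retention date the target file should end up with.

    A WORM-committed file's retention period may be extended but not
    shortened, so an earlier source date leaves the target unchanged.
    """
    return max(source_retention, target_retention)


# Source retention ends earlier than the target's: the target date is kept.
kept = retention_to_apply(date(2015, 6, 1), date(2016, 6, 1))
# Source retention ends later than the target's: the target date is extended.
extended = retention_to_apply(date(2017, 6, 1), date(2016, 6, 1))
```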

If a SnapRevert job was run on a directory to which both a SyncIQ domain and a SnapRevert domain were applied, and if the SyncIQ domain was set to read/write mode, the SnapRevert job failed, and lines similar to the following appeared in the /var/log/messages file and in the /var/log/isi_migrate.log file:

isi_job_d[20805]: Man Working(manager_from_worker_stopped_handler, 2012): Error from worker 2:
14-12-03 12:16:50 SnapRevert[409] Node 1 (1) task 2-1: Snaprevert job finished with status failed: Unable to create and get file descriptor for tmp working directory: Read-only file system (unrunnable)
from snap_revert_item_process(/usr/src/isilon/bin/isi_job_d/snap_revert_job.c:730)
from worker_process_task_item(/usr/src/isilon/bin/isi_job_d/worker.c:940)
isi_job_d[20805]: snap_revert_item_process:743: Snap revert job finished with status failed: Unable to create and get file descriptor for tmp working directory: Read-only file system (unrunnable)
isi_job_d[1910]: SnapRevert[409]Fail

138780

Due to a memory leak in the isi_webui_d process, while viewing SyncIQ reports through the OneFS web administration interface, the isi_webui_d process unexpectedly restarted. As a result, the OneFS web administration interface stopped responding, and users who were logged into the OneFS web administration interface were disconnected and returned to the log-in screen. In addition, messages similar to the following appeared in the /var/log/webware-errors file:

isi_webui_d: siq_gc_conf_load: Failed to gci_ctx_new: Could not allocate parser read buffer: Cannot allocate memory

138731



If you attempted to reconfigure an existing file pool policy from the OneFS web administration interface without selecting the disk or node pool in the Storage Settings section again, an error similar to the following appeared, and the file pool policy change was not saved:

File Pool Policy Edit Failed The edit to the file pool policy did not save due to the following error: Invalid storage pool '<storage-pool-name> (node pool)'

143453

After a cluster that was configured with manual node pools was upgraded, it was possible for the drive purpose database file (drive_purposing.db) to contain incorrect node equivalence information for the nodes in the manual node pools. Because OneFS relies on the information in the drive_purposing.db file when provisioning nodes, if this issue was encountered, it might have prevented new nodes from being provisioned.

142026



Diagnostic tools


If you ran the isi_gather_info command with the --ftp-port <alt-port> --save-only options, where <alt-port> was the number of the alternate FTP port to set as the new default, the isi_gather_info command ignored the request, and used the default FTP port (port 21) instead. As a result, the alternate FTP port number had to be specified each time the isi_gather_info command was run.

141922

Because the following isi_gather_info command options were processed immediately before all other command options, the options that followed these options were sometimes ignored:

• --verify-upload
• --save
• --save-only
• --re-upload

As a result, the .tar file that is created when the isi_gather_info command is run might not have been uploaded to Isilon Technical Support, and running the command sometimes had unexpected results. For example, if you ran the following command, the --ftp-proxy-host option was ignored:

isi_gather_info --verify-upload --ftp-proxy-host=x

135541

If you ran the isi_gather_info command with the -f option (an option that enables you to designate a specific directory to gather) and if you specified that the /ifs/data/Isilon_Support directory should be gathered, the .tar file that was created by the command could have been extremely large. This issue occurred because /ifs/data/Isilon_Support is the default temporary directory that is used to store the .tar files that are created when the isi_gather_info command is run, and, as such, this directory might contain previous .tar files that are large in size. In addition, the isi_gather_info -f command gathers the contents of the /ifs/data/Isilon_Support directory from each node in the cluster, multiplying the size of the resulting .tar file <x> times, where <x> is the number of nodes in the cluster.

Note

Beginning in OneFS 7.2.0.1, if you run the isi_gather_info command with the -f option, and if you specify that the /ifs/data/Isilon_Support directory should be gathered, the following message appears on the console and the command does not run:

WARNING: ignored path /ifs/data/Isilon_Support

135540





In some cases, a race condition between the I/O request packet (IRP) cancellation callback function and the IRP dispatch function caused the lwio process to restart. If the process restarted as a result of this issue, client connections to the cluster were disrupted, and the following lines appeared in the /var/log/messages file:

/boot/kernel.amd64/kernel: /lib/libc.so.7:thr_kill+0xc
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwiocommon.so.0:LwIoAssertionFailed+0xa3
/boot/kernel.amd64/kernel: /usr/likewise/lib/libiomgr.so.0:IopFltContextReleaseAux+0x79
/boot/kernel.amd64/kernel: /usr/likewise/lib/libiomgr.so.0:IoFltReleaseContext+0x2f
/boot/kernel.amd64/kernel: /usr/lib/libisi_flt_audit.so.1:_init+0x3b37
/boot/kernel.amd64/kernel: /usr/likewise/lib/libiomgr.so.0:IopFmIrpCancelCallback_inlock+0x2af

147471

In OneFS 7.2.0.1, if a file whose name contained multibyte characters was audited, the isi_audit_cee process did not decode the file name correctly when it forwarded audit events to the EMC Common Event Enabler (CEE). As a result, the name of a file that contained multibyte characters was incorrect within the auditing software.

146609

Some information regarding NFS clients that were being audited, such as the user ID, was omitted from the audit stream. As a result, NFS clients could not be correctly audited.

Note

Although all of the necessary information regarding NFS clients is now included in the audit stream, NFS clients might not be correctly audited by some auditing software.

138945

If memory allocated to the CELOG monitoring process (isi_celog_monitor) became very fragmented, the isi_celog_monitor process stopped performing any work. As a result, no new events were recorded, alerts regarding detected events were not sent, and messages similar to the following were repeatedly logged in the /var/log/isi_celog_monitor.log file:

isi_celog_monitor[5723:MainThread:ceutil:92]ERROR: MemoryError
isi_celog_monitor[5723:MainThread:ceutil:89]ERROR: Exception in serve_forever()

Note

Allocated memory is considered fragmented when it is not stored in contiguous blocks. Memory allocated to the CELOG process is more likely to become fragmented in environments with frequent configuration changes and in which many CELOG events are being generated.

138874

On the Cluster Status tab under Monitoring, the Cluster size pie chart did not display Virtual Hot Spare (VHS) reserved space. VHS reserved space could be viewed by running the isi status command from the command-line interface.

138737

Due to an error in the newsyslog.conf.1000MB and the newsyslog.conf.500MB files, the /var/log/nfs_convert.log file was not rotated.

Note

Log files that are not correctly rotated can grow in size, and might eventually fill the /var partition, which can affect cluster performance.

138675

Because commas were not correctly escaped in the output of the isi statistics --csv command, if the data returned from the command contained commas, the commas were treated as separators, and the data could not be accurately interpreted by third-party monitoring tools.

138613
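The effect of an unescaped comma in CSV output can be reproduced with any CSV reader. The sketch below uses Python's standard csv module and a hypothetical three-field row (the field values are made up and are not isi statistics output) to contrast a naive comma join with properly quoted output.

```python
# Illustrative sketch using a hypothetical row; not isi statistics output.
import csv
import io

row = ["node-1", "ifs.bytes.total", "1,024"]  # the last field contains a comma

# Naive join: the embedded comma is indistinguishable from a separator.
naive_line = ",".join(row)

# csv.writer quotes any field that contains the delimiter.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
quoted_line = buffer.getvalue().strip()

# Read both lines back the way a monitoring tool would.
parsed_naive = next(csv.reader([naive_line]))
parsed_quoted = next(csv.reader([quoted_line]))
```

Reading naive_line back yields four fields instead of three, while quoted_line round-trips to the original row.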

If users attempted to access a file under an audited SMB share and the attempt failed, the failed access attempts were not recorded in the audit log. As a result, these events could not be tracked.

138068

File system


If L3 cache was enabled on a cluster running OneFS 7.2.0.1, it was possible for OneFS to erroneously report that the journal on one or more nodes was invalid. This issue was more likely to affect S210 and X410 nodes.

Note

Although OneFS reported that a node's journal was invalid, the journal was actually intact. This issue occurred because a OneFS script erroneously detected that the journal was invalid.

If this issue occurred, the affected node or nodes could not boot, and the following message appeared on the console:

Checking Isilon Journal integrity...
Attempting to save journal to default location
Warning: /etc/ifs/journal_bad exists. Saving bad journal.
OneFS is unmounted
A valid backup journal already exists. Not saving.
NVRAM autorestore status: Not performed...
Attempting to restore journal from disk backup...
Restore from disk failed
Attempting to save and restore journal to clear any ECC errors in unused DRAM blocks...
Restore failed
Could not recover journal. Contact Isilon Customer Support immediately.

147475

On clusters with L3 cache enabled, if you updated SSD firmware by using an Isilon Drive Support Package (DSP), it was possible to encounter an issue that could cause data loss. If this issue occurred, data integrity (IDI) issues were reported as an IDI event, and a critical event notification similar to the following was sent:

Detected IDI failure on LIN 1:001c: 4758::HEAD, lbn 1005511 (fec) 2,12,2760679424:8192 (type user data)

For more information, see article 200097 on the EMC Online Support site.

146182

When a node joins an Isilon cluster, the file system acquires a merge lock in order to postpone joining the node until running file system operations are complete. In rare cases, if an AutoBalance, FlexProtect, or MediaScan job was running while a node was joining the cluster, the merge lock was not released in a timely manner, and the merge lock timed out. If this occurred, the file system could not be accessed until the issue was resolved. In addition, messages similar to the following appeared in the /var/log/messages log file, where <time> was the number of milliseconds that the merge lock was held before timing out:

error 85 from rtxn_exclusive_merge_lock_get after <time> ms

144214

If the lwio-device-srv symbolic link located in the /var/lib/likewise directory became damaged, the srv service could not start on any nodes in the cluster. If this occurred, SMB services were unavailable and SMB clients were unable to connect to the cluster.

Note

When a node is rebooted, srv (an lwio driver) creates a symbolic link named lwio-device-srv in the /var/lib/likewise directory. Beginning in OneFS 7.2.0.2, if this symbolic link is damaged, the damaged symbolic link is overwritten with a functioning copy.

142835

Although the UseDNS parameter was set to no in the /etc/ssh/sshd_config file, if you connected to a node through SSH, establishing a connection to the node took longer than expected, approximately 15 seconds. This issue occurred because the UseDNS no parameter was not enforced.

Note

By default, the UseDNS parameter is set to yes. Setting the parameter to no specifies that reverse DNS lookups should not be performed. It is typically used to decrease the length of time it takes to establish an SSH connection to the cluster.

142087

In rare cases, while installing a drive firmware update on a node that contained SSD drives that were configured to be used for L3 cache, data was sometimes moved from the SSD drives too slowly, a condition that caused the node to reboot unexpectedly. If this occurred, the following lines appeared in the /var/log/messages file:

panic @ time 1418749697.371, thread 0xffffff013ba275b0: l3 slow drain
cpuid = 0
Panic occurred in module kernel loaded at 0xffffffff80200000:
Stack: --------------------------------------------------
kernel:drive_drain_timeout_cb+0x1ca
kernel:softclock+0x2ee
kernel:ithread_loop+0x208
kernel:fork_exit+0x75
--------------------------------------------------

140906

If either L1 or L2 prefetch was disabled for a 4TB file, nodes that handled the file unexpectedly rebooted while reading the last block of the file. If this issue occurred, the following FAILED ASSERTION message appeared in the /var/log/messages file:

*** FAILED ASSERTION end_l1 <= max_lbn @ /build/mnt/src/sys/ifs/bam/bam_file.c:1128

Note

L1 and L2 prefetch are disabled on a file by default if the file is managed by a file pool policy configured with the Optimize for random access data access pattern. L1 and L2 prefetch are also disabled by default if you run the isi set -a command to configure the random or disabled file access patterns. L1 and L2 prefetch can also be configured manually through the use of specific sysctl commands. For more information about configuring file access patterns, see the OneFS 7.1.1 Web Administration Guide and the OneFS CLI Administration Guide.

140639

While a node was disconnecting from the cluster (for example, while a node was rebooting), it was possible for the node to encounter a race condition that caused a deadlock between a transaction that was performing a batch operation and the disconnect operation. If this issue occurred, the node unexpectedly rebooted.

138487

If you deleted a snapshot, the subsequent SnapshotDelete job might not have deleted all the files. This issue was more likely to occur if the snapshot contained a very large number of files (for example, 200,000 or more). If this issue occurred and if you ran the isi job reports view command to view a job report for an affected SnapshotDelete job, the output showed that only a portion of the logical inodes (LINs) were deleted during phase two of the SnapshotDelete job. In addition, subsequent SnapshotDelete jobs might have taken longer to complete than expected until all of the residual files were eventually deleted.

137911

Because the Simple Network Management Protocol (SNMP) process is single-threaded, and because the default behavior of the snmpget function used by SNMP was to time out after one second and retry up to five times over six seconds, it was possible for the snmp process to appear to stop responding to requests from applications such as Nagios and Cacti.

Note

Beginning in OneFS 7.2.0.1, the snmpget function will time out after ten seconds and will retry the affected request once.

141927
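The two timeout settings described above can be compared with simple arithmetic: one initial attempt plus each retry, multiplied by the per-attempt timeout. The sketch below is only that arithmetic; the function name is hypothetical and no SNMP library is involved.

```python
# Illustrative arithmetic only; the function name is hypothetical.
def total_wait_seconds(timeout_seconds, retries):
    """Seconds spent before giving up: the first attempt plus each retry."""
    return timeout_seconds * (1 + retries)


# Earlier default: 1-second timeout, up to five retries.
old_total = total_wait_seconds(1, 5)
# Beginning in OneFS 7.2.0.1: 10-second timeout, one retry.
new_total = total_wait_seconds(10, 1)
```

Under these assumptions the earlier default gives up after 6 seconds in total, which matches the six seconds stated above, while the newer setting allows up to 20 seconds and gives each individual attempt ten times longer to be answered.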

Hardware


If you ran the isi_inventory_tool command with the --startUp option on an S210 node or an X410 node that contained components with EMC part numbers, a CTO exception similar to the following appeared on the console and the node might not have booted correctly:

WARNING: softAVL is missing Isilon P/N(s) for vendor ID="303-409-000A-00", firmware ID="rp180b04", hwType="nvram", hwName="LOx NVRAM"
CTO exception --P/N="105-575-001-01" is not valid

This issue occurred because the isi_inventory_tool --startUp command was configured to handle part numbers that followed an NNN-NNNN-NN format (for example, 123-4567-89), and EMC part numbers follow a different format.

148129
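The format mismatch can be seen by testing both part numbers against a pattern for the NNN-NNNN-NN layout. The sketch below is not the isi_inventory_tool implementation; it assumes that each N stands for a decimal digit, and the pattern and function name are hypothetical.

```python
# Illustrative sketch; not the isi_inventory_tool implementation.
# Assumption: each "N" in NNN-NNNN-NN is a decimal digit.
import re

NNN_NNNN_NN = re.compile(r"^\d{3}-\d{4}-\d{2}$")


def follows_nnn_nnnn_nn(part_number):
    """Return True if the part number has the NNN-NNNN-NN layout."""
    return NNN_NNNN_NN.match(part_number) is not None


example_ok = follows_nnn_nnnn_nn("123-4567-89")      # format example from the text
emc_style = follows_nnn_nnnn_nn("105-575-001-01")    # P/N from the CTO exception
```

The first call matches and the second does not, which is consistent with the "is not valid" exception shown in the console output.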







If an InfiniBand host channel adapter (HCA) was unresponsive, data transmission between nodes in the cluster might have slowed. This condition could also have caused OneFS to split one or more nodes from the cluster, adversely affecting client connections to the cluster.

145424

If a drive in an HD400 node was replaced while the drive was in the process of being smartfailed, and if the node that contained the replaced drive was rebooted before the smartfail process was complete, the affected node failed to mount the /ifs partition. If this occurred, a message similar to the following appeared in the /var/log/messages file:

ifsd[2159]: ifs_work_request: IFS is umounted; exiting
isi_group_change_d: A mounted /ifs is required.
mount_efs: Reporting missing logical drive 55 with guid 54de6e460001d4de 578b6ef62a39f3e1 as DOWN
mount_efs: driveConfIdentifySSD: Error mapping lnum 55 to bay
/boot/kernel.amd64/kernel: [bam_vfsops.c:260](pid 1742="mount_efs")(tid=100145) too many drives: 61
root[2200]: IFS failed to mount. Aborting boot.

This issue occurred because the value assigned to the maximum number of logical drives allowed was not updated to fully accommodate HD400 nodes. For more information, see article 198924 on the EMC Online Support site.

142946

On older nodes running OneFS 7.2.0.0 through 7.2.0.1, if the getNumBatteries() function was called to count the number of NVRAM batteries in the node, the function did not return the correct number. As a result, processes that relied on this information might not have performed correctly. For example, battery tests might not have been correctly configured.

142159

If you ran the isi firmware update command to update node firmware on an X210 or an X410 node, the update failed and the following error appeared on the console, where <X> was the number of the node on which the update failed:

ERROR: Node <X>: failed to cold reset car and unable to get completion code and bit flag

Note

This issue occurred only on X210 and X410 nodes with CMC firmware version 00.0f or earlier. You can confirm your version of the CMC firmware by logging on to any node in the cluster and running the following command:

isi firmware status

142141

Under some circumstances, if a node containing self-encrypting drives (SEDs) that could not be released from ownership was reimaged by using a USB flash drive, after the USB flash drive was removed, the node failed to boot.

Note

Beginning in OneFS 7.2.0.2, if a node containing SEDs that cannot be released from ownership is reimaged by using a USB flash drive, before the node shuts down, the following messages appear:

Failed to release SEDs, one or more drive(s) will be in SED_ERROR state after reimage is complete and will require a PSID revert. This may result in /ifs being unable to mount.
Press Enter to continue…

If you press ENTER, the reimage process completes, and the node shuts down. When the node is subsequently booted, you might be required to manually revert the affected SEDs to restore the node to normal operation.

141986

If the /var/db/hwmon/isi_hwmon.p file was damaged and you attempted to start the isi_hwmon service, the service failed to start. In addition, lines similar to the following appeared in the /var/log/isi_mcp file, confirming repeated attempts to restart the isi_hwmon service:

isi_mcp[1894]: FAILED on action list 'start': action 1/1 SERVICE isi_hwmon (pid=3722) returned exit status 1
isi_mcp[1894]: Action list 'start' has completed. Releasing shared lock 0x80393e690 (pid=3722)
isi_mcp[1894]: Executing 'start' actions of service 'isi_hwmon'.
isi_mcp[3937]: Executing '/usr/bin/isi_hwmon -m' command of action list 'start'.
isi_mcp[1894]: FAILED on action list 'start': action 1/1 SERVICE isi_hwmon (pid=3937) returned exit status 1
isi_mcp[1894]: Action list 'start' has completed. Releasing shared lock 0x80393e690 (pid=3937)

Note

Beginning in OneFS 7.2.0.2, if a damaged /var/db/hwmon/isi_hwmon.p file is encountered, the file is recreated and the following message is logged in the /var/log/isi_hwmon.log file:

Exception while opening /var/db/hwmon/isi_hwmon.p. Reinitializing file.

141929

Under rare circumstances, a failing drive caused a node to restart unexpectedly. If this occurred, the following lines appeared in the /var/log/messages file:

Stack: --------------------------------------------------
kernel:trap_fatal+0x9f
kernel:trap_pfault+0x293
kernel:trap+0x323
kernel:devstat_end_transaction+0x3c
kernel:g_disk_ioctl+0x20e
kernel:g_part_ioctl+0x94
efs.ko:drv_sync_drive_cache+0xbd
efs.ko:jdr_do_sync_one+0x101
efs.ko:kt_main+0x80
kernel:fork_exit+0x7f
--------------------------------------------------

139718




If the isi_bootdisk_read_test was run on a node, messages related to the test, including extraneous messages similar to the following, appeared in the /var/log/messages file on all nodes in the cluster:

isi_bootdisk_read_test: Running bootdisk read test on ad2.
isi_bootdisk_read_test: Running bootdisk read test on ad2.
isi_bootdisk_read_test: Running bootdisk read test on ad2

Beginning in OneFS 7.2.0.1, the preceding messages are no longer logged, and the relevant messages related to this test appear only on the node on which the isi_bootdisk_read_test is run.

139697

Even though a drive was smartfailed, physically removed, and replaced, the old drive appeared in the output of the isi devices list command in a suspended, smartfailed, or erased state. For example:

Unavailable drives: Lnum 40 [SUSPENDED] Last Known Bay N/A
Unavailable drives: Lnum 40 [SMARTFAIL] Last Known Bay N/A
Unavailable drives: Lnum 40 [ERASE] Last Known Bay N/A

If you then ran the isi devices -a smartfail -d <device> command to smartfail the drive in question, where <device> is the drive to be smartfailed, an error similar to the following appeared on the console:

isi: error: Unknown drive: '<device>' does not map to a valid bay or lnum

138207

After you installed an Isilon Drive Support Package (DSP) on a cluster, the year and month of the date recorded in the /var/log/isi_dsp_tool.log file was overwritten. Because the day of the month was not also overwritten, it was possible for the resulting date to be invalid. For example, the date could have been changed to February 31st. If this occurred, an error similar to the following appeared on the console during the post-install verification phase of the installation:

ValueError: day is out of range for month

Although an error appeared, the DSP was successfully installed. For more information, see article 194343 on the EMC Online Support site.

137271
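The failure mode described above, overwriting the year and month of a date while keeping the day, can be reproduced with Python's standard datetime module. The sketch below is not the DSP tool's code; the function name and sample dates are hypothetical, and the exact wording of the ValueError message can differ between Python versions.

```python
# Illustrative sketch; not the DSP tool's code. Names and dates are hypothetical.
from datetime import date


def overwrite_year_month(original, year, month):
    """Replace the year and month of a date, keeping the day unchanged."""
    return original.replace(year=year, month=month)


# January 31 with the month overwritten to February has no valid result.
raised = False
try:
    overwrite_year_month(date(2015, 1, 31), 2015, 2)
except ValueError:
    raised = True

# A day that exists in every month is unaffected.
safe = overwrite_year_month(date(2015, 1, 15), 2015, 2)
```

One way to avoid the problem in code of this kind is to clamp the day to the length of the new month (for example, with calendar.monthrange) before replacing the month.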

If the LCD server process was unable to communicate with the LCD on the front panel of a node, extraneous messages were repeatedly logged in the /var/log/messages file.

Note

Beginning in OneFS 7.2.0.2, if this condition is encountered, informative messages will continue to be logged; however, the following message will no longer be logged:

ERROR lcd.daemon: server Traceback (most recent call last):
  File "/usr/local/lib/python2.6/site-packages/isi/ui/lcd/daemon.py", line 209, in serve_forever
  File "/usr/local/lib/python2.6/site-packages/isi/ui/lcd/display.py", line 849, in open
  File "/usr/local/lib/python2.6/site-packages/isi/ui/lcd/noritake.py", line 247, in open
  File "/usr/local/lib/python2.6/site-packages/isi/ui/lcd/noritake.py", line 357, in verifyModel
  File "/usr/local/lib/python2.6/site-packages/isi/ui/lcd/noritake.py", line 350, in waitForResponse
LCDError: LCD did not respond

136603

If you ran the isi firmware status command, OneFS might have encountered an error while attempting to log a value that was too large. If this occurred, the following error appeared on the console after running the isi firmware status command:

File "/usr/local/lib/python2.6/logging/handlers.py", line 804, in emit
error: [Errno 40] Message too long

135083

Job engine

Job engine issues resolved in OneFS 7.2.0.2

When certain jobs were run, the isi_job_d process created temporary files in the /var/tmp directory. Files written to this directory are stored on the cluster's boot flash drives. In rare cases, writing to the boot flash drives could cause excessive wear and premature boot flash drive failure.

Beginning in OneFS 7.2.0.2, the temporary files are created in the /ifs/.ifsvar/tmp/jobengine directory.

141951

If a snapshot, or the first of a set of snapshots, was empty when the SnapshotDelete job ran, the isi_job_d process failed, and lines similar to the following appeared in the /var/log/messages log file:

Stack: --------------------------------------------------
/lib/libc.so.7:thr_kill+0xc
/usr/lib/libisi_util.so.1:isi_assert_halt+0xa0
/usr/bin/isi_job_d:sdl_lin_range_job_finalize+0x4a7
/usr/bin/isi_job_d:job_virt_finalize+0x51
/usr/bin/isi_job_d:job_phase_done+0xb68
/usr/bin/isi_job_d:coord_task_checkpoint+0x31d
/usr/bin/isi_job_d:coord_from_director_task_done_handler+0x292
/usr/bin/isi_job_d:coord_from_director_handle_task_done+0x55
/usr/bin/isi_job_d:handle_msg+0x15f5
/usr/bin/isi_job_d:coord_main+0xad6
/usr/bin/isi_job_d:main+0xcbf
/usr/bin/isi_job_d:_start+0x8c

Note

The job recovered after 30 to 60 minutes.

140865

Migration

Migration issues resolved in OneFS 7.2.0.2

After an initial VNX data migration, if a source file was replaced by a file that was a Block Device file or a Character Device file with the same name, the new file was not copied to the target during the next or subsequent incremental data migrations. As a result, the Block Device file or Character Device file was not backed up and the target cluster contained some files that no longer existed on the source cluster.

147197

The isi_vol_copy_vnx utility did not copy alternate data streams to the target cluster. As a result, operations that relied on the alternate data streams of migrated files failed on the target cluster.

147004

After an initial VNX migration, if a file on the source array was replaced by a symbolic link with the same name, during the next incremental migration, the symbolic link was not migrated to the target. As a result, the data on the target cluster did not precisely match the data on the source array.

146151

Networking

Networking issues resolved in OneFS 7.2.0.2

On the Cluster Management > Network Configuration page in the OneFS web administration interface, if you enabled the int-b interface and the InfiniBand (IB) internal failover network and specified a valid subnet mask, and then assigned the same IP address range or overlapping IP address ranges to the int-b network and the IB failover network, a Subnet overlaps error appeared and you could not edit the configuration.

142889
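
The overlap condition described above can be illustrated with a short check. This is a minimal sketch, not the validation that OneFS performs: the function name and the sample address ranges are hypothetical, and each range is assumed to be an inclusive (low, high) pair of addresses.

```python
import ipaddress


def ranges_overlap(range_a, range_b):
    """Return True if two inclusive (low, high) IP address ranges share any address."""
    a_low, a_high = (ipaddress.ip_address(addr) for addr in range_a)
    b_low, b_high = (ipaddress.ip_address(addr) for addr in range_b)
    # Two inclusive ranges overlap when each one starts before the other ends.
    return a_low <= b_high and b_low <= a_high


# Hypothetical ranges for the int-b network and the IB failover network.
int_b_range = ("192.168.2.1", "192.168.2.10")
failover_range = ("192.168.2.5", "192.168.2.20")
print(ranges_overlap(int_b_range, failover_range))
```

Running a check of this kind before assigning ranges shows whether a planned int-b range and failover range share any address.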

Although it is a valid configuration, if the same static route was assigned to different SmartConnect node pools, messages similar to the following were repeatedly logged in the isi_flexnet_d.log file:

isi_flexnet_d[1399]: Adding static route <IP address> on interface: lagg1 via <IP address>

142068

If you configured the auto-unsuspend-delay parameter to prevent automatically unsuspended nodes from serving requests to a designated IP pool for a specified period of time, and if a node that was serving requests to that IP pool was rebooted, the affected node might have remained suspended for a period of time that was longer than the time period specified by the auto-unsuspend-delay parameter. As a result, DNS replies did not provide the IP address of the affected node for a longer period of time than was expected.

Note

This issue did not affect nodes that were rebooted following an upgrade.

142065

A race condition sometimes occurred when the isi_flexnet_d and isi_dnsiq_d processes were both configuring IP addresses. If this condition occurred, the nodes restarted unexpectedly, and lines similar to the following appeared in the /var/log/messages file:

Stack: --------------------------------------------------
kernel:trap_fatal+0x9f
kernel:trap_pfault+0x287
kernel:trap+0x313
kernel:sysctl_iflist+0x1e7
kernel:sysctl_rtsock+0x200
kernel:sysctl_root+0x121
kernel:userland_sysctl+0x18f
kernel:__sysctl+0xa9
kernel:isi_syscall+0x64
kernel:syscall+0x26e
--------------------------------------------------

141924

If the isi networks --dns-servers and the isi networks dnscache disable commands were run to update the DNS configuration, the updates were written to the /etc/nsswitch.conf.tmp temporary file before being moved to the /etc/nsswitch.conf file. Because an error in isi_dns_update prevented the temporary file from closing, the updated information was not moved to the /etc/nsswitch.conf file. As a result, messages similar to the following were repeatedly written to the /var/log/isi_flexnet_d.log file:

isi_flexnet_d: /usr/bin/isi_dns_update caught '<type 'exceptions.AttributeError'>'; traceback:
  File "/usr/bin/isi_dns_update", line 240, in main
    setDnsInfo(domains, servers, options)
  File "/usr/bin/isi_dns_update", line 195, in setDnsInfo
    nssDirty = processNsswitchConf(dnsON)
  File "/usr/bin/isi_dns_update", line 177, in processNsswitchConf
    nnsf.close()
isi_flexnet_d[933]: DNS update script did not exit cleanly (0x4600)

141920
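
The write-to-temporary-file-then-move pattern described in this issue can be sketched as follows. This is an illustration only and is not the isi_dns_update code: the function name is hypothetical, and the temporary file name is generated rather than fixed. The point it shows is that the temporary file must be closed successfully before it is moved over the live file.

```python
import os
import tempfile


def replace_file_atomically(path, new_text):
    """Write new_text to a temporary file beside path, then move it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # The with block guarantees the temporary file is flushed and closed
        # before the move; a failed close would raise here instead.
        with os.fdopen(fd, "w") as tmp:
            tmp.write(new_text)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Only a fully written, closed file replaces the live configuration.
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the original file untouched and remove the partial temporary file.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With this ordering, a failure while writing or closing leaves the previous file in place instead of leaving the update stranded in the temporary file.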

Although the auto-unsuspend-delay timeout parameter was enabled, if the cluster was configured with dynamic IP address allocation and IP failover, it was possible for SmartConnect to rebalance IP addresses to a node before the specified auto-unsuspend-delay timeout period had elapsed. If this occurred, it was possible for the IP address that clients were using to connect to the cluster to be moved to the node before all of that node's services were available. Affected clients might have been disconnected from the cluster or temporarily prevented from performing tasks related to those services.

141917

If you ran the isi network dnscache statistics command to view the DNS cache statistics, the DNS cache statistics were not displayed, and an error message similar to the following appeared on the console:

show statistics ^
error: expecting {cache,cluster,debug,dns,parameters,server}

141587

On the Cluster Management > Network Configuration page of the OneFS web administration interface, it was possible to configure multiple subnets with the same gateway priority value, even though gateway priority values must be unique. If multiple subnet gateways were configured with the same priority value, users were unable to access the cluster from a client in one subnet, but could successfully connect to the same cluster from a client in a different subnet.

Note

It is not possible to configure multiple subnet gateways with the same priority value from the command-line interface.


140368
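
The uniqueness rule for gateway priority values can be checked against a planned configuration with a few lines of code. This is a minimal sketch under stated assumptions: the subnet names and priority values are hypothetical, and the mapping is built by hand rather than read from the cluster.

```python
from collections import defaultdict


def find_duplicate_priorities(subnet_priorities):
    """Return {priority: [subnet names]} for every priority used more than once."""
    by_priority = defaultdict(list)
    for subnet, priority in subnet_priorities.items():
        by_priority[priority].append(subnet)
    return {
        priority: sorted(subnets)
        for priority, subnets in by_priority.items()
        if len(subnets) > 1
    }


# Hypothetical subnets; subnet0 and subnet1 share a priority value.
planned = {"subnet0": 1, "subnet1": 1, "subnet2": 2}
print(find_duplicate_priorities(planned))
```

An empty result means every subnet gateway has a distinct priority value.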

If a client used statically assigned cluster IP addresses to mount the cluster, and if that client was connected to the cluster through SMB 2, the client could be disconnected if the node was rebooted or shut down for any reason. If this issue occurred, the client was unable to reconnect to the cluster for 45 to 90 seconds.

139170

Although you could configure three DNS servers through the OneFS web administration interface, information about the third server was not added to the local host entry of the /etc/resolv.conf file. As a result, only two of the configured DNS servers were available, and queries failed if both of those DNS servers were unavailable.

139044

If a node was configured so that both of its interfaces responded to traffic on a VLAN and then one interface was later removed from all pools associated with that VLAN, the interface was not always immediately removed from the VLAN configuration, and IP addresses were not always immediately disassociated from the removed interface. As a result, clients could temporarily continue to connect to the affected node through IP addresses assigned to the removed interface.

138727

If you removed a gateway from a subnet, either through the OneFS web administration interface or the command-line interface, the IP address for the gateway remained in the routing table. As a result, if you ran the netstat command to view information about the network configuration, the IP address that was removed continued to appear in the output.

133973

If source-based routing (SBR) was enabled and static routes were also configured, it was possible for SBR to override the static routes.

Note

Beginning in OneFS 7.2.0.2, if SBR is enabled and static routes are also configured, SBR excludes the static routes from SBR management.

123581

NFS

NFS issues resolved in OneFS 7.2.0.2

When an NFSv4 client initiated a request to mount the pseudo file system, the information that OneFS returned about the file system indicated that the maximum file size allowed within the system was zero. As a result, some NFSv4 clients (for example, AIX 6.1 clients) did not attempt to mount the file system.

143912

While OneFS was closing an idle client connection to an NFS export, it was possible to encounter a race condition. If this race condition was encountered, the NFS server unexpectedly restarted and NFSv4 clients were disconnected from the cluster. In addition, the following lines appeared in the /var/log/messages file:

/usr/likewise/lib/lwio-driver/nfs.so:__svc_zc_clean_idle+0x1f7
/usr/likewise/lib/lwio-driver/nfs.so:rendezvous_request+0x7f6
/usr/likewise/lib/lwio-driver/nfs.so:svc_getreq_xprt+0x120
/usr/likewise/lib/lwio-driver/nfs.so:NfsListenerProcessTask+0x3b
0x800f15e5c (lookup_symbol: error copying in Ehdr:14)
0x800f1da9e (lookup_symbol: error copying in Ehdr:14)
0x8014f56bd (lookup_symbol: error copying in Ehdr:14)

142269

If an NFS client that had placed an advisory lock on a system resource unexpectedly shut down, the lock might not have been released when the client rebooted and reconnected to the cluster. As a result, the locked resources might have been inaccessible until the lock was manually released.

142074

If you ran a command from an NFSv3 or NFSv4 client to query for files or directories in an empty folder, and if you included the asterisk (*) or question mark (?) characters in the command, the query failed and an error message appeared on the console. For example, if you ran the ls * command, the command failed and the following error appeared on the console:

ls: cannot access *: Too many levels of symbolic links

141533

If an NFSv4 client sent a request to the cluster while the file system was unavailable (for example, while nodes were rebooting), OneFS returned the wrong response and did not correctly disconnect the client. If this occurred, lines similar to the following appeared in the /var/log/messages file:

nfs[8962]: [nfs] SERVERFAULT on v4 operation 9, ntStatus 0xefff0066 (UNKNOWN)

Note

Beginning in OneFS 7.2.0.2, if an NFSv4 client sends a request to the cluster while the file system is unavailable, the client is disconnected from the cluster and an informative message is logged in the /var/log/messages file.

140511

Under some circumstances, although an NFS export was configured to return 32-bit file IDs for files created within the export, 64-bit file IDs were instead sent to the client. As a result, the client could not access files on the cluster.

140372

In environments where many NFSv4 clients were reading from and writing to the cluster, it was possible to encounter a condition that enabled a memory resource to be over-allocated. If this issue occurred, the following lines appeared in the /var/log/messages file:

/lib/libc.so.7:thr_kill+0xc
/lib/libc.so.7:__assert+0x35
/usr/likewise/lib/lw-svcm/nfs.so:xdr_iovec_allocate+0x191
/usr/likewise/lib/lw-svcm/nfs.so:svc_zc_getrec+0x1db
/usr/likewise/lib/lw-svcm/nfs.so:svc_zc_recv+0xa1
/usr/likewise/lib/lw-svcm/nfs.so:svc_getreq_xprt+0x11e
/usr/likewise/lib/lw-svcm/nfs.so:NfsSocketProcessTask+0x415
/usr/likewise/lib/liblwbase.so.0:EventThread+0x6b0
/usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0x100
/lib/libthr.so.3:_pthread_getprio+0x15d

139910

The isi_cbind command did not parse numbers correctly. As a result, the command could not be used to change settings that required a numeric value.

139008


OneFS web administration interface issues resolved in OneFS 7.2.0.2

If the name of your cluster started with a capital letter or a lowercase letter a or letter b, and you clicked Start Capture on the Cluster Management > Diagnostics > Packet Capture page of the OneFS web administration interface, the resulting .tar file did not contain the expected network packet capture (pcap) file, and the .tar file also contained some incorrect content.

141970

SmartLock

SmartLock issues resolved in OneFS 7.2.0.2

On clusters running in compliance mode, the compadmin user did not have permission to run the newsyslog command. As a result, the compadmin user could not manually rotate OneFS log files.

141953

SMB

SMB issues resolved in OneFS 7.2.0.2

In some cases, while the lwio process was shutting down on a node (because it was manually or automatically restarted), the lwio SRV component waited indefinitely for a file object to be freed and did not shut down. If this occurred, after 5 minutes, the SRV service was stopped by the lwsm process and then automatically restarted. SMB clients were unable to connect to the affected node until the SRV service restarted.

147473

Distributed Computing Environment (DCE) Remote Procedure Calls (RPCs) that were sent to the cluster in big-endian byte order were not correctly handled. As a result, clients with CPUs designed to format RPCs in big-endian byte order (including PowerPC-based clients) were unable to communicate with the cluster. For example, PowerPC-based clients running Mac OS 10.5 and earlier were unable to connect to SMB shares. If a packet capture was gathered to diagnose this issue, an nca_invalid_pres_context_id RPC reject status code appeared in the packet capture.

147470
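
To illustrate why byte order matters for this issue, the following sketch reads the fragment length from a connection-oriented DCE/RPC common header in whichever byte order the sender declares. It follows the general DCE/RPC header layout (data representation label at offset 4, fragment length at offset 8) and is not OneFS or lwio code; the function name and the sample headers are hypothetical.

```python
import struct


def parse_frag_length(pdu_header):
    """Return (byte_order, frag_length) from a 16-byte DCE/RPC common header."""
    if len(pdu_header) < 16:
        raise ValueError("common header must be at least 16 bytes")
    # Byte 4 is the first data representation byte; its high nibble gives the
    # integer format: 0 means big-endian, 1 means little-endian.
    little_endian = (pdu_header[4] & 0xF0) == 0x10
    fmt = "<H" if little_endian else ">H"
    (frag_length,) = struct.unpack_from(fmt, pdu_header, 8)
    return ("little" if little_endian else "big", frag_length)


# Hypothetical bind PDU headers (version 5.0, type 11, flags 0x03) that carry
# the same 72-byte fragment length in the two byte orders.
prefix = bytes([5, 0, 11, 3])
little = prefix + bytes([0x10, 0, 0, 0]) + struct.pack("<HHI", 72, 0, 1)
big = prefix + bytes([0x00, 0, 0, 0]) + struct.pack(">HHI", 72, 0, 1)
print(parse_frag_length(little), parse_frag_length(big))
```

A receiver that always assumes little-endian integers would read the big-endian header's fragment length as 18432 instead of 72, which is the kind of misinterpretation that honoring the sender's declared byte order avoids.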

Although path names that are up to 1024 bytes in length are supported in OneFS 7.2.0.x, if a user who was connected to the cluster from an SMB client attempted to rename a file on the cluster in Windows Explorer, and if the full path to the renamed file was greater than 255 bytes in length, the file was not renamed and the following error appeared:

The file name(s) would be too long for the destination folder. You can shorten the file name and try again, or try a location that has a shorter path.

144100
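
Because the limits in this issue are expressed in bytes rather than characters, a path that contains non-ASCII characters reaches them sooner than its character count suggests. The following sketch measures a path in bytes; the UTF-8 encoding, the function name, and the sample paths are assumptions made for illustration.

```python
def path_length_in_bytes(path, encoding="utf-8"):
    """Return the length of path in bytes under the given encoding (assumed UTF-8)."""
    return len(path.encode(encoding))


# Hypothetical paths with the same number of characters.
ascii_path = "/ifs/data/" + "a" * 200
accented_path = "/ifs/data/" + "é" * 200
print(len(ascii_path), path_length_in_bytes(ascii_path))
print(len(accented_path), path_length_in_bytes(accented_path))
```

Both sample paths are 210 characters long, but the second one occupies 410 bytes, which exceeds 255 bytes while remaining under 1024 bytes.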

If you ran the isi smb settings shares modify command with the --revert-impersonate-user option to restore the --impersonate-user option applied to a share to the default value, the command did not take effect until the lwio process was restarted.

142066

After upgrading a cluster to OneFS 7.2.0.0 through OneFS 7.2.0.1, Linux and Mac clients connecting to the cluster through SMB 1 were unable to view or list SMB shares. If an affected Linux client attempted to list shares, the following error appeared:

NT_STATUS_INVALID_NETWORK_RESPONSE

If an affected Mac client attempted to view shares in the Finder, an error similar to the following appeared:

There was a problem connecting to the server.

As a result, SMB shares were not accessible to those Linux and Mac clients.

142060

If an SMB2 client sent a compound request to the cluster, OneFS did not send the correct response. As a result, the client was disconnected from the cluster.

141961

In rare instances, if an SMB1 echo request was received on an SMB2 connection, the lwio process restarted unexpectedly. If the lwio process restarted, SMB clients connected to the cluster were disconnected, and messages similar to the following appeared in the /var/log/messages file:

/boot/kernel.amd64/kernel: [kern_sig.c:3376](pid 30325="lwio")(tid=100436) Stack trace:
/boot/kernel.amd64/kernel: Stack: --------------------------------------------------
/boot/kernel.amd64/kernel: /lib/libc.so.7:thr_kill+0xc
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase_nothr.so.0:__LwRtlAssertFailed+0x13c
/boot/kernel.amd64/kernel: /usr/likewise/lib/lwio-driver/srv.so:SrvProtocolExecute2+0x115f
/boot/kernel.amd64/kernel: /usr/likewise/lib/lwio-driver/srv.so:SrvProtocolTransport2DriverDispatchPacket+0x2f2
/boot/kernel.amd64/kernel: /usr/likewise/lib/lwio-driver/srv.so:SrvProtocolTransportDriverNegotiateData+0xe4a
/boot/kernel.amd64/kernel: /usr/likewise/lib/lwio-driver/srv.so:SrvSocketProcessTaskReadBuffer+0x485
/boot/kernel.amd64/kernel: /usr/likewise/lib/lwio-driver/srv.so:SrvSocketProcessTaskRead+0x36
/boot/kernel.amd64/kernel: /usr/likewise/lib/lwio-driver/srv.so:SrvSocketProcessTask+0x53f
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase.so.0:RunTask+0x8d
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase.so.0:ProcessRunnable+0x95
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase.so.0:EventLoop+0xeb
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase.so.0:EventThread+0x3f
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0x8e
/boot/kernel.amd64/kernel: /lib/libthr.so.3:_pthread_getprio+0x15d

141943

If the SMB2Symlinks option was disabled on the cluster and a Windows client navigated to a symbolic link that pointed to a directory, under some circumstances, the system returned incorrect information about the symbolic link. If this occurred, the symbolic link appeared to be a file, and the referenced directory could not be opened.

In addition, because OneFS 7.2.0.1 did not consistently check the OneFS registry to verify whether the SMB2Symlinks option was disabled, in some cases, although the SMB2Symlinks option was disabled, the lwio process attempted to handle symbolic links when it should have allowed them to be processed by the OneFS file system. If this occurred, the following error appeared on the client:


141323

If both the antivirus Scan files when they are opened option and the SMB Performance Settings Oplocks option were enabled, and a file was opened, modified, and closed multiple times through an application such as Microsoft Excel, it could take 30 seconds longer than expected for the system to save changes to the file.

138763

If you attempted to create an SMB share of the /ifs/.snapshot directory or one of its subdirectories through the OneFS web administration interface or the command-line interface, an error similar to the following appeared:

'/ifs/.snapshot' is under '/ifs/.snapshot': Invalid argument

138594

If an SMB client attempted to access an application through a symbolic link that contained Unicode characters, a backslash (\) followed by a zero (0) was sometimes appended to the symbolic link. As a result, the symbolic link did not lead to its intended target, and the application could not start.

137822

In Microsoft Windows, if you ran the mklink command to create a symbolic link to a file or directory in an SMB share on the cluster, and the name of the symbolic link began with a colon (:), the command failed and the lwio process sometimes unexpectedly restarted. In addition, the following error appeared on the console:

The specified network name is no longer available

137820

An issue sometimes occurred that prevented access to absolute paths to files through symbolic links. If this issue occurred, the link failed to return the file, and the requested file could not be opened.

137772

Because OneFS did not respond correctly to a specific Local Security Authority (LSA) request made by Mac OS 10 clients running Mac OS 10.6 through 10.10, the ACLs and POSIX owner applied to an affected share could not be viewed from Mac OS 10 clients running those versions.

135560


Upgrade and installation issues resolved in OneFS 7.2.0.2

During a OneFS upgrade, there was a window of opportunity during which the array.xml file on some nodes in the cluster could have contained out-of-date version information. If a node whose array.xml file was out-of-date sent messages to a node whose array.xml file was current, the affected node exhibited unexpected behavior, such as random group changes.

Note

Although the array.xml file on the affected node contained out-of-date information about the version of OneFS installed on the node, the node was successfully upgraded. The unexpected node behavior was resolved when the array.xml file was eventually updated.

If this issue occurred, messages similar to the following appeared on the console:

/boot/kernel.amd64/kernel: [gmp_rtxn.c:2636](pid 5052="kt: gmp-config")(tid=100178) gmp config took 0s
/boot/kernel.amd64/kernel: [gmp_info.c:1734](pid 5052="kt: gmp-config")(t
/boot/kernel.amd64/kernel: id=100178) group change: <8,1787394> [up: 6 nodes, down: 123 nodes, shutdown_read_only: 3 nodes] (no change)
/boot/kernel.amd64/kernel: [gmp_info.c:1735](pid 5052="kt: gmp-config")(tid=100178) new group: <8,1787394>: { 8:0-11,13-22, 11:0-23, 46,55,70,93:0-11, down: 2, 4-7, 9-10,12-45, 47-54, 56-69, 71-92, 94-131, shutdown_read_only: 84, 91, 126, diskless: 100-108, 119-120, 123 }
/boot/kernel.amd64/kernel: [gmp_rtxn.c:2636](pid 5052="kt: gmp-config")(tid=100178) gmp config took 0s
/boot/kernel.amd64/kernel: [gmp_info.c:1734](pid 5052="kt: gmp-config")(tid=100178) group change: <8,1787395> [up: 6 nodes, down: 123 nodes, shutdown_read_only: 3 nodes] (no change)

146937

If a OneFS upgrade was performed while nodes were down, the SmartPools portion of the upgrade failed without presenting an error or logging a CELOG event. If this issue occurred, new nodes could not be added to the cluster and nodes that were removed (for example, nodes that were smartfailed) could not be re-added to the cluster.

If you encountered this issue, and you ran the following command, the disk pool version listed was not correct for the version of OneFS to which the cluster was upgraded:

isi_for_array -s 'sysctl efs.bam.disk_pool_db | grep version'

Note

The correct disk pool version for clusters running OneFS 7.2.0.x is version 8.

139285

If a USB flash drive with a bootable image of OneFS was attached to a node while the node was being smartfailed, the partition table on the flash drive became damaged. As a result, the node could not boot from the flash drive after it was smartfailed, and the image on the flash drive was unusable.

110337

Virtual plug-ins

Virtual plug-ins issues resolved in OneFS 7.2.0.2

Attempts to register a OneFS 7.2.0 cluster as a VASA provider failed if the cluster had no iSCSI LUNs configured, and, following the failed registration, portions of the OneFS web administration interface became inaccessible. In addition, the httpd process unexpectedly restarted and the following lines appeared in the /var/crash/httpd.log file:

prodisi1-6(id6) /boot/kernel.amd64/kernel: [kern_sig.c:3349](pid 52204="httpd")(tid=100097) Stack trace:
prodisi1-6(id6) /boot/kernel.amd64/kernel: Stack: --------------------------------------------------
prodisi1-6(id6) /boot/kernel.amd64/kernel: /usr/lib/libisi_vasa_service.so:_ZNK15vasa_db_manager34get_associated_ports_for_processorERKSt6vectorISsSaISsEERKSsPP9isi_error+0xca
prodisi1-6(id6) /boot/kernel.amd64/kernel: /usr/lib/libisi_vasa_service.so:_ZNK16vasa_server_impl32queryAssociatedPortsForProcessorEPP9isi_errorP38_ns4__queryAssociatedPortsForProcessorP46_ns4__queryAss+0x504
prodisi1-6(id6) /boot/kernel.amd64/kernel: /usr/lib/libisi_vasa_service.so:_Z39__ns5__queryAssociatedPortsForProcessorP4soapP38_ns4__queryAssociatedPortsForProcessorP46_ns4__queryAssociatedPortsForProce+0x102
prodisi1-6(id6) /boot/kernel.amd64/kernel: /usr/lib/libisi_vasa_service.so:_Z50soap_serve___ns5__queryAssociatedPortsForProcessorP4soap+0xf7
prodisi1-6(id6) /boot/kernel.amd64/kernel: /usr/lib/libisi_vasa_service.so:_Z10soap_serveP4soap+0x58
prodisi1-6(id6) /boot/kernel.amd64/kernel: /usr/local/apache2/modules/libmod_gsoap.so:_init+0x1b66
prodisi1-6(id6) /boot/kernel.amd64/kernel: /usr/local/apache2/bin/httpd:ap_run_handler+0x72
prodisi1-6(id6) /boot/kernel.amd64/kernel: /usr/local/apache2/bin/httpd:ap_invoke_handler+0x7e
prodisi1-6(id6) /boot/kernel.amd64/kernel: /usr/local/apache2/bin/httpd:ap_process_request+0x18e
prodisi1-6(id6) /boot/kernel.amd64/kernel: /usr/local/apache2/bin/httpd:ap_process_http_connection+0x13d
prodisi1-6(id6) /boot/kernel.amd64/kernel: /usr/local/apache2/bin/httpd:ap_run_process_connection+0x70
prodisi1-6(id6) /boot/kernel.amd64/kernel: /usr/local/apache2/bin/httpd:worker_thread+0x24b
prodisi1-6(id6) /boot/kernel.amd64/kernel: /lib/libthr.so.3:_pthread_getprio+0x15d

138741


Antivirus


AVScan reports were deleted from the OneFS system 24 hours after the job successfully completed because the end date for the reports was incorrectly set to 1970-01-01.

Note

Detected threats could still be viewed through the AVScan database.

113563
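
The date 1970-01-01 is the start of the UNIX epoch, which is what a timestamp of zero seconds formats to. The release note does not state why the end date was set incorrectly, so the following sketch only illustrates that relationship and assumes, for the sake of the example, an end date stored as seconds since the epoch; the function name is hypothetical.

```python
from datetime import datetime, timezone


def format_end_date(epoch_seconds):
    """Format a seconds-since-epoch timestamp as a UTC calendar date."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y-%m-%d")


# A zero (unset) timestamp formats as the first day of the UNIX epoch.
print(format_end_date(0))
```

Any retention logic that compares such an end date with the current time would treat the report as long expired.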

Authentication


If different nodes in a cluster were connected to different network subnets and if those subnets were assigned to different Active Directory sites, the site configuration information on the cluster was repeatedly updated. Because updates to the site configuration information require a refresh of the lsass service, this behavior caused authentication services to become slow or unresponsive.

138750

On a cluster with multiple access zones configured that was upgraded from OneFS 7.0.x or earlier to OneFS 7.2.0.0, if you attempted to create a local user from the command-line interface or through the OneFS web administration interface in an access zone other than the System access zone, an error similar to the following appeared, and the user could not be added to the access zone:

Failed to add user <username>: SAM database error

135537

Intermittently, incoming SMB sessions were successfully authenticated and received the correct username, but were mapped to the wrong SID. As a result, audit logs associated the incorrect SID with the affected user and the affected user was denied access to their files. To resolve the problem, the lsass process had to be restarted on all nodes in the cluster.

135182

If you ran a recursive chmod command to add, remove, or modify an access control entry (ACE) on a directory that contained files that were quarantined by an antivirus scan, the command stopped running when it encountered a quarantined file. As a result, ACEs were only modified on the files and directories that were processed before the command stopped running.

134860

In the OneFS web administration interface, if you created a user mapping rule that contained incorrect syntax related to the use of quotation marks, the following error appeared when you attempted to save the updated Access Zone Details:

Your access zone edit was not saved
Error #1: Rules parsing failed at ' ': syntax error, unexpected QUOTED, expecting BINARY_OP or UNARY_OP

In addition, future attempts to create mapping rules sometimes failed.

134825



A SyncIQ job configured with the --disable_stf option set to true sometimes failed when an sworker (a process responsible for transferring data during replication) detected differences between files on the source and target clusters and then attempted to access and update the linmap database. If a SyncIQ job failed as a result of this issue, the following error appeared in the isi_migrate.log file:

A work item has been restarted too many times. This is usually caused by a network failure or a persistent worker crash.

132579

If a Multiscan or Collect job was running, it was possible for the job to attempt to update the snapshot tracking file (STF) for a snapshot at the same time that a write was made to a file under that snapshot. If this occurred, and if the STF file contained a large number of files (in the millions), it was possible for the Multiscan or Collect job to fail to account for some blocks of data in the STF file, or to account for some blocks of data more than once. If this issue occurred, errors similar to the following appeared in the /var/log/idi.log file:

Malformed block history: marking free block

or

Malformed block history: freeing free block

Note

In addition to the errors that were logged, a coalesced event appeared in the list of new events on the Dashboard > Events > Summary page in the OneFS web administration interface. The event ID, which can be found by clicking View details in the Actions column, was 899990001, and the message was as follows:

File system problems detected

138403

The NDMP process ignored the protocol version setting in the config.xml file. As a result, only NDMP version 4 messages were accepted and sent.

135187

In environments with a large number of configured SyncIQ policies, the isi_classic sync job report and isi_classic sync list commands sometimes took several minutes to return a list of SyncIQ reports.

135183

The NDMP process unexpectedly restarted after attempting to back up a symbolic link that referenced a file whose name contained EUC-JP encoded characters. If the NDMP process restarted as a result of this issue, the in-progress backup job failed.

134846

If the paths added to the NDMP EXCLUDE or FILES environment variables exceeded the maximum length allowed (1024 characters), the affected backup job failed and an error similar to the following appeared in the ndmp_debug.log file:

ERRO:NDMP fnmmatching.c:413:isi_fnm_is_valid_pattern Exclude pattern longer than 1024 limit

Note

The maximum length allowed is now handled by the Data Management Application (DMA).

134845

In rare circumstances, the isi_snapshot_d process failed due to an internal error but the process did not exit. As a result, it was not possible to create new scheduled snapshots or to recover previous versions of snapshot files created by the scheduling system, and the following error message appeared in the /var/log/isi_snapshot_d.log file, where [####] is the PID for the isi_snapshot_d service:

isi_snapshot_d[####]: Unable to manage orphaned snapshots: Socket is not connected

134808

In environments with a large number of configured SyncIQ policies, the isi sync job report and isi sync list commands sometimes took several minutes to return a list of SyncIQ reports.

134429




SmartLock compliant files and directories that were backed up through an NDMP file list backup could not be restored to a SmartLock domain. This issue occurred because the selected files were not backed up in SmartLock compliance mode. If this issue occurred, lines similar to the following appeared in the ndmp_debug.log file:

Restoring NDMP files from <source_smartlock_domain> to [See line below]
Restoring NDMP files from [See line above] to <target_smartlock_domain>
DAR disabled - continuing restore without DAR
Attempting normal restore.
Cannot extract non-Compliant archive entry to a SmartLock Compliance directory.

134227



Lwio subscriptions held by the isi_gconfig_d process were not always released in a timely manner. As a result, the subscriptions sometimes accumulated. If a large number of subscriptions accumulated, it sometimes took a long time to release these resources back to the system and it was possible for the isi_gconfig_d process to become unresponsive until the operation was complete. Because the isi_gconfig_d process is responsible for maintaining SMB share configuration information, if this issue occurred, SMB clients were prevented from viewing or creating shares, and messages similar to the following appeared in the /var/log/lwiod.log file:

lwio[4814]: StoreChangesWatcherThreadRoutine store error subscription request failed: could not update local database: cluster database(revision 0) older than local database (revision 3)
lwio[83454]: StoreChangesWatcherThreadRoutine store error did not get response from server

139741

Command-line interface

Command-line interface issues resolved in OneFS 7.2.0.1

If you ran the isi status -d -w command in an environment with long pool names, the pool names broke into multiple lines in the output, as many as were needed to fit into the table. Because the table was not widened to accommodate the pool name, this caused issues with scripts that parse the output in the table.

134717



The safe.id.nvram onsite verification test (OVT) did not include support for the version 2.1 MLC NVRAM card model. As a result, the safe.id.nvram test failed and errors similar to the following appeared on the console and in the /ifs/.ifsvar/ovt log files:

[safe.id.nvram]: NVRAM card detected: /dev/mnv0: NVRAM battery voltages okay
FAILED : NVRAM Rev: 5 (should be 3)

139905

If you edited or added a notification rule, the first six configurable events listed on the Edit Notification Rule and Add Notification Rule pages were related to CloudPools, a feature that was not available on the cluster.

136709

If a Simple Network Management Protocol (SNMP) request was sent to a node to which multiple IP addresses were assigned, the reply to that request could have been returned from an IP address that differed from the address to which the request was sent.

Note

In some environments, such as those configured with a firewall, replies received from an address other than the address to which a request is sent are unrecognized and rejected. If the reply to an SNMP request is rejected because the IP address isn't recognized, the SNMP request fails.

135006

On clusters where a large number of events were regularly logged, events were sometimes logged faster than the EMC CEE Event Forwarder (isi_audit_cee) was able to forward them. If this occurred, a backlog of events waiting to be forwarded could have developed and might have continued to grow.

134420

File system


Under rare circumstances, the FlexProtect and FlexProtectLin jobs left pointers to blocks on a node or a drive that was no longer in the cluster. If a file was partially truncated during a repair job (the job that is responsible for removing nodes or drives), there was a narrow window where, if a further unlikely circumstance occurred (such as a node reboot or a temporary network issue that affected back-end network connections between nodes), then some snapshot data might have been left under-protected. A subsequent mark job (such as MultiScan or IntegrityScan) would then log attempts to mark blocks owned by a snapshot of the truncated file on the node or drive that was no longer on the cluster. As a result, messages similar to the following appeared in the /var/log/idi.log and /var/log/messages files, where <Node>,<Drive> identified the device that was no longer in the cluster:

Marking a block on gone node or drive: Marking block <Node>,<Drive>,98820513792:8192 on a gone drive.

In addition, running the isi events list command displayed messages similar to the following, where <instanceID> is the instance ID value:

<instanceID> 01/30 16:25 -- W 1 Filesystem problems detected

And running the isi events show <instanceID> -w command displayed coalesced events similar to the following:

ID: <instanceID>
Coalesced events:
(l 1::HEAD b 2,0,311296:8192, Marking a block on gone node or drive)
(l 1::HEAD b 2,0,311296:8192, Accessing a gone drive on mark)

Note

This information is also available on the Dashboard > Events > Cluster Events Summary page in the OneFS web administration interface. Contact EMC Isilon Technical Support immediately if you see these messages on the console or in the web administration interface.

139723

If protocol auditing was enabled and the NFS auditing service was running, the NFS service failed to start. As a result, data access through NFS was limited. In addition, the following NFS statuses appeared in the output after running the lwsm list | grep nfs command:

flt_audit_nfs [driver] running
nfs [driver] stopped
onefs_nfs [driver] stopped

136061

After adding a node to a large cluster that had L3 Cache enabled, some nodes in the cluster might have unexpectedly rebooted.

136031

If there were millions of back end batch messages in a single batch initiator on a node, the counter in the batch data structure sometimes reached the maximum allowed value. If this occurred, the affected node could have rebooted unexpectedly, causing clients connected to the node to be disconnected, and a message similar to the following appeared in the /var/log/messages log file:

NULL msg context for rbid

135828

In the OneFS web administration interface, if you increased the size of an existing iSCSI LUN, OneFS did not include the space already used by the LUN when calculating how much space the LUN would occupy after the LUN was resized. As a result, the web administration interface would display a Size exceeds available space on cluster error even if there was sufficient space to accommodate the larger LUN. For example, on a 10 GB cluster configured with a 5 GB LUN and 5 GB of available space, if you attempted to increase the size of the 5 GB LUN to 6 GB, OneFS would calculate the amount of space needed for the 6 GB LUN based on the 5 GB of available space, and would return the error.

134851
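The arithmetic in the LUN resize example above can be sketched as follows. This is only an illustration, assuming a fully provisioned LUN whose current size is already consumed on the cluster; the function names are hypothetical and do not correspond to any OneFS interface.

```python
GB = 1024 ** 3  # bytes per gigabyte, used only for readability


def additional_space_needed(current_size: int, new_size: int) -> int:
    """Extra space a resize consumes beyond what the LUN already occupies."""
    return max(new_size - current_size, 0)


def resize_fits(current_size: int, new_size: int, available: int) -> bool:
    """Check that counts the space the LUN already uses (intended behavior)."""
    return additional_space_needed(current_size, new_size) <= available


def resize_fits_flawed(new_size: int, available: int) -> bool:
    """Check that compares the full new size to free space (the reported issue)."""
    return new_size <= available


# Example from the text: 5 GB LUN, 5 GB available, resize to 6 GB.
print(resize_fits(5 * GB, 6 * GB, 5 * GB))     # only 1 GB of new space is needed
print(resize_fits_flawed(6 * GB, 5 * GB))      # rejects although space is sufficient
```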

If an Integrity Scan was run on a damaged, mirrored file, the node checking the file unexpectedly rebooted, and lines similar to the following appeared in the /var/log/messages file and on the console:

Stack:
--------------------------------------------------
kernel:isi_assert_halt+0x42
efs.ko:bam_verify_file_data_mirrors+0xdd5
efs.ko:bam_verify_file_data+0x611
efs.ko:bam_mark_file_data+0x6a8
efs.ko:ifs_mark_file_data+0x373
efs.ko:_sys_ifs_mark_file_data+0x14c
kernel:isi_syscall+0x53
kernel:syscall+0x1db
--------------------------------------------------

In addition, a FAILED ASSERTION message similar to the following appeared in the /var/log/messages file and clients connected to the affected node were disconnected when the node rebooted:

*** FAILED ASSERTION error != 0 @ /build/mnt/src/sys/ifs/bam/bam_verify.c:1144:

134725

In the OneFS command-line interface, the descriptions of some sysctl options referred to incorrect time units. For example, the description of the efs.bam.av.scanner_wait_time sysctl option indicated that the assigned value represented the number of milliseconds that the scanner thread would sleep, when the value actually represented the number of operating system ticks that the thread would sleep. The descriptions of the following sysctl options have been updated to reflect the correct information:

• efs.bam.av.scan_on_open_timeout

• efs.bam.av.scan_on_close_timeout

• efs.bam.av.batch_scan_timeout

• efs.bam.av.nfs_request_expiration

• efs.bam.av.scanner_wait_time

• efs.bam.av.nfs_worker_wait_time

• efs.bam.av.av_opd_restart_sleep

Note

To view the description of a sysctl option, run the following command where <option> is the option whose description you want to view:

sysctl -d <option>

134217
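Because a value expressed in operating system ticks is easy to misread as milliseconds, the conversion can be sketched as follows. This is an illustration only: the tick rate (hz) is an assumed input that must be obtained from the system in question (FreeBSD-derived kernels expose it as the kern.hz sysctl), and the function name is hypothetical.

```python
def ticks_to_milliseconds(ticks: int, hz: int) -> float:
    """Convert a duration in kernel ticks to milliseconds.

    hz is the number of ticks per second; it is an assumption supplied
    by the caller, not a value taken from the release notes.
    """
    if hz <= 0:
        raise ValueError("hz must be a positive tick rate")
    return ticks * 1000.0 / hz


# The same numeric setting means different durations at different tick rates.
print(ticks_to_milliseconds(100, hz=100))
print(ticks_to_milliseconds(100, hz=1000))
```

A setting of 100 therefore corresponds to 100 milliseconds only when the tick rate happens to be 1000 ticks per second.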

Hardware


X210 and X410 nodes that were configured to communicate through a 10 GigE network interface card that was using the Broadcom NetXtreme Ethernet (BXE) driver that was introduced in OneFS 7.2.0 might have restarted unexpectedly. If this occurred, a message similar to the following appeared in the /var/log/messages file:

Node panicked with Panic Msg: sleeping thread 0xffffff04692a0000 owns a nonsleepable lock

138521

Because the isi_inventory_tool command could not handle part numbers with more than 11 digits, if you ran the isi_inventory_tool --configCheck command on an HD400 node (a node that uses a new part number format with more than 11 digits), the part number could not be processed, and errors similar to the following appeared on the console:

Unexpected exception: <type 'exceptions.TypeError'>

137173

If you attempted to install a drive support package (DSP) while the /ifs partition was not mounted, the following lines appeared on the console:

File "/usr/bin/isi_dsp_install", line 730, in <module>
  rc = main()
File "/usr/bin/isi_dsp_install", line 701, in main
  installed = dsp_installed()
File "/usr/bin/isi_dsp_install", line 593, in dsp_installed
  info = isi_pkg_info()
File "/usr/bin/isi_dsp_install", line 188, in isi_pkg_info
  error("%s: rc=%d%s" % (estring, rc, rc and ':' or ''))
NameError: global name 'estring' is not defined

Note

Beginning in OneFS 7.2.0.1, if you attempt to install a DSP when the /ifs partition is not mounted, the following error appears:

ERROR: Cannot check if DSP is installed. Please ensure /ifs is mounted.

136710

If you ran the isi firmware update command on an HD400 node and it included updating the Chassis Management Controller (CMC) device firmware along with other devices, the firmware update process might have failed. If the process failed, errors similar to the following appeared on the console:

Error uploading firmware block, compcode = d5| Error in Upload FIRMWARE command [rc=-1]
TotalSent:0x10
Firmware upgrade procedure failed

136039

HDFS


If Kerberos is enabled, a Cloudera 5.2 client cannot connect to datanodes that do not have Simple Authentication and Security Layer (SASL) security enabled, unless the datanode service is running on a port lower than port 1024. Because OneFS did not support SASL security for datanodes and because OneFS ran the datanode service on port 8021, Cloudera 5.2 clients could not connect to the cluster. If a Cloudera 5.2 client was unable to connect for this reason, errors similar to the following might have appeared in log files on the client:

java.io.IOException: Cannot create a secured connection if DataNode listens on unprivileged port (8021) and no protection is defined in configuration property dfs.data.transfer.protection.

138484
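The connection rule described above can be sketched as a small predicate. This is a simplified model of the client-side condition as stated in the text, not the Hadoop implementation; the function and constant names are hypothetical.

```python
PRIVILEGED_PORT_LIMIT = 1024  # ports below this value are treated as privileged


def secured_connection_allowed(datanode_port: int, sasl_enabled: bool) -> bool:
    """Return True if a Kerberos-enabled client would accept the datanode.

    Per the description above, the client requires either SASL protection
    on the datanode or a datanode service port lower than 1024.
    """
    return sasl_enabled or datanode_port < PRIVILEGED_PORT_LIMIT


# Port 8021 without SASL is the combination that failed.
print(secured_connection_allowed(8021, sasl_enabled=False))
```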




If a HAWQ client attempted to connect to HDFS over Kerberos, the connection and authentication process failed and an error similar to the following was logged in the /var/log/isi_hdfs_d.log file:

Requested identity not authenticated identity.

137967

If an application, such as Cloudera Impala, queried OneFS for information about support for HDFS ACLs, OneFS did not respond correctly. As a result, the application that sent the query unexpectedly stopped running and the following message appeared in the /var/log/messages file:

isi_hdfs_d: Deserialize failed: Unknown rpc: getAclStatus

137303

During read operations, an HDFS client sometimes closed its connection to the server before reading the entire message received from the server. Although closing connections in this manner did not cause any issues on the cluster, if this occurred, the following message appeared multiple times in the isi_hdfs_d.log file:

Received bad DN READ ACK status: -1

135859

If a user ran the hdfs dfs -ls command to view the contents of a directory on the cluster, files to which the user did not have read access did not appear in the output of the command.

135858

In a Kerberos environment, applications (including Hive, Pig, and Mahout) that made multiple and simultaneous HDFS connections through the same user sometimes encountered authentication errors similar to the following:

Job Submission failed with exception 'org.apache.hadoop.ipc.RemoteException (Delegation Token can be issued only with kerberos or web authentication)'

135644

Because OneFS did not properly handle requests from HDFS clients if the requests contained fields that the OneFS implementation of HDFS did not support, affected clients were unable to write data to the cluster. If this issue occurred, a java.io.EOFException error similar to the following appeared on the client:

[user@hadoop-client]$ hdfs dfs -put file.txt /
14/12/19 12:53:36 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block isi_hdfs_pool: blk_4297916419_1000
java.io.EOFException: Premature EOF: no length prefix available
 at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2203)
 at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:867)
put: All datanodes 10.7.135.55:8021 are bad. Aborting...

In addition, lines similar to the following appeared in the /var/log/messages file:

2014-12-19T12:53:34-08:00 <1.3> cluster-1(id1) isi_hdfs_d: Malformed packet, dropping. DN ver=28, packet seqno=0, payload len: 1476487168, crc len = 0 data len: 0)
2014-12-19T12:53:34-08:00 <1.3> cluster-1(id1) isi_hdfs_d: Error while receiving packet #0

135568

Under some circumstances, the isi_hdfs_d process handled the return value of a system call incorrectly, causing the HDFS process to restart. If this occurred, HDFS clients were disconnected from the affected node, and the following error appeared in the isi_hdfs_d.log file:

FAILED ASSERTION pr >= 0

135185

During read operations, an HDFS client sometimes closed its connection to the server before reading the entire message received from the server. Although closing connections in this manner did not cause any issues on the cluster, if this occurred, the following message appeared multiple times in the isi_hdfs_d.log file:

Received bad DN READ ACK status: -1

135184

If a Hadoop Distributed File System (HDFS) client attempted to perform a recursive operation on a directory tree, a race condition sometimes occurred in the isi_hdfs_d process which caused the process to restart unexpectedly. This race condition was most frequently encountered while an HDFS client was recursively deleting directories. If the isi_hdfs_d process unexpectedly restarted as a result of this condition, HDFS clients connected to the affected node were disconnected and messages similar to the following might have appeared in the /var/log/isi_hdfs_d.log file:

isi_hdfs_d: RPC delete raised exception:Permission denied from rpc_impl_delete (/usr/src/isilon/bin/isi_hdfs_d/rpc_impl.c:484) from _rpc2_delete_ap_2_0_2 (/usr/src/isilon/bin/isi_hdfs_d/rpc_v2.c:811)

134863

Job engine


If a cluster was experiencing heavy client traffic, OneFS might have significantly limited the amount of cluster resources that job engine jobs were allowed to consume, causing jobs to run very slowly.

136193

Migration


After performing an initial full migration from a VNX array to an Isilon cluster through isi_vol_copy_vnx, if a hard link was deleted from the source VNX array and a new file with the same name was then created on the source array, it was possible for the data from the new file to be improperly copied to the hard link on the target cluster. This issue occurred because the isi_vol_copy_vnx utility copied data from the new file into the pre-existing hard link when it should have deleted the hard link from the target cluster, and then created the new file on the target cluster. If this occurred, the new file was not accessible on the target cluster.

135028

If the isi_vol_copy utility was unable to resolve on-disk identities associated with data being migrated to a OneFS cluster, the operation timed out. If the operation timed out, the correct user and group information might not have been applied to the migrated data, and valid users and groups might not have had access to the data following the migration. In addition, messages similar to the following appeared on the console and in the /var/log/messages file:

Warning: Unable to convert security descriptor blob, bytes:328 err:60[Operation timed out] Error after looking up ACL: no sd aclino 56974197 for ./bde_1.22.0/snapshot/groups/bas/group/bas.cap, inode 32017462, err:Operation timed out

134715

If you ran the isi_vol_copy utility to migrate files from a NetApp filer to an Isilon cluster, and the ACL setting Deny permission to modify files with DOS read-only attribute over both UNIX (NFS) and Windows File Sharing (SMB) was enabled, incremental migrations might have failed to transfer some files to which the DOS read-only attribute was applied. If this occurred, errors similar to the following appeared in the isi_vol_copy.log file:

./dirX/fileY.txt: cannot create file: Operation not permitted

134434

Networking


The OneFS web administration interface allowed the same IP address range or overlapping IP address ranges to be assigned to the int-a and int-b interfaces and the InfiniBand internal failover network. If a cluster was configured with the same or overlapping IP address ranges, nodes sometimes displayed unexpected behavior or unexpectedly rebooted.

Note

Beginning in 7.2.0.1, the IP ranges for the int-b interface and the InfiniBand internal failover network cannot be configured until a valid Netmask has been specified.

136888
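A configuration script can guard against the overlap described above before applying internal network ranges. The following is a minimal sketch using the Python standard library ipaddress module; it assumes each range is given as an inclusive low and high address of the same address family, and the function name and sample addresses are hypothetical.

```python
import ipaddress


def ranges_overlap(a_low: str, a_high: str, b_low: str, b_high: str) -> bool:
    """Return True if two inclusive IP address ranges share any address.

    Both ranges are assumed to belong to the same address family.
    """
    a_lo, a_hi = sorted(int(ipaddress.ip_address(x)) for x in (a_low, a_high))
    b_lo, b_hi = sorted(int(ipaddress.ip_address(x)) for x in (b_low, b_high))
    return a_lo <= b_hi and b_lo <= a_hi


# Hypothetical int-a and int-b ranges that share the address 192.168.1.10.
print(ranges_overlap("192.168.1.1", "192.168.1.10",
                     "192.168.1.10", "192.168.1.20"))
```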

The rate of data transfer to and from nodes that were configured with link aggregation on their 10GbE network interfaces in combination with a maximum transfer unit (MTU) of 1500 was sometimes slower than the rate of data transfer to and from nodes that were not configured in this way.

136887

If SmartConnect zone aliases were configured on a Flexnet pool, a memory leak that could affect several processes related to the SyncIQ scheduler was sometimes encountered. If this memory leak occurred, scheduled SyncIQ jobs did not move to the running state, and lines similar to the following appeared in the isi_migrate.log file:

isi_migrate[6923]: sched: siq_gc_conf_load: Failed to gci_ctx_new: Could not allocate parser read buffer: Cannot allocate memory

As a result, SyncIQ jobs in a scheduled state never moved to the running state.

136704

If a new node was added to a cluster that was configured for dynamic IP allocation, SmartConnect did not detect the configuration change and did not assign the new node an IP address. As a result, clients could not connect to the affected node. If a group change occurred after the new node was added, or if IP addresses were manually rebalanced by running the isi networks --sc-rebalance-all command, SmartConnect then detected the configuration change and assigned an IP address to the new node.

136295

Because the driver for the 10 GbE interfaces on the A100 Accelerator nodes was out-of-date, the interfaces sometimes unexpectedly stopped transferring data. If you ran the ifconfig command to confirm the status of an affected interface, a no carrier message appeared, even if a cable in good working order was connected to the interface. To restore functionality, the affected node had to be rebooted.

136293

By default, OneFS assigned an IPv6 IP address to the loopback interface, down interfaces, and ifdisabled interfaces. As a result, AAAA (IPv6) requests were sent to DNS servers. If AAAA requests were sent to a DNS server that was not configured to respond to them, the following error was returned:

Server Failure

This affected the performance of applications running on the cluster that performed large numbers of DNS lookups, such as mountd.

135193

If an IPv4 SmartConnect zone was a subdomain of another SmartConnect zone (for example, name.com and west.name.com), clients that sent a type AAAA (IPv6) DNS request for the subdomain zone received an NXDOMAIN (nonexistent domain) response from the server. This response could have been cached for both type A (IPv4) and type AAAA requests. If this occurred, future DNS requests for the subdomain zone (in this example, west.name.com) could also receive an NXDOMAIN response, preventing access to that SmartConnect zone.

135173
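The caching effect described above follows from what NXDOMAIN means: the name does not exist for any record type, so a resolver may apply a cached NXDOMAIN to later type A queries as well. The sketch below is a simplified model of a resolver's negative cache, not a description of any particular resolver or of how OneFS resolved the issue; the class name is hypothetical, and "NODATA" stands for an answer in which the name exists but has no records of the requested type.

```python
class NegativeCache:
    """Toy negative cache distinguishing name-wide and per-type answers."""

    def __init__(self):
        self._nxdomain = set()  # names reported as nonexistent
        self._nodata = set()    # (name, record type) pairs with no records

    def record(self, name: str, rtype: str, result: str) -> None:
        name = name.lower()
        if result == "NXDOMAIN":
            self._nxdomain.add(name)          # applies to every record type
        elif result == "NODATA":
            self._nodata.add((name, rtype))   # applies to this type only

    def is_negative(self, name: str, rtype: str) -> bool:
        name = name.lower()
        return name in self._nxdomain or (name, rtype) in self._nodata


cache = NegativeCache()
cache.record("west.name.com", "AAAA", "NXDOMAIN")
# A later type A lookup for the same zone is now answered negatively too.
print(cache.is_negative("west.name.com", "A"))
```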

If a network interface that had IP addresses assigned to it by the Flexnet process failed, the IP addresses were not failed over to another node or interface. As a result, a Failed to open BPF message would appear in the /var/log/messages file, and the interfaces had to be manually removed from the pool.

134723

NFS


If all of the following factors were true, a user with appropriate POSIX permissions was denied access to modify a file:

• The user was connected to the cluster through NFSv3.

• The user was a member of a group that was granted read-write access to the file through POSIX mode bit permissions, for example, -rwxrwxr-x (775).

• The user was not the owner of the file.

Depending on how the file was accessed, errors similar to the following might have appeared on the console:

Permission denied

or

Operation not permitted

141210

If users were being authenticated through a Kerberos authentication mechanism, NFS export mapping rules such as map-root and map-user were not being enforced for those users. As a result, the file permissions check was not correct, and users might have had incorrect allow or deny file access permissions.

139001

If the NFS server was unable to look up a user through the expected provider (for example, if the LDAP provider was not accessible), the NFS server did not attempt to look up the user in the local database, but instead mapped the user to the nobody (anonymous) user account. As a result, some users were denied access to resources that they should have had access to.

138784

Due to a memory leak, each time an NFS client registered or unregistered through Network Lock Manager (NLM), some memory was allocated but never returned to the system. Over time, this behavior could have caused a node to run out of available memory, which would have caused the affected node to unexpectedly reboot. If a node unexpectedly rebooted, clients connected to that node were disconnected.

137261

If an NFS export that was hosting a virtual machine's (VM) file system over NFSv3 became unresponsive, the VM's file system became read-only.

136637

If the OneFS NFS server was restarted, it assigned client IDs to NFS clients beginning with client ID 1. As a result, in environments with very few NFS clients, it was possible for a client to be assigned the same client ID before and after the NFS server was restarted. If this occurred, the NFS client did not begin the necessary process to recover from the loss of connection to the NFS server, and the NFS client became unresponsive.

136365
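The client ID collision described above can be reproduced with a toy model. This sketch is not the OneFS NFS server: the class is hypothetical, and the first_id parameter only illustrates that a starting value which differs between restarts would avoid the collision. The release notes do not state how the issue was actually resolved.

```python
import itertools


class ToyNfsServer:
    """Hands out sequential client IDs, starting from first_id after each start."""

    def __init__(self, first_id: int = 1):
        self._ids = itertools.count(first_id)
        self.clients = {}

    def register(self, client_name: str) -> int:
        client_id = next(self._ids)
        self.clients[client_name] = client_id
        return client_id


# With one client and a counter that restarts at 1, the ID repeats across
# a server restart, so the client cannot tell that the server lost its state.
id_before = ToyNfsServer().register("host-a")
id_after = ToyNfsServer().register("host-a")   # new instance models a restart
print(id_before == id_after)
```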

If a network or network provider became unavailable, the LDAP provider might have evaluated some error conditions incorrectly, causing inaccurate or empty netgroup information to be cached and distributed to nodes in the cluster. If incorrect or empty netgroup information was distributed, LDAP users could not be authenticated and could not access the cluster.

135780

If the isi_nfs4mgmt tool was called to manage clients on a node that had thousands of NFSv4 clients connected, the NFS service unexpectedly restarted, causing a brief interruption in service, and lines similar to the following appeared in the /var/log/messages file:

[ … Several possibly unrelated calls … ]
/usr/likewise/lib/lwio-driver/nfs.so:xdr_nfs4client+0x45
/usr/likewise/lib/lwio-driver/nfs.so:xdr_reference+0x42
/usr/likewise/lib/lwio-driver/nfs.so:xdr_pointer+0x74
/usr/likewise/lib/lwio-driver/nfs.so:xdr_nfs4client+0x114
/usr/likewise/lib/lwio-driver/nfs.so:xdr_reference+0x42
/usr/likewise/lib/lwio-driver/nfs.so:xdr_pointer+0x74
/usr/likewise/lib/lwio-driver/nfs.so:xdr_nfs4client+0x114
/usr/likewise/lib/lwio-driver/nfs.so:xdr_reference+0x42
/usr/likewise/lib/lwio-driver/nfs.so:xdr_pointer+0x74
[ … repeats many times … ]

135690

While the NFS service was being shut down, it could have attempted to use memory that was already freed. If this occurred, the NFS service restarted. Because the service was being shut down, there was no impact to client services.

135528

In environments with NFSv4 connections, the 30-second lease time setting for the vfs.nfsrv.nfsv4.lockowner_nolock_expiry sysctl was not properly applied by the OneFS NFS server if locks were held for a very brief duration. As a result, the server prematurely timed out lock owners, causing the server to send an NFS4ERR_BAD_STATEID error to the client. In some cases, affected NFS clients were temporarily prevented from accessing one or more files on the cluster.

135467

Because the NFS refresh time was in the range of 10 minutes per 1000 NFS exports, if you had thousands of exports, there was a significant delay before changes and additions became effective. This delay might have adversely affected NFS workflows.

135222
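To give a sense of scale for the refresh delay described above, the estimate can be worked through as follows. The calculation assumes the delay grows linearly at roughly 10 minutes per 1000 exports, which is an extrapolation from the figure quoted in the text rather than a measured relationship; the function name is hypothetical.

```python
def estimated_refresh_minutes(export_count: int,
                              minutes_per_thousand: float = 10.0) -> float:
    """Rough refresh delay, assuming linear scaling with the export count."""
    return export_count / 1000 * minutes_per_thousand


# Under this assumption, 5000 exports would take on the order of 50 minutes.
print(estimated_refresh_minutes(5000))
```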

If you ran the isi nfs exports create command with the --force option to force the command to ignore bad hostname errors, the command also ignored export rule conflicts. As a result, it was possible to create two exports on the same path with different rules. For example, you could create two exports of the /ifs/data directory where export 1 was set to read-write permissions and export 2 was set to read-only permissions. If an NFS client connected to the /ifs/data export, either rule could have been applied, resulting in an inconsistent experience for the client.

135217
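The kind of conflict described above can be detected by grouping exports by path and comparing their rules. The sketch below is a simplified illustration that considers only the read-only flag; real export rules also depend on client lists and other settings. The dictionary layout and function name are hypothetical and are not the output format of any isi command.

```python
from collections import defaultdict


def find_conflicting_paths(exports):
    """Return paths exported more than once with differing read-only settings."""
    modes_by_path = defaultdict(set)
    for export in exports:
        modes_by_path[export["path"]].add(export["read_only"])
    return sorted(path for path, modes in modes_by_path.items() if len(modes) > 1)


exports = [
    {"id": 1, "path": "/ifs/data", "read_only": False},  # read-write export
    {"id": 2, "path": "/ifs/data", "read_only": True},   # read-only export
    {"id": 3, "path": "/ifs/home", "read_only": True},
]
print(find_conflicting_paths(exports))
```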

During the NFS export host check, although an IPv6 address (AAAA) was not configured on the node, AAAA addresses were searched. As a result, during startup, mountd would be very slow to load export configurations that referred to many client hosts.

135192

On systems with thousands of NFS exports, it might have taken several minutes to list the exports with the isi nfs export list command.

135111

If you attempted to modify thousands of exports using the isi nfs export modify command, the following error appeared:

RuntimeError: Incomplete response from server.

In addition, the export might or might not have been modified.

Note

Increasing the --timeout value did not resolve this issue.

135107

Due to a race condition between an NLM unlock message and a lock completion callback message, it was possible for the primary delegate to unregister and destroy LKF client entries that the backup delegates retained, causing the lock data for the affected client to become inconsistent. If this occurred, lock requests from NFS clients to the affected node sometimes timed out and messages similar to the following appeared in the /var/log/messages file:

lkfd_simple_waiter_backup_resp_cb: Unregister for client: 0x<lkf-client-id> failed with error: 16

134452

If the cluster was handling many client requests from clients connected through different protocols (for example, both SMB and NFS clients), contention for file-system resources sometimes caused delays in client request processing. If the processing of client requests was delayed, kernel resources might have been reserved more quickly than they were released until all resources were eventually consumed, and then the node restarted unexpectedly.

133963

OneFS API

OneFS API issues resolved in OneFS 7.2.0.1 ID

Because the RESTful Access to the Namespace (RAN) API process was not case-sensitive, if you queried for a directory or file name through the RAN API, it was possible for the query to return the wrong file. For example, if the file system contained a file named AbC.txt and a file named abc.txt, a query for AbC.txt might have returned abc.txt instead.

136526
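The effect of a case-insensitive lookup on a case-sensitive namespace can be illustrated with a short sketch. It models a directory as a plain list of names and is not the RAN API implementation; the function names are hypothetical.

```python
def lookup_case_insensitive(names, query):
    """Return the first entry matching the query without regard to case."""
    wanted = query.lower()
    for name in names:
        if name.lower() == wanted:
            return name
    return None


def lookup_exact(names, query):
    """Return the entry only if it matches the query exactly."""
    return query if query in names else None


directory = ["abc.txt", "AbC.txt"]
# The case-insensitive match stops at the first candidate, which is the wrong file.
print(lookup_case_insensitive(directory, "AbC.txt"))
print(lookup_exact(directory, "AbC.txt"))
```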

If a user with an RBAC role was deleted from Active Directory and then the role that the user belonged to was modified, an erroneous entry was added to the sudoers file. As a result, if a user ran the sudo command, a syntax error similar to the following appeared:

sudo: >>> /usr/local/etc/sudoers: syntax error near line 86 <<<
sudo: parse error in /usr/local/etc/sudoers near line 86
sudo: no valid sudoers sources found, quitting
sudo: unable to initialize policy plugin

Consequently, the sudo command could not be used.

135186

Resolves an issue where a Role-based Access Control (RBAC) privilege was incorrectly applied.

134445

If a namespace API query used the max-depth query parameter to discover the number of files and subdirectories in the /ifs/home directory, the query sometimes returned only a portion of the contents of the directory. In other cases, the query returned the entire contents of the directory. If either result was returned, the object_d job unexpectedly restarted.

134416



In the OneFS web administration interface, on the Cluster Diagnostics > Gather Info page, if you clicked the Start Gather button to collect and send log files to EMC Isilon Technical Support and the file upload failed, the Gather Status bar indicated that the gather succeeded. However, no .tgz file was created and new gathers could not be started.

134854



In the OneFS web administration interface, if the cluster time zone was changed, the new date and time set on the cluster was sometimes incorrect. If the new date and time set on the cluster was significantly different than the correct date and time in the selected time zone, the difference could prevent the cluster from properly communicating or synchronizing with external systems, such as Active Directory domain controllers.

134426

SmartLock


In compliance mode, the compadmin role did not have read permissions for several log files, including the isi_papi_d and isi_papi_d_audit log files. As a result, the log files were not collected during the isi_gather_info process.

134422

SmartQuotas


If a default-user quota existed on a directory where the user did not have a linked quota, and you modified the default-user quota to clear a threshold and then again to set a threshold, the user quota domain was not created, and the following message appeared if the isi quota quotas create command was run, where <username> was the name of the specific user:

Creating: user:<username>@snaps=no@/ifs/data/ec_workareas FAIL
!! Failed to create domain 'user:<username>@snaps=no@/ifs/data/ec_workareas': Failed to save
!! domain: Invalid argument

As a result, cluster space could not be allocated to specific user data.

135225

If a quota was created with a hard threshold, and the hard threshold was cleared during a quota modify operation, the --enforced option remained enabled. As a result, it was also possible to enable the --container option, although the --container option applies only to hard thresholds.

134213

If you applied a default-user quota to a directory, and then attempted to create a user quota on the same directory by using the isi quota quotas create command, the operation failed and the following message appeared:

Failed to save domain: File exists

133641

SMB


If a Windows client that was connected to the cluster through SMB copied a file from the cluster, the timestamp metadata applied to the file might have become invalid. This issue occurred because OneFS did not properly interpret the value assigned to a file's timestamp metadata if the value was set to -1, which is a valid value. Workflows that rely on timestamp metadata might have been negatively affected by this issue.

Note

The SMB protocol specifies that, when file attributes are set, a value of -1 indicates that the attribute in the corresponding field must not be changed.

For more information, see ETA 198187 on the EMC Online Support site.

142313
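The rule in the note above, that a value of -1 means the corresponding field must be left unchanged, can be sketched as follows. This is a simplified illustration rather than the OneFS or SMB server implementation; the field names and the function name are hypothetical.

```python
UNCHANGED = -1  # sentinel: leave the corresponding timestamp as it is


def apply_timestamps(current: dict, requested: dict) -> dict:
    """Return new timestamp metadata, skipping fields set to the sentinel."""
    updated = dict(current)
    for field, value in requested.items():
        if value != UNCHANGED:
            updated[field] = value
    return updated


current = {"creation": 100, "last_write": 200}
requested = {"creation": UNCHANGED, "last_write": 300}
# Only last_write changes; storing -1 as a literal time would corrupt creation.
print(apply_timestamps(current, requested))
```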

On a Microsoft Windows client, if you attempted to delete a file from an SMB share and the letter case of the file path that you wanted to delete did not exactly match the actual letter case of the share path, the file was not deleted, and, if lwio logging was increased to the DEBUG level, the following message appeared in the /var/log/lwiod file:

Status: STATUS_OBJECT_NAME_NOT_FOUND

Note

Other file operations (Read, Write, and Rename) work as expected.

139852

After the lwio process unexpectedly restarted, the process could no longer communicate with the srvsvc service. As a result, SMB clients that were connected to the cluster could not view or list shares, and SMB shares on the cluster could not be managed from a Windows client.

138694

After the NetBIOS Name Service (NBNS) was enabled, the service failed to listen on port 139. As a result, clients that relied on NBNS could not establish a connection to the cluster.

136889

NetBIOS requests sent over SMB 2 were not properly handled. As a result, the lwio process unexpectedly restarted and lines similar to the following appeared in the /var/log/messages file:

/lib/libc.so.7:thr_kill+0xc
/usr/likewise/lib/liblwbase_nothr.so.0:__LwRtlAssertFailed+0x5a
/usr/lib/libisi_ntoken.so.1+0x23d673:0x808490673
/usr/lib/libisi_ntoken.so.1+0x243b4e:0x808496b4e
/usr/lib/libisi_ntoken.so.1+0x2453c5:0x8084983c5
/usr/lib/libisi_ntoken.so.1+0x2e450f:0x80853750f
/usr/likewise/lib/liblwbase.so.0:EventThread+0x333
/usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xec
/lib/libthr.so.3:_pthread_getprio+0x15d

135468

If SRV logging was enabled in the OneFS registry, incoming SMB 1 requests caused the lwio process to unexpectedly restart. If the lwio process restarted, SMB clients connected to the cluster were disconnected.

134224

OneFS permitted the use of forward slashes in path names, and, in OneFS, forward slashes within an SMB request were converted to backslashes. This behavior did not comply with the SMB protocol, which specifies that such a request should fail and return the following error:

OBJECT_NAME_INVALID

Note

OneFS 7.2.0.1 and later versions of OneFS comply with the SMB protocol. If an SMB request that contains a forward slash is received, an OBJECT_NAME_INVALID error is returned.

132952
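The compliant behavior described above amounts to rejecting a path that contains a forward slash instead of rewriting it. The following is a minimal sketch of that check, not the OneFS SMB server code; the function name is hypothetical and only the error name is taken from the text.

```python
from typing import Optional


def check_smb_path(path: str) -> Optional[str]:
    """Return an error name for an invalid SMB path, or None if it is accepted."""
    if "/" in path:
        # Reject rather than convert the forward slash to a backslash.
        return "OBJECT_NAME_INVALID"
    return None


print(check_smb_path("projects/report.txt"))    # rejected
print(check_smb_path("projects\\report.txt"))   # accepted
```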

Because OneFS sent an incorrect response to a NetBIOS session request, the request to connect was closed and the NetBIOS client could not connect to the cluster. If the session request was closed, lines similar to the following appeared in the packet capture:

10.0.0.35 10.0.0.100 NBSS 138 Session request
10.0.0.100 10.0.0.35 TCP 66 netbios-ssn > 51660 [FIN, ACK]
10.0.0.35 10.0.0.100 TCP 66 51660 > netbios-ssn [FIN, ACK]
10.0.0.100 10.0.0.35 TCP 66 netbios-ssn > 51660 [ACK]

132574

Virtual plug-ins


When a new OneFS 7.2.0.0 cluster was created through the OneFS 7.2.0.0 simulator on a system running Microsoft Windows or on a Microsoft Windows virtual machine, an error occurred while drive capacity was checked. As a result, the new cluster did not boot, and the following messages appeared on the console:

mount_efs: Reading GUID from da2s1e: No such file or directory
mount_efs: update_ifs_drives: No drive available to mount.
mount_efs: OneFS: Operation not supported by device
IFS failed to mount. Aborting boot.

133546


Antivirus


If the ID of an antivirus scan report was more than 15 characters long, the OneFS web administration interface and command-line interface would report the job as running forever. Any threats detected by the scan would not be associated with the correct policy.

125535

If you ran the isi avscan report purge command while an antivirus scan was running, OneFS would sometimes delete the report of the antivirus scan that was currently in progress.

125534

A syntax error in the .xml file from which AVScan reports are generated caused reports accessed from the Data Protection > Antivirus > Reports page in the OneFS web administration interface to not include threats that appeared on the Data Protection > Antivirus > Detected Threats page.

125526



Authentication


Users assigned to the admin group were able to reuse a previously used password immediately even if the Password History Length option was configured to prevent the reuse of a specified number of previously used passwords.

130656

Users with assigned roles could not access the Cluster Management > Diagnostics page because permission to access the Diagnostics page was assigned only to the ISI_PRIV_SYS_SUPPORT privilege.

130342

OneFS now defaults to LDAP paged search if both paged search and Virtual List View (VLV) are supported. If paged search is not supported and VLV is enabled on the LDAP server, OneFS will use VLV when returning the results from a search.

Note

In most cases, bind-dn and bind-password must be enabled in order to use VLV.

130171
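The selection order described for this change can be sketched in a few lines of Python. This is an illustration only, not the OneFS implementation: the function name and the "simple" fallback for servers that offer neither control are assumptions, since the release note does not describe that case.

```python
def choose_ldap_search_method(paged_supported, vlv_enabled):
    """Hypothetical helper mirroring the order described above.

    Paged search wins whenever the server supports it; VLV is used only
    when paged search is unavailable and VLV is enabled on the server.
    """
    if paged_supported:
        return "paged"
    if vlv_enabled:
        return "vlv"
    # Assumption: fall back to an ordinary, non-paged search.
    return "simple"


for paged in (True, False):
    for vlv in (True, False):
        print(paged, vlv, choose_ldap_search_method(paged, vlv))
```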

If a mapping rule contained a username with a space, mapping tokens would fail, which prevented users from joining.

130024

Because the lsass process could not distinguish between different trust domains that shared the same NetBIOS name, role-based authentication would fail when clients that were connected to the cluster through SSH, CIFS, or the web administration interface tried to access the identically-named domains. As a result, the identically-named domains were inaccessible.

130003

If the dup() function (a function that duplicates a file descriptor) failed, no error was returned to the lsass process. As a result, the lsass process attempted to pass a nonexistent file descriptor to the lwio process. If this condition was encountered, there was a potential for SMB clients to be temporarily prevented from authenticating on the cluster.

128435
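The kind of check that was missing can be shown with a short Python sketch. It is not lsass code; the helper name duplicate_fd is hypothetical. In Python, os.dup raises OSError when the underlying dup() call fails, so the failure can be caught and reported before a descriptor is handed to another process.

```python
import os


def duplicate_fd(fd):
    """Hypothetical helper: return a duplicate of fd, or None if dup() fails.

    The caller must check for None instead of passing the result on blindly.
    """
    try:
        return os.dup(fd)
    except OSError:
        return None


read_end, write_end = os.pipe()
copy = duplicate_fd(read_end)
print("duplicate created:", copy is not None)
for descriptor in (copy, read_end, write_end):
    os.close(descriptor)

# -1 is never a valid descriptor, so this duplication fails and returns None.
print("invalid descriptor result:", duplicate_fd(-1))
```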

If you changed the machine name for the local provider (system zone) to include periods or commas, errors similar to the following were logged in the /var/log/messages file when an administrator attempted to create new users from the command line:

Failed to add user <username>: Invalid Ldap distinguished name(DN)

123878

While the lwio process was in the process of shutting down, it sometimes referenced a data structure that no longer existed. If this occurred, the following lines were logged in the /var/log/messages file:

Stack: --------------------------------------------------
/lib/libthr.so.3:_pthread_mutex_lock+0x1d
/usr/likewise/lib/lwio-driver/onefs.so:OnefsAsyncTableGet+0x1f
/usr/likewise/lib/lwio-driver/onefs.so:OnefsAsyncUpcallCallback+0x58
/usr/lib/libisi_ecs.so.1:oplocks_event_dispatcher+0xb9
/usr/likewise/lib/lwio-driver/onefs.so:OnefsOplockChannelRead+0x8c
/usr/likewise/lib/liblwbase.so.0:EventThread+0x333
/usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xec
/lib/libthr.so.3:_pthread_getprio+0x15d
--------------------------------------------------

123397



When the RequireSecureConnection over LDAP setting was enabled, connection to the LDAP server failed because the StartTLS command was not sent from the cluster to the LDAP server.

114935



If one or more objects, such as a file or directory, were moved out of the scope of a SyncIQ policy’s root path between two sequential snapshots, subsequent ChangelistCreate jobs for those two snapshots failed and errors similar to the following appeared in the isi_job_d log file:

Error 70 finding path of 4297707296
Error in task 1-1: Stale NFS file handle

Worker 8 Busy: (worker_process_task, 1092) ChangelistCreate[296].0: Error in task 1-1:
14-09-05 12:30:17 ChangelistCreate[296] Node 1 (1) task 1-1: Stale NFS file handle
from dir_get_dirent_path (/build/mnt/src/isilon/lib/isi_migrate/migr/utils.c:1891)
from dir_get_utf8_str_path (/build/mnt/src/isilon/lib/isi_migrate/migr/utils.c:1996)
from changelist_add_change_entry (/build/mnt/src/isilon/bin/isi_job_d/changelist_job.c:1369)
from changelist_item_process (/build/mnt/src/isilon/bin/isi_job_d/changelist_job.c:1142)

Additionally, ChangelistCreate jobs could fail in a similar manner if the SyncIQ policy root path was set to /ifs.

133809

If you ran a ChangelistCreate job, multiple nodes sometimes unexpectedly rebooted, and lines similar to either of the following sets appeared in the /var/log/messages file:

Stack: --------------------------------------------------
kernel:isi_assert_halt+0x2e
kernel:btree_leaf_get_key_at_or_before+...
kernel:sbt_txn_get_entry_at+...
kernel:_sys_ifs_sbt_get_entry_at+0x2a7
kernel:isi_syscall+0x7f
kernel:syscall+0x325
--------------------------------------------------
*** FAILED ASSERTION pct.num < pct.den @ /build/mnt/src/sys/ifs/btree/btree_leaf.c:2418

Stack: --------------------------------------------------
kernel:isi_assert_halt+0x2e
kernel:btree_leaf_get_key_at+0x15c
kernel:sbt_txn_get_entry_at+0x287
kernel:_sys_ifs_sbt_get_entry_at+0x2a7
kernel:isi_syscall+0x7f
kernel:syscall+0x325
--------------------------------------------------
*** FAILED ASSERTION pct.num < pct.den @ /build/mnt/src/sys/ifs/btree/btree_inner.c:2532

133504

A SyncIQ job sometimes failed while handling a file with a hard link if the hard link referred to a file that no longer existed.

131302

If you created a SmartLock directory, set the retention date to "forever," and then attempted to restore the directory through NDMP, the NDMP job failed and the following assertion failure message appeared in the /var/log/messages and the /var/log/isi_ndmp_d files:

Assertion failed: (date >= 0), function pax_attribute, file archive_read_support_format_tar.c, line 1644.

131138

A race condition caused parallel restores to fail.

131001

In environments where a large number of SyncIQ policies were configured (several hundred), SyncIQ policies were not listed in the OneFS web administration interface and isi sync commands that list policies sometimes failed with the following error:

CLI timeout exceeded while waiting for the server to respond; the request still may have completed.

130756

A SyncIQ job configured with the --disable_stf option set to true sometimes failed when an sworker (a process responsible for transferring data during replication) detected differences between files on the source and target clusters and then attempted to access and update the linmap database. If a SyncIQ job failed as a result of this issue, the following error appeared in the isi_migrate.log file:

A work item has been restarted too many times. This is usually caused by a network failure or a persistent worker crash.

130340

If a SyncIQ policy designated a target directory that was nested within the SyncIQ target directory of a preexisting policy, an error occurred during SyncIQ protection domain creation which caused the SyncIQ policy's protection domain to be incomplete. If this occurred, the following message appeared in the /var/log/isi_migrate.log file:

create_domain: failed to ifs_domain_add

In addition, if you ran the isi domain list-lw command, the Type field for the affected SyncIQ target was marked Incomplete.

130337

SyncIQ requests from the OneFS command-line interface and web administration interface repeatedly opened and closed the reports.db SQLite database. As a result, changes made in the web administration interface would not take effect and commands run from the command-line interface might not return results and eventually failed.

130000

If a large number of replication policies existed on the cluster, the isi sync policies list command might time out before the command completed.

129999

The pthread-cancel process would sometimes fail without releasing the resources it contained. As a result, other processes stopped indefinitely.

129997

If the number of file system event snapshots exceeded the amount of space allocated by the OnefsEnumerateSnapshots buffer, the lwio process restarted on various nodes, causing clients to be disconnected and then reconnected to the cluster.

125571




If your environment included more than 16 interfaces, NDMP backups would sometimes fail with the following error message:

ERRO:NDMP custom.c:129:createIPList MAX_INTERFACES exceeded.

125536

Replication reports created for the first run of a replication policy sometimes contained inaccurate values. All other replication reports were accurate.

122906

Due to a memory leak in the isi_papi_d process, the process would sometimes stop responding. As a result, SyncIQ policies would not be listed in the OneFS web administration interface or after running the isi sync policy list command from the command-line interface.

120509



If an unprovisioned drive was physically removed from a node without first being smartfailed and the isi_drive_d process was subsequently restarted (either manually or automatically), OneFS attempted to reprovision the removed drive, preventing new drives and nodes from being provisioned. As a result, new drives and nodes could not be added to the cluster.

132913

Default SmartPools jobs incorrectly scanned configuration information for all files on a cluster. As a result, SmartPools jobs progressed for days, but did not complete.

132309



Duplicate object identifiers (OIDs) in the ISILON-TRAP-MIB.txt file prevented the use of certain SNMP monitoring tools.

131035

Some events were configured with invalid variable bindings (the association between a variable name and its value). As a result, SNMP alerts were not sent for these events.

130621

If a Windows 8.1 client or a Windows Server 2012 R2 SMB2 client requested file system volume and attribute information from the cluster and the maximum response length requested by the client was too small to hold the entire response, the affected node would return a STATUS_BUFFER_TOO_SMALL response instead of the expected STATUS_BUFFER_OVERFLOW response. The client was unable to handle this response, and, as a result, the request failed. This issue was typically encountered while attempting to open a file with Notepad.

130589
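The difference between the two responses can be sketched as follows. This is a simplified illustration, not the lwio implementation: the function name is hypothetical, the NTSTATUS values are assumed from the Microsoft NTSTATUS tables, and the sketch assumes the expected behavior is to return as much of the response as fits together with STATUS_BUFFER_OVERFLOW.

```python
# NTSTATUS values as listed in the Microsoft NTSTATUS tables (assumption:
# not stated in these release notes).
STATUS_SUCCESS = 0x00000000
STATUS_BUFFER_OVERFLOW = 0x80000005   # warning: data was truncated
STATUS_BUFFER_TOO_SMALL = 0xC0000023  # error: the response the fix replaced


def build_query_response(payload, max_response_len):
    """Hypothetical helper: fit a query response into the client's buffer.

    When the payload does not fit, return the part that does fit and signal
    the truncation with STATUS_BUFFER_OVERFLOW rather than failing outright.
    """
    if len(payload) <= max_response_len:
        return STATUS_SUCCESS, payload
    return STATUS_BUFFER_OVERFLOW, payload[:max_response_len]


status, data = build_query_response(b"volume-and-attribute-information", 8)
print(hex(status), data)
```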

On nodes that support the new non-disruptive drive firmware updates (NDFU) feature, if the CELOG process checked the state of a drive while a drive firmware update was in progress, erroneous drive is ready to be replaced alerts were sometimes issued. NDFU is supported on the following nodes: S200, S210, X200, X400, X410, 108NL, and NL400.

130155




While upgrading or reimaging a node, the isi_first_post_ifs_merged_configs file was not generated during system start due to a condition requiring the existence of the drive_gconfig.gc file. As a result, Drive Support Package (DSP) firmware CELOG alerts were not generated.

130128

The flt_audit lwio filter driver would fail to audit SMB traffic on files with non-ASCII characters in their names. As a result, these files were not audited, and Failed to allocate memory for a path component errors were displayed in the /var/crash/lwiod.log file, even when memory was available.

130011

Scripts that called isi commands would sometimes cause a large number of instances of the following error message to appear in the /var/log/messages file:

expander_inquiry: attempt xx got no ses0 inquiry data

129143

Access to files with non-ASCII characters in their names was not audited. A client could access and modify such a file without problem, but the action would cause an error in the audit filter driver and the following message would display in the lwiod.log file:

ERROR:flt_audit:0x805c02560:SyncGetFileName():audit_info_util.cpp:563: Failed to allocate memory for a path: UNKNOWN'

127508

If you ran the isi statistics heat command with the --events option, the output was not filtered correctly.

125549

Incorrect SNMP traps were sent for some alerts. The alerts were sent with the correct alert level but indicated that the wrong threshold had been exceeded. For example, when a high temperature threshold was exceeded, a critical SNMP trap was sent; however, the trap stated that the low temperature threshold was exceeded.

125541

SNMP traps were not sent if two or more SNMP recipients were defined in an event notification rule.

125537

File system


Whenever the asynchronous delete operation (an operation which deletes files in the background while the user can run other OneFS operations) finished before all the data was deleted, the synchronous delete path reverted a file back to asynchronous delete. As a result, the asynchronous delete operation became stuck in an endless loop, and multiple nodes attempted to delete the file at the same time. This resulted in performance issues for the user.

132921

A race condition could be encountered if Network Lock Manager (NLM) received a would_block lock request from an NFS client just before a group change began. If the race condition was encountered, a node could have been prevented from leaving the group and the node that prevented the group change could become unavailable. If a node became unavailable, client connections to the affected node timed out or were unresponsive.

131724



If a node was leaving the cluster at the same time that the node received a lock request from an NFS client, a lock failover (LKF) waiter might be created. If this occurred, the affected node was prevented from leaving the cluster and would unexpectedly restart.

131198

It was possible to configure the overcommit limit below the low and high overcommit thresholds (which is an invalid configuration). If the overcommit thresholds were configured incorrectly, nodes sometimes ran out of memory and unexpectedly rebooted.

Note

The overcommit thresholds are set through the following sysctl settings:

• vfs.nfsrv.rpc.request_space_overcommit

• vfs.nfsrv.rpc.request_space_high

• vfs.nfsrv.rpc.request_space_low

130363
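The invalid configuration described above can be detected with a simple comparison. The Python sketch below is illustrative only: the sysctl names come from the note, but the function name and the example values are made up, and the check covers only the single rule stated in this entry (the overcommit limit must not be below the low or high threshold).

```python
OVERCOMMIT = "vfs.nfsrv.rpc.request_space_overcommit"
HIGH = "vfs.nfsrv.rpc.request_space_high"
LOW = "vfs.nfsrv.rpc.request_space_low"


def overcommit_config_errors(settings):
    """Hypothetical validator: return a list of problems, empty if valid."""
    errors = []
    for threshold in (LOW, HIGH):
        if settings[OVERCOMMIT] < settings[threshold]:
            errors.append(OVERCOMMIT + " is below " + threshold)
    return errors


# Example values are invented for illustration and are not OneFS defaults.
for problem in overcommit_config_errors({OVERCOMMIT: 100, HIGH: 200, LOW: 50}):
    print(problem)
```

Running a check like this before applying new sysctl values would flag the invalid combination before it could affect a node.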

When reads of 4 MB or larger crossed a 16 GB boundary, certain read size and offset combinations triggered the following assertion failure at /build/mnt/src/sys/ifs/ifm/ifm_pg_cache.c:915. As a result, the node could restart unexpectedly.

FAILED ASSERTION lk_range_is_contained(&pg_range, lock_range)

129562

If you were running a data-width-changing restripe on a 4 TB file, a node might unexpectedly reboot.

128750

If the filename related to a change notify event was not a valid UTF-8 string, an assertion error would sometimes occur, resulting in the lwio process restarting.

128077
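One defensive way to handle a file name that is not valid UTF-8 is to validate it and substitute a replacement character instead of asserting. The Python sketch below illustrates that idea only; the function name is hypothetical and the release note does not describe how OneFS resolved the issue.

```python
def decode_event_filename(raw):
    """Hypothetical helper: decode a file name without ever raising.

    Valid UTF-8 is returned unchanged; invalid byte sequences are replaced
    with U+FFFD so the caller always receives a usable string.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("utf-8", errors="replace")


print(decode_event_filename("résumé.txt".encode("utf-8")))
print(repr(decode_event_filename(b"bad-\xff-name")))
```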

If the event count for change events exceeded the 32-bit counter limit, multiple nodes might reboot unexpectedly and lines similar to the following would appear in the kernel stack trace:

kernel:isi_assert_halt+0x42
efs.ko:bam_event_synchronize+0x7f5
efs.ko:ifs_vnop_wrapunlocked_write_mbuf+0x612
kernel:recvfile+0x6e7
kernel:isi_syscall+0x64
kernel:syscall+0x26e

124720
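The counter limit mentioned in this entry can be illustrated with a short Python sketch. It assumes an unsigned 32-bit counter, which the release note does not state, and the function names are hypothetical. The second function shows a common wrap-tolerant way to compare two readings of such a counter.

```python
UINT32_MAX = 0xFFFFFFFF


def next_event_count_u32(count):
    """Increment a counter held in 32 unsigned bits; it wraps to 0 at the limit."""
    return (count + 1) & UINT32_MAX


def events_elapsed_u32(older, newer):
    """Wrap-tolerant number of events between two 32-bit counter readings."""
    return (newer - older) & UINT32_MAX


print(next_event_count_u32(UINT32_MAX))
print(events_elapsed_u32(UINT32_MAX - 1, 1))
```

Code that assumes the counter only ever grows will misbehave at the wrap point, which is why comparisons are done modulo 2^32 in the second function.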

If a customer-issued API call was used to look up the lock status of any given file in the cluster at the exact same time that a node was being taken off the cluster, and that node was the one tasked with the API verification, the entire cluster might have become unable to serve any NFS or SMB connections for about five minutes.

122759



File transfer

File transfer issues resolved in OneFS 7.2.0.0 ID

If an HTTP client sent a request to the cluster through the Apache WebUI service, the following message appeared repeatedly in the /var/log/apache2/webui_httpd_error.log file:

Requested service"WK" doesn't match authenticated session services.

130648

If the httpd process was handling a large number of client connections, the process sometimes unexpectedly restarted while accepting a connection from an HTTP client. If the process restarted, HTTP clients connected to the affected node were disconnected from the cluster.

127190

Hardware


In some cases, when an X410 or S210 node was configured for the first time, during the initial boot-up process the node did not boot completely, and the following error messages appeared on the console:

Bad Magic in superblock: 0
Failed to read journal during scan: No such file or directory
Test journal exited with an error.
Aborting boot.

For more information about this issue, see KB 190590 on EMC Online Support.

131674

If a battery that supplies power to the mt25208 NVRAM failed, the LED on that battery remained green instead of turning red, even though the CELOG alert correctly indicated the battery’s failure. This issue affected the following node types: S200, X200, X400, and NL400.

130683

If the PCIe connection between the motherboard and the NVRAM/IB card was disrupted, the affected node stopped responding. If the unresponsive node was subsequently powered down, the NVRAM/IB card failed to set Fast Self Refresh (FSR). If FSR was not set when the node was powered down, the NVRAM journal was not preserved and the following message appeared on reboot:

Could not recover journal. Contact Isilon Customer Support Immediately.

In addition, the following entry appeared in the /var/log/messages file:

fsr=0 pwr=0

129914

Installing a power supply unit (PSU) firmware update on a common form factor (CFF) PSU in a node with only one working PSU caused the node to shut down. This occurred because the working PSU was rebooted as part of the PSU firmware update process.

129810

Attempting to SmartFail a SED would sometimes fail, even after the drive had been manually removed and successfully replaced.

129326



If the QLogic 10 Gig Ethernet card experienced a timeout from a request initiated by the Direct Memory Access Engine (DMAE), lines similar to the following appeared in the /var/log/messages file.

Stack: --------------------------------------------------
if_bxe.ko:bxe_write_dmae+0xd0
if_bxe.ko:bxe_write_dmae_phys_len+0x78
if_bxe.ko:ecore_init_block+0x122
if_bxe.ko:bxe_init_hw_common+0x7d5
if_bxe.ko:bxe_init_hw_common_chip+0x18
if_bxe.ko:ecore_func_hw_init+0xd7
if_bxe.ko:ecore_func_state_change+0x10c
if_bxe.ko:bxe_init_hw+0x41
if_bxe.ko:bxe_nic_load+0x726
if_bxe.ko:bxe_init_locked+0x18c
if_bxe.ko:bxe_handle_chip_tq+0x86
kernel:taskqueue_run_locked+0x9a
kernel:taskqueue_thread_loop+0x48
kernel:fork_exit+0x7f
--------------------------------------------------

As a result, the affected node had to be rebooted to restore node functionality.

129252

Nodes that contained bootflash drives would sometimes reboot unexpectedly during a disk firmware update if there was not enough memory available to meet the requirements of the disk firmware update process.

129012

If you added an unsupported boot drive to a node, a CELOG alert was properly generated, but a traceback blocked the addition of the entry to the baseboard management controller's (BMC) system events log. As a result, there was no report of the unsupported drive in the events log.

128984

When installing the drive support package (DSP) that contained firmware for upcoming hardware models, the installer reported errors similar to the following. In addition, although installation was successful, an error message indicating that the installation had failed appeared on the console and in the isi_dsptool.log file:

- ERROR: Found 2 error messages in isi_dsptool logfile
- ERROR Gconfig parse warnings for file /dsp_staging/config/models/HGST_HUS726060ALA640.gc; dropping unrecognized entries
- ERROR Gconfig parse warnings for file /ifs/.ifsvar/modules/hardware/drives/config/models/HGST_HUS726060ALA640.gc.2; dropping unrecognized entries
DSP Install Failed

Note

These messages are now logged in the isi_dsptool.log file as warnings.

127958

The CTO upgrade process did not complete on clusters in compliance mode.

126895

If you added a node to a cluster that was configured to synchronize its time with an external NTP server, the cluster would sometimes synchronize its time with the node that was added. As a result, the cluster time might have been so different from the time on the NTP server that the cluster would not automatically correct itself.

126652

Unprovisioned nodes could not be added to a manual node pool. If you added nodes in your cluster to one or more manual node pools, and then attempted to add one or two nodes to the cluster, OneFS would not be able to add those nodes to a node pool, and so those nodes would be unprovisioned, and you would be unable to add those unprovisioned nodes to the manual node pools.

126363

A100 nodes reported warning-level sensor errors when a power supply was removed or failed, rather than reporting a critical-level redundant power supply failure.

126321

If you removed a power cable from the Power Supply Unit (PSU) on an A100 node, the isi_hw_status command incorrectly displayed the following output:

Power Supplies OK

126240

After a drive-down operation completed, nodes would sometimes panic with the following error message:

Fatal trap 12: page fault while in kernel mode

126239

If you shut down a node with a failed boot drive, the node would sometimes stop responding during the shutdown process because the journal for the node could not be saved to the failed boot drive.

126219

If the InfiniBand card in a node ran out of memory, the affected node might have been disconnected from the cluster.

124325

HDFS


Due to an issue in the Hadoop Distributed File System (HDFS) code, HDFS-1497, the sequence numbers assigned to HDFS data packets were not always consecutive. If OneFS received an HDFS data packet with a sequence number that was not consecutive, the affected HDFS client connection was closed.

127983
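The condition that triggered the connection close, a sequence number that does not immediately follow the previous one, can be illustrated with a small Python sketch. The function name is hypothetical and the sketch only detects gaps; the release note does not describe how OneFS handles such packets after the fix.

```python
def find_sequence_gaps(seqnos):
    """Hypothetical helper: list (expected, received) pairs for every break.

    A break is any packet whose sequence number is not exactly one more
    than the sequence number of the packet before it.
    """
    gaps = []
    for previous, current in zip(seqnos, seqnos[1:]):
        if current != previous + 1:
            gaps.append((previous + 1, current))
    return gaps


print(find_sequence_gaps([0, 1, 2, 4, 5]))
```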

Job engine


The isi_job_d process might fail, causing jobs to briefly pause and then resume.

136028

Data reliability issues could occur after the job engine ran Collect or MultiScan jobs.

132695,132696,132697,132698

Job engine logging always ran at trace level, a level used to gather detailed information about job engine processes. As a result, job engine performance was adversely affected and the job engine log file, isi_job_d.log, was unnecessarily flooded with messages.

132895

If the cluster was being monitored by InsightIQ, the FSA job could fail with the following error message:

[TRACE_SQL_ERRS]: database is locked

130999



If you ran the FlexProtect job while the impact of the job was set to high, nodes that contained SSD drives would sometimes panic.

129445

When smartfailing a drive with very little data on it, a FlexProtect or FlexProtectLin job could pause in phase 2 for as long as two hours, causing the job to be cancelled by the system. Until a FlexProtect or FlexProtectLin job successfully completed, no other jobs could run. In addition, the cluster could fall below the configured protection level.

129349

In rare cases, restriping a 4 TB file could cause a node to panic.

126675

After creating a custom job policy with isi job policy create, the values for the job impact policies were incorrectly set and jobs could not be run. Errors similar to the following appeared:

Parse warnings from defaults: Multiple errors:
Repeated disk record: old={ivar:impact.policies {token:0, version:1, flags:---I---} = (read: write:)} new={ivar:impact.policies {token:0, version:1, flags:---I---} = (read: write:)}
Repeated disk record: old={ivar:impact.policies {token:0, version:1, flags:---I---} = (read: write:)} new={ivar:impact.policies {token:0, version:1, flags:---I---} = (read:HIGH write:HIGH)}

125544

If you upgraded your cluster from OneFS 6.5.5, you could not modify job impact policies.

125543

Migration


The isi_vol_copy_vnx utility did not properly handle new files that were added to new directories between incremental copies. As a result, incremental copies failed.

131728

Networking


Some changes to VLAN tagging pools, such as adding a Network Interface Card (NIC) or rebalancing dynamic IPs, caused the SmartConnect process to stop responding to DNS queries until the cluster was rebooted or until the isi_dnsiq_d service was restarted.

132022

Due to issues in the failover code path in the Sockets Direct Protocol (SDP), failover to the backup InfiniBand (IB) fabric could fail. If failover was unsuccessful, the Isilon cluster was unavailable until the IB switches were rebooted. The potential for encountering these issues was limited, but the potential increased in proportion to the number of nodes in the cluster.

131544

Under some conditions, the flx_config.xml file could not be accessed immediately after a group change occurred on a cluster. If this issue was encountered, the SmartConnect process, isi_dnsiq_d, unexpectedly restarted on one or more nodes and the following lines appeared in the /var/log/messages file of the affected nodes:

/usr/lib/libisi_flexnet.so.1:flx_config_get_kevent+0x13
/usr/sbin/isi_dnsiq_d:update_flx_config_kevent+0x25
/usr/sbin/isi_dnsiq_d:realloc_ips+0xf9
/usr/sbin/isi_dnsiq_d:main+0xbf0
/usr/sbin/isi_dnsiq_d:_start+0x8c

130702

If you ran the isi networks support sc_put command to manually assign a dynamic IP address to an interface on a specific node, the command failed and a FAILED ASSERTION message similar to the following appeared:

FAILED ASSERTION config->_lock_state == O_EXLOCK @ /build/mnt/src/isilon/lib/isi_flexnet/flexnet.c:2309

130652

If the SmartConnect service IP address was the only IP address assigned to the external network interface of a node, Flexnet would not populate the subnet gateway. As a result, the affected node did not respond to DNS queries.

130642

If a static route was assigned to a storage pool (by running the isi networks modify pool command with the --add-static-routes option), Flexnet checked each node in the pool for UP interfaces to which the route could be assigned. If Flexnet did not detect any UP interfaces, the following informational message was sometimes repeatedly logged in the isi_flexnet_d.log file:

isi_flexnet_d[8079]: No available UP interfaces for route:

Note

This message will now be logged only if the Flexnet logging level is set to debug or higher.

130343

SmartConnect did not pause for 10 seconds between rebalance operations and thus rebalanced IP addresses more frequently than necessary.

130318
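The intended pacing, at least 10 seconds between rebalance operations, can be sketched with a small throttle. This is an illustration under stated assumptions, not SmartConnect code: the class name, method name, and injectable clock are hypothetical, and only the 10-second interval comes from the release note.

```python
import time


class RebalanceThrottle:
    """Hypothetical throttle that allows one operation per minimum interval."""

    def __init__(self, min_interval=10.0, clock=time.monotonic):
        self._min_interval = min_interval
        self._clock = clock  # injectable so the behavior can be tested
        self._last_run = None

    def try_rebalance(self):
        """Return True if a rebalance may run now, False if it is too soon."""
        now = self._clock()
        if self._last_run is not None and now - self._last_run < self._min_interval:
            return False
        self._last_run = now
        return True


throttle = RebalanceThrottle()
print(throttle.try_rebalance())  # first request is allowed
print(throttle.try_rebalance())  # an immediate second request is refused
```

Using a monotonic clock avoids surprises if the wall-clock time is adjusted while the service is running.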

If you added a static route and incorrectly set the gateway, the affected node sometimes became unresponsive and the OneFS software watchdog rebooted the node.

Note

The following IP addresses should not be assigned to the external gateway on a node:

• An IP address that falls within the cluster's internal network IP range

• An IP address that is assigned to a node

• An unreachable IP address

• The broadcast IP address

130263
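Three of the four restrictions listed in the note above can be checked statically with Python's standard ipaddress module. The sketch below is illustrative only: the function name, parameters, and example addresses are made up, and reachability is left out because it cannot be determined without probing the network.

```python
import ipaddress


def gateway_problems(gateway, subnet, internal_range, node_ips):
    """Hypothetical validator: list the reasons a gateway address is unsuitable."""
    gw = ipaddress.ip_address(gateway)
    external = ipaddress.ip_network(subnet)
    internal = ipaddress.ip_network(internal_range)
    problems = []
    if gw in internal:
        problems.append("within the internal network range")
    if gw in {ipaddress.ip_address(ip) for ip in node_ips}:
        problems.append("assigned to a node")
    if gw == external.broadcast_address:
        problems.append("broadcast address")
    return problems


# Example addresses are invented for illustration.
print(gateway_problems("10.0.0.255", "10.0.0.0/24", "192.168.1.0/24", ["10.0.0.5"]))
print(gateway_problems("10.0.0.1", "10.0.0.0/24", "192.168.1.0/24", ["10.0.0.5"]))
```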

After an IB switch was rebooted, the FlexNet process running on each node updated the flx_config.xml file, causing the SmartConnect process to lock the file. As a result, SmartConnect would fail to respond to new DNS requests for up to two minutes on large clusters.

125193



If an administrator added a static route that would send traffic across different interface types using the IP address of a node as the route destination, the affected node rebooted unexpectedly.

88072

NFS


The isi nfs exports list command sometimes timed out, preventing users from viewing or configuring NFS exports from the command-line interface or the OneFS web administration interface.

130270

If an inherit_only Access Control Entry (ACE) was applied to the owner of a file, and the Access Control List (ACL) was modified, the inherit_only ACE was mapped to the NFSv4 entry OWNER@. If the OWNER@ entry was subsequently remapped, the entry was remapped to creator_owner rather than the original owner of the file, which could prevent the original owner from accessing the file.

130253

If either of the following conditions existed, the lockd process would stop responding, preventing some NFS clients from accessing files on the cluster because they could not be granted file locks:

• A lock was granted to an NFS client while the client was unregistering from the LKF system.

• The isi_classic nfs client rm command was run while there were several lock waiters on the NFS client.

129900

If a cluster received NLM requests that included the AUTH_NONE credential, OneFS would return a locking error instead of the correct error message.

125482

During the OneFS boot process, a race condition prevented some sysctl parameters that were required for NFS Kerberos authentication from being read. This issue caused Kerberos authentication to be unavailable to NFS clients; if this issue occurred, messages similar to the following were logged in the nfs.log file:

Kerberos not available: gss_acquire_cred: Key table entry not found;

125479

If an NFS client sent lock requests with security type AUTH_NONE, the client received an incorrect error message that did not indicate the reason for failure.

123567

If the cluster was configured with the overcommit limit below the low and high settings (which is an invalid configuration), nodes could run out of memory and unexpectedly reboot.

116133



In the OneFS web administration interface, on the Snapshot Schedules page, after clicking View details to view detailed information about a snapshot schedule, some of the details were not displayed or the details were displayed after a noticeable delay.

130720




On the Cluster Management > Access Management > LDAP page in the OneFS web administration interface, if the length of the Bind to value exceeded the width of the page, the corresponding edit link was not available.

130336

The OneFS web administration interface was not accessible to clients using Microsoft Internet Explorer 8, 9, or 10 in compatibility view. In addition, if a client attempted to access the web administration interface using Internet Explorer in compatibility view, the IE console displayed the following error:

SCRIPT1028: Expected identifier, string or number all-classes.js?70100b00000002a, line 13717 character 5

119315

You could not set a netmask of 0.0.0.0 through the OneFS web administration interface.

96604

SmartLock


If a file was committed to a WORM directory through the RESTful Namespace API, the file permissions were altered and, as a result, the file was accessible to everyone.

130319

On clusters running in compliance mode, the compadmin user did not have access to core files that were created when system processes stopped running. This prevented the compadmin user from analyzing the cause of a failure if a system process unexpectedly restarted. This also prevented the compadmin user from deleting the files.

130284

If a cluster was running in SmartLock compliance mode, you could not renew the SSL certificate of the Isilon web administration interface.

128443

The CTO upgrade process did not complete on clusters in compliance mode.

118428

SmartQuotas


When a soft quota was modified, if the --soft-grace option was modified but the --soft-threshold option was not modified, the command-line interface ignored the configuration change.

130640

SMB


Because OneFS relied on a function that could handle only file descriptors with a maximum value of 1024, the lsass process unexpectedly restarted when it attempted to process file descriptors assigned a value higher than 1024. As a result, SMB users could not be authenticated for the few seconds it took for the process to restart.

132043



While the lwio process was handling a symbolic link (a file that acts as a reference to another file or directory), a memory allocation issue could occur in the lwio process. If this issue was encountered, the lwio process unexpectedly restarted and SMB clients that were connected to the affected node were disconnected.

131751

While executing a zero-copy system call, the lwio process could attempt to access memory that was previously released to the system (also known as freed memory). If the lwio process attempted to access freed memory, the lwio process unexpectedly restarted and SMB clients that were connected to the affected node were disconnected.

131748

The lwio process sometimes attempted to read data from a socket connection that was not ready to be read from. If this occurred, the lwio process unexpectedly restarted and the following ASSERTION FAILED message appeared in the lwiod.log file:

[lwio] ASSERTION FAILED: Expression = (pConnection->readerState.pRequestPacket->bufferUsed <= (maxHeader + sizeof(NETBIOS_HEADER)))

131745

Under some circumstances, the lwio process reported the length of a file name in bytes when a different value type was expected. As a result, the lwio process attempted to access memory that wasn't allocated to it, causing the lwio process to crash. If the lwio process crashed, SMB clients that were connected to the affected node were disconnected.

131711

If an SMB2 client experienced connection issues at the same time that it attempted to place a lease on a file, a race condition could occur that resulted in the client being disconnected from the cluster.

131681

Under rare circumstances, if a subprocess of the lwio process opened a new file handle on an existing lease at the same time that another subprocess was breaking the lease, the lwio process unexpectedly restarted. If the lwio process restarted, SMB clients that were connected to the affected node were disconnected.

131586

When upgrading to OneFS 7.1.1.0, if any share names contained an invalid character (for example, a bracket, colon, asterisk, or question mark), or if a share path did not start with /ifs, the SMB configuration could not be upgraded. In addition, no SMB shares would be visible after the cluster was upgraded and SMB clients could not connect to the cluster until the invalid shares were removed and the SMB configuration was successfully upgraded.

Note

In OneFS 6.0 and earlier, an SMB share name could contain invalid characters and shares could be created outside of the /ifs directory (an invalid share configuration). On an upgrade to OneFS 6.5 through 7.0, an SMB share configuration that contained shares with an invalid character or share paths that did not start with /ifs could be successfully upgraded; however, the invalid shares were inaccessible. Although the shares were inaccessible in OneFS 6.5 and later, the existence of these shares could adversely affect upgrades to OneFS 7.1.1.0.

131364
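The two conditions described above (an invalid character in a share name, or a share path outside /ifs) can be checked before an upgrade. The following Python sketch is illustrative only and is not an Isilon tool. The function name find_invalid_shares and the character set are assumptions: the set holds only the example characters named above, and the complete set that OneFS rejects might be larger.

```python
# Hypothetical pre-upgrade check; not part of OneFS.
# Assumption: only the example characters from the release note are tested.
EXAMPLE_INVALID_CHARS = set("[]:*?")


def find_invalid_shares(shares):
    """Return (name, reason) pairs for shares that match either condition.

    `shares` maps a share name to its path.
    """
    problems = []
    for name, path in shares.items():
        bad = sorted(c for c in set(name) if c in EXAMPLE_INVALID_CHARS)
        if bad:
            problems.append((name, "invalid character(s): " + " ".join(bad)))
        # Assumption: a valid path is /ifs itself or a location below it.
        if path != "/ifs" and not path.startswith("/ifs/"):
            problems.append((name, "path does not start with /ifs"))
    return problems
```

A share that appears in the returned list would need to be removed or corrected before the SMB configuration could be upgraded.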

Under some circumstances, the lwio process reported the length of a file name in bytes when a different value type was expected. As a result, the lwio process attempted to access memory that wasn't allocated to it, causing the lwio process to crash. If the lwio process crashed, SMB clients that were connected to the affected node were disconnected.

130641

While a network socket was being closed, contention between process threads could cause data structures referencing the socket to be prematurely freed. If the freed structures were then accessed by another thread, the lwio process unexpectedly restarted and SMB clients connected to the affected node were disconnected.

130353

In environments with more than 12,000 SMB shares, the isi_webui_d process sometimes ran out of memory and stopped running. If the isi_webui_d process stopped running, the OneFS web administration interface was unavailable until the process restarted and existing connections to the web administration interface were unresponsive or were disconnected.

Note

12,000 SMB shares exceeds the maximum number of shares supported by OneFS.

130267

A work item could be scheduled in SRV and then freed before it could run. As a result, crashes could occur.

130132

Because some SMB2 functions used an incorrect value type to manage SMB message sequence numbers, SMB sometimes incorrectly returned a STATUS_INVALID_PARAMETER error in response to SMB2 client requests. If the STATUS_INVALID_PARAMETER error was returned, the affected SMB client was disconnected from the cluster.

Note

Sequence numbers associate requests with responses and determine what requests are allowed for processing.

130130

If an SMB1 session setup request and the Tree Connect process simultaneously attempted to access the security context in-memory object, a race condition would occur that stopped the lwio process and closed connections to the node.

130032

If a user requested access to a file to which they had write access and the file was located in a share to which they had Read-Only access, the user might be incorrectly denied access to the file if the create disposition of the request was FILE_OPEN_IF.

Note

The create disposition of the file specifies the action the system will take in response to a request to access a file, based on whether the requested file does or does not already exist.

130030
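The note above describes the create disposition as a choice of action that depends on whether the file exists. A minimal Python sketch of that decision table follows. The numeric values are the create disposition values defined by the SMB2 protocol; the function name resolve_action and the action labels are hypothetical and do not represent OneFS code.

```python
from enum import IntEnum


class CreateDisposition(IntEnum):
    """Create disposition values as defined by the SMB2 protocol."""
    FILE_SUPERSEDE = 0
    FILE_OPEN = 1
    FILE_CREATE = 2
    FILE_OPEN_IF = 3
    FILE_OVERWRITE = 4
    FILE_OVERWRITE_IF = 5


# (action if the file exists, action if the file does not exist)
_ACTIONS = {
    CreateDisposition.FILE_SUPERSEDE: ("supersede", "create"),
    CreateDisposition.FILE_OPEN: ("open", "fail"),
    CreateDisposition.FILE_CREATE: ("fail", "create"),
    CreateDisposition.FILE_OPEN_IF: ("open", "create"),
    CreateDisposition.FILE_OVERWRITE: ("overwrite", "fail"),
    CreateDisposition.FILE_OVERWRITE_IF: ("overwrite", "create"),
}


def resolve_action(disposition, file_exists):
    """Return the action label for a disposition and the file's existence."""
    if_exists, if_missing = _ACTIONS[CreateDisposition(disposition)]
    return if_exists if file_exists else if_missing
```

For a file that already exists, FILE_OPEN_IF resolves to a plain open, so nothing has to be created in the share.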

SMB2 message IDs larger than 64 KB in size were incorrectly displayed as zero, which caused Active Directory domain controller connections to be reset.

130021

When snapshot data was queried with the SMB2_FIND_ID_BOTH_DIRECTORY_INFO command after being deleted, the system incorrectly reported the file as found and "." as the filename.

130010



Clients connected to the cluster through SMB were disconnected if the lwiod process crashed. When the process crashed, the following lines were logged in the stack trace:

/lib/libthr.so.3:pthread_rwlock_init+0x117
/usr/likewise/lib/lwio-driver/srv.so:SrvConnection2SetInvalidEx+0x22
/boot/kernel.amd64/kernel: /usr/likewise/lib/lwio-driver/srv.so:SrvProtocolTransport1DriverSendDone+0x6e
/usr/likewise/lib/lwio-driver/srv.so:SrvSocketProcessTaskWrite+0x2dc
/usr/likewise/lib/lwio-driver/srv.so:SrvSocketProcessTask+0x3d0
/usr/likewise/lib/liblwbase.so.0:EventThread+0x333
/usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xec
/lib/libthr.so.3:_pthread_getprio+0x15d

130001

If auditing was enabled and a directory was accessed, the isDirectory flag was sometimes incorrectly set to false. As a result, the audit log incorrectly indicated that the item accessed was a file rather than a directory.

129455

A race condition would occur wherein an SMB1 session setup request and the Tree Connect process simultaneously tried to access the security context in-memory object. As a result, the lwio process would stop, and existing connections to the node would close.

128076

The DC connection would reset when the MessageID wrapped to zero at 64 KB, although it should have continued incrementing up to 0xFFFFFFFFFFFFFFFF (64 bits).

127778
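A wrap to zero at 64 KB is the behavior of a counter held in a 16-bit unsigned value, whereas SMB2 message IDs are 64-bit values. The release notes do not state which type was actually used, so the following Python sketch illustrates only the arithmetic. The function name next_message_id and its width_bits parameter are hypothetical.

```python
def next_message_id(current, width_bits):
    """Increment a message ID stored in an unsigned field of `width_bits` bits."""
    mask = (1 << width_bits) - 1
    # Bits above the field width are discarded, which is what causes the wrap.
    return (current + 1) & mask
```

With width_bits set to 16, the ID that follows 0xFFFF is 0. With width_bits set to 64, the ID that follows 0xFFFF is 0x10000, and the counter keeps incrementing.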

If you created a snapshot of a directory through the OneFS web administration interface, deleted the files, and then attempted to restore those files and folders in Windows through the Restore Previous Versions option by right-clicking the directory, the deleted files were never restored to that directory.

127010

If there was a mismatch between share names stored in memory and share names stored in the registry, an assert would sometimes occur and lwio might restart unexpectedly with a signal 6 error.

127005

Applications that required a volume label could not establish a connection to an SMB share on the cluster.

126496

You could not create a share for a path that didn't exist through MMC. If you did this, you could view the share through the OneFS web administration interface. However, you could not access the share, because the path did not exist.

125888

When the number of file system event snapshots exceeded the amount of space allocated by the OnefsEnumerateSnapshots buffer, the lwio process restarted on various nodes, causing clients to be disconnected and then reconnected to the cluster.

125570

If OneFS received an SMB request that contained a file path, OneFS would convert any forward slashes (/) to backslashes (\) before processing the request. This was contrary to SMB standards, which specify that requests containing file paths that include forward slashes return a STATUS_OBJECT_NAME_INVALID error.

125566
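The behavior that the SMB standard calls for, rejecting a path that contains a forward slash instead of converting it, can be sketched in Python as follows. This is an illustration and not the OneFS implementation. The function name check_smb_path is hypothetical; the status values are the NTSTATUS codes that Microsoft defines for these names.

```python
STATUS_SUCCESS = 0x00000000
STATUS_OBJECT_NAME_INVALID = 0xC0000033


def check_smb_path(path):
    """Return an NTSTATUS code for a request path without rewriting the path."""
    if "/" in path:
        # A forward slash is not converted to a backslash; the request fails.
        return STATUS_OBJECT_NAME_INVALID
    return STATUS_SUCCESS
```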

If a user had permission to access a shared directory, but the user was not granted access to the parent directory that contained the shared directory, the user could not rename files or folders contained in the shared directory.

125036




Clients connected to the cluster over SMB were disconnected when the lwio process crashed. When the process crashed, the following lines were logged in the /var/log/messages file:

/lib/libthr.so.3:pthread_rwlock_init+0x117
/usr/likewise/lib/lwio-driver/srv.so:SrvConnection2SetInvalidEx+0x22
/boot/kernel.amd64/kernel: /usr/likewise/lib/lwio-driver/srv.so:SrvProtocolTransport1DriverSendDone+0x6e
/usr/likewise/lib/lwio-driver/srv.so:SrvSocketProcessTaskWrite+0x2dc
/usr/likewise/lib/lwio-driver/srv.so:SrvSocketProcessTask+0x3d0
/usr/likewise/lib/liblwbase.so.0:EventThread+0x333
/usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xec
/lib/libthr.so.3:_pthread_getprio+0x15d

124981

If a client sent an oplock or lease break acknowledgment for an oplock or lease that was never requested, a crash would occur with the following stack trace:

/boot/kernel.amd64/kernel: /lib/libc.so.7:thr_kill+0xc
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase_nothr.so.0:__LwRtlAssertFailed+0x5a
/boot/kernel.amd64/kernel: /usr/likewise/lib/lwio-driver/srv.so:SrvPrepareOplockStateAsync_SMB_V2+0x57
/boot/kernel.amd64/kernel: /usr/likewise/lib/lwio-driver/srv.so:SrvOplockBeginPolling+0x36
/boot/kernel.amd64/kernel: /usr/likewise/lib/lwio-driver/srv.so:SrvExecContextContinue2+0x1c7
/boot/kernel.amd64/kernel: /usr/likewise/lib/lwio-driver/srv.so:SrvProtocolExecute2+0xdf
/boot/kernel.amd64/kernel: /usr/likewise/lib/lwio-driver/srv.so:SrvExecuteCreateAsyncCB_SMB_V2+0x6a
/boot/kernel.amd64/kernel: /usr/likewise/lib/libiomgr.so.0:IopIrpCompleteInternal+0x324
/boot/kernel.amd64/kernel: /usr/likewise/lib/libiomgr.so.0:IoFmIrpDispatchContinue+0x8c4
/boot/kernel.amd64/kernel: /usr/likewise/lib/libiomgr.so.0:IoIrpComplete+0x33
/boot/kernel.amd64/kernel: /usr/likewise/lib/lwio-driver/onefs.so:OnefsCompleteIrpContext+0xa9
/boot/kernel.amd64/kernel: /usr/likewise/lib/lwio-driver/onefs.so:OnefsProcessIrpContext+0x18b
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase.so.0:CompatWorkItem+0x16
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase.so.0:WorkThread+0x256
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xec
/boot/kernel.amd64/kernel: /lib/libthr.so.3:_pthread_getprio+0x15d

123747

If OneFS received a request from an SMB client whose Kerberos service ticket could not be decrypted, OneFS returned a STATUS_LOGON_FAILURE response to the SMB client that sent the request. If this response was sent, the affected SMB client might have experienced issues accessing files or applications that were stored on the cluster.

114524



Note

In OneFS 7.2.0.0 and later, if OneFS receives a request from an SMB client whose Kerberos service ticket cannot be decrypted, a STATUS_MORE_PROCESSING_REQUIRED response is returned. This response prompts the affected SMB client to search for a secondary cluster. If the search for a secondary cluster fails, the affected SMB client could still experience issues accessing files or applications on the cluster.


Upgrade and installation issues resolved in OneFS 7.2.0.0 ID

If all of the following conditions were met and you upgraded to OneFS 7.0 or later, the SMB configuration was not successfully upgraded and one or more services were sometimes disrupted following the upgrade:

• The cluster was being upgraded from OneFS 6.5.0 or earlier.

• The file path to one or more SMB shares on the cluster contained a multibyte character.

130266

The upgrade process might detect events that do not appear in the OneFS web administration interface or the output of the isi events list command. Because these events are older than 30 days, they are not displayed by default.

125551

During a OneFS upgrade, the crontab file was not updated with data from the crontab.smbtime file. As a result, crontab overrides that were configured before the upgrade were not applied after the cluster was upgraded.

125550

Virtual plug-ins

Virtual plug-ins issues resolved in OneFS 7.2.0.0 ID

Due to a capacity checking error, if you created a new cluster through the OneFS Simulator on Windows, Windows VMs, or VMware Fusion workstations, the cluster failed to mount /ifs and the following error message appeared:

IFS failed to mount. Aborting boot. Please contact Isilon Customer Support.

133546



CHAPTER 6

Isilon ETAs and ESAs related to this release

The following section provides a list of EMC Technical Advisories (ETAs) and EMC Security Advisories (ESAs) that describe issues that affect the latest 7.2.0 release or previous 7.2.0 releases.

For the most up-to-date list of Isilon ETAs and ESAs, see the Notifications section of the Isilon Uptime Info Hub on the EMC Isilon Community Network site. You can also subscribe to receive ETAs and ESAs related to OneFS via email by visiting the EMC Isilon OneFS product page on the EMC Isilon Support site and clicking the Manage Advisory Subscriptions link under Advisories.

• ETAs related to OneFS 7.2.0
• ESAs related to OneFS 7.2.0



https://community.emc.com/community/products/isilon

https://support.emc.com/products/15209_Isilon-OneFS


ETAs related to OneFS 7.2.0

Functional area ETA Description Status ID

Authentication 199379 If Microsoft Security Bulletin MS15-027 was installed on a Microsoft Active Directory server that authenticated SMB clients that were accessing an Isilon cluster, and if the server used the NTLMSSP challenge-response protocol, the SMB clients could not be authenticated.

Resolved in OneFS 7.2.0.2

147221

Backup, recovery, and snapshots 203815 If you used the snapshot-based incremental backup feature during a backup operation and multiple snapshots were created between backups, the feature might have failed to recognize that data had changed during the backup procedure. As a result, some changed files were not backed up.


154269

File system 202452 If a node ran for more than 497 consecutive days without being rebooted, an issue that affected the OneFS journal buffer sometimes disrupted the drive sync operation. If this issue occurred, OneFS reported that the journal was full, and, as a result, resources that were waiting for a response from the journal entered a deadlock condition. In addition, clusters that contained a node that ran for more than 497 consecutive days with no downtime could have unexpectedly rebooted as a result of this issue.


158417

Hardware 198924 If a drive in an HD400 node was replaced while the drive was in the process of being smartfailed, and if the node that contained the replaced drive was rebooted before the smartfail process was complete, the affected node failed to mount the /ifs partition.


142946

NFS 197460 Under specific conditions, a user with appropriate POSIX permissions was denied access to modify a file.


141210

NFS 204898 Although the correct ACLs were assigned to a file (for example, std_delete or modify), NFSv3 and NFSv4 clients could not delete, edit, or move the file unless the delete_child permission was set on the parent directory.


149743










NFS 205085 Because OneFS 7.2.0 and later returned 64-bit NFS cookies, some older, 32-bit NFS clients were unable to correctly handle read directory (readdir) and extended read directory (readdirplus) responses from OneFS. In some cases, the affected 32-bit clients became unresponsive, and in other cases, the clients could not view all of the directories in an NFS export. In the latter cases, the client could typically view the current directory (".") and its parent directory ("..").


153737

Networking 200096 If the cluster contained X410, S210, or HD400 nodes that had BXE 10 GigE NIC cards and any external network subnets connected to the cluster were set to 9000 MTU, the affected nodes rebooted, and an error similar to the following appeared in the /var/log/messages file:

ERROR: mbuf alloc fail for fp[01] rx chain (55)


152083, 148695

SMB 198187 If a Windows client that was connected to the cluster through SMB copied a file from the cluster, the timestamp metadata applied to the file might have become invalid.


142313

ESAs related to OneFS 7.2.0

ESA Description Status ID

ESA-2015-154 The network time protocol (NTP) service was updated to version 4.2.8P1.

Resolved in OneFS 7.2.0.4

154655

ESA-2015-114 The version of Apache that is installed on the cluster was updated to version 2.2.29.


136994

ESA-2015-112 User input that is passed to a command line is now escaped using quotation marks.


140931

ESA-2015-093 An update was applied to address a denial of service vulnerability in Apache HTTP Server.


137884











ESA-2014-146 The version of GNU bash installed on the cluster was updated to version 4.1.17.


143337

ESA-2015-015 Because SSL v3 was vulnerable to some man-in-the-middle attacks, support for SSL v3 for HTTPS connections to the cluster was removed beginning in OneFS 7.2.0.1.


137904

ESA-2015-038 The version of ConnectEMC installed on the cluster was updated from version 3.2.0.4 to 3.2.0.6. This upgrade changes the behavior of the ConnectEMC component so that it no longer uses an internal version of OpenSSL and instead relies on the version of OpenSSL installed on the Isilon cluster.


134760






CHAPTER 7

OneFS patches included in this release

The following section provides a list of patches that address issues that are now fixed in OneFS. If you previously installed one or more of the listed patches, and you upgrade to a release that includes the fix for the issue the patch addressed, you do not need to reinstall those patches after you upgrade.

After upgrading, see Current Isilon OneFS Patches on the EMC Online Support site to find out if any new patches were released that might apply to the version of OneFS you upgraded to.

• Patches included in OneFS 7.2.0.4
• Patches included in OneFS 7.2.0.3 (Target Code)
• Patches included in OneFS 7.2.0.2
• Patches included in OneFS 7.2.0.1



Patches included in OneFS 7.2.0.4

Functional area Description Patch ID

Authentication Functionality change: Users that attempt to connect to the cluster over SSH, through the OneFS API, or through a serial cable, can no longer be authenticated on clusters running in compliance mode if any of the following identifiers are assigned to the user as either the user's primary ID or as a supplemental ID:

UID: 0

SID: S-1-22-1-0

patch-156748

HDFS This patch addresses multiple issues that affect the HDFS protocol. For more information about the issues addressed by this patch, review the patch README.

patch-159065

HDFS Adds 1.7.0_IBM HDFS to the list of supported Ambari servers.

patch-157202

NFS This patch addresses multiple issues that affect the NFS protocol. For more information about the issues addressed by this patch, review the patch README.

patch-158509

NFS This patch addresses multiple issues that affect the NFS protocol. For more information about the issues addressed by this patch, review the patch README.

patch-156230

SMB This patch addresses multiple issues that affect SMB2 symbolic links. For more information about the issues addressed by this patch, review the patch README.

patch-154603

Patches included in OneFS 7.2.0.3 (Target Code)

Functional area Description Patch ID

Events, alerts, and cluster monitoring If you ran the isi statistics client or isi statistics heat command with the --csv option, the following error appeared instead of the statistics data:

unsupported operand type(s) for %: 'NoneType' and 'tuple'

patch-153659

Job engine If a MediaScan job detected an ECC error in a file's data, the job did not properly restripe the file away from the ECC error. As a result, the file was underprotected, and was at risk for data loss if further damage occurred to the data, for example, if a device containing a copy of the data failed. If this issue occurred, a message similar to the following appeared in the /var/log/isi_job_d.log file:

mark_lin_for_repair:1331: Marking for repair: 1:0001:0003::HEAD

patch-156835

NFS Because NFSv3 Kerberos authentication requires all NFS procedure calls to use RPCSEC_GSS authentication, some older Linux clients (for example, RHEL 5 clients) that started the FSINFO procedure call with AUTH_NULL authentication before attempting the FSINFO procedure call with RPCSEC_GSS authentication were prevented from mounting an NFS export if the export was configured with the Kerberos V5 (krb5) security type. Newer clients that started the FSINFO procedure call with RPCSEC_GSS were not affected.

patch-151610

SMB Due to a file descriptor (FD) leak that occurred when SMB clients listed files and directories within an SMB share, it was possible for OneFS to eventually run out of available file descriptors. If this occurred, an ACCESS_DENIED or STATUS_TOO_MANY_OPENED_FILES response was sent to SMB clients that attempted to establish a new connection to the cluster or SMB clients that were connected to the cluster that attempted to view or open files. As a result, new SMB connections could not be established and SMB clients that were connected to the cluster could not view, list, or open files. If this issue occurred, messages similar to the following appeared on the Dashboard > Event summary page of the OneFS web administration interface, and in the command-line interface when you ran the isi events list -w | grep -i descriptor command:

System is running out of file descriptors

In addition, messages similar to the following appeared in the /var/log/lwiod.log file:

Could not create socket: Too many open files
Failed to accept connection due to too many open files

patch-154168

Patches included in OneFS 7.2.0.2

Functional area Description Patch ID

Authentication If a cluster that was joined to an Active Directory (AD) domain was also configured with an IPv6 subnet, and if the AD domain controller was configured to use an IPv6 address, the netlogon process on the cluster repeatedly restarted and members of the Windows AD domain could not be authenticated to the cluster. If the netlogon process restarted as a result of this issue, Windows clients might have received an Access Denied error when attempting to access SMB shares on the cluster, or they might have received a Logon failure: unknown username or bad password message when attempting to log on to the cluster. In addition, the following lines appeared in the /var/log/messages file:

Stack: --------------------------------------------------
/lib/libc.so.7:thr_kill+0xc
/lib/libc.so.7:__assert+0x35
/usr/likewise/lib/libnetlogon_isidcchooser.so:IsiDCChooseDc+0xbb3
/usr/likewise/lib/lw-svcm/netlogon.so:LWNetChooseDc+0x27
/usr/likewise/lib/lw-svcm/netlogon.so:LWNetSrvPingCLdapArray+0x1187
/usr/likewise/lib/lw-svcm/netlogon.so:LWNetSrvGetDCNameDiscoverInternal+0x72a
/usr/likewise/lib/lw-svcm/netlogon.so:LWNetSrvGetDCNameDiscover+0x111
/usr/likewise/lib/lw-svcm/netlogon.so:LWNetSrvGetDCName+0xb20
/usr/likewise/lib/lw-svcm/netlogon.so:LWNetSrvIpcGetDCName+0x4f
/usr/likewise/lib/liblwmsg.so.0:lwmsg_peer_assoc_call_worker+0x20
/usr/likewise/lib/liblwbase.so.0:CompatWorkItem+0x16
/usr/likewise/lib/liblwbase.so.0:WorkThread+0x256
/usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xec
/lib/libthr.so.3:_pthread_getprio+0x15d
--------------------------------------------------

patch-143372

NFS If you ran a command from an NFSv3 or NFSv4 client to query for files or directories in an empty folder, and if you included the asterisk (*) or question mark (?) characters in the command, the query failed and an error message appeared on the console. For example, if you ran the ls * command, the command failed and the following error appeared on the console:

ls: cannot access *: Too many levels of symbolic links

patch-142630

SMB If Microsoft Security Bulletin MS15-027 was installed on a Microsoft Active Directory server that authenticated SMB clients that were accessing an Isilon cluster, and if the server used the NTLMSSP challenge-response protocol, the SMB clients could not be authenticated. As a result, SMB clients could not access data on the cluster. For more information, see article 199379 on the EMC Online Support site.

patch-147684

SMB If Microsoft Security Bulletin MS15-027 was installed on a Microsoft Active Directory server that authenticated SMB clients that were accessing an Isilon cluster, and if the server used the NTLMSSP challenge-response protocol, the SMB clients could not be authenticated. As a result, SMB clients could not access data on the cluster. This patch is deprecated by patch-147684.

patch-145051

SMB If the SMB2Symlinks option was disabled on the cluster and a Windows client navigated to a symbolic link that pointed to a directory, under some circumstances, the system returned incorrect information about the symbolic link. If this occurred, the symbolic link appeared to be a file, and the referenced directory could not be opened. In addition, because OneFS 7.2.0.1 did not consistently check the OneFS registry to verify whether the SMB2Symlinks option was disabled, in some cases, although the SMB2Symlinks option was disabled, the lwio process attempted to handle symbolic links when it should have allowed them to be processed by the OneFS file system. If this occurred, the following error appeared on the client:


patch-143767

Security Updates the version of GNU bash installed on the cluster to version 4.1.17. For more information, see ESA-2014-146 on the EMC Online Support site.

patch-139164

Patches included in OneFS 7.2.0.1

Functional area Description Patch ID

Networking If different nodes in a cluster were connected to different network subnets, and if those subnets were assigned to different Active Directory sites, the site configuration information on the cluster was repeatedly updated. Because updates to the site configuration information require a refresh of the lsass service, this behavior caused authentication services to become slow or unresponsive.

Patch-138767






NFS If all of the following factors were true, a user with appropriate POSIX permissions was denied access to modify a file:

• The user was connected to the cluster through NFSv3.

• The user was a member of a group that was granted read-write access to the file through POSIX mode bit permissions. For example, -rwxrwxr-x (775).

• The user was not the owner of the file.

Depending on how the file was accessed, errors similar to the following might have appeared on the console:

Operation not permitted.

or

Permission denied.

Patch-141322
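The access that such a user should have had follows from the classic POSIX mode bit check, in which exactly one permission class (owner, group, or other) applies to the caller. The Python sketch below shows that check under simplifying assumptions: it ignores ACLs and root privileges, and the function name may_write and its parameters are hypothetical rather than OneFS code.

```python
import stat


def may_write(mode, file_uid, file_gid, uid, gids):
    """Return True if POSIX mode bits alone grant write access to the caller.

    `gids` is the set of group IDs that the caller belongs to.
    """
    if uid == file_uid:
        return bool(mode & stat.S_IWUSR)   # owner class applies
    if file_gid in gids:
        return bool(mode & stat.S_IWGRP)   # group class applies
    return bool(mode & stat.S_IWOTH)       # other class applies
```

For a file with mode 775, a user who is not the owner but belongs to the file's group is granted write access by the group bits.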

SMB If a Windows client that was connected to the cluster through SMB copied a file from the cluster, the timestamp metadata applied to the file might have become invalid. This issue occurred because OneFS did not properly interpret the value assigned to a file's timestamp metadata if the value was set to -1, which is a valid value. Work flows that rely on timestamp metadata might have been negatively affected by this issue.

Note

The SMB protocol specifies that, when file attributes are set, a value of -1 indicates that the attribute in the corresponding field must not be changed.

Patch-142418
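The rule in the preceding note, that a value of -1 leaves the corresponding attribute unchanged, can be illustrated with a short Python sketch. It is not the OneFS implementation; the function name apply_timestamps and the field names used in the example are hypothetical.

```python
UNCHANGED = -1  # sentinel: leave the corresponding attribute as it is


def apply_timestamps(current, requested):
    """Return the timestamp fields that result from a set-attributes request.

    Both arguments map field names to integer timestamp values.
    """
    updated = dict(current)
    for field, value in requested.items():
        if value == UNCHANGED:
            continue  # -1 is an instruction, not a time value
        updated[field] = value
    return updated
```

Treating -1 as a literal time value instead of skipping the field is the kind of misinterpretation that would leave a file with an invalid timestamp.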



CHAPTER 8

Known issues

Unless otherwise noted, the following issues are known to affect OneFS 7.2.0.0 through OneFS 7.2.0.4.

• Target Code known issues
• Antivirus
• Authentication
• Backup, recovery, and snapshots
• Cluster configuration
• Command-line interface
• Diagnostic tools
• Events, alerts, and cluster monitoring
• File system
• File transfer
• Hardware
• HDFS
• iSCSI
• Job engine
• Migration
• Networking
• NFS
• OneFS API
• OneFS web administration interface
• Security
• SmartQuotas
• SMB
• Upgrade and installation
• Virtual plug-ins

Known issues 141

Target Code known issuesThe issues in the following table are known to affect OneFS 7.2.0.3 (Target Code). Inaddition, unless otherwise noted, issues that are resolved in OneFS 7.2.0.4 and later areknown to affect OneFS 7.2.0.3. For a complete list of issues that are known to affectOneFS 7.2.0.3, you should also review the OneFS 7.2.0.4 and later resolved issuestopics.

Antivirus

Antivirus known issues ID

In the OneFS web administration interface, on the Antivirus Policies page, if you double-click the Start link for a policy, multiple instances of the AVScan job start.

54477

Authentication

Authentication known issues ID

If there are files or directories on the cluster with ACLs that include SIDs for which no corresponding UID is found (for example, orphaned SIDs that belong to users who were deleted from Active Directory), OneFS queries external authentication providers in an attempt to map the SID to an authoritative UID. If the attempt to map an orphaned SID to an authoritative UID fails, OneFS continues to query external authentication providers for the missing UID. In environments where a large number of orphaned SIDs exist, the volume of queries sent to the providers might adversely affect the performance of the external authentication provider. If this occurs, users might be prevented from being authenticated to the cluster.

158867

If the alternate security identities attribute is enabled for an LDAP provider on a cluster running OneFS 7.2.0.0 through 7.2.0.4, the lsass process fails to look up alternate security identities, and, as a result, the affected users cannot be authenticated to the cluster.

Note

Enabling the alternate security identities attribute is not a typical configuration.

158243

In some cases, if the security mode of the SMB file sharing service is unchanged from the default configuration in OneFS 6.5.5.x, and if another SMB share setting (for example, Change Notify) is also changed from the default setting in OneFS 6.5.5, then, during an upgrade to OneFS 7.1.1.x, the Impersonate Guest security parameter is changed from Always to Never. If this issue occurs, following the upgrade, SMB clients might not be able to access shares on the cluster until the Impersonate Guest value for the share is manually set to Always.

154826

The OneFS SMB server might fail to respond to NetrWkstaGetInfo remote procedure calls at info level 102 that a client (typically an embedded system in a printer or scanner) might make when establishing a connection to the cluster. This could cause the client to fail to establish a connection to the cluster.

134324


The lwio process cannot rename files to a name longer than 255 bytes. 134304
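The 255-byte limit applies to the encoded name, not to the number of characters, so a name with fewer than 255 characters can still exceed it. The following is a minimal sketch for checking a proposed name before a rename. It assumes the cluster uses UTF-8 encoding; the function name `exceeds_name_limit` is hypothetical and is not part of OneFS.

```python
def exceeds_name_limit(name: str, limit: int = 255, encoding: str = "utf-8") -> bool:
    """Return True if the encoded file name is longer than `limit` bytes.

    Assumption: the cluster encoding is UTF-8. Pass a different
    `encoding` if the cluster is configured otherwise.
    """
    return len(name.encode(encoding)) > limit


# 255 ASCII characters encode to exactly 255 bytes, which is within the limit.
print(exceeds_name_limit("a" * 255))

# 128 copies of U+00E9 are only 128 characters but 256 bytes in UTF-8.
print(exceeds_name_limit("\u00e9" * 128))
```

Checking byte length rather than character count matters mainly for names that contain non-ASCII characters.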

Incorrect sequence numbers during SMB2 traffic could cause the lwsmd process to fail, resulting in a temporary loss of SMB service.

134247

The lsass process might fail while running the NtlmGetDomainNameFromResponse function due to an incorrectly formed request during NTLM authentication, resulting in a temporary loss of authentication service.

134239

The lsass process might fail while running the NtlmValidateResponse function due to an incorrectly formed request during NTLM authentication, resulting in a temporary loss of authentication service.

134238

The lsass process might fail while running the AuthenticateNTLMv2 function due to an incorrectly formed request during NTLM authentication, resulting in a temporary loss of authentication service.

134237

The lsass process might fail while running the NtlmGetCStringFromUnicodeBuffer function due to an incorrectly formed request during NTLM authentication, resulting in a temporary loss of authentication service.

134236

When lsass is sequentially restarted across nodes in the cluster, the lwio process might restart unexpectedly on a node, causing all SMB and NFS clients on that node to be disconnected. If this occurs, the following lines appear in the stack trace:

/lib/libc.so.7:recvfrom+0xc
0x807f0f2d5 (lookup_symbol: symtab/strtab not found:2)
0x807f0f417 (lookup_symbol: symtab/strtab not found:2)
0x807f0a819 (lookup_symbol: symtab/strtab not found:2)
0x807f0ab44 (lookup_symbol: symtab/strtab not found:2)
0x80bb2e474 (lookup_symbol: symtab/strtab not found:2)
/lib/libthr.so.3:_pthread_getprio+0x15d

131835

When you run the isi auth local users modify command with the --password-never-expires option on one of the default service accounts, you receive an Invalid Parameter error. For example, running the following command attempts to set the password to never expire for the insightiq user account:

isi auth local users modify --name=insightiq --password-never-expires


83444

Backup, recovery, and snapshots

Backup, recovery, and snapshots known issues ID

If you configure a snapshot schedule expiration time by running the isi snapshot schedules command or through the OneFS web administration interface, the expiration time is not always configured correctly and might not display correctly in either interface. For example, if you configure a snapshot to expire in 88 weeks, the web interface might display a snapshot expiration time of 14953 Hours. In addition, the snapshot schedule expiration time that is displayed in the web interface or at the command-line interface might not accurately reflect the actual configured expiration time. This issue occurs because the expiration time is interpreted differently by the web interface, the command-line interface, and the isi_papi_d process.
Workaround: Use the following commands to configure snapshot schedule expiration times from the command-line interface. Values set by running these commands are not interpreted by the isi_papi_d process and are, therefore, not affected by this issue:

isi_classic snapshot schedule create
isi_classic snapshot schedule modify

For more information about the preceding commands, run the following commands:

isi_classic snapshot schedule create -h
isi_classic snapshot schedule modify -h

139186
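The 88-week example can be checked with simple arithmetic: 88 weeks is 88 × 7 × 24 = 14784 hours, so a displayed value of 14953 Hours does not match the configured time. The sketch below works through that conversion. The names `weeks_to_hours`, `expected_hours`, and `displayed_hours` are illustrative only and do not correspond to any OneFS setting.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours in one week


def weeks_to_hours(weeks: int) -> int:
    """Convert a duration in whole weeks to hours."""
    return weeks * HOURS_PER_WEEK


expected_hours = weeks_to_hours(88)   # what an 88-week expiration equals
displayed_hours = 14953               # value quoted in the issue description

print(expected_hours)                    # 14784
print(displayed_hours - expected_hours)  # 169
```

A comparison like this is one way to confirm whether a displayed expiration time reflects the value that was actually entered.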

The isi_migr_sched process might fail while it is not possible to run replication jobs, such as while a node is shutting down.

135744

Restoring an especially large amount of data (more than 50 TB) might fail due to a memory allocation error with the following error message:

<3.3> scdepot-4(id4) isi_ndmp_d[94807]: isi_ndmp_d: *** FAILED ASSERTION rb->datap @ /build/mnt/src/isilon/lib/isi_ndmpbrm2/fast_restore.c:1088: Failed to allocate memory - Cannot allocate memory

133591

Parallel NDMP restores might fail while the cluster is under heavy load. 130693

If you set the read-only DOS attribute to deny modification of files over both UNIX (NFS) and Windows File Sharing (SMB) on a target directory of a replication policy, the associated replication jobs will fail.

127652

While a Backup Accelerator is running multiple NDMP sessions, memory exhaustion or a crash might occur, and false sessions might appear on the NDMP session list.
Workaround: Open an SSH connection on any node in the cluster, log in using the root account, and run the following command:

rm -rf /ifs/.ifsvar/modules/ndmp/sessions/*

This will remove all stale files while retaining current sessions.

125897

The following message might appear in the /var/log/messages file:

isi_migrate[98488]: coord[cert2-long123-d0b]: Problem reading from socket of (null): Connection reset by peer

Workaround: Ignore the error message. This is a transient error that OneFS will recover from automatically.

124767

Backing up large sparse files takes a very long time because OneFS must build sparse maps for the files, and OneFS cannot back up data while building a map. OneFS might run out of memory while backing up a sparse file with a large number of sparse regions.

124216

File list backups are not supported with dir/node file history format. 113999


The SyncIQ scheduler service applies UTF-8 encoding even if the cluster is set with a different encoding. As a result, DomainMark and SnapRevert jobs, which apply cluster encoding, might fail to run.

99383

If you revert a snapshot that contains a SmartLock directory, the operation might fail and leave the directory partially reverted.

99211

When SyncIQ and SmartQuotas domains overlap, a SyncIQ job might fail with one of the following errors:

• Operation not permitted
• unable to delete
• failed to move
• unable to rename

For more information, see article 88602 on the EMC Online Support site.

97492

If you are using the CommVault Simpana data management application (DMA), you cannot browse the backup if the data set has file names with non-ASCII characters. As a result, you cannot select single files to restore. Full restoration of the data set is unaffected.
For more information, see article 88714 on the EMC Online Support site.

96545

If you use SyncIQ to synchronize data and some data is freed on the source cluster because a file on the source decreased in size, the data is not freed on the target cluster when the file is synchronized. As a result, the space consumed on the target cluster might be greater than the space consumed on the source.

94614

SyncIQ allows a maximum of five jobs to run at a time. If a SnapRevert job starts while five SyncIQ jobs are running, the SnapRevert job might appear to stop responding instead of pausing until the SyncIQ job queue can accept the new job.

93061

After performing a successful NDMP backup that contains a large number of files (in the tens of millions), when you restore that backup using Symantec NetBackup, the operation fails and you receive the following error message:

error db_FLISTreceive: database system error 220


87092

Cluster configuration

Cluster configuration known issues ID

If a user is assigned only the ISI_PRIV_AUDIT privilege, the user can view the controls to delete file pool policies on the File System > Storage Pools > File Pool Policies page.

Note

The ISI_PRIV_AUDIT privilege does not allow the user to delete file pool policies. Only the controls are visible.

134378

The isi_cpool_io_d process might fail while attempting to close a file, generating "bad file descriptor" errors in the log. This is due to leaving a stale descriptor for the cache header.

132397

The command-line wizard requires a default gateway to set up a cluster. You may not have a default gateway if your network uses a local DNS server.
Workaround: Enter 0.0.0.0 for your default gateway.

24621

Command-line interface

Command-line interface known issues ID

If you run an isi command with the --help option to get more information about the command, the text that is displayed might provide information about the related isi_classic command instead of providing information about the command that you typed. For example, if you run the isi storagepools command with the --help option, the following information appears:

'isi_classic smartpools health' options are:
--verbose, -v Print settings to be applied.
--help, -h Print usage help and exit

129637

The isi version osreldate command returns a random number rather than the expected OneFS release date.

98452

Diagnostic tools

Diagnostic tools known issues ID

On the Gather Info page in the OneFS web administration interface, the Gather Status progress bar indicates that the Gather Info process is complete while the process is still running.

103906

Events, alerts, and cluster monitoring

Events, alerts, and cluster monitoring known issues ID

If an NFS request specifies an inode rather than a file name, and more than one hard link to the specified inode exists, OneFS auditing will be unable to determine which hard link was intended by the NFS client. If this happens, OneFS auditing might select the incorrect hard link, which can cause client permissions to be misrepresented in audit logs.

136038

The isi_papi_d process might fail while InsightIQ begins monitoring a cluster that contains 80 or more nodes.

135767

The isi_stats_hist_d process might fail when the cluster is under heavy load, with the following lines in the stack trace:

/lib/libc.so.7:thr_kill+0xc
/lib/libc.so.7:__assert+0x35
/usr/sbin/isi_stats_hist_d:_ZN15stats_hist_ring4initEitb+0x506
/usr/sbin/isi_stats_hist_d:_ZN10ring_cache3getEiiiiii+0x228
/usr/sbin/isi_stats_hist_d:_ZN11db_mgr_impl5queryER20stats_timeseries_setP10stats_impltRK11query_timesRK14stats_hist_pol+0x33d
/usr/sbin/isi_stats_hist_d:_ZN16database_manager5queryER20stats_timeseries_setP10stats_impltRK11query_timesRK14stats_hist_pol+0x28
/usr/sbin/isi_stats_hist_d:_ZN20ecd_query_timeseries8query_dbEP10stats_impltRK11query_timesRK14stats_hist_pol+0x3d
/usr/sbin/isi_stats_hist_d:_ZN20ecd_query_timeseries12proc_commandEl+0x56c
/usr/sbin/isi_stats_hist_d:main+0xbcd
/usr/sbin/isi_stats_hist_d:_start+0x8c

135641

The isi_celog_coalescer process fails when the garbage collector reaches across multiple threads/connections and attempts to clear out what it deems as unreferenced.

132398

The SNMP daemon might restart after a drive is smartfailed and then replaced. 129711

If you have auditing with NFS enabled on your cluster, the NFS service might restart unexpectedly. If this occurs, lines similar to the following appear in the /var/log/messages file:

Stack: --------------------------------------------------
/usr/lib/libstdc++.so.6:_ZNSs6assignERKSs+0x1e
/usr/lib/libisi_flt_audit.so.1:_init+0x3b60
/usr/lib/libisi_flt_audit.so.1:_init+0x4092
/usr/likewise/lib/libiomgr.so.0:IopFmIrpStateDispatchPostopExec+0x16a
/usr/likewise/lib/libiomgr.so.0:IoFmIrpDispatchContinue+0x74d
/usr/likewise/lib/libiomgr.so.0:IopIrpDispatch+0x317
/usr/likewise/lib/libiomgr.so.0:IopRenameFile+0x117
/usr/likewise/lib/libiomgr.so.0:IoRenameFile+0x22
/usr/lib/libisi_uktr.so.1+0x167873:0x8082f2873
/usr/lib/libisi_uktr.so.1+0x194a17:0x80831fa17
/usr/lib/libisi_uktr.so.1+0x18fc90:0x80831ac90
/usr/lib/libisi_uktr.so.1+0x169b2c:0x8082f4b2c
/usr/likewise/lib/liblwbase.so.0:SparkMain+0xb7
--------------------------------------------------

Workaround: Disable auditing with NFS.

129098

When alert traffic is high, running the isi events quiet all command might time out. As a result, some events might not be quieted and the following error might be displayed:

Error marking events: Error while getting response from isi_celog_coalescer (timed out)

Workaround: Run the isi events quiet all command on the master node.

113689, 112774



If the email address list for an event notification rule is modified from the command line, the existing list of email addresses is overwritten by the new email addresses.
For more information, see article 88736 on the EMC Online Support site.

89086
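Because a command-line modification replaces the address list rather than appending to it, one way to avoid losing recipients is to build the complete list first and then submit it in a single change. The sketch below shows only the list-building step. The function name `merged_recipients` and the example addresses are hypothetical, and no isi command syntax is assumed.

```python
def merged_recipients(existing, additions):
    """Return existing addresses followed by new ones, without duplicates.

    Addresses are compared case-insensitively and surrounding
    whitespace is ignored. Order of first appearance is preserved.
    """
    merged = []
    seen = set()
    for address in list(existing) + list(additions):
        cleaned = address.strip()
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            merged.append(cleaned)
    return merged


current = ["ops@example.com", "storage@example.com"]
new = ["oncall@example.com", "OPS@example.com"]

# Comma-separated list containing every recipient that should remain on the rule.
print(",".join(merged_recipients(current, new)))
```

The resulting list contains the original recipients plus the new one, so supplying it in full should leave no address behind when the rule is overwritten.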

Although SNMP requests can reference multiple object IDs, the OneFS subtree responds only to the first object ID.

81183

If you have a large number of LUNs active, the event processor might issue a warning about open file descriptors held by the iSCSI daemon.
You can safely ignore this warning.

79341

On the Cluster Overview page of the OneFS web administration interface, clicking the ID of a node that requires attention, as indicated by a yellow Status icon, does not provide details about the status.

Workaround: In the list of events, sort the nodes by the Scope column or by the Severity column, and then click View details.

Alternatively, run the isi events list --nodes <id> command to view the events.


77470

If you run the isi status command, the value displayed for the sum of per-node throughput might differ from the value displayed for the sum of cluster throughput. This occurs because some data is briefly cached. The issue is temporary.
Workaround: Re-run the isi status command.

73554

Reconfiguring aggregate interfaces can leave active events for inactive interfaces.
Workaround: Cancel the events manually.

72200

Event system databases that store historical events might fail to upgrade correctly. If the databases fail to upgrade, they are replaced by an empty database with a new format and historical events are lost.

71840

A network interface that is configured as a standby without an IP address triggers an interface down event.
Workaround: Quiet the event manually.

71399

Monitoring with SNMP, InsightIQ, or the isi statistics command can fail when a cluster is heavily loaded.

68559

While a cluster processes a heavy I/O load, graphs in the OneFS web administration interface might display the following message:

Warning: Unreliable Data

Workaround: Run the isi statistics command.

62736

When using Simple Network Management Protocol (SNMP) to report on aggregated interfaces, for example, LACP, LAGG, and fec, the interface speed is displayed as 100 MB instead of 2 GB.
For more information, see article 89363 on the EMC Online Support site.

55247

You might receive an alert that a temporary license is expired even though a permanent license is installed.
Workaround: Use the command-line interface or the web administration interface to quiet the alert.

24504

File system

File system known issues ID

If you create or open an Alternate Data Stream (ADS) with the Permission to Delete option enabled at open time, a memory resource leak on the virtual file system can result. This might degrade overall cluster performance.

153312

If a dedupe job is running on a file that is also in the process of being deleted, the workers for the job can be delayed long enough to generate a hangdump file. The dedupe job will continue afterwards. If this issue is encountered, messages similar to the following appear in the /var/log/messages file:

isi_hangdump: Lock timeout: 720.008538 from efs.lin.lock.initiator.oldest_waiter
isi_hangdump: LOCK TIMEOUT AT 1421800091 UTC
isi_hangdump: Hangdump timeout after 0 seconds: Received HUP
isi_hangdump: Lock timeout: 725.018597 from efs.lin.lock.initiator.oldest_waiter
isi_hangdump: Lock timeout: 730.028656 from efs.lin.lock.initiator.oldest_waiter
isi_hangdump: Lock timeout: 735.038715 from efs.lin.lock.initiator.oldest_waiter
isi_hangdump: Lock timeout: 740.048774 from efs.lin.lock.initiator.oldest_waiter
isi_hangdump: END OF DUMP AT 1421800091 UTC

141028
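To see whether a log excerpt contains the hangdump lock-timeout messages described for this issue, the entries can be pulled out with a short script. The following is a rough sketch that assumes the message format shown in the sample for this issue. The function name `lock_timeouts` is hypothetical, and the sample text is an abbreviated copy of those messages rather than output from a live cluster.

```python
import re
from typing import List

# Matches the per-interval entries only; the upper-case
# "LOCK TIMEOUT AT ..." marker line is deliberately not matched.
LOCK_TIMEOUT = re.compile(r"isi_hangdump: Lock timeout: (\d+\.\d+)")


def lock_timeouts(text: str) -> List[float]:
    """Return the lock timeout durations, in seconds, found in `text`."""
    return [float(value) for value in LOCK_TIMEOUT.findall(text)]


sample = (
    "isi_hangdump: Lock timeout: 720.008538 from efs.lin.lock.initiator.oldest_waiter\n"
    "isi_hangdump: LOCK TIMEOUT AT 1421800091 UTC\n"
    "isi_hangdump: Lock timeout: 725.018597 from efs.lin.lock.initiator.oldest_waiter\n"
)

timeouts = lock_timeouts(sample)
print(len(timeouts), max(timeouts))
```

In practice the text would come from a copy of the /var/log/messages file. A steadily increasing series of values, as in the sample, is consistent with a single long wait rather than many separate ones.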

A node might fail to shut down or reboot if the shutdown process is unable to stop the lwsm process in less than 2 minutes. If this issue occurs, the following error appears in the /var/log/messages file:

rc.shutdown: 120 second watchdog timeout expired. Shutdown terminated.

If you encounter this issue, wait 5 minutes and then try to reboot the node by running the reboot command. If the node fails to reboot, contact EMC Isilon Technical Support for assistance.

140822

The lwio process might fail while a node is being shut down. 135869

The lwio process might fail while the cluster is under heavy load, causing clients to become disconnected. If this occurs, the following lines appear in the logs:

/lib/libc.so.7:thr_kill+0xc
/usr/likewise/lib/liblwiocommon.so.0:LwIoAssertionFailed+0xb6
/usr/likewise/lib/libiomgr.so.0:IopFmIrpStateDispatchFsdCleanupDone+0x26
/usr/likewise/lib/libiomgr.so.0:IoFmIrpDispatchContinue+0x36c
/usr/lib/libisi_cpool_rdr.so:_Z16cprdr_pre_createP21_IO_FLT_CALLBACK_DATAP23_IO_FLT_RELATED_OBJECTSPPvPPFvS0_S3_ES4_+0x646
/usr/lib/libisi_cpool_rdr.so:_Z19process_pre_op_itemP13_LW_WORK_ITEMPv+0x54
/usr/likewise/lib/liblwbase.so.0:WorkThread+0x256
/usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xee
/lib/libthr.so.3:_pthread_getprio+0x15d

134343

During the upgrade process, an MCP error might prevent the last node on a cluster from upgrading and corrupt the /etc/mcp/mlist.xml file.

Workaround: Delete the /etc/mcp/mlist.xml file and restart MCP. MCP will autogenerate a new mlist.xml.

133115

When processing a restart request, MCP service configuration scripts that call isi services might result in a recursive service stop request, and this second request might cause the MCP to simultaneously stop a service while starting another that depends upon it. This will result in unnecessary service restarts.
Workaround: Manually stop the processes in the reverse order of their dependency.

131924

If a node crashes on a three-node cluster and it is not re-added to the cluster, and then you add a node, one of the remaining nodes might unexpectedly reboot. You might need to wait for a significant amount of time before you can add the node to the cluster successfully.
Workaround: Add the node to the cluster while no writes are being made to the cluster. This will prevent the issue from occurring.

124603

LDAP user and group ownership cannot be configured in the OneFS web administration interface.
Workaround: Use the command-line interface to configure LDAP user and group ownership.

103983

An Alternate Data Stream (ADS) block-accounting error might cause the Inode Format Manager (IFM) module to fail, causing the following message to be logged to the stack trace:

kernel:isi_assert+0xde
kernel:isi_assert_mayhalt+0x70
efs.ko:ifm_compute_new_ads_summary+0x9a
efs.ko:ifm_update_ads_summary+0x15b
efs.ko:ifm_end_operation+0x11ad
efs.ko:txn_i_end_all_inode_ops+0x11d
efs.ko:txn_i_end_operations+0x5e
efs.ko:txn_i_end+0x3d
efs.ko:bam_remove+0x198
efs.ko:ifs_vnop_wrapremove+0x1bf
kernel:VOP_REMOVE_APV+0x33
kernel:kern_unlinkat+0x2a6
kernel:isi_syscall+0x49
kernel:syscall+0x26e

Workaround: Ignore this error message. This is a transient error that OneFS will recover from automatically.

100118

Nodes without subpools appear in the per-node storage statistics, but are not in the cluster totals because you cannot write data to unprovisioned nodes.

86328

The OneFS web administration interface does not prevent multiple rolling upgrades from being started simultaneously. If multiple rolling upgrades are running simultaneously, the upgrades fail.

84376

Some configuration changes cannot be made if the cluster is 99 percent full. As a result, the cluster might stop responding until sufficient free space is made available. See Best Practices Guide for Maintaining Enough Free Space on Isilon Clusters and Pools on the EMC Online Support site.

74272

When you attempt to create a hard link to a file in a WORM (Write Once Read Many)directory, the following incorrect error message displays:

Numerical argument out of domain

73790

When FlexProtect is run with verify upgrade check enabled and one or more drives are down, OneFS occasionally reports false data corruption. If this issue occurs, contact EMC Isilon Technical Support.

73276

If you run an incorrectly formatted shutdown command, a node might be placed into read-only mode and could fail to shut down. In some cases the node is inaccessible through the network but is still accessible through a serial connection.
For more information, see article 89544 on the EMC Online Support site.

54120

In the OneFS web administration interface, file names that contain characters that are not supported by the character encoding that is applied to the cluster do not display correctly when viewed through File System Explorer.
Workaround: Rename the files using characters supported by the character encoding that is applied to the cluster.

18901

File transfer

File transfer known issues ID

The FTP output of the isi statistics command might be inaccurate. 129599

By default, the Very Secure FTP Daemon (vsftpd) service supports clear-text authentication, which is a possible security risk.

Note

For more information about this issue, see the Protocols section of the OneFS 7.2 Security Configuration Guide.

127738

In the OneFS web administration interface, on the Diagnostics > Settings page, if you enter an invalid address in the HTTP host or FTP host field, Connection Verification Succeeded is displayed when you click Verify.

70448

Hardware

Hardware known issues ID

If you run the isi devices -a smartfail -d <device> command to smartfail a drive that failed before it was purposed by OneFS, an error similar to the following appears on the console:

!! Error: the smartfail action is invalid for a missing drive.

Note

In the command example above, <device> is the drive to be smartfailed.

159412

If you reboot or shut down a node with a Broadcom 10 GbE network interface card that is configured for legacy fec aggregation, the node might stop responding until it is manually powered off.

136915

If the power supply fan in an HD400 node fails, the power supply indicator light turns yellow, but no alert is sent. If this condition is not addressed, the power supply will eventually fail and an alert will be sent for the power supply failure. Contact EMC Isilon Technical Support if you encounter this issue.

135814

If a node encounters a journal error during an initial boot, OneFS allows the user to continue booting the node through the following text:

Test Journal exited with error - Checking Isilon Journal integrity...
NVRAM autorestore status: not performed...
Could not restore journal. Contact Isilon Customer Support Immediately.
Please contact Isilon Customer Support at 1-877-ISILON.

Command Options:
1) Enter recovery shell
2) Continue booting
3) Reboot

If the node is booted in this state, and then joined to a cluster, it will remain in a down state and might affect cluster quorum.

Workaround: Do not continue booting the node. Contact Isilon Technical Support.

135354

If an SED SSD drive is set to SED_ERROR, and the drive is formatted while L3 cache is enabled on the cluster, the drive will be formatted for storage and will report a status of HEALTHY.

Workaround: SmartFail the SED SSD that has been formatted for storage and then format the drive again.

133696

The isi firmware update command might incorrectly report that a firmware update has failed because OneFS requires nodes to be rebooted after a firmware update, but the command performs a shutdown -p command instead.

133606

The isi firmware update command might incorrectly report that a firmware update has failed on a remote node.

133317

Node firmware updates will fail if HPM downloads return error code D5 during the upgrade process.
Workaround: Retry updating the node firmware. If this issue persists, contact EMC Isilon Technical Support.

132523

Chassis Management Controller (CMC) firmware update procedures might fail.
Workaround: Run the following command and then retry the update.

/usr/bin/isi_ipmicmc -c -a cmc

123303

An internal sensor that monitors components might not correctly detect the source of a hardware component failure, such as the I2C bus. If this occurs, the wrong alert or no alert might be generated.

73050

Nodes with invalid system configuration numbers are split from the cluster after joining.
Workaround: Use smartfail to remove the node from the cluster. Contact Isilon Technical Support to apply a valid system configuration number to the node and then add the node to the cluster again.

71354

A newly created cluster might not be visible to unconfigured nodes for up to three minutes. As a result, nodes will fail to join the cluster during that time period.

69503

If the /etc/isilon_system_config file or any /etc VPD file is blank, an isi_dongle_sync -p operation will not update the VPD EEPROM data.

67932

There are multiple issues with shutting down a node incorrectly that can potentially lead to data loss.
Workaround: Follow instructions about shutting down nodes exactly.


35144

HDFS

HDFS known issues ID

DataNode connections can potentially experience a memory leak in the data path. Over time, this can result in an unexpected restart of the HDFS server. As a result, clients connected to that node are disconnected.
Workaround: The HDFS server will automatically be operational again within a few seconds and no further action is necessary.

158083

If the Hadoop datanode services are left running on Hadoop clients that are connected to a cluster, the isi_hdfs_d process will continuously log the following message to /var/log/messages and /var/log/isi_hdfs_d.log as it receives the requests:

org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol from verify_ipc_protocol (/build/mnt/src/isilon/bin/isi_hdfs_d/protoutil.c:18) from parse_connection_context (/build/mnt/src/isilon/bin/isi_hdfs_d/protoutil.c:100) from ver2_2_parse_connection_context (/build/mnt/src/isilon/bin/isi_hdfs_d/protocol_v2_2.c:388) from process_out_of_band_rpc (/build/mnt/src/isilon/bin/isi_hdfs_d/protocol_v2_2.c:1000)

135993

If the cluster is under heavy HDFS load, it might cause the isi_hdfs_d process to restart. If this occurs, the following lines appear in the stack trace:

/lib/libc.so.7:__sys_kill+0xc
/usr/lib/libisi_util.so.1:isi_assert_halt+0xa0
/usr/lib/libisi_hdfs.so.1:hdfs_enc_mkdirat_p+0x2b1
/usr/lib/libisi_hdfs.so.1:hdfs_mkdir_p+0x41
/usr/bin/isi_hdfs_d:config_init_directory+0x13

123802

iSCSI

iSCSI known issues ID

The iSCSI protocol can log a data digest error message in the iSCSI log.

No workaround is required; the protocol will recover and reconnect.

83537

VSS 32-bit installation succeeds on a Windows initiator, but the provider does not appear in the list of installed providers. This issue affects Windows Server 2003 only.
Workaround: Install the Microsoft iSCSI Software Initiator.

74303

In the OneFS web administration interface, the iSCSI Summary page sometimes loads slowly. When this occurs, the page might time out and the isi_webui_d process might be consuming a high percentage of CPU resources on one or more nodes.

73038

If you create a new target after you move iSCSI shadow clone LUNs, the OneFS web administration interface might become unresponsive.

71919

Job engine

Job engine known issues ID

In rare instances, if a drive fails while IntegrityScan is running, the IntegrityScan job can fail. In addition, if you run the isi job events list --job-type integrityscan command, a message similar to the following appears on the console, where <x> is the job ID:

2015-02-12T15:35:31 <x> IntegrityScan 1 State change Failed

The job should automatically restart and then run to completion.

139708

In rare instances, if a drive fails while MediaScan is running, the MediaScan job can fail. In addition, if you run the isi job events list --job-type mediascan command, a message similar to the following appears on the console, where <x> is the job ID:

2015-02-12T15:35:31 <x> MediaScan 1 State change Failed

The job should automatically restart and then run to completion.

139704

The isi_job_d process might fail while a QuotaScan job is running. If this happens, the QuotaScan job will continually pause and resume, and the following lines will appear in the stack trace:

/lib/libc.so.7:thr_kill+0xc
/usr/lib/libisi_util.so.1:isi_assert_halt+0xa0
/usr/lib/libisi_job.so.1:tw_opendir+0x207
/usr/lib/libisi_job.so.1:tw_tree_init+0x327
/usr/bin/isi_job_d:treewalk_task_next_item+0x150
/usr/bin/isi_job_d:quotascan_task_next_item+0x4c
/usr/bin/isi_job_d:worker_process_task+0x307
/usr/bin/isi_job_d:worker_main+0x11cd
/lib/libthr.so.3:_pthread_getprio+0x15d

Workaround: Cancel the QuotaScan job.

134301

If you queue multiple jobs while smartfailing drives, AutoBalance jobs might fail. 133771

The MediaScan job reports errors for drives that have been removed from the cluster.
Workaround: Do not fail a drive after a MediaScan job has started, or cancel the job.

132083

If the MultiScan, Collect, or AutoBalance jobs are disabled before a rolling upgrade, the jobs will automatically become enabled after the rolling upgrade completes.
Workaround: If MultiScan, Collect, or AutoBalance jobs are disabled before a rolling upgrade, and you want those jobs to be disabled after the rolling upgrade completes, manually disable those jobs after the rolling upgrade completes.

124744

If a FlexProtect or FlexProtectLin job is started during a rolling upgrade, OneFS might cancel the job. The job might not complete until after the rolling upgrade is complete.
Workaround: If OneFS creates a FlexProtect job because a device failed during a rolling upgrade, pause the upgrade until the job completes. It is recommended that you pause the rolling upgrade and do not pause the FlexProtect job.

123167

The isi job status command displays jobs in numerical order by running ID instead of displaying active jobs before inactive jobs.

114802, 114583

The isi job reports view job command sometimes returns reports twice. 112265

The Dedupe and DedupeAssess jobs can only run with a job-impact level of low. 110129

When you run a DomainMark job after taking a snapshot, and then run a SnapRevert job with a job impact policy set higher than low, the impact policy has no effect.

For more information, see article 88597 on the EMC Online Support site.

93603

Job engine operations occasionally fail on heavily loaded or busy clusters. When the command fails, a message similar to the following is displayed:

Unable to pause integrity scan: pause command failed: Resource temporarily unavailable.

Workaround: If an operation fails, wait a moment and then retry the operation.

72109
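The wait-and-retry workaround above can be sketched in Python as follows. This is a minimal illustration, not part of OneFS: the function name, the attempt count, and the delay are all assumptions, and the operation is any caller-supplied callable that raises an exception when the underlying command fails.

```python
import time


def retry_operation(operation, attempts=5, delay_seconds=2.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts are exhausted.

    operation is any callable that raises an exception on failure.
    The attempt count and delay are illustrative defaults only.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as error:  # broad on purpose: this is only a sketch
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds)  # wait a moment before retrying
    raise last_error
```

Passing the sleep function as a parameter keeps the helper easy to exercise without real delays.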

The final phase of the FSAnalyze job runs on one node and can consume excessive resources on that node.

64854

Migration

Migration known issues ID

If you migrate ACLs to the cluster through the isi_vol_copy_vnx command and then attempt to read those ACLs over NFSv4, the read will fail with the following error message:

An NFS server error occurred

131299

If you migrate FIFO files using the isi_vol_copy utility, the following message appears:

Save checkpoint error: Could not match file history.

100366

If the isi_vol_copy command is run twice, with different source paths but the same target path, the second run fails without migrating any files.

100365

Networking

Networking known issues ID

If a network socket is already closed when sbflush_internal is called, the affected node might unexpectedly reboot. If a node reboots as a result of this issue, an error similar to the following appears in the /var/log/messages file:

Software Watchdog failed (userspace is starved!)

150739

In clusters with a large number of nodes, after an InfiniBand switch is rebooted, the cluster might experience a high level of group change activity for approximately two hours. Because, by default, a single Device Work Thread (DWT) handles all node transitions to the new InfiniBand connections, some requests are not handled in a timely manner. As a result, nodes might not successfully fail over to the new InfiniBand connection, and, in some cases, might fail to rejoin the cluster.

Workaround: To increase the number of DWT threads handling requests to fail over to a new InfiniBand connection, set the following sysctl value:

sysctl efs.rbm.dwt_threads=4

For more information about viewing and setting sysctl options, see article 89232 on the EMC Online Support site.

Note

Increasing the number of DWT threads might affect CPU performance, depending on the number of processors in the node.

134665


The OpenSM process might fail, causing cluster-wide actions to slow for a short period of time. If this occurs, the following lines appear in the stack trace:

/lib/libc.so.7:thr_kill+0xc
/lib/libc.so.7:__assert+0x35
/usr/lib/libcomplib.so.1:cl_spinlock_acquire+0x53
/usr/libexec/opensm:osm_log+0xef
/usr/libexec/opensm:umad_receiver+0x55b
/usr/lib/libcomplib.so.1:__cl_thread_wrapper+0x18
/lib/libthr.so.3:_pthread_getprio+0x15d

132546

Ixgbe interfaces might report a status of inactive, even if the cable and the port that the cable is plugged into are functioning correctly.

127706

If a port on an A100 node has IP addresses assigned to it, the port will reinitialize when the node is booted up.

126464

After a group change, the dnsiq_d process might fail. If this occurs, the following lines appear in the stack trace:

/usr/sbin/isi_dnsiq_d:vip_configured+0x54
/usr/sbin/isi_dnsiq_d:vip_ifconfig_down+0x18
/usr/sbin/isi_dnsiq_d:apply_flx_subnet+0x7c
/usr/sbin/isi_dnsiq_d:gmp_group_changed+0x122
/usr/sbin/isi_dnsiq_d:main+0x660
/usr/sbin/isi_dnsiq_d:_start1+0x80
/usr/sbin/isi_dnsiq_d:_start+0x15

78588

When a node with a static IP address is smartfailed, the IP address is assigned to another node. In some cases, the IP address might be moved to a node that already has an IP address assigned to it, replacing the IP address on that node.

71687

If an IPv6 subnet includes two or more NICs, one NIC might become unresponsive over IPv6.

57880

NFS

NFS known issues ID

If you run the rmdir command to remove a directory from an NFS export that is configured with character encoding other than the default encoding (for example, CP932 or ISO-8859-1 encoding), and if the name of the directory you want to remove contains a special character, the directory is not removed and a message similar to the following appears on the console:

failed to remove `\directory_path': Invalid argument

159373

On occasion, when OneFS is shutting down the NFS server, a system call made by the server does not return a response within the allowed 15-minute grace period. As a result, the NFS server is forcibly shut down and lines similar to the following appear in the /var/log/messages file:

/lib/libc.so.7:syscall+0xc
/usr/likewise/lib/lw-svcm/onefs.so:OnefsQuerySetInformationFile+0xa7
/usr/likewise/lib/lw-svcm/onefs.so:OnefsSetInformationFile+0x3b
/usr/likewise/lib/lw-svcm/onefs.so:OnefsIrpSpark+0x109
/usr/likewise/lib/lw-svcm/onefs.so:OnefsIrpWork+0xfa
/usr/likewise/lib/lw-svcm/onefs.so:OnefsAsyncStart+0x55
/usr/likewise/lib/lw-svcm/onefs.so:OnefsDriverDispatch+0x6f
/usr/likewise/lib/libiomgr.so.0:IopFmIrpStateDispatchFsdExec+0x9d
/usr/likewise/lib/libiomgr.so.0:IoFmIrpDispatchContinue+0x56c
/usr/likewise/lib/libiomgr.so.0:IopIrpDispatch+0x1d0
/usr/likewise/lib/libiomgr.so.0:IopQuerySetInformationFile+0x1fc
/usr/likewise/lib/libiomgr.so.0:IoSetInformationFile+0x44
/usr/likewise/lib/lw-svcm/nfs.so:Nfs4SetattrSetInfoFile+0x5a2
/usr/likewise/lib/lw-svcm/nfs.so:Nfs4Setattr+0x3bd
/usr/likewise/lib/lw-svcm/nfs.so:NfsProtoNfs4ProcSetAttr+0x178
/usr/likewise/lib/lw-svcm/nfs.so:NfsProtoNfs4ProcCompound+0x87e
/usr/likewise/lib/lw-svcm/nfs.so:NfsProtoNfs4Dispatch+0x486
/usr/likewise/lib/lw-svcm/nfs.so:NfsProtoNfs4CallDispatch+0x3e
/usr/likewise/lib/liblwbase.so.0:SparkMain+0xb7

136358

The NFS process might fail if you attempt to shut down the NFS process while the cluster is under heavy NFS load.

135529

OneFS might report that NFS clients are still connected to the cluster after the clients have disconnected.

135376

The NFS process might core, causing all NFS clients to be disconnected. If this occurs, the following lines appear in the stack trace:

/lib/libc.so.7:thr_kill+0xc
/lib/libc.so.7:__assert+0x35
/usr/likewise/lib/libiomgr.so.0:IoFileSetContext+0x32
/usr/likewise/lib/lwio-driver/onefs.so:OnefsStoreCCB+0x20
/usr/likewise/lib/lwio-driver/onefs.so:OnefsNfsCreateFile+0xf4b
/usr/likewise/lib/lwio-driver/onefs.so:OnefsCreateInternal+0x1209
/usr/likewise/lib/lwio-driver/onefs.so:OnefsSemlockAvailableWorker+0x92
/usr/likewise/lib/lwio-driver/onefs.so:OnefsAsyncUpcallCallbackWorker+0x1dd
/usr/likewise/lib/lwio-driver/onefs.so:OnefsAsyncUpcallCallback+0xe8
/usr/lib/libisi_ecs.so.1:oplocks_event_dispatcher+0xb9
/usr/likewise/lib/lwio-driver/onefs.so:OnefsOplockChannelRead+0x56
/usr/likewise/lib/liblwbase.so.0:EventThread+0x6dc
/usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xee
/lib/libthr.so.3:_pthread_getprio+0x15d

129684

If an SMB client has an opportunistic lock (oplock) on a file and the file is renamed or deleted by an NFS client, the SMB client does not relinquish its oplock, and the file data on the SMB client is not updated. This issue is caused by an extremely rare race condition that might occur in OneFS 6.0 or later.

For more information, see article 88591 on the EMC Online Support site.

94168

After a node restarts, the mountd process starts before authentication. As a result, immediately after the node restarts, NFS clients might experience permission problems or receive the wrong credentials when they mount a directory over NFS.

Workaround: On the NFS client, unmount and remount the directory.

73090

Moving files between exports in an NFSv4 overriding-exports scenario might produce unexpected results.

Workaround: Configure exports so that they are not exporting similar paths or mapping to two different credentials.

70616

When you add a node to the cluster, the master control program (MCP) loads the sysctl.conf file after the external interfaces have IP addresses. As a result, NFS clients that require 32-bit file handles might encounter issues connecting to newly added nodes.

Workaround: On NFS clients that encounter this issue, unmount and then remount the directory.

70413

The default number of NFS server threads was changed to address a potential issue in which the NFS server monopolizes node resources. NFS performance might be lower than expected.

Workaround: Adjust the number of nfsd threads by running the following commands. Modify the minimum number of threads by running the following command, where <x> is an integer:

isi_sysctl_cluster vfs.nfsrv.rpc.threads_min=<x>

Modify the maximum number of threads by running the following command, where <x> is an integer:

isi_sysctl_cluster vfs.nfsrv.rpc.threads_max=<x>

We recommend that you set threads_min and threads_max to the same value. Increasing the number of threads can improve performance, but can also cause node stability issues.

69917
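To keep threads_min and threads_max in step, as recommended above, the two command lines can be generated from a single value. The Python sketch below only builds the command strings shown in the workaround; the helper name is hypothetical, and the value in the usage note is arbitrary rather than a tuning recommendation.

```python
def nfs_thread_commands(threads):
    """Build the two workaround commands using one shared thread count."""
    # Reject booleans explicitly, since bool is a subclass of int in Python.
    if not isinstance(threads, int) or isinstance(threads, bool) or threads < 1:
        raise ValueError("threads must be a positive integer")
    return [
        f"isi_sysctl_cluster vfs.nfsrv.rpc.threads_min={threads}",
        f"isi_sysctl_cluster vfs.nfsrv.rpc.threads_max={threads}",
    ]
```

For example, nfs_thread_commands(16) returns the threads_min and threads_max commands with both values set to 16.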

OneFS API

OneFS API known issues ID

The lwswift process might fail if a large number of clients retrieve large files that have not been previously accessed by Swift. If this occurs, the following lines appear in the stack trace:

/lib/libc.so.7:thr_kill+0xc
/usr/likewise/lib/liblwbase_nothr.so.0:LwRtlMemoryAllocate+0x9e
/usr/likewise/lib/liblwbase.so.0:LwIovecCreateMemoryEntry+0x22
/usr/likewise/lib/liblwbase.so.0:LwIovecPullupCapacity+0x1ae
/usr/likewise/lib/lwio-driver/lwswift.so:_Z12HttpProtocolPN5swift10_LW_SOCKETEP9_LW_IOVECiPvPj+0x165
/usr/likewise/lib/liblwswift_utils.so.0:_ZN5swift12LwSocketTaskEP8_LW_TASKPv19_LW_TASK_EVENT_MASKPS3_Pl+0x634
/usr/likewise/lib/liblwbase.so.0:EventThread+0x6dc
/usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xee

135252

If you attempt to write to a read-only file, OneFS does not log an error message to the /var/log/lwswift.log file.

134770

In the RESTful Access to the Namespace (RAN) API, when a file is created through the PUT operation, a temporary file of the same name with a randomly generated suffix is placed in the target directory. Under normal circumstances, the temporary file is removed after the operation succeeds or fails. However, the temporary file might remain in the target directory if the server crashes or is restarted during the PUT operation.

104388

OneFS web administration interface

OneFS web administration interface known issues ID

If you run the isi devices fwupdate command on a node that contains SSDs configured for use as L3 cache, and that node is in read-only mode, the node might restart unexpectedly and an error similar to the following appears in the /var/log/messages file:

login: panic @ time 1436325569.184, thread 0xffffff01a8c175b0: Assertion Failure
cpuid = 2
Panic occurred in module kernel loaded at 0xffffffff80200000:
Stack: --------------------------------------------------
kernel:isi_assert_halt+0x2e
kernel:jio_journal_write_sync+0x60
kernel:j_write_l3_super+0x104
kernel:mgmt_finish_super+0x4b
kernel:mgmt_remove_from_sb+0x18b
kernel:l3_mgmt_drive_state+0x7ec
kernel:drv_change_drive_state+0x111
kernel:ifs_modify_drive_state+0x16cb
kernel:_sys_ifs_modify_drive_state+0x83
kernel:isi_syscall+0xaf
kernel:syscall+0x325
--------------------------------------------------
*** FAILED ASSERTION j_can_continue_write(j) @ /b/mnt/src/sys/ifs/journal/journal_io.c:186: jio_journal_write_sync: attempt to write to read-only journal

155489

If you attempt to upload cluster information through the OneFS web administration interface and the upload fails, the web interface for uploading information ceases to function. If you attempt to upload information again, OneFS will display Gather Succeeded. However, no cluster information will be uploaded.

133974

If you have not uploaded cluster information to Isilon Technical Support yet, on the Cluster Management > Diagnostics Info page, the Gather Status bar appears gray or black.

133972

The default SSL port (8080) for the web administration interface cannot be modified.

For more information, see article 88725 on the EMC Online Support site.

94026

If you use the SmartConnect service IP or hostname to log in to the OneFS web administration interface, the session fails or returns you to the login page.

Workaround: Connect to the cluster with a static IP address instead of a hostname.

75292

Security

Security known issues ID

Beginning in OneFS 7.2.0.1, SSL v3 is no longer supported for HTTPS connections to the cluster. As a result, HTTP clients cannot connect to the OneFS web administration interface through a connection that relies on SSL v3.

Workaround: Enable TLS 1.x for HTTP connections to the web administration interface.

For more information, see ESA-2015-015 on the EMC Online Support site.

137904
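For scripted clients, one way to make sure a connection never falls back to SSL v3 is to set an explicit minimum protocol version on the client side. The Python sketch below uses the standard ssl module; choosing TLS 1.2 as the floor is an assumption (the notes only say TLS 1.x), and the function name is hypothetical.

```python
import ssl


def tls_only_context():
    """Create a client-side context that will not negotiate SSL v3."""
    context = ssl.create_default_context()
    # TLS 1.2 as the minimum is an assumption; the notes only say "TLS 1.x".
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

A context built this way can be passed to an HTTPS client, such as http.client.HTTPSConnection, when connecting to the web administration interface.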

SmartQuotas

SmartQuotas known issues ID

If you configured a storage quota on a directory with a pathname that contained a single, multibyte character, and if a quota notification email was sent for that directory, the multibyte character in the pathname that appeared in the quota notification email was replaced with an incorrect character, such as a question mark.

138115

Quota configuration import and export functionality is missing from the isi quotas command.

Workaround: To export or import quota configuration files, run the isi_classic quota list --export or the isi_classic quota --import --from-file <filename> command from the command-line interface, where <filename> is the name of the file to be imported.

To export a file from the OneFS web administration interface, click SmartQuotas > Generated Reports Archive > Generate a quota report.

94797

Writing files past the quota limit over NFSv4 generates an I/O error. 69816

SMB

SMB known issues ID

On the Protocols > Windows Sharing > SMB Shares tab in the OneFS web administration interface, if you click Reset or Cancel in the Add a User or Group dialog box while adding or viewing an SMB share, the Add a User or Group dialog box becomes inoperable.

Workaround: Refresh the OneFS web administration interface page.

139712

If you shut down a node while a cluster is under heavy load, the following lines might appear in the stack trace:

/lib/libc.so.7:recvfrom+0xc
/usr/lib/libisi_gconfig_c_client.so.1:gconfig_connection_flush+0x375
/usr/lib/libisi_gconfig_c_client.so.1:gconfig_connection_read_message+0x47
/usr/lib/libisi_gconfig_c_client.so.1:gconfig_client_update_entries_count+0x799
/usr/lib/libisi_gconfig_c_client.so.1:gconfig_client_wait_for_config_change+0x274
/usr/likewise/lib/lwio-driver/srv.so:StoreChangesWatcherThreadRoutine+0xf3
/lib/libthr.so.3:_pthread_getprio+0x15d

134661

If an application sends OneFS a request for alternate data streams, but specifies a buffer size that is too small to receive all of the alternate data streams, OneFS will report that the streams do not exist, instead of reporting that the buffer size was too small.

134299

Alternate data streams might be inaccessible through Windows PowerShell. 134250

The isi_papi_d process might fail while there is a large amount of SMB traffic and multiple threads call the same code at the same time. However, in rare cases, the port can suddenly become inactive.

Workaround: If a port becomes inactive, you must reboot the node to resolve this issue.

130692

Some SMB 1 clients send a Tree Connect AndX request using ASCII to specify a path. The cluster rejects the connection with STATUS_DATA_ERROR.

84457

When you add a new Access Control Entry (ACE) that grants run-as-root permissions to an Access Control List (ACL) on an SMB share, OneFS adds a duplicate ACE if there is already an entry granting full control to the identity. The extra ACE grants no extra permissions.

Workaround: Remove the extra ACE by running the isi smb permissions command.

72337

Upgrade and installation

Upgrade and installation known issues ID

Beginning in OneFS 7.2.0.1, the network port range used for back-end communications was changed. As a result, in rare cases, if you perform a rolling upgrade from a supported version of OneFS to OneFS 7.2.0.1 or later, and if the upgrade process fails or is paused before all of the nodes in the cluster have been upgraded, commands sent from nodes that have not yet been upgraded might be sent to an upgraded node through an unsupported port.

If this issue occurs, affected nodes are not upgraded, the command that was sent fails, and messages similar to the following might appear on the console:

ERROR Client connected from an unprivileged port number 50230. Refusing the connection
[Errno 54] RPC session disconnected

Note

You can avoid this issue by performing a simultaneous upgrade. If you encounter this issue, see article 198906 on the EMC Online Support site.

143408
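When reviewing captured console output after a paused or failed rolling upgrade, a small script can help confirm whether this issue was hit. The Python sketch below matches only the message wording quoted above; the function name is hypothetical, and the exact message text on a given cluster may vary.

```python
import re

# Wording taken from the sample console message in these notes.
UNPRIVILEGED_PORT = re.compile(
    r"Client connected from an unprivileged port number (\d+)"
)


def unprivileged_ports(console_text):
    """Return the sorted, de-duplicated port numbers named in the messages."""
    return sorted({int(port) for port in UNPRIVILEGED_PORT.findall(console_text)})
```

An empty result means the text contains no message of this form; it does not by itself rule out other upgrade problems.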

If you initiate a simultaneous upgrade through the OneFS web administration interface, OneFS incorrectly reports that a rolling upgrade is occurring through the following message:

A rolling upgrade is currently in progress. Some changes to configuration may be disallowed.

133409

When running the sudo isi update command, you might encounter warnings that the cluster contains unresolved critical events, that certain drives are ready to be replaced, or that devices in the carrier boards are not supported boot disks. You can disregard these messages because they have no adverse effects.

131929


After a rolling upgrade is complete, the OneFS web administration interface might report that a rolling upgrade is still in progress.

Workaround: Restart the rolling upgrade.

For more information, see article 186845 on the EMC Online Support site.

126799

If a node is rebooted during a rolling upgrade, and the node fails, the upgrade process might continue to run indefinitely, even after all other nodes have been upgraded.

125320

If Collect or MultiScan jobs are in progress when either a rolling upgrade or cluster reboot is initiated, the jobs will fail instead of being cancelled.

Note

If the Collect or MultiScan jobs continue to fail after the rolling upgrade is complete, it is unlikely that the failure was caused by this issue.

123903

During a rolling upgrade, if you are logged in to a node that has not been upgraded yet, and you view job types, the system displays several disabled job types with IDs of AVScan.

These job types are new to OneFS 7.1.1 and have been mislabeled during the rolling upgrade process. The IDs of the job types will resolve to the correct IDs after the rolling upgrade is complete.

123842

Jobs that are running when a OneFS upgrade is started might not continue running after the upgrade completes.

Workaround: Cancel all running jobs before upgrading or manually restart jobs that did not restart automatically following the upgrade.

98341

If an upgrade job is started on a cluster containing a node with a degraded boot drive, the upgrade engine crashes on initialization, preventing the upgrade from proceeding.

For more information, see article 88746 on the EMC Online Support site.

98072

Virtual plug-ins

Virtual plug-ins known issues ID

Adding an Isilon vendor provider might fail when you enable VASA support. Additionally, the VASA information that appears in vCenter might be incorrect. These issues can occur if you create a data store or virtual machine through the VMware vSphere PowerCLI.

Workaround: You can resolve this issue by creating data stores through either the VMware vCenter graphical user interface or the VMware ESXi command-line interface.

97735


CHAPTER 9

OneFS Release Resources

Sources for information about and help with the OneFS operating system.

• OneFS information and documentation
• Functional areas in the OneFS release notes
• Where to go for support
• Provide feedback about this document

OneFS information and documentation

EMC Isilon channels

You can access OneFS information through the following channels.

EMC Isilon OneFS Product Page
Visit the EMC Isilon OneFS product page on the EMC Online Support site to download Isilon product documentation and current software releases.

Help on This Page
Select Help on This Page from the Help menu in the OneFS web administration interface to see information from the OneFS Web Administration Guide and the OneFS Event Reference. The Help on This Page option does not require internet connectivity.

Online Help
Select Online Help from the Help menu in the OneFS web administration interface to see information from the OneFS Web Administration Guide and the OneFS Event Reference. The Online Help contains the latest available versions of these guides. The Online Help option requires internet connectivity.

ISI Knowledge
You can visit the ISI Knowledge blog (http://isiblog.emc.com) weekly for highlights of and links to Isilon support content. The blog includes announcements of newly available content, product tips, and information about new ID.TV videos.

EMC Isilon YouTube playlist
You can visit the EMC Isilon YouTube playlist (http://bit.ly/Isilon-YouTube) on the EMC Corporate YouTube channel for Isilon how-to videos, information about new features, information about Isilon hardware, and technical overviews.

Available documentation

OneFS documentation is available across the following channels.

OneFS 7.2.0 Release Notes
Information about new features, operational changes, enhancements, and known issues for OneFS 7.2.0.
Channel: EMC Online Support

OneFS 7.2 Web Administration Guide
Information about the OneFS web administration interface, which enables you to manage an Isilon cluster outside of the command-line interface or LCD panel.
Channel: EMC Online Support, Help on This Page, Online Help

OneFS 7.2 CLI Administration Guide
Information about the OneFS command-line interface (CLI), which includes commands that enable you to manage an Isilon cluster outside of the web administration interface or LCD panel.
Channel: EMC Online Support

OneFS 7.2 Event Reference
Information about how to monitor the health and performance of your EMC Isilon cluster through OneFS event notifications.
Channel: EMC Online Support, Help on This Page, Online Help

OneFS 7.2 Backup and Recovery Guide
Information about backup and recovery procedures with NDMP and SyncIQ.
Channel: EMC Online Support

OneFS 7.2 API Reference
Information about how to access cluster configuration, management, and monitoring functionality, and also how to access directories and files on the file system through an HTTP-based interface.
Channel: EMC Online Support

OneFS 7.2 Security Configuration Guide
Information about the security features in OneFS.
Channel: EMC Online Support

OneFS Site Preparation and Planning Guide
Information for system administrators and facility managers about how to plan and implement an Isilon cluster in an optimal data center environment.
Channel: EMC Online Support

OneFS Upgrade Planning and Process Guide
Information that users should take into account when deciding to upgrade the OneFS operating system and information about tasks that users should perform to prepare the cluster for the upgrade.
Channel: EMC Online Support

OneFS CLI Mappings
Command syntax changes that were implemented between OneFS versions.
Channel: EMC Online Support

OneFS 7.2 Upgrade Readiness Checklist
A checklist to help users ensure that their cluster is ready to upgrade to OneFS 7.2.
Channel: EMC Online Support

OneFS 7.2 Migration Tools Guide
Information about how to migrate data to an Isilon cluster through OneFS migration tools.
Channel: EMC Online Support

OneFS 7.2 iSCSI Administration Guide
Information about how to manage block storage on an Isilon cluster through the OneFS iSCSI software module.
Channel: EMC Online Support

OneFS 7.2 Swift Technote
Information about how to store content and metadata as objects on a cluster through RESTful APIs.
Channel: EMC Online Support

Functional areas in the OneFS release notes

This section contains a list of the functional areas that are used to categorize content in the OneFS release notes and descriptions of the types of content that each category contains.

Antivirus
This functional area is used to categorize new features, changes, and issues that affect the way OneFS interacts with antivirus software.









Authentication
This functional area is used to categorize new features, changes, and issues that affect authentication on the cluster. This includes, but is not limited to:

• Access control lists (ACLs)
• LDAP
• NIS
• Role-based access control (RBAC)

Backup, recovery, and snapshots
This functional area is used to categorize new features, changes, and issues that affect backup, recovery, and snapshots. This includes, but is not limited to:

• NDMP
• Snapshots
• SyncIQ
• Symantec NetBackup

Cluster configuration
This functional area is used to categorize new features, changes, and issues that affect cluster configuration. This includes, but is not limited to:

• CloudPools
• Licensing
• NTP
• OneFS registry (gconfig)
• SmartPools

Command-line interface
This functional area is used to categorize new features, changes, and issues that affect the OneFS command-line interface.

Diagnostic tools
This functional area is used to categorize new features, changes, and issues that affect tools that are used to research and diagnose cluster issues. This includes, but is not limited to:

• EMC Secure Remote Services (ESRS)
• Gather info (isi_gather_info)
• Help in the OneFS web administration interface



Events, alerts, and cluster monitoring
This functional area is used to categorize new features, changes, and issues that affect utilities that are used to detect and record system events and utilities that are used to monitor cluster health and general statistics. This includes, but is not limited to:

• Alerts
• Protocol auditing
• Cluster event log (CELOG)
• File system analytics (FSA)
• Onsite Verification Test (OVT)
• Simple Network Management Protocol (SNMP)
• Statistics
• Status

File system
This functional area is used to categorize new features, changes, and issues that affect the OneFS file system. This includes, but is not limited to:

• Cluster group management
• File system coalescer
• File system events (not CELOG)
• FreeBSD
• L3 cache
• MCP
• Network Lock Manager (NLM)
• OneFS Kernel

File transfer
This functional area is used to categorize new features, changes, and issues that affect FTP and HTTP connections to the cluster.

Hardware
This functional area is used to categorize new features, changes, and issues that affect Isilon hardware in a OneFS cluster.

HDFS
This functional area is used to categorize new features, changes, and issues that affect the HDFS protocol.

iSCSI
This functional area is used to categorize new features, changes, and issues that affect the iSCSI protocol and iSCSI devices connected to a OneFS cluster.

Note

Support for the iSCSI protocol is deprecated in this version of OneFS.

Job engine
This functional area is used to categorize new features, changes, and issues that affect the OneFS job engine and deduplication in OneFS.

Migration
This functional area is used to categorize new features, changes, and issues that affect migration of data from a NAS array or a OneFS cluster to a OneFS cluster through the isi_vol_copy utility or the isi_vol_copy_vnx utility.

Networking
This functional area is used to categorize new features, changes, and issues that affect the OneFS external network and the OneFS back-end network. This includes, but is not limited to:

• Fibre Channel
• Flexnet
• InfiniBand
• SmartConnect
• TCP/IP

NFS
This functional area is used to categorize new features, changes, and issues that affect NFS connections to the cluster.

OneFS API
This functional area is used to categorize new features, changes, and issues that affect the OneFS Platform API and Swift.

OneFS web administration interface
This functional area is used to categorize new features, changes, and issues that affect the web administration interface.

Performance
This functional area is used to categorize new features, changes, and issues that affect cluster performance.

Security
This functional area is used to categorize new features, changes, and issues that are related to security fixes and vulnerabilities.

Security Profiles
This functional area is used to categorize new features, changes, and issues that affect hardened profiles such as the Security Technical Implementation Guides (STIGs).

SmartQuotas
This functional area is used to categorize new features, changes, and issues that affect SmartQuotas.

SMB
This functional area is used to categorize new features, changes, and issues that affect SMB connections to the cluster.

Upgrade and installation
This functional area is used to categorize new features, changes, and issues that affect OneFS upgrades, installation of OneFS patches, and the reformatting and reimaging of Isilon nodes by using a USB flash drive.



Virtual plug-ins
This functional area is used to categorize new features, changes, and issues that affect virtual plug-ins. This includes, but is not limited to:

• Isilon for vCenter
• OneFS Simulator
• Storage Replication Adapter (SRA)
• vStorage APIs for Array Integration (VAAI)
• VMware vSphere API for Storage Awareness (VASA)

vOneFS
This functional area is used to categorize new features, changes, and issues that affect vOneFS.

Where to go for support

You can contact EMC Isilon Technical Support for any questions about EMC Isilon products.

Online Support
Live Chat: https://support.emc.com/servicecenter/liveChat/
Create a Service Request: https://support.emc.com/servicecenter/createSR/

Telephone Support
United States: 1-800-SVC-4EMC (800-782-4362)
Canada: 800-543-4782
Worldwide: +1-508-497-7901
For local phone numbers in your country, see EMC Customer Support Centers (http://www.emc.com/collateral/contact-us/h4165-csc-phonelist-ho.pdf).

Help with online support
For questions specific to EMC Online Support registration or access, email [email protected].

Provide feedback about this document

We value your feedback. Please let us know how we can improve this document.

• Take the survey at http://bit.ly/isi-docfeedback.
• Send your comments or suggestions to [email protected].
