Interpreting SIM and MIM errpt entries

From Wikistix

The following information is taken from the Statistical Analysis and Reporting System User Guide Version 1.0 - 29 November 1999, Chapter 1. Service Information Messages (SIMs) and Media Information Messages (MIMs) may be generated by various IBM Magstar tape drives, such as the 3570, 3590 and 3592.

What is SARS?

The Statistical Analysis and Reporting System (SARS) analyzes and reports on tape drive and tape cartridge performance to help you:

  • Determine whether the tape cartridge or the hardware in the tape drive is causing errors
  • Determine if the tape media is degrading over time
  • Determine if the tape drive hardware is degrading over time

The 3590 tape drive microcode contains a Volume SARS (VSARS) algorithm and a Hardware SARS (HSARS) algorithm. SARS reports the results of its analysis in the form of Service Information Messages (SIM) and Media Information Messages (MIM). These messages are the means by which SARS communicates problems in order to improve tape library productivity.

The SARS algorithms are executed in the 3590 just before a tape is unloaded. To distinguish error patterns and trends, the SARS volume algorithms require the tape to be mounted on different drives. The SARS hardware algorithms require different tapes to be mounted on one drive. If a tape drive performs poorly with different tape volumes, cleaning and service repair messages or error codes are presented. Similarly, if tape volumes continue to perform poorly on different drives, rewrite or discard-media messages are presented.
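The cross-checking idea in the paragraph above (a volume is suspect when it performs poorly on different drives, and a drive is suspect when it performs poorly with different volumes) can be illustrated with a small sketch. This is not the drive microcode algorithm, which the guide does not describe in detail. The function name, the event format, and the threshold of two distinct partners are all assumptions made for illustration.

```python
from collections import defaultdict


def classify(error_events, min_distinct=2):
    """Split errors into suspect drives and suspect volumes.

    error_events: iterable of (drive_id, volser) pairs, one per mount
    that showed errors. min_distinct is a hypothetical threshold, not a
    value taken from the SARS guide.
    """
    drives_per_volume = defaultdict(set)
    volumes_per_drive = defaultdict(set)
    for drive_id, volser in error_events:
        drives_per_volume[volser].add(drive_id)
        volumes_per_drive[drive_id].add(volser)

    # A volume that fails on several different drives points at the media.
    suspect_volumes = sorted(
        v for v, drives in drives_per_volume.items() if len(drives) >= min_distinct
    )
    # A drive that fails with several different volumes points at the hardware.
    suspect_drives = sorted(
        d for d, vols in volumes_per_drive.items() if len(vols) >= min_distinct
    )
    return suspect_drives, suspect_volumes


# Illustrative data only: drive names and VOLSERs are made up.
events = [
    ("rmt0", "A00001"), ("rmt1", "A00001"), ("rmt2", "A00001"),
    ("rmt3", "B00002"), ("rmt3", "C00003"), ("rmt3", "D00004"),
]
print(classify(events))
```

The sketch also shows why a single mount is not enough to assign blame: with only one (drive, volume) pair, neither set reaches the threshold.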

There are other SARS algorithms in the 3590 tape drive. A part of SARS has been running on base 3590 tape drives since the first drive shipment in 1995; it requests drive cleaning when necessary and does some checking of hardware performance. SARS has been enabled in base 3590 tape drives that were shipped after January 1999. New 3590 tape drives are being shipped with SARS enabled in the microcode.

Another algorithm in the tape drive is concurrent SARS. This algorithm is run when errors occur in the drive or when some diagnostic tests are run. Concurrent SARS is used to help isolate a problem between the drive and the media. You can find additional information about SIMs and MIMs in the Magstar 3590 High Performance Tape Subsystem Introduction and Planning Guide and the Magstar 3590 High Performance Tape Subsystem User’s Guide. You can access online versions of these documents at one of the following Web sites:

What Kinds of Information Does SARS Report?

SARS reports the following kinds of information:

  • Degraded media (MIM)
  • Bad media (MIM)
  • Degraded drive (SIM)
  • Bad drive (SIM)
  • Preventive maintenance actions needed, such as drive cleaning (SIM)

Why Should I Enable SARS?

SARS messages are helpful in media management, which allows you to remove marginal tape cartridges from the library. SARS messages also indicate degrading tape drive hardware performance, which allows a hardware repair action before the hardware actually fails. This results in improved library performance and higher reliability of the tape subsystem.

What Should I Know Before I Enable SARS?

You need to be aware of the following before you enable SARS by installing the updated 3590 drive microcode:

  • SARS is designed to detect the gradual degradation of the performance of media and hardware.
  • MIMs from the tape drives are recommendations. It is the responsibility of the software or the customer to take action on the messages. The 3590 drive will not actually write-protect the tape cartridge when a read-only message is presented. VTS and Tivoli Storage Management (formerly ADSM) products are exceptions to this; they mark the tape as read-only.
  • The number of tape cartridges recommended for read-only in VTS and Tivoli Storage Management products may increase temporarily (indicated by an increase in the number of MIMs with message code 60).
  • As you remove tape cartridges that are performing marginally from the library, the number of read/write errors will decrease. The rate of removal will depend on the tape cycle in the library.
  • When a tape cartridge is recommended for read-only status, you will continue to be able to access the data on it.
  • You will need to copy the data from read-only tape cartridges, then eject them from the library.
  • You will need to follow existing vendor warranty procedures for evaluation and possible replacement of tape cartridges that SARS has marked read-only. For warranty information about IBM tape cartridges, call 1-800-IBM-MEDIA.

How Do I Configure SARS?

SIMs and MIMs can be reported multiple times. A drive configuration option allows SARS to report the same SIM or MIM more than once, with eight hours between repeats. When the option is enabled, a SIM is reported when the error occurs, repeated eight hours later, and repeated for the last time eight hours after that. The default option is to not repeat SIMs and MIMs.
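The repeat timing described above can be worked through as a short sketch. The function name and the `repeat_enabled` parameter are hypothetical; only the eight-hour interval, the total of three reports, and the non-repeating default come from the text.

```python
from datetime import datetime, timedelta

# Interval between repeated reports, as stated in the guide.
REPEAT_INTERVAL = timedelta(hours=8)


def report_times(first_report, repeat_enabled=False):
    """Return the times at which one SIM or MIM would be reported."""
    if not repeat_enabled:
        # Default drive option: report once, no repeats.
        return [first_report]
    # Initial report plus two repeats, eight hours apart.
    return [first_report + i * REPEAT_INTERVAL for i in range(3)]


start = datetime(1999, 11, 29, 6, 0)
for t in report_times(start, repeat_enabled=True):
    print(t.isoformat())
```

With repeats enabled, a message first reported at 06:00 would be seen again at 14:00 and 22:00 on the same day.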

The SARS reporting of SIMs and MIMs can be disabled if your host software does not support SIMs and MIMs.

Depending on your software, you may be able to select the SIMs and MIMs that you want SARS to report. For example, you may want to see only the acute severity SIMs and MIMs, or you may prefer to see all SIMs and MIMs that SARS sends to the host. Software configuration options and drive configuration allow you to filter SIMs and MIMs by severity code.

SIM Severity Codes

The SIM severity codes are:

  • Severity 0 code indicates that the tape drive requires service, but normal operation is not affected.
  • Severity 1 code indicates that the problem is moderate. The tape drive is operating in a degraded condition.
  • Severity 2 code indicates that the problem is serious. The tape drive is operating in a degraded condition.
  • Severity 3 code indicates that the problem is acute. The tape drive requires immediate service attention.

MIM Severity Codes

The MIM severity codes are:

  • Severity 1 code indicates that high temporary read or write errors occurred (moderate severity).
  • Severity 2 code indicates that permanent read or write errors occurred (serious severity).
  • Severity 3 code indicates that tape directory errors occurred (acute severity).
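The two severity scales can be captured as lookup tables, which also makes severity filtering easy to express. The following is a minimal sketch: the labels come from the lists above, while the function names and the `(kind, severity)` tuple format are assumptions made for illustration and do not correspond to any host software interface.

```python
# Severity labels taken from the SIM and MIM severity lists.
SIM_SEVERITY = {
    0: "service required, normal operation not affected",
    1: "moderate",
    2: "serious",
    3: "acute",
}
MIM_SEVERITY = {
    1: "moderate (high temporary read or write errors)",
    2: "serious (permanent read or write errors)",
    3: "acute (tape directory errors)",
}


def severity_label(kind, code):
    """Return the label for a severity code; kind is 'SIM' or 'MIM'."""
    table = SIM_SEVERITY if kind == "SIM" else MIM_SEVERITY
    return table.get(code, "unknown severity")


def filter_by_severity(messages, minimum):
    """Keep only messages whose severity is at least `minimum`.

    messages: iterable of (kind, severity) tuples (hypothetical format).
    """
    return [m for m in messages if m[1] >= minimum]


sample = [("SIM", 0), ("SIM", 3), ("MIM", 1), ("MIM", 3)]
for kind, sev in filter_by_severity(sample, minimum=3):
    print(kind, sev, severity_label(kind, sev))
```

Note that severity 0 exists only for SIMs, so a lookup of MIM severity 0 falls through to the "unknown severity" default.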

What Is a Service Information Message (SIM)?

A SIM alerts you that an abnormal operational condition in a 3590 or 3570 tape drive requires service attention. Information in the SIM identifies the affected drive, the failing component, the severity of the fault condition, and the expected operational impact of the pending service action. A SIM is a SCSI Log Sense page (see Figure 1 for a graphic view of the SIM format). This information helps you to initiate and expedite the appropriate recovery and service procedures in order to restore normal operation with maximum efficiency and minimal disruption.

A SIM contains the machine type, machine serial number, and Field Replaceable Unit (FRU), which allows the dispatch of the appropriate service personnel, along with the replacement parts required to correct the machine fault. This improves service response time and reduces the time required for machine repair. A SIM also contains a severity code, which allows you to determine the urgency of the problem, and a service message, which advises you of the service impact.

Figure 1. SIM Format
Each row covers 16 bytes (offsets 0 to F); fields are listed in byte order. Bracketed numbers refer to the notes that follow.

Bytes | Fields
00-0F | Page Code 31 | RSVD | Length | Parm Code | Parm Ctrl | Parm Length | SIM or MIM [1] | Reserved
10-1F | Microcode and Link Level [2] | Message Code [3] | Reserved | Excp Msg [4] | SRVC Msg [5] | Sev [6] | RSVD | Exception Data | FRU Identifier [7]
20-2F | FRU Ident (cont) | First FSC [8] | Last FSC [9] | Product ID | Manufacturer
30-3F | Mfg (cont) | Plant of Manufacture | Dash | Sequence Number (Drive Serial Number) [10]
40-4F | Device Type | Device Model Number [11]

  • 1 SIM or MIM: 00 = No SIM or MIM present, 01 = SIM present, 02 = MIM present
  • 2 Microcode and Link Level
  • 3 Message Code: See Table 1.
  • 4 Excp Msg (Exception Message): See “SIM Exception Messages” on page 43 of the SARS User Guide.
  • 5 SRVC Msg (Service Message): See “SIM Service Messages” on page 44 of the SARS User Guide.
  • 6 Sev (Severity): See “SIM Severity Codes” above.
  • 7, 8 and 9 FRU Identifier, First FSC and Last FSC: presented in hex. Use the conversion chart in Table 17 on page 36 of the SARS User Guide.
  • 10 Sequence Number (Drive Serial Number)
  • 11 Device Model Number: 423141 = B1A (No ACF), 423131 = B11 (ACF), 443141 = E1A (No ACF), 443131 = E11 (ACF)
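The first row of Figure 1 matches the standard SCSI log page layout: a four-byte page header (page code, reserved byte, two-byte length) followed by a four-byte log parameter header (two-byte parameter code, control byte, parameter length). The sketch below unpacks those eight bytes and then reads the SIM or MIM indicator. Treating the indicator as a single byte at offset 8 is an assumption inferred from the field order in the figure, not something the guide states, and the sample bytes are invented for illustration.

```python
import struct

# Indicator values from note 1 of the figure.
INDICATOR = {0x00: "none", 0x01: "SIM", 0x02: "MIM"}


def parse_header(page: bytes) -> dict:
    """Decode the header fields of a Log Sense page 31 buffer."""
    if len(page) < 9:
        raise ValueError("need at least 9 bytes of log page data")
    # SCSI multi-byte fields are big-endian.
    page_code, _rsvd, page_length, parm_code, parm_ctrl, parm_length = (
        struct.unpack(">BBHHBB", page[:8])
    )
    return {
        "page_code": page_code & 0x3F,  # low six bits carry the page code
        "page_length": page_length,
        "parm_code": parm_code,
        "parm_ctrl": parm_ctrl,
        "parm_length": parm_length,
        # ASSUMPTION: one-byte indicator immediately after the headers.
        "kind": INDICATOR.get(page[8], "unknown"),
    }


# Invented sample data: only the page code (31) and indicator (01) matter here.
sample_page = bytes([0x31, 0x00, 0x00, 0x44, 0x00, 0x00, 0x61, 0x40, 0x01]) + bytes(7)
print(parse_header(sample_page))
```

The remaining fields (message code, severity, FRU identifier and so on) are deliberately left undecoded because the figure, as reproduced here, does not give their exact byte offsets.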

What Are the SIM Message Codes?

Table 1 shows the hex and ASCII forms and a description of the SIM message codes.

Table 1. SIM Message Code Descriptions
Message Code (Hex) | Message Code (ASCII) | Description
3030 | 00 | No Message: This is the default message indicating that the device does not have an error to report.
3430 | 40 | Operator Intervention Required: An operator action is required at the device. For example, a magazine is full and needs to be replaced or emptied. Check the device error log for possible repair action.
3431 | 41 | Device Degraded: The device is performing in a degraded state but can be used. A FID is displayed with the error message. Check the device error log for possible repair action.
3432 | 42 | Device Hardware Failure: The device cannot be used. A FID is displayed with the error message. Check the device error log for possible repair action.
3433 | 43 | Service Circuits Failed, Operations not Affected: This error does not affect the performance of the device. The failure affects only circuits used for non-operational testing. A FID is displayed with the error message. Check the device error log for possible repair action.
3535 | 55 | Clean Device: Load a cleaning cartridge in the device. The drive returns the cleaning cartridge following the cleaning procedure.
3537 | 57 | Device has been cleaned: A cleaning cartridge has cleaned the drive.
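The hex column in Table 1 is simply the ASCII encoding of the two-character message code (for example, hex 34 is the character 4 and hex 30 is the character 0). A minimal sketch of decoding a hex code and looking up its title follows; the dictionary and function names are made up for illustration.

```python
# Titles from Table 1, keyed by the ASCII form of the message code.
SIM_MESSAGES = {
    "00": "No Message",
    "40": "Operator Intervention Required",
    "41": "Device Degraded",
    "42": "Device Hardware Failure",
    "43": "Service Circuits Failed, Operations not Affected",
    "55": "Clean Device",
    "57": "Device has been cleaned",
}


def decode_message_code(hex_code):
    """Convert the hex form (e.g. '3431') to its ASCII form (e.g. '41')."""
    return bytes.fromhex(hex_code).decode("ascii")


def describe_sim(hex_code):
    """Return the Table 1 title for a hex message code."""
    ascii_code = decode_message_code(hex_code)
    return SIM_MESSAGES.get(ascii_code, "unknown message code " + ascii_code)


for code in ("3030", "3431", "3535"):
    print(code, decode_message_code(code), describe_sim(code))
```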

What Is a Media Information Message (MIM)?

A MIM alerts you that an abnormal condition in a media (tape) volume requires your attention. Information in the MIM identifies the tape that has the abnormal condition. A MIM is a SCSI Log Sense page (see Figure 2 for a graphic view of the MIM format). A MIM contains the volume serial number of the bad tape and specifies what is wrong with the tape. This allows you to do maintenance within the tape library and to prevent unnecessary service calls due to the tape.

Figure 2. MIM Format
Each row covers 16 bytes (offsets 0 to F); fields are listed in byte order. Bracketed numbers refer to the notes that follow.

Bytes | Fields
00-0F | Page Code 31 | RSVD | Length | Parm Code | Parm Ctrl | Parm Length | SIM or MIM [1] | Reserved
10-1F | Microcode and Link Level [2] | Message Code [3] | Engineering Data | Excp Msg [4] | SRVC Msg [5] | Sev [6] | Reserved | First FSC [7]
20-2F | First FSC (cont) | VOLSER (Volume Serial Number) [8] | Valid Flag [9] | RSVD | Product ID | Manufacturer
30-3F | Mfg (cont) | Plant of Manufacture | Dash | Sequence Number (Drive Serial Number) [10]
40-4F | Device Type | Device Model Number [11]

  • 1 SIM or MIM: 00 = No SIM or MIM present, 01 = SIM present, 02 = MIM present
  • 2 Microcode and Link Level
  • 3 Message Code: See Table 2.
  • 4 Excp Msg (Exception Message): See “MIM Exception Messages” on page 43 of the SARS User Guide.
  • 5 SRVC Msg (Service Message)
  • 6 Sev (Severity): See “MIM Severity Codes” above.
  • 7 First FSC: Engineering data
  • 8 VOLSER (Volume Serial Number)
  • 9 Valid Flag: 00 = VOLSER not valid, 01 = VOLSER valid
  • 10 Sequence Number (Drive Serial Number)
  • 11 Device Model Number: 423141 = B1A (No ACF), 423131 = B11 (ACF), 443141 = E1A (No ACF), 443131 = E11 (ACF)

What Are the MIM Message Codes?

Table 2 shows the hex and ASCII forms and a description of the MIM message codes.

Table 2. MIM Message Code Descriptions
Message Code (Hex) | Message Code (ASCII) | Description
3630 | 60 | Bad Media, Read-Only Permitted: The tape drive will not actually write-protect the cartridge when this message code is presented. If you need to write to the data on this tape, first copy the data to another tape cartridge, then remove this tape cartridge from the library.
3631 | 61 | Rewrite Data if Possible: The data on the tape cartridge is degraded. Attempt to copy the data to a new tape cartridge or rewrite the data.
3632 | 62 | Read Data if Possible: The tape directory is degraded. Attempt to read the tape to rebuild the tape directory.
3634 | 64 | Bad Media, Cannot Read or Write: Remove the tape cartridge from the library. Data is likely lost without special tools to recover it.
3732 | 72 | Replace Cleaner Cartridge: Order a new cleaner cartridge (3570 drives only).
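Because each MIM message code in Table 2 implies a follow-up action for a specific cartridge, a list of MIMs can be turned into a simple work list. The sketch below is one possible way to do that. The input tuple format and all names are assumptions, the action strings are condensed from Table 2, and the VOLSER is used only when the valid flag says it can be trusted.

```python
# Actions condensed from Table 2, keyed by the ASCII message code.
MIM_ACTIONS = {
    "60": "copy data to another cartridge, then remove from library",
    "61": "copy data to a new cartridge or rewrite the data",
    "62": "read the tape to rebuild the tape directory",
    "64": "remove from library (data likely lost)",
    "72": "order a new cleaner cartridge",
}


def work_list(mims):
    """Build (cartridge, action) pairs from decoded MIM fields.

    mims: iterable of (hex_message_code, volser, valid_flag) tuples,
    a hypothetical format; valid_flag is 0x01 when the VOLSER is valid.
    """
    plan = []
    for hex_code, volser, valid_flag in mims:
        ascii_code = bytes.fromhex(hex_code).decode("ascii")
        action = MIM_ACTIONS.get(ascii_code, "unknown message code " + ascii_code)
        cartridge = volser if valid_flag == 0x01 else "(VOLSER not valid)"
        plan.append((cartridge, action))
    return plan


# Invented sample data for illustration.
sample_mims = [("3630", "A00001", 0x01), ("3634", "??????", 0x00)]
for cartridge, action in work_list(sample_mims):
    print(cartridge, "->", action)
```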

See Also