MAP4969 Recovery actions for special PCIe-related I/O enclosure errors (Model 961)

This MAP covers SRCs that require special repair actions by the service representative or the next level of support.

MAP4969 Section-1

Procedure

  1. Does the FRU list in the serviceable event that sent you here contain a symbolic FRU similar to Invalid-MTMS-cpssebay**?
  2. When the FRU list contains a symbolic FRU similar to Invalid-MTMS-cpssebay**, the location code is invalid and cannot be used to determine the failing I/O enclosure.
  3. Look up the cpssebay** value of the symbolic FRU Invalid-MTMS-cpssebay** in the first column of Table 1 or Table 2.
    Table 1. Symbolic FRU location to type-location translation (standard configuration)
    Symbolic FRU location code   Location   Type-location   I/O enclosure number
    cpssebay00                   1B1        1400-1B1        0
    cpssebay01                   1B2        1400-1B2        1
    cpssebay02                   1B3        1400-1B3        2
    cpssebay03                   1B4        1400-1B4        3
    cpssebay04                   2B1        1400-2B1        4
    cpssebay05                   2B2        1400-2B2        5
    cpssebay06                   2B3        1400-2B3        6
    cpssebay07                   2B4        1400-2B4        7

    Table 2. Symbolic FRU location to type-location translation (eight I/O enclosures in single rack)
    Symbolic FRU location code   Location   Type-location   I/O enclosure number
    cpssebay00                   1B1        1400-1B1        0
    cpssebay01                   1B2        1400-1B2        1
    cpssebay02                   1B3        1400-1B3        2
    cpssebay03                   1B4        1400-1B4        3
    cpssebay04                   1B5        1400-1B5        4
    cpssebay05                   1B6        1400-1B6        5
    cpssebay06                   1B7        1400-1B7        6
    cpssebay07                   1B8        1400-1B8        7
  4. In the second column of Table 1 or Table 2, find the location code that corresponds to the symbolic FRU location code in the FRU list.
  5. Map the three-character location code from the previous step to the physical location of the I/O enclosure in the rack. See Figure 1.
    Figure 1. I/O enclosure locations (front)
    Note: Locations 1B5, 1B6, 1B7, and 1B8 exist only in DS8870 all-flash systems.
  6. Determine the serial number of the I/O enclosure by reading the MTMS label.
  7. Open the Advanced System Management (ASM) menu:
    1. From the navigation area, click Storage Facility Management > Server view.
    2. From the Task area at the bottom, click Operations > Launch Advanced System Management (ASM).
    3. In the launch ASM interface confirmation, click OK.
    4. The management console web browser opens and displays the ASM login panel.
  8. Log in as admin with a password of admin2107.
    Notes:
    1. If you are logged in but inactive for 15 minutes, your session expires.
    2. If you make five invalid login attempts, your user account is locked out for five minutes. Other accounts are not affected.
  9. Reset the I/O enclosure MTMS from the ASM menu:
    1. Expand System Configuration.
    2. Select Configure IO Enclosures.
    3. Observe the Type-Model column in the displayed Enclosure Configuration table.
    4. Find the row that contains the Type-Model determined from step 3.
    5. Select the radio button for that I/O enclosure.
    6. Click Change settings.
    7. Modify the Type-Model field to match the Type-location value in Table 1 (or Table 2) for the I/O enclosure.
    8. Modify the Serial number field to match the serial number read from the I/O enclosure MTMS (machine type, model, and serial number) label.
    9. Click Save Settings.
  10. Update the HMC microcode objects for the I/O enclosure MTMS by doing a pseudo-repair of the PCIe and SPCN card FRU. This update power-cycles the I/O enclosure.
    1. From the navigation area, click Storage Facility Management > storage facility.
    2. From the Task area, click Exchange Parts > Exchange IO Enclosure and Components.
    3. Click Show I/O Enclosures and select the enclosure location.
    4. Click Show FRUs.
    5. Select I/O Enclosure PCIe/SPCN Card and then click Exchange FRU.
    6. When prompted to replace the FRU, do not disconnect the PCIe and SPCN cables from the card. Do not remove the card.

      Continue with the repair.

    7. If the repair is successful, exit this MAP and ensure that any related serviceable events are closed.
    8. If the repair fails with the same error, replace the I/O enclosure PCIe/SPCN card.
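Steps 3 through 5 of Section-1 reduce to a fixed lookup from the cpssebay** suffix to a location code. The following Python sketch transcribes Table 1 and Table 2 into dictionaries to show that lookup. The function name translate_symbolic_fru, the single_rack flag, and the assumption that the symbolic FRU text is exactly "Invalid-MTMS-cpssebay" followed by two digits are all hypothetical; this is not an HMC or ASM interface, and the physical location must still be confirmed against Figure 1.

```python
import re

# Table 1: standard configuration (cpssebay suffix -> location code).
STANDARD = {
    "00": "1B1", "01": "1B2", "02": "1B3", "03": "1B4",
    "04": "2B1", "05": "2B2", "06": "2B3", "07": "2B4",
}

# Table 2: eight I/O enclosures in a single rack ("00" -> "1B1" ... "07" -> "1B8").
SINGLE_RACK = {f"0{n}": f"1B{n + 1}" for n in range(8)}


def translate_symbolic_fru(symbolic_fru: str, single_rack: bool = False) -> dict:
    """Hypothetical helper: map Invalid-MTMS-cpssebayNN to the table columns."""
    # Assumed exact format: the literal prefix plus a two-digit suffix.
    match = re.fullmatch(r"Invalid-MTMS-cpssebay(\d{2})", symbolic_fru)
    if match is None:
        raise ValueError(f"not an Invalid-MTMS-cpssebay symbolic FRU: {symbolic_fru!r}")
    suffix = match.group(1)
    table = SINGLE_RACK if single_rack else STANDARD
    if suffix not in table:
        raise ValueError(f"cpssebay{suffix} is not listed in the table")
    location = table[suffix]
    return {
        "location": location,                 # second column
        "type_location": f"1400-{location}",  # third column
        "enclosure_number": int(suffix),      # fourth column
    }
```

For example, translate_symbolic_fru("Invalid-MTMS-cpssebay05") yields location 2B2 and type-location 1400-2B2, while the same FRU with single_rack=True yields 1B6. Keeping the two tables separate mirrors the document: the configuration, not the symbolic FRU, decides which table applies.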

MAP4969 Section-2

Procedure

  1. Find your SRC in Table 3.
    Table 3. Repair actions for special SRCs
    SRCs that require special repairs
    SRC            Action
    BE1E25AA       A single CEC-to-I/O enclosure PCIe link fault was detected during a CEC service action. Go to MAP4969 Section-3.
    BE1E25AB       Multiple CEC-to-I/O enclosure PCIe link faults were detected during a CEC service action. Go to MAP4969 Section-3.
    BE370012       PCIe I/O enclosure discovery failure (missing I/O enclosure). Go to MAP4969 Section-4.
    BE38256B       PCIe enclosure discovery/configuration failure. Could not initialize the path from the local server to the I/O enclosure. Go to MAP4969 Section-3.
    BE38256C       I/O enclosure FPGA update image corrupted on the local server. Contact your next level of support.
    BE38256D       PCIe I/O enclosure FPGA error. Contact your next level of support.
    BE38256E       PCIe I/O enclosure MTMS unknown/invalid. Contact your next level of support.
    BE38256F       PCIe I/O enclosure miscabling detected. Go to MAP4969 Section-3.
    BE382572       Error occurred during I/O enclosure error data collection. Go to MAP4969 Section-3.
    BE38257B       PCIe interface to PCIe I/O enclosure down. Go to MAP4969 Section-3.
    BE382563       Multi-PCIe link degradation detected on the local server. Contact your next level of support.
    BE382566       PCIe I/O enclosure discovery/configuration failure. Go to MAP4969 Section-3.
    BE382567       Invalid server configuration. Contact your next level of support.
    BE382574       One LPAR cannot communicate with the I/O enclosure; a system failover is required. Go to MAP4969 Section-3.
    BE382575       PCIe I/O enclosure discovery failure (missing an I/O enclosure). Go to MAP4969 Section-4.
    Any other SRC  Contact your next level of support.
  2. Use the Action column entry to continue the repair.
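The Action column of Table 3 routes every SRC to one of three outcomes, with a catch-all for unlisted SRCs. The following Python sketch transcribes that routing into a dictionary. The names SRC_ACTIONS and next_action are hypothetical, and normalizing the SRC to uppercase is an assumption for convenience; Table 3 remains the authoritative source.

```python
# Outcomes used in the Action column of Table 3.
SECTION_3 = "Go to MAP4969 Section-3."
SECTION_4 = "Go to MAP4969 Section-4."
SUPPORT = "Contact your next level of support."

# SRC -> action, transcribed row by row from Table 3.
SRC_ACTIONS = {
    "BE1E25AA": SECTION_3,
    "BE1E25AB": SECTION_3,
    "BE370012": SECTION_4,
    "BE38256B": SECTION_3,
    "BE38256C": SUPPORT,
    "BE38256D": SUPPORT,
    "BE38256E": SUPPORT,
    "BE38256F": SECTION_3,
    "BE382572": SECTION_3,
    "BE38257B": SECTION_3,
    "BE382563": SUPPORT,
    "BE382566": SECTION_3,
    "BE382567": SUPPORT,
    "BE382574": SECTION_3,
    "BE382575": SECTION_4,
}


def next_action(src: str) -> str:
    """Hypothetical helper: return the Table 3 action for an SRC."""
    # Strip whitespace and uppercase the SRC; any SRC not in the table
    # falls through to the "Any other SRC" row.
    return SRC_ACTIONS.get(src.strip().upper(), SUPPORT)
```

Using dict.get with a default encodes the "Any other SRC" row directly, so an unrecognized SRC can never be routed to Section-3 or Section-4 by mistake.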

MAP4969 Section-3

About this task

The serviceable event FRU list that sent you here contains one or more cables and possibly other FRUs.
Important: Both ends of each PCIe cable are displayed in the FRU list. For each cable, only the first location code can be selected for repair or replacement. The CBLCONT location code that follows it shows where the cable connects at its other end and cannot be selected for repair or replacement.

Procedure

  1. Inspect both ends of each PCIe cable listed in the FRU list.
    1. Do not plug or unplug the cable.
    2. Refer to the following cabling diagrams and use the one that matches the number of I/O enclosures installed in the machine. The CBLCONT location code that is listed is the port on the I/O enclosure where the cable is supposed to be connected.

      Using the appropriate cabling figure, check each end of the cable that is listed on the screen that sent you here to ensure that it is properly plugged into the correct connector.

    3. Observe the body of the cable to ensure that it is not damaged.
    Figure 2. Model 961, two I/O enclosures
    Figure 3. Model 961, four I/O enclosures
    Figure 4. Model 961, eight I/O enclosures
    Figure 5. Model 961, eight I/O enclosures in single rack
  2. Is the PCIe cable properly plugged and not damaged?
    • Yes, go to the next step.
    • No, go to step 5.
  3. The cable is properly plugged and is not damaged. Did you reach this step after replacing both the I/O enclosure PCIe and SPCN card and the I/O enclosure backplane?
    • No, go to the next step.
    • Yes, a pseudo-repair of the PCIe and SPCN card might recover this condition. Continue with the following steps:
      1. Return to the screen that sent you here.
      2. To the question, "What was the result of using the service procedure from Infocenter?" click Problem not fixed and then click Next.
      3. To the question, "Did you exchange any parts?" click No and then click Next.
      4. To the question, "Did you isolate the problem?" click Yes and then click Next.
      5. The current repair action ends, but the serviceable event remains open. Use the Exchange Parts menu to complete a pseudo-repair of the I/O enclosure PCIe and SPCN card:
        • Storage Facility Management > storage facility > Exchange Parts

        When the exchange procedure instructs you to, remove power from the I/O enclosure. You do not need to uncable or remove the PCIe and SPCN card.
  4. The cable is properly plugged and is not damaged. The I/O enclosure PCIe and SPCN card and the I/O enclosure backplane were not both replaced.
    1. Return to the screen that sent you here.
    2. To the question, "What was the result of using the service procedure from Infocenter?" click Problem not fixed and then click Next.
    3. To the question, "Did you exchange any parts?" click No and then click Next.
    4. To the question, "Did you isolate the problem?" click No and then click Next.
    5. The next FRU in the list is displayed. Continue the repair by replacing the remaining FRUs until the problem is fixed. Exit this MAP.
  5. The cable is incorrectly plugged or damaged. Did a failed I/O enclosure installation lead you to this MAP?
    • Yes
      1. Exit this repair.
      2. Retry the original MES installation with cables properly connected.
    • No, the incorrect plugging of the cable or damage to the cable occurred during a repair.
      1. Return to the screen that sent you here.
      2. To the question, "What was the result of using the service procedure from Infocenter?" click Problem not fixed and then click Next.
      3. To the question, "Did you exchange any parts?" click No and then click Next.
      4. To the question, "Did you isolate the problem?" click No and then click Next.
      5. When the next FRU in the list is displayed, treat the other FRUs in the previous FRU list as not available onsite for replacement.
      6. When asked whether the FRU is available for replacement, answer No. Each FRU in the list is then displayed in turn until the incorrectly plugged or damaged cable is displayed.

        When the incorrectly plugged or damaged cable is displayed, do a normal FRU replacement.

      7. When the repair is complete, exit this MAP.
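The branches of Section-3 depend on only three answers: whether the cable is properly plugged and undamaged (step 2), whether both the PCIe and SPCN card and the backplane were already replaced (step 3), and whether a failed I/O enclosure installation led you here (step 5). The following Python sketch summarizes that decision tree. The Outcome enum, the section3_route function, and its boolean parameters are hypothetical illustrations; they do not query the HMC, and the on-screen repair dialog still drives the actual repair.

```python
from enum import Enum


class Outcome(Enum):
    # Step 3, "Yes" branch: pseudo-repair of the PCIe and SPCN card.
    PSEUDO_REPAIR_CARD = "step 3: pseudo-repair of the PCIe and SPCN card"
    # Step 4: answer "No" to isolation and replace the remaining FRUs.
    REPLACE_REMAINING_FRUS = "step 4: replace the remaining FRUs"
    # Step 5, "Yes" branch: exit and retry the original MES installation.
    RETRY_MES_INSTALL = "step 5: retry the original MES installation"
    # Step 5, "No" branch: skip FRUs until the cable is shown, then replace it.
    REPLACE_CABLE = "step 5: skip to the cable FRU and replace it"


def section3_route(cable_ok: bool, card_and_backplane_replaced: bool,
                   from_failed_install: bool) -> Outcome:
    """Hypothetical helper mirroring the branches of MAP4969 Section-3."""
    if cable_ok:  # step 2 answered Yes
        if card_and_backplane_replaced:
            return Outcome.PSEUDO_REPAIR_CARD
        return Outcome.REPLACE_REMAINING_FRUS
    # step 2 answered No: the cable is incorrectly plugged or damaged
    if from_failed_install:
        return Outcome.RETRY_MES_INSTALL
    return Outcome.REPLACE_CABLE
```

Note that the step 3 answer matters only when the cable is good, and the step 5 answer matters only when it is not, which is why each parameter is checked on just one side of the first branch.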

MAP4969 Section-4

Procedure

  1. Observe the FRU list in the serviceable event details that sent you here. It should include one or more of the following FRUs:
    • I/O enclosure PCIe/SPCN card
    • I/O enclosure backplane
  2. Display the open serviceable events that need repair. Is there another serviceable event that lists either of the FRUs from step 1, or other FRUs from this I/O enclosure such as a power supply or fan?
    • Yes, exit this MAP and attempt to repair that serviceable event first.

      If that repair does not correct this problem, return here and continue with the next step.

      If that repair does correct this problem, remember to also close this serviceable event.

    • No, go to the next step.
  3. Inspect both ends of the two PCIe cables that are intended to connect to the I/O enclosure listed in the FRU list.
    1. Do not plug or unplug the cables.
    2. Refer to the cabling diagram in Figure 2, Figure 3, Figure 4, or Figure 5 that matches the number of installed I/O enclosures in the machine. Using that figure, check each end of both cables to ensure that they are properly plugged into the correct connectors.
    3. Observe the body of each cable to ensure that it is not damaged.
  4. Are the PCIe cables to the I/O enclosure properly plugged and not damaged?
    • Yes, go to the next step.
    • No, go to step 6.
  5. The cables are properly plugged and are not damaged.
    1. Return to the screen that sent you here.
    2. To the question, "What was the result of using the service procedure from Infocenter?" click Problem not fixed and then click Next.
    3. To the question, "Did you exchange any parts?" click No and then click Next.
    4. To the question, "Did you isolate the problem?" click No and then click Next.
    5. The next FRU in the list is displayed. Continue the repair by replacing the remaining FRUs until the problem is fixed.

      Exit this MAP.

  6. At least one cable is incorrectly plugged or damaged. Did a failed I/O enclosure installation lead you to this MAP?
    • Yes
      1. Exit this repair.
      2. Retry the original MES installation with cables properly connected.
    • No, the incorrect plugging of the cable or damage to the cable occurred during a repair.
      1. Return to the screen that sent you here.
      2. To the question, "What was the result of using the service procedure from Infocenter?" click Problem not fixed and then click Next.
      3. To the question, "Did you exchange any parts?" click No and then click Next.
      4. To the question, "Did you isolate the problem?" click No and then click Next.
      5. The next FRU in the list is displayed. Continue the repair on this FRU, but when instructed to replace it, replace the damaged cables connected to the I/O enclosure instead.
      6. If the repair completes successfully, exit this MAP. Otherwise, contact your next level of support.
