Juniper EX – Switch boots from backup root partition after file system corruption on the primary root partition

Juniper Networks – [EX] Switch boots from backup root partition after file system corruption on the primary root partition

Issue / Symptom:
This issue commonly occurs after a power outage, most often when the boot or reboot phase is interrupted after a power outage (power flap).

root@switch-1> show chassis alarms

1 alarms currently active
Alarm time               Class  Description
2012-11-19 10:31:37 GMT  Minor  Host 0 Boot from backup root

or the banner message

--- JUNOS 11.4R1.6 built 2011-11-15 10:11:59 UTC
***********************************************************************
**                                                                   **
**  WARNING: THIS DEVICE HAS BOOTED FROM THE BACKUP JUNOS IMAGE      **
**                                                                   **
**  It is possible that the primary copy of JUNOS failed to boot up  **
**  properly, and so this device has booted from the backup copy.    **
**                                                                   **
**  Please re-install JUNOS to recover the primary copy in case      **
**  it has been corrupted.                                           **
**                                                                   **
***********************************************************************

A better option than re-installing Junos: request system snapshot media internal slice alternate

Resilient Dual-Root Partitions behavior:
Juniper switches always reboot from the “active” partition. You can restore the alternate partition (commonly the primary partition) without downtime by using the command:
request system snapshot media internal slice alternate

Verify whether the primary partition has been restored with the command:

show system storage partitions


Backup Partition: da0s2a <– this is the backup slice
Currently booted from: backup (da0s2a) <– shows booted from that slice

or with
show system snapshot media internal

… Information for snapshot on internal (/dev/da0s2a) (backup) <– provides info for this slice/partition the switch booted off of and the date the file system was created

Creation date: Feb 14 05:42:42 2012    <– if earlier than the alarm date, the customer should take a snapshot (it is a good way to confirm)

To go back to the Primary partition, you can use the command 

request system reboot slice alternate media internal 

to reboot immediately from the primary partition.

SUMMARY:

This article describes the issue of an EX switch booting from the backup root partition after file system corruption occurs on the primary root partition.

PROBLEM OR GOAL:

EX switches running Junos Release 10.4R3 or later have added resiliency based on the “resilient dual-root partition” feature: if the switch detects corruption on the primary root file system, it boots from the alternate root partition.

When this occurs, you are notified in two ways: Alarm and Warning Banner

Alarm: 

The following alarm message is generated:

user@switch> show chassis alarms
1 alarms currently active
Alarm time Class Description
2011-02-17 05:48:49 PST Minor Host 0 Boot from backup root
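
If you monitor several switches, the alarm check above can be scripted. The following is a minimal Python sketch, assuming you have already captured the text of show chassis alarms by some means (SSH, console log, and so on). The function name find_backup_root_alarms is hypothetical, and the match string is taken from the sample output in this article; other Junos releases may word the alarm differently.

```python
# Assumption: the alarm description matches the sample output shown in this article.
BACKUP_ROOT_ALARM = "Boot from backup root"


def find_backup_root_alarms(alarm_output):
    """Return the lines of 'show chassis alarms' output that report a backup-root boot."""
    return [
        line.strip()
        for line in alarm_output.splitlines()
        if BACKUP_ROOT_ALARM in line
    ]


# Sample text copied from the alarm output above.
sample = """\
1 alarms currently active
Alarm time Class Description
2011-02-17 05:48:49 PST Minor Host 0 Boot from backup root
"""

for alarm in find_backup_root_alarms(sample):
    print("Switch booted from backup root:", alarm)
```

An empty list means the alarm is not present in the captured text.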


Warning: 

***********************************************************************
**                                                                   **
**  WARNING: THIS DEVICE HAS BOOTED FROM THE BACKUP JUNOS IMAGE      **
**                                                                   **
**  It is possible that the primary copy of JUNOS failed to boot up  **
**  properly, and so this device has booted from the backup copy.    **
**                                                                   **
**  Please re-install JUNOS to recover the primary copy in case      **
**  it has been corrupted.                                           **
**                                                                   **
***********************************************************************

CAUSE:

It is likely that the file system became corrupted due to a sudden power loss, or ungraceful shutdown of the EX Switch.

SOLUTION:

Repairing the primary partition when it is corrupted:

  • When corruption is detected on the primary partition, the device boots from the backup partition, which then becomes the active partition. Remember that on every subsequent reboot, the switch will try to boot from the current active partition.
  • You can repair the primary partition without any downtime by using request system snapshot media internal slice alternate. No reboot is required after running this command. However, the alarm and banner will still be displayed.

Note: As long as both partitions are healthy, there is no issue with running the switch on either of them. You only have to ensure that both partitions are healthy, so that failover between the two partitions can happen transparently in case of file corruption.


Verification:

To verify whether the primary partition has been rebuilt, run one of the following show commands. The same commands also show which partition is the current active partition.

show system storage partitions

Sample output:

root> show system storage partitions 
fpc0:
--------------------------------------------------------------------------
Boot Media: internal (da0)
Active Partition: da0s1a
Backup Partition: da0s2a <-- this is the backup slice 
Currently booted from: backup (da0s2a) <-- shows booted from that slice

Partitions information:
Partition Size Mountpoint
s1a 184M altroot 
s2a 184M / 
s3d 369M /var/tmp 
s3e 123M /var 
s4d 62M /config 
s4e unused (backup config)
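
The key fields in this output can also be pulled out programmatically. Below is a minimal Python sketch, assuming the output has the same layout as the sample above; the function name parse_boot_state and the dictionary keys are hypothetical, chosen only for illustration.

```python
import re


def parse_boot_state(output):
    """Extract the active/backup slices and the slice the switch booted from.

    Assumption: field labels match the sample 'show system storage partitions'
    output in this article. Missing fields are returned as None.
    """
    active = re.search(r"Active Partition:\s*(\S+)", output)
    backup = re.search(r"Backup Partition:\s*(\S+)", output)
    booted = re.search(r"Currently booted from:\s*(\w+)\s*\(([^)]+)\)", output)
    return {
        "active": active.group(1) if active else None,
        "backup": backup.group(1) if backup else None,
        "booted_role": booted.group(1) if booted else None,
        "booted_slice": booted.group(2) if booted else None,
    }


# Header lines copied from the sample output above (annotations removed).
sample = """\
Boot Media: internal (da0)
Active Partition: da0s1a
Backup Partition: da0s2a
Currently booted from: backup (da0s2a)
"""

state = parse_boot_state(sample)
if state["booted_role"] == "backup":
    print("Booted from backup slice", state["booted_slice"])
```

A booted_role of "backup" corresponds to the situation described in this article.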


OR

show system snapshot media internal

Sample output:

root> show system snapshot media internal 
Information for snapshot on internal (/dev/da0s1a) (primary)
Creation date: Feb 24 11:32:07 2012
JUNOS version on snapshot:
jbase : 10.4I20120224_1123_bshekar
jcrypto-ex: 10.4I20120224_1123_bshekar
jdocs-ex: 10.4I20120224_1123_bshekar
jkernel-ex: 10.4I20120224_1123_bshekar
jroute-ex: 10.4I20120224_1123_bshekar
jswitch-ex: 10.4I20120224_1123_bshekar
jweb-ex: 10.4I20120224_1123_bshekar
jpfe-ex42x: 10.4I20120224_1123_bshekar
Information for snapshot on internal (/dev/da0s2a) (backup) <-- provides info for this slice/partition the switch booted off of and the date the file system was created
Creation date: Feb 14 05:42:42 2012    <-- if earlier than the alarm date, the customer should take a snapshot (it is a good way to confirm)
JUNOS version on snapshot:
jbase : 11.2-20120214.0
jcrypto-ex: 11.2-20120214.0
jdocs-ex: 11.2-20120214.0
jkernel-ex: 11.2-20120214.0
jroute-ex: 11.2-20120214.0
jswitch-ex: 11.2-20120214.0
jweb-ex: 11.2-20120214.0
jpfe-ex42x: 11.2-20120214.0
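
The date comparison mentioned in the annotation on the creation date can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name snapshot_predates_alarm is hypothetical, the timestamp formats are those of the sample outputs in this article, both timestamps are assumed to be in the same time zone (the zone label on the alarm is simply dropped), and month abbreviations are assumed to be English.

```python
from datetime import datetime


def snapshot_predates_alarm(creation_date, alarm_time):
    """Return True if the snapshot creation date is earlier than the alarm time.

    creation_date: e.g. "Feb 14 05:42:42 2012" (from 'show system snapshot')
    alarm_time:    e.g. "2012-03-02 13:01:03 UTC" (from the alarm output)
    """
    created = datetime.strptime(creation_date, "%b %d %H:%M:%S %Y")
    # Keep only the date and time parts; the trailing zone label is ignored
    # (assumption: both values refer to the same time zone).
    date_and_time = " ".join(alarm_time.split()[:2])
    alarm = datetime.strptime(date_and_time, "%Y-%m-%d %H:%M:%S")
    return created < alarm


# Values taken from the sample outputs in this article.
if snapshot_predates_alarm("Feb 14 05:42:42 2012", "2012-03-02 13:01:03 UTC"):
    print("Snapshot is older than the alarm: take a new snapshot")
```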


To go back to the Primary partition, you can use the request system reboot slice alternate media internal command. If you do not use this command, the switch will then boot from the backup partition, which is the current Active partition, on successive reboots.

The switch will automatically boot from the primary partition again only if the backup partition, which is now the active partition, becomes corrupted. When a primary partition gets corrupted, you will receive the alarm as mentioned above.

Note: This alarm does not get cleared, even if you repair the primary partition. The purpose of this alarm is to inform users that the device has booted from the backup partition, so that the administrator can take the necessary actions to repair the primary partition.

Step-by-step recovery procedure for this situation:

  1. Copy the Junos image from the backup partition to the primary partition by using the following snapshot command:

    request system snapshot media internal slice alternate


    Note: This step ensures that you have consistent images on both the primary and backup partitions.

  2. The above command ensures that the alternate partition is repaired, without requiring a reboot. You can verify both the partitions by using the following command:

    show system storage partitions

  3. The command used in step 1 will only repair the partition and not clear the alarm. So, you will still see the following alarm:

    root> show system alarms 
    2 alarms currently active
    Alarm time Class Description
    2012-03-02 13:01:03 UTC Minor Host 0 Boot from backup root <-- shows date stamp of alarm

  4. To get rid of the above alarm, use the following command to ensure that the switch boots from the primary partition:

    request system reboot slice alternate media internal

The system, after the above command is executed, will reboot from the primary partition. The alarm or the warning message will no longer be displayed.

  5. The following commands are issued to verify the Junos image installed on each slice:

    user@switch> show system snapshot media internal slice 1
    user@switch> show system snapshot media internal slice 2
