Comment: Sun-Oracle blog link fix, name blog authors

...

There are some things you can do (in advance, sadly) that can help to narrow this down.

/etc/system settings

Add these lines to /etc/system file:

...

On SPARC systems apic_panic_on_nmi of course does not exist; there, the easiest thing to do is to break to the PROM console and force a panic with sync.

More on the deadman timer in Solaris

When enabled, the deadman timer will cause a level 15 interrupt to fire on each CPU every second, which will in turn cause the kernel lbolt variable to be updated.

If the deadman timer detects that the lbolt variable hasn’t changed (that is, clock() hasn't run on that CPU) for a period of time, it will induce a panic, which will cause a core file to be written to /var/crash (or the location you configured with dumpadm).
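
Since the panic is only useful if the dump actually lands somewhere, it may be worth checking the dump configuration before relying on the deadman. The following is a minimal sketch, assuming the usual dumpadm output layout with a "Savecore directory:" line; the helper name savecore_dir is made up for illustration.

```shell
#!/bin/sh
# savecore_dir (hypothetical helper): read dumpadm-style output on stdin
# and print only the configured savecore directory.
savecore_dir() {
    sed -n 's/^ *Savecore directory: *//p'
}

# Typical use on a live system (needs appropriate privileges):
#   dir=$(dumpadm | savecore_dir)
#   ls -l "$dir"
```

Running dumpadm with no arguments only prints the current settings, so this check does not change anything on the system.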

If you would like the deadman to wait more (or less) than the default timeout prior to inducing a panic, you can set the “snoop_interval” variable to the desired number of seconds * 100000 (the following example line in the /etc/system file will induce a panic if clock() hasn't ticked, and so lbolt hasn’t been updated, for 90 seconds):

Code Block
set snoop_interval=9000000

This is a great feature, and can help isolate nasty bugs that result in system hangs. Since this feature CAN result in a system panic, you should take this into account prior to using it. The author is not liable for misuse. ;)

While hung

If your system is hung and, ideally, you have the above two settings in place, consider waiting to see if the deadman timer triggers. If it does not, remote management on your system may allow you to inject an NMI and force a dump that way.

Sending NMI from IPMI remote management console

Sourced from:

...

Code Block
-> cd /SP/diag
/SP/diag

-> show

 /SP/diag
    Targets:
        snapshot

    Properties:
        generate_host_nmi = (Cannot show property)
        state = disabled

    Commands:
        cd
        set
        show

-> set generate_host_nmi=true
Set 'generate_host_nmi' to 'true'
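
Where the service processor speaks IPMI over LAN rather than (or in addition to) the ILOM CLI shown above, ipmitool's "chassis power diag" subcommand pulses a diagnostic interrupt (NMI) to the host and may achieve the same effect. This is only a sketch: the host name and user below are placeholders, the send_nmi wrapper and its DRY_RUN switch are made up for illustration, and whether your BMC honours the request depends on the hardware.

```shell
#!/bin/sh
# send_nmi HOST USER (hypothetical wrapper)
# Pulse a diagnostic interrupt (NMI) to the host through its BMC/SP.
# With DRY_RUN set to a non-empty value the command is printed, not run.
# ipmitool prompts for the password when none is supplied.
send_nmi() {
    ${DRY_RUN:+echo} ipmitool -I lanplus -H "$1" -U "$2" chassis power diag
}

# Example (placeholder host and user):
#   DRY_RUN=1; send_nmi sp.example.com root    # show what would be run
#   DRY_RUN=;  send_nmi sp.example.com root    # actually send the NMI
```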

Sending NMI to a VirtualBox VM

Sourced from:

Wiki Markup
I've recently started using VirtualBox instead of physical machines for some of my basic functional testing. When doing some types of kernel development it is often necessary to force the system into {{kmdb}}.

The F1-A keystroke does this on OpenSolaris x86 systems by default. However, that isn't going to work with VirtualBox, because the keystroke is grabbed by some very low-level kernel routines in the (OpenSolaris-based) host and never reaches the guest.
So we need an alternate way of getting a break to the guest OpenSolaris from the host one.
I was sure someone else must have worked this out before. I didn't get the full answer from a quick Google search, but I did find all the parts.
The CLI for VirtualBox can send an NMI (Non-Maskable Interrupt) to any running guest. OpenSolaris can be configured to drop into {{kmdb}} or force a panic when receiving an NMI.

In the guest put this into {{/etc/system}} and reboot:

{code:title=/etc/system addon}
set pcplusmp:apic_kmdb_on_nmi=1
{code}

Or to set it interactively do:
{code}
# echo apic_kmdb_on_nmi/W1 | mdb -kw
# mdb -K
{code}

Then with the VirtualBox CLI we can send an NMI to our guest:
{code}
$ VBoxManage controlvm _ZFS_Crypto_Test_ injectnmi
{code}

Nice easy solution.

Though I do now wonder why we don't have some default action for when an NMI is received - but then, not everyone cares about getting a dump or getting into {{kmdb}}!
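
The blog post above dates from an older VirtualBox release. In more recent releases the injectnmi subcommand is, as far as I know, found under "VBoxManage debugvm" rather than "controlvm". A minimal sketch under that assumption; the inject_nmi wrapper and its DRY_RUN switch are made up for illustration, and the VM name is simply the one from the example above.

```shell
#!/bin/sh
# inject_nmi VMNAME (hypothetical wrapper)
# Ask VirtualBox to deliver an NMI to the named running guest.
# With DRY_RUN set to a non-empty value the command is printed, not run.
inject_nmi() {
    ${DRY_RUN:+echo} VBoxManage debugvm "$1" injectnmi
}

# Example:
#   VBoxManage list runningvms       # find the guest's name or UUID
#   inject_nmi ZFS_Crypto_Test
```

If the debugvm form is rejected by your VirtualBox version, the controlvm form quoted in the blog post is the one to try.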

Keyboard breakout into the kernel debugger

If neither is applicable, boot your system with the -k flag on the kernel command line, and while hung press F1-A (that is, press A while holding the F1 key, as if it were a shift key); on Sun boxes with Sun keyboards this may be STOP-A. In theory this will enter the kernel debugger (kmdb) on the console. If X is running you will not be able to see the console, so type $<systemdump and press return anyway: kmdb may be listening, even though you can't see it.

...

For SPARC systems an equivalent option may be to break out to the PROM with a STOP-A keypress (or get access to the PROM remotely via Serial LOM or RSC/ALOM/ILOM) and send a break command to suspend the OS and enter the kernel debugger, if loaded. You can resume the OS (if not hung) with the go command.

More info

See blogs:

...