kill -9

A commenter on Hacker News asked what I think about -9. In my opinion, its widespread use is a plague similar to that of the -f switch. The reason is pretty easy to explain (I’m simplifying things a bit).

-9 is shorthand for SIGKILL. When you send a SIGKILL to a process, the process is terminated immediately. You can’t catch this signal and you can’t ignore it. A kill with -9 sends this SIGKILL to a process. A kill without -9 sends a SIGTERM instead. By default SIGTERM terminates the process just like SIGKILL does. However, a process is allowed to catch it in order to execute a signal handler … or to simply ignore it. A signal handler is nothing more than a code path that is executed when the process receives a signal.

So when you kill a process with a normal kill, you give it the chance to clean up behind itself: to make files consistent, to roll back changes in case the process isn’t using some transactional mechanism when changing data, to delete temporary files … and so on. It’s good style to write such signal handlers, and in many programming languages it’s pretty easy. For example in Perl:

use strict;
use warnings;

# Install the handler for SIGTERM, the signal a plain kill sends.
$SIG{TERM} = \&signalhandler_TERM;

sub signalhandler_TERM {
    $SIG{TERM} = \&signalhandler_TERM;
    print "leaving process\n";
    # some code to clean up behind you
    exit(1);
}
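
The difference between the two signals is easy to try out. The following is a minimal shell sketch, not a recipe: the worker and run_demo functions, the marker file names and the temporary directory are made up for illustration. It starts the same worker twice, sends SIGTERM to one and SIGKILL to the other, and then checks whether the cleanup handler left its marker file behind.

```shell
#!/bin/sh
# Sketch: a TERM handler gets to run, nothing gets to run on SIGKILL.
# worker, run_demo and the marker files are hypothetical names.

workdir=$(mktemp -d)

worker() {
    name=$1
    # Handler: leave a marker file behind, then exit cleanly.
    trap 'echo "cleaned up" > "$workdir/$name.cleanup"; exit 0' TERM
    : > "$workdir/$name.ready"      # tell the parent the handler is installed
    while :; do sleep 1; done       # stand-in for real work
}

run_demo() {
    name=$1
    sig=$2
    worker "$name" &                # runs in a background subshell
    pid=$!
    # Wait until the worker has installed its handler.
    while [ ! -e "$workdir/$name.ready" ]; do sleep 1; done
    kill -s "$sig" "$pid"
    wait "$pid" || true             # the exit status is not needed here
    if [ -e "$workdir/$name.cleanup" ]; then
        echo "SIG$sig: cleanup handler ran"
    else
        echo "SIG$sig: no cleanup happened"
    fi
}

run_demo term_demo TERM
run_demo kill_demo KILL
```

The worker that received SIGTERM leaves its marker file behind; the one that received SIGKILL never gets the chance. Note that the shell only runs the trap once the current sleep has finished, so the SIGTERM case can take up to a second.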

When you send a -9 to a process, you take this chance away from it. It’s killed instantly … even if it has just started to modify your files, fscking up your data in order to put it into a new form, even if it has created dozens of temporary files filling up /tmp. Things like that …

Killing a process with -9 is the last resort. However, I see people using it too often and too early. A second after the normal kill is sent, a pgrep on the process follows. Still there, and the sword of -9 comes down. When a process doesn’t disappear immediately after receiving the SIGTERM, it may just be busy following your order to terminate and is cleaning things up. When the cleanup depends on scarce resources (for example IOPS on your rotating rust), it may take a while.

A process that doesn’t react to a normal kill via SIGTERM implicitly raises the question of why it doesn’t react. Just sending a -9 when a normal kill didn’t work is like saying “I don’t care”. Using truss or strace to see what the heck the process is doing after getting the SIGTERM is a good first step. Perhaps you see some cleanup work and know that you just have to wait a little bit longer. Writing a core dump of the process with gcore is often a good second step, to save evidence for later research into why the process didn’t react. And then … and only then … a kill -9 may be appropriate. In short:

  • Always try a normal kill first. There is just one exception.
  • The exception: You know that the integrity of your data depends on the process getting no chance to write anything, for example by executing a signal handler. Consider it something like a forced panic at the process level. The panic is the admin's friend, as it ensures that nothing is written to disk once the OS detects something worth panicking about. A panic isn't an "insult the admin" thing … it's about protecting data.
  • kill -9 is your emergency stop switch and shouldn't be part of your normal administrative procedures in any way.
  • Just sending a -9 after a normal kill without thinking about the reasons is bad administrative style.
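
The procedure above can be sketched as a small shell helper. This is an illustration under assumptions, not a finished tool: the function name term_and_wait, the 30 second default and the wording of the hints are made up for this example. It sends a normal SIGTERM, polls with kill -0 to see whether the process is still there, and if the timeout expires it deliberately does not send a -9 but points you at what to look at first.

```shell
#!/bin/sh
# term_and_wait PID [TIMEOUT_SECONDS]
# Hypothetical helper: send SIGTERM, then give the process time to clean up.
# Returns 0 once the process is gone, 1 if it is still there after the
# timeout, 2 if the signal could not be sent at all.
term_and_wait() {
    pid=$1
    timeout=${2:-30}                # assumed default, adjust to your workload

    # A plain kill sends SIGTERM.
    if ! kill "$pid" 2>/dev/null; then
        echo "cannot signal PID $pid" >&2
        return 2
    fi

    waited=0
    # kill -0 sends no signal, it only checks whether the PID can be signalled.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "PID $pid is still there after ${timeout}s." >&2
            echo "Look first: strace -p $pid (or truss -p $pid), then gcore $pid." >&2
            return 1                # no automatic -9, that decision is yours
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Something like term_and_wait 12345 60 returns 0 once the process is gone and 1 if it is still around after 60 seconds. One caveat: kill -0 also succeeds for a zombie that its parent has not reaped yet, so such a process still counts as alive here.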