On the Importance of Safe Interfaces
The recent Amazon S3 outage is a good reminder that humans make errors, and sometimes issue commands that they didn't mean to issue. On February 28th, 2017, Amazon's cloud offering S3 was partially offline for over four hours, because a human ran a routine command and got some of the arguments wrong. It's important to realize that the human is not to blame, but rather the tool, which did not sufficiently check the human's arguments. "Be more careful" is not a viable strategy to ensure that a software system works reliably, as humans will always eventually make errors. Instead, we need to design software with safe interfaces that check for bad arguments. In this post, I'll show another example of a badly designed interface, and how to fix it.
Safe Interface
I don't have a bulletproof definition of a safe interface, and the details depend a lot on the context. But some guiding principles might be:
- User inputs are validated, and invalid ones rejected. For valid inputs that are uncommon, the user might be asked to confirm.
- Actions that have big effects have to be confirmed, and the expected effects are shown to the user. For instance, if a command takes all of your data center offline, then asking for confirmation is the least that should be required.
- Whenever possible, actions can be undone, so that errors can be easily recovered from.
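As a rough illustration of the first two principles, here is a minimal sketch in shell. The command take_offline and its two arguments are made up for this example and do not correspond to any real tool; the sketch rejects invalid input and asks for confirmation when the effect is large (here, arbitrarily, more than half of the servers). It does not cover the third principle, undoing an action.

```shell
#!/bin/sh
# Hypothetical example: take_offline is not a real tool, only an illustration.

# Ask the user to confirm; succeeds only if the answer is exactly "yes".
confirm() {
    printf '%s [type "yes" to continue] ' "$1" >&2
    read -r answer || return 1
    [ "$answer" = "yes" ]
}

# Pretend to take COUNT servers out of a pool of TOTAL servers offline.
# Usage: take_offline COUNT TOTAL   (TOTAL is trusted in this sketch)
take_offline() {
    count=$1
    total=$2

    # 1. Validate the input and reject anything that is not a number in range.
    case $count in
        ''|*[!0-9]*)
            echo "error: '$count' is not a number" >&2
            return 2
            ;;
    esac
    if [ "$count" -lt 1 ] || [ "$count" -gt "$total" ]; then
        echo "error: count must be between 1 and $total" >&2
        return 2
    fi

    # 2. For big effects, show what will happen and ask for confirmation.
    if [ $((count * 2)) -gt "$total" ]; then
        confirm "This takes $count of $total servers offline." || {
            echo "aborted" >&2
            return 1
        }
    fi

    echo "taking $count of $total servers offline"
}
```

With this, take_offline 2 10 runs right away, take_offline 8 10 first asks for a typed "yes", and take_offline 80 10 is rejected outright.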
Deleting Files on the Command Line
Another example of a tool without a safe interface is rm. It is an ancient tool on Unix systems to delete files, and programmers and system administrators often use it on a daily basis. However, files deleted this way are not meant to be recovered [1], but instead deleted permanently. This leaves the door wide open for data loss if the arguments to rm are ever entered incorrectly. Unsurprisingly, the internet is full of people asking how to recover files inadvertently deleted this way. I don't blame any of these people, as rm is poorly designed to be used on a regular basis; its operation is permanent, and there are almost no safeguards against deleting even the most important files. Using rm -rf / (plus --no-preserve-root on current GNU versions), you can even delete everything on your computer (including rm).
So, we need a better alternative, and trash-cli is just that. It provides the command trash-put with roughly the same interface as rm, but moves files to the trashcan rather than permanently deleting them. This allows inadvertently deleted files to be easily restored. To avoid having to remember not to use rm, we can instead create an alias by adding the following line to our ~/.bashrc or ~/.zshrc:
alias rm='trash-put'
Possibly an even better solution is to use a new name, and override rm to remind you not to use it:
alias rm='echo "use d instead to avoid losing files."; false'
alias d='trash-put'
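To make concrete what "moving to the trashcan rather than deleting" means, here is a minimal sketch of the idea. It is not how trash-cli is implemented: the function name to_trash and the directory in TRASH_DIR are made up for this example, and the real tool does more, such as remembering where each file came from so that it can be restored to its original place.

```shell
#!/bin/sh
# Illustrative only; trash-cli itself is more careful than this sketch.
# TRASH_DIR is a made-up location, not the trashcan that trash-cli uses.
TRASH_DIR=${TRASH_DIR:-"$HOME/.sketch-trash"}

# Move each argument into TRASH_DIR instead of deleting it.
to_trash() {
    mkdir -p "$TRASH_DIR" || return 1
    for path in "$@"; do
        if [ ! -e "$path" ] && [ ! -L "$path" ]; then
            echo "to_trash: '$path' does not exist" >&2
            return 1
        fi
        name=$(basename -- "$path")
        dest="$TRASH_DIR/$name"
        # Never overwrite an earlier trashed file with the same name:
        # append .1, .2, ... until the destination is free.
        n=0
        while [ -e "$dest" ] || [ -L "$dest" ]; do
            n=$((n + 1))
            dest="$TRASH_DIR/$name.$n"
        done
        mv -- "$path" "$dest" || return 1
        echo "moved $path -> $dest"
    done
}
```

The point of the sketch is that nothing is destroyed: undoing a mistake is a matter of moving the file back.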
If for some reason we really need to run rm instead of trash-put, we can still do so via \rm, which bypasses our alias. This should be reserved for very rare cases, and only after carefully checking the command. And because the alias is only defined in our interactive shell and does not replace the rm binary, scripts that rely on rm keep working.
No more lost files, as they can be easily restored with trash-restore. While this solution still leaves room for further improvement (e.g., checking against deleting files that aren't meant to be deleted, such as system files), it's a big step towards a safe interface for deleting files.
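The further improvement mentioned above, refusing to delete files that aren't meant to be deleted, could look roughly like the following sketch. The function name guarded_delete, the variable DELETE_CMD, and the list of protected paths are all assumptions made for this example and not part of trash-cli. The check is purely textual, so it does not catch relative paths or symlinks that point at a protected directory.

```shell
#!/bin/sh
# Hypothetical wrapper; guarded_delete, DELETE_CMD and the list of protected
# paths are assumptions for this sketch, not something trash-cli provides.
DELETE_CMD=${DELETE_CMD:-trash-put}

# Succeeds if the given path is one we never want to delete as a whole.
is_protected() {
    case $1 in
        /|/bin|/boot|/dev|/etc|/home|/lib|/proc|/sbin|/sys|/usr|/var)
            return 0
            ;;
        "$HOME")
            return 0
            ;;
    esac
    return 1
}

guarded_delete() {
    # Check every argument first, so nothing is deleted if any is refused.
    for path in "$@"; do
        # Strip trailing slashes so that "/etc/" is treated like "/etc".
        clean=$path
        while [ "$clean" != "/" ] && [ "${clean%/}" != "$clean" ]; do
            clean=${clean%/}
        done
        if is_protected "$clean"; then
            echo "guarded_delete: refusing to delete '$path'" >&2
            return 1
        fi
    done
    # Only now hand all arguments to the actual delete command.
    "$DELETE_CMD" "$@"
}
```

If this were put into ~/.bashrc or ~/.zshrc, the alias from above could point at it instead, as in alias d='guarded_delete'.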
[1] Sometimes, files can still be recovered with specialized recovery tools, though this is unreliable and only works well if the error is detected relatively quickly, before the deleted files get overwritten.
Questions or comments? Send me an email or find me on twitter @stefan_heule.