The recent Amazon S3 outage is a good reminder that humans make errors, and sometimes issue commands that they didn't mean to issue. On February 28th, Amazon's cloud offering S3 was partially offline for over four hours, because a human ran a routine command and got some of the arguments wrong. It's important to realize that it's not the human that is to blame, but rather the tool that did not sufficiently check the human's arguments. "Be more careful" is not a viable strategy to ensure that a software system works reliably, as humans will always eventually make errors. Instead, we need to design software with safe interfaces that check for bad arguments. In this post, I'll show another example of a badly designed interface, and how to fix it.
I don't have a bulletproof definition of what a safe interface looks like, and a lot depends on the context. But some guiding principles might be:
- User inputs are validated, and invalid ones rejected. For valid inputs that are uncommon, the user might be asked to confirm.
- Actions that have big effects have to be confirmed, and the expected effects are shown to the user. For instance, if a command takes all of your data center offline, then asking for confirmation is the least that should be required.
- Whenever possible, actions can be undone, so that errors can be easily recovered from.
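To make the first two principles concrete, here is a minimal sketch in shell. The command `remove_servers`, its arguments, and the 10% threshold are all hypothetical and only serve as an illustration; this is not how any real tool works.

```shell
# Hypothetical command: take COUNT servers out of a pool of POOL_SIZE.
remove_servers() {
    local count="$1" pool_size="$2" answer

    # Principle 1: reject input that is not a plain non-negative integer,
    # or that asks for more servers than exist.
    case $count in
        ''|*[!0-9]*)
            echo "error: count must be a non-negative integer, got '$count'" >&2
            return 2 ;;
    esac
    if [ "$count" -gt "$pool_size" ]; then
        echo "error: cannot remove $count of $pool_size servers" >&2
        return 2
    fi

    # Principle 2: a valid but unusually large request (over 10% of the pool)
    # shows its expected effect and has to be confirmed.
    if [ "$count" -gt $((pool_size / 10)) ]; then
        printf 'This will remove %s of %s servers. Type "yes" to continue: ' \
            "$count" "$pool_size" >&2
        read -r answer || answer=""
        if [ "$answer" != "yes" ]; then
            echo "aborted, nothing was removed" >&2
            return 1
        fi
    fi

    # Stand-in for the real action.
    echo "removing $count servers"
}
```

With this in place, `remove_servers 5 100` runs right away, `remove_servers 50 100` asks first, and `remove_servers abc 100` is rejected. The third principle, undo, depends on the action itself and is not covered by this sketch.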
Deleting Files on the Command Line
Another example of a tool without a safe interface is `rm`. It is an ancient tool on Unix systems to delete files, and programmers and system administrators often use it on a daily basis. However, files deleted this way are not meant to be recovered [1], but instead deleted permanently. This leaves the door wide open for data loss if the arguments to `rm` are ever entered incorrectly. Unsurprisingly, the internet is full of people asking how to recover files inadvertently deleted this way. I don't blame any of these people, as `rm` is poorly designed to be used on a regular basis; its operation is permanent, and there are no safeguards against deleting even the most important files. Using `rm -rf /`, you can even delete everything on your computer.
So, we need a better alternative, and `trash-cli` is just that. It provides the command `trash-put` with roughly the same interface as `rm`, but moves files to the trashcan rather than permanently deleting them. This allows inadvertently deleted files to be easily restored. To avoid having to remember not to use `rm`, we can instead create an alias that makes `rm` run `trash-put`.
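As a minimal sketch, assuming bash reads `~/.bashrc` and `trash-cli` is installed and on the `PATH`, such an alias could look like this:

```shell
# Assumption: this line lives in ~/.bashrc (bash) and trash-cli is installed.
# From then on, typing rm in an interactive shell moves files to the trashcan.
alias rm='trash-put'
```

Other shells keep their aliases in other startup files, so the file name is an assumption that may need adjusting.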
Possibly an even better solution is to use a new name, and override `rm` to remind you not to use it:

    alias rm='echo "use d instead to avoid losing files."; false'
    alias d='trash-put'

If for some reason we really need to run `rm` instead of `trash-put`, we can still do so via `\rm`, which bypasses our alias. But this should be reserved for very rare cases, and only after carefully checking the command. And because the alias only shadows the `rm` binary in the shell, we can still run scripts that rely on it.
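The point about scripts can be checked with a small experiment. The sketch below uses throwaway, hypothetical file names (`scratch.txt`, `cleanup.sh`) in a temporary directory: the alias is defined in the current shell, but the script runs in a new process that never sees it.

```shell
# Define the reminder alias in the current shell.
alias rm='echo "use d instead to avoid losing files."; false'

# A throwaway script that calls rm. It runs in a separate sh process,
# and aliases are not passed on to child processes.
workdir=$(mktemp -d)
touch "$workdir/scratch.txt"
printf '#!/bin/sh\nrm "$1"\n' > "$workdir/cleanup.sh"
sh "$workdir/cleanup.sh" "$workdir/scratch.txt"

# The script reached the real rm binary, so the file should be gone.
if [ -e "$workdir/scratch.txt" ]; then
    script_result="still there"
else
    script_result="deleted"
fi
echo "scratch.txt: $script_result"

# Clean up. "command rm" skips alias lookup, much like \rm does.
command rm -r -- "$workdir"
```

Like `\rm`, `command rm` is a deliberate bypass and deserves the same care.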
No more lost files, as they can be easily restored with `trash-restore`. While this solution still leaves room for further improvement (e.g., checking against deleting files that aren't meant to be deleted, such as system files), it's a big step towards a safe interface for deleting files.
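One way such a check might look is sketched below. Everything here is an assumption made for illustration: the function names `safe_trash` and `absolute_path`, the `PROTECTED_PATHS` list, and the `TRASH_CMD` variable are hypothetical and not part of `trash-cli`. The sketch refuses a short list of exact paths and does not resolve symlinks, so it is far from complete.

```shell
# Hypothetical settings: the command that does the trashing, and the
# exact paths that must never be trashed.
TRASH_CMD=${TRASH_CMD:-trash-put}
PROTECTED_PATHS="/ /bin /boot /etc /home /lib /usr /var"

# Print an absolute form of a path, so that "/usr/../etc" becomes "/etc".
# Symlinks are deliberately not resolved in this sketch.
absolute_path() {
    local dir
    if [ -d "$1" ]; then
        (CDPATH= cd -- "$1" && pwd)
    else
        dir=$(CDPATH= cd -- "$(dirname -- "$1")" 2>/dev/null && pwd) || return 1
        printf '%s/%s\n' "${dir%/}" "$(basename -- "$1")"
    fi
}

safe_trash() {
    local target absolute protected
    # Check every argument first, so a refused path stops the whole
    # command before anything has been moved.
    for target in "$@"; do
        absolute=$(absolute_path "$target") || {
            echo "safe_trash: cannot resolve $target" >&2
            return 2
        }
        for protected in $PROTECTED_PATHS "${HOME:-}"; do
            if [ "$absolute" = "$protected" ]; then
                echo "safe_trash: refusing to trash protected path $absolute" >&2
                return 1
            fi
        done
    done
    "$TRASH_CMD" "$@"
}
```

The short name could then point at the wrapper instead, for example with `alias d='safe_trash'`.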
[1] Sometimes, files can still be recovered with specialized recovery tools, though this is unreliable and only works well if the error is detected relatively quickly, before the deleted files get overwritten.