Paper Review: Empirical Study on Crash Recovery Bugs in Distributed Systems (
by nickpsecurity to programming reliability distributed on Jan 21, 2019 | 3 comments

2 points by shawn on Jan 21, 2019

This is really good. It’s also a reminder to double check laarc’s backup strategy by spinning up a dev server conjured up solely from a backup and verifying that everything works immediately. Most companies don’t, and it almost killed GitLab.

Laarc’s failure handling is kind of cool, and I’ve been meaning to write a bit about how it works. Essentially there is a top-level shell script. It does “while true: sleep 1; git pull; start the server.”
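A minimal sketch of that loop, written in Python rather than as the actual shell script. The names `supervise`, `pull`, and `start_server` are hypothetical, and the commands are passed in as callables so the loop can be exercised without git or a real server:

```python
import time
from typing import Callable, Optional


def supervise(
    pull: Callable[[], None],
    start_server: Callable[[], int],
    sleep: Callable[[float], None] = time.sleep,
    max_restarts: Optional[int] = None,
) -> int:
    """Repeat "sleep 1; pull; start the server" until max_restarts is reached.

    start_server is expected to block until the server process exits and
    to return its exit code. With max_restarts=None the loop runs forever,
    like the shell version. Returns the number of server runs completed.
    """
    runs = 0
    while max_restarts is None or runs < max_restarts:
        sleep(1)  # brief pause so a crash loop does not spin the CPU
        try:
            pull()  # fetch the latest code, e.g. by running `git pull`
        except Exception:
            # Assumption: a failed pull should not prevent a restart;
            # the server simply comes back up on the code already on disk.
            pass
        start_server()  # blocks until the server exits or crashes
        runs += 1
    return runs
```

In real use `pull` could wrap `subprocess.run(["git", "pull"], check=True)`, and `start_server` could run whatever command launches the site and return its exit code. That command is not given in the thread.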

In the olden days of one month ago, this was enough for deploying new features. I would push new code up and then tell the server to shut itself off by setting (= quitsrv* t). About 10 seconds later the site would be back up and running with the new feature. If anything goes wrong, it’s a simple matter of reverting to an older commit.
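The thread does not say how the server notices that flag, so the following is only a sketch of the general pattern under that assumption: the request loop checks a quit flag between requests and returns, which lets the outer restart loop pull the new code and start a fresh process. `quit_requested` is a hypothetical Python stand-in for `quitsrv*`, and `serve` is an illustrative name:

```python
import threading
from typing import Callable

# Hypothetical stand-in for the quitsrv* flag: set it to ask the server to stop.
quit_requested = threading.Event()


def serve(handle_one_request: Callable[[], None]) -> int:
    """Handle requests until the quit flag is set; return how many were handled.

    Returning from this function ends the server process's main loop, so an
    outer restart loop can bring it back up on the newly pulled code.
    """
    handled = 0
    while not quit_requested.is_set():
        handle_one_request()
        handled += 1
    return handled
```

Because the flag is only checked between requests, a request that is already in flight finishes before the server exits.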

But lisp lets you do quite a lot better, and there are some neat aspects of the new system. Un momento while I run back to an actual keyboard to type it out with actual fingers.


I'm off to work now. I'll definitely read it on my breaks. :)


I wrote a bit about this at

