Paper Review: Empirical Study on Crash Recovery Bugs in Distributed Systems (muratbuffalo.blogspot.com)
by nickpsecurity to programming reliability distributed 679 days ago | 3 comments


2 points by shawn 679 days ago

This is really good. It’s also a reminder to double-check laarc’s backup strategy by spinning up a dev server conjured up solely from a backup and verifying that everything works immediately. Most companies don’t, and it almost killed GitLab.

Laarc’s failure handling is kind of cool, and I’ve been meaning to write a bit about how it works. Essentially there is a top-level shell script called news.sh. It does “while true: sleep 1; git pull; start the server.”
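
For anyone who wants to see the shape of that loop, here is a rough sketch in Python rather than shell. It is only an illustration, not the actual news.sh: the names supervise, server_cmd, and max_restarts, and the ./start-server command in the usage note, are all made up, and the injectable run/sleep hooks exist only so the loop can be exercised without touching git or a real server.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


def _run(cmd: Sequence[str]) -> int:
    """Run a command, wait for it to exit, and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def supervise(
    server_cmd: Sequence[str],
    run: Callable[[Sequence[str]], int] = _run,
    sleep: Callable[[float], None] = time.sleep,
    max_restarts: Optional[int] = None,
) -> int:
    """Sleep, pull, start the server, and repeat whenever the server exits.

    server_cmd is a hypothetical command that starts the server and blocks
    until it stops. max_restarts is an assumption added for testing; with
    the default of None the loop never ends, like "while true".
    """
    restarts = 0
    while max_restarts is None or restarts < max_restarts:
        sleep(1)                # brief pause so a crash loop doesn't spin
        run(["git", "pull"])    # pick up whatever was pushed most recently
        run(server_cmd)         # blocks until the server exits or crashes
        restarts += 1
    return restarts
```

Calling supervise(["./start-server"]) would loop forever. Because the pull happens before every start, any server exit, whether a crash or a deliberate shutdown, brings the site back up on the latest code.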

In the olden days of one month ago, this was enough for deploying new features. I would push new code up and then tell the server to shut itself off by setting (= quitsrv* t). About 10 seconds later the site would be back up and running with the new feature. If anything goes wrong, it’s a simple matter of reverting to an older commit.

But lisp lets you do quite a lot better, and there are some neat aspects of the new system. Un momento while I run back to an actual keyboard to type it out with actual fingers.

-----


I'm off to work now. I'll definitely read it on my breaks. :)

-----


I wrote a bit about this at https://www.laarc.io/item?id=672.

-----



