Erlang & Home Automation

Plum’s Lightpad[1] differs from other lighting control and IoT offerings because it is a clustered, peer-aware device: the user can control any Lightpad on the same network from any other Lightpad (or from a smartphone), for example by pinching to turn off the whole house or swiping with two fingers to dim a whole room.

Erlang is now a critical part of the Lightpad’s software, and that choice has saved us many man-months of development time that would otherwise have gone into re-inventing a wheel that has already been expertly engineered. The ugly and noisy language syntax[2] aside, Erlang/OTP as a platform truly shines: it has tools general enough to be used in many different circumstances (release handling, hot-code loading, static analysis, “OTP applications”); features that are missing or immature in other solutions (distributed RPC, a distributed and replicated key-value database, inter-node connection management); and, most important, it’s open source, with a vibrant ecosystem and community.

The inspiration for using Erlang – instead of C, C++, or Java – was born of the inadequate and anxiety-inducing firmware upgrade process in the old solution. That is a disquisition I will write at a later date, but be assured it was not robust.

I used Erlang successfully at my previous startup, WhooshTraffic, and thought the built-in release management and hot-code loading features were perfectly suited to upgrading a home’s lighting control system: the lights should never turn off, even briefly, during an upgrade. In contrast, the previous firmware upgrade process required a whole-system reboot, which briefly toggled the load the device was controlling.

Another feature of the language, robust process supervision, is an invaluable tool and idiom that we use pervasively within the software to make certain that individual components always restart if something unexpected happens. The Lightpad is installed inside the wall and has no physical means of connecting a laptop or phone (Bluetooth and WiFi only), so it must be tolerant of failure.
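To make the supervision idea concrete, here is a minimal sketch of a one_for_one supervisor that restarts a crashed worker. It is not the Lightpad’s actual supervision tree: the module name lightpad_sup, the registered name dimmer, and the message shapes are all hypothetical.

```erlang
-module(lightpad_sup).
-behaviour(supervisor).

-export([start_link/0, init/1]).
-export([start_worker/0, worker_init/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% one_for_one: restart only the child that crashed. If more than
    %% 5 restarts happen within 10 seconds, the supervisor itself gives up.
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Child = #{id => dimmer,
              start => {?MODULE, start_worker, []},
              restart => permanent,
              shutdown => brutal_kill,
              type => worker},
    {ok, {SupFlags, [Child]}}.

%% Child start function: must return {ok, Pid} with Pid linked to the caller.
start_worker() ->
    proc_lib:start_link(?MODULE, worker_init, []).

worker_init() ->
    register(dimmer, self()),            % hypothetical registered name
    proc_lib:init_ack({ok, self()}),     % unblocks proc_lib:start_link/3
    worker_loop().

worker_loop() ->
    receive
        {set_dimmer, Level} when is_integer(Level) ->
            worker_loop();
        Other ->
            %% Anything unexpected: crash and let the supervisor restart us.
            exit({unexpected_message, Other})
    end.
```

After lightpad_sup:start_link(), sending the worker an unexpected message (for example `dimmer ! bogus`) makes it exit, and the supervisor starts a fresh process under the same name.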

Distributed RPC and Clustering

Inversion of control is a key feature of the Lightpad. What I mean by that is: any Lightpad can control any other Lightpad on the same network in a mesh topology; any Lightpad can be controlled from any authorized smartphone; and any Lightpad can be controlled via an AMQP connection to the Plum servers (this remote control path is what the smartphone switches to when it is not on the same LAN as the devices). This is an inversion of the previous, crufty industry standard: centralized control, in which everything must be wired to a central computer that is manipulated via a control panel.

Pre-Erlang, when the software was written in Ruby, we were trying to write the inter-node connection management logic ourselves, and we quickly learned how difficult that task is. We had no built-in DNS node referencing system that hooked effortlessly into a distributed RPC system (which we were also trying to write), and no mature RPC semantics to capture whether we wanted to simply broadcast a command to the node(s) or also wait for a reply. Connection management in the face of flaky WiFi was a problem as well.

It’s Dead Simple

We got these core features for free, and in a mature state, once the move to Erlang was complete. The re-write itself was far from easy: it consumed nights and weekends for many months, followed by a couple more months of trailing regression bugs before we ironed out the edge cases and incomplete logic and reached feature parity (and beyond) with the previous software. Still, a few months of re-writing, compared to the years that many intelligent programmers have spent beating on these fundamental features, has already paid for itself in the confidence we have in our Lightpad application software and in the ease with which we introduce new features that appear intuitive but are deeply complex and difficult under the hood.

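%% `NodeRing` is a list of node names in the form: plum@<uuid>
%% `C` is the incoming command. rpc:eval_everywhere/4 is fire-and-forget:
%% it casts to every node in the list and does not collect replies.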
case C of
    {gesture, G}    ->
        rpc:eval_everywhere(NodeRing, gesture, perform, [G, remote]);
    {set_dimmer, Value, FadeTime} ->
        rpc:eval_everywhere(NodeRing, psoc, set_dimmer, [Value, FadeTime])
end
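The snippet above is fire-and-forget. For the other case mentioned earlier, waiting for a reply, OTP’s rpc module also provides multicall/5. Here is a minimal sketch of the two styles side by side; the module name ring_rpc and its function names are made up for illustration and are not part of the Lightpad code base.

```erlang
-module(ring_rpc).
-export([broadcast/4, call_all/5]).

%% Fire-and-forget: returns `abcast` immediately; no replies are collected,
%% so the caller never learns whether a node was unreachable.
broadcast(Nodes, Mod, Fun, Args) ->
    rpc:eval_everywhere(Nodes, Mod, Fun, Args).

%% Request/reply: blocks for up to TimeoutMs milliseconds and reports
%% the nodes that did not answer (down, disconnected, or too slow).
call_all(Nodes, Mod, Fun, Args, TimeoutMs) ->
    {Replies, BadNodes} = rpc:multicall(Nodes, Mod, Fun, Args, TimeoutMs),
    #{replies => Replies, unreachable => BadNodes}.
```

On a flaky WiFi network the `unreachable` list is the interesting part: it tells the caller which Lightpads missed the command, which a plain broadcast cannot.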

Builds and Release Management

Our build and release system is also something I’m quite proud of. The target is an ARM chip, so Erlang/OTP had to be cross-compiled in order to build the Lightpad application with an automated build service[3]. It took me a few weeks to home in on the solution, but I ended up using Docker and building a container[4] with an x86 host Erlang installation and a cross-compiled ARM Erlang installation, with all of OTP pre-dialyzed. I now have a consistent and self-contained build container that I use for development on my Linux laptop, and the same container is used by CircleCI to do automated builds of our application when I push to production.

CircleCI also runs the Dialyzer against the base OTP PLT to give us strong static analysis of the codebase before emitting an artifact.

If a build succeeds for production, CircleCI automatically uploads the package to S3 and notifies our administrative web application of the new version available. The administration application is the tool we use for targeting deployments to specific devices or auto-upgrading if the version is set to the minimum version.

Don’t forget, the package is built by Erlang’s release management tooling so it also contains all of the necessary hot-code loading upgrade commands for any of the changed sub-applications that ship with the package. Erlang, from there, takes care of unpacking and validating the package, versions, upgradeability, and actually upgrading the running system without ever taking it down.
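For readers who have not seen OTP release handling, the “upgrade commands” live in an .appup file per application. The sketch below shows roughly what one could look like. The application name, the version numbers, and the assumption that psoc is a stateful OTP process while gesture is a plain module are all hypothetical; only the two module names come from the earlier snippet.

```erlang
%% lightpad.appup (hypothetical application name and version numbers)
{"1.1.0",
 %% Upgrade: how to get from 1.0.0 to 1.1.0 on a running node.
 [{"1.0.0", [{load_module, gesture},             % plain module: swap the code
             {update, psoc, {advanced, []}}]}],  % stateful process: suspend,
                                                 % run code_change/3, resume
 %% Downgrade: how to get from 1.1.0 back to 1.0.0.
 [{"1.0.0", [{update, psoc, {advanced, []}},
             {load_module, gesture}]}]
}.
```

On the device side, OTP’s release_handler module exposes the matching steps as unpack_release/1, install_release/1, and make_permanent/1; build tools normally generate the lower-level relup script from files like this one.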

Thoughts on Production Quality Erlang Code

This section will be brief because there are plenty of articles already written on this topic but I did want to offer a few of my notable observations:

Erlang’s motto is “let it crash”. If you’re using supervision trees idiomatically, it is good practice to simply pattern-match against what you expect a call to return and let the process fail if you get something exceptional (the supervisor should restart it). There are exceptions, of course.
If you have important, long-running processes that seem too small to warrant a behavior, don’t attach them to your parent process! Move them into their own module with an init/1 function and attach that to your supervision tree, which will take care of cleaning up the process and restarting it. (Rogue processes attached to a gen_server that didn’t clean up after themselves were one of my largest sources of hidden effects and head-banging.)
Use the OTP patterns, idioms, and paradigms. They have been designed over a long period of time to enable the scalable and manageable construction and maintenance of large and complex applications.
Use Dialyzer and type annotations! They are a good form of documentation, and you get a lot of static analysis that is invaluable for large applications.
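Two of the points above, asserting the expected result with a pattern match and annotating types for Dialyzer, fit in one small sketch. The module dimmer_level, its functions, and the 0..255 range are invented for illustration and are not taken from the Lightpad code.

```erlang
-module(dimmer_level).
-export([parse/1, apply_level/1]).

-type level() :: 0..255.

%% Defensive boundary: untrusted input gets a tagged result.
-spec parse(binary()) -> {ok, level()} | {error, bad_level}.
parse(Bin) ->
    try binary_to_integer(Bin) of
        N when N >= 0, N =< 255 -> {ok, N};
        _ -> {error, bad_level}
    catch
        error:badarg -> {error, bad_level}
    end.

%% "Let it crash": match only the result we expect. Anything else raises
%% a badmatch error in the calling process, which its supervisor restarts.
-spec apply_level(binary()) -> level().
apply_level(Bin) ->
    {ok, Level} = parse(Bin),
    Level.
```

The -spec lines double as documentation, and Dialyzer can flag callers that, for instance, pass a string where a binary is expected.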

I could not be more thankful for the existence of Erlang/OTP. It is a clear indicator to me of how much good open source software does: without it, this product would probably not have been possible given the time and financial constraints of the business. It’s also a joy to work with.

[1] Plum’s Lightpad: http://plumlife.com/products/
[2] Elixir: A new language that targets the Erlang BEAM virtual machine, so it has all the system features of Erlang but with a much cleaner syntax.
[3] CircleCI: We use CircleCI for all of our builds (Haskell, Erlang, iOS, Android…)
[4] Docker Erlang ARM: Base Erlang/OTP ARM container; Extended Erlang ARM container w/ rebar, rebar3, relx, and gpb