Part 1 of ?: Building BIG serverless systems

Long prelude to conclusions about scaling big systems fully built through Serverless tools

April 26, 2021

This is going to be a short post, meant as an intro to a future series of longer ones if it seems interesting and if I get some time.

A few weeks ago I wrote a post sharing some thoughts about serverless architectures and their tradeoffs. I used to have a bias about how to use serverless solutions (specifically serverless functions).

I thought of them just as a way to trigger simple tasks without caring too much about managing resources, but I had never thought of building an entire system on them. Things and thoughts change, and now I see a big advantage in them. But serverless architectures, and the frameworks that let you work with them, say nothing about how to build things; they just provide an easy way to build and deploy systems. Without a proper architectural design, things can get really messy and the advantages of serverless functions can turn into drawbacks.

At my current job, we're having a fun time properly refactoring part of our core codebase. After several years of accumulated tech debt, changes of mind in tech leadership, and employee turnover, some of the services have developed an (ironically) interesting coupling to the serverless way of thinking, and this is turning into real issues:

Serverless functions can be invoked directly from inside another serverless function, which instantly couples the lambdas together.
For some time, AWS lambdas could not be triggered through message queueing systems (like SQS). Trying to follow an event-driven approach on a serverless-based microservice mesh could be a nightmare.
The ephemeral existence of serverless containers (they do their duty and the process is closed) can lead to consistency issues if async processes are not handled really well.
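
To make the last point a bit more concrete, here is a minimal TypeScript sketch of how a write can get lost. Everything in it is made up for illustration: `saveAudit` stands in for any database or queue write, and the handlers take a plain string instead of a real provider event.

```typescript
// Hypothetical side effect standing in for a database or queue write.
const written: string[] = [];

function saveAudit(entry: string): Promise<void> {
  return new Promise((resolve) => {
    setTimeout(() => {
      written.push(entry);
      resolve();
    }, 10);
  });
}

// Risky: the promise is not awaited, so the handler can return
// (and the container can be frozen or discarded) before the write happens.
async function fireAndForgetHandler(orderId: string): Promise<string> {
  void saveAudit(`order ${orderId} received`);
  return "ok";
}

// Safer: every side effect is awaited before the handler returns.
async function awaitedHandler(orderId: string): Promise<string> {
  await saveAudit(`order ${orderId} received`);
  return "ok";
}
```

In a long-lived server process the first version usually gets away with it, because the process is still around when the timer fires. In an ephemeral container there is no such guarantee.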

Yep, this is giving us a hard time while refactoring things.

But the solution to our problem was quick to find and is quite simple to understand (or at least easy once we know the problem we are handling). It is not as simple to implement, but we're doing it little by little.

Classic DDD strategic patterns basically told us to drop the idea of serverless while we're modeling our system. We started rethinking our current domain: what the contexts related to the business areas of the company are and what we need to implement, and only after that, how to expose it through serverless.

Taking inspiration from good old classic hexagonal architecture, we're starting to rebuild things without caring about what the entrypoints look like. Some conclusions we arrived at were:

Treat serverless solutions as just another framework and avoid coupling to them.
Think of each lambda as a controller or an event handler that calls the internal layers, just another part of the infrastructure layer. Don't care about which type of trigger will call our services; just define contracts for using them.
Build the app from the domain out to the external layers, even if it's a simple solution. Once that is done, you won't care whether the entrypoints are Lambdas, Azure Functions or expressJS routes, and they will scale easily.
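
The conclusions above could look roughly like this in TypeScript. It's only a sketch under my own assumptions: the `Order` domain, the `PlaceOrder` use case and the event shapes are all hypothetical, and the event types are simplified stand-ins rather than any provider's real types. Handlers are kept synchronous for brevity.

```typescript
// Domain layer: no knowledge of Lambda, Express or any trigger.
interface Order {
  id: string;
  total: number;
}

// Port: the contract the application needs from the outside world.
interface OrderRepository {
  save(order: Order): void;
  findById(id: string): Order | undefined;
}

// Use case: application logic behind a plain method contract.
class PlaceOrder {
  private readonly orders: OrderRepository;

  constructor(orders: OrderRepository) {
    this.orders = orders;
  }

  execute(id: string, total: number): Order {
    if (total <= 0) throw new Error("total must be positive");
    const order: Order = { id, total };
    this.orders.save(order);
    return order;
  }
}

// Infrastructure: an in-memory adapter, handy for tests and local runs.
class InMemoryOrderRepository implements OrderRepository {
  private readonly store = new Map<string, Order>();

  save(order: Order): void {
    this.store.set(order.id, order);
  }

  findById(id: string): Order | undefined {
    return this.store.get(id);
  }
}

// Simplified, assumed event shapes (not real provider types).
interface HttpLikeEvent { body: string }
interface HttpLikeResponse { statusCode: number; body: string }
interface QueueLikeEvent { records: { body: string }[] }

// Entrypoint shaped like an HTTP-triggered function: parse, delegate, map the result.
function makeHttpHandler(useCase: PlaceOrder) {
  return (event: HttpLikeEvent): HttpLikeResponse => {
    try {
      const { id, total } = JSON.parse(event.body);
      const order = useCase.execute(id, total);
      return { statusCode: 201, body: JSON.stringify(order) };
    } catch (err) {
      return { statusCode: 400, body: JSON.stringify({ error: (err as Error).message }) };
    }
  };
}

// Entrypoint shaped like a queue-triggered function: same use case, different trigger.
function makeQueueHandler(useCase: PlaceOrder) {
  return (event: QueueLikeEvent): number => {
    for (const record of event.records) {
      const { id, total } = JSON.parse(record.body);
      useCase.execute(id, total);
    }
    return event.records.length; // number of records processed
  };
}
```

The only thing the two handlers share is the use case. Swapping one for an expressJS route would mean writing a third small adapter, not touching `PlaceOrder`.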

About the internal calls to other lambdas... we're refactoring them to be more event-driven. Or, for tricky cases, wrapping adapters around them (we're having a lot of conversations about domain modeling in parallel lately).
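
As a rough sketch of both options (event-driven, or an adapter around the direct call), assuming names I'm inventing here: `EventPublisher` is a hypothetical port, `InvokeFunction` is a stand-in for whatever client actually invokes another function, and the event and function names are made up.

```typescript
// Port: what the domain needs in order to announce that something happened.
interface DomainEvent {
  type: string;
  payload: Record<string, unknown>;
}

interface EventPublisher {
  publish(event: DomainEvent): void;
}

// Hypothetical signature for "call another function by name".
type InvokeFunction = (functionName: string, payload: unknown) => void;

// Adapter for the tricky cases: the direct invocation still exists,
// but it is hidden behind the port and routed by event type.
class DirectInvocationPublisher implements EventPublisher {
  private readonly routes: Record<string, string>;
  private readonly invoke: InvokeFunction;

  constructor(routes: Record<string, string>, invoke: InvokeFunction) {
    this.routes = routes;
    this.invoke = invoke;
  }

  publish(event: DomainEvent): void {
    const target = this.routes[event.type];
    if (target === undefined) return; // nobody is wired to this event yet
    this.invoke(target, event.payload);
  }
}

// Event-driven option: an in-memory bus standing in for a real broker.
class InMemoryEventBus implements EventPublisher {
  private readonly handlers = new Map<string, Array<(event: DomainEvent) => void>>();

  subscribe(type: string, handler: (event: DomainEvent) => void): void {
    const existing = this.handlers.get(type) ?? [];
    this.handlers.set(type, [...existing, handler]);
  }

  publish(event: DomainEvent): void {
    for (const handler of this.handlers.get(event.type) ?? []) {
      handler(event);
    }
  }
}

// Domain code depends on the port only, never on a function name.
function confirmPayment(orderId: string, publisher: EventPublisher): void {
  publisher.publish({ type: "payment.confirmed", payload: { orderId } });
}
```

With this shape, moving a case from the direct-invocation adapter to a real queue or broker is a wiring change, and `confirmPayment` stays the same.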

And, for the overall architecture... Also abstract away the idea that services run serverless (while keeping an eye on performance, cold starts and the like, though). Build services as if they were just one more microservice.

The other issues we're having, related to monitoring and latency, are a tale for another day.