AWS serverless debacle

This week Amazon released a blog post describing how they managed to reduce their AWS bill by 90% by shifting from microservices to a monolith. It's a very surprising read given that Lambda functions are a huge revenue stream for AWS and no doubt the first service a lot of people think about when they hear serverless. The post seems to have kicked off the age-old microservices vs monolith debate again!

The goal of their implementation was to create software that monitored every stream viewed by customers, detected quality issues and then triggered a process to fix them. You can imagine, given it's Amazon, that they need their software to be scalable, so they took the distributed route:

They used multiple Step Functions and Lambdas, all with different responsibilities. An AWS Lambda entry point kicks off the process and calls the Media Conversion Service, which converts the input audio/video streams into buffers that are stored in an S3 bucket. These converted audio/video frames are then used by the detectors, which apply machine learning to analyse them for issues. The results of the analysis are aggregated by another Lambda, which pushes the data to a second S3 bucket, and the detectors send real-time notifications via SNS when a defect is found.
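As a rough illustration of the distributed shape, an entry point like this usually does little more than start the orchestration for a given stream. The sketch below assumes a Step Functions state machine drives the conversion, detection and aggregation steps; the environment variable, ARN and event fields are made up for the example and aren't taken from Amazon's post.

```python
# Hypothetical entry-point Lambda: starts one pipeline execution per stream.
import json
import os

import boto3

sfn = boto3.client("stepfunctions")

def handler(event, context):
    stream_id = event["streamId"]  # hypothetical event field
    response = sfn.start_execution(
        stateMachineArn=os.environ["PIPELINE_STATE_MACHINE_ARN"],  # hypothetical
        input=json.dumps({"streamId": stream_id}),
    )
    return {"executionArn": response["executionArn"]}
```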

Their main issue was the overhead of sending data between the services, given that this process needs to execute once per frame of audio/video. Each frame of audio/video would effectively incur 5 network hops:

  1. Entry Point calls Media Conversion Service
  2. Media Conversion Service persists in S3
  3. Detector requests latest frame
  4. Detector sends to aggregator
  5. Aggregator sends to S3

That’s 5 times the data has to be serialised/deserialised, multiple times per second, for each customer.
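As a back-of-envelope illustration (the frame rate here is my assumption, not a figure from Amazon's post):

```python
FRAMES_PER_SECOND = 30  # assumed frame rate, not from the article
HOPS_PER_FRAME = 5      # the five hops listed above

# Roughly 150 serialise/deserialise round trips per second, per customer stream.
print(FRAMES_PER_SECOND * HOPS_PER_FRAME)
```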

They realised their mistake and decided to pull it all into a single process, a monolith. In doing this they no longer needed the temporary S3 bucket where the audio/video frames were being persisted.

The architecture is essentially the same as it was in the distributed example; it’s just now tucked up in a single ECS task. For each frame of audio/video they are now only making two network hops:

  1. Initial request to start analysis
  2. Persisting the aggregated results

Much better.
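To make the difference concrete, here's a minimal sketch of that in-process pipeline. The helper names (convert_frame, detectors, aggregate) are hypothetical stand-ins for the conversion, detection and aggregation components; the point is that frames are passed around as in-memory objects instead of being serialised to a temporary bucket between steps.

```python
# Hypothetical monolithic pipeline running inside a single ECS task.
import json

import boto3

s3 = boto3.client("s3")

def analyse_stream(stream, convert_frame, detectors, aggregate,
                   results_bucket, results_key):
    results = []
    for raw_frame in stream:                 # frames from the initial request
        frame = convert_frame(raw_frame)     # in-process, no Media Conversion hop
        for detect in detectors:
            results.append(detect(frame))    # in-process, no temporary S3 bucket
    aggregated = aggregate(results)
    # The only other network hop: persisting the aggregated results.
    s3.put_object(
        Bucket=results_bucket,
        Key=results_key,
        Body=json.dumps(aggregated),
    )
```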

Thoughts on the AWS approach

My initial thought after reading the article was that their heart was in the right place when designing the system: they wanted to make it scalable. They mention that the reason they took this initial approach was to build the service quickly, which is a testament to the simplicity of serverless.

I personally would have thought creating the service as a monolith to begin with would have been the quicker solution, and then considering splitting it later based on reliable metrics. As a monolith you’d only need to worry about the infrastructure for a single ECS task (which is harrowing enough on its own, having written some Terraform to do just that) and the aggregated results bucket, whereas with a distributed approach you’re having to worry about the infrastructure for all of the different services and the communication between them; a much bigger undertaking.

I don’t believe this post by Amazon should take any credibility away from the serverless approach. It definitely has its use cases; this just wasn’t one of them.

As of writing this I’m undertaking a big project at work converting our microservices into a monorepo, with the idea that we can get the benefits of both monoliths and microservices: services will be independently deployable whilst still allowing us to easily share modules. We’re currently in the midst of a big migration from Azure to AWS, and doing so has given us the freedom to reconsider some of our earlier architecture choices.
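The layout we’re aiming for looks something like this (the directory and service names are made up for illustration): each service keeps its own deployment pipeline, while shared code lives in packages that any service can reference.

```
monorepo/
├── services/
│   ├── orders/           # independently deployable, own pipeline
│   ├── payments/
│   └── notifications/
└── packages/
    ├── dynamo-client/    # shared persistence layer
    ├── auth/             # shared authentication/authorisation
    └── config/           # shared configuration management
```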

One problem that tends to occur with microservices is code duplication. Let’s say you have 8 microservices, all with their own persistence. All of these services are going to need some layer to communicate with the database. What do you do in this situation? Duplicate the code, copying it from project to project? Create a NuGet package? Create another microservice that handles all the persistence? Maybe this is a bit of a noddy example, but it’s something that has cropped up recently: we now have a separate Dynamo client that is referenced by all the projects that require it, and it’s lovely (there’s a sketch of the idea after the list below). We know that if the Dynamo client is modified, every module referencing it reaps the rewards. There are many more examples like this in the microservice world:

  • Authentication/authorisation – each service needs to know how to authenticate/authorise requests
  • Error handling – exception handling, error logging, error reporting, error responses
  • Configuration management – each service has its own code for managing configuration data
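Our stack is .NET (hence the NuGet mention), but the shared Dynamo client idea is easy to sketch. The module, class and table names below are hypothetical, and I’ve used Python with boto3 purely for illustration rather than our actual code:

```python
# shared/dynamo_client.py - one persistence module referenced by every service,
# instead of each service rolling (and duplicating) its own.
import boto3

class DynamoClient:
    def __init__(self, table_name, region_name="eu-west-1"):
        self._table = boto3.resource("dynamodb", region_name=region_name).Table(table_name)

    def get(self, key):
        """Fetch a single item by primary key, e.g. {"id": "123"}."""
        return self._table.get_item(Key=key).get("Item")

    def put(self, item):
        """Write a single item."""
        self._table.put_item(Item=item)
```

Any fix or improvement made here is picked up by every service that references the module, rather than having to be replicated across 8 copies of the same code.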

A monorepo should also allow for faster development cycles: making a change in a single repository and raising a single PR is far better than having to create 4 separate PRs, one for each project you’ve touched.

Onboarding devs should also be much simpler: they won’t have to get to grips with 42 different repositories and what each one is for, and it’ll allow for easier maintenance too. Hopefully another benefit will be code quality: enforcing standards and best practices in a single repository should be easier than having to manage multiple repos with, no doubt, varying levels of quality.

Conclusion

In conclusion, I want to reiterate that both the serverless and microservices approaches have their advantages and disadvantages, and the choice of architecture should be based on the specific requirements of the project. Whilst the serverless approach is often idyllic, especially if you want to get something to market quickly, it’s not always the best choice.
