If you are building serverless, event-driven architectures, you should know about Amazon EventBridge. It is a fully managed event bus service that makes it easy to publish and subscribe to events at any scale. Nearly all the serverless applications I build or design use EventBridge somewhere.
Using events to reduce coupling between applications is quite popular. However, it has its challenges. One of those challenges is testing. How do you verify that you publish what you want when you want it?
In this article, I will discuss some approaches to testing I've taken with different teams over the years and explain why you might choose one over the others. And what does Serverless have to do with any of this? Let's take a look.
Testing Approaches
I want to consider four strategies we can take.
1. Don't test
2. Mock the publish call
3. Intercept the publish call during testing
4. Create a listener application
Option 1: Don't Test
This is not a bad strategy. I know what you're thinking, but hear me out. An effective test for publishing events is non-trivial, and you may decide the value is not worth the cost. I can respect this conclusion. I have released applications that publish millions of messages without the kind of testing I will describe in this article. You must decide for yourself whether the value is there.
Option 2: Mock The Publish Call
This approach is the standard, code-level mock (e.g., jest.fn()) that you would drop in place of the EventBridge client.send() call. Then you would assert you passed in what you passed in. Don't do this. This is worse than not testing. You've now coupled your implementation guts (EventBridge, AWS-SDK v3, etc.) to the test, and you are no longer black-box testing. Over the years, I have reduced my mock use from heavy to zero. I now consider mocks harmful, always. Resist.
Option 3: Intercept The Publish Call During Testing Only
This is the classic case of:
if (isTest()) return testResponse;
else return realResponse;
I have seen this in many forms, from the simple (above) to the fancy [1], and I have the same response: Don't do it. This approach tells you very little you didn't already know and adds complexity to your code. You are much better off with Option 1.
Option 4: Create A Listener Application
With this approach, we publish a real message to a real event bus. We strategically place a listener on this bus and record the events. Then, you can poll the listener's data store to see if your event was received and, logically, published. If this sounds appealing (and worth the trouble), I will explain how you can create one in the rest of the article.
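For context, here is a minimal sketch of the kind of entry the system under test would hand to EventBridge's PutEvents call. The field names follow the PutEvents request entry shape; the toEntry helper and the bus name are hypothetical, and the source and detail type are the values used later in this article.

```typescript
// Shape of one entry in an EventBridge PutEvents request.
interface PutEventsEntry {
  EventBusName: string;
  Source: string;
  DetailType: string;
  Detail: string; // JSON-encoded payload
}

// Hypothetical helper: wraps an entity in a PutEvents entry.
function toEntry(busName: string, message: { id: string }): PutEventsEntry {
  return {
    EventBusName: busName,
    Source: "com.your-app.test",
    DetailType: "new",
    Detail: JSON.stringify(message),
  };
}

// "my-bus" is a placeholder bus name.
const exampleEntry = toEntry("my-bus", { id: "xyz" });
```

The listener described below only ever sees what arrives on the bus, so the test stays independent of how the publisher builds this entry.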
Ephemerality
The Cambridge English Dictionary defines ephemeral as "lasting for only a short time."
I first encountered the concept of ephemeral applied to software when I started working with AWS Lambda in 2015 [2]. Lambda is "ephemeral compute" in that you don't "configure, launch, or monitor EC2 instances." Its execution environments spin up and disappear on their own, running your code on demand, and then they're gone [3].
Using the Serverless Framework, you can easily deploy environment-specific versions of your app by changing a single CLI argument: stage. In his blog, Yan Cui started advocating for this approach to segregate your changes from others while you are working on a feature. He used the term "ephemeral environment" for this, but I've often called it an "ephemeral stack." We can do this because spinning up new AWS Serverless components like Lambda, API Gateway, and S3 is easy and fast.
This technique can be used to build a "test stack" that listens to our published events. We can deploy it before we run the tests and destroy it afterward. Let's look at what this ephemeral stack does and how we can create it.
Test Stack
Our test "stack" (or "app") is composed of two components: a Lambda and a DynamoDB table. We'll wire up the event trigger for our Lambda to EventBridge using a rule that matches our events in the system under test.
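As a rough sketch of what the listener Lambda could do, the code below maps an incoming EventBridge event to a table item and hands it to a writer. The envelope fields (id, source, detail-type, time, detail) are the standard EventBridge ones; the Message payload, the StoredEvent shape, and the injected putItem function are assumptions made for illustration. In a real stack, putItem would wrap a DynamoDB put.

```typescript
// Minimal shape of the EventBridge envelope the Lambda receives.
interface BusEvent<T> {
  id: string;
  source: string;
  "detail-type": string;
  time: string;
  detail: T;
}

// Hypothetical payload: all we assume is that the entity has an `id`.
interface Message {
  id: string;
  [key: string]: unknown;
}

// One item in the test table, keyed by the entity id.
interface StoredEvent {
  id: string;
  source: string;
  detailType: string;
  receivedAt: string;
  detail: Message;
}

// Injected writer so the sketch does not depend on a particular SDK client.
type PutItem = (item: StoredEvent) => Promise<void>;

function toStoredEvent(event: BusEvent<Message>, now: Date = new Date()): StoredEvent {
  return {
    id: event.detail.id,
    source: event.source,
    detailType: event["detail-type"],
    receivedAt: now.toISOString(),
    detail: event.detail,
  };
}

// Factory for the Lambda handler: record every event the rule delivers.
function makeHandler(putItem: PutItem) {
  return async (event: BusEvent<Message>): Promise<void> => {
    await putItem(toStoredEvent(event));
  };
}
```

Keeping the mapping in a pure function (toStoredEvent) means the listener itself stays trivial, which matters because it is test infrastructure and should not need tests of its own.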
The DynamoDB table should be set up so that its key structure matches the event you want to save. For example, if you publish a Message entity whose primary identifier is an id, you will want your DDB table to use id as its partition (hash) key.
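A minimal sketch of such a table definition follows, written as a TypeScript object that mirrors the CloudFormation AWS::DynamoDB::Table resource (the form you would use in a serverless.ts file; the same keys apply in YAML). The resource name, table name, and on-demand billing mode are assumptions. Only the id key structure comes from the example above.

```typescript
// Hypothetical resource; only the key structure matters here.
const testEventsTable = {
  Type: "AWS::DynamoDB::Table",
  Properties: {
    TableName: "event-test-stack-messages", // assumed name
    BillingMode: "PAY_PER_REQUEST", // on-demand, so a test table needs no capacity planning
    AttributeDefinitions: [
      { AttributeName: "id", AttributeType: "S" }, // "S" = string
    ],
    KeySchema: [
      { AttributeName: "id", KeyType: "HASH" }, // partition (hash) key
    ],
  },
};

// In a serverless.ts file this would sit under resources.Resources.
const resources = { Resources: { TestEventsTable: testEventsTable } };
```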
The Serverless Framework makes the EventBridge-to-Lambda trigger easy to configure. We pass in the ARN of the event bus and the event pattern we'd like to match.
In my sample code, the rule triggers my Lambda on any event with a source of "com.your-app.test" and a detail type of "new." In the "References" section below, you can read more about Serverless Framework's EventBridge triggers for Lambda.
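The trigger described above could be sketched like this, using the Serverless Framework's eventBridge event in its serverless.ts object form. The handler path and the bus ARN are placeholders, not values from the sample code. The matches function is a deliberately simplified illustration of how such a pattern is evaluated; real EventBridge matching also supports nested fields and additional operators.

```typescript
// Event pattern: source "com.your-app.test", detail type "new".
const pattern = {
  source: ["com.your-app.test"],
  "detail-type": ["new"],
};

// Function definition with an EventBridge trigger.
// Handler path and bus ARN are placeholders.
const functions = {
  listener: {
    handler: "src/listener.handler",
    events: [
      {
        eventBridge: {
          eventBus: "arn:aws:events:us-east-1:123456789012:event-bus/my-bus",
          pattern,
        },
      },
    ],
  },
};

// Simplified matcher: every field named in the pattern must be present on the
// event with one of the listed values.
type Envelope = { source: string; "detail-type": string };

function matches(event: Envelope, p: typeof pattern): boolean {
  const sourceOk = p.source.some((s) => s === event.source);
  const detailTypeOk = p["detail-type"].some((d) => d === event["detail-type"]);
  return sourceOk && detailTypeOk;
}
```

A narrow pattern keeps the test table free of unrelated traffic if other applications share the same bus.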
The Test
After I run my normal unit, integration, and end-to-end tests on my app, my CI/CD pipeline will deploy this test stack. Once complete, I'll run a set of tests that interact with the system under test in ways that will generate events. Then, my test will look for a record in the test stack's DDB table.
Since the record won't show up in the test stack DDB table until my test stack receives the message, I'll have to use a backoff/retry in my test. For JavaScript/TypeScript, my favorite library of this sort is async-retry. It's quite good. Here's what a test could look like:
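As a minimal sketch of that polling pattern, the code below uses a small hand-rolled retry helper with exponential backoff. It stands in for a library such as async-retry so that the example has no dependencies; it does not reproduce that library's API. The GetMessage type, the getMessageFromTestDb parameter, and expectEventPublished are hypothetical names, and the retry counts and delays are arbitrary.

```typescript
// Retries `fn` until it resolves, waiting longer after each failure.
async function retry<T>(
  fn: () => Promise<T>,
  retries = 5,
  minTimeoutMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        // Exponential backoff: minTimeoutMs, then 2x, then 4x, and so on.
        const delayMs = minTimeoutMs * 2 ** attempt;
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Hypothetical lookup against the test stack's table. It resolves to
// undefined until the listener has stored the event.
type GetMessage = (id: string) => Promise<{ id: string } | undefined>;

// Resolves with the stored record, or rejects once the retries run out.
async function expectEventPublished(id: string, getMessageFromTestDb: GetMessage) {
  return retry(async () => {
    const record = await getMessageFromTestDb(id);
    if (!record) throw new Error(`no event recorded yet for ${id}`);
    return record;
  });
}
```

In a test, you would first call the system under test (for example, create a Message through its API), then await expectEventPublished with the new ID and assert on the returned record.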
Improvement Idea #1
Let's say your system under test can create, update, and delete an entity. Suppose you want to run tests on the events from a full lifecycle of your entity. You'd need to distinguish a create event from an update or delete. Currently, the update event will overwrite the create event because they share the same ID. The same thing will happen when we receive the delete.
One way to solve this is to append the event type to the entity's ID before storing it in DynamoDB. For example, if you receive a create event for id: xyz, you can store the record with id: xyz_CREATE as its key. Then, when you receive the delete event for that record, you can store another table item with id: xyz_DELETE.
Then, in your test, you can call getMessageFromTestDb() with an additional event type parameter, like getMessageFromTestDb(id, EventType.Delete), and verify that you published the delete event.
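The composite key idea can be sketched as follows. The EventType values and the buildKey and saveEvent helpers are assumptions, and an in-memory Map stands in for the DynamoDB table purely to show that the lifecycle events no longer overwrite each other.

```typescript
// Event types for the entity lifecycle (assumed values).
const EventType = { Create: "CREATE", Update: "UPDATE", Delete: "DELETE" } as const;
type EventType = (typeof EventType)[keyof typeof EventType];

// Builds the table key by appending the event type to the entity id.
function buildKey(id: string, eventType: EventType): string {
  return `${id}_${eventType}`;
}

// In-memory stand-in for the DynamoDB table.
const table = new Map<string, { id: string; eventType: EventType }>();

// What the listener would do when an event arrives.
function saveEvent(id: string, eventType: EventType): void {
  table.set(buildKey(id, eventType), { id, eventType });
}

// What the test would call to look up a specific lifecycle event.
function getMessageFromTestDb(id: string, eventType: EventType) {
  return table.get(buildKey(id, eventType));
}

saveEvent("xyz", EventType.Create);
saveEvent("xyz", EventType.Delete);
```

After these two calls the table holds separate items under xyz_CREATE and xyz_DELETE, so a lookup for the delete event succeeds while a lookup for an update that never happened returns undefined.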
Improvement Idea #2
What if you change what is ephemeral? Instead of the entire stack, you could make each database record ephemeral using DynamoDB's Time-To-Live (TTL) functionality. Then, you can always leave your test stack up, and you don't have to wait for the test stack to be deployed during each pipeline run. If you want to optimize for deployment speed, consider this.
TTL is a small change to the CloudFormation for your DynamoDB table: you add a TimeToLiveSpecification to the table's properties.
You specify an attribute (in this case, I use ttl) that will act as your expiration date. Set this attribute to a Unix epoch timestamp, in seconds, one day (or one hour) after you receive the event, and DynamoDB will automatically delete the record for you once it has expired. The deletion isn't treated as "write throughput" like a standard delete. It's essentially free! Try it out.
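A minimal sketch of both halves follows: the TimeToLiveSpecification fragment (shown as a TypeScript object mirroring the CloudFormation property) and a helper that computes the expiry value when the listener stores an item. The ttlFromNow helper and the one-day lifetime are assumptions. DynamoDB expects the TTL attribute to be a number holding Unix epoch time in seconds.

```typescript
// Goes under the table's Properties in the CloudFormation definition.
const timeToLiveSpecification = {
  AttributeName: "ttl", // the attribute DynamoDB checks for expiry
  Enabled: true,
};

// Returns the expiry as Unix epoch seconds, `lifetimeSeconds` after `now`.
function ttlFromNow(lifetimeSeconds: number, now: Date = new Date()): number {
  return Math.floor(now.getTime() / 1000) + lifetimeSeconds;
}

const ONE_DAY_SECONDS = 24 * 60 * 60;

// Example item as the listener might store it: expires one day after receipt.
const item = { id: "xyz", ttl: ttlFromNow(ONE_DAY_SECONDS) };
```

Note that the value is in seconds, not the milliseconds that Date.now() returns, which is an easy mistake to make in JavaScript.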
Review and Decide
OK, now that you know all that would be involved in testing your events, should you do it? In other words, should you take "Option 4: Create A Listener Application" or "Option 1: Don't Test"? You will have to weigh the complexity of the testing solution against the risk of catching errors only in a higher environment. As I mentioned earlier, I have taken both approaches in the past and successfully gone into production with each. Consider it a tool to use if it makes sense. Have fun with it.
References
GitHub: Example code for event-bridge-test-stack
Serverless Land: EventBridge
Serverless Framework Documentation: EventBridge triggers for Lambda
Yan Cui: Why you should use ephemeral environments when you do serverless
NPM: async-retry
Amazon DynamoDB Developer Guide: Using time to live (TTL) in DynamoDB
[1] It takes a while to dig through this recent article, but at its root, it’s just another if (isTest()).
[2] Jeff Barr’s official announcement of Lambda
[3] I suppose the analogy to "cattle, not pets" concerning VMs is even earlier, but it didn't leave as big an impression on me as Lambda did.