Load Testing with k6
A range of load testing capabilities in a single tool
Testing is fascinating, especially when we are testing our own creation. After we code something, we usually test it manually first with a set of common cases, and if it works well, we submit the code.
But at some point we have to make sure we built something reliable. It may work well under one condition, the way we've tested it manually. Yet how can we be sure our code behaves the same way under other conditions, for example when it's bombarded by a lot of requests? There must be a better way than asking your friends to manually (and in parallel) make requests to your application. Luckily, there is. In this article, we'll learn how to load test an application using k6 under several conditions.
Installation
There are several ways to install k6 depending on your operating system or environment. In this article, I'll only cover two of them: Linux/Ubuntu and Docker.
Quoted directly from the k6 documentation, you can install k6 on Linux/Ubuntu by running these commands in your terminal:
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
echo "deb https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
sudo apt-get update
sudo apt-get install k6
With Docker, things get a little bit easier. You can install it using this command:
docker pull loadimpact/k6
Since Docker makes our life so much easier, I'll use it frequently in this article, but the essence stays the same either way. You can find the full installation guide here: https://k6.io/docs/getting-started/installation/.
The Basics
For the purposes of this article, I'll load test one of my practice projects. I'll also explain the details of the API, so you can still follow along.
If you want to follow along exactly, you can find the application under test here. The application itself is a simple financial manager, where you can register, sign in, create income or expense categories, and write down your income or expenses. You'll find details on how to set up the server in the readme file.
Let's start by testing the base URL. Assume that the base URL is http://localhost:3000 and that it returns the message 'Welcome to the API'. The test script would be:
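A minimal sketch of such a script, assuming the endpoint behaves as described above:

```javascript
import http from 'k6/http';
import { check } from 'k6';

export default function () {
  // Hit the base URL of the API
  const res = http.get('http://localhost:3000');

  // Verify the status code and the welcome message
  check(res, {
    'status is 200': (r) => r.status === 200,
    'welcome message returned': (r) => r.body.includes('Welcome to the API'),
  });
}
```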
First, we import the dependencies at the top. Note that in the background, k6 doesn't run your script in Node.js, since in general a JavaScript runtime is not well suited for high-performance load generation. k6 itself is written in Go to achieve the desired high-performance testing, and it embeds a JavaScript runtime to execute your script.
The test itself runs inside the exported default function. This part of the code is what's usually called VU code. By default the test runs once and uses only one virtual user (think of this as a real user, but simulated), but you can change that using options. We'll talk about VU code and options later on.
If you're facing a problem like connection refused when using Docker, you need to replace localhost with the IP address of the host. You can check it using docker inspect <container_id_or_name> (within a Docker network, a service name usually resolves to the container's address), or on a Linux system with hostname -I. You'll get something like 192.xxx.xxx.xx; replace localhost with that IP address.
You can run this test with either of these commands:
// CLI
k6 run script.js

// Docker
docker run -i loadimpact/k6 run - <script.js
You’ll see the result of the test right away on the terminal. Something similar to this.
Notice that since we didn't provide options, the test ran once and used only one virtual user. The numbers at the bottom are the built-in metrics, such as data_received, http_req_duration, http_req_failed, vus, etc. For example, http_req_failed is the rate of failed requests according to setResponseCallback; by default, requests with status codes between 200 and 399 are considered "expected". We'll see how to make custom metrics later on.
Life Cycle
There are four stages in a test. The first one is init, then setup, VU, and teardown.
In general, the pattern is like this
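A minimal skeleton of that pattern (the returned object is just a placeholder):

```javascript
// init code: runs once per VU; imports and global declarations live here
import http from 'k6/http';

export function setup() {
  // runs once before the test, e.g. to prepare test data
  return { token: 'example-token' }; // placeholder, passed to VU code and teardown
}

export default function (data) {
  // VU code: runs repeatedly for every virtual user
}

export function teardown(data) {
  // runs once after the last VU iteration, e.g. to clean up
}
```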
The init code runs once per virtual user. You can think of a virtual user as a real user, but simulated; all of them do the same thing, and their behaviour is defined inside the default function (the VU code). You can import modules here to be used later inside setup, VU, or teardown code.
The setup and teardown functions are pretty much the same as in any other testing tool. You'll need them if you want to do something before and after the test runs. setup is called after init but before the VU code, and teardown is called after the last VU iteration.
Then we have the VU code. We define the behaviour of the virtual user here: calling the API, checking if the response we get is correct, saving the result in a metric, and so on. Basically, this section is where the real test happens. It runs in a loop for every virtual user for a defined duration or set of stages (you can set these in options).
The Real Tests
In this section, we’ll talk about options, metrics, thresholds, and life-cycle in real testing implementation.
Start with the setup function.
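The endpoint paths and payloads below are assumptions for illustration; the project's actual routes may differ:

```javascript
import http from 'k6/http';

const BASE_URL = 'http://localhost:3000'; // adjust to your host

export function setup() {
  const params = { headers: { 'Content-Type': 'application/json' } };
  const user = { email: 'dummy@test.com', password: 'secret123' };

  // Register a dummy user, then log in to get the access token
  http.post(`${BASE_URL}/register`, JSON.stringify(user), params);
  const loginRes = http.post(`${BASE_URL}/login`, JSON.stringify(user), params);
  const { token, id, email } = loginRes.json();

  // All later requests go through authentication middleware
  const authParams = {
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${token}`,
    },
  };

  // Create an income/expense category to attach histories to
  const typeRes = http.post(
    `${BASE_URL}/types`,
    JSON.stringify({ name: 'shopping' }),
    authParams
  );

  // Return everything the VU code and teardown will need
  return { token, id, email, type: typeRes.json() };
}
```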
In this setup function, we register our dummy user and then log in to get the access token, because all the APIs we want to test later are protected by authentication middleware.
Before creating any income or expense history, we need to create its categories, e.g., shopping, investment, taxes, etc.
After we get the token (including the id and email for demonstration) and the newly created income/expense types, we return these values so they can be used later to access the APIs under test.
Note that we have to stringify the payload and set the Content-Type header to application/json in the request params; otherwise, by default, the data is sent in form-data format.
Next is the teardown function.
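A sketch, assuming a hypothetical /truncate cleanup endpoint and the token returned from setup:

```javascript
import http from 'k6/http';

const BASE_URL = 'http://localhost:3000'; // adjust to your host

export function teardown(data) {
  // Truncate the tables populated during the test, using the token from setup.
  // The /truncate endpoint is hypothetical; use whatever cleanup route the API provides.
  http.post(`${BASE_URL}/truncate`, null, {
    headers: { Authorization: `Bearer ${data.token}` },
  });
}
```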
The teardown function is relatively simple. What we're doing here is truncating the tables we populated, so the database can be reused for the next testing session. To clear the database, we need an access token, and luckily we have one inside the data object.
Now it’s time to test the core APIs.
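The full script isn't reproduced here, but a sketch of the VU code with options, a check, and custom metrics could look like the following. The /incomes endpoint and payload fields are assumptions; the data argument is what setup returned:

```javascript
import http from 'k6/http';
import { check } from 'k6';
import { Trend, Rate } from 'k6/metrics';

// Custom metrics: a Trend for durations, a Rate for the success ratio
const incomeDuration = new Trend('income_duration');
const incomeSuccess = new Rate('income_success');

export const options = {
  vus: 100,         // maximum number of virtual users to simulate
  duration: '30s',  // how long the test runs
  thresholds: {
    // 75% of all HTTP request durations must be below 2 seconds
    http_req_duration: ['p(75)<2000'],
  },
};

export default function (data) {
  // Create an income record through the authenticated API
  const res = http.post(
    'http://localhost:3000/incomes', // hypothetical endpoint
    JSON.stringify({ amount: 100, typeId: data.type.id }),
    {
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${data.token}`,
      },
    }
  );

  // Check the response status and duration
  check(res, {
    'created successfully': (r) => r.status === 201,
    'duration below 2s': (r) => r.timings.duration < 2000,
  });

  // Feed the custom metrics
  incomeSuccess.add(res.status === 201);
  incomeDuration.add(res.timings.duration);
}
```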
Okay, we have something new here. Just like I said, we'd touch on options and metrics. Note that I left out the setup and teardown functions; they're still there in the real test script file.
First, we have options:
vus: maximum number of virtual users to be simulated
duration: duration of the test
thresholds: criteria of the test
We say that a test passes if its results stay inside the boundaries of our defined thresholds. In the example above, 75% of all HTTP request durations must be below 2 seconds. If not, the test fails, and with the abortOnFail setting it can even be aborted early.
Obviously, there are more options you could use. You can get the full list here.
Next, the check function. As the name suggests, it checks some requirements against each response: if the returned response is not successful or the HTTP call duration is above 2 seconds, the checks fail. The result you'll get would be similar to this.
See that 298 checks failed, which means those HTTP requests took longer than 2 seconds or didn't return a successful response.
As you might know, after the test is completed we get results such as http_req_duration, http_req_failed, http_req_waiting, and so on. These are the built-in metrics, but you can also make your own using Trend and Rate (there are also Gauge and Counter).
In the script, we instantiate the metrics and give them names, and then inside the VU code we add the result of each HTTP request to the Rate and the Trend. But wait! What are Rate and Trend?
Rate tracks the percentage of added values that are non-zero. Imagine adding a zero or a one on each iteration of a loop, then dividing the count of ones by the number of iterations: that quotient is the rate.
Meanwhile, Trend is like a running statistic of the test: it gives you the average, min, max, and percentiles.
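To make the Rate idea concrete, here's a plain JavaScript illustration (not k6 code) of that computation:

```javascript
// Illustration of how a Rate metric works: the rate is the
// share of added values that are non-zero.
function computeRate(values) {
  const nonZero = values.filter((v) => v !== 0).length;
  return nonZero / values.length;
}

// Three successes (1) out of four samples -> a rate of 0.75
console.log(computeRate([1, 0, 1, 1])); // 0.75
```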
Load Test Variations
In this section, we'll talk about some variations of load tests: load testing, smoke testing, stress testing, spike testing, and soak testing.
Load Testing
What we've done so far is load testing, because we want to assess the performance of our system. Typically, we use load testing to determine how our system behaves under two conditions: normal and peak traffic. It's also pretty common to perform load testing continuously to make sure the system's performance stays within the desired values.
Generally, you only have to change the options to produce the load testing variations. For example, the options you'd need look something like this:
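A sketch of such options (the numbers are illustrative):

```javascript
export const options = {
  stages: [
    { duration: '5m', target: 100 },  // ramp up to 100 users over 5 minutes
    { duration: '10m', target: 100 }, // stay at 100 users for 10 minutes
    { duration: '5m', target: 0 },    // ramp down to 0 users to simulate recovery
  ],
};
```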
We haven't talked about stages until now. stages describes the intended traffic shape. In the example, the number of users ramps up to 100 over 5 minutes, stays at 100 for the next 10 minutes, and finally ramps down to 0 users to simulate recovery.
Smoke Testing
When writing a test script, there will be some sanity checks. Is the test script already correct? Is it doing what we want it to do?
Obviously, you don't want to set the test duration to one hour while you're still writing your test script. It's a plain waste of time to wait an hour just to find out whether the script is correct.
That's why the options for smoke testing generally look like this:
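For example, a smoke-test options sketch could be:

```javascript
export const options = {
  vus: 1,          // a single virtual user
  duration: '10s', // just long enough to validate the script
};
```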
You should keep the number of users and the duration to a minimum.
Stress Testing
Let’s say that you are working in an e-commerce company and you want to know how your system behaves under high-sale traffic. What you need to do is to perform stress testing on your system.
When you are doing stress testing, you’ll go beyond your typical traffic. So, it’s certainly risky to do stress testing in a production environment. It’s okay to test it on your local machine or staging environment.
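A stress test ramps up in steps beyond your normal traffic. An illustrative options sketch (the targets are made-up numbers; pick values around and beyond your system's expected capacity):

```javascript
export const options = {
  stages: [
    { duration: '2m', target: 100 },  // ramp up to normal load
    { duration: '5m', target: 100 },  // hold at normal load
    { duration: '2m', target: 200 },  // ramp up around the breaking point
    { duration: '5m', target: 200 },
    { duration: '2m', target: 300 },  // push beyond the breaking point
    { duration: '5m', target: 300 },
    { duration: '10m', target: 0 },   // scale down, recovery stage
  ],
};
```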
Spike Testing
Spike testing is similar to stress testing: we want to test our system under extreme conditions. The difference is that while stress testing ramps up to the target through longer stages, spike testing jumps straight to the extreme condition, simulating a sudden surge of traffic.
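An illustrative spike-test options sketch: a short warm-up, then a jump straight to an extreme target (the numbers are made up):

```javascript
export const options = {
  stages: [
    { duration: '10s', target: 100 },  // warm up at normal load
    { duration: '1m', target: 100 },
    { duration: '10s', target: 1400 }, // spike straight to extreme load
    { duration: '3m', target: 1400 },  // hold the spike
    { duration: '10s', target: 100 },  // scale back down to normal
    { duration: '3m', target: 100 },
    { duration: '10s', target: 0 },    // ramp down to 0, recovery stage
  ],
};
```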
Soak Testing
Soak testing assesses the reliability of the system over a long period of time. A soak test uncovers performance and reliability issues stemming from a system being under pressure for an extended period.
Reliability issues typically relate to bugs, memory leaks, insufficient storage quotas, incorrect configuration or infrastructure failures. Performance issues typically relate to incorrect database tuning, memory leaks, resource leaks or a large amount of data.
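A soak test mostly just stretches the timeline: an illustrative options sketch holds a typical load for hours (the numbers are made up):

```javascript
export const options = {
  stages: [
    { duration: '2m', target: 400 }, // ramp up to a typical load
    { duration: '4h', target: 400 }, // hold it there for hours
    { duration: '2m', target: 0 },   // ramp down
  ],
};
```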
Conclusion
There is more to k6 than we've covered in this article. There are many more options you could use: scenarios if you need advanced user behaviour, saving test results to a CSV or JSON file, having a dashboard for presentation, and so on.
I'd say the k6 documentation is easy to navigate and comprehensible; everything is neatly written for us. So don't hesitate to read the official documentation directly.
Lastly, I always bring this up when talking about performance: we should not do performance tweaking before the code is correct and complete. Make sure the code is clean and maintainable first; then it's time to tweak and find better solutions to improve performance. Otherwise, as Donald Knuth warned, we'll be trapped in premature optimization.
You can have the full test script here: https://github.com/agusrichard/javascript-workbook/tree/master/k6-article-material
Thank you for reading and happy testing!