Load Testing with k6

A bunch of load testing capabilities with only one tool

Agus Richard
Level Up Coding


Photo by Todd Quackenbush on Unsplash

Testing is fascinating, especially when we're testing our own creation. At the very least, after we code something, we first test it manually against a set of common cases. If it works well, we submit the code.

But at some point we have to make sure that what we built is reliable. It may work well under certain conditions, like the ones we've tested manually. Yet how can we make sure our code behaves the same way under other conditions, for example when it's bombarded by a lot of requests? There must be a better way to test that than asking your friends to manually (and in parallel) make requests to your application. Luckily, there is. We'll learn how to load test our application with k6 under several conditions.

Installation

There are several ways to install k6 depending on your operating system or environment, but in this article I'll only cover two of them: Linux/Ubuntu and Docker.

Quoting directly from the k6 documentation, you can install k6 on Linux/Ubuntu by running these commands in your terminal:

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
echo "deb https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
sudo apt-get update
sudo apt-get install k6

With Docker, things get a little bit easier. You can install it using this command:

docker pull loadimpact/k6

Since Docker makes our life so much easier, I'll use it frequently in this article, but the essence stays the same. You can find the full installation guide here: https://k6.io/docs/getting-started/installation/.

The Basics

For the purpose of this article, I’ll load test one of my practice projects. But I’ll also explain the detail of the API, so you can still follow along.

If you want to follow along with this article exactly, you can find the application that will be tested here. The application itself is a simple financial manager, where you can register, sign in, create income or expense categories, and write down your income or expenses. You'll find the details on how to set up the server in the readme file.

Let’s start testing the base URL. Assume that the base URL is http://localhost:3000 and returns a message ‘Welcome to the API’. The test script would be:
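A minimal version of that script, saved as script.js, could look like this (the check on the message body assumes the response contains the text above):

```javascript
// script.js -- a minimal k6 test against the base URL
import http from 'k6/http';
import { check } from 'k6';

export default function () {
  const res = http.get('http://localhost:3000');

  // Verify both the status code and the welcome message
  check(res, {
    'status is 200': (r) => r.status === 200,
    'welcome message returned': (r) => r.body.includes('Welcome to the API'),
  });
}
```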

First, we import the dependencies at the top. Note that under the hood k6 doesn't run on NodeJS, since JavaScript is generally not well suited for high performance. k6 is written in Go to achieve the desired high-performance testing.

The test itself runs inside the exported default function. This part of the code is what is usually called VU code. By default the test runs one iteration with a single virtual user (think of this as a real user, but simulated), but you can change that using options. We'll talk about VU code and options later on.

If you're facing a problem like connection refused when using Docker, you need to change localhost to the IP address of the host. You can find it with docker inspect <container_id_or_name>, or, if you're on a Linux system, with hostname -I. You'll get something like 192.xxx.xxx.xx. Then replace localhost with that IP address.

You can run this test by running the command:

// CLI
k6 run script.js
// Docker
docker run -i loadimpact/k6 run - <script.js

You’ll see the result of the test right away on the terminal. Something similar to this.

Author’s Image

See that when we don't provide options, the test runs once and uses only one virtual user. The numbers at the bottom are the built-in metrics, such as data_received, http_req_duration, http_req_failed, vus, etc. For example, http_req_failed is the rate of failed requests according to setResponseCallback; by default, requests with status codes between 200 and 399 are considered "expected". We'll see how to make custom metrics later on.

Life Cycle

There are four stages in a test: init, then setup, VU, and teardown.

In general, the pattern is like this
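One way to lay those four stages out in a single script:

```javascript
// The four life-cycle stages of a k6 script

// 1. init code: runs once per VU; imports and global declarations live here
import http from 'k6/http';

export function setup() {
  // 2. setup code: runs once, before the test; its return value is
  // passed to the VU code and to teardown
  return { baseUrl: 'http://localhost:3000' };
}

export default function (data) {
  // 3. VU code: runs in a loop for every virtual user
  http.get(data.baseUrl);
}

export function teardown(data) {
  // 4. teardown code: runs once, after the last VU iteration
}
```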

The init code runs once per virtual user. You can think of a virtual user as a real user, except that it's simulated and all of them do the same thing: their behaviour is defined inside the default function (the VU code). You can import modules here to be used later inside setup, the VU code, or teardown.

The setup and teardown stages are pretty much the same as in any other testing tool. You'll need them if you want to do something before and after the test runs. setup is called after init but before the VU code, and teardown is called after the last VU iteration.

Then we have the VU code. We define the behaviour of the virtual users here: calling the API, checking whether the response we get is correct, saving the result in a metric, etc. Basically, this section is where the real test happens. It runs in a loop for every virtual user, for a defined duration or set of stages (you can set this in options).

The Real Tests

In this section, we’ll talk about options, metrics, thresholds, and life-cycle in real testing implementation.

Start with the setup function.

In this setup function, we register our dummy user and then log in to get the access token, because all the APIs we want to test later are protected by authentication middleware.

Before making any income or expense history, we need to create its categories, e.g., shopping, investment, taxes, etc.

After we get the token (including the id and email, for demonstration) and the newly created income/expense types, we can return these values so they can be used later to access the APIs under test.

Note that we have to stringify the payload and set the content-type to JSON in the params; otherwise, the data would be sent in form-data format by default.
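Putting those steps together, a sketch of such a setup function might look like this. The endpoint paths, payload fields, and response shapes here are assumptions about the demo API, so adjust them to the real routes:

```javascript
import http from 'k6/http';

const BASE_URL = 'http://localhost:3000';
const HEADERS = { 'Content-Type': 'application/json' };

export function setup() {
  const user = { email: 'dummy@test.com', password: 'password123' };

  // Register a dummy user, then log in to obtain the access token
  http.post(`${BASE_URL}/register`, JSON.stringify(user), { headers: HEADERS });
  const loginRes = http.post(`${BASE_URL}/login`, JSON.stringify(user), { headers: HEADERS });
  const { token, id, email } = loginRes.json();

  const authHeaders = { ...HEADERS, Authorization: `Bearer ${token}` };

  // Create the income/expense categories the tests will write into
  const incomeTypeRes = http.post(
    `${BASE_URL}/income-types`,
    JSON.stringify({ name: 'investment' }),
    { headers: authHeaders }
  );
  const expenseTypeRes = http.post(
    `${BASE_URL}/expense-types`,
    JSON.stringify({ name: 'shopping' }),
    { headers: authHeaders }
  );

  // Everything returned here is passed to the VU code and to teardown
  return {
    token,
    id,
    email,
    incomeTypeId: incomeTypeRes.json().id,
    expenseTypeId: expenseTypeRes.json().id,
  };
}
```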

Next is the teardown function.

The teardown function is relatively simple. What we're doing here is truncating the tables we filled, so they can be reused in the next testing session. But to clear the database, we need an access token. Luckily, we have that inside the data object.
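A sketch of such a teardown function. The /truncate endpoint is an assumption about the demo API; adjust it to whatever your cleanup route actually is:

```javascript
import http from 'k6/http';

const BASE_URL = 'http://localhost:3000';

export function teardown(data) {
  // Clear the test data using the token returned from setup()
  http.post(`${BASE_URL}/truncate`, null, {
    headers: { Authorization: `Bearer ${data.token}` },
  });
}
```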

Now it’s time to test the core APIs.

Okay, we have something new here. As promised, we'll touch on options and metrics. Note that I left out the setup and teardown functions; they're still there in the real test script file.
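Leaving setup and teardown out, the shape of such a test script might look like this. The endpoint, payload fields, and the data passed in from setup are assumptions about the demo API:

```javascript
import http from 'k6/http';
import { check } from 'k6';
import { Rate, Trend } from 'k6/metrics';

// Custom metrics
const successRate = new Rate('success_rate');
const durationTrend = new Trend('duration_trend');

export const options = {
  vus: 100,
  duration: '30s',
  thresholds: {
    // 75% of request durations must stay below 2 seconds
    http_req_duration: ['p(75)<2000'],
  },
};

// setup() and teardown() omitted here; data comes from setup()

export default function (data) {
  const res = http.post(
    'http://localhost:3000/incomes',
    JSON.stringify({ amount: 1000, incomeTypeId: data.incomeTypeId }),
    {
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${data.token}`,
      },
    }
  );

  check(res, {
    'is success': (r) => r.status === 200,
    'is fast enough': (r) => r.timings.duration < 2000,
  });

  // Feed the custom metrics
  successRate.add(res.status === 200);
  durationTrend.add(res.timings.duration);
}
```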

First, we have options:

  • vus: the maximum number of virtual users to simulate
  • duration: the total duration of the test
  • thresholds: the pass/fail criteria for the test

We say a test passes if its results stay inside the boundaries of our defined thresholds. In the example above, 75% of HTTP request durations must be below 2 seconds. If not, the test fails (and it can even be aborted early if you configure the threshold to abort on failure).

Obviously, there are more options you could use. You can get the full list here.

Next, the check function. Can you guess what it does? Yes, it checks some requirements: if the returned response is not successful, or the HTTP call takes longer than 2 seconds, the checks fail. The result you get will look similar to this.

Author’s Image

See that 298 checks failed, which means some HTTP requests took longer than 2 seconds.

As you've seen, after the test is completed we get results such as http_req_duration, http_req_failed, http_req_waiting, and so on. These are the built-in metrics, but you can make your own metrics too, using Trend and Rate (there are also Gauge and Counter).

Near the top of the script, we instantiate the metrics and give them names. Then, inside the VU code, we add the result of each HTTP request to the Rate and the Trend. But wait! What are Rate and Trend?

Rate is the percentage of added values that are non-zero. Imagine adding either a zero or a one on each iteration of a loop; divide the count of non-zero values by the total number of additions and you get the rate.
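To make that concrete, the arithmetic behind Rate can be reproduced in a few lines of plain JavaScript (this is just the math, not the k6 API):

```javascript
// The arithmetic behind a Rate metric: the share of non-zero samples.
function rate(samples) {
  const nonZero = samples.filter((v) => v !== 0).length;
  return nonZero / samples.length;
}

// Three successes out of four samples
console.log(rate([1, 1, 0, 1])); // 0.75
```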

Meanwhile, Trend keeps statistics over the added values: it gives you the average, min, max, and percentiles.

Load Test Variations

In this section, we'll talk about some variations of load tests: load testing, smoke testing, stress testing, spike testing, and soak testing.

Load Testing

What we’ve done so far is load testing. Because we want to assess the performance of our system. Typically, we need load testing to determine how our system will behave under two conditions, normal and peak traffic. Also, it’s pretty common to continuously perform load testing to make sure the performance of our system is still within the desired value.

Generally, you only have to change the options to run the different load-test variations. For example, the options you'd need look something like this:

We haven't talked about stages until now. stages describes the intended traffic shape: for example, ramping up to 100 users over 5 minutes, staying at 100 users for the next 10 minutes, and finally ramping down to 0 users to simulate recovery.
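Such a load-test profile could be expressed like this (the numbers are illustrative):

```javascript
export const options = {
  stages: [
    { duration: '5m', target: 100 },  // ramp up to 100 users over 5 minutes
    { duration: '10m', target: 100 }, // stay at 100 users for 10 minutes
    { duration: '5m', target: 0 },    // ramp down to 0 users (recovery)
  ],
};
```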

Smoke Testing

When writing a test script, there will be some sanity checks. Is the test script already correct? Is it doing what we want it to do?

Obviously, you don't want to set the test duration to one hour while you're still writing your test script. It's a plain waste of time to wait an hour just to see whether the script is correct.

That's why the options for smoke testing would generally look like this:

You should keep the number of users and the duration to a minimum.
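For example, a single user and a very short run is enough to catch script errors:

```javascript
export const options = {
  vus: 1,          // a single virtual user
  duration: '10s', // just long enough to exercise the script
};
```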

Stress Testing

Let’s say that you are working in an e-commerce company and you want to know how your system behaves under high-sale traffic. What you need to do is to perform stress testing on your system.

Stress Testing is a type of load testing used to determine the limits of the system. The purpose of this test is to verify the stability and reliability of the system under extreme conditions.

When you are doing stress testing, you'll go beyond your typical traffic, so it's certainly risky to run a stress test in a production environment. It's better to run it against your local machine or a staging environment.
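A typical stress-test profile ramps up in steps, pushing past normal traffic before recovering. The numbers here are illustrative:

```javascript
export const options = {
  stages: [
    { duration: '2m', target: 100 },  // below normal load
    { duration: '5m', target: 100 },
    { duration: '2m', target: 200 },  // normal load
    { duration: '5m', target: 200 },
    { duration: '2m', target: 300 },  // around the breaking point
    { duration: '5m', target: 300 },
    { duration: '2m', target: 400 },  // beyond the breaking point
    { duration: '5m', target: 400 },
    { duration: '10m', target: 0 },   // scale down, recovery stage
  ],
};
```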

Spike Testing

Spike testing is similar to stress testing: we want to test our system under extreme conditions. The difference is that while stress testing ramps up to the target through longer stages, spike testing jumps almost straight to the extreme condition, simulating a sudden surge of traffic.
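A spike-test profile, by contrast, goes from normal load to the extreme target in seconds (numbers illustrative):

```javascript
export const options = {
  stages: [
    { duration: '10s', target: 100 },  // below normal load
    { duration: '1m', target: 100 },
    { duration: '10s', target: 1400 }, // spike to extreme load
    { duration: '3m', target: 1400 },
    { duration: '10s', target: 100 },  // scale down, recovery
    { duration: '3m', target: 100 },
    { duration: '10s', target: 0 },
  ],
};
```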

Soak Testing

Soak testing is assessing the reliability of the system over a long period of time. The soak test uncovers performance and reliability issues stemming from a system being under pressure for an extended period.

Reliability issues typically relate to bugs, memory leaks, insufficient storage quotas, incorrect configuration or infrastructure failures. Performance issues typically relate to incorrect database tuning, memory leaks, resource leaks or a large amount of data.
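A soak test therefore uses a moderate, sustained load over hours rather than an extreme one. For example (numbers illustrative):

```javascript
export const options = {
  stages: [
    { duration: '2m', target: 400 },    // ramp up to a typical load
    { duration: '3h56m', target: 400 }, // stay there for hours
    { duration: '2m', target: 0 },      // ramp down
  ],
};
```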

Conclusion

There is more to k6 than we've covered in this article: many more options, scenarios for advanced user behaviour, saving the test results to a CSV or JSON file, dashboards for presentation, etc.

I’d say that k6 documentation is easy to navigate and comprehensible. All things are neatly written for us. So, don’t hesitate to read directly on the official documentation.

Lastly, I always bring this up when talking about performance: we shouldn't start performance tweaking before the code is complete and running. Make sure the code is clean and maintainable first; then it's time to tweak and find better solutions to improve performance. Otherwise, as Donald Knuth warned, we'll be trapped in premature optimization.

You can have the full test script here: https://github.com/agusrichard/javascript-workbook/tree/master/k6-article-material

Thank you for reading and happy testing!
