# 09-05 Reducing CPU load

# Problem: CPU under heavy load

I found that Docker has enough memory (only 1.5% is used); the CPU load is the culprit here. Under minimum load (4 beacons, 1 client), 50 measurements per second are ingested into the API and processed. Ramped up to full load for multi-antenna support (3 clients, 8 beacons), there are up to 250 measurements per second to handle.

Below are various outputs of the docker stats command, which shows the resource usage of all containers. Note that the CPU is overclocked to 4.25 GHz and 8 threads are allotted to Docker, so extending the hardware is not a realistic option: the setup is already very beefy.

One active client and only four beacons, all jobs pass

CONTAINER ID     NAME        CPU %       MEM USAGE / LIMIT     MEM %
291ad2e05d43     ips-api     72.61%      57.7MiB / 3.848GiB    1.46%

Two active clients and only four beacons, most jobs pass

CONTAINER ID     NAME        CPU %       MEM USAGE / LIMIT     MEM %
291ad2e05d43     ips-api     105.86%     69.95MiB / 3.847GiB   1.78%

Three active clients and only four beacons, every single job fails

CONTAINER ID     NAME        CPU %       MEM USAGE / LIMIT     MEM %
291ad2e05d43     ips-api     129.85%     93.85MiB / 3.847GiB   2.38%

When submitted to a full load of up to 245 measurements per second but with the queuing system turned off, the maximum load on the API is considerably lower. Disconnecting the queue worker that averages values clearly removed the overhead of spawning new queue entries, which are saved in Redis and even in SQL. I suspect two things massively slow everything down. First, failed jobs are saved in MySQL, which takes quite long to write to (with 250 measurements per second incoming, every millisecond counts). Second, the queue worker tries to work through every single task, regardless of the job's duration. If a job takes longer than the time until the next measurement arrives (on average 1000 ms / 250 = 4 ms), the queue grows to an obscene length and the worker is permanently overloaded (see the sketch after the stats below).

CONTAINER ID     NAME        CPU %       MEM USAGE / LIMIT     MEM %
291ad2e05d43     ips-api     46.05%      49.27MiB / 3.848GiB   1.25%
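To make the overload argument concrete, here is a minimal back-of-the-envelope sketch. The 10 ms job duration is an assumption chosen for illustration, not a measured value.

```python
# Back-of-the-envelope model of the queue backlog under full load.
ARRIVAL_RATE = 250       # measurements per second hitting the API
JOB_DURATION_S = 0.010   # assumed time per averaging job (illustrative, not measured)
WORKERS = 1              # a queue worker is a single-threaded process

service_rate = WORKERS / JOB_DURATION_S       # jobs the worker can finish per second
backlog_growth = ARRIVAL_RATE - service_rate  # jobs added to the queue per second

print(f"inter-arrival time: {1000 / ARRIVAL_RATE:.0f} ms")  # 4 ms between measurements
print(f"worker capacity:    {service_rate:.0f} jobs/s")     # 100 jobs/s
print(f"backlog growth:     {backlog_growth:+.0f} jobs/s")  # +150 jobs/s: the queue never drains
```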

# Possible solutions

# Client Side Preprocessing

Before anything is ever sent out to the API, one could average the measurements in the client's pool, so the API has less processing to do. I do not want this because it forces the client to do something it is not responsible for: its sole purpose is to pass measurements on to the API as fast as possible, and it should not have to think at all. It is also mathematically questionable to first average on the client side and then average again after combining the data from all clients (which is the whole point of spatial diversity), because the result only matches a single average over the raw pool when every client contributes the same number of measurements (see the sketch below).
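A minimal sketch of that last point, with made-up RSSI values: when clients contribute different numbers of measurements, the average of per-client averages differs from the average over the raw pool.

```python
# Per-client averaging followed by a second averaging step is not the same
# as averaging the raw pool once. The RSSI values below are made up.
client_a = [-60, -62, -61, -59]   # 4 measurements for one beacon
client_b = [-70, -72]             # only 2 measurements for the same beacon

pooled_avg = sum(client_a + client_b) / len(client_a + client_b)
avg_of_avgs = (sum(client_a) / len(client_a) + sum(client_b) / len(client_b)) / 2

print(pooled_avg)    # -64.0  (every raw measurement weighted equally)
print(avg_of_avgs)   # -65.75 (client_b's two samples weigh as much as client_a's four)
```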

# Disable SQL logging

Something that might considerably slow the system down is the write time to the database when a job fails, and that slowdown only makes the overloading of the server worse. By setting the failed-job database driver in the config to null, as suggested in this Laracasts post, the failure handling is possibly faster.

# Scale containers or use a process supervisor

Another possible solution is to scale the application using the built-in docker-compose up --scale command. This would let me spin up a few API containers, giving me many Laravel instances (and many queues + workers). This is worth a try, because I am already using a load balancer and Docker takes care of DNS to the containers. Requests get load balanced and are automatically distributed to one of the containers, which can hopefully use another thread.

Problem: I am not sure this is even going to help, because the host machine is still the same and has the same amount of resources. I hope that having many containers on the same system lets every worker use those resources more easily. As far as I know, containers already have access to all cores. The main gain would probably be that there are many queue stacks and many workers, so the Laravel side would be used more efficiently, because each queue worker is a single-threaded process. The same could be achieved by using Supervisor to spawn multiple worker processes.

# Move averaging to trilaterator

Currently a moving average is calculated for every single incoming measurement, which means up to 245 averages per second are calculated by a single thread. What is odd about this is that the API does this work at all, since its main job should only be saving raw values in Redis. Moving this functionality to the Python side has large potential to remove redundant work: the trilaterator only needs to know the current RSSI average about 4 times a second, so only 32 moving averages (4 readings * 8 beacons * 1 client) have to be calculated per second. That is considerably better than the original 250, most of which are never even used by the trilaterator.
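A minimal sketch of what that could look like on the trilaterator side, assuming the API pushes raw RSSI values onto per-beacon Redis lists; the key names, window size, and update rate below are assumptions, not the actual schema.

```python
# Sketch: on-demand averaging in the Python trilaterator instead of
# per-measurement averaging in the API. Assumes raw RSSI values are LPUSHed
# onto per-beacon Redis lists, so the newest values sit at the head
# (an assumption, not the real schema).
from typing import Optional
import time

import redis

r = redis.Redis(host="ips-redis", port=6379, decode_responses=True)

BEACON_KEYS = [f"rssi:beacon:{i}" for i in range(1, 9)]  # hypothetical key names
WINDOW = 20          # number of most recent raw measurements to average
UPDATE_RATE_HZ = 4   # the trilaterator only needs a position ~4 times per second


def current_average(key: str) -> Optional[float]:
    """Average the most recent raw RSSI values for one beacon, if any exist."""
    raw = r.lrange(key, 0, WINDOW - 1)
    if not raw:
        return None
    return sum(float(v) for v in raw) / len(raw)


while True:
    averages = {key: current_average(key) for key in BEACON_KEYS}
    # ... feed `averages` into the trilateration step here ...
    time.sleep(1 / UPDATE_RATE_HZ)  # 8 beacons * 4 Hz = 32 averages per second
```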

# Next steps

I think I will first try to scale the number of queue workers, either with a process supervisor or simply by tripling the number of containers, because that seems like the least amount of work. It is still unknown whether a multi-client setup even improves positioning accuracy, so I do not want to redo the entire backend yet. If this does not work, I will migrate the averaging functionality to the trilaterator, which will probably take quite a bit of time.
