Benchmarking OS primitives

Lately I have taken some interest in the hardware and software of C++ build servers. One of the things that I have noticed is that there is a significant performance difference between Windows and Linux machines for common build tasks, such as cloning a git repository, running CMake and caching build results.

Some of these differences can obviously be explained by the fact that the software was originally designed and optimized for Linux (which is especially true for git), but surely there must be underlying differences in the operating systems that contribute to this too.

To get a clearer picture, I set out to benchmark some typical core primitives of operating systems. This includes:

  • Process and thread creation.
  • File creation.
  • Memory allocation.

These primitives are used heavily by software build toolchains, as well as by many other kinds of software (e.g. VCS clients/servers, web servers, etc.).

The benchmark suite

I wrote a set of simple micro benchmarks in plain C. You can find the source code on GitHub. I have built and run the benchmark programs on Linux, Windows and macOS, and they should be very portable and run on most Unix-like operating systems.

Pre-built binaries for Windows (64-bit, compiled with GCC): osbench-win64-20170529.zip

The test systems

Name        | OS              | CPU                         | Disk
------------+-----------------+-----------------------------+-----------------
Linux-i7x4  | Ubuntu 16.04    | i7-6820HQ, 4-core, 2.7GHz   | 256GB SSD (SATA)
Linux-i7x8  | Ubuntu 16.10    | i7-6900K, 8-core, 3.2GHz    | 1TB SSD (NVMe)
Linux-AMDx8 | Fedora 25       | Ryzen 1800X, 8-core, 3.6GHz | 250GB SSD (NVMe)
RaspberryPi | Raspbian Jessie | ARMv7, 4-core, 1.2GHz       | 32GB MicroSD
MacBookPro  | macOS 10.12.4   | i5-6360U, 2-core, 2GHz      | 250GB SSD (NVMe)
MacMini     | macOS 10.12.5   | i7-3615QM, 4-core, 2.3GHz   | 1TB HDD (SATA)
Win-i7x4    | Win 10 Pro      | i7-6820HQ, 4-core, 2.7GHz   | 256GB SSD (SATA)
Win-AMDx8   | Win 10 Pro      | Ryzen 1800X, 8-core, 3.6GHz | 250GB SSD (NVMe)

Apart from the Raspberry Pi3, these are mostly fairly high-end systems. Note also that Linux-AMDx8 and Win-AMDx8 have identical hardware, as do Linux-i7x4 and Win-i7x4.

Other than that, all systems use stock configurations (without any particular tuning), except for Win-i7x4, which had some (unfortunately unknown) third-party antivirus software installed.

The results

Creating threads

In this benchmark 100 threads are created. Each thread terminates immediately without doing any work, and the main thread waits for all child threads to terminate. The time it takes for a single thread to start and terminate is measured.
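The thread benchmark can be sketched in C roughly as follows. This is not the code from the osbench repository, only a minimal illustration assuming a POSIX system with pthreads; the function names (bench_create_threads, now_seconds), the upper limit of 1000 threads and the microsecond unit are my own choices. A Windows port would use the Win32 thread API instead.

```c
/* Thread creation sketch (illustrative, not the osbench source).
 * Assumes POSIX threads; on older glibc, link with -pthread. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Current time in seconds from a monotonic clock. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Thread body: terminate immediately without doing any work. */
static void *empty_thread(void *arg) {
    (void)arg;
    return NULL;
}

/* Starts num_threads threads, waits for all of them to finish and returns
 * the average time per thread in microseconds, or -1.0 if a create failed. */
double bench_create_threads(int num_threads) {
    enum { MAX_THREADS = 1000 };
    pthread_t threads[MAX_THREADS];
    assert(num_threads > 0 && num_threads <= MAX_THREADS);

    double t0 = now_seconds();
    int created = 0;
    while (created < num_threads &&
           pthread_create(&threads[created], NULL, empty_thread, NULL) == 0)
        ++created;

    /* Join every thread that was started, even after a failed create. */
    for (int i = 0; i < created; ++i)
        pthread_join(threads[i], NULL);
    double t1 = now_seconds();

    if (created != num_threads)
        return -1.0;
    return (t1 - t0) * 1e6 / num_threads;
}
```

Calling bench_create_threads(100) from your own main() corresponds to the setup described above. Note that the measured time includes the main thread waiting in pthread_join(), which matches "the main thread waits for all child threads to terminate".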

[Figure: Create thread]

Apparently macOS is about twice as fast as Windows at creating threads, whereas Linux is about three times faster than Windows.

Creating processes

This benchmark is almost identical to the previous benchmark. However, here 100 child processes are created and terminated (using fork() and waitpid()). Unfortunately Windows does not have any corresponding functionality, so only Linux and macOS were benchmarked.
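A minimal sketch of the fork() + waitpid() pattern, again assuming a POSIX system. It is not taken from osbench; bench_create_processes, the limit of 1000 children and the microsecond unit are illustrative. Each child calls _exit() right away, so it skips the parent's atexit handlers and does not flush inherited stdio buffers.

```c
/* Process creation sketch (illustrative, not the osbench source).
 * Assumes a POSIX system with fork() and waitpid(). */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Current time in seconds from a monotonic clock. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Forks num_procs children that exit immediately, waits for all of them
 * and returns the average time per process in microseconds, or -1.0 if a
 * fork failed. */
double bench_create_processes(int num_procs) {
    enum { MAX_PROCS = 1000 };
    pid_t pids[MAX_PROCS];
    assert(num_procs > 0 && num_procs <= MAX_PROCS);

    double t0 = now_seconds();
    int created = 0;
    while (created < num_procs) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);  /* child: terminate at once, without atexit handlers */
        if (pid < 0)
            break;     /* parent: fork failed */
        pids[created++] = pid;
    }

    /* Reap every child that was started. */
    for (int i = 0; i < created; ++i)
        waitpid(pids[i], NULL, 0);
    double t1 = now_seconds();

    if (created != num_procs)
        return -1.0;
    return (t1 - t0) * 1e6 / num_procs;
}
```

Comparing bench_create_processes(100) with the thread version on the same machine gives the process-versus-thread cost ratio discussed below.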

[Figure: Create process]

Again, Linux comes out on top. It is actually quite impressive that creating a process is only about 2-3x as expensive as creating a thread under Linux (the corresponding figure for macOS is about 7-8x).

Launching programs

Launching a program is essentially an extension of process creation: in addition to creating a new process, a program is loaded and executed (the program consists of an empty main() function and exits immediately). On Linux and macOS this is done using fork() + exec(), and on Windows it is done using CreateProcess().
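On the Unix side, the launch benchmark boils down to fork() followed by an exec-family call in the child. The sketch below is a hypothetical illustration, not the osbench code: it uses execvp() (so either a full path or a program name found on PATH works), launches one program at a time, and treats a non-zero exit status as a failure. The Windows counterpart based on CreateProcess() is not shown.

```c
/* Program launch sketch (illustrative, not the osbench source).
 * Assumes a POSIX system; uses fork() + execvp(). */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Current time in seconds from a monotonic clock. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Launches the program `path` num_launches times, one after another.
 * Returns the average time per launch in microseconds, or -1.0 if any
 * launch failed or the program exited with a non-zero status. */
double bench_launch_program(const char *path, int num_launches) {
    assert(path != NULL && num_launches > 0);
    int failures = 0;

    double t0 = now_seconds();
    for (int i = 0; i < num_launches; ++i) {
        pid_t pid = fork();
        if (pid < 0)
            return -1.0;  /* fork failed */
        if (pid == 0) {
            /* child: replace this process image with the target program */
            char *const argv[] = { (char *)path, NULL };
            execvp(path, argv);
            _exit(127);   /* only reached if exec failed */
        }
        int status = 0;
        if (waitpid(pid, &status, 0) < 0 ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            ++failures;
    }
    double t1 = now_seconds();

    if (failures > 0)
        return -1.0;
    return (t1 - t0) * 1e6 / num_launches;
}
```

For example, bench_launch_program("./empty_main", 100) would time a program built from an empty main(); the name empty_main is only a placeholder.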

[Figure: Launch program]

Here Linux is notably faster than both macOS (~10x faster) and Windows (>20x faster). In fact, even a Raspberry Pi3 is faster than a stock Windows 10 Pro installation on an octa-core AMD Ryzen 1800X system!

It is worth noting that on Windows this benchmark is very sensitive to background services such as Windows Defender and other antivirus software.

The best results on Windows were achieved by Win-AMDx8*, which is the same system as Win-AMDx8 but with most performance-hogging services completely disabled (including Windows Defender and search indexing). However, this is not a practical solution, as it leaves the system completely unprotected and makes things like file search close to unusable.

The very poor result for Win-i7x4 is probably due to third party antivirus software.

Creating files

In this benchmark, more than 65,000 files are created in a single folder, each filled with 32 bytes of data, and then deleted. The time to create and delete a single file is measured.
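The file benchmark can be approximated with plain POSIX calls: open() with O_CREAT, a 32-byte write(), close(), and finally unlink(). This sketch is my own illustration rather than the osbench source; bench_create_files, the file-N.dat naming scheme and the requirement that the target directory already exists are assumptions.

```c
/* File creation sketch (illustrative, not the osbench source).
 * Assumes a POSIX system and an existing, writable directory. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Current time in seconds from a monotonic clock. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Creates num_files files with 32 bytes each in dir, then deletes them.
 * Returns the average time per file (create + delete) in microseconds,
 * or -1.0 on error (files created before the error are left behind). */
double bench_create_files(const char *dir, int num_files) {
    static const char data[32] = "0123456789abcdef0123456789abcde"; /* 32 bytes incl. NUL */
    char path[4096];
    assert(dir != NULL && num_files > 0);

    double t0 = now_seconds();
    /* Phase 1: create each file and write the payload to it. */
    for (int i = 0; i < num_files; ++i) {
        int len = snprintf(path, sizeof path, "%s/file-%d.dat", dir, i);
        if (len < 0 || (size_t)len >= sizeof path)
            return -1.0;  /* path too long */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1.0;
        ssize_t written = write(fd, data, sizeof data);
        close(fd);
        if (written != (ssize_t)sizeof data)
            return -1.0;
    }
    /* Phase 2: delete all files again. */
    for (int i = 0; i < num_files; ++i) {
        snprintf(path, sizeof path, "%s/file-%d.dat", dir, i);
        if (unlink(path) != 0)
            return -1.0;
    }
    double t1 = now_seconds();
    return (t1 - t0) * 1e6 / num_files;
}
```

Pointing dir at a directory on the disk under test (or on a RAM disk, as in the Raspberry Pi comparison) and passing num_files = 65536 roughly mirrors the setup described above.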

Here is where things get silly…

[Figure: Create file]

Again, Win-AMDx8* has Windows Defender and search indexing etc. disabled.

Two tests were performed for the Raspberry Pi3: with a slow MicroSD card (RaspberryPi) and with a RAM disk (RaspberryPi-RAM). For the other Linux and macOS systems, a RAM disk did not have a significant performance impact (and I did not try a RAM disk for Windows).

Here are some interesting observations:

  • The best performing system (Linux-AMDx8) is over one thousand times faster than the worst performing system (Win-i7x4)!
  • Only with Windows Defender etc. disabled can an octa-core Windows system with a 3GB/s NVMe disk compete with a Raspberry Pi3 with a slow MicroSD memory card (the Pi wins easily when using a RAM disk though)!
  • Something is absolutely killing the file creation performance on Win-i7x4 (probably third party antivirus software).
  • Creating a file on Linux is really fast (over 100,000 files/s)!

Allocating memory

The memory allocation performance was measured by allocating 1,000,000 small memory blocks (4-128 bytes in size) and then freeing them again.
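A rough equivalent of the allocation benchmark is sketched below. It is not the osbench implementation: block sizes come from a small fixed-seed linear congruential generator (an assumption on my part, so that every run requests the same sequence of sizes), and the result is reported in nanoseconds per allocate/free pair. The generator runs inside the timed loop, which adds a small constant overhead.

```c
/* Memory allocation sketch (illustrative, not the osbench source). */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Current time in seconds from a monotonic clock. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Allocates num_blocks blocks of 4-128 bytes, then frees them all.
 * Returns the average time per block in nanoseconds, or -1.0 if an
 * allocation failed. */
double bench_alloc(int num_blocks) {
    assert(num_blocks > 0);
    void **blocks = malloc((size_t)num_blocks * sizeof *blocks);
    if (blocks == NULL)
        return -1.0;

    uint32_t state = 12345u;  /* fixed seed: same size sequence every run */
    double t0 = now_seconds();
    int allocated = 0;
    for (; allocated < num_blocks; ++allocated) {
        state = state * 1664525u + 1013904223u;  /* 32-bit LCG step */
        size_t size = 4 + (state >> 16) % 125;   /* 4..128 bytes */
        blocks[allocated] = malloc(size);
        if (blocks[allocated] == NULL)
            break;
    }
    /* Free everything that was successfully allocated. */
    for (int i = 0; i < allocated; ++i)
        free(blocks[i]);
    double t1 = now_seconds();

    free(blocks);
    if (allocated != num_blocks)
        return -1.0;
    return (t1 - t0) * 1e9 / num_blocks;
}
```

bench_alloc(1000000) matches the block count used in the benchmark. Since small-block allocation is handled mostly by the C library's allocator rather than the kernel, this result also depends on the libc in use.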

[Figure: Memory allocation]

This is the one benchmark where raw hardware performance seems to be the dominating factor. Even so, Linux is slightly faster than both Windows and macOS (even for equivalent hardware).

Conclusions

Some of the differences between the operating systems are staggering! I suspect that the poor process and file creation performance on Windows is to blame for the painfully slow git and CMake performance, for instance.

Obviously each operating system has its merits, but in general it seems that Linux > macOS > Windows when it comes to raw kernel and file system performance.

As a side note, I was quite surprised to find that Windows does not even offer anything similar to the standard Unix fork() functionality. This makes certain multiprocessing patterns unnecessarily cumbersome and expensive on Windows.

17 thoughts on “Benchmarking OS primitives”

  1. Johnny

    Any chance you could run the same tests on *BSD? OpenBSD and FreeBSD are of particular importance.

    1. Marcus Geelnard

      Unfortunately I don’t have a BSD setup, but if someone else does it would be very interesting to see the results. It should be easy to build and run the benchmarks (https://github.com/mbitsnbites/osbench).

    1. Marcus Geelnard

      Very interesting! I will at least try it on our build machines at work.

      Update: I disabled 8.3 names but it had no effect on the benchmark results. The machine is still 20x slower than a nearly identical Ubuntu machine on creating files.

  2. AdamK

    I’m sorry to say, but this kind of benchmark is useless. What are you benchmarking? OS? CPU architecture? Hardware? Compiler? Filesystem? API? Disk? If you want to benchmark a specific primitive of an OS, you need to use the same hardware in each case, and the same compiler (where possible; obviously, you can’t compile the Windows kernel yourself). Without that, there is no objective way to compare results.

    1. That’s exactly what I was going to write! The benchmark is useless because not a single parameter is constant except for the function/operation itself. You definitely have to test such things on the same hardware. You compare really outdated macOS hardware with relatively new hardware for Linux and Windows. The idea of the benchmark is really good, but please test it on the same hardware and repeat the test. That would be really interesting.

      1. Marcus Geelnard

        I realize that most of the machines have different hardware, so in a way it’s a case of apples and oranges. The main reason is of course that I used whatever hardware I had access to. However, I believe that there is enough information in these benchmark results to deduce certain things. For instance, the slowest hardware in the test (the Raspberry Pi) often outperforms both macOS and Windows.

        Also note that two pairs of identical hardware are included in the test (“Win-i7x4″/”Linux-i7x4” and “Win-AMDx8″/”Linux-AMDx8”). macOS is trickier since it pretty much requires installing Linux and Windows on the Mac (which I’m not prepared to do on my private Macs).

    2. Victor

      Hi AdamK,

      Useless? I don’t think so. Most of the time you evaluate the system as a whole (OS + hardware), e.g. a VPS. I use benchmarks like these to test whether a VPS is better or worse (general performance).

      Of course, if you have a specific scenario it is more accurate to test that than to run a general benchmark, but most of the time I’m looking for versatile systems that I can dedicate to completely different tasks.

      Regards

  3. The next blog post should be about making Windows 10 perform at the same level as MacOS & Linux! Please?

  4. Nico

    From HN: ‘”Launching Programs” should use posix_spawn at least on macOS, it’s a distinct syscall there and faster than fork + exec.’

    1. Marcus Geelnard

      I was more interested in fork + exec, since that’s what’s used by CMake, Apache httpd, nginx etc.

  5. Neo

    The process model in Windows is different from the *nix world, so CreateProcess is expensive and unchangeable.

  6. For these benchmarks to be most useful, you’d have to hold everything constant, as others have said. Also, we’d need to know a lot more about the SSDs. Just giving its capacity and interface isn’t enough. SSDs degrade over time, and can be very different depending on the amount of free space. Knowing the performance characteristics and situation with the SSDs will be critical.

    And of course, the brand and model of SSD is important to know. A factor like that could easily explain these results. On Windows and Linux, it’s also important to know whose NVMe driver you’re using – the operating system’s default NVMe driver or the SSD vendor’s.

    And you said: “The very poor result for Win-i7x4 is probably due to third party antivirus software.” That means you’ve got to get rid of that third-party AV software. That’s a major confound. I would just have a clean Windows 10 Pro system with Windows Defender as the AV.

    Also, are the Linux installs server or desktop editions? If they’re servers, I’d compare to Windows Server 2016 if you can get access to a box or a cloud instance. In fact, maybe just testing all the server OSes (Ubuntu, Fedora, Windows 2016, the BSDs) on the same cloud service would be a good idea, because you could hold the hardware and hypervisor constant, probably.

    I agree with the guy proposing BSD inclusion. That would be awesome. Here are FreeBSD optimization tips: https://calomel.org/freebsd_network_tuning.html

    1. Marcus Geelnard

      Thanks for the feedback, Jose.

      This post was really just a dump of my hobby measurements and findings since I thought that it could be of interest for more people. There are a lot of things going on in these figures, and sorting them out definitely needs more specific testing.

      For instance, I find the “poor result for Win-i7x4” very interesting, since it indicates that AV software can have severe performance impacts. However, we can’t really tell how large the impact is from these figures alone; we would need to dig deeper.

      On other axes we have the impact of different file systems, different CPU performance, different disk performance, drivers and kernel versions etc, etc, all of which are interesting on their own, but would require more specific tests.

      It would be very interesting to see people do more detailed investigations of these different aspects; unfortunately, I have neither the time nor the resources to do it myself.

      BTW, if you ignore all the results but the Linux-AMDx8 and Win-AMDx8 results, you will actually have identical hardware and age and a very clean/stock software setup.

      As a side project I recently set up a clean dual-boot Win 10 + Ubuntu 18.04 machine with stock OS configurations to compare performance (this time I was looking at Git, CMake and GCC, and depending on the task Windows was 10%-1000% slower than Linux on the exact same hardware with the exact same software versions).
