Lately I have taken some interest in the hardware and software of C++ build servers. One of the things that I have noticed is that there is a significant performance difference between Windows and Linux machines for common build tasks, such as cloning a git repository, running CMake and caching build results.
Some of these differences can obviously be explained by the fact that the software was originally designed and optimized for Linux (which is especially true for git), but surely there must be underlying differences in the operating systems that contribute to this too.
To get a clearer picture, I set out to benchmark some typical core primitives of operating systems. These include:
- Process and thread creation.
- File creation.
- Memory allocation.
The benchmark suite
I wrote a set of simple microbenchmarks in plain C. You can find the source code on GitHub. I have built and run the benchmark programs on Linux, Windows and macOS; they should be very portable and run on most Unix-like operating systems.
Pre-built binaries for Windows (64-bit, compiled with GCC): osbench-win64-20170529.zip
The test systems
| Name | Operating system | CPU | Disk |
|---|---|---|---|
| Linux-i7x4 | Ubuntu 16.04 | i7-6820HQ, 4-core, 2.7GHz | 256GB SSD (SATA) |
| Linux-i7x8 | Ubuntu 16.10 | i7-6900K, 8-core, 3.2GHz | 1TB SSD (NVMe) |
| Linux-AMDx8 | Fedora 25 | Ryzen 1800X, 8-core, 3.6GHz | 250GB SSD (NVMe) |
| RaspberryPi | Raspbian Jessie | ARMv7, 4-core, 1.2GHz | 32GB MicroSD |
| MacBookPro | macOS 10.12.4 | i5-6360U, 2-core, 2GHz | 250GB SSD (NVMe) |
| MacMini | macOS 10.12.5 | i7-3615QM, 4-core, 2.3GHz | 1TB HDD (SATA) |
| Win-i7x4 | Win 10 Pro | i7-6820HQ, 4-core, 2.7GHz | 256GB SSD (SATA) |
| Win-AMDx8 | Win 10 Pro | Ryzen 1800X, 8-core, 3.6GHz | 250GB SSD (NVMe) |
Except for the Raspberry Pi 3, these systems are fairly high end. Note also that Linux-AMDx8 and Win-AMDx8 run on identical hardware, as do Linux-i7x4 and Win-i7x4.
Other than that, all systems use stock configurations (without any particular tuning), except for Win-i7x4, which had some (unfortunately unknown) third-party antivirus software installed.
In this benchmark 100 threads are created. Each thread terminates immediately without doing any work, and the main thread waits for all child threads to terminate. The time it takes for a single thread to start and terminate is measured.
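As an illustration, the thread benchmark described above could be sketched roughly as follows with POSIX threads. This is not the actual benchmark source: the names `thread_fn`, `now_seconds` and `time_thread_creation` are hypothetical, and reporting the total time divided by the number of threads is an assumption about how the per-thread figure is derived.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

#define NUM_THREADS 100

/* Thread entry point: returns immediately without doing any work. */
static void *thread_fn(void *arg) {
    return arg;
}

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void) {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Creates NUM_THREADS threads, joins them all, and returns the average
 * time per thread in seconds (assumption: total time / thread count).
 * Returns a negative value if a thread could not be created. */
double time_thread_creation(void) {
    pthread_t threads[NUM_THREADS];
    int created = 0;
    int failed = 0;

    double start = now_seconds();
    for (int i = 0; i < NUM_THREADS; ++i) {
        if (pthread_create(&threads[i], NULL, thread_fn, NULL) != 0) {
            failed = 1;
            break;
        }
        ++created;
    }
    /* The main thread waits for every child thread to terminate. */
    for (int i = 0; i < created; ++i) {
        pthread_join(threads[i], NULL);
    }
    double elapsed = now_seconds() - start;

    return failed ? -1.0 : elapsed / NUM_THREADS;
}
```

Calling `time_thread_creation()` from a `main()` function and printing the result multiplied by 1e6 gives a per-thread figure in microseconds. On some systems the program needs to be built with `-pthread`.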
This benchmark is almost identical to the previous benchmark. However, here 100 child processes are created and terminated (using fork() and waitpid()). Unfortunately Windows does not have any corresponding functionality, so only Linux and macOS were benchmarked.
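The process variant can be sketched the same way using fork() and waitpid(). Again, this is only a minimal sketch under assumptions: `time_process_creation` and `now_seconds` are hypothetical names, and the per-process figure is assumed to be the total time divided by the number of processes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define NUM_PROCESSES 100

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void) {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Forks NUM_PROCESSES children that exit immediately, waits for all of
 * them, and returns the average time per process in seconds.
 * Returns a negative value if fork() or waitpid() failed. */
double time_process_creation(void) {
    pid_t pids[NUM_PROCESSES];
    int created = 0;
    int failed = 0;

    double start = now_seconds();
    for (int i = 0; i < NUM_PROCESSES; ++i) {
        pid_t pid = fork();
        if (pid == 0) {
            _exit(0); /* Child: terminate immediately without doing work. */
        }
        if (pid < 0) {
            failed = 1;
            break;
        }
        pids[created++] = pid;
    }
    /* The parent waits for every child process to terminate. */
    for (int i = 0; i < created; ++i) {
        int status;
        if (waitpid(pids[i], &status, 0) < 0) {
            failed = 1;
        }
    }
    double elapsed = now_seconds() - start;

    return failed ? -1.0 : elapsed / NUM_PROCESSES;
}
```

The child uses `_exit()` rather than `exit()` so that it does not flush stdio buffers inherited from the parent.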
Again, Linux comes out on top. It is actually quite impressive that creating a process is only about 2-3x as expensive as creating a thread under Linux (the corresponding figure for macOS is about 7-8x).
Launching a program is essentially an extension of process creation: in addition to creating a new process, a program is loaded and executed (the program consists of an empty main() function and exits immediately). On Linux and macOS this is done using fork() + exec(), and on Windows it is done using CreateProcess().
Here Linux is notably faster than both macOS (~10x faster) and Windows (>20x faster). In fact, even a Raspberry Pi 3 is faster than a stock Windows 10 Pro installation on an octa-core AMD Ryzen 1800X system!
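For the Unix side, the fork() + exec() pattern might look something like the sketch below. It is an illustration only: `time_program_launch` is a hypothetical name, the program to launch is passed in by the caller (the usage note uses the standard `true` utility as a stand-in for the empty test program), and launching the programs one after another is a simplifying assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void) {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Launches `program` `count` times, one launch after another, and returns
 * the average time per launch in seconds. Returns a negative value if a
 * launch failed or the program exited with a non-zero status. */
double time_program_launch(const char *program, int count) {
    /* exec takes a non-const argv, hence the cast. */
    char *const argv[] = { (char *)program, NULL };
    int failed = 0;

    if (count <= 0) {
        return -1.0;
    }

    double start = now_seconds();
    for (int i = 0; i < count; ++i) {
        pid_t pid = fork();
        if (pid == 0) {
            execvp(program, argv); /* Searches PATH for the program. */
            _exit(127);            /* Only reached if execvp() failed. */
        }
        if (pid < 0) {
            failed = 1;
            break;
        }
        int status;
        if (waitpid(pid, &status, 0) < 0 ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
            failed = 1;
        }
    }
    double elapsed = now_seconds() - start;

    return failed ? -1.0 : elapsed / count;
}
```

For example, `time_program_launch("true", 100)` measures 100 launches of the `true` utility found on the PATH.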
Worth noting is that on Windows, this benchmark is very sensitive to background services such as Windows Defender and other antivirus software.
The best results on Windows were achieved by Win-AMDx8*, which is the same system as Win-AMDx8 but with most performance-hogging services completely disabled (including Windows Defender and search indexing). However, this is not a practical solution as it leaves your system completely unprotected, and makes things like file search close to unusable.
The very poor result for Win-i7x4 is probably due to third-party antivirus software.
In this benchmark, more than 65,000 files are created in a single folder, filled with 32 bytes of data each, and then deleted. The time to create and delete a single file is measured.
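A rough sketch of such a file benchmark on a POSIX system is shown below. The function name `time_file_creation`, the file naming scheme and the payload contents are all made up for illustration, and creating all files before deleting any of them is an assumption about the order of operations.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void) {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Builds "<dir>/file<index>"; returns 0 on success, -1 on truncation. */
static int make_path(char *buf, size_t size, const char *dir, int index) {
    int n = snprintf(buf, size, "%s/file%06d", dir, index);
    return (n > 0 && (size_t)n < size) ? 0 : -1;
}

/* Creates the directory `dir`, creates `count` files of 32 bytes each in
 * it, deletes them again, and removes the directory. Returns the average
 * time per file (create + delete) in seconds, or a negative value on
 * failure. `dir` must not exist beforehand. */
double time_file_creation(const char *dir, int count) {
    static const char data[32] = "osbench sketch payload"; /* zero padded */
    char path[4096];
    int created = 0;
    int failed = 0;

    if (count <= 0 || mkdir(dir, 0700) != 0) {
        return -1.0;
    }

    double start = now_seconds();
    /* Pass 1: create every file and fill it with 32 bytes. */
    for (int i = 0; i < count; ++i) {
        FILE *f;
        if (make_path(path, sizeof path, dir, i) != 0 ||
            (f = fopen(path, "wb")) == NULL) {
            failed = 1;
            break;
        }
        ++created;
        if (fwrite(data, 1, sizeof data, f) != sizeof data) failed = 1;
        if (fclose(f) != 0) failed = 1;
    }
    /* Pass 2: delete every file that was created. */
    for (int i = 0; i < created; ++i) {
        if (make_path(path, sizeof path, dir, i) != 0 || unlink(path) != 0) {
            failed = 1;
        }
    }
    double elapsed = now_seconds() - start;

    if (rmdir(dir) != 0) failed = 1;
    return failed ? -1.0 : elapsed / count;
}
```

Pointing `dir` at a RAM-backed file system versus a regular disk is one way to compare storage back ends, as is done for the Raspberry Pi in the results.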
Here is where things get silly…
Two tests were performed for the Raspberry Pi 3: with a slow MicroSD card (RaspberryPi) and with a RAM disk (RaspberryPi-RAM). For the other Linux and macOS systems, a RAM disk did not have a significant performance impact (and I did not try a RAM disk for Windows).
Here are some interesting observations:
- The best performing system (Linux-AMDx8) is over one thousand times faster than the worst performing system (Win-i7x4)!
- Only with Windows Defender etc. disabled can an octa-core Windows system with a 3GB/s NVMe disk compete with a Raspberry Pi 3 with a slow MicroSD memory card (the Pi wins easily when using a RAM disk though)!
- Something is absolutely killing the file creation performance on Win-i7x4 (probably third-party antivirus software).
- Creating a file on Linux is really fast (over 100,000 files/s)!
The memory allocation performance was measured by allocating 1,000,000 small memory blocks (4-128 bytes in size) and then freeing them again.
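The allocation test can be sketched in a few lines of C. This is a sketch under assumptions rather than the benchmark's own code: `time_allocations` is a hypothetical name, and the block sizes are drawn from a simple pseudo-random sequence in the range 4-128 bytes, since the exact size pattern is not stated.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void) {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Allocates `count` small blocks (4-128 bytes each), then frees them all.
 * Returns the average time per block (allocate + free) in seconds, or a
 * negative value if `count` is zero or an allocation failed. */
double time_allocations(size_t count) {
    if (count == 0) {
        return -1.0;
    }

    /* Table of pointers so that every block stays alive until the end. */
    void **blocks = malloc(count * sizeof *blocks);
    if (blocks == NULL) {
        return -1.0;
    }

    unsigned int seed = 12345u; /* Fixed seed: same sizes on every run. */
    size_t allocated = 0;

    double start = now_seconds();
    for (size_t i = 0; i < count; ++i) {
        /* Simple linear congruential step (assumed size pattern). */
        seed = seed * 1103515245u + 12345u;
        size_t size = 4 + (seed >> 16) % 125; /* 4..128 bytes */
        blocks[i] = malloc(size);
        if (blocks[i] == NULL) {
            break;
        }
        *(unsigned char *)blocks[i] = (unsigned char)i; /* Touch the block. */
        ++allocated;
    }
    for (size_t i = 0; i < allocated; ++i) {
        free(blocks[i]);
    }
    double elapsed = now_seconds() - start;

    free(blocks);
    return allocated == count ? elapsed / (double)count : -1.0;
}
```

Calling `time_allocations(1000000)` matches the block count used in the benchmark description.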
Some of the differences between the operating systems are staggering! I suspect that the poor process and file creation performance on Windows is to blame for the painfully slow git and CMake performance, for instance.
Obviously each operating system has its merits, but in general it seems that Linux > macOS > Windows when it comes to raw kernel and file system performance.
As a side note, I was quite surprised to find that Windows does not even offer anything similar to the standard Unix fork() functionality. This makes certain multiprocessing patterns unnecessarily cumbersome and expensive on Windows.