Hi. I’m Niels and I’m a sucker for benchmarks!
We had a recent chat on how trivial bits of code and the pro’s and con’s of optimizing for speed versus optimizing for readability. We came up with a very simple problem; counting vowels. The vowels are obviously just a list of characters; a, e, i, o and u. Counting the amount of vowels in a sentence is simply a matter of iterating through the characters of the text and compare them to each vowel in the string.
So simple that it’s in fact trivial to figure out five different approaches to the problem. But which one is the fastest? Time to break out one of the most awesome tools in our toolbox: the benchmark. I’ve been using JMH a lot and it’s very helpful to quickly answer these essential questions. The 5 approaches are a search through an array of vowels, a binary search through the same array, a hash-set lookup, a String.contains() and a simply if-block with 5 comparison expressions. The code (boilerplate omitted):
public int withIf() {
int vowels = 0;
for(int i = 0;i < TEST_STRING.length();i++) {
char c = TEST_STRING.charAt(i);
if(c == 'a' || 'c' == 'e' || c == 'i' || c == 'o' || c == 'u') {
vowels++;
}
}
return vowels;
}
public int withArray() {
int vowels = 0;
for(int i = 0;i < TEST_STRING.length();i++) {
char c = TEST_STRING.charAt(i);
for(int j = 0;j < VOWELS.length;j++) {
if(VOWELS[j] == c) {
vowels++;
}
}
}
return vowels;
}
public int withHashSet() {
int vowels = 0;
for(int i = 0;i < TEST_STRING.length();i++) {
char c = TEST_STRING.charAt(i);
if(VOWEL_SET.contains(c)) {
vowels++;
}
}
return vowels;
}
public int withArrayBinSearch() {
int vowels = 0;
for(int i = 0;i < TEST_STRING.length();i++) {
char c = TEST_STRING.charAt(i);
if(Arrays.binarySearch(VOWELS, c) >= 0) {
vowels++;
}
}
return vowels;
}
public int withString() {
int vowels = 0;
for(int i = 0;i < TEST_STRING.length();i++) {
if(VOWEL_STRING.contains(TEST_STRING.substring(i, i + 1))) {
vowels++;
}
}
return vowels;
}
Can you guess which one is the fastest? The results aren’t very surprising:
Benchmark Mode Samples Score Score error Units
m.IfVsArray.withArray thrpt 5 17137,384 2295,428 ops/ms
m.IfVsArray.withArrayBinSearch thrpt 5 4037,655 344,203 ops/ms
m.IfVsArray.withHashSet thrpt 5 7014,505 591,107 ops/ms
m.IfVsArray.withIf thrpt 5 22911,273 2570,412 ops/ms
m.IfVsArray.withString thrpt 5 1246,593 56,895 ops/ms
The if block is fastest with 22911 ops/second on average. The array search (O(n)) is second at 17137. The binary search (O(log n)) and hash set (O(1)-ish) obviously suffer a lot from the method calling overhead and, in the case of the set, autoboxing. The String.contains method is by far the worst; suffering from object creation, method call overhead and an O(n) complexity.
So, is this relevant? In most cases it isn’t. Obviously the benchmark is very synthetic. But if you’re working on some low level code that’s running very hot just figuring out a few approaches and throwing them into a quick benchmark might be worth it! And heck, it’s fun!