Loris Cro


Zig's Curious Multi-Sequence For Loops

February 27, 2023 • 9 min read • by Loris Cro

Zig has just gained a new for loop syntax that lets you iterate over multiple slices or arrays at the same time. In this blog post I'm going to explain the rationale behind this choice in detail, while also introducing you to a couple of useful patterns that the syntax is meant to encourage.

If you want to try it out, you will need an unstable build of Zig, which you can get from the downloads page.

Basic syntax

The most basic for loop syntax in Zig is still the same as before.

const elems = [4]usize{ 10, 20, 30, 40 };

for (elems) |x| {
   std.debug.print("{} ", .{x});
}

This prints:

10 20 30 40 

If you're new to Zig, you might be surprised by the |x| syntax. That's called a capture: in the case of for loops, it's how you, well, capture the iteration value and give it a name.

Ranges

The new syntax also supports ranges, which are a new construct in Zig. A range a..b starts at a and stops just before b (the upper bound is exclusive).

for (0..4) |n| {
   std.debug.print("{} ", .{n});
}

This prints:

0 1 2 3 

Ranges can also start from something other than zero.

for (1..5) |n| {
   std.debug.print("{} ", .{n});
}

This prints:

1 2 3 4 

Ranges can only exist as an argument to a for loop. This means that you can't store them in variables, but you can use variables to specify their bounds.

const a: usize = 10;
const b: usize = 15;

for (a..b) |n| {
   std.debug.print("{} ", .{n});
}

This prints:

10 11 12 13 14
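Since range bounds only need to be known at runtime, they work just as well with function parameters. Here's a minimal sketch using a hypothetical sumRange helper (not part of the original post), which also makes explicit that the upper bound is exclusive:

```zig
const std = @import("std");

/// Hypothetical helper: sums every integer in the half-open range [start, end),
/// where the bounds are ordinary runtime parameters.
fn sumRange(start: usize, end: usize) usize {
    var total: usize = 0;
    // `start..end` includes `start` but stops before `end`.
    for (start..end) |n| {
        total += n;
    }
    return total;
}

pub fn main() void {
    std.debug.print("{}\n", .{sumRange(10, 15)}); // 10 + 11 + 12 + 13 + 14
}
```

With the same bounds as the example above, sumRange(10, 15) adds 10 through 14 and returns 60.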

Multi-sequence syntax

The new multi-sequence syntax allows you to loop over two or more arrays or slices at the same time:

const elems = [_][]const u8{ "water", "earth", "fire", "air" };
const nats = [_][]const u8{ "tribes", "kingdom", "nation", "nomads" };

for (elems, nats) |e, n| {
   std.debug.print("{s} {s}\n", .{e, n});
}

This prints:

water tribes
earth kingdom
fire nation
air nomads

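There's only one simple rule when it comes to the length of the sequences: all lengths must match. If the lengths are known at compile time (as with arrays), a mismatch is a compile error; otherwise, passing sequences of different lengths is safety-checked undefined behavior (i.e. you will get a panic in safe build modes such as Debug and ReleaseSafe).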
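If the sequences come from outside your control (user input, a file), you may prefer a recoverable error over a panic. A minimal sketch, assuming a hypothetical dot function and a made-up error.LengthMismatch, checks the lengths once up front and only then enters the multi-sequence loop:

```zig
const std = @import("std");

/// Hypothetical helper: multiplies two slices element by element and sums the
/// products. Checking the lengths first turns a mismatch into an error value
/// instead of a safety panic inside the for loop.
fn dot(xs: []const i64, ys: []const i64) error{LengthMismatch}!i64 {
    if (xs.len != ys.len) return error.LengthMismatch;
    var sum: i64 = 0;
    for (xs, ys) |x, y| {
        sum += x * y;
    }
    return sum;
}

pub fn main() void {
    const a = [_]i64{ 1, 2, 3 };
    const b = [_]i64{ 4, 5, 6 };
    if (dot(&a, &b)) |result| {
        std.debug.print("{}\n", .{result});
    } else |err| {
        std.debug.print("error: {s}\n", .{@errorName(err)});
    }
}
```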

Ranges as indexes

To iterate over a sequence and also keep track of the element's index, you can add a range to the list of sequences you want to iterate. Since all sequences must have the same length, you can omit the upper end of the range and let Zig automatically infer it from the other sequences.

const elems = [_][]const u8{ "water", "earth", "fire", "air" };
const nats = [_][]const u8{ "tribes", "kingdom", "nation", "nomads" };

for (elems, nats, 0..) |e, n, idx| {
   std.debug.print("{} - {s} {s}\n", .{idx, e, n});
}

This prints:

0 - water tribes
1 - earth kingdom
2 - fire nation
3 - air nomads
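The open-ended range pairs naturally with early returns. As a sketch (the firstMismatch helper is hypothetical), here's a function that reports the index of the first position where two equally long slices differ:

```zig
const std = @import("std");

/// Hypothetical helper: returns the index of the first position where the two
/// slices differ, or null if they are identical. Both slices must have the
/// same length, as in any multi-sequence for loop.
fn firstMismatch(a: []const u8, b: []const u8) ?usize {
    // The open-ended `0..` range takes its length from `a` and `b`.
    for (a, b, 0..) |x, y, idx| {
        if (x != y) return idx;
    }
    return null;
}

pub fn main() void {
    std.debug.print("{?}\n", .{firstMismatch("abcdef", "abcxef")});
}
```

Here firstMismatch("abcdef", "abcxef") returns 3.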

Other properties of for loops

So far we've looked at what's new in for loops, but if you're new to Zig you might not know about everything else they support, so I'll quickly recap it in this section.

Pointer to the element

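Value captures in Zig should always be understood as immutable copies. To ask for a pointer instead, add a * before the capture name. When looping over an array, you need to iterate over a pointer to it (hence &good_digits below); a slice can be passed as-is.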

var good_digits: [3]usize = .{4, 2, 0};

for (&good_digits) |*d| {
   d.* = 6;
}

// for (good_digits) |d| {
//    d = 6;
// }
//
// error: cannot assign to constant
//     	d = 6;
//     	^
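Slices work the same way without the extra &: looping over a mutable slice with a * capture already gives you pointers you can write through. A minimal sketch with a hypothetical doubleAll helper:

```zig
const std = @import("std");

/// Hypothetical helper: doubles every element of `xs` in place.
fn doubleAll(xs: []u32) void {
    // `xs` is a mutable slice, so `x` is a `*u32` we can write through.
    for (xs) |*x| {
        x.* *= 2;
    }
}

pub fn main() void {
    var digits = [_]u32{ 4, 2, 0 };
    doubleAll(&digits); // `&digits` coerces from *[3]u32 to []u32
    std.debug.print("{any}\n", .{digits});
}
```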

Labels, break and continue

You can give labels to loops, which lets you break out of or continue the right loop when loops are nested.

const vowels = "aeiou";
const text = "lorem ipsum";
var missing = false;

outer: for (vowels) |v| {
   for (text) |x| {
      if (x == v) continue :outer;
   }
   missing = true;
   break :outer;
}
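The example above checks whether any vowel is missing from the text (here, a is). The same continue :label pattern is handy when filtering the rows of a grid; as a sketch, with a hypothetical countZeroFreeRows helper:

```zig
const std = @import("std");

/// Hypothetical helper: counts the rows of a grid that contain no zeros.
fn countZeroFreeRows(grid: []const []const u8) usize {
    var count: usize = 0;
    rows: for (grid) |row| {
        for (row) |cell| {
            // One zero disqualifies the row: resume the *outer* loop.
            if (cell == 0) continue :rows;
        }
        // Reached only when the inner loop finished without finding a zero.
        count += 1;
    }
    return count;
}

pub fn main() void {
    const grid = [_][]const u8{
        &[_]u8{ 1, 2, 3 },
        &[_]u8{ 4, 0, 6 },
        &[_]u8{ 7, 8, 9 },
    };
    std.debug.print("{}\n", .{countZeroFreeRows(&grid)});
}
```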

else for for loops

In Zig you can give an else branch to a for loop. The else branch triggers when the loop ends naturally, as opposed to breaking from it.

This beautifully models searching for an element in a sequence: if the element is found, you break out of the loop; if it isn't, the loop ends naturally and the else branch lets you implement the "not found" case.

for loops can also be used as expressions, which works particularly well in this case.

const text = "abcdef";
const needle = 'e';

const match: ?usize = for (text, 0..) |x, idx| {
   if (x == needle) break idx;
} else null;
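Labels, else branches and loop expressions combine nicely. In this sketch (the gridContains helper is hypothetical), break :outer true exits both loops at once with a value, while the outer loop's else branch provides the "not found" result:

```zig
const std = @import("std");

/// Hypothetical helper: reports whether `target` appears anywhere in `grid`.
fn gridContains(grid: []const []const u8, target: u8) bool {
    // A labeled for loop is an expression: `break :outer true` leaves both
    // loops at once, and `else false` is the value when nothing matched.
    return outer: for (grid) |row| {
        for (row) |cell| {
            if (cell == target) break :outer true;
        }
    } else false;
}

pub fn main() void {
    const grid = [_][]const u8{ "abc", "def" };
    std.debug.print("{} {}\n", .{ gridContains(&grid, 'e'), gridContains(&grid, 'z') });
}
```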

Inlined for loops

With inline for loops, it's possible to operate on heterogeneous sequences of values (such as tuples) when doing comptime metaprogramming. You can learn more in this old blog post of mine.
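As a small taste (the countBools helper is hypothetical), inline for unrolls the loop at compile time, which is what lets each iteration work with a value of a different type:

```zig
const std = @import("std");

/// Hypothetical helper: counts how many elements of a tuple are bools.
/// `inline for` unrolls the loop at compile time, so each iteration sees
/// a value of a possibly different type.
fn countBools(values: anytype) usize {
    var count: usize = 0;
    inline for (values) |value| {
        if (@TypeOf(value) == bool) count += 1;
    }
    return count;
}

pub fn main() void {
    const values = .{ true, @as(u8, 42), "hello", false };
    inline for (values) |value| {
        std.debug.print("{s}\n", .{@typeName(@TypeOf(value))});
    }
    std.debug.print("bools: {}\n", .{countBools(values)});
}
```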

Multi-sequence for loops and data oriented design (DOD)

Say that you have a game where each monster has an element type, a counter for hit points, and a unique "dna" string used to procedurally generate stats for each monster's offspring (and to give young kids an early taste of what playing slot machines feels like).

const Monster = struct {
   elem_type: enum{ fire, water, wind, earth },
   hp: usize,
   dna: [33]u8, // gambling department demands 
                // we use exactly 33 bytes
};

First of all, you would probably want the in-memory representation of this struct to place hp first, to avoid the padding that would otherwise be needed inside the struct to keep hp at its natural alignment (its type is usize, which has 8-byte alignment on common 64-bit machines).

Luckily, Zig does this automatically (you can use an extern struct if you want fields laid out in declaration order, like in C). Even so, the fields add up to 42 bytes and the struct has alignment 8, so it needs 6 bytes of padding at the end to keep every element aligned in an array (i.e. @sizeOf(Monster) == 48 and @sizeOf([2]Monster) == 96). Given the size of our fields, some padding is inevitable. 6 bytes might not seem like much, but it means that each monster carries roughly 14% of overhead on top of its 42 bytes of actual data, just for padding.
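You can check these numbers with @sizeOf and @alignOf. This sketch assumes a typical 64-bit target where usize is 8 bytes with 8-byte alignment; a plain (non-extern) struct has no guaranteed field order, so treat the values as what to expect on such targets rather than a language guarantee:

```zig
const std = @import("std");

const Monster = struct {
    elem_type: enum { fire, water, wind, earth },
    hp: usize,
    dna: [33]u8,
};

pub fn main() void {
    // Field data: 1 (enum tag) + 8 (usize on 64-bit) + 33 = 42 bytes,
    // rounded up to a multiple of the struct's 8-byte alignment.
    std.debug.print("@sizeOf(Monster) = {}\n", .{@sizeOf(Monster)});
    std.debug.print("@alignOf(Monster) = {}\n", .{@alignOf(Monster)});
    std.debug.print("@sizeOf([2]Monster) = {}\n", .{@sizeOf([2]Monster)});
}
```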

One technique that lets us prevent that waste is to avoid representing our monsters as an array of structs (AoS), and instead "deconstruct" them into multiple arrays, one per field (also known as struct-of-arrays, SoA):

monster_hps: []usize,
monster_dnas: [][33]u8,
monster_elem_types: []enum { fire, water, wind, earth },

This memory layout wastes no bytes and also lets us operate more efficiently on our data.

Let's say that fire monsters gain one hit point every tick of our game. This means that every tick we want to look at each monster's elem_type and based on that we increment its hp by one.

If we were to do this with the original array-of-structs layout, for each monster we would have to load from memory 39 bytes of data that we don't care about (33 from the dna field, plus 6 of padding) over the 9 bytes that we do need. That's a waste rate of more than 400%!

With the struct-of-arrays layout we only load from memory data that we do care about, which can have a tremendous effect on performance.

for (monster_elem_types, monster_hps) |et, *hp| {
   if (et == .fire) hp.* +|= 1; // saturating addition
} 
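Packaged as a self-contained function (the names ElemType and tickFireMonsters are hypothetical), the tick looks like this. The saturating +|= keeps a monster whose hp is already at the maximum from overflowing:

```zig
const std = @import("std");

const ElemType = enum { fire, water, wind, earth };

/// Hypothetical helper: one game tick over struct-of-arrays data, where
/// `elem_types[i]` and `hps[i]` describe the same monster.
fn tickFireMonsters(elem_types: []const ElemType, hps: []usize) void {
    for (elem_types, hps) |et, *hp| {
        if (et == .fire) hp.* +|= 1; // saturates at maxInt(usize)
    }
}

pub fn main() void {
    const elem_types = [_]ElemType{ .fire, .water, .fire, .earth };
    var hps = [_]usize{ 10, 20, 30, 40 };
    tickFireMonsters(&elem_types, &hps);
    std.debug.print("{any}\n", .{hps});
}
```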

The Zig standard library has MultiArrayList, a data structure that helps make DOD style programming even more ergonomic. Here you can read more about it.
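As a sketch of how the two fit together, assuming the current std.MultiArrayList API (append and deinit take an allocator, items(.field) returns a slice over one field's array), you can feed the field slices straight into a multi-sequence for loop; Monsters and tick are hypothetical names:

```zig
const std = @import("std");

const Monster = struct {
    elem_type: enum { fire, water, wind, earth },
    hp: usize,
    dna: [33]u8,
};

// Stores each field of Monster in its own array (struct-of-arrays).
const Monsters = std.MultiArrayList(Monster);

/// Hypothetical helper: one game tick, fire monsters regain 1 hp.
fn tick(monsters: *Monsters) void {
    // Each `items` call returns a slice over a single field's array.
    for (monsters.items(.elem_type), monsters.items(.hp)) |et, *hp| {
        if (et == .fire) hp.* +|= 1;
    }
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    var monsters: Monsters = .{};
    defer monsters.deinit(allocator);

    try monsters.append(allocator, .{ .elem_type = .fire, .hp = 10, .dna = [_]u8{0} ** 33 });
    try monsters.append(allocator, .{ .elem_type = .water, .hp = 10, .dna = [_]u8{0} ** 33 });

    tick(&monsters);
    std.debug.print("{any}\n", .{monsters.items(.hp)});
}
```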

Hoisting safety checks

In low-level programming languages, accessing an array item corresponds to adding an offset to a pointer value and then dereferencing it. This operation is very fast but, if the logic is wrong, one could end up reading past the end of an array and the program wouldn't even notice.

In Zig, out-of-bounds array accesses are safety-checked in safe build modes, which means that the compiler adds a hidden assertion whenever an array access is about to happen.

var idx: usize = 5;
assert(idx < my_slice.len); // secretly added by the compiler
_ = my_slice[idx];

If we were to implement the previous game feature (fire monsters getting 1 hp every tick) without multi-sequence for loops, we would have to do something like this:

var idx: usize = 0;

while (idx < monster_count) : (idx += 1) {
   const et = monster_elem_types[idx]; // potential oob
   const hp = &monster_hps[idx]; // potential oob

   if (et == .fire) hp.* +|= 1;
} 

Unfortunately, the Zig compiler would have to insert two hidden assertions with this version of the code: one before the assignment to et, and one before the assignment to hp.

In the multi-sequence for loop version, it's only necessary to check once, at the beginning of the loop, that the two arrays have equal length, instead of running 2 assertions on every iteration. The multi-sequence for loop syntax conveys intent more clearly to the compiler, which in turn lets it generate more efficient code.

Of course, with sophisticated-enough static analysis the compiler could prove that monster_count is always equal to monster_elem_types.len and monster_hps.len, and thus elide the assertions, but static analysis slows down compilation and tends to be fragile, as this amazing blog post about loop optimizations in C# shows.

Multi-sequence for loop syntax doesn't slow down compilation and guarantees good performance even in debug builds, where advanced optimizations are disabled and compilation times matter the most.

Conclusion

Zig is already a compelling programming language and toolchain, but there's more design space to explore ahead of us before we can tag v1.0.

We recently self-hosted the compiler and optimized our development process in order to make it as smooth as possible to explore new design ideas like multi-sequence for loops.

Zig describes itself as a programming language for maintaining robust, optimal and reusable code, and multi-sequence for loops are a shining example of how the language tries to strike a compelling balance between clarity, performance and safety.

Comptime metaprogramming allows us to have std.MultiArrayList, a userland implementation of the AoS-to-SoA transformation that makes DOD-style programming easier, while multi-sequence for loops ensure that we get all the safety of out-of-bounds checks without impacting runtime performance or compromising on compilation times.

If you like where we're going, please consider sponsoring the Zig Software Foundation.

