Avoiding accidental array mutation when iterating arrays in PHP
Pepperminty Wiki is written in PHP, and I've posted before about the search engine I've implemented for it that's powered by an inverted index. In this post, I want to talk about an anti-feature of PHP that doesn't behave the way you'd expect, and how to avoid running into the same problem I did.
To do this, let's introduce a simple example of the problem at work:
<?php
$arr = [];
for($i = 0; $i < 3; $i++) {
    $key = random_int(0, 2000);
    $arr[$key] = $i;
    echo("[init] key: $key, i: $i\n");
}
foreach($arr as $key => &$value) {
    // noop
}
echo("structure before: "); var_dump($arr);
foreach($arr as $key => $value) {
    echo("key: $key, i was $value\n");
}
echo("structure after: "); var_dump($arr);
?>
The above code initialises an associative array with 3 elements. The contents might look like this:
Key | Value |
---|---|
469 | 0 |
1777 | 1 |
1685 | 2 |
Pretty simple so far. It then iterates over it twice: once referring to the values by reference (that's what the & there is for), and the second time referring to the items by value.
You'd expect the array to be identical before and after the second foreach loop, but you'd be wrong:
Key | Value |
---|---|
469 | 0 |
1777 | 1 |
1685 | 1 |
Wait, what? That's very odd. What's going on here? How can a foreach loop that's iterating an array by value mutate an array? To understand why, let's take a step back for a moment. Here's another snippet:
<?php
$arr = [ 1, 2, 3 ];
foreach($arr as $key => $value) {
    echo("$key: $value\n");
}
echo("The last value was $key: $value\n");
?>
What do you expect to happen here? In JavaScript, with a for..of loop and a let declaration, both $key and $value would have fallen out of scope by now. In PHP, however, foreach statements don't create a new scope for variables. Instead, they share the scope of their parent - e.g. the global scope in the example above, or the containing function's scope if the loop sits inside a function.
As a result, we can still access the values of both $key and $value in the above example even after the foreach loop has exited! Unexpected.
It gets better. Try prefixing $value with an ampersand & in the above example and re-running it - note that $key and $value are both still defined.
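To make that a little more concrete, here's a minimal sketch (the value 99 is purely illustrative) suggesting that after a by-reference loop, $value isn't just still defined: it's still bound to the last element of the array, so assigning to it changes the array.

```php
<?php
$arr = [ 1, 2, 3 ];
foreach($arr as $key => &$value) {
    // noop
}
// The loop has exited, but $value is still a reference to $arr[2]
$value = 99;
var_dump($arr[2]); // int(99)
```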
This leads us to why the unexpected behaviour occurs. Because of the way that PHP's foreach loop is implemented, if we re-use the same variable name for $value in a subsequent loop, it replaces the value of the last item in the array.
Shockingly enough this is actually documented behaviour (see also this bug report), though I'm somewhat confused as to how it happens on the last element in the array instead of the first.
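For what it's worth, the documented behaviour does seem to account for the last element being the one affected: once the by-reference loop finishes, $value is still bound to the final element, so every assignment the second loop makes to $value gets written into that slot. Here's a minimal sketch that prints the array on each pass (fixed values are used here instead of random_int() so that the output is predictable):

```php
<?php
$arr = [ 0, 1, 2 ];
foreach($arr as &$value) {
    // noop
}
// $value still references $arr[2] at this point

foreach($arr as $key => $value) {
    // Each pass assigns the current element to $value,
    // which writes through the reference into $arr[2]
    echo("after key $key: " . implode(", ", $arr) . "\n");
}
// after key 0: 0, 1, 0
// after key 1: 0, 1, 1
// after key 2: 0, 1, 1
```

On the final pass the loop reads the last element, which by then already holds the second-to-last value, and copies it onto itself. That would explain the 1685 | 1 row in the table above.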
With this in mind, to avoid this problem in future, if you iterate an array by reference with a foreach loop, always remember to unset() the $value afterwards, like so:
<?php
$arr = [];
for($i = 0; $i < 3; $i++) {
    $key = random_int(0, 2000);
    $arr[$key] = $i;
    echo("[init] key: $key, i: $i\n");
}
foreach($arr as $key => &$value) {
    // noop
}
unset($key); unset($value);
echo("structure before: "); var_dump($arr);
foreach($arr as $key => $value) {
    echo("key: $key, i was $value\n");
}
echo("structure after: "); var_dump($arr);
?>
By doing this, you can ensure that you don't accidentally mutate your arrays and spend weeks searching for the bug like I did.
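If you'd rather not rely on remembering the unset() call, one alternative is to skip the reference altogether and write changes back through the key. This is only a sketch: the keys are borrowed from the example table above, and the multiply-by-10 step is a made-up stand-in for whatever modification you actually need.

```php
<?php
$arr = [ 469 => 0, 1777 => 1, 1685 => 2 ];

// Iterate by value, and modify the array through its key instead
foreach($arr as $key => $value) {
    $arr[$key] = $value * 10;
}

// No reference is left dangling, so re-using $value here is harmless
foreach($arr as $key => $value) {
    echo("key: $key, value: $value\n");
}
var_dump($arr); // 469 => 0, 1777 => 10, 1685 => 20
```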
It's language features like these that catch developers out, and being aware of the hows and whys of their occurrence can help you to avoid them next time (if anyone can explain why it's the last element in the array that's affected instead of the first, I'd love to know!).
Regardless, although I'm aware of how challenging implementing a programming language is, programming language designers should take care to avoid surprising behaviour like this.
Found this interesting? Comment below!