Starbeamrainbowlabs

Stardust
Blog

Avoiding accidental array mutation when iterating arrays in PHP

Pepperminty Wiki is written in PHP, and I've posted before about the search engine I've implemented for it that's powered by an inverted index. In this post, I want to talk about an anti-feature of PHP that doesn't behave the way you'd expect, and how to avoid running into the same problem I did.

To do this, let's introduce a simple example of the problem at work:

<?php
$arr = [];
for($i = 0; $i < 3; $i++) {
    $key = random_int(0, 2000);
    $arr[$key] = $i;
    echo("[init] key: $key, i: $i\n");
}

foreach($arr as $key => &$value) {
    // noop
}

echo("structure before: "); var_dump($arr);

foreach($arr as $key => $value) {
    echo("key: $key, i was $value\n");
}

echo("structure after: "); var_dump($arr);
?>

The above code initialises an associative array with 3 elements. The contents might look like this:

Key Value
469 0
1777 1
1685 2

Pretty simple so far. It then iterates over it twice: once referring to the values by reference (that's what the & there is for), and the second time referring to the items by value.

You'd expect the array to be identical before and after the second foreach loop, but you'd be wrong:

Key Value
469 0
1777 1
1685 1

Wait, what? That's very odd. What's going on here? How can a foreach loop that's iterating an array by value mutate an array? To understand why, let's take a step back for a moment. Here's another snippet:

<?php

$arr = [ 1, 2, 3 ];

foreach($arr as $key => $value) {
    echo("$key: $value\n");
}

echo("The last value was $key: $value\n");
?>

What do you expect to happen here? While in Javascript with a for..of loop with a let declaration both $key and $value would have fallen out of scope by now, in PHP foreach statements don't create a new scope for variables. Instead, they inherit the scope from their parent - e.g. the global scope in the above or their containing function if defined inside a function.

To this end, we can still access the values of both $key and $value in the above example even after the foreach loop has exited! Unexpected.

It gets better. Try prefixing $value with an ampersand & in the above example and re-running it - note that both $key and $value are both still defined.

This leads us to why the unexpected behaviour occurs. For some reason because of the way that PHP's foreach loop is implemented, if we re-use the same variable name for $value here in a subsequent loop it replaces the value of the last item in the array.

Shockingly enough this is actually documented behaviour (see also this bug report), though I'm somewhat confused as to how it happens on the last element in the array instead of the first.

With this in mind, to avoid this problem in future if you iterate an array by reference with a foreach loop, always remember to unset() the $value, like so:

<?php
$arr = [];
for($i = 0; $i < 3; $i++) {
    $key = random_int(0, 2000);
    $arr[$key] = $i;
    echo("[init] key: $key, i: $i\n");
}

foreach($arr as $key => &$value) {
    // noop
}
unset($key); unset($value);

echo("structure before: "); var_dump($arr);

foreach($arr as $key => $value) {
    echo("key: $key, i was $value\n");
}

echo("structure after: "); var_dump($arr);
?>

By doing this, you can ensure that you don't accidentally mutate your arrays and spend weeks searching for the bug like I did.

It's language features like these that catch developers out: and being aware of the hows and whys of their occurrence can help you to avoid them next time (if anyone can explain why it's the last element in the array that's affected instead of the first, I'd love to know!).

Regardless, although I'm aware of how challenging implementing a programming language is, programming language designers should take care to avoid unexpected behaviour like this that developers don't expect.

Found this interesting? Comment below!

Sources and further reading

Tag Cloud

3d 3d printing account algorithms android announcement architecture archives arduino artificial intelligence artix assembly async audio automation backups bash batch blender blog bookmarklet booting bug hunting c sharp c++ challenge chrome os cluster code codepen coding conundrums coding conundrums evolved command line compilers compiling compression conference conferences containerisation css dailyprogrammer data analysis debugging defining ai demystification distributed computing dns docker documentation downtime electronics email embedded systems encryption es6 features ethics event experiment external first impressions freeside future game github github gist gitlab graphics guide hardware hardware meetup holiday holidays html html5 html5 canvas infrastructure interfaces internet interoperability io.js jabber jam javascript js bin labs latex learning library linux lora low level lua maintenance manjaro minetest network networking nibriboard node.js open source operating systems optimisation outreach own your code pepperminty wiki performance phd photos php pixelbot portable privacy problem solving programming problems project projects prolog protocol protocols pseudo 3d python reddit redis reference release releases rendering research resource review rust searching secrets security series list server software sorting source code control statistics storage svg systemquery talks technical terminal textures thoughts three thing game three.js tool tutorial twitter ubuntu university update updates upgrade version control virtual reality virtualisation visual web website windows windows 10 worldeditadditions xmpp xslt

Archive

Art by Mythdael