Generators

06 Oct, 2022 (updated: 02 Sep, 2023)
2046 words | 10 min to read | 6 hr, 8 min to write

If your background is python, you most likely know what generators are and how they are useful, but if you’ve mostly been doing php or js, you may be slightly confused.

I’ve found that generators are not very popular in the php community, and I want to change that. So let’s get into it: (not only) php generators explained.

My acquaintance with generators

I first heard about generators when I started using python. At first, generators looked a little weird to me, but once I got used to them, I started to realise how badly I wanted them in php. And finally, in php 5.5.0, generators made their way into the language. I was quite surprised the php community wasn’t more enthusiastic about them; in fact, some very experienced php devs still aren’t using them today, even though generators bring so much value!

What are generators?

In simple terms, generators are simplified iterators.

In fact, Generator implements Iterator:

final class Generator implements Iterator {
    public current(): mixed
    public getReturn(): mixed
    public key(): mixed
    public next(): void
    public rewind(): void
    public send(mixed $value): mixed
    public throw(Throwable $exception): mixed
    public valid(): bool
    public __wakeup(): void
}

interface Iterator extends Traversable {
    public current(): mixed
    public key(): mixed
    public next(): void
    public rewind(): void
    public valid(): bool
}

As you can see, the difference is that generators also implement a few additional methods: getReturn, send, throw, and __wakeup.

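The send and throw methods enable bi-directional communication with the generator: send passes a value into the generator (it becomes the result of the yield expression the generator is paused at), and throw raises an exception at that same point. getReturn returns the value of the generator function’s return statement once the generator has finished.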
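To see send and getReturn in action, here is a minimal sketch. The runningTotal() generator is a made-up example, not something from php itself: it receives numbers via send(), yields the running total after each one, and returns the final total once the caller sends null.

```php
<?php
// Hypothetical example: a generator that receives values via send()
// and exposes its final result via getReturn().
function runningTotal(): Generator
{
    $total = 0;
    while (true) {
        // The value passed to send() becomes the result of this yield expression.
        $value = yield $total;
        if ($value === null) {
            break; // sending null (or calling next()) ends the loop
        }
        $total += $value;
    }
    // Available via getReturn() once the generator has finished.
    return $total;
}

$gen = runningTotal();
echo $gen->current(), "\n";   // 0 (runs the generator up to the first yield)
echo $gen->send(5), "\n";     // 5
echo $gen->send(10), "\n";    // 15
$gen->send(null);             // ends the loop, so the generator returns
echo $gen->getReturn(), "\n"; // 15
```

Note that calling getReturn() before the generator has returned throws an exception, which is why the loop needs an explicit way to finish.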

__wakeup simply throws an exception, since generators cannot be serialised.
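You can check this yourself with a tiny sketch (oneTwo() is just a throwaway example generator): serialize() refuses to handle a generator and throws an Exception.

```php
<?php
// Throwaway example generator.
function oneTwo(): Generator
{
    yield 1;
    yield 2;
}

try {
    serialize(oneTwo());
    echo "serialised (unexpected)\n";
} catch (Exception $e) {
    // Prints the exception class and the engine's error message.
    echo get_class($e), ': ', $e->getMessage(), "\n";
}
```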

Wait a second! You said generators are simplified iterators. How do the additional methods make them simpler?

Well spotted! We will discuss that a little later ;)

Iterators

To understand why generators are useful, we first need to understand why iterators are useful, which isn’t the most intuitive thing for a beginner developer to see. I bet a lot of devs think iterators are just a fancy technique for iterating over an object that could be replaced with a simple array.

Consider the following example:

class NumbersIterator implements Iterator
{
    // Only three integers are stored, never the whole range.
    protected int $from;
    protected int $to;
    protected int $current;

    public function __construct(int $from, int $to)
    {
        $this->from = $this->current = $from;
        $this->to = $to;
    }

    public function rewind(): void
    {
        $this->current = $this->from;
    }

    public function current(): int
    {
        return $this->current;
    }

    public function valid(): bool
    {
        return $this->current <= $this->to;
    }

    public function next(): void
    {
        $this->current++;
    }

    public function key(): int
    {
        return $this->current;
    }
}

The way it’s used is quite obvious, but you might wonder how it is different from just iterating over an array of numbers:

$numbers = new NumbersIterator(0, 10);
foreach ($numbers as $number) {
    // prints numbers from 0 to 10
    echo "$number\n";
}

$numbers = range(0, 10);
foreach ($numbers as $number) {
    // prints numbers from 0 to 10
    echo "$number\n";
}

If you revisit the implementation of the iterator, you will notice that we never store the whole array of numbers, so we use far less memory than the array implementation. The iterator only stores 3 integers: $from, $to, and $current, and each new $current is calculated from the previous one.

$memSnapshot = memory_get_usage();
$numbers = new NumbersIterator(0, 100000);
foreach ($numbers as $number) { }
echo "memory used: " . (memory_get_usage() - $memSnapshot) . "\n";
> memory used: 96

As we can see, the iterator implementation uses only 96 bytes of memory. Let’s check how much memory the array implementation uses:

$memSnapshot = memory_get_usage();
$numbers = range(0, 100000);
foreach ($numbers as $number) { }
echo "memory used: " . (memory_get_usage() - $memSnapshot) . "\n";
> memory used: 4198480

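If you ran the code above, you might have noticed that the iterator implementation takes much longer to complete, and this is the drawback of iterators: a foreach over a userland iterator goes through several php method calls (rewind at the start, then valid, current, and next on every step), which is far more work than stepping through an array.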

Iterators are great for optimising memory usage but can be slow, depending on exactly what you are doing.

Now you may wonder: how is the iterator implementation different from just a for loop?

$from = 0;
$to = 10;
for ($current = $from; $current <= $to; $current++) {
    echo "$current\n";
}

As you can see, we are also storing the same $from, $to, and $current. So, why the heck would you use an iterator over this approach?

Well, firstly, we are comparing apples to bananas here. In many cases, you definitely wanna use a plain loop rather than an iterator. Iterators provide you with a generic interface for iterating over things. If you think broadly, NumbersIterator could’ve been implemented in many different ways. We could even use something like the strategy pattern to swap out the implementation. The underlying data structure could also be different; we could even use an array. Please note that I’m not encouraging the following implementation; treat it as just a showcase:

class NumbersIterator implements Iterator
{
    protected array $items;
    protected int $currentIndex = 0;

    public function __construct(int $from, int $to)
    {
        // Builds the whole range up front, unlike the previous implementation.
        $this->items = range($from, $to);
    }

    public function current(): int
    {
        return $this->items[$this->currentIndex];
    }

    public function key(): int
    {
        return $this->currentIndex;
    }

    public function next(): void
    {
        $this->currentIndex++;
    }

    public function valid(): bool
    {
        return isset($this->items[$this->currentIndex]);
    }

    public function rewind(): void
    {
        $this->currentIndex = 0;
    }
}

So, being a generic interface, iterators are far more flexible. That doesn’t mean you never use loops again, of course. It means that when you need more flexibility, iterators are the perfect option.
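To make the "generic interface" point concrete, here is a small sketch. The sumAll() function and the compact RangeIterator class are hypothetical, written just for this illustration: code that depends only on the Iterator interface works unchanged with php’s built-in ArrayIterator (backed by an array) and with a custom iterator that computes its values on the fly.

```php
<?php
// Consumer code depends only on the Iterator interface,
// not on how the values are produced or stored.
function sumAll(Iterator $items): int
{
    $sum = 0;
    foreach ($items as $item) {
        $sum += $item;
    }
    return $sum;
}

// A compact range iterator that computes values on the fly.
class RangeIterator implements Iterator
{
    private int $current;

    public function __construct(private int $from, private int $to)
    {
        $this->current = $from;
    }

    public function rewind(): void { $this->current = $this->from; }
    public function valid(): bool { return $this->current <= $this->to; }
    public function current(): int { return $this->current; }
    public function key(): int { return $this->current - $this->from; }
    public function next(): void { $this->current++; }
}

// Two different data structures behind the same interface.
echo sumAll(new ArrayIterator(range(1, 10))), "\n"; // 55
echo sumAll(new RangeIterator(1, 10)), "\n";        // 55
```

sumAll() never needs to know which implementation it received, which is exactly what makes swapping the underlying data structure possible.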

As a software engineer, you have to have a sense of what’s preferable to use in specific cases.

Iterators can also be passed around, which may be quite handy:

$numbers = new NumbersIterator(0, 1000);

function counter(NumbersIterator $numbers) {
    // No manual reset needed: foreach calls $numbers->rewind()
    // before the first iteration, even if the iterator was used before.
    foreach ($numbers as $number) {
        // do something
    }
}

counter($numbers);

And finally, since the next element is only produced when Iterator::next is called, we can calculate or even fetch it (say, via an HTTP call or a database query) without knowing it beforehand.
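As a small sketch of an iterator whose elements are calculated on demand (the Collatz sequence is my choice of example here, and CollatzIterator is a hypothetical class): each call to next() derives the following value from the current one, and nobody knows in advance how many elements there will be. It assumes a starting value of at least 1.

```php
<?php
// Iterates over the Collatz sequence starting at $start (assumed >= 1):
// n -> n / 2 if n is even, n -> 3n + 1 if n is odd, stopping after 1.
class CollatzIterator implements Iterator
{
    private int $current;
    private int $index = 0;
    private bool $finished = false;

    public function __construct(private int $start)
    {
        $this->current = $start;
    }

    public function rewind(): void
    {
        $this->current = $this->start;
        $this->index = 0;
        $this->finished = false;
    }

    public function valid(): bool
    {
        return !$this->finished;
    }

    public function current(): int
    {
        return $this->current;
    }

    public function key(): int
    {
        return $this->index;
    }

    public function next(): void
    {
        if ($this->current === 1) {
            // 1 is the last element of the sequence.
            $this->finished = true;
            return;
        }
        // The next element is only calculated when it is requested.
        $this->current = $this->current % 2 === 0
            ? intdiv($this->current, 2)
            : 3 * $this->current + 1;
        $this->index++;
    }
}

foreach (new CollatzIterator(6) as $n) {
    echo "$n\n"; // prints 6, 3, 10, 5, 16, 8, 4, 2, 1 (one per line)
}
```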

A perfect example of this is something I use from time to time. Let’s say you need to iterate over a really large list of records from the database; for instance, you need to read a whole table to validate its data in a once-off script. What do you do?

$records = $db->query('SELECT * FROM really_big_table')->fetchAllAssoc();
foreach ($records as $record) {
    validate($record);
}

Not a great option, is it? What if you have 100s of megabytes of data in there? All of it would be loaded into memory at once.

What you could do is read data in chunks using an iterator:

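// db and its query-builder methods are placeholders for your own database layer.
class DatabaseIterator implements Iterator
{
    protected db $db;
    protected string $table;
    protected int $chunkSize;
    protected array $currentChunk = [];
    protected int $currentChunkIndex = 0;
    protected int $offset = 0;

    public function __construct(db $db, string $table, int $chunkSize = 1000)
    {
        $this->db = $db;
        $this->table = $table;
        $this->chunkSize = $chunkSize;
    }

    // Loads the chunk that starts at $this->offset and moves the offset forward.
    protected function fetchChunk(): void
    {
        $this->currentChunk = $this->db
            ->select("*")
            ->from($this->table)
            ->limit($this->chunkSize)
            ->offset($this->offset)
            ->fetchAllAssoc();
        $this->offset += $this->chunkSize;
        $this->currentChunkIndex = 0;
    }

    public function rewind(): void
    {
        // foreach calls rewind() first, so this loads the first chunk.
        $this->offset = 0;
        $this->fetchChunk();
    }

    public function next(): void
    {
        if (isset($this->currentChunk[$this->currentChunkIndex + 1])) {
            $this->currentChunkIndex++;
            return;
        }
        // Current chunk is exhausted: fetch the next one.
        $this->fetchChunk();
    }

    public function current(): array
    {
        return $this->currentChunk[$this->currentChunkIndex];
    }

    public function key(): int
    {
        // Position of the current record within the whole table.
        return $this->offset - $this->chunkSize + $this->currentChunkIndex;
    }

    public function valid(): bool
    {
        // An empty chunk means we've read past the end of the table.
        return isset($this->currentChunk[$this->currentChunkIndex]);
    }
}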

Now you can use it the same way, but at any given time only one chunk (1000 records by default) is fetched from the database and held in memory:

$dbIter = new DatabaseIterator($db, 'really_big_table');
foreach ($dbIter as $record) {
    validate($record);
}

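Pretty cool technique, isn’t it? You may wonder how this is different from unbuffered reads (i.e. using MYSQLI_USE_RESULT instead of MYSQLI_STORE_RESULT). Unbuffered reads also keep memory usage low, but they tie up the connection: no other query can be sent over it until the whole result set has been read, which gets in the way if validate() itself needs to talk to the database. Reading in chunks keeps memory bounded while leaving the connection free between chunks, at the cost of one query per chunk.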

Let’s sum this all up.

Advantages of using iterators

  • provide a generic interface for iterating over a set of data
  • the underlying implementation and data structure may be swappable
  • can be used to optimise memory usage
  • can be passed around
  • next element of the iteration may not be known beforehand

How generators are simplified iterators

So far we’ve talked about how iterators (while not a silver bullet) have many advantages, but we haven’t talked much about their disadvantages. One obvious disadvantage is how much code you have to write for even very simple cases. Take a look at the first implementation of NumbersIterator: that’s over 30 lines of code just to iterate over a range of numbers.

Let’s re-implement it using a generator instead:

function numbersGenerator(int $from, int $to): Generator
{
    $current = $from;
    while ($current <= $to) {
        yield $current++;
    }
}

The above is a like-for-like implementation of NumbersIterator using a generator. Since a generator cannot be rewound once it has moved past its first yield (which renders the Generator::rewind method practically useless), we never need the original $from again, so we can improve it a little by dropping $current and incrementing $from directly:

function numbersGenerator(int $from, int $to): Generator
{
    while ($from <= $to) {
        yield $from++;
    }
}

Yes, not being able to rewind a generator once it has moved past its first value is a disadvantage (and rewinding one that hasn’t moved yet would achieve nothing anyway), but if you don’t need it, compare how much simpler the generator is.
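Here is a quick sketch of that limitation, repeating the simplified numbersGenerator() so the snippet stands alone: rewind() is harmless while the generator is still at its first yield, but throws an Exception once it has advanced. The usual workaround is simply to call the generator function again to get a fresh generator.

```php
<?php
function numbersGenerator(int $from, int $to): Generator
{
    while ($from <= $to) {
        yield $from++;
    }
}

$numbers = numbersGenerator(1, 3);
$numbers->rewind();             // fine: we're still at the first yield
echo $numbers->current(), "\n"; // 1
$numbers->next();
echo $numbers->current(), "\n"; // 2

try {
    // Past the first yield now, so rewinding is no longer allowed.
    $numbers->rewind();
} catch (Exception $e) {
    echo "Exception: ", $e->getMessage(), "\n";
}

// The workaround: call the generator function again for a fresh generator.
foreach (numbersGenerator(1, 3) as $n) {
    echo "$n\n";
}
```

Keep in mind that foreach calls rewind() too, so looping over the same generator object a second time fails in the same way.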

Let’s even have some fun and rewrite the DatabaseIterator to use a generator. I’m pretty confident we won’t need a rewind, so it’s perfectly fine to swap out an iterator for a generator in this case.

function databaseIteratorGenerator(db $db, string $table, int $chunkSize = 1000): Generator
{
    $offset = 0;
    do {
        $currentChunk = $db->select("*")->from($table)->limit($chunkSize)->offset($offset)
            ->fetchAllAssoc();
        foreach ($currentChunk as $record) {
            yield $record;
        }
        $offset += $chunkSize;
        // A short (or empty) chunk means we've reached the end of the table.
    } while (count($currentChunk) === $chunkSize);
}

It should go without saying how much simpler the generator implementation is compared to the iterator. If you have loads of data in the table, yes, it will take a long time to complete, but at least you won’t exhaust the memory or the network, and you won’t put too much strain on the database server either. And all in about a dozen lines of reusable code with a very simple interface:

$records = databaseIteratorGenerator($db, 'big_table');
foreach ($records as $record) {
    validate($record);
}
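Since db and its query builder are placeholders, here is a runnable variation of the same chunking idea under one assumption: the data source is abstracted as a hypothetical callable $fetchChunk(int $offset, int $limit): array that stands in for the LIMIT/OFFSET query. An in-memory array plays the role of the table, and a counter shows how many "queries" were issued.

```php
<?php
// Same chunking idea, but the data source is a hypothetical callable
// $fetchChunk(int $offset, int $limit): array instead of a db object.
function chunkedGenerator(callable $fetchChunk, int $chunkSize = 1000): Generator
{
    $offset = 0;
    do {
        $chunk = $fetchChunk($offset, $chunkSize);
        foreach ($chunk as $record) {
            yield $record;
        }
        $offset += $chunkSize;
        // A short (or empty) chunk means there is nothing left to read.
    } while (count($chunk) === $chunkSize);
}

// Fake "table" of 2500 rows and a fetcher that counts its calls.
$table = array_map(fn (int $id) => ['id' => $id], range(1, 2500));
$queries = 0;
$fetch = function (int $offset, int $limit) use ($table, &$queries): array {
    $queries++;
    return array_slice($table, $offset, $limit);
};

$seen = 0;
foreach (chunkedGenerator($fetch, 1000) as $record) {
    $seen++; // validate($record) would go here
}
echo "$seen records, $queries queries\n"; // 2500 records, 3 queries
```

Only one chunk of 1000 rows is held at a time, and the short last chunk ends the loop without an extra empty query.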

There are other interesting use cases for generators too, for instance, infinite generators. Let’s say you need to generate an infinite sequence of alternating 0s and 1s:

function flippingBits(): Generator
{
    $i = 1;
    while (true) {
        // XOR with 1 flips the bit: 0, 1, 0, 1, ...
        yield ($i ^= 1);
    }
}

// Let's print some 0 1 0 1 0 1 0 ...
$bits = flippingBits();
$runs = 10;
while ($runs-- > 0) {
    echo $bits->current() . "\n";
    $bits->next();
}
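Instead of driving current() and next() by hand, you can also let SPL cap the infinite generator: the built-in LimitIterator class returns only the first N elements of any Iterator, and generators are Iterators. A minimal sketch (flippingBits() is repeated so the snippet stands alone):

```php
<?php
function flippingBits(): Generator
{
    $i = 1;
    while (true) {
        yield ($i ^= 1);
    }
}

// Take only the first 10 values (offset 0, count 10) of the infinite sequence.
foreach (new LimitIterator(flippingBits(), 0, 10) as $bit) {
    echo "$bit\n"; // 0, 1, 0, 1, ... (10 lines)
}
```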

I’m sure that once you get the hang of generators, you will soon start seeing where they can be applied!

Generators in other programming languages

Generators, or very similar constructs, are available in many popular programming languages, either built into the language or via libraries:

  • C++
  • Java
  • Python
  • PHP
  • Javascript
  • C#
  • Ruby
  • Perl
  • … and some others …

Here is an example of a generator in python:

def numbers(start, end):
    while start <= end:
        yield start
        start += 1

for i in numbers(0, 10):
    print(i)

As you can see, the python implementation isn’t vastly different from the php one.

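The above is just an example: in practice you’d use range(0, 11) in python 3 or xrange(0, 11) in python 2 (the end is exclusive). Strictly speaking, neither is a generator; they are lazy sequence objects, but like generators they don’t build the whole list in memory. (And who uses python 2 these days anyway?)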

As you can see, generators are not a feature unique to a particular language, so knowing them is transferable knowledge and a skill very much worth learning.

Conclusion

I hope the above demonstrates the power of generators. While they can’t replace loops or iterators, they are definitely worth the time you invest in learning them. They inherit most of the advantages of iterators, and they are arguably easier to write and read.
