Efficiently creating data chunks in PHP
While I was working on a library to build sitemaps and sitemap indexes, I identified the need to aggregate several iterators and then build chunks of a precise size from this aggregate.
To be clearer: given a list of URL providers for a sitemap index, I wanted to iterate over all the data exposed by these providers in chunks of 50,000 (the maximum number of URLs allowed in a sitemap).
Since each provider is an Iterator, I can chain them using nikic’s iter library:
$firstUrlProvider = …;
$secondUrlProvider = …;

$aggregatedProvider = iter\chain($firstUrlProvider, $secondUrlProvider);
I now have a way to iterate over all the URLs exposed by my providers, with a single foreach statement.
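For illustration, iterating over the aggregate could look like this (a minimal sketch, assuming the two providers above are set up):

// Walks every URL from $firstUrlProvider, then every URL from $secondUrlProvider
foreach ($aggregatedProvider as $url) {
    // handle $url
}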
My next issue comes from the need to build as many sitemaps as necessary, with a maximum of 50,000 URLs in each sitemap, without knowing how many URLs I have and without storing anything unnecessary in memory.
To put it in other words: I want to be able to generate chunks of 50,000 URLs from my $aggregatedProvider, without storing whole chunks in memory.
Looks like it’s a job for PHP generators!
function chunk(\Iterator $iterable, int $size): \Iterator
{
    while ($iterable->valid()) {
        // Each chunk is a generator that consumes at most $size items
        // from the underlying iterator before exhausting itself.
        $closure = function () use ($iterable, $size) {
            $count = $size;
            while ($count-- && $iterable->valid()) {
                yield $iterable->current();
                $iterable->next();
            }
        };
        yield $closure();
    }
}
In the previous function, I delegate the iteration over my provider to a generator that uses both the provider itself and the desired chunk size to efficiently yield my URLs.
When put together, I obtain a nice way to iterate and make chunks from a large amount of data.
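Here is a minimal usage sketch, assuming the chunk() function above and the aggregated provider (the sitemap-writing step is only hypothetical):

foreach (chunk($aggregatedProvider, 50000) as $urls) {
    // $urls is itself a generator: URLs are produced one by one,
    // so a full 50,000-item chunk never sits in memory.
    foreach ($urls as $url) {
        // e.g. add $url to the sitemap currently being written
    }
}

Note that each chunk has to be fully consumed before asking for the next one, since all chunks read from the same underlying iterator.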
As I said in another blog post, Generators are awesome!