DirectoryIterator vs FilesystemIterator

I was recently working on a project that required iterating over a directory of files. Whenever I do this, I reach for my old friend RecursiveDirectoryIterator. In this case I only needed the full path to each file, so I was relying on the fact that RecursiveDirectoryIterator returns the full path as the key on each iteration step by default, and I was ignoring the value, which is an SplFileInfo object. Looking through the documentation, I saw there was a way to make RecursiveDirectoryIterator return just the full path as the value. However, I ran into a bug that led me to dig into the internals of the SPL and figure out how things work.

Besides RecursiveDirectoryIterator, there are two other main SPL classes for iterating over files in a directory: DirectoryIterator and FilesystemIterator. When SPL was first introduced in PHP 5.0, FilesystemIterator did not exist; it was added in PHP 5.3. I never understood the difference between it and DirectoryIterator until now, and the documentation does not give many clues. This matters for RecursiveDirectoryIterator because that class used to extend DirectoryIterator; when FilesystemIterator was introduced, RecursiveDirectoryIterator was changed to extend it instead. The reasons for this will become clear later. So what are the differences between DirectoryIterator and FilesystemIterator? Here is what I found out.

DirectoryIterator

When you iterate using DirectoryIterator, each value returned is that same DirectoryIterator object. Its internal state changes on every step so that calls to isDir(), getPathname(), and similar methods return information about the current entry. If you ask for a key while iterating, you get an integer index.

<?php
$files = new DirectoryIterator(/*...*/);
foreach ($files as $index => $iterator) {
    /*...*/
}
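
To make the "same object" behavior concrete, here is a minimal sketch. The scratch directory and the file names a.txt and b.txt are made up for the demonstration and are not part of the original project.

```php
<?php
// Hypothetical scratch directory, created only for this demonstration.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'spl_demo_' . uniqid();
mkdir($dir);
touch($dir . DIRECTORY_SEPARATOR . 'a.txt');
touch($dir . DIRECTORY_SEPARATOR . 'b.txt');

$files = new DirectoryIterator($dir);

$sameObject = true; // stays true only if every value is the iterator itself
$keys = [];         // integer indexes handed out as keys
$names = [];        // file names seen, ignoring "." and ".."

foreach ($files as $index => $item) {
    // The value is the iterator, repositioned on the current entry.
    $sameObject = $sameObject && ($item === $files);
    $keys[] = $index;
    if (!$item->isDot()) {
        $names[] = $item->getFilename();
    }
}
sort($names); // directory order is not guaranteed

var_dump($sameObject, $keys, $names);

// Clean up the scratch directory.
unlink($dir . DIRECTORY_SEPARATOR . 'a.txt');
unlink($dir . DIRECTORY_SEPARATOR . 'b.txt');
rmdir($dir);
```

One practical consequence: collecting the values into an array during the loop just gives you the same object several times, so copy out whatever you need (a pathname, for example) on each step.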

FilesystemIterator

FilesystemIterator (and thus RecursiveDirectoryIterator), on the other hand, returns a new, different SplFileInfo object for each iteration step. The key is the full pathname of the file.

<?php
$files = new FilesystemIterator(/*...*/);
foreach ($files as $fullPath => $info) {
    /*...*/
}
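
A small sketch of that behavior, again using a made-up scratch directory with two hypothetical files. Because each step produces its own SplFileInfo object, the values can be kept after the loop has finished.

```php
<?php
// Hypothetical scratch directory, created only for this demonstration.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'spl_demo_' . uniqid();
mkdir($dir);
touch($dir . DIRECTORY_SEPARATOR . 'a.txt');
touch($dir . DIRECTORY_SEPARATOR . 'b.txt');

$files = new FilesystemIterator($dir);

$infos = [];
foreach ($files as $fullPath => $info) {
    // Key: full pathname. Value: a separate SplFileInfo for this entry.
    $infos[$fullPath] = $info;
}
ksort($infos); // directory order is not guaranteed

$paths = array_keys($infos);
list($first, $second) = array_values($infos);

// The stored objects are distinct and still describe their own files.
$distinct = ($first !== $second) && ($first !== $files);
$firstName = $first->getFilename();
$secondName = $second->getFilename();

var_dump($paths, $distinct, $firstName, $secondName);

// Clean up the scratch directory.
unlink($dir . DIRECTORY_SEPARATOR . 'a.txt');
unlink($dir . DIRECTORY_SEPARATOR . 'b.txt');
rmdir($dir);
```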

That is the default behavior. You can change what is returned for the key or the value using the flags argument to the constructor. Your choices are:

  • CURRENT_AS_PATHNAME
  • CURRENT_AS_FILEINFO
  • CURRENT_AS_SELF (note that this makes FilesystemIterator and RecursiveDirectoryIterator behave like DirectoryIterator)
  • KEY_AS_PATHNAME
  • KEY_AS_FILENAME

The bug I ran into has to do with the CURRENT_AS_PATHNAME option: using it causes PHP to throw a fatal exception. I wrote a fix and submitted it as a pull request on GitHub, but as of the date of this blog post it has yet to be merged.

I'm not sure about all of the history, in particular why DirectoryIterator returned itself while RecursiveDirectoryIterator returned new SplFileInfo objects when SPL was created, but it is clear that the FilesystemIterator class was introduced to make the API a little cleaner.
