Filesystems, Directory Entry Order, and Reproducible Testing

File systems are a ubiquitous abstraction in computing: they provide a natural way to manage raw storage and a mapping from file names to stored objects. Frequently we can leave the messy details of semantics to the poor soul implementing the file system. The basic interface is very simple: a file system consists of a tree of files and directories. A directory contains a set of entries, each of which may be a plain file with data or another directory. Directories support at least list, create, and remove operations, and files support read, write, open, close, etc. The hard part of implementing a file system is actually carving up the disk into blocks and allocating them to files and directories. The allocator has to be fast and crash recoverable, and it has to lay out files in a way that yields good read and write performance. In practice, there are many edge cases where different file systems provide slightly different semantics. In most everyday use, those edge cases don’t matter too much. There is one edge case, though, that we do see with some frequency here at Solano Labs: the order in which files are stored in a directory.

“What?!” you say.  Yes, the order in which files are stored in a directory.

Most Unix file systems do not define the order in which directory entries are stored. Take a look through the ext3 source code in the Linux kernel or any of the variants of UFS/FFS in BSD-land and you will see that not much has changed since the 1970s: directory entries are stored as an unsorted list. As a result, when you read a directory, the order in which file names are returned depends on the preceding sequence of file create and remove operations performed on that directory. By default, Mac OS X does not use UFS, but instead its own modern, journalled file system, HFS+. Unlike UFS and extN, HFS+ does guarantee the order in which directory entries are retrieved. This is a happy by-product of using a B+-tree internally; you can read the gory details here.
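
If you want to see this for yourself, here is a minimal Ruby sketch. The helper name raw_listing and the file names are ours, purely for illustration. It creates a few empty files in a scratch directory, in deliberately non-alphabetical order, and prints the names as the operating system lists them next to a sorted copy. What the first line shows depends on your file system, so we make no promise about it.

```ruby
require "tmpdir"

# Hypothetical helper: create empty files in the given order inside a
# scratch directory and return their names exactly as the OS lists them.
def raw_listing(names)
  Dir.mktmpdir do |dir|
    names.each { |name| File.write(File.join(dir, name), "") }
    Dir.children(dir) # platform-dependent order, no sorting applied
  end
end

listing = raw_listing(%w[zebra.rb apple.rb mango.rb])
puts "as listed: #{listing.inspect}"
puts "sorted:    #{listing.sort.inspect}"
```

Only the second line is something you can rely on from one machine to the next.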

So why does this matter for testing in the cloud with Tddium, you ask? Well, many developers use Mac OS X, so their test suites can inadvertently come to depend on the ordering guarantees of HFS+. This most commonly rears its ugly head in two places: when loading helpers from spec/support and when running Jasmine Headless Webkit tests. A common idiom in RSpec spec/spec_helper.rb files is to load the files in spec/support:

[sourcecode lang="ruby"]
Dir[Rails.root.join("spec/support/**/*.rb")].each {|f| require f}
[/sourcecode]

If there are interdependencies among the support files, the order in which they are loaded matters. What works on Mac OS X may therefore fail under Linux unless the code that loads files in spec/support first sorts the contents of the directory.
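
One way to do that sorting is sketched below. The helper name sorted_support_files is our own invention, not part of RSpec or Rails. It simply sorts the glob results so that every platform requires the files in the same lexicographic order.

```ruby
# Hypothetical helper (the name is ours): every Ruby file under
# <root>/spec/support, in the same lexicographic order on every platform.
def sorted_support_files(root)
  Dir[File.join(root.to_s, "spec/support/**/*.rb")].sort
end

# In spec/spec_helper.rb this would replace the unsorted one-liner:
#   sorted_support_files(Rails.root).each { |f| require f }
```

Sorting makes the load order reproducible, but it does not make it correct. If one support file truly depends on another, an explicit require inside the dependent file is more robust than hoping the names happen to sort the right way. Newer Rubies (3.0 and later, as far as we know) sort glob results by default, but the explicit sort keeps older versions honest and documents the intent.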

This same order difference causes trouble with Jasmine Headless Webkit when it processes its configuration file, jasmine.yml. When Jasmine reads the configuration file, it constructs a list of directories to search for JavaScript code. At run time, it loads the JavaScript files that it finds in each of these directories whose names match the globs in the configuration file. The order in which Jasmine discovers and loads JavaScript depends on the order in which the operating system lists the contents of directories. This can result in inscrutable and hard-to-debug load-order-induced problems: what works out of the box on Mac OS X need not work under Linux, but depending on the directory operations performed, the results may, by chance, be the same! The simple solution is to sort the results of expanding the glob patterns in the configuration file so that the results are consistent from one platform to another. I’m pleased to say that John Bintz, the Jasmine Headless Webkit maintainer, has been fabulously responsive and merged the patch, so if you’re using JHW, this is a solved problem!
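
To make the idea concrete, here is a rough sketch of sorted glob expansion. It is not the actual JHW code or the merged patch. The function name expand_src_files is hypothetical, and the src_dir and src_files keys are our assumption about what a typical jasmine.yml contains. The patterns are processed in the order the configuration lists them, and only the matches within each pattern are sorted.

```ruby
require "yaml"

# Hypothetical sketch, not JHW's implementation: expand each glob pattern
# from a jasmine.yml-style config in the order it is listed, sorting the
# matches of each pattern so the result does not depend on the file system.
def expand_src_files(config_yaml, base_dir)
  config  = YAML.safe_load(config_yaml)
  src_dir = File.join(base_dir, config.fetch("src_dir", "."))
  config.fetch("src_files", []).flat_map do |pattern|
    Dir.glob(File.join(src_dir, pattern)).sort
  end.uniq # a file matched by two patterns is loaded once, at its first position
end
```

Keeping the pattern order from the configuration lets you still say "load lib first" explicitly, while the sort removes the platform-dependent part.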

The moral of the story?  If you need to be platform agnostic, don’t depend on the operating system to return directory entries in a consistent, let alone sorted, order.
