Git repositories hold a wealth of interesting metadata in addition to the code itself. The number, frequency, authorship, longevity, and other properties of commits reveal a great deal about software and its development. Depending on the content of commits and commit messages, you may be able to infer the life cycle of software defects: when bugs are introduced, how long they last, how disruptive they are, and so on. If the bugs are significant or interesting enough to warrant detailed analysis, you can investigate by hand. One of my favorite examples of such an investigation, spanning more than fifteen years of development by some of the best systems guys out there, is the study of sleep and wakeup in Plan 9.
Closer to home, I wanted to know when we’re committing to our core repositories. Internally, we can correlate commits with bugs, deploys, and individual contributors. Eventually, I’d like to be able to answer questions such as “Is there a material difference in the quality of commits at 2 PM and 2 AM local time?” The first step, shown below, was to compute a smoothed histogram of commits by time of day over the last 90 days. In this first cut, we don’t treat weekends and holidays any differently (we’re a start-up, after all!). I smoothed the histogram by replacing the number of commits in each bucket with the average of the three surrounding buckets. The results for a busy repository show that we don’t really do much between the time we get to bed, around 6 AM US East Coast time, and when the coffee starts to kick in mid-morning.
I don’t think there are a lot of deep conclusions to draw here, but it does look as though we work in four-hour episodes. And once in a while, we’re up until all hours making the service better for our customers.
Not everyone is a caffeine fiend, however. The following plot was computed the same way as before, but for a somewhat larger development team keeping saner hours. I guess lunch is at 1 PM.
Once I’ve had a chance to clean up the script used to generate these graphs, I’ll post it. I think it would be interesting to share commit histograms across a range of applications, development teams, and organization sizes.
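In the meantime, the computation itself is simple enough to sketch. What follows is not the script behind the plots above, only a minimal Python illustration of the steps described earlier: pull commit timestamps from `git log`, bucket them by local time of day, and smooth. Several details are assumptions on my part for the sake of the example: one-hour buckets, reading "the three surrounding buckets" as a bucket plus its two neighbors, wrapping the average around midnight, and using committer time (`%ct`) rather than author time. The function names are made up for this sketch.

```python
import subprocess
from collections import Counter
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; used in the usage note below


def commit_times(repo_path, since="90 days ago"):
    """Return committer timestamps from `git log` as UTC-aware datetimes."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", f"--since={since}", "--pretty=format:%ct"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [
        datetime.fromtimestamp(int(line), tz=timezone.utc)
        for line in out.splitlines()
        if line.strip()
    ]


def hourly_histogram(times, tz):
    """Count commits per local hour of day (24 buckets) in time zone `tz`."""
    counts = Counter(t.astimezone(tz).hour for t in times)
    return [counts.get(hour, 0) for hour in range(24)]


def smooth(hist):
    """Replace each bucket with the mean of itself and its two neighbors.

    Assumption: the day is circular, so bucket 23 is adjacent to bucket 0.
    """
    n = len(hist)
    return [
        (hist[(i - 1) % n] + hist[i] + hist[(i + 1) % n]) / 3
        for i in range(n)
    ]
```

For a repository in the current directory, something like `smooth(hourly_histogram(commit_times("."), ZoneInfo("America/New_York")))` would give 24 smoothed values ready to plot. The wraparound keeps late-night commits from being undercounted at the edges of the day, which seems to matter for a team that is sometimes up until all hours.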