Thursday, January 16, 2014

Let's at least start to consider killing the plain old file system

Electronic files have been an integral part of how we use computers for a very long time. I think it is time to rethink how software developers use files and the file system. A single file always has a single state on the disk. When the user chooses to ‘save’ the file, the old state is gone forever. There is no way to go back and look at any old state of a file (without help from other tools). Electronic files made sense at the dawn of computing. Disk space was expensive and we could not afford to store every single change on our small, expensive disk drives.

Version control systems are an acknowledgment that the ‘plain old file system’ does not work for programmers. The first ones were built decades ago when disk space was still expensive and, as a result, they are full of compromises. They are optimized to be disk efficient. What is very surprising to me is that modern version control systems are built with the same constraints. Git and Mercurial were created within the last ten years but they are designed with 1980s disks in mind.

My big complaint about the file system is that, even though disk space has gotten incredibly cheap, we don't store more information about the changes to our files. Since the cost per bit of disk space is ridiculously cheap (and getting cheaper) we should record all the changes to a set of files and make that data available to be manipulated.

Complex electronic creative artifacts like code, novels, and electronic art evolve over time. Using the file system alone makes it impossible to capture that evolution. Version control tools help capture some of the history, but they require that the user actively think about using them. Good luck getting an author or artist to use Git. The CLI is terrible and only a software developer would put up with it. In addition, animating multiple consecutive changes in a version control system to show long-term evolution is difficult. It is going to be a challenge for future scholars to see how modern authors work if the authors are not vigilant about storing intermediate versions of their work. Compare this with handwritten manuscripts from one hundred years ago, with their edits made inline.

You might be thinking, “It is good that some of that file history is lost forever; it represents work that turned out to be not very good,” but I argue that there are lessons to be learned from those experiences. It is very difficult to see how others do creative work when it is done on a computer. Since we can't see how people work, we can't learn from their experiences. We need to open up those experiences so others can learn.

If we stored all of the changes all of the time, along with some additional data like who made each change, when it was made, and where the author was when they made it, we would start to open up some learning opportunities. Google Docs and other cloud-based tools are recording more and more of this kind of historical information. They are not worried about disk space.
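To make the idea a little more concrete, here is a minimal sketch of what "storing all of the changes" might look like for a single document. It is only an illustration under my own assumptions: the `Change` and `ChangeLog` names, their fields, and the insert/delete model are hypothetical and are not taken from any existing tool.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Change:
    """One recorded edit. All field names are hypothetical."""
    position: int                   # character offset where the edit happened
    deleted: int                    # number of characters removed at that offset
    inserted: str                   # text inserted at that offset
    author: str                     # who made the change
    timestamp: datetime             # when it was made
    location: Optional[str] = None  # where the author was, if known


class ChangeLog:
    """Append-only history of one document; nothing is ever overwritten."""

    def __init__(self) -> None:
        self._changes: List[Change] = []

    def record(self, position: int, deleted: int, inserted: str,
               author: str, location: Optional[str] = None) -> Change:
        """Append a change, stamped with the current UTC time."""
        change = Change(position, deleted, inserted, author,
                        datetime.now(timezone.utc), location)
        self._changes.append(change)
        return change

    def state_at(self, count: Optional[int] = None) -> str:
        """Rebuild the text after the first `count` changes (all if None)."""
        text = ""
        for c in self._changes[:count]:
            text = text[:c.position] + c.inserted + text[c.position + c.deleted:]
        return text


log = ChangeLog()
log.record(0, 0, "Hello world", author="alice", location="Chicago")
log.record(6, 5, "there", author="bob")

print(log.state_at(1))  # the document after the first change only
print(log.state_at())   # the document after every recorded change
```

Because every change is kept, any earlier state can be rebuilt by replaying a prefix of the log, which is exactly what a single 'saved' file throws away.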

The project I am working on, Storyteller, seamlessly records all file-based interactions for software developers by extending their IDEs. My hypothesis is that the most valuable metadata that can be stored is an author-supplied commentary on a set of changes. This commentary will follow the animated changes and offer an explanation, a hint, or a lesson learned from the author. These commentaries can be shared between developers to open up the programming process so that we can learn from each other.
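As a rough illustration of commentary that follows a set of changes, the sketch below replays a list of edits and pairs each intermediate state with whatever comment covers it. This is not how Storyteller is implemented; the `Edit`, `Comment`, and `playback` names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class Edit:
    position: int  # character offset of the edit
    deleted: int   # characters removed at that offset
    inserted: str  # text inserted at that offset


@dataclass(frozen=True)
class Comment:
    first: int  # index of the first edit the comment covers
    last: int   # index of the last edit it covers (inclusive)
    text: str   # the author's explanation, hint, or lesson learned


def playback(edits: List[Edit],
             comments: List[Comment]) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (document text, commentary or None) after each edit, in order."""
    text = ""
    for i, e in enumerate(edits):
        text = text[:e.position] + e.inserted + text[e.position + e.deleted:]
        # Use the first comment whose range includes this edit, if any.
        note = next((c.text for c in comments if c.first <= i <= c.last), None)
        yield text, note


edits = [
    Edit(0, 0, "def f():"),
    Edit(8, 0, " pass"),
    Edit(4, 1, "main"),
]
comments = [Comment(0, 1, "Start with a stub.")]

for text, note in playback(edits, comments):
    print(repr(text), "|", note)
```

The point of the design is that the comment is attached to a range of changes rather than to a finished file, so a reader watching the replay sees the explanation at the moment the code is taking shape.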

The real issue is that there are some files that are meant to be consumed left to right, top to bottom but are rarely created that way. The compiler reads source code files in this way, but we all know that code is never written that way. The process of creating those artifacts is often lost because of our use of electronic files and supporting tools. I can envision a new abstraction over the file system that captures the creation process for all files of this kind. Perhaps Storyteller will be the answer for software developers, or maybe it won't. Regardless, I believe we need to do something to move the electronic file into the 21st century.

Tuesday, January 7, 2014

Forget pair programming, I like pair debugging

I am generally a fan of pair programming. Having said that, I rarely feel at a loss for good ideas on how to solve problems when I am working on small, incremental additions by myself. Making small, incremental changes is what most of us do most of the time. Often, I write code without any input from team members at all and do pretty well. Other times I will have a short discussion with other developers to share ideas and weigh options. For me, this seems to be enough to generate good code. Most of the work that I do without input from a pair programmer seems to turn out well. So, I tend not to pair that often.

The only time I ever really feel any kind of despair while programming is when I am dealing with a tricky bug. When I am working on a change that breaks something unexpectedly I often feel a sense of panic. Something I have done has had an effect on another part of the system. Many times, I will not know a lot about the broken part of the system. This is when I feel a strong need to sit down with someone else who knows that part and talk through what I did and how it might have affected it. In fact, I often wish I had more than one additional person to talk to.

Of course, one might argue that if I had pair programmed with the 'right' person in the first place I wouldn't have made the error or felt the panic. However, it is difficult to anticipate what is going to go wrong and, therefore, who to pair with. So, at least for me, I don't feel like pair programming all of the time is the best use of my partner's time. I feel like having short discussions with my team members and pair debugging on demand is the most effective for me.

I am all for pairing when I don't have a good, strong idea of how to solve a problem, or when either my partner or I lack some experience. I like to pair when I am learning a new system or subsystem. I also choose to pair in the very early stages of a system's development when everything is brand new and just being built.

Mostly, though, I like how knowledge spreads from more informed developers to less experienced ones. I don't insist that every single line of code be written in a pair for my projects. I tend to use pair programming as a learning tool more than a creativity tool. I feel it is most valuable when I know I am lacking some knowledge, that is, after I have broken something and can't figure out how to fix it.