LTFS Data File Format and Size Considerations

Deciphering the good, the bad and the ugly

LTFS 2 Technology: Segment 7


Jay Livens:
We talked about how LTFS relies on LTO tape, which, of course, is a linear type of medium. What I'm wondering is, does that mean that there are some data types that are better for LTFS and, hence, LTO? From what you found in your testing, if I wanted to use LTFS, are there certain types of information or file sizes that might be better suited to this medium than to, perhaps, the alternatives?

Michael Richmond:
That's an interesting question. At the very high level, there's nothing about LTFS that cares, one way or another, about the data that is written to it. The only constraint there is that an LTFS volume has a maximum capacity, which is constrained by the data capacity of an LTO cartridge. When writing files to an LTFS volume, you need to ensure that your files will physically fit on the cartridge that you're writing them to. This is the same constraint that we have with hard drives and thumb drives. If you try to drag a file that is larger than your thumb drive to your thumb drive, you'll get an error. You get exactly the same error from LTFS when you try to do that.
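The capacity constraint can be checked before a copy is attempted. The following is a minimal sketch in Python, assuming the LTFS volume is mounted like any other file system and reports its free space through the operating system; the mount point `/mnt/ltfs` and the function names are hypothetical, used only for illustration.

```python
import os
import shutil


def fits(file_size_bytes, free_bytes):
    """Return True if a file of the given size fits in the given free space."""
    return file_size_bytes <= free_bytes


def fits_on_volume(file_path, volume_path):
    """Compare a file's size with the free space reported for a mounted volume.

    Assumption: volume_path is the mount point of the target volume
    (for example a hypothetical LTFS mount at /mnt/ltfs).
    """
    file_size = os.path.getsize(file_path)
    free_bytes = shutil.disk_usage(volume_path).free
    return fits(file_size, free_bytes)


# Hypothetical usage:
#   fits_on_volume("capture.mov", "/mnt/ltfs")
```

A check like this only avoids the obvious failure up front; the copy itself would still report an error if the cartridge filled during the write.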

At a usability level, there are some data file formats that can lead to different user experiences than one might expect initially. For example, I mentioned earlier that one of our demos was being able to play back high-definition 1080p video directly from data tape. This is a demonstration that we have shown multiple times in public. This works, but it's highly dependent on the format of the video file itself. Some video file formats place, at the end of the file, important data that must be accessed before or during playback.

With these formats, if you're trying to play back video, it tends to be jumpy because the software performing the playback is constantly seeking from the end of the file to the middle of the file to continue playback and possibly going back to the end of the file. That data access probably does not provide the expected user experience. And that's just a limitation of the linear tape media itself. Using the same video clip, but writing that video clip in a different video format can achieve completely smooth playback if this new format that you use doesn't require seeking backwards and forwards through the file during playback.

These are differences between existing, well-established video file formats. This is just about how the existing, well-known formats need to be accessed in order to perform playback. That's one example of how you can't get away from the fact that your data is stored on tape, so there may be delays. But the delays are well within the bounds of the expected behavior of the LTO media.

Just to give listeners some clear data points here, an LTO-5 cartridge contains approximately 995 meters of data tape. That's approximately one kilometer. That tape is wound around a spool within the cartridge. When you access the data tape, you don't just read from the beginning to the end, because the tape is actually laid out in multiple tracks across the width of the tape.

To do a complete read of an LTO-5 data tape, you read from the beginning to the end. The tape drive reverses direction and then reads from the end to the beginning. That pair is known as a wrap and there are 80 wraps across the half-inch width of the data tape. When you do a complete read of an LTO-5 data tape, you are actually traversing over 160 kilometers of tape in your drive. Those are the distances involved, approximately. I'm not a hardware guy. I'm a software guy, so the details might be slightly off but it's on that order.
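The distance figure follows from simple arithmetic on the numbers quoted above. As a rough sketch in Python, using only those approximate figures (995 meters of tape, 80 wraps, two passes per wrap), with constant and function names chosen here for illustration:

```python
TAPE_LENGTH_M = 995      # approximate tape length quoted above
WRAPS = 80               # wraps across the width of the tape, as quoted above
PASSES_PER_WRAP = 2      # one forward pass and one reverse pass


def full_read_distance_km(tape_length_m=TAPE_LENGTH_M,
                          wraps=WRAPS,
                          passes_per_wrap=PASSES_PER_WRAP):
    """Tape distance traversed when every wrap is read end to end, in km."""
    return tape_length_m * wraps * passes_per_wrap / 1000


print(full_read_distance_km())  # 159.2
```

With these inputs the total comes out at roughly 160 kilometers, which matches the order of magnitude described.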

Worst-case seek time from the beginning of the cartridge to the end of the cartridge is 90 seconds with IBM LTO drives. I understand that HP and Quantum drives have similar seek times. With the 90-second, worst-case seek time from one end of the cartridge to the other, your average seek time is 45 seconds. If you've written a file to LTFS, all of the data associated with that file is, generally speaking, in roughly the same place on the tape.

Seeking within the file means delays of maybe a few seconds, maybe tens of seconds, if you're talking about files that are hundreds of gigs in length. In addition, the tape drive mechanism allows lateral movement between wraps. That movement time is on the order of a second or two. In practice, we see that the average case for random seeks is 45 to 50 seconds and real-world seeks are more along the lines of less than 10 seconds, maybe 20 seconds, as an outside case.

These access times are long compared to hard drive access times, but in terms of user experience, people sit and wait for web pages to download for longer than 10 or 20 seconds at a stretch. The real key to working out whether it makes sense to write a particular file onto LTO or onto LTFS is the length of time it will take to write it, because LTO tape drives actually read and write data faster than current hard drives once you get to the point on the tape where you want to store the data or read the data from.

There's this inflection point that exists when you get to large enough files, and large enough is on the order of tens of gigabytes, if memory serves. This inflection point is when the faster data transfer time offsets the seek time penalty that you'll incur accessing LTFS. When the data transfer time is significantly larger than the seek time, then the seek time becomes irrelevant because you're pushing the drive as fast as it possibly can go.
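The inflection point can be estimated with a simple model: tape pays a one-time seek penalty and then streams at its transfer rate, while a hard drive starts immediately but streams more slowly. The sketch below, in Python, solves for the file size at which the two total times are equal. The 45-second seek is the average quoted earlier and 140 MB/s is the LTO-5 native transfer rate; the 100 MB/s hard drive rate is an assumption chosen purely for illustration, and the function names are hypothetical.

```python
def total_time_s(size_mb, rate_mb_s, seek_s=0.0):
    """Seek penalty plus streaming time for a file of size_mb megabytes."""
    return seek_s + size_mb / rate_mb_s


def break_even_size_mb(seek_s, tape_rate_mb_s, disk_rate_mb_s):
    """File size at which tape (seek + transfer) matches disk (transfer only).

    Derived from: seek + S / tape_rate = S / disk_rate.
    """
    if tape_rate_mb_s <= disk_rate_mb_s:
        raise ValueError("no break-even point unless tape streams faster than disk")
    return (seek_s * tape_rate_mb_s * disk_rate_mb_s
            / (tape_rate_mb_s - disk_rate_mb_s))


# 45 s average seek, 140 MB/s tape, 100 MB/s disk (disk rate is an assumption)
print(break_even_size_mb(45, 140, 100))  # 15750.0
```

Under these assumed rates the break-even size is about 15.75 GB, consistent with the "tens of gigabytes" recollection; a faster hard drive pushes the break-even size higher, and a slower one pulls it lower.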

The Speakers:

Michael Richmond
Jay Livens