 USMS History & Archives Committee

File format for durable storage

In the near future, the APL files in APL (4211-4227) will no longer be needed, because the new import procedure (written when Pieter put a comma between FNAME and LNAME) can read CSV files, reconcile them with the SwimmerID registry (assigning each SwimmerID), and write out HTML files that can be uploaded directly. That means the USMS archives can use, as their durable archival format, files in the very same format generated by Pieter Cath, Walt Reid, and/or John Bauman.
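The import step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Pieter's actual procedure: the column names (FNAME, LNAME, AGE, EVENT, TIME), the registry layout (a dict keyed by first name, last name, and age), and the function names are all hypothetical, and matching on exact age within a single file is a simplification.

```python
import csv
import html
import io


def read_top_ten(csv_text):
    """Parse a comma delimited Top Ten file (hypothetical header) into dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def assign_ids(rows, registry):
    """Attach a SwimmerID to each row by exact (FNAME, LNAME, AGE) lookup.

    registry: hypothetical dict mapping (first, last, age) -> SwimmerID.
    Rows with no registry match get an empty SwimmerID.
    """
    out = []
    for row in rows:
        key = (row["FNAME"], row["LNAME"], int(row["AGE"]))
        out.append({**row, "SwimmerID": registry.get(key, "")})
    return out


def to_html(rows, columns):
    """Render the selected columns as a simple HTML table, escaping all text."""
    head = "".join(f"<th>{html.escape(c)}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(r[c]))}</td>" for c in columns) + "</tr>"
        for r in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"
```

Because the input is plain CSV and the output is plain HTML, nothing in this pipeline depends on a particular database product.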

The Windows facility being developed by Bill Parke can be the means by which we correct comma delimited Top Ten files, and it can also serve as one means of data entry for creating new files. There may be many other means as well, as Pieter can attest three times a year.

Under this approach, it will be important that every Top Ten file be complete for the results released at any given point in time. That means we will have three archive files for each year (SCY, LCM, SCM). The very same file sent to the H&A Committee by Walt, Pieter, or John could serve as the permanent durable archive file if our SwimmerID system were foolproof.

However, it is not. The only swimmer information available for recognizing someone in the Top Ten files is Name and Age, and we have many cases of two different swimmers with the same Name and Age, so the SwimmerID system cannot assign SwimmerIDs with 100% accuracy. When a swimmer complains that he or she has not received proper recognition, and we see that the complaint is probably correct, we can manually assign the correct SwimmerID. Therefore we do need to save an "interim" file, though it could itself be a comma delimited file with the SwimmerID and related information included.
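One way to keep this ambiguity visible rather than guessing is sketched below, again in Python and under hypothetical assumptions: the registry is a list of records with FNAME, LNAME, AGE, and SwimmerID fields, and manual corrections are recorded as a mapping from row number to SwimmerID. Rows whose Name and Age match more than one registered swimmer are flagged "ambiguous" instead of being assigned, and the result is written as a new interim CSV with SwimmerID and Match columns added.

```python
import csv
import io
from collections import defaultdict


def build_index(registry_records):
    """Group registry entries by (FNAME, LNAME, AGE); one key may map to several IDs."""
    index = defaultdict(list)
    for rec in registry_records:
        index[(rec["FNAME"], rec["LNAME"], int(rec["AGE"]))].append(rec["SwimmerID"])
    return index


def resolve(rows, index, overrides):
    """Return new rows with SwimmerID and Match columns added.

    overrides: hypothetical dict mapping row number -> SwimmerID, entered by
    hand after a swimmer's complaint has been judged correct.
    """
    resolved = []
    for n, row in enumerate(rows):
        candidates = index.get((row["FNAME"], row["LNAME"], int(row["AGE"])), [])
        if n in overrides:
            sid, status = overrides[n], "manual"
        elif len(candidates) == 1:
            sid, status = candidates[0], "auto"
        elif candidates:
            # Same Name and Age for several swimmers: do not guess.
            sid, status = "", "ambiguous"
        else:
            sid, status = "", "unmatched"
        resolved.append({**row, "SwimmerID": sid, "Match": status})
    return resolved


def write_interim(rows):
    """Serialize resolved rows as new CSV text; the original file is not modified."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Keeping the Match column in the interim file records which assignments were automatic and which were made by hand, so a later reviewer can revisit the ambiguous rows.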

The RFP that I write will propose CSV files as one possible solution for responding firms. Some firms may not think that is a good idea and may propose a different solution, but CSV will remain a possibility, and it has advantages. The most obvious advantage is that the archives can be completely independent of any particular database format. Should a new approach ever be desired, all Top Ten files will be available in their pristine original (comma delimited) state, plus an additional version with the SwimmerID added.

Once we commit to any specific database format, we exclude a great many potential contributors. If we choose the format that is easiest to use (e.g., Excel), we sacrifice professional database standards and capabilities. If we choose a high-end database (SQL Server, Oracle, or even Access), we limit our most effective workers to the small group with professional skills in whichever database we choose.

By making our permanent, durable archive format comma delimited, we make the broadest set of skills available, because virtually any software can import comma delimited files. This does not mean we should automatically trust everybody's work; professional and careful discipline is still needed to get good performance out of a database. But it does make possible creative work that those of us who work with the data most might never think of, such as research in human performance laboratories or at universities.

Another solution would be to put all Top Ten data into a modern, powerful format and make highly flexible retrieval possible from on-line databases. That possibility must be fully explored as well. Many people would say this is the most logical strategy: a modern, powerful on-line database can deliver anything anyone wants, in whatever format they want it, including comma delimited.

I think our decision will come down to balancing our desire for flexibility, our wish to avoid unconventional technology, and our need to stay within our $18,000 budget. And remember that the real cost here is not converting data into a new format; it is administering the SwimmerID.

Thanks to Pieter Cath for insisting over the years that comma delimited files were the logical archive format.

©Copyright 1997-2002 USMS.
All Rights Reserved