[Geysers] Time code

Gordon Bower taigabridge at hotmail.com
Thu Jan 17 17:38:20 PST 2013



I am one of several people who sent comments privately when this first surfaced, but I will repeat a few of them here, since this list has become the preferred place for the discussion.

First, "0000?"-for-overnight has to go. The "?" signals doubt about whether there was an eruption at all, or doubt about which geyser it was. (I most often see it for something like Oblong, where a steam cloud comes up but the reporter isn't 100% sure where it came from.)

Second, as many others have said, being able to see the comments more readily in Geysertimes would be a big help and, in the short term, would solve most of the problem for human readers: just enter "overnight" as a comment the next morning.

But that leaves the general question of how to represent uncertain times in the database, especially if you want to use it for any automatic analysis. Sometimes the logbook says "overnight," sometimes it says "marker replaced at XXXX," and sometimes it says "erupted between XXXX and YYYY." Really, NS and IE times are in the same boat: the exact start time is not known for those either.

I floated the suggestion that the internal storage for some events -- perhaps for every event -- might include an "event happened after" time and an "event happened before" time. If the logbook says "erupted between December 14th and December 20th," fill them in with 0000 14 Dec and 2359 20 Dec. If it says "erupted between 2330 and 0400," fill them in. If it says "overnight," fill in appropriate conservative placeholders. (If Grand is 1234IE, fill in 1222 and 1234. If Old Faithful is 1234IE, fill in 1229 and 1234.)

The actual text of the logbook entry, for paper entries and for Geysertimes comments alike, needs to be preserved. But the authors of the database may find it useful to have an internal means of representing uncertainty about when an event happened. My suggestion was simply that if they are going to represent uncertainty at all within the database, they should handle the problem in as general a way as possible. "0000A" feels like a band-aid rather than a permanent solution.
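
A minimal sketch of the "happened after / happened before" idea, in Python, might look like the following. The names EruptionRecord, from_exact, from_ie, from_window and MAX_IE_LAG are hypothetical, not part of Geysertimes or any existing database. The 12- and 5-minute IE allowances are taken from the Grand and Old Faithful examples in the post; every other geyser would need its own value. Each record keeps the raw logbook or comment text alongside the window, so nothing is lost when the window is computed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical per-geyser allowance for how long an eruption may already
# have been under way when first seen "in eruption" (IE). The 12- and
# 5-minute values come from the Grand and Old Faithful examples in the post.
MAX_IE_LAG = {
    "Grand": timedelta(minutes=12),
    "Old Faithful": timedelta(minutes=5),
}


@dataclass(frozen=True)
class EruptionRecord:
    geyser: str
    happened_after: datetime   # earliest possible start time
    happened_before: datetime  # latest possible start time
    raw_text: str              # original logbook text or comment, kept verbatim

    def __post_init__(self) -> None:
        # Reject windows whose bounds are in the wrong order.
        if self.happened_after > self.happened_before:
            raise ValueError("happened_after must not be later than happened_before")

    def is_exact(self) -> bool:
        """True when the window has collapsed to a single instant."""
        return self.happened_after == self.happened_before

    def uncertainty(self) -> timedelta:
        """Width of the window in which the eruption started."""
        return self.happened_before - self.happened_after


def from_exact(geyser: str, start: datetime, raw_text: str) -> EruptionRecord:
    # An observed start: both bounds are the same time.
    return EruptionRecord(geyser, start, start, raw_text)


def from_ie(geyser: str, seen: datetime, raw_text: str) -> EruptionRecord:
    # Seen in eruption: it started no later than `seen` and no earlier
    # than `seen` minus that geyser's allowance (KeyError if none is known).
    return EruptionRecord(geyser, seen - MAX_IE_LAG[geyser], seen, raw_text)


def from_window(geyser: str, after: datetime, before: datetime,
                raw_text: str) -> EruptionRecord:
    # "Erupted between X and Y", "marker replaced at X", "overnight", etc.
    return EruptionRecord(geyser, after, before, raw_text)


if __name__ == "__main__":
    grand = from_ie("Grand", datetime(2012, 12, 14, 12, 34), "1234IE")
    print(grand.happened_after, grand.happened_before, grand.uncertainty())
```

Under this sketch, "overnight" becomes a from_window call whose bounds are the last check the evening before and the first check the next morning, and an exact time is just a window of zero width, so one representation covers every case the logbooks use.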

Ultimately they should do whatever works for their own purposes, and the rest of us will make use of whatever portion of their data suits ours. A usable database of eruption times is a very hard thing to build well. I simply transcribed my own old logbooks into Word, not into a spreadsheet or a database, because I wasn't happy with how I could represent things. When I do a research project I can sometimes cut and paste, but more often I retype the data into whatever software I am using to crunch the numbers. Similarly, I've never been able to cut and paste data easily from Lynn's transcribed logbooks, let alone import it directly. For my purposes they are searchable text files, from which I recopy numbers into a spreadsheet by hand. They are a prime example of how a format works for some people and not for others.

Finally, a lot of interpretation is needed to decipher the old records, such as deciding whether an interval is a single or a double. That is unavoidable. At least in the 1990s there are clues lurking in the spacing and punctuation (and use of whiteout!) in the paper logbooks, some of which are preserved in Lynn's transcriptions and some of which aren't.

If there is going to be a big project to "make old data available," I wish it included making scans of the logbook pages available. I would actually rate publishing the scans as more important than typing the transcriptions into the database. Perhaps storage is cheap enough to post them online; there are, for instance, tens of thousands of pages of old railroad timetables scanned and available free online now. If not, perhaps GOSA could sell DVDs with a few thousand scanned logbook pages on each. (I gather at least some of the scans already exist and are what Will and his helpers are passing back and forth and working from.)

GRB
