Re: Where Should The Data Reside?
Written in reply to “Where Should The Data Reside?” on Centernetworks:
There was some great discussion about this at the recent “Glue” conference, but no clear-cut answers.
At a technical level there are a couple of problems. It’s trivial to syndicate the data, but non-trivial to synchronize actions on the data. If the feed is an Atom feed, there’s a notion of stubs to reflect that a given bit of content has been unpublished, but this concept doesn’t exist in RSS, which is what most sites use for data exchange.
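To make the stub idea concrete, here is a minimal sketch of a subscriber honoring an unpublish signal. It assumes the stub arrives as a deleted-entry element in the style of the Atom tombstones extension; the feed contents, the ids, and the apply_feed helper are all made up for illustration, and a real subscriber would need to do a lot more.

```python
import xml.etree.ElementTree as ET

# Namespaces: Atom itself, plus the tombstones extension (assumed here as
# the way a feed expresses "this entry has been unpublished").
ATOM = "{http://www.w3.org/2005/Atom}"
TOMB = "{http://purl.org/atompub/tombstones/1.0}"

# Hypothetical feed: one live entry and one stub for a withdrawn entry.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:at="http://purl.org/atompub/tombstones/1.0">
  <entry>
    <id>tag:example.org,2009:post-1</id>
    <title>Still published</title>
    <updated>2009-05-20T12:00:00Z</updated>
  </entry>
  <at:deleted-entry ref="tag:example.org,2009:post-2"
                    when="2009-05-21T09:00:00Z"/>
</feed>"""


def apply_feed(store, feed_xml):
    """Apply a feed to a local copy: add live entries, drop stubbed ones."""
    root = ET.fromstring(feed_xml)
    for child in root:
        if child.tag == ATOM + "entry":
            # Syndication: copy the entry into the local store by its id.
            store[child.findtext(ATOM + "id")] = child.findtext(ATOM + "title")
        elif child.tag == TOMB + "deleted-entry":
            # Synchronization: the publisher withdrew it, so we remove it too.
            store.pop(child.get("ref"), None)
    return store


# The subscriber already holds a copy of the post that was published in error.
store = {"tag:example.org,2009:post-2": "Published in error"}
apply_feed(store, FEED)
print(store)  # {'tag:example.org,2009:post-1': 'Still published'}
```

Note that nothing forces a subscriber to run the second branch. The stub only works if the receiving side chooses to honor it.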
There’s also no notion of what I’m going to call “contractual use of data”. There’s no way to obligate a subscribing party to either update a given element of data (maybe I published something in error and I want to push out the correction) or remove it (for whatever reason).
An author/publisher I know had a hell of a time getting bad data out of “the system” for a book he wrote. Initially (years ago) he’d talked to O’Reilly about getting it published. For whatever reason that didn’t go through. Through what even O’Reilly admits was an error, the book appeared in a database update of upcoming titles. For the next several years the title showed up as an O’Reilly title, complete with an erroneous ISBN, even though the author and O’Reilly quickly cleaned up the original bad data source. It flowed out to Amazon, then to other sites, and it still resurfaces to this day.
The problem with establishing some sort of contractual obligation on data flow is the obvious question: isn’t that DRM? I guess it is in a way, though not in the sense of preventing copies or use, but in the sense of requiring some fidelity to the original data.
Atom tried a first cut at this, both with the stub idea for deletions and with the requirement of a unique identifier for each chunk of content. The idea is that even if you republish my blog post from my personal site over here on CN, the original id is maintained. In practice, though, no one does this, and the tools don’t really support or enforce it.
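Here is a rough sketch of what a preserved id would buy you, if republishers actually kept it. The entry dicts, the "source" field, the ids, and the merge helper are all hypothetical; the only assumption borrowed from Atom is that each entry carries an id and an updated timestamp.

```python
from datetime import datetime, timezone


def parse_ts(stamp):
    """Parse a UTC timestamp written like 2009-05-20T12:00:00Z."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def merge(store, entry):
    """Keep one copy per id; a newer 'updated' replaces the older copy."""
    current = store.get(entry["id"])
    if current is None or parse_ts(entry["updated"]) > parse_ts(current["updated"]):
        store[entry["id"]] = entry
    return store


POST_ID = "tag:example.org,2009:my-post"  # hypothetical id set by the author

entries = [
    # The original post on the author's own site.
    {"id": POST_ID, "source": "personal site", "title": "Orignal title",
     "updated": "2009-05-20T12:00:00Z"},
    # The same post republished elsewhere, with the original id maintained.
    {"id": POST_ID, "source": "republisher", "title": "Orignal title",
     "updated": "2009-05-20T12:00:00Z"},
    # A later correction pushed out by the author.
    {"id": POST_ID, "source": "personal site", "title": "Original title",
     "updated": "2009-05-22T08:30:00Z"},
]

store = {}
for entry in entries:
    merge(store, entry)

print(len(store))               # 1
print(store[POST_ID]["title"])  # Original title
```

Because all three copies share one id, an aggregator can tell that they are the same post and let the correction win. Once a republisher mints its own id, the copies become unrelated items and the correction has nothing to attach to.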


