
Late last month, The Economist set off a little thought bomb modestly titled “New rules for big data”. The article laid out all the various entrenched assumptions standing in the way of thoughtful, relevant information policy in the age of ever larger and more critical data sets. Put more simply, in a world where a Library of Congress is a fungible measure of data, we have to rethink how we protect and traffic information.
Backupify doesn’t yet handle multiple Libraries of Congress’ worth of data, but we want to get there. As such, we want to be forward-looking. Thus, the provisional crowdsourcing of our privacy policy, which begins to confront some of these areas of concern.
One of the more interesting points in the article was an argument against data retention:
“Current rules on digital records state that data should never be stored for longer than necessary because they might be misused or inadvertently released. But Viktor Mayer-Schönberger of the National University of Singapore worries that the increasing power and decreasing price of computers will make it too easy to hold on to everything. In his recent book ‘Delete’ he argues in favour of technical systems that ‘forget’: digital files that have expiry dates or slowly degrade over time.
“Yet regulation is pushing in the opposite direction. There is a social and political expectation that records will be kept, says Peter Allen of CSC, a technology provider: ‘The more we know, the more we are expected to know—for ever.’ American security officials have pressed companies to keep records because they may hold clues after a terrorist incident. In future it is more likely that companies will be required to retain all digital files, and ensure their accuracy, than to delete them.”
Sci-fi author and quasi-futurist Charles Stross has written repeatedly on the danger of reliance on databases for predictive analysis, if only because databases are routinely riddled with data errors. Bruce Schneier has written on the difficulty of purging your data from cloud-based systems, largely because the cloud companies don’t want to give up on all that delicious data-mining fodder.
This would seem to make the case that retaining data is more dangerous than deleting it. But is this an argument against databases, or an argument for better databases and better data policies? If government is going to require companies to maintain data indefinitely, shouldn’t those companies be required to maintain the accuracy and integrity of those databases? Shouldn’t the government require the same of itself? If we’re going to maintain a no-fly list, or e-mail blast address books, shouldn’t the agencies and organizations using them be under a legal obligation to make those databases at least 95 percent (or, to my mind, 99.999 percent) accurate?
More to the point, if an audit shows a database to be less than 95 percent accurate (say, your mailing list produces a delivery error on more than one out of every twenty sends), you should be obligated to either upgrade it or purge it, period.
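To make the one-in-twenty rule concrete, here is a minimal sketch in Python of what such an audit check might look like. The function names and the idea of measuring accuracy purely by delivery errors are assumptions made for illustration, not a description of any real compliance tool.

```python
from fractions import Fraction

# Hypothetical threshold taken from the 95 percent figure discussed above.
MIN_ACCURACY = Fraction(95, 100)


def audit_accuracy(total_sends, delivery_errors):
    """Return the share of sends that did not produce a delivery error."""
    if total_sends <= 0:
        raise ValueError("total_sends must be positive")
    if not 0 <= delivery_errors <= total_sends:
        raise ValueError("delivery_errors must be between 0 and total_sends")
    # Fractions avoid floating-point surprises right at the threshold.
    return Fraction(total_sends - delivery_errors, total_sends)


def needs_upgrade_or_purge(total_sends, delivery_errors, min_accuracy=MIN_ACCURACY):
    """True when the audited list falls below the required accuracy."""
    return audit_accuracy(total_sends, delivery_errors) < min_accuracy
```

Under these assumptions, a list with exactly one error in twenty sends still passes, while anything worse than that is flagged for upgrade or purge.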
That said, is such an edict enforceable? It would seem that storage capacity is expanding (or, rather, dropping in price) faster than computing power, so we’re going to be able to store more data than we’re able to parse and maintain effectively. This argues for a classic, analog data retention policy: any record that hasn’t been updated within a certain period (the IRS says seven years) should be purged.
It gets more complicated still: not all records are created equal. Financial records are the sort of data that might need to be kept indefinitely, especially for organizations of a certain size, or any outfit that’s publicly traded. The same goes for any government transactional records. There is a public accountability stake in those records.
Conversely, address records (of the physical, IP, URL or e-mail variety) that haven’t been updated in perhaps three years may not be worth keeping. Personal finances likely qualify here as well.
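A tiered retention policy like this could be sketched roughly as follows. This is only an illustration: the category names are invented, treating personal finances the same as address records is one reading of the paragraph above, and the seven-year default simply borrows the IRS figure mentioned earlier.

```python
from datetime import date

# Hypothetical categories; None means "keep indefinitely".
RETENTION_YEARS = {
    "financial": None,
    "government_transaction": None,
    "address": 3,
    "personal_finance": 3,  # assumption: same window as address records
}
DEFAULT_RETENTION_YEARS = 7  # assumption: the seven-year figure as a fallback


def should_purge(category, last_updated, today):
    """True when a record is older than its category's retention window."""
    years = RETENTION_YEARS.get(category, DEFAULT_RETENTION_YEARS)
    if years is None:
        return False  # indefinitely retained categories are never purged
    try:
        cutoff = today.replace(year=today.year - years)
    except ValueError:
        # today is Feb 29 and the target year is not a leap year
        cutoff = today.replace(year=today.year - years, day=28)
    return last_updated < cutoff
```

For example, `should_purge("address", date(2006, 1, 1), date(2010, 3, 1))` flags the record, while the same dates under `"financial"` do not.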
At Backupify, we believe you own your data, not us. We’re the bank, but the money is yours. Withdraw it whenever you like, and it’s gone from our ledgers forever. But government may have something to say about those data policies, and everyone else’s, very soon.
Suffice it to say, I would not want to be the one crafting data compliance legislation right now. I welcome your insights into this issue in the comments section.



