Giving the US Govt a Lesson in Data Management

Right now Wired has an absolutely fascinating article called Road Map for Financial Recovery: Radical Transparency Now!. Pulling examples from the 1930s to the most recent financial crisis, from back room spreadsheet analysis to open source data crunching, the article paints a picture of the latest crisis as a crisis of information overload. The signs that could have predicted the latest crisis were there, Wired says, except they were buried in a crushing mound of unformatted, unsorted, unprocessed data.

Their solution? Standardize and format the data, then open it to everyone. Turn loose an army of citizen regulators on the mound of cleanly formatted, machine-readable data. Let them build mashups, import it into spreadsheets, and develop their own financial applications to crunch it.

They focus on a specific technology called XBRL, a markup language designed for the financial world by Charlie Hoffman, a 50-year-old accountant. For those who aren’t familiar with geek speak, a markup language is a special way to format information so that both people and machines can read it. For example, a list of contacts in a markup language would look something like this:

<contact>
<name>Andrew Swerlick</name>
<phone>111-111-1111</phone>
<address>2222 American Way,
Atlanta, GA
30322 </address>
<email>andrew dot swerlick at gmail.com </email>
<website>https://beyondoverload.wordpress.com</website>
</contact>
<contact>
<name>John Doe</name>
<phone>121-121-1212</phone>
<address>1111 Main Street
Baltimore, MD
10100 </address>
</contact>

The idea is that each piece of data is labeled with exactly what it is. The labeling is done by the <> tags in this case: <stuff> marks the start of a piece of data and </stuff> the end. The data is hierarchical too; the name, phone, address, and so on all sit under the contact section.

Once your data is formatted like this, it’s easy for a machine to pick out the relevant pieces and do whatever you want with them. With financial data the possibilities are nearly endless. Put the data on the web in a simple standard markup format and people will find all sorts of applications for it.
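
To make that a little more concrete, here’s a rough sketch in Python of a machine picking data out of a contact list like the one above. It uses Python’s built-in xml.etree.ElementTree parser. The <contacts> wrapper element and the phone_book function are my own additions for the sake of the example (XML parsers expect a single outer element), so treat this as an illustration rather than the one right way to do it.

```python
import xml.etree.ElementTree as ET

# Contacts in the same shape as the example above, trimmed to two fields.
# The outer <contacts> element is an assumption added for this sketch,
# because an XML document needs exactly one root element.
CONTACTS_XML = """
<contacts>
  <contact>
    <name>Andrew Swerlick</name>
    <phone>111-111-1111</phone>
  </contact>
  <contact>
    <name>John Doe</name>
    <phone>121-121-1212</phone>
  </contact>
</contacts>
"""


def phone_book(xml_text):
    """Return a {name: phone} dict built from every <contact> element."""
    root = ET.fromstring(xml_text)
    book = {}
    for contact in root.findall("contact"):
        # findtext returns the text inside the named child tag,
        # or the default if the tag is missing.
        name = contact.findtext("name", default="").strip()
        phone = contact.findtext("phone", default="").strip()
        book[name] = phone
    return book


if __name__ == "__main__":
    for name, phone in phone_book(CONTACTS_XML).items():
        print(f"{name}: {phone}")
```

Because every value is wrapped in a tag that says what it is, the code never has to guess which line is a phone number. It just asks for the phone tag.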

For me the article hits especially close to home because of what I’m encountering at work. One of the things I’m finding is that many of the clients I work with, and even some inside the company, don’t seem to get the importance of accurate, consistent data capture. If you capture data in a consistent and standardized format you can do absolutely amazing things with it. Computers are data wizards, and especially in the modern world there is almost nothing you can’t do if you’ve captured the information correctly. You don’t even have to be a developer, just marginally skilled with Excel.

Mint.com offers the perfect example of the power of consistent formatting. Ultimately, there’s almost no financial reporting that Mint does that you can’t do in Excel… if you’ve got the data. Mint’s real power is in data aggregation. In fact, I’ve built my own spreadsheet financial data analysis tool, and all I do is export everything from Mint into it for processing, letting Mint give me a wealth of consistently formatted data.
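
My own tool is a spreadsheet, but the same idea fits in a few lines of Python, so here’s a small sketch of it. Everything specific in it is made up for illustration: the sample rows, the column names (Date, Description, Amount, Category), and the total_by_category function are assumptions, not the layout of any real export file.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows standing in for an exported transaction file.
# The column names here are assumptions chosen for this sketch.
SAMPLE_EXPORT = """Date,Description,Amount,Category
1/03/2009,Grocery Store,54.20,Groceries
1/05/2009,Gas Station,30.00,Auto
1/09/2009,Grocery Store,41.80,Groceries
"""


def total_by_category(csv_text):
    """Sum the Amount column for each distinct Category value."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["Category"]] += float(row["Amount"])
    return dict(totals)


if __name__ == "__main__":
    for category, total in sorted(total_by_category(SAMPLE_EXPORT).items()):
        print(f"{category}: {total:.2f}")
```

The report itself is trivial. It only works because every row has the same columns in the same format, which is the whole point.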

As I mentioned earlier, a lot of people I encounter through work just don’t seem to get this. I don’t know if it’s a generational thing, or just a part of my personality, but every day I’m seeing more and more value in clean, consistent data.
