FBF: The Untapped Goldmine of Legislative Data: Rocket Fuel for AI

This past Tuesday, I sat in on an excellent hearing on the use of Artificial Intelligence in the Legislative Branch, hosted by the House Administration Committee. I’m not going to recap it here — Aubrey already did that — but I did want to share a good idea that I ripped off from the Obama administration.

It’s a really simple idea: in order to build on top of data, you have to know what you have. To wit: way back in the mists of time, 2013 to be precise, President Obama required agencies to create “enterprise data inventories”: comprehensive lists of all the data an agency holds.

Just like the porridge for the three bears, the datasets identified in the inventory would be classified in one of three ways. Each could be categorized as public, i.e., able to be made publicly available without restriction; as restricted public, i.e., able to be made available but under certain use restrictions; or as non-public, i.e., not able to be made publicly available (though potentially shareable with other agencies). Each agency was also supposed to publish a list of its currently publicly available datasets at a vanity URL.

I don’t remember all the details, but for some reason or another the Obama administration balked at publishing the lists of datasets. Remember, this wasn’t the data itself, but the metadata. So my colleagues at the now-defunct Sunlight Foundation filed a FOIA request and began preparations for litigation, which apparently prompted the lists to be released. In 2019, the OPEN Government Data Act was enacted into law, §7 of which required the US government to continue making these data inventories.

Are the agencies in compliance? I’ll leave that to others to say. (Here’s the government’s dashboard.)

But — but — if we are talking about using AI to support the Legislative branch, anyone building AI tools will also need to know where the data is, and have access to it. This is a problem faced by those of us on the outside, but also, and perhaps more acutely, by those on the inside.

What information is held by each office or agency? In what format? Who maintains it? What is its range? How often is it updated? Is it capable of being shared with others, whether in its current circumstance or with some changes?
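For the tool-builders in the audience, here is a rough sketch of what a single inventory entry might look like if you wrote those questions down as a record, using the three access categories described above. Everything in it is an assumption for illustration: the field names, the `InventoryEntry` and `AccessLevel` names, and the sample entries are all made up, and none of it is the schema any office or agency actually uses.

```python
from dataclasses import dataclass
from enum import Enum


class AccessLevel(Enum):
    """The three categories from the 2013 inventory requirement."""
    PUBLIC = "public"
    RESTRICTED_PUBLIC = "restricted public"
    NON_PUBLIC = "non-public"


@dataclass
class InventoryEntry:
    """One row in a hypothetical legislative data inventory."""
    title: str             # what information is held
    holder: str            # which office or agency holds it
    data_format: str       # in what format
    maintainer: str        # who maintains it
    coverage: str          # what its range is
    update_frequency: str  # how often it is updated
    access_level: AccessLevel  # whether it can be shared


def public_listing(entries):
    """Return the titles that could be published without restriction."""
    return [e.title for e in entries if e.access_level is AccessLevel.PUBLIC]


# Hypothetical sample entries, invented purely for illustration.
entries = [
    InventoryEntry(
        title="Example dataset A",
        holder="Example Office A",
        data_format="XML",
        maintainer="Example team",
        coverage="2013 to present",
        update_frequency="daily",
        access_level=AccessLevel.PUBLIC,
    ),
    InventoryEntry(
        title="Example dataset B",
        holder="Example Office B",
        data_format="CSV",
        maintainer="Example team",
        coverage="2019 to present",
        update_frequency="quarterly",
        access_level=AccessLevel.NON_PUBLIC,
    ),
]

print(public_listing(entries))
```

The point of the sketch is only that the inventory is metadata: even a non-public dataset gets an entry, so that people know it exists and whom to ask about it.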

Please forgive me for reinventing the wheel, but the offices and agencies of the Legislative branch should be directed to conduct their own Enterprise Data Inventory and to keep it up to date. They should be directed to share the inventory with others inside the Legislative branch and with those on the outside. And information held by the offices and agencies should be made available for use — by others within the office, by others within the Legislative branch, and by the public — to the maximum extent possible. It would even be possible to use data.gov to do this, or perhaps a congressional version like GPO’s bulk data repository.

Data about the Legislative branch is a strategic asset for democracy. It can be surprising, but information gathered for one purpose can often help solve a very different problem. A fair part of my career has been trying to figure out what information Congress holds and how to use it to solve other problems.

I’ve even gone so far as to propose a Legislative Branch Chief Data Officer (see p. 9) whose job is a lot like what I imagine the House Historian’s to be, but for data. At the risk of quoting myself:

The CDO should have the responsibility for tracking datasets released by the legislative branch; providing advice, guidance, and encouragement to offices regarding the publication of legislative branch information as data; supporting the work of the [Congressional] Data Task Force, including assisting Deputy Clerk Reeves; coordinating the annual Legislative Data and Transparency Conference; and providing assistance to the public with finding and obtaining legislative data.

So there you have it. Let’s keep track of the information held by the Legislative branch and make it available for transformative uses to solve problems, inside the Legislative branch and beyond. And let’s have a person (or two or three?) whose job it is to help with this. Maybe it’s the Legislative Branch Chief Data Officer, maybe it’s the good folks at the House Digital Service (with an expanded remit), or why not both?

Published by Daniel Schuman