AI-Enhanced House Earmark Request Data

Stacks of dollars in front of US Capitol

Cross-posted from Congressional Data Coalition

At the end of last week, the House Appropriations Committee published all earmark requests for FY 2024 on the committee’s website, including publishing them as a spreadsheet. This is great and welcome news. For the first time, the appropriations spreadsheet separated member names into different columns and included state, district, party, and recipient address. This makes the information significantly more usable. Thank you.

In fact, it’s so usable, we spent a little time over the weekend making it even more robust. We enhanced their spreadsheet by adding bioguide IDs for each member, appropriations subcommittee codes, and a standardized recipient address (with help from ChatGPT), and by extracting the recipient state and zip code. We have also been experimenting with using the AI to categorize whether the recipient entity is a non-profit or a governmental entity. We can imagine a lot of use cases for this cleaned-up data.
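To give a sense of the state and zip code extraction step, here is a minimal Python sketch. It is not the process we actually ran in Google Sheets; it assumes the address has already been standardized so that it ends with a two-letter state abbreviation followed by a 5-digit ZIP (optionally ZIP+4). The function name, the pattern, and the sample address are all illustrative.

```python
import re

# Assumed format: "..., City, ST 12345" or "..., City, ST 12345-6789".
# This pattern is an illustration, not the formula used for the spreadsheet.
STATE_ZIP = re.compile(r"\b([A-Z]{2})\s+(\d{5})(?:-\d{4})?\s*$")


def extract_state_zip(address):
    """Return (state, zip5) from a standardized address, or (None, None)."""
    match = STATE_ZIP.search(address.strip())
    if match is None:
        # Address does not end in the assumed "ST 12345" shape.
        return None, None
    return match.group(1), match.group(2)


# Made-up sample address, for illustration only.
state, zip_code = extract_state_zip("123 Main St, Springfield, IL 62701")
print(state, zip_code)
```

Addresses that do not match the assumed shape come back as (None, None), which makes them easy to flag for manual review rather than silently guessing.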

The spreadsheet is available online here. We are continuing to tinker with it.

Unfortunately, the Appropriators’ spreadsheet does not include the request summaries published on the committee’s website or a direct link to the request letter. We would also love to see the EINs for the non-profit requesting entities, because then we could tie each request to the entity’s 990 tax form and maybe to its lobbying disclosure records as well.

All in all, this is a significant step forward for the transparency of the requests, and we hope it will continue to improve.

The earmarks dataset was also a great opportunity for us to play with marrying the new ChatGPT technology with Google Sheets. I think this technology could fundamentally transform how appropriators gather requests from the public (the subject of a current Senate request for comments) and how the committee gathers requests from members. The ability to clean up requests (i.e., moving information from unstructured to structured formats), categorize them, summarize them, and do due diligence on the requesters should be a game changer.