Leveraging AI for legislative text classification
Kasha Akrami, Marlies Michielssen, Andrew Tang, Brooke Tran
USAFacts is a nonprofit organization and website that provides data and reports on the US population, government finances, and the government's impact on society. It aims to make this information easily accessible so that public discourse and understanding can be grounded in facts.
USAFacts classifies legislative actions by category to illustrate where the priorities of Congress lie. Currently, this classification is a manual process, which is time- and labor-intensive. The goal of this senior project is to produce an automated tool capable of processing legislative text documents and classifying them into the appropriate topical category (e.g., Immigration, Social Welfare, Environmental Protection). Relative to the current manual process, we hope our tool can save a significant number of working hours and increase the scope and scale of government documents included in USAFacts' analyses. From a broader perspective, an automated tool for this process will enable USAFacts to generate a more complete and accessible analysis of legislative actions and provide the general public with better insight into societal problems.
Solutions, methods, and models used
We developed several models capable of processing text and categorizing legislative actions: multi-class logistic regression on tf-idf scores, a Hierarchical Attention Network, and fastText. Ultimately, we recommend the fastText model because, among the models we tested, it achieved the highest accuracy and the shortest training time.
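To make the simplest of these approaches concrete, the following is a minimal sketch of multi-class logistic regression on tf-idf features, written with only the Python standard library. It is an illustration under stated assumptions, not the project's implementation: the six example documents and their labels are invented, the function names (fit_idf, tfidf, train, classify) are hypothetical, the idf formula is one common smoothed variant, and the training loop is plain stochastic gradient descent with arbitrarily chosen settings.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency for every term seen in docs."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    """L2-normalized tf-idf vector (as a sparse dict); unknown terms are dropped."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def predict_proba(vec, weights, bias):
    """Softmax over one linear score per class."""
    scores = {c: bias[c] + sum(weights[c].get(t, 0.0) * v for t, v in vec.items())
              for c in weights}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

def train(docs, labels, epochs=200, lr=0.5):
    """Fit class weights by stochastic gradient descent on the cross-entropy loss."""
    idf = fit_idf(docs)
    vecs = [tfidf(d, idf) for d in docs]
    classes = sorted(set(labels))
    weights = {c: {} for c in classes}
    bias = {c: 0.0 for c in classes}
    for _ in range(epochs):
        for vec, label in zip(vecs, labels):
            probs = predict_proba(vec, weights, bias)
            for c in classes:
                grad = probs[c] - (1.0 if c == label else 0.0)
                bias[c] -= lr * grad
                for t, v in vec.items():
                    weights[c][t] = weights[c].get(t, 0.0) - lr * grad * v
    return idf, weights, bias

def classify(text, idf, weights, bias):
    probs = predict_proba(tfidf(text, idf), weights, bias)
    return max(probs, key=probs.get)

# Invented toy data for illustration only; not drawn from the project's corpus.
docs = [
    "a bill to reform visa and border asylum procedures",
    "a bill to fund food assistance and housing benefits",
    "a bill to limit emissions and protect wetlands",
    "an act on asylum and visa processing at the border",
    "an act expanding housing benefits and food assistance",
    "an act to protect wetlands and cut emissions",
]
labels = [
    "Immigration", "Social Welfare", "Environmental Protection",
    "Immigration", "Social Welfare", "Environmental Protection",
]

idf, weights, bias = train(docs, labels)
print(classify("visa and asylum rules for the border", idf, weights, bias))
```

A real pipeline would likely rely on an established library for vectorization and optimization, and would measure accuracy on held-out documents rather than on a handful of hand-written sentences.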