Reducing the US Treasury's Taxpayer Data Base by Optimization
Abstract
This paper describes the implementation of a novel optimization approach for reducing a large data base for the Office of Tax Analysis, US Treasury Department. The model minimizes the loss of information that results from using a subset of the data base rather than the entire file. The specific application involves the 1977 US Statistics of Income File for Individuals. This file was reduced from 155,212 weighted records to 74,762 weighted records (about 48 percent of the original) using a subgradient optimization method specialized for extremely large-scale problems. Differences between the original and reduced data bases are presented.

