Distinguishing Tax Avoider Types: An Unsupervised Machine Learning Approach
Working paper   Open access

Sonja O. Rego, Brian Williams, Ryan J. Wilson and Junwei Xia
SSRN
12/2025
DOI: 10.2139/ssrn.5942755
URL: https://doi.org/10.2139/ssrn.5942755

Abstract

We use unsupervised machine learning to identify distinct types of tax avoiders based on 15 observable firm characteristics associated with tax avoidance mechanisms. The most common type, the “PPE/DEBT” cluster (47 percent of tax avoiders), exhibits greater amounts of PPE, capital expenditures, total debt, and mezzanine financing. It is followed by the “R&D/NOL” cluster (35 percent), which exhibits greater R&D expenditures and NOLs. In contrast, only 18 percent of tax avoiders are assigned to the “income shifting” cluster, characterized by greater use of intangible assets, foreign sales, tax havens, and stock options. We observe time-series variation in the composition of tax avoider types, and out-of-sample tests reveal that the income shifting cluster exhibits higher levels of both future IRS attention and tax settlements. Our findings on the combinations of mechanisms that characterize distinct types of tax avoiders should be of direct use to policymakers in designing tax incentives and to tax authorities in allocating resources.
