Paper Title
Predicting Days in Hospital using Health Insurance Claims and Clinical Codes

Abstract
Health care administrators around the world work to reduce costs while raising the standard of care. Hospitalization accounts for the largest portion of health spending. Earlier identification of people who are more likely to require hospitalization would therefore help healthcare managers and insurers create more effective plans and strategies. In this study, we developed a method for predicting the number of days in hospital in a population using large-scale health insurance claims data. The key clinical information in health insurance claims comes mostly from the accompanying clinical codes, such as diagnosis codes and procedure codes, which are organized hierarchically. We used a regression decision tree method to estimate the number of hospital days in the third year based on hospital admission and procedure claims data and clinical codes collected over three years from 242,075 individuals. Bagged-tree models were compared across a full feature set, feature sets built from low-level, medium-level, and high-level clinical codes, and a feature set using only basic demographic characteristics. The main conclusion of this study is that predictive capacity is not significantly affected by the level of the clinical code hierarchy used. Other findings include: 1) sample size significantly influences the predicted outcome (more observations lead to more consistent and accurate outcomes); 2) combining enriched clinical information with demographic information performs better than using either alone.
Keywords: Clinical code, Big data, Healthcare, Health insurance claims, Hospitalizations, Predictive modeling.
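To illustrate what comparing clinical codes at different hierarchy levels can mean in practice, the following is a minimal sketch rather than the method used in this study. It assumes ICD-10-style diagnosis codes, which the abstract does not specify, and it assumes that the three levels correspond to the full code, the part before the dot, and the leading letter. The function names `roll_up` and `code_count_features` and the sample claims are hypothetical.

```python
from collections import Counter


def roll_up(code: str, level: str) -> str:
    """Map a diagnosis code to one level of an assumed three-level hierarchy."""
    normalized = code.strip().upper()
    if level == "low":
        # Most specific level: the full code as recorded on the claim.
        return normalized
    if level == "medium":
        # Assumed category level: the part of the code before the dot.
        return normalized.split(".")[0]
    if level == "high":
        # Assumed coarse group: the leading letter of the code.
        return normalized[:1]
    raise ValueError(f"unknown level: {level!r}")


def code_count_features(codes, level):
    """Count how often each rolled-up code appears in one person's claims."""
    return Counter(roll_up(code, level) for code in codes)


# Hypothetical claims history for one person (illustrative codes only).
claims = ["E11.9", "E11.65", "I10", "I25.10"]

for level in ("low", "medium", "high"):
    print(level, dict(code_count_features(claims, level)))
```

Coarser levels produce fewer, denser features, while finer levels keep more clinical detail. A feature set built at each level could then be passed to the same regression model to compare predictive performance.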