Regression

Fundamentals of Machine Learning for NHS using R

Today No Theory!

Length of Stay

Length of Stay (LOS) is the number of days from the initial admission date to the date the patient is discharged from a given hospital facility.
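As a quick illustration of this definition, here is a minimal sketch that derives LOS from an admission date and a discharge date. The three rows are made up, and the month/day/year date format is an assumption, so check how `vdate` and `discharged` are actually stored before reusing it.

```r
# Toy admissions (invented values); the m/d/Y format is an assumption.
visits <- data.frame(
  vdate      = c("8/29/2012", "5/26/2012", "9/22/2012"),
  discharged = c("9/1/2012", "6/2/2012", "9/25/2012"),
  stringsAsFactors = FALSE
)

# Parse the strings into Date objects.
admit     <- as.Date(visits$vdate, format = "%m/%d/%Y")
discharge <- as.Date(visits$discharged, format = "%m/%d/%Y")

# LOS in whole days: discharge date minus admission date.
visits$los <- as.integer(discharge - admit)

print(visits)
```

Subtracting two `Date` objects gives a difference in days, so no manual calendar arithmetic is needed.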

Predicting LOS at the time of admission can improve both the quality of care and operational efficiency. It also supports accurate discharge planning, which can in turn improve other quality measures, for example by reducing readmissions.

Dataset Facts

  • 100,000 observations
  • 28 variables (11 numerical, 14 categorical, 2 dates, 1 id)
  • No missing values
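Facts like these are worth verifying yourself once the data is loaded. The sketch below runs the checks on a tiny made-up data frame named `los` (a hypothetical name); on the real data you would expect `dim()` to report 100,000 rows and 28 columns.

```r
# Toy stand-in for the dataset (invented values, only four columns).
los <- data.frame(
  eid    = 1:4,
  gender = c("M", "F", "F", "M"),
  rcount = c(0L, 1L, 0L, 2L),
  sodium = c(140.4, 138.1, 141.0, 136.9),
  stringsAsFactors = FALSE
)

# Number of observations (rows) and variables (columns).
dim(los)

# Count of missing values in each column; all zeros means nothing is missing.
missing_per_column <- colSums(is.na(los))
print(missing_per_column)

# How many columns of each storage type R sees.
table(sapply(los, class))
```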

Categorical Variables

| Field | Type | Description |
|---|---|---|
| gender | String | Gender of the patient (M or F) |
| dialysisrenalendstage | String | Flag for renal disease during encounter |
| asthma | String | Flag for asthma during encounter |
| irondef | String | Flag for iron deficiency during encounter |
| pneum | String | Flag for pneumonia during encounter |
| substancedependence | String | Flag for substance dependence during encounter |
| psychologicaldisordermajor | String | Flag for major psychological disorder during encounter |
| depress | String | Flag for depression during encounter |
| psychother | String | Flag for other psychological disorder during encounter |
| fibrosisandother | String | Flag for fibrosis during encounter |
| malnutrition | String | Flag for malnutrition during encounter |
| hemo | String | Flag for blood disorder during encounter |
| secondarydiagnosisnonicd9 | Integer | Flag for whether a non-ICD-9 formatted diagnosis was coded as a secondary diagnosis |
| facid | Integer | Facility ID at which the encounter occurred |
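Note that `facid` is stored as an integer but is really a label, so R should treat it as categorical. A minimal sketch of the conversion to factors follows, using invented rows. The "0"/"1" coding of the flag is an assumption, so inspect the real values first.

```r
# Toy rows (invented); the "0"/"1" flag coding is an assumption.
toy <- data.frame(
  gender = c("M", "F", "F", "M"),
  asthma = c("0", "1", "0", "0"),
  facid  = c(1L, 2L, 2L, 3L),
  stringsAsFactors = FALSE
)

# Convert every categorical column to a factor in one step.
categorical <- c("gender", "asthma", "facid")
toy[categorical] <- lapply(toy[categorical], factor)
str(toy)

# The dummy (indicator) columns a regression would build from two factors.
# The first level of each factor is the reference and gets no column.
mm <- model.matrix(~ gender + facid, data = toy)
print(mm)
```

Without the conversion, a regression would treat `facid` as a number and assume facility 3 is "three times" facility 1, which is meaningless.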

Numerical Variables

| Field | Type | Description |
|---|---|---|
| rcount | Integer | Number of readmissions within the last 180 days |
| hematocrit | Float | Average haematocrit value during encounter (g/dL) |
| neutrophils | Float | Average neutrophils value during encounter (cells/microL) |
| sodium | Float | Average sodium value during encounter (mmol/L) |
| glucose | Float | Average glucose value during encounter (mmol/L) |
| bloodureanitro | Float | Average blood urea nitrogen value during encounter (mg/dL) |
| creatinine | Float | Average creatinine value during encounter (mg/dL) |
| bmi | Float | Average BMI during encounter (kg/m^2) |
| pulse | Float | Average pulse during encounter (beats/min) |
| respiration | Float | Average respiration during encounter (breaths/min) |

Other Variables

| Field | Type | Description |
|---|---|---|
| eid | Integer | Unique ID of the hospital admission |
| vdate | String | Visit date |
| discharged | String | Date of discharge |

Model

We will try to fit a multiple linear regression model. Recall that this means our hypothesis function has the form:

\[h(x) = \beta_0 + \beta_1 x_1 + \dots + \beta_m x_m \qquad \beta_0, \beta_1, \dots, \beta_m \in \mathbb{R}\]
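In R, a model of this form can be fitted with `lm()`, which estimates the coefficients by least squares. The sketch below uses simulated data rather than the real dataset: the column name `lengthofstay` for the target and the "true" coefficients used to generate it are assumptions chosen purely for illustration.

```r
set.seed(1)
n <- 200

# Simulated predictors (invented distributions, not the real data).
sim <- data.frame(
  rcount = rpois(n, 1),
  bmi    = rnorm(n, mean = 28, sd = 4),
  gender = factor(sample(c("F", "M"), n, replace = TRUE))
)

# Hypothetical target: beta_0 = 2, beta_rcount = 1.5, beta_bmi = 0.05, plus noise.
sim$lengthofstay <- 2 + 1.5 * sim$rcount + 0.05 * sim$bmi + rnorm(n, sd = 0.5)

# Fit h(x) = beta_0 + beta_1 * rcount + beta_2 * bmi + beta_3 * genderM.
fit <- lm(lengthofstay ~ rcount + bmi + gender, data = sim)

# Estimated coefficients, one per term in the hypothesis function.
print(coef(fit))

# Predicted LOS for one new (made-up) patient.
new_patient <- data.frame(
  rcount = 2,
  bmi    = 30,
  gender = factor("F", levels = c("F", "M"))
)
pred <- predict(fit, newdata = new_patient)
print(pred)
```

The factor `gender` enters the model as a single indicator column (`genderM`), so the fit has four coefficients even though the formula names only three variables.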

What are the challenges?