Herman Holterman · 27 January 2026

Machine learning in rehabilitation care: from patient data to treatment prognosis

Google Cloud · machine learning · healthcare · Cloud Run · BigQuery

[Figure: ML Platform architecture on Google Cloud]

A rehabilitation clinic treats patients with chronic pain through an interdisciplinary program. The treatment is intensive: weeks long, multiple disciplines simultaneously, a significant investment from both patient and clinic. The question that had been on the table for years: can we estimate upfront which patients will benefit most from this treatment? And for whom might a different trajectory be more effective?

This is not an academic question. It is an operational question with direct consequences for treatment planning, capacity, and ultimately patient outcomes.

The assignment

Together with a data science specialist, the clinic had developed a machine learning model for prognostic patient profiles. The model was built in scikit-learn and trained on historical patient data: treatment outcomes linked to 51 characteristics recorded at intake.

Those 51 parameters come from validated questionnaires: instruments measuring pain, anxiety, depression, functional limitations, and psychological factors. Each questionnaire is scientifically validated and administered as standard practice during intake. The model uses the scores to generate a prognosis: how likely is it that this patient will show clinically relevant improvement after treatment?
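That scoring step can be sketched as follows. This is a minimal illustration, not the clinic's actual protocol: the 80%-completion threshold and the prorating rule are assumptions for the example; each real instrument has its own validated scoring procedure.

```python
import math

def subscale_score(items, min_completion=0.8):
    """Compute a questionnaire subscale score from raw item responses.

    Hypothetical rule: if at least 80% of items were answered, prorate
    the missing items with the mean of the answered ones; otherwise the
    score is undefined (NaN) and must not silently enter the model.
    """
    answered = [x for x in items if x is not None]
    if len(answered) / len(items) < min_completion:
        return math.nan  # too many missing items: no valid score
    mean = sum(answered) / len(answered)
    # Prorate: treat each missing item as the mean of the answered items
    return mean * len(items)

print(subscale_score([2, 3, None, 4, 1]))        # 12.5 (one of five missing, prorated)
print(subscale_score([2, None, None, None, 1]))  # nan (below completion threshold)
```

The point of making the rule explicit in code is that an undefined score stays visibly undefined (NaN) instead of being averaged away.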

The problem was not the model. The model worked. The problem was that it ran as a Python script on a laptop. The data science specialist had to manually export data, run the script, and report results back. That is not scalable, not reproducible, and not acceptable if you want to deploy it as a standard component of the intake process.

Our assignment: turn this into a production platform.

The architecture

We built the platform on Google Cloud, connecting to the data infrastructure we had previously set up for this clinic. The architecture follows a clear pattern:

Patient data to BigQuery. The clinic records patient information in an EHR system. Through our EHR Connector (a Cloud Run service that runs daily) relevant data is extracted and written to BigQuery. This is the raw layer: unprocessed questionnaire responses, demographics, and treatment history.

dbt as transformation layer. The raw EHR data is not directly usable for the model. Scores need to be calculated from individual answers. Subscales need aggregation. Missing items need to be handled according to the questionnaires’ scoring protocols. dbt transforms the raw data into exactly the 51 parameters the model expects: documented, tested, and version-controlled.

Cloud Run Jobs for model inference. The scikit-learn model runs as a container on Cloud Run Jobs. The job fetches the transformed parameters from BigQuery, loads the trained model, generates prognoses, and writes results back to BigQuery. No VM running 24/7; compute only when needed.
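The core of such an inference job can be sketched as below. Table names, file paths, and the `generate_prognoses` function are illustrative assumptions; the BigQuery read/write and the `joblib` model load are shown as comments, and a toy model stands in for the clinic's trained artifact so the sketch runs end to end.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# In the real job the trained model ships inside the container, e.g.
#   model = joblib.load("model/prognosis_model.joblib")
# and the input frame comes from BigQuery, e.g.
#   bigquery.Client().query("SELECT * FROM clinic.intake_features").to_dataframe()
# All names here are placeholders, not the clinic's actual schema.

FEATURES = [f"param_{i}" for i in range(1, 52)]  # the 51 intake parameters

def generate_prognoses(model, df: pd.DataFrame) -> pd.DataFrame:
    """Score each intake row and return the frame with a prognosis column."""
    proba = model.predict_proba(df[FEATURES])[:, 1]
    out = df.copy()
    out["prognosis"] = proba  # probability of clinically relevant improvement
    return out

# Toy stand-in for the trained model, so the sketch is runnable
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 51)), columns=FEATURES)
y = (X["param_1"] > 0).astype(int)
model = LogisticRegression(max_iter=1000).fit(X, y)

scored = generate_prognoses(model, X.head(5))
print(scored["prognosis"].between(0, 1).all())  # True
# In the real job: client.load_table_from_dataframe(scored, "clinic.prognoses")
```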

GCP Workflows as orchestrator. A Workflow chains the steps together: trigger dbt transformation, wait for completion, start the Cloud Run Job, and alert on failures. This replaces the manual steps the data scientist previously had to perform.
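The chain above can be sketched as a Workflows definition along these lines. Project, region, and job names are placeholders, and the sketch assumes both dbt and the model run as Cloud Run Jobs invoked through the Cloud Run Admin API connector; the actual workflow may differ.

```yaml
# Sketch of a GCP Workflows definition; all identifiers are placeholders.
main:
  steps:
    - run_dbt:
        call: googleapis.run.v2.projects.locations.jobs.run
        args:
          name: projects/PROJECT_ID/locations/REGION/jobs/dbt-transform
        result: dbt_execution        # connector waits for the job to finish
    - run_inference:
        call: googleapis.run.v2.projects.locations.jobs.run
        args:
          name: projects/PROJECT_ID/locations/REGION/jobs/model-inference
        result: inference_execution
```

A failed job raises an error in the Workflow, which a surrounding try/except step can catch to send the failure alert.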

Results to Power BI. The clinic uses Microsoft 365. The prognoses from BigQuery are available in Power BI dashboards that treatment teams can access. No exports, no loose files: current results, directly available.

The NaN challenge

The most interesting technical challenge was not in the infrastructure, but in the data. Scikit-learn (the library the model is built on) has a treacherous property: it does not fail on missing values. It simply produces incorrect output.

That sounds like a small problem. It is a large problem.

In practice, values are regularly missing. A patient does not complete all questions. A questionnaire is updated in a later version, causing scores to be calculated differently. An intake is conducted across two sessions and the second session has not been recorded yet. The result: NaN values in the feature matrix.

On the data scientist’s laptop, this was manageable: they knew the data, saw the gaps, and handled them. In an automated pipeline, nobody is watching. If the model silently generates a prognosis based on incomplete data, you get results that look valid but are not. That is worse than an error, because nobody notices.

Our solution had three components:

  1. Validation before inference. Before data enters the model, the pipeline validates each row for completeness. Missing values are explicitly detected and logged. Rows with missing required parameters are excluded from the model run and reported separately.

  2. dbt tests on the 51 parameters. In the transformation layer, we test not only whether the calculations are correct, but also whether the output is complete. A not_null test on every required column is the minimum. On top of that, range checks: if a score falls outside the theoretically possible range, something is wrong in the calculation.

  3. Local versus cloud comparison. During initial validation, we ran the model on both environments (the specialist’s laptop and the cloud platform) on identical datasets. The outcomes had to match exactly. Any discrepancy was a signal that something was going wrong in the data transformation or model configuration.
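The first component, validation before inference, amounts to splitting the feature matrix into complete and incomplete rows before the model ever sees it. A minimal sketch, with three stand-in columns instead of the full 51 and a `split_complete` helper that is our own naming:

```python
import numpy as np
import pandas as pd

# Stand-ins for the 51 required parameters
REQUIRED = ["pain_score", "anxiety_score", "function_score"]

def split_complete(df: pd.DataFrame, required=REQUIRED):
    """Separate rows complete on all required parameters from rows with
    missing values, so incomplete intakes are reported instead of being
    silently scored."""
    incomplete_mask = df[required].isna().any(axis=1)
    incomplete = df[incomplete_mask]
    for idx, row in incomplete.iterrows():
        missing = [c for c in required if pd.isna(row[c])]
        print(f"intake {idx}: excluded, missing {missing}")  # real pipeline logs/alerts here
    return df[~incomplete_mask], incomplete

intakes = pd.DataFrame({
    "pain_score":     [6.0, np.nan, 4.0],
    "anxiety_score":  [3.0, 2.0,    5.0],
    "function_score": [7.0, 8.0,    np.nan],
})
complete, excluded = split_complete(intakes)
print(len(complete), len(excluded))  # 1 2
```

Only the `complete` frame goes to the model; the `excluded` frame goes into the separate report, so a missing score becomes a visible event rather than a silent distortion.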

That last step sounds simple, but it was essential. It is the only way to guarantee that the migration from laptop to cloud does not introduce shifts. In healthcare, “approximately correct” is not good enough.

Containerisation for reproducibility

The model had to produce exactly the same results regardless of where it ran. That means: the same Python version, the same library versions, the same order of operations.

We packaged the model and inference code as a Docker container. The container includes a pinned requirements.txt with exact version numbers. The trained model is included as an artifact, not retrained on each run. The container runs identically on a local machine and on Cloud Run Jobs.
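A container along these lines would implement that; the base image tag and file paths are illustrative, not the project's actual layout.

```dockerfile
# Illustrative container layout; image tag and paths are assumptions.
FROM python:3.11-slim

WORKDIR /app

# Pinned dependencies first, so this layer is cached across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# The trained model ships as an artifact inside the image: no retraining
COPY model/prognosis_model.joblib model/
COPY inference.py .

ENTRYPOINT ["python", "inference.py"]
```

Because the model file and the pinned `requirements.txt` are baked into the image, the container that passed the local-versus-cloud comparison is byte-for-byte the one that runs in production.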

This also makes the platform transferable. When another rehabilitation centre running the same treatment program wants to deploy the model, they only need to deliver their data in the correct format. The container, the model, and the pipeline are reusable.

What this demonstrates

This project demonstrates a pattern we encounter regularly in healthcare: the domain knowledge is there, the model is there, but the platform is missing. A data scientist builds something that works on a laptop. Then it needs to become scalable, reliable, and reproducible; and that is where the real work begins.

The value we add is not the model. The model was already built by someone with deep domain knowledge and statistical expertise. The value is in the platform around it:

  • Reliability: the pipeline runs daily, without manual intervention, with monitoring at every step.
  • Reproducibility: containerisation guarantees identical results regardless of environment.
  • Data quality: validation catches the silent errors that scikit-learn does not.
  • Scalability: the same architecture can be rolled out to other clinics and other treatment programs.

The clinic now has an operational system that can generate a treatment prognosis at every intake. The clinician gets an additional data point, not to replace clinical judgement, but to support it.

From laptop to production. That is the step where many ML projects stall. Building the model is the spectacular part. The platform is the part that determines whether it actually gets used.

Let's talk

Get in touch for an initial consultation.