Interests

My primary research interest is in the use of large data sets, such as electronic health records (EHRs), in conjunction with artificial intelligence/machine learning techniques for the early diagnosis, identification, and prediction of mental health disorders. Identifying disorders early gives a better chance of finding the best treatment options and recovery pathway. The patient may benefit from earlier treatment, which can reduce the time taken to achieve remission or cure and lower the probability of relapse, and consequently lead to better social and economic outcomes. The care provider, such as the NHS, may benefit from a reduction in overall resource usage, relieving pressure both economically and in terms of staff wellbeing. In line with this interest in early diagnosis, especially in adolescents and young adults, I have been a member of Warwick Medical School’s Applied Research Collaboration, led by Professor Swaran Singh. I also follow developments in related areas, including other sources of data that might lead to earlier identification and intervention in mental health and other clinical areas, to drive better outcomes.

My PhD, “Prediction of depression from Electronic Health Records using machine learning approaches,” looked at the viability of using EHRs with machine learning techniques for the identification of depression. There were three core components: a systematic review to establish the state of the art and the possibility of clinical implementation, a replication of an existing model, and an assessment of public confidence in the use of ML/EHR models in healthcare. The second of these included the replication and further development of a backward logistic regression model aimed at the early diagnosis of depression in adolescents and young adults in primary care. This required developing skills in R for data cleaning, wrangling, and visualisation; extracting exposure variables/predictors from EHRs; replicating the original model; validating it on an “out-of-sample” data set; and developing new models using more sophisticated techniques to compare performance. To investigate the explainability of the new models, Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) add-ons were coded in Python. Additionally, I specified and negotiated the procurement of a new data set of 500,000 anonymised cases/controls from The Health Improvement Network.