Can Google Trends data improve forecasting of Lyme disease incidence?
Background: Online activity-based epidemiological surveillance and forecasting are receiving increasing attention. To date, Google search volumes have not been assessed for forecasting tick-borne diseases. We therefore assessed whether extending traditional surveillance data with Google Trends data improves forecasting of Lyme disease incidence.

Methods: Data on the weekly incidence of Lyme disease in Germany from 16 June 2013 to 27 May 2018 were obtained from the database of the Robert Koch Institute. Internet search data were obtained from Google Trends for the search term "Borreliose" in Germany, using "last 5 years" as the timespan category. The data were split into a training set (16 June 2013 to 11 June 2017) and a validation set (12 June 2017 to 27 May 2018). A seasonal autoregressive integrated moving average model, SARIMA (0,1,1), was selected to describe the time series of weekly Lyme disease incidence. We then added the Google Trends data as an external regressor, and a SARIMA (0,1,1) model was again identified as optimal. We used both models to make predictions for the validation interval and compared these predictions with the values in the validation data set.

Results: The two models produced similar forecasts for the validation timespan. Compared with the reported values, the model without Google searches had a root mean squared error (RMSE) of 0.3763 and a mean absolute percentage error (MAPE) of 8.233, whereas the model extended with Google Trends values had an RMSE of 0.3732 and a MAPE of 8.17495. The difference in predictive performance was not significant (Diebold-Mariano test, p = 0.4152).

Conclusion: Google Trends data are a good correlate of the reported incidence of Lyme disease in Germany, but adding them to models based on traditional data did not significantly improve forecasting accuracy.
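The abstract compares the forecasts of the two models using RMSE, MAPE and a Diebold-Mariano test. The following is a minimal Python sketch of these three measures. It assumes squared-error loss, a one-step forecast horizon by default, and the asymptotic standard normal approximation for the Diebold-Mariano p-value. The abstract does not say which loss function, horizon or small-sample correction (for example, the Harvey-Leybourne-Newbold adjustment) the authors used, so this is not necessarily their exact procedure. The function names are illustrative.

```python
import math
from typing import Sequence, Tuple


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error of the forecasts."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error in percent (requires non-zero actual values)."""
    ratios = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(ratios)


def diebold_mariano(
    actual: Sequence[float],
    forecast1: Sequence[float],
    forecast2: Sequence[float],
    h: int = 1,
) -> Tuple[float, float]:
    """Two-sided Diebold-Mariano test with squared-error loss (assumed).

    Returns (statistic, p_value). A negative statistic means forecast1 has the
    smaller average loss. The p-value uses the asymptotic standard normal
    distribution, with no small-sample correction.
    """
    # Loss differential d_t = e1_t^2 - e2_t^2 for each validation week.
    d = [(a - f1) ** 2 - (a - f2) ** 2 for a, f1, f2 in zip(actual, forecast1, forecast2)]
    n = len(d)
    d_bar = sum(d) / n

    def autocov(k: int) -> float:
        # Sample autocovariance of the loss differential at lag k.
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    # Long-run variance: autocovariances up to lag h - 1 (only lag 0 when h = 1).
    long_run_var = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, h))
    if long_run_var <= 0.0:
        raise ValueError("loss differential has no positive variance estimate")
    stat = d_bar / math.sqrt(long_run_var / n)
    # Two-sided p-value: P(|Z| > |stat|) for Z ~ N(0, 1).
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value
```

Given the weekly validation observations and the two models' forecasts, a call such as `diebold_mariano(observed, sarima_forecast, sarima_google_forecast)` would return a statistic and a p-value. These argument names are hypothetical. The result need not match the reported p = 0.4152 if the authors used a different loss, horizon or correction.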