Enhancing repository fungal data for biogeographic analyses.
Open-access occurrence data are useful for studying spatial patterns of fungi, but often have quality issues. These include errors in taxonomy and geo-coordinates, and incomplete coverage across areas and taxonomic groups. We identify 15 quality issues that can lead to incorrect biogeographic inference, and develop a reproducible pipeline that flags and removes problematic entries. The pipeline first tests the accuracy of geographic records and taxonomic names. Then, if information on non-native status is unavailable or unreliable, it detects non-native species via a predictive model. Finally, it identifies spatial and environmental outliers and removes them when they are biologically improbable. We test the pipeline by cleaning data for Australian fungi, retaining 251,642 of the initial 1,034,601 records (about 24%). Exploratory analysis showed that the cleaned data are useful for analyses such as biogeographic regionalisation, but recording gaps and a lack of saturation in collection effort indicate that more surveys are needed to improve collection completeness.
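
To make the flag-then-remove idea concrete, a minimal sketch of one geographic check is given here. It is not the pipeline itself: the `Record` type, the flag names, the rough bounding box for Australia, and the example records are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Rough bounding box for Australia (illustrative assumption, not from the study):
# (min_lon, min_lat, max_lon, max_lat)
AUS_BBOX = (112.0, -44.0, 154.0, -10.0)


@dataclass
class Record:
    """Hypothetical occurrence record with a name and decimal coordinates."""
    name: str
    lat: Optional[float]
    lon: Optional[float]


def coordinate_flags(rec: Record, bbox=AUS_BBOX) -> List[str]:
    """Return a list of quality flags for one record (empty list = no issue found)."""
    if rec.lat is None or rec.lon is None:
        return ["missing_coordinates"]
    if not (-90 <= rec.lat <= 90 and -180 <= rec.lon <= 180):
        return ["invalid_coordinates"]
    flags = []
    # (0, 0) is a common placeholder for unknown locations
    if rec.lat == 0 and rec.lon == 0:
        flags.append("zero_coordinates")
    min_lon, min_lat, max_lon, max_lat = bbox
    if not (min_lon <= rec.lon <= max_lon and min_lat <= rec.lat <= max_lat):
        flags.append("outside_study_area")
    return flags


def clean(records: List[Record]) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Split records into those kept and those flagged, keeping the reasons."""
    kept, flagged = [], []
    for rec in records:
        flags = coordinate_flags(rec)
        if flags:
            flagged.append((rec, flags))
        else:
            kept.append(rec)
    return kept, flagged


# Made-up example records
records = [
    Record("Species A", -33.87, 151.21),  # plausible location inside the box
    Record("Species B", 0.0, 0.0),        # placeholder coordinates
    Record("Species C", None, 140.0),     # missing latitude
]
kept, flagged = clean(records)
```

Keeping the flags alongside each removed record, rather than silently dropping it, lets the share of records lost to each issue be reported and the removal decisions be revisited.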