Monthly benchmark dataset
COST Action on homogenisation
The COST Action ES0601, Advances in homogenisation methods of climate series: an integrated approach (HOME), aims to improve homogenisation algorithms. One of the activities of this Action is to produce a benchmark dataset on which homogenisation algorithms can be tested. The dataset contains both monthly temperature and monthly precipitation data. The benchmark has three different types of datasets: 1) raw climate records, 2) idealised synthetic data series, and 3) surrogate data. Raw climate records are the most realistic case; however, in this case the truth is not known. Therefore the benchmark also contains synthetic and surrogate data with known inhomogeneities. The synthetic data consists of Gaussian white noise with the mean and variance of homogenised climate records. The surrogate data has the distributions (e.g. moments) and spectrum (auto- and cross-correlations) of homogenised climate observations and is thought to be the most realistic type of artificial data. The real data section is also important for validating the inserted inhomogeneities, by comparing the statistical properties of the inhomogeneities detected in the real and in the artificial data sections.
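The synthetic case described above can be illustrated with a minimal sketch. It is not the code used to build the benchmark: the function names, the example reference values, and the choice of a single step change as the inserted inhomogeneity are all assumptions made for illustration.

```python
import random
import statistics


def synthetic_series(reference, n_months, seed=0):
    """Gaussian white noise with the mean and variance of `reference`."""
    rng = random.Random(seed)
    mu = statistics.fmean(reference)
    sigma = statistics.pstdev(reference)
    return [rng.gauss(mu, sigma) for _ in range(n_months)]


def insert_break(series, position, size):
    """Add a step change of `size` from index `position` onwards.

    A single step is an assumed, simplified form of inhomogeneity.
    """
    return [x + size if i >= position else x for i, x in enumerate(series)]


# Hypothetical homogenised monthly temperature anomalies (degrees Celsius).
reference = [0.3, -1.2, 0.8, 1.5, -0.4, -0.9, 0.1, 1.1, -0.6, 0.2, -1.0, 0.7]

clean = synthetic_series(reference, n_months=1200, seed=42)
broken = insert_break(clean, position=600, size=0.8)
```

Because the break position and size are chosen by the person generating the data, the truth is known and the output of a homogenisation algorithm applied to `broken` can be compared against `clean`.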
Homogenisation of the benchmark database
The participants returned 25 blind contributions, as well as 22 further contributions that were submitted after the truth was revealed. The best algorithms are the ones that were designed to function with an inhomogeneous reference time series: Craddock, PRODIGE, MASH, ACMANT and USHCN. Relative homogenization algorithms typically improve the homogeneity of temperature data, but only the best ones improve precipitation data. The main results of this study are described in this manuscript. Comments are welcome.
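To make the idea of relative homogenization concrete, the sketch below compares a candidate station with the average of its neighbours, so that the shared regional climate signal cancels and a break stands out in the difference series. It is a deliberately crude stand-in and not an implementation of Craddock, PRODIGE, MASH, ACMANT or USHCN. The function names and data are hypothetical, and the neighbours are assumed to be homogeneous, which is exactly the simplification the best algorithms avoid.

```python
import statistics


def difference_series(candidate, neighbours):
    """Candidate minus the mean of its neighbours, month by month."""
    return [
        c - statistics.fmean(values)
        for c, values in zip(candidate, zip(*neighbours))
    ]


def most_likely_break(diff, min_segment=2):
    """Return (index, shift) of the split with the largest jump in mean.

    Real tests weight the statistic by segment length; this criterion
    is a simplification chosen for readability.
    """
    best_index, best_shift = None, 0.0
    for k in range(min_segment, len(diff) - min_segment + 1):
        shift = statistics.fmean(diff[k:]) - statistics.fmean(diff[:k])
        if abs(shift) > abs(best_shift):
            best_index, best_shift = k, shift
    return best_index, best_shift


# Hypothetical regional signal shared by all stations.
signal = [0.0, 1.0, -1.0, 0.5, -0.5, 1.5, -1.5, 0.2, -0.2, 0.8]
neighbours = [[s + 0.1 for s in signal], [s - 0.1 for s in signal]]

# The candidate has a step change of 1.0 starting at index 6.
candidate = [s + (1.0 if i >= 6 else 0.0) for i, s in enumerate(signal)]

diff = difference_series(candidate, neighbours)
break_index, break_size = most_likely_break(diff)
```

With these inputs the regional signal cancels in `diff`, leaving only the step change in the candidate.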
More information and download
Everyone is invited to download the homogenized benchmark dataset, which can be found on our FTP server, and to analyse the benchmark in more detail. The files are in the ASCII COST-HOME file format. The main features of the benchmark can be found in the manuscript mentioned above. More detail can be found in a report.
See also my related blog posts:
Venema, V., O. Mestre, E. Aguilar, I. Auer, J.A. Guijarro, P. Domonkos, G. Vertacnik, T. Szentimrey, P. Stepanek, P. Zahradnicek, J. Viarre, G. Müller-Westermeier, M. Lakatos, C.N. Williams, M. Menne, R. Lindau, D. Rasol, E. Rustemeier, K. Kolokythas, T. Marinova, L. Andresen, F. Acquaotta, S. Fratianni, S. Cheval, M. Klancar, M. Brunetti, Ch. Gruber, M. Prohom Duran, T. Likso, P. Esteban, Th. Brandsma. Benchmarking monthly homogenization algorithms. Climate of the Past, 8, pp. 89-115, doi: 10.5194/cp-8-89-2012, 2012.
Venema, V., O. Mestre, E. Aguilar, I. Auer, J.A. Guijarro, P. Domonkos, G. Vertacnik, T. Szentimrey, P. Stepanek, P. Zahradnicek, J. Viarre, G. Müller-Westermeier, M. Lakatos, C.N. Williams, M. Menne, R. Lindau, D. Rasol, E. Rustemeier, K. Kolokythas, T. Marinova, L. Andresen, F. Acquaotta, S. Fratianni, S. Cheval, M. Klancar, M. Brunetti, Ch. Gruber, M. Prohom Duran, T. Likso, P. Esteban, Th. Brandsma. Description of the COST-HOME monthly benchmark dataset and the submitted homogenized contributions. Report, 2011.
The File Format for COST-HOME
Report, July 2008
Victor Venema, Olivier Mestre
Presentations
The COST-HOME monthly benchmark dataset with temperature and precipitation data for testing homogenisation algorithms
Management committee COST-HOME, Bologna, Italy, 25-29 May, 2009
Victor Venema, Enric Aguilar, José A. Guijarro and Olivier Mestre
Benchmark database: inhomogeneous data, surrogate data and synthetic data
Working group meeting COST-HOME, Tarragona, Spain, 9-11 March, 2009
Benchmark database: inhomogeneous data, surrogate data and synthetic data
Management committee COST-HOME, Budapest, 26-30 May 2008
based on surrogate climate records
COST HOME management committee, 23 November, 2007