By Holger Drees (University of Hamburg).
Abstract: Many estimators of the extreme value index and other tail parameters are based on a certain fraction of the largest observations. Choosing this fraction in a data-driven way is a notoriously difficult problem. The influential paper by Clauset, Shalizi and Newman (2009) suggests fitting a generalized Pareto distribution (GPD) to the top k order statistics for every possible k and choosing the k that minimizes the Kolmogorov-Smirnov distance between the fitted GPD and the empirical cdf of the exceedances. Using the Hill estimator as an example, we argue that this minimum distance selection procedure usually leads to an inefficient tail estimator. In particular, the procedure often seriously underestimates the optimal sample fraction, which results in a substantially increased asymptotic variance; this effect can also be observed in simulations.
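The minimum distance rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes positive data and that, with the Hill estimator, the model fitted to the k exceedances over the threshold u = X_(n-k) is the Pareto cdf 1 - (x/u)^(-1/γ), the special case of the GPD that corresponds to the Hill estimator. The function names and the lower bound `k_min=10` are hypothetical choices made for this example.

```python
import math
import random


def hill_estimator(desc, k):
    """Hill estimate of the extreme value index from the k largest values.

    `desc` must be positive and sorted in decreasing order; the threshold
    is the (k+1)-th largest observation desc[k].
    """
    threshold = desc[k]
    return sum(math.log(desc[i] / threshold) for i in range(k)) / k


def ks_distance_pareto(desc, k, gamma):
    """KS distance between the empirical cdf of the k exceedances over
    u = desc[k] and the fitted Pareto cdf 1 - (x/u)^(-1/gamma)."""
    u = desc[k]
    exceed = desc[:k][::-1]  # exceedances in increasing order
    d = 0.0
    for j, x in enumerate(exceed):
        fitted = 1.0 - (x / u) ** (-1.0 / gamma)
        # The fitted cdf is continuous, so the supremum is attained at a jump
        # of the empirical cdf, which goes from j/k to (j+1)/k at x.
        d = max(d, abs(fitted - j / k), abs(fitted - (j + 1) / k))
    return d


def select_k_min_ks(data, k_min=10, k_max=None):
    """Return (k, Hill estimate, KS distance) for the k minimizing the KS distance."""
    desc = sorted(data, reverse=True)
    if k_max is None:
        k_max = len(desc) - 1  # desc[k] must exist as the threshold
    best = None
    for k in range(k_min, k_max + 1):
        gamma = hill_estimator(desc, k)
        if gamma <= 0:  # degenerate fit (e.g. ties at the top); skip this k
            continue
        d = ks_distance_pareto(desc, k, gamma)
        if best is None or d < best[2]:
            best = (k, gamma, d)
    return best


if __name__ == "__main__":
    random.seed(0)
    # Exact Pareto sample with shape 2, i.e. true extreme value index 0.5.
    sample = [random.paretovariate(2.0) for _ in range(1000)]
    k, gamma_hat, dist = select_k_min_ks(sample)
    print(f"k = {k}, Hill estimate = {gamma_hat:.3f}, KS distance = {dist:.4f}")
```

The search is a plain loop over all candidate k, which costs O(n^2) operations; this keeps the sketch close to the verbal description at the price of speed on large samples.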