We review the estimation of coverage and error rate in high-throughput protein-protein interaction datasets and argue that reports of the low quality of such data are to a substantial extent based on misinterpretations. Probabilistic statistical models and methods can be used to estimate properties of interest and to make the best use of the available data.