The Probability of Chance Correlation Using Partial Least Squares (PLS)

Matthew Clark, Richard D. Cramer

Research output: Contribution to journal › Article

284 Citations (Scopus)

Abstract

The frequency of chance correlation using partial least squares (PLS) has been measured experimentally for variously dimensioned data, comprising either completely random numbers, random numbers containing a perfect correlation within, and CoMFA field descriptors. This frequency, much lower than that for stepwise multiple regression, is maximal for datasets in which the number of descriptors equals the number of compounds, and surprisingly decreases indefinitely as the number of descriptors becomes much greater than the number of compounds. However, perfect correlations involving descriptor subsets are not detected by PLS if the number of irrelevant descriptors is excessive. In CoMFA applications, the probability of chance correlation is usually negligible. For example with 21 compounds a crossvalidated r2 value greater than 0.25 will occur by chance in less than 5% of trials.
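The experiment summarized in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' procedure: it assumes a single-component PLS1 model, leave-one-out crossvalidation, and Gaussian random numbers for both descriptors and activities. The function names (`pls1_one_component_predict`, `loo_q2`, `chance_frequency`) are hypothetical and introduced only for illustration.

```python
import random


def pls1_one_component_predict(X_train, y_train, x_new):
    """Predict y for x_new with a one-component PLS1 model (assumed simplification)."""
    n, p = len(X_train), len(x_new)
    x_mean = [sum(row[j] for row in X_train) / n for j in range(p)]
    y_mean = sum(y_train) / n
    # Center descriptors and response on the training set.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X_train]
    yc = [v - y_mean for v in y_train]
    # Weight vector: direction of maximal covariance between X and y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    if norm == 0.0:
        return y_mean  # no covariance at all: fall back to the mean
    w = [v / norm for v in w]
    # Scores of the training compounds and regression of y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    b = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    t_new = sum((x_new[j] - x_mean[j]) * w[j] for j in range(p))
    return y_mean + b * t_new


def loo_q2(X, y):
    """Leave-one-out crossvalidated r2: 1 - PRESS / SS."""
    n = len(y)
    y_mean = sum(y) / n
    ss = sum((v - y_mean) ** 2 for v in y)
    press = 0.0
    for i in range(n):
        pred = pls1_one_component_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i])
        press += (y[i] - pred) ** 2
    return 1.0 - press / ss


def chance_frequency(n_compounds, n_descriptors, threshold, n_trials, seed=0):
    """Fraction of purely random datasets whose crossvalidated r2 exceeds threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        X = [[rng.gauss(0, 1) for _ in range(n_descriptors)] for _ in range(n_compounds)]
        y = [rng.gauss(0, 1) for _ in range(n_compounds)]
        if loo_q2(X, y) > threshold:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    # 21 compounds and a 0.25 threshold come from the abstract; the descriptor
    # count and number of trials are arbitrary choices for this sketch.
    print(chance_frequency(n_compounds=21, n_descriptors=50, threshold=0.25, n_trials=200))
```

Varying `n_descriptors` relative to `n_compounds` gives a rough feel for how the frequency depends on dataset shape. Because the sketch fixes the model at one component, its numbers should not be expected to reproduce the figures reported in the paper.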

Original language: English
Pages (from-to): 137-145
Number of pages: 9
Journal: Quantitative Structure‐Activity Relationships
Volume: 12
Issue number: 2
DOIs: 10.1002/qsar.19930120205
State: Published - Jan 1 1993

Fingerprint

Least-Squares Analysis
Datasets

Keywords

  • chance correlation
  • CoMFA
  • cross validation
  • Partial least squares
  • stepwise regression

Cite this

@article{a5e8c7920e96420fa26511f379e1b46a,
title = "The Probability of Chance Correlation Using Partial Least Squares (PLS)",
abstract = "The frequency of chance correlation using partial least squares (PLS) has been measured experimentally for variously dimensioned data, comprising either completely random numbers, random numbers containing a perfect correlation within, and CoMFA field descriptors. This frequency, much lower than that for stepwise multiple regression, is maximal for datasets in which the number of descriptors equals the number of compounds, and surprisingly decreases indefinitely as the number of descriptors becomes much greater than the number of compounds. However, perfect correlations involving descriptor subsets are not detected by PLS if the number of irrelevant descriptors is excessive. In CoMFA applications, the probability of chance correlation is usually negligible. For example with 21 compounds a crossvalidated r2 value greater than 0.25 will occur by chance in less than 5{\%} of trials.",
keywords = "chance correlation, CoMFA, cross validation, Partial least squares, stepwise regression",
author = "Clark, Matthew and Cramer, {Richard D.}",
year = "1993",
month = "1",
day = "1",
doi = "10.1002/qsar.19930120205",
language = "English",
volume = "12",
pages = "137--145",
journal = "Molecular Informatics",
issn = "1868-1743",
publisher = "Wiley-VCH Verlag GmbH {\&} Co. KGaA",
number = "2",

}

The Probability of Chance Correlation Using Partial Least Squares (PLS). / Clark, Matthew; Cramer, Richard D.

In: Quantitative Structure‐Activity Relationships, Vol. 12, No. 2, 01.01.1993, p. 137-145.


TY - JOUR
T1 - The Probability of Chance Correlation Using Partial Least Squares (PLS)
AU - Clark, Matthew
AU - Cramer, Richard D.
PY - 1993/1/1
Y1 - 1993/1/1

AB - The frequency of chance correlation using partial least squares (PLS) has been measured experimentally for variously dimensioned data, comprising either completely random numbers, random numbers containing a perfect correlation within, and CoMFA field descriptors. This frequency, much lower than that for stepwise multiple regression, is maximal for datasets in which the number of descriptors equals the number of compounds, and surprisingly decreases indefinitely as the number of descriptors becomes much greater than the number of compounds. However, perfect correlations involving descriptor subsets are not detected by PLS if the number of irrelevant descriptors is excessive. In CoMFA applications, the probability of chance correlation is usually negligible. For example with 21 compounds a crossvalidated r2 value greater than 0.25 will occur by chance in less than 5% of trials.

KW - chance correlation
KW - CoMFA
KW - cross validation
KW - Partial least squares
KW - stepwise regression
UR - http://www.scopus.com/inward/record.url?scp=0027209171&partnerID=8YFLogxK
U2 - 10.1002/qsar.19930120205
DO - 10.1002/qsar.19930120205
M3 - Article
AN - SCOPUS:0027209171
VL - 12
SP - 137
EP - 145
JO - Molecular Informatics
JF - Molecular Informatics
SN - 1868-1743
IS - 2
ER -