logo

1 Estimate BLUP scores using cross-validation in the discovery sample

We have created 10 lists of IDs (eid eid) named UKB_ID_crossValidation_XX.txt that we use for the cross validation.
Note that each list of IDs contain 9/10 of the sample (not the other way around), though OSCA could also accomodate it.

2 BLUP scores from full discovery sample for out of sample prediction

We now assume you have an indepdenent sample, processed the same way as the discovery sample. BOD files have been created for this independent sample, which may be another cohort (e.g. HCP) or a later release of the UKB (UKB replication) as per our manuscript.

3 Evaluate prediction accuracy

Example of evaluating prediction in the UKB replication sample. In addition to training BLUP weights on phenotypes with covariates regressed out, we also evaluate BLUP scores while controlling for the same covariates.

3.1 Open Phenotypes and variable list

In this example the first column is the predicted variable (used in training), the 4th column is the new variable that we are trying to predict. In some cases the names are the same (prediction from UKB into UKB) but often the exact same variables are not available in the test sample.

4 Read BLUP scores and merge with phenotype file

4.1 Evaluate prediction performance in replication set

library(lmtest)
library(pROC)
res=NULL

# Loop on subset of phenotypes for which BRS was calculated 
for (varLoop in phe$V1){

BRSvarLoop=paste0("BRS_", varLoop) # Name of the BLUP score
newVarLoop=phe$V4[which(phe$V1==varLoop)] # Phenotype to predict in independent sample
    
  # Fixed effect R2
m1<-lm(formula = as.formula(paste0(newVarLoop," ~  Age_MRI +  sexuses_datacoding_9_f31_0_0 + ICV + LThickness + RThickness + LSurfArea + RSurfArea + standing_height_f50_2_0 + weight_f21002_2_0 + body_mass_index_bmi_f21001_2_0 "))  , data = TT2)
 
m2<-lm(formula = as.formula(paste0(newVarLoop," ~ ", BRSvarLoop, "+ Age_MRI +  sexuses_datacoding_9_f31_0_0 + ICV + LThickness + RThickness + LSurfArea + RSurfArea + standing_height_f50_2_0 + weight_f21002_2_0 + body_mass_index_bmi_f21001_2_0 "))  , data = TT2)
 
mICV<-as.data.frame(summary(m2)$coefficients)
# Add values of interest
mICV$R2<-NA
mICV[BRSvarLoop, "R2_cov"]<-summary(m1)$adj.r.squared
mICV[BRSvarLoop, "R2"]<-summary(m2)$adj.r.squared
lmt=lrtest(m1,m2)
mICV[BRSvarLoop, "pval"]<-lmt$`Pr(>Chisq)`[2]
mICV[BRSvarLoop, "chisQ"]<-lmt$Chisq[2]
mICV[BRSvarLoop, "df"]<-lmt$Df[2]
mICV[BRSvarLoop, "R2_final"]<-mICV[BRSvarLoop, "R2"]-mICV[BRSvarLoop, "R2_cov"]

# Get AUC for discrete variables
if (length(table(TT2[,newVarLoop]))<3){
mICV[BRSvarLoop, "auc"]<-auc(TT2[,newVarLoop], TT2[,BRSvarLoop])
mICV[BRSvarLoop, "auc.se"]<-sqrt(var(roc(TT2[,newVarLoop]~ TT2[,BRSvarLoop])))
} else{
mICV[BRSvarLoop, "auc"]<-NA
mICV[BRSvarLoop, "auc.se"]<-NA
}
res<-rbind(res,cbind(BRSvarLoop,  newVarLoop, mICV[BRSvarLoop,]) )
}

# Write results
write.csv(res, "BLUP_prediction_UKBreplication_Bslne_bdsz.csv")



 




A work by by [Baptiste Couvy-Duchesne] - 23 April 2020