logo

2 Run lasso and ridge (when available) using different R packages

We have made several scripts (RR_9.0) that use the different R packages that have implemented penalised regression. Note that with the size of the current data, only the bigstatsr package managed to perform the analyses.

All R scripts require the following arguments:
1. /working/directory/ : setwd() in R
2. VertexFile.txt : created above
3. outputFolder
4. variable name
5. lasso or ridge
6. n cpus to use

They require two lists of IDs in order to separate discovery and replication (train / test samples)
Here: ID_discovery_UKBanalysis.txt and ID_replication_UKBanalysis.txt

4 Using glmnet and biglasso packages

For us, it only worked on a subset of the data but could not scale to the full sample. For biglasso, simply replace RR_9.0_PenalisedRegressionFunction_glmnet.R by the biglasso equivalent.

On full sample, we got error: Error in elnet(x, is.sparse, ix, jx, y, weights, offset, type.gaussian, : long vectors (argument 5) are not supported in .Fortran Calls: cv.glmnet -> cv.glmnet.raw -> glmnet -> elnet Execution halted

Similarily using biglasso, we got the error: Error in SetMatrixElements(, as.double(j), as.double(i), as.double(value)) : long vectors not supported yet: ../../src/include/Rinlinedfuns.h:522 Calls: as.big.matrix … [<- -> .local -> SetElements.bm -> SetMatrixElements Execution halted

7 Evaluate prediction performance in replication set

8 Compare absolute error made by BLUP and LASSO

9 Final plots - BLUP vs. LASSO

blup=read.csv("BLUP_prediction_UKBreplication_Bslne.csv")
lasso=read.csv("Lasso_prediction_UKB_replicationsample5K.csv")

all=merge(blup, lasso, by.x="HCPvarLoop", by.y="newVarLoop" )
#all=all[which(all$pval.x <0.1 & all$pval.x>0.01),]

# Calculate SE and CI of R2
# Using exponential transformation - see Algina et al., 2010
all$z=log( (1+sqrt(all$R2_final.x)) / (1-sqrt(all$R2_final.x)) )
all$z1=all$z-2*1.96/sqrt(all$N)
all$z2=all$z+2*1.96/sqrt(all$N)

all$lb.x=((exp(all$z1)-1)/(exp(all$z1)+1))**2
all$lb.x[which(all$z1<0)]=0
all$ub.x=((exp(all$z2)-1)/(exp(all$z2)+1))**2

# LASSO
all$z=log( (1+sqrt(all$R2_final.y)) / (1-sqrt(all$R2_final.y)) )
all$z1=all$z-2*1.96/sqrt(all$N)
all$z2=all$z+2*1.96/sqrt(all$N)

all$lb.y=((exp(all$z1)-1)/(exp(all$z1)+1))**2
all$lb.y[which(all$z1<0)]=0
all$ub.y=((exp(all$z2)-1)/(exp(all$z2)+1))**2

# Add labels for plotting
phe=read.csv("UKB_allvars_categories_fancylabels_Jan19Update.csv")
all=merge(all, phe, by.x="HCPvarLoop", by.y="V4")

png("Prediction_BLUP_LASSO_replicationUKB.png", width = 15, height = 15, units = "cm", res = 400)
par(mar=c(4,4,1,1))
plot(all$R2_final.x, all$R2_final.y, pch=20, cex=1.5,ylab="Prediction R2 from LASSO", xlab="Prediction R2 from BLUP", xlim = c(0,0.6), ylim = c(0,0.6) )
abline(a=0, b=1, col="red")

# Variable labels
text(all$ub.x[which(all$R2_final.x>0.05)], all$R2_final.y[which(all$R2_final.x>0.05)], labels =all$V3[which(all$R2_final.x>0.05)], pos = 4, cex=0.5 )

# Confidence intervals
arrows(y0 = all$lb.y , x0 = all$R2_final.x, y1 = all$ub.y, x1 =all$R2_final.x, code = 3, angle = 90,length = 0.05  )
arrows(x0 = all$lb.x , y0 = all$R2_final.y, x1 = all$ub.x, y1 =all$R2_final.y, code = 3, angle = 90,length = 0.05  )

dev.off()



 




A work by by [Baptiste Couvy-Duchesne] - 28 April 2020