library(readxl)
library(tibble)
library(DT)
library(purrr)
library(magrittr)
library(dplyr)
library(PrecisionTrialDrawer)
library(ggplot2)
library(parallel)
library(RColorBrewer)
library(VariantAnnotation)
library(org.Hs.eg.db) # GENOME wide annotation for HUman based on Entrez Gene identifiers.
#R helper function library
source("helper_functions.R")
# Define Plot color schema
MYPALETTE <- colorRampPalette(rev(brewer.pal(11, "Spectral")), space="Lab")
set.seed(5)
The Shiva trial is a proof of concept randomized trial based on targeted therapy using molecular characterization. The trial was aiming at compare convertional therapy with a genomic-driven therapy approach. However the results did not find any difference in progression-free survival (PFS) between the treatment arms. The goal of this retrospective analysis is to verify that the R package PrecisionTrialDrawer
(here-hence refered as PTD) can be applied for predicting population allocation in a genomic-driven clinical trial design using alterations from the TCGA repositories.
The following information were collected from https://clinicaltrials.gov/show/NCT01771458
Sequencer used: ION Torrent PGM.
Selected gene copy number alterations (amplification or deletion) were assessed using Cytoscan HD according the manufacturer protocol (Affymetrix®).
At the time of the publication on lancet (july 11 2014), the authors have screened 496 patients from different tumor types. 203 had no targetable molecular alteration and 293 (59.1%) patients were enrolled in the study.
The objective of this retrospective analysis is to verify that the TCGA population can be used to predict the population allocation in a real clinical trial design. We will use the R package PTD
to model the design of the SHIVA trial. Then, we will import the raw data from the population recruited at the time of the trail (kindly provided by the authors of the SHIVA trial), download the TCGA data and adjust the population to match the tumor type distribution recruited in SHIVA. Lastly, using PTD we will investigate on the performance of the two population. More specifically we will compare the:
We need to prepare the following inputs
The first objective is to model the original design of the SHIVA trial.
Create a CancerPanel object based on the design suggested in the paper of the SHIVA trial, Supplementary Table S3. In the SHIVA trial, molecular alteration were groups in three 3 molecular pathways:
In case of multiple alteration, the allocation was discussed in the Molecular Biology Board (MBB) and alteration were discussed using the following criteria:
AR (Androgen receptor), ER (Estrogen receptor) and PR (Progesterone receptor) were quantified with Immunohistochemistry. Unfortunately we were not able to simulate IHC values with the TCGA, and this part of the trial was left out.
#import SHIVA Design
# --------------------------------------------------------------------------------
SHIVA_design <- readxl::read_excel("version_6.0/Supplementary_file/SHIVA_design_v6.1.xlsx"
, sheet = 1
, col_types = c("text", "text", "text", "text", "text", "text"))
#fixing input
SHIVA_design$mutation_specification <- as.character(SHIVA_design$mutation_specification)
SHIVA_design$mutation_specification <- sub("\"\"", "", SHIVA_design$mutation_specification)
#CONVERT AMINOACIT TO FORMAT COMPATIBLE FOR PANEL
for (i in 1:nrow(SHIVA_design)) {
if (!is.na(SHIVA_design$mutation_specification[i]) &&
SHIVA_design$mutation_specification[i] != "" &&
SHIVA_design$mutation_specification[i] != "truncating") {
#Print modification
cat("Gene", as.character(SHIVA_design$gene_symbol[i]), ":"
, SHIVA_design$mutation_specification[i]
, "-->", convertAA(SHIVA_design$mutation_specification[i])
, "\n")
#use the helper function to convert the a.a.
SHIVA_design$mutation_specification[i] <- convertAA(SHIVA_design$mutation_specification[i])
}
}
## Gene AKT1 : Glu17Lys --> E17K
## Gene PIK3CA : Glu545Lys --> E545K
## Gene PIK3CA : Glu542Lys --> E542K
## Gene PIK3CA : His1047Arg --> H1047R
## Gene PIK3CA : Thr1025Ala --> T1025A
## Gene PIK3CA : Glu453Lys --> E453K
## Gene PIK3CA : Gln546Lys --> Q546K
## Gene BRAF : Gly469Ala --> G469A
## Gene BRAF : Val600Glu --> V600E
## Gene ERBB2 : Ser792Phe --> S792F
## Gene ERBB2 : Thr862Ala --> T862A
## Gene FLT3 : Met665Thr --> M665T
## Gene KIT : Val852Ile --> V852I
## Gene KIT : Asp572Gly --> D572G
## Gene KIT : Met722Val --> M722V
## Gene KIT : Pro838Ser --> P838S
## Gene PDGFRA : Leu655Trp --> L655W
# preview
# --------------------------------------------------------------------------------
datatable_withbuttons(SHIVA_design)
The original data collected during the SHIVA trials, were provided to us by the authors as an excel file, one with SNV information as well as aggregated data, one with the CNA profile.
Import in the analysis the raw data. Add a column with the TCGA tumor ID confersion Supplementary_file/SHIVA_conversiontable.xlsx
.
# Read in SHIVA raw data
merged_df <- readxl::read_excel("version_6.0/Supplementary_file/SHIVA_authorTable.xlsx"
, sheet = 1
, col_names = TRUE
# , col_types = rep("text", times=24)
)
# Import CNA data for the 496 profiled patients
cna_data <- readxl::read_excel("version_6.0/Supplementary_file/SHIVA_cna_data.xls"
, sheet = 1
, col_names = TRUE)
The objective in this section is to filter out patients that could create problem in the comparison. Using the SHIVA authors table, we want to keep those patients with:
of the remaing now, you can check how many are eligible for randomization
and therefore calculate the expected detected fraction.
Select patients with a full data profile (NGS + CNA) on which to run some population inference.
We have not removed those patients that were not randomized for any particular reasons (death, consensus withdrawal, ..) as long as the mutational profile is known.
Finally, we removed patients with TCGA tumor types acyc, sclc, es, because have no CNA data in TCGA population.
The following table shows the patients consituting the SHIVA population for this analysis.
# Patient filtering
# ------------------------------------------------------------------------------
# All patients have CNA data, so we do not include any filter here.
df_full_profile <- merged_df %>%
# filter by patients that do not have a Complete profile: CNA are OK, no patients filtered there.
# Filter those patient without a complete NGS profile (for SNV analysis)
dplyr::filter(!NGS == "Not done") %>%
# filter those patients with locations incompatible with the TCGA
dplyr::filter(!TCGA_tumor_ID == "") %>%
# Remove patients that were not randomized because of problems
# dplyr::filter(Reason.for.no.randomization == "ERROR")
# Remove patients with tumor types acyc, sclc, es, becuase have no CNA data in TCGA population
dplyr::filter(!TCGA_tumor_ID %in% c("acyc", "sclc", "es"))
# Define tumor type list in the population
# -------------------------------------------------------------------------------
tumor_types <- as.character(unique(df_full_profile$TCGA_tumor_ID))
# calculate weights
tumor_weights <- table(as.character(df_full_profile$TCGA_tumor_ID))
# calculate frequency
tumor_freq <- prop.table(tumor_weights)
# Preview
# --------------------------------------------------------------------------------
datatable_withbuttons(df_full_profile, caption = "Patients with a complete profile")
The population recruited in the trial is composed by differnt types of tumor types (after conversion to TCGA tumor ids). The following barplot shows the number of patient for each tumor type.
data.frame(tumor_weights) %>%
dplyr::rename(Tumor = Var1, Count = Freq) %>%
dplyr::mutate(Tumor = factor(Tumor
, level = Tumor[order(Count)])) %>%
ggplot(aes(Tumor, Count)) +
geom_bar(stat="identity", position=position_dodge() ) +
ggtitle("Number of patient for each tumor type in the SHIVA population") +
coord_flip()
In this paragraph we will convert the SHIVA dataset to a format compatible with PTD. Once correctly formatted, it will be imported as a “population background”. Together, with the TCGA data we will have have two population backgrounds, the SHIVA, and the TCGA. We will be able to compare them and investigate whether TCGA data can simulate the SHIVA situation.
For each gene, in each patient, it was reported a CNA status. For a small percentage of cases, the value reported was ambigous (e.g. “amplification, normal, delete_het”).
################################################################################
# CONVERT SHIVA POPULATION TO BE CAMPATIBLE WITH PTD
################################################################################
################################################################################
# PREPARE CNA DATA
# ------------------------------------------------------------------------------
# dataFull Structure
# LIST:
# $copynumber
# $data
# $gene_symbol
# $CNA
# $case_id (sample names with at least one alteration)
# $tumor_type
# $Samples
# $brca (all screened samples)
# Create Data
# ----------------------
cna_long <- cna_data %>%
lapply(as.character) %>%
data.frame(stringsAsFactors=FALSE) %>%
tidyr::gather("gene_symbol", "CNA", 2:ncol(cna_data)) %>%
dplyr::select(gene_symbol, CNA, case_id = Patient.ID) %>%
# Filter by patients.ID we are interested in
dplyr::filter(case_id %in% df_full_profile$Patient.ID) %>%
#dplyr::mutate(tumor_type="SHIVA") %>% # add fake tumor name
dplyr::mutate(case_id=as.character(case_id)) # conver fctr to chr
# preview before correction
cna_long %>%
dplyr::group_by(CNA) %>%
dplyr::summarise(counts = n()) %>%
ggplot(aes(x=reorder(CNA, counts), y=counts)) +
geom_bar(stat="identity", position=position_dodge()) +
labs(title="CNA Alterations types (BEFORE correction)") +
coord_flip()
In this paragraph we will perform a correction on the CNA value, by making all the values match one of the following standard states:
We considered as deleted, only homologous deletions (as in the Clinical trial).
fixIT <- function(from, to, target=cna_long$CNA){
#print(paste("^",from,"$", sep=""))
gsub(paste("^",from,"$", sep=""), to , target)
}
# fix CNA encoding
# ------------------------------------------------------------------------------
cna_long$CNA <- fixIT("delete_hom", "deletion")
cna_long$CNA <- fixIT("normal, delete_het, delete_hom", "deletion")
cna_long$CNA <- fixIT("delete_het, delete_hom", "deletion")
cna_long$CNA <- fixIT("delete_het, loh", "deletion")
cna_long$CNA <- fixIT("normal, delete_hom, loh", "deletion")
cna_long$CNA <- fixIT("normal, delete_hom", "deletion")
# Convert to normal (ambigous amplifications)
# ------------------------------------------------------------------------------
cna_long$CNA <- fixIT("amplification, normal, delete_het", "normal")
cna_long$CNA <- fixIT("amplification, normal, loh", "normal")
cna_long$CNA <- fixIT("gain, delete_het", "normal")
cna_long$CNA <- fixIT("gain, delete_hom", "normal")
cna_long$CNA <- fixIT("gain, normal, delete_hom", "normal")
cna_long$CNA <- fixIT("normal, delete_het", "normal")
cna_long$CNA <- fixIT("amplification, delete_het", "normal")
cna_long$CNA <- fixIT("delete_het", "normal")
cna_long$CNA <- fixIT("gain, loh", "normal")
cna_long$CNA <- fixIT("gain", "normal")
cna_long$CNA <- fixIT("gain, normal, loh", "normal")
cna_long$CNA <- fixIT("NA", "normal")
cna_long$CNA <- fixIT("normal, loh", "normal")
cna_long$CNA <- fixIT("gain, normal", "normal")
cna_long$CNA <- fixIT("amplification, gain", "normal")
cna_long$CNA <- fixIT("amplification, gain, normal", "normal")
cna_long$CNA <- fixIT("amplification, normal", "normal")
# Get patients list and associated tumor id
# ------------------------------------------------------------------------------
temp <- df_full_profile %>%
mutate(case_id = as.character(Patient.ID)) %>%
dplyr::select(tumor_type = TCGA_tumor_ID
, case_id)
# join patients
cna_long <- dplyr::left_join(cna_long, temp, by= "case_id")
# Create Sample
# ----------------------
# get list of tumors
tumor_chr <- df_full_profile$TCGA_tumor_ID %>%
unique %>%
as.character
# prepare funtion to extract sample names for each tumor
sampleXtumor <- function(x, df){
df %>%
dplyr::select(Patient.ID, TCGA_tumor_ID) %>%
dplyr::filter(TCGA_tumor_ID %in% x) %>% .[["Patient.ID"]] %>%
as.character() %>%
unique()
}
# Extract samples for each tumors
Samples <- purrr::map(tumor_chr, ~ sampleXtumor(., df=df_full_profile))
names(Samples) <- tumor_chr
# Check if there are problems.
testthat::expect_equal(length(unique(cna_long$case_id)), nrow(df_full_profile))
testthat::expect_equal(length(unlist(Samples)), nrow(df_full_profile))
# Create list element for CNA for the DataFull argument of the panel (panel@DataFull)
# ----------------------
copynumber <- list( data = data.frame(cna_long)
, Samples = Samples)
Preview the CNA dataset after the correction:
copynumber$data %>%
dplyr::group_by(CNA) %>%
dplyr::summarise(counts = n()) %>%
ggplot(aes(x=CNA, y=counts)) +
geom_bar(stat="identity", position=position_dodge()) +
labs(title="CNA Alterations types (after correction)") +
coord_flip()
Preview the CNA dataset in a data table.
copynumber$data %>%
dplyr::group_by(gene_symbol, CNA) %>%
dplyr::summarise(counts = n()) %>%
DT::datatable(caption = "Alterations Split by gene")
Preview the Alteration count split by gene:
copynumber$data %>%
dplyr::filter(CNA %in% c("amplification", "deletion")) %>%
dplyr::group_by(gene_symbol, CNA) %>%
dplyr::summarise(counts = n()) %>%
DT::datatable(caption = "Alterations Split by gene (keep only deletion and amplification terms)")
# Alterations Split by gene (keep only deletion and amplification terms
copynumber$data %>%
dplyr::filter(CNA %in% c("amplification", "deletion")) %>%
dplyr::group_by(gene_symbol, CNA) %>%
dplyr::summarise(counts = n()) %>%
ggplot(aes(x=gene_symbol, y=counts, fill=CNA)) +
geom_bar(stat="identity", position=position_dodge()) +
labs(title="Alterations Split by gene (keep only deletion and amplification terms)") +
coord_flip()
Convert the SNV data into a format compatible with PTD.
Some of the gene names in the raw data were corrected
################################################################################
# PREPARE MUTATION DATA
# ------------------------------------------------------------------------------
# str(panel@dataFull$mutations)
# Convert Gene Name to HGNC standard
# --------------------------------------
wrongGName <- c("RAPTOR","PI3KCA","PIK3CB")
rightGName <- c("RPTOR" ,"PIK3CA" ,"PI3KCB")
df_full_profile$gene_symbol <- plyr::mapvalues(df_full_profile$Mutated_genes
, from = wrongGName
, to=rightGName
, warn_missing = FALSE)
# Select fields of interest and start preformatting
#--------------------------------------
mutation_data <- df_full_profile %>%
dplyr::select(Patient.ID
, amino_acid_change = pp_old
, tumor_type = TCGA_tumor_ID
, gene_symbol
#, nm = NM_old
) %>%
dplyr::mutate(amino_acid_change = as.character(amino_acid_change)
, tumor_type = as.character(tumor_type)
#, nm = as.character(nm)
)
# Unnesting Genes and the data
###################################################
# split genes and mutations by ";"
# -----------------------------------------------
genes <- strsplit(mutation_data$gene_symbol, ";")
mutations <- strsplit(mutation_data$amino_acid_change, ";")
# Quality check - remove those patients that have ambiguos SNV annotations
# -----------------------------------------------
gene_lengths <- purrr::map_int(genes, length)
mutations_lengths <- purrr::map_int(mutations, length)
# init patiets to be removed
index_TOBEREMOVED <- c()
# 1. Remove genes that have not the number of mutatiosn
index <- gene_lengths != mutations_lengths
if (any(index)) {
warning("Problem with imported raw data: the number of alterated genes, does not match the number of SNV alterations. The patient affected by this problem will be removed the analysis\n")
# show message with ruther details
message("patients removed:\n", as.character(mutation_data[index, "Patient.ID"]))
# FIX - REMOVE THEM
# ~~~~~~~~~~~~~~~~~
#mutation_data <- mutation_data[-which(index),]
index_TOBEREMOVED <- index
}
## Warning: Problem with imported raw data: the number of alterated genes, does not match the number of SNV alterations. The patient affected by this problem will be removed the analysis
## patients removed:
## c("08-0024", "03-0007", "04-0025", "05-0042", "06-0096", "07-0099", "06-0113", "07-0027", "04-0039")
# 2. Remove genes that have one NA in genes or Mutations and
# not a corresponding NA in the other field.
index2 <- is.na(genes) != is.na(mutations)
if (any(index2)) {
warning("Problem with imported raw data: some ambigous annotation in some genes-mutations. Patients affected will be removed\n")
# show message with ruther details
message("patients removed:\n", as.character(mutation_data[index2, "Patient.ID"]))
# FIX - REMOVE THEM
# ~~~~~~~~~~~~~~~~~
#mutation_data <- mutation_data[-which(index2),]
index_TOBEREMOVED <- c(index_TOBEREMOVED, index2)
}
## Warning: Problem with imported raw data: some ambigous annotation in some genes-mutations. Patients affected will be removed
## patients removed:
## c("07-0098", "01-0222")
if(length(index_TOBEREMOVED)>0){
# remove patients that were selected for removal
mutation_data <- mutation_data[-which(index),]
}
# Unnesting genes and alterations
# -----------------------------------------------
unnested_gene <- mutation_data %>% tidyr::separate_rows(gene_symbol, sep = ";")
unnested_pp <- mutation_data %>% tidyr::separate_rows(amino_acid_change, sep = ";")
#merge them back together
unnested_gene$amino_acid_change <- unnested_pp$amino_acid_change
# unnest alterations, second level, split by "|"
unnested_df <- unnested_gene %>%
tidyr::separate_rows(amino_acid_change, sep = "\\|")# %>% unique
# Quality Control
# ----------------------------------------------
if(sum((unnested_df$amino_acid_change) != "", na.rm = TRUE) + sum(is.na(unnested_df$amino_acid_change))
!= purrr::map_int(strsplit(mutation_data$amino_acid_change, ";|\\|"), length) %>% sum()){
stop("unnesting did not work as expected")
}
unnested_df <- unique(unnested_df)
# Adjust AA change
#--------------------------------------
unnested_df$amino_acid_change <- sub("^p." , "" , unnested_df$amino_acid_change)
unnested_df$amino_acid_change <- sub("[" , "" , unnested_df$amino_acid_change, fixed = TRUE)
unnested_df$amino_acid_change <- gsub("Qln" , "Gln" , unnested_df$amino_acid_change, fixed = TRUE)
# conver amminoacid for from long to short
unnested_df$amino_acid_change <- purrr::map_chr(unnested_df$amino_acid_change, hgvs_long_to_short)
# Fix strange simbols
unnested_df <- unnested_df %>%
mutate_cond(amino_acid_change %in% c("NA", "", "XXXX"), amino_acid_change = NA) # custom made conditional mutate
# parse aa position
unnested_df$amino_position <- stringr::str_extract(unnested_df$amino_acid_change , "\\d+")
# 2. Remove genes that have one NA in genes or Mutations and
# not a corresponding NA in the other field.
index2 <- is.na(unnested_df$gene_symbol) != is.na(unnested_df$amino_acid_change)
if (any(index2)) {
warning("Problem with imported raw data: some ambigous annotation in some genes-mutations. Patients affected will be removed\n")
# show message with ruther details
message("patients removed:\n", as.character(unnested_df[index2, "Patient.ID"]))
# FIX - REMOVE THEM
# ~~~~~~~~~~~~~~~~~
unnested_df <- unnested_df[-which(index2),]
}
## Warning: Problem with imported raw data: some ambigous annotation in some genes-mutations. Patients affected will be removed
## patients removed:
## c("01-0202", "06-0097", "07-0098", "01-0222")
# Extract and prepare Sample info
#--------------------------------------
temp_sampledf <- unnested_df %>% dplyr::select(Patient.ID, TCGA_tumor_ID = tumor_type)
tumor_chr <- unique(temp_sampledf$TCGA_tumor_ID)
Samples <- purrr::map( tumor_chr, ~ sampleXtumor(., df= temp_sampledf))
names(Samples) <- tumor_chr
# Subset dataset where there are mutations
#--------------------------------------
# !!!(THIS HAS TO BE DONE AFTER, RETRIEVING THE SAMPLES for the panel structure) !!!!
# otherwise we risk to miss patients that are not included simply becuase they have no alteration
# Load entrez ID
xx <- as.list(org.Hs.egALIAS2EG)
# Remove pathway identifiers that do not map to any entrez gene id
xx <- xx[!is.na(xx)]
unnested_df <- unnested_df %>%
dplyr::filter(!gene_symbol == "") %>% # remove those patients that DO NOT have a gene matched to an alteration
dplyr::mutate(Entrez_gene_ID = sapply(gene_symbol , function(x) as.integer(xx[[x]][1])))
# Very shallow classification of variants
unnested_df$mutation_type <- with( unnested_df ,
ifelse(grepl("Ter$" , amino_acid_change) , "Nonsense_Mutation" ,
ifelse(grepl("fs$" , amino_acid_change) , "Frame_Shift_Ins" ,
ifelse(grepl("del$" , amino_acid_change) , "Frame_Shift_Del" ,
ifelse(grepl("ins" , amino_acid_change) , "Frame_Shift_Ins" ,
"Missense_Mutation"))))
)
# dummy genomic positon as a place holder
unnested_df$genomic_position <- "1:11111:C,G"
#"7:140439656:C,G"
# dummy genetic profile as a place holder
unnested_df$genetic_profile_id <- unnested_df$tumor_type
# change name of Patient.ID column to case_id
#colnames(unnested_df)[colnames(unnested_df)=="Patient.ID"] <- "case_id"
# SOrf df
mutations_df <- unnested_df %>%
dplyr::select(entrez_gene_id= Entrez_gene_ID
, gene_symbol
, case_id = Patient.ID
, mutation_type
, amino_acid_change
, genetic_profile_id
, tumor_type
, amino_position
, genomic_position
) %>%
dplyr::mutate(amino_position = as.integer(amino_position)) %>%
dplyr::mutate(case_id = as.character(case_id))
# Create list element for CNA for the DataFull argument of the panel (panel@DataFull)
# ----------------------
mutations <- list( data = data.frame(mutations_df)
, Samples = Samples)
# PUT THEM TOGETHER
################################################################################
fusion = list(data = NULL
, Samples = NULL
)
expression = list(data = NULL
, Samples = NULL
)
SHIVA_repo <- list(fusions = fusion
, mutations = mutations
, copynumber = copynumber
, expression = expression
)
Preview the SNV data to be imported
mutations$data %>%
dplyr::group_by(gene_symbol) %>%
dplyr::summarise(count=n()) %>%
ggplot(aes(x=reorder(gene_symbol, count), y=count)) +
geom_bar(stat="identity", position=position_dodge()) +
labs(title="Total SNV (before any filtering) from the SHIVA population") +
coord_flip()
This can be achieved using the build in argument “repo” from the PTD package.
# DESIGN
SHIVA_population <- newCancerPanel(SHIVA_design)
#DOWNLOAD alternative
SHIVA_population <- getAlterations(SHIVA_population
, tumor_type = tumor_types
, repos = SHIVA_repo
)
#SIMULATE
SHIVA_population <- subsetAlterations(SHIVA_population)
#save a backup copy to speed up future analysis
saveRDS(SHIVA_population, file = "version_6.0/Temp/SHIVA_population.rds")
Since we are interested in balancing the TCGA population to make it match with the recruited SHIVA population, we need to estimate how many patients per tumor types will be used.
We can calculate this number “adjusting weight” looking at those patients that have both CNA and SNV data.
Number of patients per tumor type divided by dataset type:
# check pop size from CNA
mya <- purrr::map_int(SHIVA_population@dataFull$copynumber$Samples, length) %>%
data.frame(count = .) %>%
tibble::rownames_to_column("tumor_id") %>%
dplyr::mutate(origin="CNA")
# check pop size from Mutations
myb <- purrr::map_int(SHIVA_population@dataFull$mutations$Samples, length) %>%
data.frame(count = .) %>%
tibble::rownames_to_column("tumor_id") %>%
dplyr::mutate(origin="mutations")
# calculate adjusting weight
# ------------------------------------------------------------------------------
# tumor_weights will be the vector to adjust the TCGA population
tumor_weights <- names(SHIVA_population@dataFull$copynumber$Samples) %>%
purrr::map(function(tumor_name){
mut = SHIVA_population@dataFull$mutations$Samples[[tumor_name]]
cna = SHIVA_population@dataFull$copynumber$Samples[[tumor_name]]
# find intersection of patients
output <- length(mut %in% cna)
names(output) = tumor_name
return(output)
}) %>% flatten_int
# check pop size used internally for adjusting TCGA
myc <- tumor_weights %>%
data.frame(count = .) %>%
tibble::rownames_to_column("tumor_id") %>%
dplyr::mutate(origin="new adjusting weight")
# generate a plot
rbind(mya, myb, myc) %>%
ggplot(aes(tumor_id, count, fill=origin)) +
geom_bar(stat="identity", position= position_dodge()) +
coord_flip() +
ggtitle("number of patients per alteration type from SHIVA.")
Simulate Clinical trial on TCGA population
# DESIGN
TCGA_population <- newCancerPanel(SHIVA_design)
#DOWNLOAD alternative
TCGA_population <- getAlterations(TCGA_population
, tumor_type = tumor_types
, expr_z_score = 1
)
#SUBSET
TCGA_population <- subsetAlterations(TCGA_population)
#save a backup copy to speed up future analysis
saveRDS(TCGA_population, file = "version_6.0/Temp/TCGA_population.rds")
In this section we will apply a filter to remove all the NON pathogenic mutation fetched from cosmic dataset (01-05-2017).
# save the number of entries in the TCGA beofere any filtering
TCGA_original_number <- nrow(unique(TCGA_population@dataFull$mutations$data))
# Clean TCGA popualation running a cosmic filter
cosmic <- readr::read_csv("version_6.0/external_resources/cosmic_pathogenic_SHIVApanel")
cosmic_df <- data.frame(gene_symbol = cosmic$gene
, amino_acid_change = sub("^p." , "" , cosmic$AA.Mutation)
, stringsAsFactors = FALSE)
TCGA_population <- filterMutations(TCGA_population , filtered = cosmic_df , mode = "keep")
32.2
% ** of the TCGA original SNVs.CNA data provided by cBioPortal reports the final step in CNV calls from GISTIC pipeline (CIT). CNVs are reported as either
Values of -2 or 2 are considered alterations, while the rest is considered as normal copies. For research purposes, these criteria are very useful to discover new potential gene copies alterations but a clinical setting requires stronger evidence for a call, especially in case of amplification.
The SHIVA protocol for calling AKT CNVs, for example, requires at least the presence of 5-6 copies to be called altered and possibly targettable.
A similar granularity is not possible with the final GISTIC results but it can be obtained up to a certain degree using the original data provided by the GDAC Firehose (the same used by cBioPortal).
We downloaded GISTIC run 2016_01_28 all_thresholded.by_genes.txt files for 32 tumor types and normalized the segmented value for each patient by their respective thresholds. Each sample of the TCGA cohort has a deletion and amplification call threshold calculated according to estimated ploidy and purity of the original sample. For example, a low purity sample will have lower threshold for calling a CNV compared to a 100% purity sample because of the normal diploid cells contamination. These thresholds represents the points were an amplification (2) or deletion (-2) are called.
This normalization reformats the data as absolute number of copies in the following way:
While data from 1.1 to 2.9 are the same as before (considered normal on default), now 1 is the point of -2 deletion and 3 the point of +2 amplifiaction. We have now the full spectrum of amplification from > 3 that can be used to fine-tune the clinically relevant threshold (4, 5 ,6 copies for example).
# SETTINGS
# ==============================================================================
# DEFINE GENES: define genes we are interested for the CNA analysis
CNA_genes <- TCGA_population@arguments$panel %>%
dplyr::filter(alteration == "CNA") %>%
dplyr::select(gene_symbol) %>%
.[[1]]
# PATH: set global variable with the path to.rds files with CNA raw data
# PATH_2CNA_FILES <- "/Users/ale/IEO/projects/IEO_research/R_package_panel_and_projection/functionfactory/cna_firehose_rds/"
# Set a relative path for reproducibility
PATH_2CNA_FILES <- "version_6.0/Temp/cna_firehose_rds/"
# helper function that reads in a firehose RDS file, filters by gene, and apply the threshold configured
# ==============================================================================
get_cna_long <- function(tumor_type, PATH_2rds, genes_chr, del_thr = 1.1, amp_thr){
# CHECKS
# --------------------------------------------------------------------------------
# check that PATH actually leads to a dir
if(!dir.exists(PATH_2rds)){ stop("path to a dir that cannot be found")}
# Check that all the files are found for each tumor type required in the analysis
input_lgl <- purrr::map_lgl(tumor_types, ~ file.exists(paste0(PATH_2rds, toupper(.) , "_PTD.rds")))
if(!all(input_lgl)){
stop(paste("Files for some of the tumor types where not found:", tumor_types[-which(input_lgl)]))
}
# Checks
if(!is.character(tumor_types) | is.null(tumor_types) | length(tumor_types) ==0){
stop(paste("Tumor type is not confirm", tumor_types))
}
# Define an internal function to do the job
# -------------------------------------------------------------------------------
get_CNA_long_onetumor <- function(one_tumor, PATH_2rds, genes_chr, del_thr = 1.1, amp_thr){
# read in CNA file
# myfile <- readRDS(paste0(PATH_2rds, toupper(one_tumor) , "_PTD.rds"))
# if(!file.exists(myfile)){
# message(myfile)
# return(NULL)
# }
# mat <- tryCatch(readRDS(myfile) , error = function(e) stop(myfile))
mat <- readRDS(paste0(PATH_2rds, toupper(one_tumor) , "_PTD.rds"))
# extract Patients
pats <- colnames(mat)
# Filter for gene of interest
mat <- mat[ genes_chr , ]
# Melt and adjust
output <- reshape2::melt(mat , value.name = "CNAvalue")
colnames(output) <- c("gene_symbol" , "case_id" , "CNAvalue")
output$tumor_type <- one_tumor
# Choose cutoff values
# -----------------------------------------------------------------------------
# With default threshold data are now like this:
# [0 , 1.1) "deletion"
# 1.1 "shallow deletion"
# 2 "normal"
# 2.9 "shallow amplification"
# (2.9 , Inf] "amplification"
output$CNA <- ifelse( output$CNAvalue <del_thr
, "deletion"
, ifelse(output$CNAvalue > amp_thr , "amplification" , "normal"))
# select
output <- output %>%
dplyr::select(gene_symbol , CNA , case_id , tumor_type, CNAvalue) %>%
dplyr::mutate(gene_symbol = as.character(gene_symbol)) %>%
dplyr::mutate(case_id = as.character(case_id))
return(output)
}
# Main body of the function
# -----------------------------------------------------------------------------
cna_long_df <- purrr::map(tumor_types
, ~ get_CNA_long_onetumor(., PATH_2rds, genes_chr, del_thr, amp_thr)
) %>%
dplyr::bind_rows()
return(cna_long_df)
}
# RUN MAIN FUNCTION
# ==============================================================================
cna_long_df <- get_cna_long(tumor_types, PATH_2CNA_FILES, CNA_genes, del_thr = 1.1, amp_thr = 4)
# Create Sample
# ----------------------
# get list of tumors
tumor_chr <- cna_long_df$tumor_type %>%
unique %>%
as.character
# prepare funtion to extract sample names for each tumor
sampleXtumor <- function(x, df){
df %>%
dplyr::select(case_id, tumor_type) %>%
dplyr::filter(tumor_type %in% x) %>% .[["case_id"]] %>%
as.character() %>%
unique()
}
# Extract samples for each tumors
Samples <- purrr::map(tumor_chr, ~ sampleXtumor(., df=cna_long_df))
names(Samples) <- tumor_chr
# Check if there are problems.
testthat::expect_equal(length(unlist(Samples)), length(unique(cna_long_df$case_id)))
# Create list element for CNA for the DataFull argument of the panel (panel@DataFull)
# ==============================================================================
copynumber <- list( data = data.frame(cna_long_df)
, Samples = Samples)
# Preview changes before committing
#TCGA_population@dataFull$copynumber <- copynumber
TCGA_cna_pop_comparison <- purrr::map2(
.x = list( TCGA_population@dataFull$copynumber$Samples, copynumber$Samples )
, .y = list( "TCGA", "Firehose" )
, .f = ~ purrr::map_int(.x, length) %>%
data.frame(n_patients = .) %>%
tibble::rownames_to_column("tumor_type") %>%
dplyr::mutate(source = .y)
) %>%
dplyr::bind_rows() %>%
ggplot(aes(tumor_type, n_patients, fill=source)) +
geom_bar(stat="identity", position= position_dodge()) +
coord_flip() +
ggtitle("number of patients in the CNA datasets")
TCGA_cna_pop_comparison
apply changes to the TCGA population object.
# Apply changes
TCGA_population@dataFull$copynumber <- copynumber
TCGA_population <- subsetAlterations(TCGA_population)
## Subsetting mutations...
## Subsetting copynumber...
In this section we will visualize a comparison between the samples available from the TCGA versus the samples in the SHIVA trial.
purrr::map2(
.x = list( TCGA_population@dataFull$copynumber$Samples, SHIVA_population@dataFull$copynumber$Samples )
, .y = list( "TCGA", "SHIVA" )
, .f = ~ purrr::map_int(.x, length) %>%
data.frame(n_patients = .) %>%
tibble::rownames_to_column("tumor_type") %>%
dplyr::mutate(source = .y)
) %>%
dplyr::bind_rows() %>%
ggplot(aes(reorder(tumor_type, n_patients), n_patients, fill=source)) +
geom_bar(stat="identity", position= position_dodge()) +
scale_fill_manual(values=MYPALETTE(10)[c(10,2)]) +
coord_flip() +
ggtitle("Number of samples available from TCGA and SHIVA group by tumor types")
One common question that is often considered when trying to design custom target enrichment panel, is the percentage of the population that would be “captured” by the design of the panel. The following plot shows the number of alteration (x-axis) found in the sample (patient) population (y-axis). The panel detected fraction can be defined as the number of patients with at least one alteration detectable by the genomic panel. In this simulation we are interested in comparing the detected fraction in the TCGA and SHIVA population. The TCGA population, will be corrected with a weighted bootstrap approach, to match the composition of the SHIVA population during recruitment.
# Run simulation
# ------------------------------------------------------------------------------
N_SIMULATIONS = 200
simul <- mclapply(1:N_SIMULATIONS , function(x) {
cov <- coveragePlot(TCGA_population
, alterationType=c("mutations" , "copynumber")
, collapseByGene=TRUE
, tumor.weights = tumor_weights
, maxNumAlt = 7
, noPlot=TRUE
)
return(cov$plottedTable/cov$Samples[1])
} , mc.cores=4)
## Prepare function for plotting detected fraction with Confidenti intervals
# ------------------------------------------------------------------------------
# Convert to dataframe
simul <- data.frame(matrix(unlist(simul)
, nrow=length(simul)
, byrow=T )
, stringsAsFactors = FALSE)
# GET STATS
# ------------------------------------------------------------------------------
# Calculate for each n° of alteration, the mean
detectionPower <- colMeans(simul)
# Calculate lower CI for bootstrap
leftCI <- apply(simul,2,function(x){left <- quantile(x, 0.025)})
# Calculate upper CI for bootstrap
rightCI <- apply(simul,2,function(x){right <- quantile(x, 0.975)})
# Preapre for column with # alterations
n_alterations <- 1:length(detectionPower)
# Put them all together in a data frame
dtpow <- data.frame(n_alterations, detectionPower, leftCI, rightCI)
# Generate the plot
TCGA_detection_image <- dtpow %>%
ggplot(aes(x=n_alterations, y=detectionPower)) +
geom_bar(stat="identity" , position=position_dodge(), fill=MYPALETTE(10)[2]) +
scale_y_continuous(limits = c(0,1)) +
scale_x_continuous(breaks = n_alterations, trans = "reverse") +
geom_errorbar(aes(ymax=rightCI
, ymin=leftCI)
, position=position_dodge(0.9)
) +
ggtitle("Overall detected fraction in TCGA Population") +
annotate("text"
, x=n_alterations
, y=detectionPower+0.15
, label= sapply(detectionPower*100
, function(x) paste(c(round(x, digits=2), "%")
, collapse=" ")
)
) +
coord_flip()
# Show image
TCGA_detection_image
# Preview table
datatable_withbuttons(dtpow, caption = "detected fraction TCGA")
Preview table with the contribution to the overall detected fraction by only mutation alterations.
# MUTATION CONTRIBUTIONS
# ------------------------------------------------------------------------------
simul <- mclapply(1:N_SIMULATIONS , function(x) {
cov <- coveragePlot(TCGA_population
, alterationType=c("mutations")
, collapseByGene=TRUE
, tumor.weights = tumor_weights
, maxNumAlt = 7
, noPlot=TRUE
)
return(cov$plottedTable/cov$Samples[1])
} , mc.cores=4)
TCGA_detection_muts <- simulate_detectionPower_img(simul)
# Preview image
#TCGA_detection_muts + ggtitle("Mutation only detected fraction in TCGA")
# preview table
simulate_detectionPower_tbl(simul) %>%
datatable_withbuttons(caption = "Mutation only detected fraction in TCGA")
Preview table with the contribution to the overall detected fraction by only CNA alteartions.
# COPYNUMBER CONTRIBUTION
# ------------------------------------------------------------------------------
simul <- mclapply(1:N_SIMULATIONS , function(x) {
cov <- coveragePlot(TCGA_population
, alterationType=c("copynumber")
, collapseByGene=TRUE
, tumor.weights = tumor_weights
, maxNumAlt = 7
, noPlot=TRUE
)
return(cov$plottedTable/cov$Samples[1])
} , mc.cores=4)
TCGA_detection_cna <- simulate_detectionPower_img(simul)
# Preview image
#TCGA_detection_cna + ggtitle("CNA only detected fraction in TCGA")
# Preview table
simulate_detectionPower_tbl(simul) %>%
datatable_withbuttons(caption = "CNA only detected fraction in TCGA")
Preview plot with the contribution to the overall detected fraction by only mutation and CNA alteartions seperately
gridExtra::grid.arrange(TCGA_detection_muts + ggtitle("detected fraction in TCGA: Mutation only")
, TCGA_detection_cna + ggtitle("detected fraction in TCGA: CNA only "))
# calculate shiva detected fraction
shiva_detect <-suppressWarnings(
coveragePlot(SHIVA_population
, alterationType = c("mutations", "copynumber")
, collapseByGene = TRUE
, maxNumAlt = 7
, noPlot= TRUE
)
)
# calculate frequencies
shiva_freq <- as.numeric(shiva_detect$plottedTable/shiva_detect$Samples[1])
# Preapre for column with # alterations
n_alterations <- 1:length(shiva_freq)
# Put them all together in a data frame
dtpow_shiva <- data.frame(n_alterations, shiva_freq= shiva_freq)
# left CI and right CI
dtpow_shiva$leftCI <- purrr::map_dbl(shiva_freq, ~ leftCIprop(., sum(tumor_weights)))
dtpow_shiva$rightCI <- purrr::map_dbl(shiva_freq, ~ rightCIprop(., sum(tumor_weights)))
# Generate the plot
SHIVA_detection_image <- dtpow_shiva %>%
ggplot(aes(x=n_alterations, y=shiva_freq)) +
geom_bar(stat="identity" , position=position_dodge(), fill=MYPALETTE(10)[10]) +
scale_y_continuous(limits = c(0,1)) +
scale_x_continuous(breaks = n_alterations, trans = "reverse") +
geom_errorbar(aes( ymax= leftCI
, ymin= rightCI)
, position=position_dodge(0.9)
) +
ggtitle("Detected fraction SHIVA population") +
annotate("text"
, x=n_alterations
, y=shiva_freq+0.15
, label= sapply(shiva_freq*100
, function(x) paste(c(round(x, digits=2), "%")
, collapse=" ")
)
) +
coord_flip()
#Export as a svg
#img_file = "../Figures/fig4B.svg"
#ggsave(filename=img_file, plot=SHIVA_detection_image, device = "svg")
# Show image
SHIVA_detection_image
# Preview table
datatable_withbuttons(dtpow_shiva)
coveragePlot(SHIVA_population
, alterationType = c("mutations", "copynumber")
, grouping = "alteration_id"
#, noPlot=TRUE
)
# SHIVA mutations ONLY
cov <- coveragePlot(SHIVA_population
, alterationType = c("mutations")
, grouping = "alteration_id"
, noPlot=TRUE
)
cov <- data.frame( cov$plottedTable / cov$Samples) %>%
tidyr::gather(n_alterations, freq)
cov$leftCI <- purrr::map_dbl(cov$freq, ~ leftCIprop(., sum(tumor_weights)))
cov$rightCI <- purrr::map_dbl(cov$freq, ~ rightCIprop(., sum(tumor_weights)))
datatable_withbuttons(cov, caption="Datatable with detected fraction from Mutations for the SHIVA population")
# SHIVA copynumber ONLY
cov <- coveragePlot(SHIVA_population
, alterationType = c("copynumber")
, grouping = "alteration_id"
, noPlot=TRUE
)
cov <- data.frame( cov$plottedTable / cov$Samples) %>%
tidyr::gather(n_alterations, freq)
cov$leftCI <- purrr::map_dbl(cov$freq, ~ leftCIprop(., sum(tumor_weights)))
cov$rightCI <- purrr::map_dbl(cov$freq, ~ rightCIprop(., sum(tumor_weights)))
datatable_withbuttons(cov, caption="Datatable with detected fraction from CNA for the SHIVA population")
fig4a <- gridExtra::grid.arrange(TCGA_detection_image, SHIVA_detection_image)
#Export as a svg
#img_file = "../Figures/fig4A.svg"
#ggsave(filename=img_file, plot=fig4a, device = "svg", width=10, height = 8)
# Preview image
fig4a
## TableGrob (2 x 1) "arrange": 2 grobs
## z cells name grob
## 1 1 (1-1,1-1) arrange gtable[layout]
## 2 2 (2-2,1-1) arrange gtable[layout]
In this section, we will compare each gene individually, for both CNA and Mutations.
###############################################################################
# Fetch SNV data from TCGA population
# -----------------------------------------------------------------------------
tcga_wide <- purrr::map(1:N_SIMULATIONS
, ~ cpFreq(TCGA_population, alteration="mutations", tumor.weights = tumor_weights) %>%
tidyr::spread(gene_symbol, freq)
) %>% bind_rows
# GET STATS
# ------------------------------------------------------------------------------
# Calculate for each n° of alteration, the mean
freq <- colMeans(tcga_wide)
# Calculate lower CI for bootstrap
leftCI <- apply(tcga_wide,2,function(x){left <- quantile(x, 0.025)})
# Calculate upper CI for bootstrap
rightCI <- apply(tcga_wide,2,function(x){right <- quantile(x, 0.975)})
# Put them all together in a data frame
tcga_snv <- data.frame(freq, leftCI, rightCI, population="TCGA") %>%
tibble::rownames_to_column("gene")
###############################################################################
# Fetch CNA data from SHIVA population
# -----------------------------------------------------------------------------
temp <- cpFreq(SHIVA_population, alteration="mutations")
temp <- temp[which(temp[["gene_symbol"]] %in% SHIVA_population@arguments$panel$gene_symbol),]
# get CI
leftCI <- purrr::map_dbl(temp$freq, ~ leftCIprop(., sum(tumor_weights)))
rightCI <- purrr::map_dbl(temp$freq, ~ rightCIprop(., sum(tumor_weights)))
# get DF
shiva_snv <- temp %>%
dplyr::rename(gene = gene_symbol) %>%
dplyr::mutate(leftCI=leftCI
, rightCI=rightCI
) %>%
dplyr::mutate(population = "SHIVA")
###############################################################################
# put the plots together
# -----------------------------------------------------------------------------
# Put them together
merged_snv <- rbind(tcga_snv, shiva_snv)
# bind SHIVA and TCGA
fig4b <- merged_snv %>%
dplyr::mutate(gene= factor(gene, levels = unique(gene[order(freq)]))) %>%
ggplot(aes(x=gene, y=freq, fill=population)) +
geom_bar(stat="identity", position=position_dodge()) +
scale_fill_manual(values=MYPALETTE(10)[c(2,10)]) +
#scale_y_continuous(limits = c(0,.7)) +
geom_errorbar(aes(ymax=rightCI
, ymin=leftCI)
, position=position_dodge(0.9)
) +
ggtitle("SNV frequency comparison in the two populations") +
coord_flip()
#img_file = "../Figures/fig4B.svg"
#ggsave(filename=img_file, plot=fig4b, device = "svg", width=10, height = 8)
# Preview
fig4b
# Fetch CNA data from TCGA population
# ------------------------------------------------------------------------------
simul_amplifications <- lapply(1:50, function(x) {
cpFreq(TCGA_population, alteration="copynumber", tumor.weights = tumor_weights) %>%
dplyr::select(amplification) %>%
tibble::rownames_to_column("gene") %>%
tidyr::spread(gene, amplification)
})
simul_amplifications <- simul_amplifications %>% bind_rows()
simul_deletions <- lapply(1:50, function(x) {
cpFreq(TCGA_population, alteration="copynumber", tumor.weights = tumor_weights) %>%
dplyr::select(deletion) %>%
tibble::rownames_to_column("gene") %>%
tidyr::spread(gene, deletion)
}) %>% bind_rows()
# GET STATS
# ------------------------------------------------------------------------------
TCGA_amplifications_df <- get_bootstrap_stats(simul_amplifications) %>%
tibble::rownames_to_column("gene") %>%
dplyr::mutate(population = "TCGA")
TCGA_deletion_df <- get_bootstrap_stats(simul_deletions) %>%
tibble::rownames_to_column("gene") %>%
dplyr::mutate(population = "TCGA")
# Fetch CNA data from SHIVA population
# ------------------------------------------------------------------------------
# Amplifications
# ------------
shiva_cna_amplification <- cpFreq(SHIVA_population, alteration="copynumber") %>%
tibble::rownames_to_column("gene") %>%
dplyr::select(-deletion, -normal) %>%
dplyr::rename(freq = amplification) %>%
dplyr::mutate(population = "SHIVA")
# Calculate left CI and right CI
shiva_cna_amplification$leftCI <- purrr::map_dbl(shiva_cna_amplification$freq, ~ leftCIprop(., sum(tumor_weights)))
shiva_cna_amplification$rightCI <- purrr::map_dbl(shiva_cna_amplification$freq, ~ rightCIprop(., sum(tumor_weights)))
# last formattings
shiva_cna_amplification <- shiva_cna_amplification %>%
dplyr::mutate(row_counter = 1:nrow(shiva_cna_amplification)) %>%
dplyr::select(gene, row_counter, freq, leftCI, rightCI, population)
# Deletion
# ------------
shiva_cna_deletion <- cpFreq(SHIVA_population, alteration="copynumber") %>%
tibble::rownames_to_column("gene") %>%
dplyr::select(-amplification, -normal) %>%
dplyr::rename(freq = deletion) %>%
dplyr::mutate(population = "SHIVA")
# Calculate left CI and right CI
shiva_cna_deletion$leftCI <- purrr::map_dbl(shiva_cna_deletion$freq, ~ leftCIprop(., sum(tumor_weights)))
shiva_cna_deletion$rightCI <- purrr::map_dbl(shiva_cna_deletion$freq, ~ rightCIprop(., sum(tumor_weights)))
# last formattings
shiva_cna_deletion <- shiva_cna_deletion %>%
dplyr::mutate(row_counter = 1:nrow(shiva_cna_deletion)) %>%
dplyr::select(gene, row_counter, freq, leftCI, rightCI, population)
# Prepare filter
# Removed genes not considered in the panel for amplifications or deletions
# ------------
amplification_filter <- SHIVA_design %>%
dplyr::filter(exact_alteration == "amplification") %>%
dplyr::select(gene_symbol) %>%
.[[1]] %>%
as.character()
deletion_filter <- SHIVA_design %>%
dplyr::filter(exact_alteration == "deletion") %>%
dplyr::select(gene_symbol) %>%
.[[1]] %>%
as.character()
# Merge datasets
# ------------
merged_cna_amplifications <- rbind(TCGA_amplifications_df, shiva_cna_amplification) %>%
dplyr::filter(gene %in% amplification_filter)
merged_cna_deletion <- rbind(TCGA_deletion_df, shiva_cna_deletion) %>%
dplyr::filter(gene %in% deletion_filter)
# bind SHIVA and TCGA | Amplification
# -----
CNA_plot_amplification <- merged_cna_amplifications %>%
dplyr::mutate(gene= factor(gene, levels = unique(gene[order(freq)]))) %>%
ggplot(aes(x=gene, y=freq, fill=population)) +
geom_bar(stat="identity", position=position_dodge()) +
scale_fill_manual(values=MYPALETTE(10)[c(10,2)]) +
#scale_y_continuous(limits = c(0,.7)) +
geom_errorbar(aes(ymax=rightCI
, ymin=leftCI)
, position=position_dodge(0.9)
) +
ggtitle("CNA Amplification frequency comparison in the two populations") +
coord_flip()
CNA_plot_deletion <- merged_cna_deletion %>%
dplyr::mutate(gene= factor(gene, levels = unique(gene[order(freq)]))) %>%
ggplot(aes(x=gene, y=freq, fill=population)) +
geom_bar(stat="identity", position=position_dodge()) +
scale_fill_manual(values=MYPALETTE(10)[c(10,2)]) +
#scale_y_continuous(limits = c(0,.7)) +
geom_errorbar(aes(ymax=rightCI
, ymin=leftCI)
, position=position_dodge(0.9)
) +
ggtitle("CNA Deletion frequency comparison in the two populations") +
coord_flip()
# Make the plot
fig4c <- gridExtra::grid.arrange(CNA_plot_amplification, CNA_plot_deletion, ncol=1, heights = c(20,5))
#Export as a svg
#img_file = "../Figures/fig4C.svg"
#ggsave(filename=img_file, plot=fig4c, device = "svg", width=10, height = 8)
# preview plot
fig4c
## TableGrob (2 x 1) "arrange": 2 grobs
## z cells name grob
## 1 1 (1-1,1-1) arrange gtable[layout]
## 2 2 (2-2,1-1) arrange gtable[layout]
In this part of the analysis we are comparing the patient allocation to the two molecular pathway threatments.
Show the maximum patient allocation to the molecular pathway threatmet. This is theoretical as, it presumes that one patient can be allocated to more than one treatment. There is infact a small percentage of patients with a set of alteration that makes them eligeble for both molecular treatmets.
# ##############################################################################
# Shiva population
# ##############################################################################
shiva_pathway <- coveragePlot(SHIVA_population
, alterationType = c("mutations", "copynumber")
, collapseByGene = TRUE
, grouping = "group"
, maxNumAlt = 1
, noPlot = TRUE
)
# calcualte frequency
shiva_pathway <- data.frame(shiva_pathway$plottedTable /shiva_pathway$Samples) %>%
tibble::rownames_to_column("group") %>%
dplyr::rename(allocation =X1)
# Calculate left CI and right CI
shiva_pathway$leftCI <- purrr::map_dbl(shiva_pathway$allocation, ~ leftCIprop(., sum(tumor_weights)))
shiva_pathway$rightCI <- purrr::map_dbl(shiva_pathway$allocation, ~ rightCIprop(., sum(tumor_weights)))
# transform table
shiva_allocation <- shiva_pathway %>%
dplyr::mutate(row_counter = 1:n(), population = "SHIVA") %>%
dplyr::select(group, row_counter, allocation, leftCI, rightCI, population)
# pepare plot
# shiva_allocation_img <- shiva_allocation %>%
# ggplot(aes(x=row_counter, y=allocation, fill=group)) +
# geom_bar(stat="identity" , position=position_dodge()) +
# ggtitle("Shiva population pathway allocaiton frequency")
# Preview
#shiva_allocation_img
# ##############################################################################
# TCGA population
# ##############################################################################
simul <- mclapply(1:N_SIMULATIONS , function(x) {
# calculate allocated patients
cov <- coveragePlot(TCGA_population
, alterationType = c("mutations", "copynumber")
, collapseByGene = TRUE
, grouping = "group"
, maxNumAlt = 1
, tumor.weights = tumor_weights #tumor_weights
, noPlot=TRUE
)
# get frequency
as.data.frame(cov$plottedTable) %>%
tibble::rownames_to_column("group") %>%
dplyr::select(group, allocation= `1`) %>%
dplyr::mutate(allocation = allocation /cov$Samples[1]) %>%
tidyr::spread(group, allocation) %>%
return()
} , mc.cores=4) %>% bind_rows
# GET STATS
# ------------------------------------------------------------------------------
# Calculate for each n° of alteration, the mean
allocation <- apply(simul,2,mean)
# Calculate lower CI for bootstrap
leftCI <- apply(simul,2,function(x){left <- quantile(x, 0.025)})
# Calculate upper CI for bootstrap
rightCI <- apply(simul,2,function(x){right <- quantile(x, 0.975)})
# Preapre for column with the counter
row_counter <- 1:length(allocation)
# Put them all together in a data frame
tcga_allocation <- data.frame(row_counter, allocation, leftCI, rightCI, population="TCGA") %>%
tibble::rownames_to_column("group")
# Generate the Plot
# ------------------------------------------------------------------------------
# Generate the plot
# tcga_allocation_img <- tcga_allocation %>%
# ggplot(aes(x=row_counter, y=allocation, fill=group)) +
# geom_bar(stat="identity" , position=position_dodge()) +
# scale_y_continuous(limits = c(0,1)) +
# scale_x_continuous(breaks = row_counter, trans = "reverse") +
# geom_errorbar(aes(ymax=rightCI
# , ymin=leftCI)
# , position=position_dodge(0.9)
# ) +
# ggtitle("Patient Allocation to molecular pathway threatment") +
# annotate("text"
# , x=row_counter
# , y=allocation+0.15
# , label= sapply(allocation*100
# , function(x) paste(c(round(x, digits=2), "%")
# , collapse=" ")
# )
# ) +
# coord_flip()
# Preview
#tcga_allocation_img
# ##############################################################################
# PLOT THEM TOGETHER
# ##############################################################################
merged_allocation <- rbind(shiva_allocation, tcga_allocation) %>%
dplyr::mutate(row_counter = 1:n())
fig4D <- merged_allocation %>%
ggplot(aes(x=group, y=allocation, fill=population)) +
geom_bar(stat="identity" , position=position_dodge()) +
scale_fill_manual(values=MYPALETTE(10)[c(10,2)]) +
scale_y_continuous(limits = c(0,.5)) +
ylab(label = "allocation frequency") +
facet_grid(. ~ group, space="free", scales="free") +
#scale_x_discrete(limits = merged_allocation$group) +
geom_errorbar(aes(ymax=rightCI
, ymin=leftCI)
, position="dodge"
) +
ggtitle("Patient Allocation to molecular pathway treatment, considering all the patients") +
geom_text(aes(label = sapply(allocation*100
, function(x) paste(c(round(x, digits=2), "%")
, collapse=" ")
)
) , position = position_dodge(width = 1)
, vjust = -6
) +
theme(axis.text.x=element_blank()
, axis.ticks.x= element_blank()
, axis.title.x = element_blank()
)
#Export as a svg
#img_file = "../Figures/fig4D.svg"
#ggsave(filename=img_file, plot=fig4D, device = "svg")
# Preview image
fig4D
datatable_withbuttons(merged_allocation)
In case of a patient eligible for both molecular pathways treatment, the SHIVA medical board would decide in with cohord to allocate the patient. To allow a better comparison, the following plot separates the patients eligible to both treatments. The y-axes shows the tot number of patients that can be allocated in a molecular pathway treatmet.
# ALLOCATION FUNCTION with not predefined rule
# ------------------
# Create function to perform the following tasks:
# 1. extract a weighted (according to SHIVA population) sample from the population
# 2. Simulate patient allocation according the defined schema
allocate_patients_fn <- function(population, weights=FALSE){
#extract patient data from the panel
if(weights){
patient_lt <- dataExtractor(population
, alterationType = c("copynumber", "mutations")
, collapseByGene = TRUE
, tumor.weights = weights
)
} else {
patient_lt <- dataExtractor(population
, alterationType = c("copynumber", "mutations")
, collapseByGene = TRUE
)
}
#select general dataframe
patient_df <- patient_lt$data
# replace / with _
patient_df[,2] <- gsub("/", "_", patient_df[,2])
# generate a frequency table that indicates, for each patient, elibibility to group allocation, as well as tumor_type
# one patient per row.
patient_allocation_df <- patient_df %>%
dplyr::select(group, case_id, tumor_type) %>%
dplyr::mutate(found = TRUE) %>%
unique() %>%
tidyr::spread(group, found)
# SIMULATE PATIENTS ALLOCATION ACCORDING TO A SPECIFIC ORDER
# ----------------------------------------
# select ONLY PIK3_AKT_mTOR compatible patients
PIK3_AKT_mTOR_assigned <- patient_allocation_df %>%
dplyr::filter(is.na(RAF_MEK)) %>%
dplyr::filter(PIK3_AKT_mTOR == TRUE)
# select ONLY RAF_MEK compatible patients
RAF_MEK_assigned <- patient_allocation_df %>%
dplyr::filter(is.na(PIK3_AKT_mTOR)) %>%
dplyr::filter(RAF_MEK == TRUE)
# Select patients that would be allocated to either
eligible_both <- patient_allocation_df %>%
dplyr::filter(RAF_MEK == TRUE) %>%
dplyr::filter(PIK3_AKT_mTOR == TRUE)
# freq of patient allocation in each treatment group
df <- data.frame(PIK3_AKT_mTOR = nrow(PIK3_AKT_mTOR_assigned) / nrow(patient_allocation_df)
, RAF_MEK = nrow(RAF_MEK_assigned) / nrow(patient_allocation_df)
, eligible_both = nrow(eligible_both) / nrow(patient_allocation_df)
)
return(df)
}
# TCGA
# ------------------------------------------------------------------------------
bootstrapping_df <- replicate(500
, allocate_patients_fn(TCGA_population, tumor_weights)
, simplify=FALSE
) %>% do.call("rbind", .)
# calculae the mean, sd and CI for each treatment group
bootstrappingMeans <- colMeans(bootstrapping_df)
lowerCI <- apply( bootstrapping_df , 2 , function(x) quantile(x, 0.025))
upperCI <- apply( bootstrapping_df , 2 , function(x) quantile(x, 0.975))
# Prepare dataframe for plotting
# ----------------------------
# covert to dataframe
tcga_df <- data.frame(mean=bootstrappingMeans
, lowerSD = NA
, upperSD = NA
, lowerCI = lowerCI
, upperCI = upperCI
) %>%
tibble::rownames_to_column("group_treatment") %>%
dplyr::arrange(mean) %>%
dplyr::mutate(dataset="TCGA")
# SHIVA
# ------------------------------------------------------------------------------
shiva_df <- allocate_patients_fn(SHIVA_population) %>%
tidyr::gather(group_treatment, mean) %>%
dplyr::mutate(lowerSD = NA
, upperSD = NA
, lowerCI = leftCIprop(mean, sum(tumor_weights))
, upperCI = rightCIprop(mean, sum(tumor_weights))
, dataset = "SHIVA"
)
# merge dataframes together
merge_df <- rbind(shiva_df, tcga_df) %>%
dplyr::rename(freq = mean)
# ----------------------------
# PLOT2 - Same side barplot
# barplot, with only the genes in the panel
fig4E <- merge_df %>%
dplyr::mutate(origin=paste(as.character(group_treatment), as.character(dataset), sep=":")) %>%
dplyr::mutate(row_counter = as.integer(factor(dataset, labels = c(0,1)))) %>%
ggplot(aes(x=origin, y=freq, fill=dataset)) +
geom_bar(stat="identity") +
scale_fill_manual(values=MYPALETTE(10)[c(10,2)]) +
geom_errorbar(aes(min=lowerCI, max=upperCI)) +
facet_grid(. ~ group_treatment, space="free", scales="free") +
theme(axis.text.x=element_blank()
, axis.ticks.x= element_blank()
, axis.title.x = element_blank()
) +
labs(title="Comparison between expected allocation to molecular pathways")
fig4E <- fig4E + geom_text(aes(x=(row_counter)
, y=upperCI+.05
, label= sapply(freq*100
, function(x) paste(c(round(x, digits=2), "%")
, collapse=" ")
)
)
)
#Export as a svg
#img_file = "../Figures/fig4E.svg"
#ggsave(filename=img_file, plot=fig4E, device = "svg", width=10, height = 8)
# preview image
fig4E
merge_df %>%
dplyr::select(-lowerSD, -upperSD) %>%
datatable_withbuttons
################################################################################
# ADD plot by Drug
################################################################################
# get the numer of samples used for the next computation
n_samples <- coveragePlot(TCGA_population
, alterationType=c("mutations" , "copynumber")
, collapseByGene=TRUE
, tumor.weights = tumor_weights
, maxNumAlt = 1
, noPlot=TRUE
, grouping = "drug"
)$Sample[1]
# TCGA
# ------------------------------------------------------------------------------
bootstrapping_df <- replicate(500
, coveragePlot(TCGA_population
, alterationType=c("mutations" , "copynumber")
, collapseByGene=TRUE
, tumor.weights = tumor_weights
, maxNumAlt = 1
, noPlot=TRUE
, grouping = "drug"
) %>%
.$plottedTable %>%
data.frame %>%
tibble::rownames_to_column("drug") %>%
dplyr::rename(count=X1) %>%
dplyr::mutate(freq = count/n_samples) %>%
dplyr::select(-count) %>%
dplyr::mutate(drug= gsub( "drug: ", "", drug) ) %>%
tidyr::spread(drug, freq) %>%
dplyr::mutate(Sorafenib = if (exists('Sorafenib', where = .)) Sorafenib else 0) %>%
dplyr::mutate(Dasatinib = if (exists('Dasatinib', where = .)) Dasatinib else 0)
, simplify=FALSE
) %>% do.call("rbind", .)
# calculae the mean, sd and CI for each treatment group
bootstrappingMeans <- colMeans(bootstrapping_df)
lowerCI <- apply( bootstrapping_df , 2 , function(x) quantile(x, 0.025))
upperCI <- apply( bootstrapping_df , 2 , function(x) quantile(x, 0.975))
# Prepare dataframe for plotting
# ----------------------------
# covert to dataframe
tcga_df <- data.frame(mean=bootstrappingMeans
, lowerSD = NA
, upperSD = NA
, lowerCI = lowerCI
, upperCI = upperCI
) %>%
tibble::rownames_to_column("group_treatment") %>%
dplyr::arrange(mean) %>%
dplyr::mutate(dataset="TCGA")
# SHIVA
# ------------------------------------------------------------------------------
shiva_df <- coveragePlot(SHIVA_population
, alterationType=c("mutations" , "copynumber")
, collapseByGene=TRUE
, tumor.weights = tumor_weights
, maxNumAlt = 1
, noPlot=TRUE
, grouping = "drug"
) %>% .$plottedTable %>%
data.frame %>%
tibble::rownames_to_column("drug") %>%
dplyr::rename(count=X1) %>%
dplyr::mutate(freq = count/n_samples) %>%
dplyr::select(-count) %>%
dplyr::mutate(drug= gsub( "drug: ", "", drug) ) %>%
tidyr::spread(drug, freq) %>%
dplyr::mutate(Sorafenib = if (exists('Sorafenib', where = .)) Sorafenib else 0) %>%
dplyr::mutate(Dasatinib = if (exists('Dasatinib', where = .)) Dasatinib else 0) %>%
tidyr::gather(group_treatment, mean) %>%
dplyr::mutate(lowerSD = NA
, upperSD = NA
, lowerCI = map_dbl(mean, ~ leftCIprop(., sum(tumor_weights)))
, upperCI = map_dbl(mean, ~ rightCIprop(., sum(tumor_weights)))
, dataset = "SHIVA"
)
# merge dataframes together
merge_df <- rbind(shiva_df, tcga_df) %>%
dplyr::rename(freq = mean) %>%
mutate_cond(group_treatment %in% "Lapatinib+Trastuzumab", group_treatment = "Lapatinib+\nTrastuzumab")
# ----------------------------
# PLOT2 - Same side barplot
# barplot, with only the genes in the panel
fig4F <- merge_df %>%
dplyr::mutate(origin=paste(as.character(group_treatment), as.character(dataset), sep=":")) %>%
dplyr::mutate(row_counter = as.integer(factor(dataset, labels = c(0,1)))) %>%
ggplot(aes(x=origin, y=freq, fill=dataset)) +
geom_bar(stat="identity") +
scale_fill_manual(values=MYPALETTE(10)[c(10,2)]) +
geom_errorbar(aes(min=lowerCI, max=upperCI)) +
facet_grid(. ~ group_treatment, space="free", scales="free") +
theme(axis.text.x=element_blank()
, axis.ticks.x= element_blank()
, axis.title.x = element_blank()
) +
labs(title="Comparison between expected allocation to molecular pathways")
# +
# coord_flip()
#Export as a svg
#img_file = "../Figures/fig4F.svg"
#ggsave(filename=img_file, plot=fig4F, device = "svg")
# Preview
fig4F
# preview table
merge_df %>%
dplyr::select(-lowerSD, -upperSD) %>%
datatable_withbuttons()
In this section we will calculate the sample size required for the SHIVA trial and our TCGA approximation. We first compare the sample size calculated on both populations on a full design with no granularity on the drug target. Basically a design to answer the question “target therapy VS standard of care”
# Shiva full design
survPowerSampleSize(SHIVA_population
, alterationType = c("mutations", "copynumber")
, HR = 0.625
, power = seq(0.6 , 0.9 , 0.1)
, alpha = 0.05
, case.fraction = 0.5
#, tumor.freqs = tumor_freq
) + xlim(c(200,1000))
survPowerSampleSize(TCGA_population
, alterationType = c("mutations", "copynumber")
, HR = 0.625
, power = seq(0.6 , 0.9 , 0.1)
, alpha = 0.05
, case.fraction = 0.5
, tumor.freqs = tumor_freq
) + xlim(c(200,1000))
tcgaFullDesign <- survPowerSampleSize(TCGA_population
, alterationType = c("mutations", "copynumber")
, HR = 0.625
, power = seq(0.6 , 0.9 , 0.1)
, alpha = 0.05
, case.fraction = 0.5
, tumor.freqs = tumor_freq
, noPlot = TRUE)[ , c("EligibleSampleSize" , "ScreeningSampleSize")]
shivaFullDesign <- survPowerSampleSize(SHIVA_population
, alterationType = c("mutations", "copynumber")
, HR = 0.625
, power = seq(0.6 , 0.9 , 0.1)
, alpha = 0.05
, case.fraction = 0.5
#, tumor.freqs = tumor_freq
, noPlot = TRUE)[ , c("EligibleSampleSize" , "ScreeningSampleSize")]
fullDesignComparison <- data.frame(ScreeningSampleSize_TCGA = tcgaFullDesign$ScreeningSampleSize
, ScreeningSampleSize_SHIVA = shivaFullDesign$ScreeningSampleSize
, EligibleSampleSize = shivaFullDesign$EligibleSampleSize
, Beta = 1 - seq(0.6 , 0.9 , 0.1)
, Power = seq(0.6 , 0.9 , 0.1)
, HR = 0.625)
datatable_withbuttons(fullDesignComparison)
The difference is very small. Just 2
patients at a 80% power.
Now we want to calculate sample size by each drug separately. This in not an optimal design since it relies on the assumption of indipendent studies. From this point on we will use the same value used in the design of the SHIVA trial:
From the original publication:
The objective of the study was to detect a difference in progression-free survival between the treatment groups. Expected 6-month progression-free survival in the control group was 15%. We postulated that the experimental group would have 40% longer progression-free survival than that of the control group (hazard ratio [HR] 0·625). A total of 142 events was needed to detect a statistically significant difference with a type I error rate of 5% and a power of 80% in a two-sided setting. To be able to see these events, we planned to include a total of 200 patients.
shivaByDrug <- survPowerSampleSize(SHIVA_population
, alterationType = c("mutations", "copynumber")
, HR = 0.625
, power = 0.8
, alpha = 0.05
, case.fraction = 0.5
, ber=0.85
, fu=1.883
#, tumor.freqs = tumor_freq
, var = "group"
, noPlot=TRUE)
datatable_withbuttons(shivaByDrug)
tcgaByDrug <- survPowerSampleSize(TCGA_population
, alterationType = c("mutations", "copynumber")
, HR = 0.625
, power = 0.8
, alpha = 0.05
, case.fraction = 0.5
, ber=0.85
, fu=1.883
, tumor.freqs = tumor_freq
, var = "group"
, noPlot=TRUE)
datatable_withbuttons(tcgaByDrug)
With our tool, we are also able to optimize the number of patients needed to be screened in order to assure a minimum of 80% power to all the arms of the study. We call this option priority trial and is basically a cascade design that tries to estimate a total sample size at screening that is sufficient to cover all the arms of the study with the desired power.
We run the TCGA population first.
tcgaSimul <- mclapply(1:100 , function(x) {
survPowerSampleSize(TCGA_population
, alterationType = c("mutations", "copynumber")
, HR = 0.625
, power = 0.8
, alpha = 0.05
, ber=0.85
, fu=1.883
, case.fraction = 0.5
, collapseByGene=TRUE
, tumor.weights = tumor_weights
, var = "group"
, noPlot=TRUE
, priority.trial=c("PIK3/AKT/mTOR", "RAF/MEK")
, priority.trial.order="optimal")$'HR:0.625 | HR0:1 | Power:0.8'$Summary
} , mc.cores=4)
summaryMat <- tcgaSimul[[1]]
for(i in rownames(summaryMat)){
for(j in colnames(summaryMat)){
vec <- sapply(tcgaSimul , function(x) x[i , j])
summaryMat[i , j] <- paste(mean(vec) ,
paste0( "(" , quantile(vec , 0.025)
, "-"
, quantile(vec , 0.975) , ")"))
}
}
knitr::kable(summaryMat)
RAF/MEK | PIK3/AKT/mTOR | Total | |
---|---|---|---|
Screened | 1507.05 (1182.15-1939.15) | 0 (0-0) | 1507.05 (1182.15-1939.15) |
Eligible | 199 (199-199) | 296.51 (214.95-413.675) | 495.51 (413.95-612.675) |
Not.Eligible | 1012.53 (760.175-1369.525) | 0 (0-0) | 1012.53 (760.175-1369.525) |
The table above represents our cascade design starting from upper left. The first row is the recruitment row: number of new patients that need to be sequenced to obtain the required power. The left and right Confidence Intervals from the bootstrap simuation are indicated in parentesis. Starting by recruiting 1507.05 (1182.15-1939.15) for the most rare drug category (RAF/MEK) we will obtain on average 199 (199-199) eligible patients and discarding 1012.53 (760.175-1369.525) (the former are the patients not eligible to any molecular drug treatments). The non eligible patients of RAF/MEK might be eligible for other drug categories, so we calculate whether within these patients, there is an enough number of eligible patients to reach the required power. In case the number is sufficient, the algorithm indicates how many more patients are necessary to recruit. Based on our calculations no further patientes are necessary.
Finally, the last column indicates the total number of patients screened, eligible, and non eligible.
Starting from the most rare drug, we assure the minimum number of recruited patients that guarantee at least 80% power for each drug and the whole study.
With the SHIVA population we are going to check if the value obtained is compatible with the confidence interval calculated on TCGA data.
shivaSimul <- survPowerSampleSize(SHIVA_population
, alterationType = c("mutations", "copynumber")
, HR = 0.625
, power = 0.8
, alpha = 0.05
, ber=0.85
, fu=1.883
, case.fraction = 0.5
, collapseByGene=TRUE
#, tumor.weights = tumor_weights
, var = "group"
, noPlot=TRUE
, priority.trial=c("PIK3/AKT/mTOR", "RAF/MEK")
, priority.trial.order="optimal")$'HR:0.625 | HR0:1 | Power:0.8'$Summary
knitr::kable(shivaSimul)
RAF/MEK | PIK3/AKT/mTOR | Total | |
---|---|---|---|
Screened | 1277 | 0 | 1277 |
Eligible | 199 | 222 | 421 |
Not.Eligible | 857 | 0 | 857 |
The number of recruited patients is compatible.
We analyze now retrospectively the power of the SHIVA study under our framework by using the same number of patients reported in the publication. From text and Table 1 of the SHIVA publication:
In the design of a target therapy VS standard of care trial, we can simply check what is the resulting power of plugging-in 496 in our survPowerSampleSize
function:
phPower <- survPowerSampleSize(SHIVA_population
, alterationType = c("mutations", "copynumber")
, HR = 0.625
, power = NULL
, sample.size = 496
, alpha = 0.05
, ber=0.85
, fu=1.883
, case.fraction = 0.5
, noPlot=TRUE)
knitr::kable(phPower)
Var | ScreeningSampleSize | EligibleSampleSize | Beta | Power | HazardRatio |
---|---|---|---|---|---|
Full Design | 496 | 164 | 0.2792024 | 0.7207976 | 0.625 |
A cohort of 496 individuals can guarantee a power of 0.72 with an eligible sample size of 164. The reason for the discrepancy with the eligible size of the original SHIVA study is the lack of the entire Hormone arm that decreases the detected fraction in this analysis. The entire study reaches this power but do not guarantees that each single arm has the same power level.
As mentioned before, the option priority.trial allows us to calculate an optimal sample size so that each arm reaches at least the desired power. Unfortunately, the option is not available yet to calculate post-hoc power given a sample size. We therefore try to calculate the post-hoc power by optimization. The idea is to calculate various sample sizes for different power levels to get a value equal or close enough to our 496.
First, create a function to optimize. This function takes a power level as input and returns the difference between the calculated sample size and 496 in absolute value.
powerOptimizer <- function(mypower){
simul <- suppressWarnings(suppressMessages(
survPowerSampleSize(SHIVA_population
, alterationType = c("mutations", "copynumber")
, HR = 0.625
, power = mypower # only moving parameter, the rest is constant
, alpha = 0.05
, ber=0.85
, fu=1.883
, case.fraction = 0.5
, collapseByGene=TRUE
#, tumor.weights = tumor_weights
, var = "group"
, noPlot=TRUE
, priority.trial=c("PIK3/AKT/mTOR", "RAF/MEK")
, priority.trial.order="optimal")))
# Return only the calculated Total Screened sample size
out <- simul[[1]][1]$Summary["Screened" , "Total"]
# Actual value to minimize is the absolute difference (we want to get close to 0)
return( abs(496-out))
}
Let’s try some power values to see how this function behaves.
mypowers <- seq(0.1 , 0.8 , 0.05)
mySampSizeDiff <- sapply(mypowers , powerOptimizer)
plot(mypowers , mySampSizeDiff , pch=20 , col="black")
lines(mypowers , mySampSizeDiff , col="red")
It’s pretty clear that we have a minimum at 0.4 when the sample size calculated is equal to 517. This function is clearly convex in the interval (0-1) of the possible power levels, so the post-hoc power resides between 0.35 and 0.45 (according to our plot points).
For a more robust optimization, we are going to use a linear optimizer. To facilitate the process, we will inizialize a value of 0.4 and lower and upper bounds of 0.35 and 0.45 (as calculated by our graphical method)
Since we are dealing with a one-dimentional problem with box constrains, we are going to use the Brent method.
myPHPoptim <- optim(par = 0.4 , fn = powerOptimizer
, lower = 0.35 , upper = 0.45 , method="Brent")
myPHPoptim
## $par
## [1] 0.4105573
##
## $value
## [1] 2
##
## $counts
## function gradient
## NA NA
##
## $convergence
## [1] 0
##
## $message
## NULL
The value of the slot convergence is telling us that the method worked. The actual power that can be reached by a 496 individuals study is just 41.0557295 for each single arm and it’s the reason why the SHIVA trial probably failed at demonstrating a difference between genomic therapy and standard of care.
By plugging in our value, we can see that:
plugInSampleSize <- survPowerSampleSize(SHIVA_population
, alterationType = c("mutations", "copynumber")
, HR = 0.625
, power = myPHPoptim$par
, alpha = 0.05
, ber=0.85
, fu=1.883
, case.fraction = 0.5
, collapseByGene=TRUE
#, tumor.weights = tumor_weights
, var = "group"
, noPlot=TRUE
, priority.trial=c("PIK3/AKT/mTOR", "RAF/MEK")
, priority.trial.order="optimal")[[1]][1]$Summary
##
## Minimum Eligible Sample Size to reach 41% of power with an HR equal to 0.625 and a HR0 equal to 1 for each group is equal to: 77
knitr::kable(plugInSampleSize)
RAF/MEK | PIK3/AKT/mTOR | Total | |
---|---|---|---|
Screened | 494 | 0 | 494 |
Eligible | 77 | 86 | 163 |
Not.Eligible | 332 | 0 | 332 |
So with an approximated screened sample size of 494 we obtain 163 eligible patients divided in 77 for RAF/MEK pathway arm and 86 for PIK3/AKT/mTOR pathway arm.
For reproducibility purpose:
sessionInfo()
## R version 3.5.0 Patched (2018-06-01 r74839)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.3
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] stringr_1.3.1 org.Hs.eg.db_3.6.0
## [3] AnnotationDbi_1.42.1 VariantAnnotation_1.26.0
## [5] Rsamtools_1.32.0 Biostrings_2.48.0
## [7] XVector_0.20.0 SummarizedExperiment_1.10.1
## [9] DelayedArray_0.6.0 BiocParallel_1.14.1
## [11] matrixStats_0.53.1 Biobase_2.40.0
## [13] GenomicRanges_1.32.3 GenomeInfoDb_1.16.0
## [15] IRanges_2.14.10 S4Vectors_0.18.3
## [17] BiocGenerics_0.26.0 RColorBrewer_1.1-2
## [19] dplyr_0.7.5 magrittr_1.5
## [21] purrr_0.2.5 tibble_1.4.2
## [23] gdtools_0.1.7 svglite_1.2.1
## [25] bindrcpp_0.2.2 readxl_1.1.0
## [27] ggplot2_2.2.1 DT_0.4
## [29] knitr_1.20 PrecisionTrialDrawer_0.99.0
## [31] BiocStyle_2.8.2
##
## loaded via a namespace (and not attached):
## [1] bitops_1.0-6 bit64_0.9-7
## [3] progress_1.1.2 httr_1.3.1
## [5] rprojroot_1.3-2 tools_3.5.0
## [7] backports_1.1.2 R6_2.2.2
## [9] DBI_1.0.0 lazyeval_0.2.1
## [11] colorspace_1.3-2 gridExtra_2.3
## [13] tidyselect_0.2.4 prettyunits_1.0.2
## [15] bit_1.1-14 curl_3.2
## [17] compiler_3.5.0 LowMACAAnnotation_0.99.3
## [19] profileModel_0.5-9 rtracklayer_1.40.3
## [21] labeling_0.3 scales_0.5.0
## [23] brglm_0.6.1 readr_1.1.1
## [25] digest_0.6.15 shinyBS_0.61
## [27] rmarkdown_1.10 pkgconfig_2.0.1
## [29] htmltools_0.3.6 BSgenome_1.48.0
## [31] highr_0.7 htmlwidgets_1.2
## [33] rlang_0.2.1 RSQLite_2.1.1
## [35] shiny_1.1.0 bindr_0.1.1
## [37] jsonlite_1.5 crosstalk_1.0.0
## [39] R.oo_1.22.0 RCurl_1.95-4.10
## [41] GenomeInfoDbData_1.1.0 Matrix_1.2-14
## [43] Rcpp_0.12.17 munsell_0.5.0
## [45] R.methodsS3_1.7.1 stringi_1.2.2
## [47] yaml_2.1.19 zlibbioc_1.26.0
## [49] plyr_1.8.4 grid_3.5.0
## [51] blob_1.1.1 promises_1.0.1
## [53] ggrepel_0.8.0 lattice_0.20-35
## [55] GenomicFeatures_1.32.0 hms_0.4.2
## [57] pillar_1.2.3 cgdsr_1.2.10
## [59] reshape2_1.4.3 codetools_0.2-15
## [61] biomaRt_2.36.1 googleVis_0.6.2
## [63] XML_3.98-1.11 glue_1.2.0
## [65] evaluate_0.10.1 data.table_1.11.4
## [67] httpuv_1.4.3 testthat_2.0.0
## [69] cellranger_1.1.0 gtable_0.2.0
## [71] tidyr_0.8.1 assertthat_0.2.0
## [73] mime_0.5 xtable_1.8-2
## [75] later_0.7.3 GenomicAlignments_1.16.0
## [77] memoise_1.1.0
Copyright © 2017 Giorgio Melloni. All rights reserved.