library(tidyverse)
library(gt)
library(DT)
library(kidsides)

download_sqlite_db(force=TRUE) #downloads to cache, if not already there
con <- kidsides::connect_sqlite_db()

KidSIDES is large!

For most users, a ~900 MB database is unwieldy to work with. Even some of the 17 tables in the database are large on their own!

KidSIDES table Table size
ade_raw 453.95 MB
ade_nichd 355.16 MB
ade 66.36 MB
ade_nichd_enrichment 65.48 MB
gene_expression 15.63 MB
sider 12.13 MB
event 4.41 MB
ade_null_distribution 2.48 MB
drug_gene 1.12 MB
drug 336.13 kB
grip 147.34 kB
gene 104.05 kB
ryan 58.08 kB
dictionary 32.76 kB
cyp_gene_expression_substrate_risk_information 18.19 kB
atc_raw_map 3.31 kB
ade_null 1.59 kB

A subset of the data (up to 10 MB) is more manageable to work with. This vignette gives a non-exhaustive list of manageable datasets that can be extracted from the KidSIDES database.
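One way to stay under that size is to filter inside the database and collect only the rows you need. The sketch below illustrates the pattern on a toy in-memory SQLite table rather than the real database. The table contents and the `toy_con` name are made up for illustration, and it assumes the DBI, RSQLite, dplyr, and dbplyr packages are installed.

```r
library(dplyr)

# Toy in-memory database standing in for KidSIDES (hypothetical data)
toy_con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(toy_con, "ade_raw", data.frame(
    atc_concept_id = rep(c(1L, 2L, 3L), times = c(500, 50, 5)),
    safetyreportid = as.character(seq_len(555))
))

# Count rows per drug inside the database; only the small summary is collected
n_by_drug <-
    dplyr::tbl(toy_con, "ade_raw") %>%
    dplyr::count(atc_concept_id) %>%
    dplyr::collect()

# Filter in the database, collect the subset, then check its in-memory size
subset_df <-
    dplyr::tbl(toy_con, "ade_raw") %>%
    dplyr::filter(atc_concept_id == 2L) %>%
    dplyr::collect()
format(utils::object.size(subset_df), units = "auto")

DBI::dbDisconnect(toy_con)
```

With the real connection, the same pipeline would start from `dplyr::tbl(con, ...)`. Counting rows first gives a rough idea of how large a subset will be before it is pulled into memory.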

Extract smaller datasets by drugs and events

Extracting datasets from KidSIDES requires interacting with standard vocabularies for drugs and events. Drugs are represented by the Anatomical Therapeutic Chemical (ATC) vocabulary (Reference from the WHO). Events are encoded in the Medical Dictionary for Regulatory Activities (MedDRA) vocabulary (Reference from MedDRA). You can explore these vocabularies to identify drugs and events using the PDSportal. This is a Shiny application for first identifying drugs and events of interest and then viewing their drug safety signals across childhood.

In this document, some example datasets are extracted using a specific drug and event:

drug_ <- 
    tbl(con,"drug") %>% 
    filter(atc_concept_name=="montelukast; oral") %>%  
    collect() %>% 
    pull(atc_concept_name)
drug_id_ <- 
    tbl(con,"drug") %>% 
    filter(atc_concept_name==drug_) %>% 
    collect() %>% 
    pull(atc_concept_id)
event_ <- 
    tbl(con,"event") %>% 
    filter(meddra_concept_name_1=="Suicidal ideation") %>%  
    collect() %>% 
    pull(meddra_concept_name_1)
event_id_ <- 
    tbl(con,"event") %>% 
    filter(meddra_concept_name_1==event_) %>%  
    collect() %>% 
    pull(meddra_concept_id)

drug_
#> [1] "montelukast; oral"
drug_id_
#> [1] 21603356
event_
#> [1] "Suicidal ideation"
event_id_
#> [1] 36919235
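The lookups above match exact concept names. When the exact name is unknown, a case-insensitive partial match on the collected drug table may help, since that table is small (about 336 kB). This is a minimal sketch using a toy stand-in for the table: only the montelukast row comes from this vignette, and the other rows are hypothetical.

```r
# Toy stand-in for the collected `drug` table. With the real database this
# would be: drug_df <- dplyr::tbl(con, "drug") %>% dplyr::collect()
drug_df <- data.frame(
    atc_concept_id = c(21603356L, 1L, 2L),
    atc_concept_name = c(
        "montelukast; oral",
        "hypothetical drug a; oral",
        "hypothetical drug b; topical"
    )
)

# Case-insensitive partial match on the concept name
hits <- drug_df[grepl("MONTELUKAST", drug_df$atc_concept_name, ignore.case = TRUE), ]
hits
```

The matched `atc_concept_id` can then be used in the database filters, in the same way as `drug_id_`.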

Example 1: Drug dataset

Table of all reported drugs with the ATC vocabulary in KidSIDES.

table_ <- "drug"

df <- 
    dplyr::tbl(con,table_) %>% 
    dplyr::collect()

df %>% dim()
#> [1] 1088   12
df %>% lobstr::obj_size()
#> 336.13 kB
df %>% head(1000) %>% DT::datatable(options = list(pageLength=5,scrollX = TRUE))
gt <- 
    dplyr::tbl(con,"dictionary") %>% 
    dplyr::filter(table==table_) %>% 
    gt()

gt
table field description type
drug atc_concept_id The ATC 5th level OMOP concept identifier. int
drug atc_concept_name The ATC 5th level OMOP concept name. In the ade_nichd_enrichment table, this ATC concept is from any level in the hierarchy. character
drug atc_concept_code The ATC 5th level OMOP concept code. character
drug ndrugreports The number of reports of the drug in Pediatric FAERS. int
drug atc4_concept_name The ATC 4th level OMOP concept name. character
drug atc4_concept_code The ATC 4th level OMOP concept code. character
drug atc3_concept_name The ATC 3rd level OMOP concept name. character
drug atc3_concept_code The ATC 3rd level OMOP concept code. character
drug atc2_concept_name The ATC 2nd level OMOP concept name. character
drug atc2_concept_code The ATC 2nd level OMOP concept code. character
drug atc1_concept_name The ATC 1st level OMOP concept name. character
drug atc1_concept_code The ATC 1st level OMOP concept code. character

Example 2: Event dataset

Table of all co-reported events with the MedDRA vocabulary in KidSIDES.

table_ <- "event"

df <- 
    dplyr::tbl(con,table_) %>% 
    dplyr::collect()

df %>% dim()
#> [1] 16941    22
df %>% lobstr::obj_size()
#> 4.41 MB
df %>% head(1000) %>% DT::datatable(options = list(pageLength=5,scrollX = TRUE))
gt <- 
    dplyr::tbl(con,"dictionary") %>% 
    dplyr::filter(table==table_) %>% 
    gt()

gt
table field description type
event meddra_concept_name_4 The MedDRA system organ class concept name. character
event meddra_concept_id The MedDRA preferred term OMOP concept identifier. int
event neventreports The number of adverse event reports in Pediatric FAERS. int
event meddra_concept_class_id_1 The MedDRA preferred term concept class identifier. character
event meddra_concept_class_id_2 The MedDRA higher level concept class identifier. character
event meddra_concept_class_id_3 The MedDRA higher level greater term concept class identifier. character
event meddra_concept_class_id_4 The MedDRA system organ class concept class identifier. character
event meddra_concept_code_1 The MedDRA preferred term concept code identifier. character
event meddra_concept_code_2 The MedDRA higher level concept code identifier. character
event meddra_concept_code_3 The MedDRA higher level greater term concept code identifier. character
event meddra_concept_code_4 The MedDRA system organ class concept code identifier. character
event meddra_concept_id_2 The MedDRA higher level concept identifier. int
event meddra_concept_id_3 The MedDRA higher level greater term concept identifier. int
event meddra_concept_id_4 The MedDRA system organ class concept identifier. int
event meddra_concept_name_1 The MedDRA preferred term concept name. Same as 'meddra_concept_name' character
event meddra_concept_name_2 The MedDRA higher level concept name. character
event meddra_concept_name_3 The MedDRA higher level greater term concept name. character
event relationship_id_12 The relationship identifier between columns *1 and *2; should be 'Is a' denoting 1-to-1 mapping. character
event relationship_id_23 The relationship identifier between columns *2 and *3; should be 'Is a' denoting 1-to-1 mapping. character
event relationship_id_34 The relationship identifier between columns *3 and *4; should be 'Is a' denoting 1-to-1 mapping. character
event soc_category The customized category to represent meddra_concept_name_4 events more broadly as used in the manuscript. Developed in consultation with https://admin.new.meddra.org/sites/default/files/guidance/file/intguide_21_0_english.pdf. character
event pediatric_adverse_event Whether this event concept (meddra concept id) was defined by MedDRA 19th edition vocabulary as a pediatric-specific adverse event. One (1) indicates yes and zero (0) indicates no. The list of events were curated from this site: https://www.meddra.org/paediatric-and-gender-adverse-event-term-lists. int

Example 3: Drug safety report datasets

Table of report characteristics for drug reports with events in KidSIDES.

table_ <- "ade_raw"

#dataset size for most frequent drug
dplyr::tbl(con,table_) %>% 
    dplyr::filter(atc_concept_id=="21603929") %>% 
    dplyr::collect() %>% 
    lobstr::obj_size()
#> 9.97 MB

#dataset size for least frequent drug
dplyr::tbl(con,table_) %>% 
    dplyr::filter(atc_concept_id=="21600407") %>% 
    dplyr::collect() %>% 
    lobstr::obj_size()
#> 4.21 kB

#datasets using pre-selected drugs and events
df <- 
    dplyr::tbl(con,table_) %>% 
    dplyr::filter(
        atc_concept_id==drug_id_
    ) %>% 
    dplyr::collect()

df %>% dim()
#> [1] 28300    23
df %>% lobstr::obj_size()
#> 5.69 MB
df %>% head(1000) %>% DT::datatable(options = list(pageLength=5,scrollX = TRUE))
df <- 
    dplyr::tbl(con,table_) %>% 
    dplyr::filter(
        atc_concept_id==drug_id_ &
            meddra_concept_id==event_id_
    ) %>% 
    dplyr::collect()

df %>% dim()
#> [1] 505  23
df %>% lobstr::obj_size()
#> 147.40 kB
df %>% head(1000) %>% DT::datatable(options = list(pageLength=5,scrollX = TRUE))
gt <- 
    dplyr::tbl(con,"dictionary") %>% 
    dplyr::filter(table==table_) %>% 
    gt()

gt
table field description type
ade_raw safetyreportid The unique identifier for the report. character
ade_raw ade Primary key. This is the unique identifier of an adverse drug event (drug-event). It is a combination of the atc_concept_id and the meddra_concept_id. character
ade_raw atc_concept_id The ATC 5th level OMOP concept identifier. int
ade_raw meddra_concept_id The MedDRA preferred term OMOP concept identifier. In the event table, this would be equivalent in 'meddra_concept_id_1'. int
ade_raw nichd This is the NICHD-defined child development stage. Defined in https://doi.org/10.1542/peds.2012-0055I. character
ade_raw sex The reported sex. character
ade_raw reporter_qualification The type of reporter. character
ade_raw receive_date The date the report was first submitted. date
ade_raw XA GAM covariate name for the ATC 1st level concept name 'ALIMENTARY TRACT AND METABOLISM' float
ade_raw XB GAM covariate name for the ATC 1st level concept name 'BLOOD AND BLOOD FORMING ORGANS' float
ade_raw XC GAM covariate name for the ATC 1st level concept name 'CARDIOVASCULAR SYSTEM' float
ade_raw XD GAM covariate name for the ATC 1st level concept name 'DERMATOLOGICALS' float
ade_raw XG GAM covariate name for the ATC 1st level concept name 'GENITO URINARY SYSTEM AND SEX HORMONES' float
ade_raw XH GAM covariate name for the ATC 1st level concept name 'SYSTEMIC HORMONAL PREPARATIONS, EXCL. SEX HORMONES AND INSULINS' float
ade_raw XJ GAM covariate name for the ATC 1st level concept name 'ANTIINFECTIVES FOR SYSTEMIC USE' float
ade_raw XL GAM covariate name for the ATC 1st level concept name 'ANTINEOPLASTIC AND IMMUNOMODULATING AGENTS' float
ade_raw XM GAM covariate name for the ATC 1st level concept name 'MUSCULO-SKELETAL SYSTEM' float
ade_raw XN GAM covariate name for the ATC 1st level concept name 'NERVOUS SYSTEM' float
ade_raw XP GAM covariate name for the ATC 1st level concept name 'ANTIPARASITIC PRODUCTS, INSECTICIDES AND REPELLENTS' float
ade_raw XR GAM covariate name for the ATC 1st level concept name 'RESPIRATORY SYSTEM' float
ade_raw XS GAM covariate name for the ATC 1st level concept name 'SENSORY ORGANS' float
ade_raw XV GAM covariate name for the ATC 1st level concept name 'VARIOUS' float
ade_raw polypharmacy The number of drugs reported. int

Example 4: Pediatric drug safety signal dataset

Table of drug-event observations including signal characteristics in KidSIDES.

table_ <- "ade"

#Significant signals
df <- 
    dplyr::tbl(con,table_) %>% 
    dplyr::filter(gt_null_99==1) %>% 
    dplyr::collect()

df %>% dim()
#> [1] 19438     9
df %>% lobstr::obj_size()
#> 2.81 MB
df %>% head(1000) %>% DT::datatable(options = list(pageLength=5,scrollX = TRUE))
gt <- 
    dplyr::tbl(con,"dictionary") %>% 
    dplyr::filter(table==table_) %>% 
    gt()

gt
table field description type
ade ade Primary key. This is the unique identifier of an adverse drug event (drug-event). It is a combination of the atc_concept_id and the meddra_concept_id. character
ade atc_concept_id The ATC 5th level OMOP concept identifier. int
ade meddra_concept_id The MedDRA preferred term OMOP concept identifier. In the event table, this would be equivalent in 'meddra_concept_id_1'. int
ade cluster_id The identifier for the cluster group assigned to a drug-event by our data-driven clustering approach. See the manuscript's methods for details. character
ade gt_null_statistic The boolean value indicating whether at least one stage's score was greater than nominal significance (the 90 percent confidence interval was above 0). float
ade gt_null_99 The boolean value indicating whether at least one stage's score was greater than significance by the null model, as referenced in the paper (the score was greater than the 99th percentile of the null distribution of randomly co-reported drugs and events). float
ade max_score_nichd The child development stage that had the highest risk score for the drug-event. float
ade cluster_name The dynamics name given to the identifier of a cluster group. This is descriptive of the risk trend across stages, from birth through adolescence. character
ade ade_nreports The number of reports of the drug and event co-occurring character

Example 5: Pediatric drug safety signal time series dataset

Table of drug safety signals across childhood in KidSIDES.

table_ <- "ade_nichd"

df <- 
    dplyr::tbl(con,table_) %>% 
    dplyr::filter(ade_name==paste0(drug_," and ",event_)) %>% 
    dplyr::collect()

df %>% dim()
#> [1]  7 13
df %>% lobstr::obj_size()
#> 3.75 kB
df %>% head(1000) %>% DT::datatable(options = list(pageLength=5,scrollX = TRUE))
df <- 
    dplyr::tbl(con,table_) %>% 
    dplyr::filter(atc_concept_id==drug_id_) %>% 
    dplyr::collect()

df %>% dim()
#> [1] 17360    13
df %>% lobstr::obj_size()
#> 1.90 MB
df %>% head(1000) %>% DT::datatable(options = list(pageLength=5,scrollX = TRUE))
gt <- 
    dplyr::tbl(con,"dictionary") %>% 
    dplyr::filter(table==table_) %>% 
    gt()

gt
table field description type
ade_nichd atc_concept_id The ATC 5th level OMOP concept identifier. int
ade_nichd meddra_concept_id The MedDRA preferred term OMOP concept identifier. int
ade_nichd ade Primary key. This is the unique identifier of an adverse drug event (drug-event). It is a combination of the atc_concept_id and the meddra_concept_id. character
ade_nichd nichd This is the NICHD-defined child development stage. Defined in https://doi.org/10.1542/peds.2012-0055I. character
ade_nichd gam_score The risk coefficient from a drug-event GAM, given to each nichd stage. It is the log odds risk of event occurrence given the data as specified in the manuscript. float
ade_nichd norm The normalized risk coefficient, between 0 and 1, across stages for a drug-event. This preserves the risk trend but constrains the range of the risk scores between 0 and 1. float
ade_nichd gam_score_se The standard deviation of the risk coefficient. float
ade_nichd gam_score_90mse The 90 percent lower bounded risk score using the formula gam_score - (1.645*gam_score_se). float
ade_nichd gam_score_90pse The 90 percent upper bounded risk score using the formula gam_score + (1.645*gam_score_se). float
ade_nichd D The number of reports of the drug at the child development stage int
ade_nichd E The number of reports of the event at the child development stage int
ade_nichd DE The number of reports of the drug & event at the child development stage int
ade_nichd ade_name The named identifier of an adverse drug event (drug-event). It is a combination of the atc_concept_name and the meddra_concept_name. character
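The bound formulas in the dictionary above can be reproduced with a few lines of base R. The sketch below uses made-up scores and standard errors for seven stages. The stage labels are assumed, and the min-max scaling is only one plausible reading of how `norm` is derived, not a confirmed description of the database's method.

```r
# Hypothetical GAM scores and standard errors for seven NICHD stages
# (stage labels are assumed for illustration)
scores <- data.frame(
    nichd = c("term_neonatal", "infancy", "toddler", "early_childhood",
              "middle_childhood", "early_adolescence", "late_adolescence"),
    gam_score = c(-0.8, -0.5, -0.1, 0.3, 0.9, 1.4, 1.1),
    gam_score_se = c(0.6, 0.5, 0.4, 0.3, 0.3, 0.2, 0.3)
)

# 90 percent bounds, following the formulas in the dictionary
scores$gam_score_90mse <- scores$gam_score - 1.645 * scores$gam_score_se
scores$gam_score_90pse <- scores$gam_score + 1.645 * scores$gam_score_se

# Min-max scaling across stages (an assumption about how `norm` is derived)
rng <- range(scores$gam_score)
scores$norm <- (scores$gam_score - rng[1]) / (rng[2] - rng[1])

# Stages whose 90 percent lower bound is above zero
scores$nichd[scores$gam_score_90mse > 0]
```

A lower bound above zero corresponds to the nominal significance criterion described for `gt_null_statistic` in the ade table dictionary.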

Example 6: Significant drug safety signal class enrichments

Table of drug and event classes for significant drug safety signal enrichment in KidSIDES.

table_ <- "ade_nichd_enrichment"

df <- 
    tbl(con,table_) %>% 
    dplyr::filter(
        is.na(atc_concept_class_id) &
        meddra_concept_class_id=="SOC"
    ) %>% 
    dplyr::collect()

df %>% dim()
#> [1] 189  15
df %>% lobstr::obj_size()
#> 25.38 kB
df %>% head(1000) %>% DT::datatable(options = list(pageLength=5,scrollX = TRUE))
iter <- 
    dplyr::tbl(con,table_) %>% 
    select(atc_concept_class_id,meddra_concept_class_id) %>% 
    distinct() %>% 
    collect() %>% 
    drop_na()

gt <- purrr::map(seq_len(nrow(iter)),~{
    #drug and event class pair for this iteration
    atc_class_ <- iter[.x,"atc_concept_class_id",drop=TRUE]
    meddra_class_ <- iter[.x,"meddra_concept_class_id",drop=TRUE]
    
    df <- 
        tbl(con,table_) %>% 
        dplyr::filter(
            atc_concept_class_id==!!atc_class_ &
            meddra_concept_class_id==!!meddra_class_
        ) %>% 
        dplyr::collect()
    
    #in-memory size of the collected subset, in bytes
    size <- df %>% 
        lobstr::obj_size() %>% 
        as.numeric()
    
    dplyr::tibble(
        "atc_concept_class_id" = atc_class_,
        "meddra_concept_class_id" = meddra_class_,
        "Object size" = size %>% prettyunits::pretty_bytes()
    )

}) %>% 
    bind_rows() %>% 
    gt()

gt
atc_concept_class_id meddra_concept_class_id Object size
ATC1 SOC 261.36 kB
ATC2 SOC 985.78 kB
ATC3 SOC 1.54 MB
ATC4 SOC 2.36 MB
ATC5 SOC 3.56 MB
ATC1 HLGT 1.56 MB
ATC2 HLGT 2.89 MB
ATC3 HLGT 3.48 MB
ATC4 HLGT 4.16 MB
ATC5 HLGT 4.74 MB
ATC1 HLT 3.16 MB
ATC2 HLT 4.18 MB
ATC3 HLT 4.55 MB
ATC4 HLT 4.96 MB
ATC5 HLT 5.22 MB
ATC1 PT 3.78 MB
ATC2 PT 3.93 MB
ATC3 PT 4.03 MB
ATC4 PT 4.16 MB
gt <- 
    dplyr::tbl(con,"dictionary") %>% 
    dplyr::filter(table==table_) %>% 
    gt()

gt
table field description type
ade_nichd_enrichment category The category on enrichment. Either a MedDRA adverse event class, ATC drug class, or a combination of ATC and MedDRA classes. These categories are included in the manuscript results associated to this database. character
ade_nichd_enrichment atc_concept_name The ATC concept identifier. character
ade_nichd_enrichment meddra_concept_name The MedDRA concept identifier. character
ade_nichd_enrichment nichd This is the NICHD-defined child development stage. Defined in https://doi.org/10.1542/peds.2012-0055I. character
ade_nichd_enrichment atc_concept_class_id The ATC concept class identifier. character
ade_nichd_enrichment meddra_concept_class_id The MedDRA concept class identifier. character
ade_nichd_enrichment a The number of significant, by the null model, drug-events in both the stage and ATC/MedDRA concept category. int
ade_nichd_enrichment b The number of significant, by the null model, drug-events in the stage and not in the ATC/MedDRA concept category. int
ade_nichd_enrichment c The number of significant, by the null model, drug-events not in the stage but in the ATC/MedDRA concept category. int
ade_nichd_enrichment d The number of significant, by the null model, drug-events not in the stage and not in the ATC/MedDRA concept category. int
ade_nichd_enrichment lwr The 95% lower bound of the odds ratio. float
ade_nichd_enrichment odds_ratio The odds ratio for the category and stage enrichment. float
ade_nichd_enrichment upr The 95% upper bound of the odds ratio. float
ade_nichd_enrichment pvalue The p-value from the fisher exact test. float
ade_nichd_enrichment fdr The FDR corrected pvalue. float
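The `a`, `b`, `c`, and `d` fields form a 2x2 contingency table of stage membership by category membership. As a rough illustration of how an odds ratio, its 95% interval, a Fisher exact test p-value, and an FDR correction relate to those counts, here is a base R sketch with hypothetical counts. It is not necessarily the exact procedure used to build the table.

```r
# Hypothetical counts, named as in the dictionary:
# a = in stage and in category,     b = in stage, not in category
# c = not in stage but in category, d = in neither
counts <- c(a = 30, b = 70, c = 120, d = 780)

# Rows: stage membership; columns: category membership
tab <- matrix(
    counts[c("a", "c", "b", "d")], nrow = 2,
    dimnames = list(stage = c("in", "out"), category = c("in", "out"))
)

# Sample odds ratio computed directly from the counts
sample_or <- unname((counts["a"] * counts["d"]) / (counts["b"] * counts["c"]))

# Fisher exact test; its estimate is the conditional maximum likelihood
# odds ratio, which can differ slightly from the sample odds ratio
ft <- stats::fisher.test(tab)
ft$estimate
ft$conf.int
ft$p.value

# FDR correction across several tests (the other two p-values are made up)
stats::p.adjust(c(ft$p.value, 0.04, 0.20), method = "fdr")
```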

Example 7: Gene expression across childhood

Table of gene expression across childhood in KidSIDES.

table_ <- "gene_expression"

df <- 
    dplyr::tbl(con,table_) %>% 
    #select columns in the database, then collect once
    dplyr::select(sample,nichd,probe,gene_symbol,prediction) %>% 
    dplyr::collect()

df %>% dim()
#> [1] 194054      5
df %>% lobstr::obj_size()
#> 7.87 MB
df %>% head(1000) %>% DT::datatable(options = list(pageLength=5,scrollX = TRUE))
gt <- 
    dplyr::tbl(con,"dictionary") %>% 
    dplyr::filter(table==table_) %>% 
    gt()

gt
table field description type
gene_expression sample The GEO sample identifier used in the GSE datasets. character
gene_expression nichd This is the NICHD-defined child development stage. Defined in https://doi.org/10.1542/peds.2012-0055I. character
gene_expression probe The probe identifier on the affymetrix gene chip. character
gene_expression gene_symbol The gene symbol identifier from joining the uniprot identifier to the entrez identifer from the microarray platform database package within Bioconductor. character
gene_expression actual The sample value from the stage-association GLM. See the manuscript for details. float
gene_expression prediction The sample predicted value from the stage-association GLM. See the manuscript for details. float
gene_expression residual The sample residual (actual - predicted) value from the stage-association GLM. See the manuscript for details. float
gene_expression fdr The F test FDR corrected pvalue. float
gene_expression f_statistic The F test, as summarized from the glm, statistic. float
gene_expression f_pvalue The F test, as summarized from the glm, pvalue. float

Check out the references for dataset details

It is out of scope for this vignette to detail the information in each dataset. The best source for more information on the pediatric drug safety data is the Med paper listed in the references. You can also contact Nick directly by sending an email, posting an issue on GitHub, tooting at Fosstodon, or sending a message via Carrier Pigeon. Hopefully these examples show how to extract manageable datasets for exploring what KidSIDES has to offer!

References

Giangreco, Nicholas. Mind the developmental gap: Identifying adverse drug effects across childhood to evaluate biological mechanisms from growth and development. 2022. Columbia University, PhD dissertation.

Giangreco NP, Tatonetti NP. A database of pediatric drug effects to evaluate ontogenic mechanisms from child growth and development. Med (N Y). 2022 Aug 12;3(8):579-595.e7. doi: 10.1016/j.medj.2022.06.001. Epub 2022 Jun 24. PMID: 35752163; PMCID: PMC9378670.

kidsides::disconnect_sqlite_db(con)