Extracting datasets from KidSIDES
library(tidyverse)
library(gt)
library(DT)
library(kidsides)
download_sqlite_db(force=TRUE) #downloads to cache; force=TRUE re-downloads even if already there
con <- kidsides::connect_sqlite_db()
KidSIDES is large!
For most users, working with a ~900 MB database is a bit unwieldy. Even some of the 17 tables in the database are pretty large!
KidSIDES table | Table size |
---|---|
ade_raw | 453.95 MB |
ade_nichd | 355.16 MB |
ade | 66.36 MB |
ade_nichd_enrichment | 65.48 MB |
gene_expression | 15.63 MB |
sider | 12.13 MB |
event | 4.41 MB |
ade_null_distribution | 2.48 MB |
drug_gene | 1.12 MB |
drug | 336.13 kB |
grip | 147.34 kB |
gene | 104.05 kB |
ryan | 58.08 kB |
dictionary | 32.76 kB |
cyp_gene_expression_substrate_risk_information | 18.19 kB |
atc_raw_map | 3.31 kB |
ade_null | 1.59 kB |
A subset of the data (up to 10 MB) will be more manageable to work with. This vignette gives a non-exhaustive list of manageable datasets from the KidSIDES database.
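One cheap way to gauge how large a table is before pulling it into memory is to count its rows inside the database. The sketch below is a minimal illustration on a toy in-memory SQLite database; the connection `toy_con` and its contents are hypothetical stand-ins, and with KidSIDES the same pattern would presumably apply to `con`.

```r
library(dplyr)

# Toy in-memory SQLite database standing in for KidSIDES (hypothetical data)
toy_con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(toy_con, "drug", data.frame(atc_concept_id = 1:3))
DBI::dbWriteTable(toy_con, "event", data.frame(meddra_concept_id = 1:5))

# Count rows in the database; only one number per table is pulled into R
row_counts <-
  purrr::map(DBI::dbListTables(toy_con), function(table_name) {
    tibble(
      table = table_name,
      n_rows = tbl(toy_con, table_name) %>% count() %>% pull(n)
    )
  }) %>%
  bind_rows()

row_counts

DBI::dbDisconnect(toy_con)
```

Row counts are only a rough proxy for the in-memory size, since wide tables take more memory per row.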
Extract smaller datasets by drugs and events
Extracting datasets from KidSIDES requires interacting with standard vocabularies for drugs and events. Drugs are represented by the Anatomical Therapeutic Chemical (ATC) classification (Reference from the WHO). Events are encoded in the Medical Dictionary for Regulatory Activities (MedDRA) vocabulary (Reference from MedDRA). You can explore these vocabularies to identify drugs and events using the PDSportal. This is a Shiny application for first identifying drugs and events of interest and then viewing their drug safety signals across childhood.
In this document, some example datasets are extracted using a specific drug and event:
drug_ <-
tbl(con,"drug") %>%
filter(atc_concept_name=="montelukast; oral") %>%
collect() %>%
pull(atc_concept_name)
drug_id_ <-
tbl(con,"drug") %>%
filter(atc_concept_name==drug_) %>%
collect() %>%
pull(atc_concept_id)
event_ <-
tbl(con,"event") %>%
filter(meddra_concept_name_1=="Suicidal ideation") %>%
collect() %>%
pull(meddra_concept_name_1)
event_id_ <-
tbl(con,"event") %>%
filter(meddra_concept_name_1==event_) %>%
collect() %>%
pull(meddra_concept_id)
drug_
#> [1] "montelukast; oral"
drug_id_
#> [1] 21603356
event_
#> [1] "Suicidal ideation"
event_id_
#> [1] 36919235
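The name and identifier can also be fetched in a single query, with a check that exactly one concept matched. This is a minimal sketch on a toy in-memory table; `toy_con`, the drug names and the identifiers are hypothetical, and only the column names follow the `drug` table used above.

```r
library(dplyr)

# Toy stand-in for the KidSIDES drug table (hypothetical rows)
toy_con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(toy_con, "drug", data.frame(
  atc_concept_id = c(101L, 102L),
  atc_concept_name = c("drug a; oral", "drug b; oral")
))

drug_name <- "drug a; oral"

# One round trip returns both the name and the identifier
drug_row <-
  tbl(toy_con, "drug") %>%
  filter(atc_concept_name == !!drug_name) %>%
  select(atc_concept_id, atc_concept_name) %>%
  collect()

# Fail early if the name is misspelled (0 rows) or ambiguous (>1 row)
stopifnot(nrow(drug_row) == 1)
drug_id <- drug_row$atc_concept_id

DBI::dbDisconnect(toy_con)
```

The `!!` forces `drug_name` to be evaluated in R before the query is translated to SQL.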
Example 1: Drug dataset
Table of all reported drugs with the ATC vocabulary in KidSIDES.
table_ <- "drug"
df <-
dplyr::tbl(con,table_) %>%
dplyr::collect()
df %>% dim()
#> [1] 1088 12
df %>% lobstr::obj_size()
#> 336.13 kB
df %>% head(1000) %>% DT::datatable(options = list(pageLength=5,scrollX = TRUE))
table | field | description | type |
---|---|---|---|
drug | atc_concept_id | The ATC 5th level OMOP concept identifier. | int |
drug | atc_concept_name | The ATC 5th level OMOP concept name. In the ade_nichd_enrichment table, this ATC concept is from any level in the hierarchy. | character |
drug | atc_concept_code | The ATC 5th level OMOP concept code. | character |
drug | ndrugreports | The number of reports of the drug in Pediatric FAERS. | int |
drug | atc4_concept_name | The ATC 4th level OMOP concept name. | character |
drug | atc4_concept_code | The ATC 4th level OMOP concept code. | character |
drug | atc3_concept_name | The ATC 3rd level OMOP concept name. | character |
drug | atc3_concept_code | The ATC 3rd level OMOP concept code. | character |
drug | atc2_concept_name | The ATC 2nd level OMOP concept name. | character |
drug | atc2_concept_code | The ATC 2nd level OMOP concept code. | character |
drug | atc1_concept_name | The ATC 1st level OMOP concept name. | character |
drug | atc1_concept_code | The ATC 1st level OMOP concept code. | character |
Example 2: Event dataset
Table of all co-reported events with the MedDRA vocabulary in KidSIDES.
table_ <- "event"
df <-
dplyr::tbl(con,table_) %>%
dplyr::collect()
df %>% dim()
#> [1] 16941 22
df %>% lobstr::obj_size()
#> 4.41 MB
df %>% head(1000) %>% DT::datatable(options = list(pageLength=5,scrollX = TRUE))
table | field | description | type |
---|---|---|---|
event | meddra_concept_name_4 | The MedDRA system organ class concept name. | character |
event | meddra_concept_id | The MedDRA preferred term OMOP concept identifier. | int |
event | neventreports | The number of adverse event reports in Pediatric FAERS. | int |
event | meddra_concept_class_id_1 | The MedDRA preferred term concept class identifier. | character |
event | meddra_concept_class_id_2 | The MedDRA high level term (HLT) concept class identifier. | character |
event | meddra_concept_class_id_3 | The MedDRA high level group term (HLGT) concept class identifier. | character |
event | meddra_concept_class_id_4 | The MedDRA system organ class concept class identifier. | character |
event | meddra_concept_code_1 | The MedDRA preferred term concept code identifier. | character |
event | meddra_concept_code_2 | The MedDRA high level term (HLT) concept code identifier. | character |
event | meddra_concept_code_3 | The MedDRA high level group term (HLGT) concept code identifier. | character |
event | meddra_concept_code_4 | The MedDRA system organ class concept code identifier. | character |
event | meddra_concept_id_2 | The MedDRA high level term (HLT) concept identifier. | int |
event | meddra_concept_id_3 | The MedDRA high level group term (HLGT) concept identifier. | int |
event | meddra_concept_id_4 | The MedDRA system organ class concept identifier. | int |
event | meddra_concept_name_1 | The MedDRA preferred term concept name. Same as 'meddra_concept_name' | character |
event | meddra_concept_name_2 | The MedDRA high level term (HLT) concept name. | character |
event | meddra_concept_name_3 | The MedDRA high level group term (HLGT) concept name. | character |
event | relationship_id_12 | The relationship identifier between columns *1 and *2; should be 'Is a' denoting 1-to-1 mapping. | character |
event | relationship_id_23 | The relationship identifier between columns *2 and *3; should be 'Is a' denoting 1-to-1 mapping. | character |
event | relationship_id_34 | The relationship identifier between columns *3 and *4; should be 'Is a' denoting 1-to-1 mapping. | character |
event | soc_category | The customized category to represent meddra_concept_name_4 events more broadly as used in the manuscript. Developed in consultation with https://admin.new.meddra.org/sites/default/files/guidance/file/intguide_21_0_english.pdf. | character |
event | pediatric_adverse_event | Whether this event concept (meddra concept id) was defined by MedDRA 19th edition vocabulary as a pediatric-specific adverse event. One (1) indicates yes and zero (0) indicates no. The list of events were curated from this site: https://www.meddra.org/paediatric-and-gender-adverse-event-term-lists. | int |
Example 3: Drug safety report datasets
Table of report characteristics for drug reports with events in KidSIDES.
table_ <- "ade_raw"
#dataset size for most frequent drug
dplyr::tbl(con,table_) %>%
dplyr::filter(atc_concept_id=="21603929") %>%
dplyr::collect() %>%
lobstr::obj_size()
#> 9.97 MB
#dataset size for least frequent drug
dplyr::tbl(con,table_) %>%
dplyr::filter(atc_concept_id=="21600407") %>%
dplyr::collect() %>%
lobstr::obj_size()
#> 4.21 kB
#datasets using pre-selected drugs and events
df <-
dplyr::tbl(con,table_) %>%
dplyr::filter(
atc_concept_id==drug_id_
) %>%
dplyr::collect()
df %>% dim()
#> [1] 28300 23
df %>% lobstr::obj_size()
#> 5.69 MB
df %>% head(1000) %>% DT::datatable(options = list(pageLength=5,scrollX = TRUE))
df <-
dplyr::tbl(con,table_) %>%
dplyr::filter(
atc_concept_id==drug_id_ &
meddra_concept_id==event_id_
) %>%
dplyr::collect()
df %>% dim()
#> [1] 505 23
df %>% lobstr::obj_size()
#> 147.40 kB
df %>% head(1000) %>% DT::datatable(options = list(pageLength=5,scrollX = TRUE))
table | field | description | type |
---|---|---|---|
ade_raw | safetyreportid | The unique identifier for the report. | character |
ade_raw | ade | Primary key. This is the unique identifier of an adverse drug event (drug-event). It is a combination of the atc_concept_id and the meddra_concept_id. | character |
ade_raw | atc_concept_id | The ATC 5th level OMOP concept identifier. | int |
ade_raw | meddra_concept_id | The MedDRA preferred term OMOP concept identifier. In the event table, this is equivalent to 'meddra_concept_id_1'. | int |
ade_raw | nichd | This is the NICHD-defined child development stage. Defined in https://doi.org/10.1542/peds.2012-0055I. | character |
ade_raw | sex | The reported sex. | character |
ade_raw | reporter_qualification | The type of reporter. | character |
ade_raw | receive_date | The date the report was first submitted. | date |
ade_raw | XA | GAM covariate name for the ATC 1st level concept name 'ALIMENTARY TRACT AND METABOLISM' | float |
ade_raw | XB | GAM covariate name for the ATC 1st level concept name 'BLOOD AND BLOOD FORMING ORGANS' | float |
ade_raw | XC | GAM covariate name for the ATC 1st level concept name 'CARDIOVASCULAR SYSTEM' | float |
ade_raw | XD | GAM covariate name for the ATC 1st level concept name 'DERMATOLOGICALS' | float |
ade_raw | XG | GAM covariate name for the ATC 1st level concept name 'GENITO URINARY SYSTEM AND SEX HORMONES' | float |
ade_raw | XH | GAM covariate name for the ATC 1st level concept name 'SYSTEMIC HORMONAL PREPARATIONS, EXCL. SEX HORMONES AND INSULINS' | float |
ade_raw | XJ | GAM covariate name for the ATC 1st level concept name 'ANTIINFECTIVES FOR SYSTEMIC USE' | float |
ade_raw | XL | GAM covariate name for the ATC 1st level concept name 'ANTINEOPLASTIC AND IMMUNOMODULATING AGENTS' | float |
ade_raw | XM | GAM covariate name for the ATC 1st level concept name 'MUSCULO-SKELETAL SYSTEM' | float |
ade_raw | XN | GAM covariate name for the ATC 1st level concept name 'NERVOUS SYSTEM' | float |
ade_raw | XP | GAM covariate name for the ATC 1st level concept name 'ANTIPARASITIC PRODUCTS, INSECTICIDES AND REPELLENTS' | float |
ade_raw | XR | GAM covariate name for the ATC 1st level concept name 'RESPIRATORY SYSTEM' | float |
ade_raw | XS | GAM covariate name for the ATC 1st level concept name 'SENSORY ORGANS' | float |
ade_raw | XV | GAM covariate name for the ATC 1st level concept name 'VARIOUS' | float |
ade_raw | polypharmacy | The number of drugs reported. | int |
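When only a summary is needed, aggregating inside the database keeps the collected dataset tiny. The sketch below counts distinct reports per child development stage for one drug on a toy table; `toy_con`, the report identifiers, the concept identifiers and the stage labels are hypothetical, and only the column names follow the `ade_raw` dictionary above.

```r
library(dplyr)

# Toy stand-in for ade_raw: one row per report-drug-event (hypothetical data)
toy_con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(toy_con, "ade_raw", data.frame(
  safetyreportid = c("r1", "r1", "r2", "r3", "r4"),
  atc_concept_id = c(101L, 101L, 101L, 101L, 102L),
  meddra_concept_id = c(1L, 2L, 1L, 1L, 1L),
  nichd = c("infancy", "infancy", "infancy", "toddler", "toddler")
))

# Aggregate in SQL, then collect only the per-stage summary
reports_by_stage <-
  tbl(toy_con, "ade_raw") %>%
  filter(atc_concept_id == 101L) %>%
  group_by(nichd) %>%
  summarise(n_reports = n_distinct(safetyreportid)) %>%
  arrange(nichd) %>%
  collect()

reports_by_stage

DBI::dbDisconnect(toy_con)
```

Using `n_distinct()` rather than `n()` avoids counting a report twice when it lists several events for the same drug.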
Example 4: Pediatric drug safety signal dataset
Table of drug-event observations including signal characteristics in KidSIDES.
table_ <- "ade"
#Significant signals
df <-
dplyr::tbl(con,table_) %>%
dplyr::filter(gt_null_99==1) %>%
dplyr::collect()
df %>% dim()
#> [1] 19438 9
df %>% lobstr::obj_size()
#> 2.81 MB
df %>% head(1000) %>% DT::datatable(options = list(pageLength=5,scrollX = TRUE))
table | field | description | type |
---|---|---|---|
ade | ade | Primary key. This is the unique identifier of an adverse drug event (drug-event). It is a combination of the atc_concept_id and the meddra_concept_id. | character |
ade | atc_concept_id | The ATC 5th level OMOP concept identifier. | int |
ade | meddra_concept_id | The MedDRA preferred term OMOP concept identifier. In the event table, this is equivalent to 'meddra_concept_id_1'. | int |
ade | cluster_id | The identifier for the cluster group assigned to a drug-event by our data-driven clustering approach. See the manuscript's methods for details. | character |
ade | gt_null_statistic | The boolean value indicating whether at least one stage's score was greater than nominal significance (the 90 percent confidence interval was above 0). | float |
ade | gt_null_99 | The boolean value indicating whether at least one stage's score was greater than significance by the null model, as referenced in the paper (the score was greater than the 99th percentile of the null distribution of randomly co-reported drugs and events). | float |
ade | max_score_nichd | The child development stage that had the highest risk score for the drug-event. | float |
ade | cluster_name | The dynamics name given to the identifier of a cluster group. This is descriptive of the risk trend across stages, from birth through adolescence. | character |
ade | ade_nreports | The number of reports of the drug and event co-occurring | character |
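The signal table stores only concept identifiers, so readable names have to come from the drug and event tables. The sketch below joins them in the database before collecting; it runs on toy tables, where `toy_con`, the identifiers, the names and the format of the `ade` key are all hypothetical, and only the column names follow the dictionaries above.

```r
library(dplyr)

# Toy stand-ins for the ade, drug and event tables (hypothetical rows)
toy_con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(toy_con, "ade", data.frame(
  ade = c("101_1", "101_2", "102_1"),
  atc_concept_id = c(101L, 101L, 102L),
  meddra_concept_id = c(1L, 2L, 1L),
  gt_null_99 = c(1, 0, 1)
))
DBI::dbWriteTable(toy_con, "drug", data.frame(
  atc_concept_id = c(101L, 102L),
  atc_concept_name = c("drug a; oral", "drug b; oral")
))
DBI::dbWriteTable(toy_con, "event", data.frame(
  meddra_concept_id = c(1L, 2L),
  meddra_concept_name_1 = c("Event one", "Event two")
))

# Keep significant signals, then attach drug and event names in SQL
signals_named <-
  tbl(toy_con, "ade") %>%
  filter(gt_null_99 == 1) %>%
  inner_join(
    tbl(toy_con, "drug") %>% select(atc_concept_id, atc_concept_name),
    by = "atc_concept_id"
  ) %>%
  inner_join(
    tbl(toy_con, "event") %>% select(meddra_concept_id, meddra_concept_name_1),
    by = "meddra_concept_id"
  ) %>%
  collect()

signals_named

DBI::dbDisconnect(toy_con)
```

Selecting only the name columns before joining keeps the result narrow, which matters more on the full tables than on this toy example.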
Example 5: Pediatric drug safety signal time series dataset
Table of drug safety signals across childhood in KidSIDES.
table_ <- "ade_nichd"
df <-
dplyr::tbl(con,table_) %>%
dplyr::filter(ade_name==paste0(drug_," and ",event_)) %>%
dplyr::collect()
df %>% dim()
#> [1] 7 13
df %>% lobstr::obj_size()
#> 3.75 kB
df %>% head(1000) %>% DT::datatable(options = list(pageLength=5,scrollX = TRUE))
df <-
dplyr::tbl(con,table_) %>%
dplyr::filter(atc_concept_id==drug_id_) %>%
dplyr::collect()
df %>% dim()
#> [1] 17360 13
df %>% lobstr::obj_size()
#> 1.90 MB
df %>% head(1000) %>% DT::datatable(options = list(pageLength=5,scrollX = TRUE))
table | field | description | type |
---|---|---|---|
ade_nichd | atc_concept_id | The ATC 5th level OMOP concept identifier. | int |
ade_nichd | meddra_concept_id | The MedDRA preferred term OMOP concept identifier. | int |
ade_nichd | ade | Primary key. This is the unique identifier of an adverse drug event (drug-event). It is a combination of the atc_concept_id and the meddra_concept_id. | character |
ade_nichd | nichd | This is the NICHD-defined child development stage. Defined in https://doi.org/10.1542/peds.2012-0055I. | character |
ade_nichd | gam_score | The risk coefficient from a drug-event GAM, given to each nichd stage. It is the log odds risk of event occurrence given the data as specified in the manuscript. | float |
ade_nichd | norm | The normalized risk coefficient, between 0 and 1, across stages for a drug-event. This preserves the risk trend but constrains the range of the risk scores between 0 and 1. | float |
ade_nichd | gam_score_se | The standard error of the risk coefficient. | float |
ade_nichd | gam_score_90mse | The 90 percent lower bounded risk score using the formula gam_score - (1.645*gam_score_se). | float |
ade_nichd | gam_score_90pse | The 90 percent upper bounded risk score using the formula gam_score + (1.645*gam_score_se). | float |
ade_nichd | D | The number of reports of the drug at the child development stage | int |
ade_nichd | E | The number of reports of the event at the child development stage | int |
ade_nichd | DE | The number of reports of the drug & event at the child development stage | int |
ade_nichd | ade_name | The named identifier of an adverse drug event (drug-event). It is a combination of the atc_concept_name and the meddra_concept_name. | character |
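The two bound formulas in the dictionary can be reproduced directly from `gam_score` and `gam_score_se`. The sketch below applies them to made-up values; the numbers are hypothetical and are not taken from KidSIDES.

```r
# Hypothetical scores and their uncertainty for three stages of one drug-event
gam_score    <- c(0.10, 0.45, 0.80)
gam_score_se <- c(0.20, 0.15, 0.30)

# 1.645 is the multiplier used in the dictionary formulas
# (approximately qnorm(0.95), giving a two-sided 90 percent interval)
z90 <- 1.645

gam_score_90mse <- gam_score - (z90 * gam_score_se)
gam_score_90pse <- gam_score + (z90 * gam_score_se)

# A stage is nominally significant when its lower bound is above 0
nominal <- gam_score_90mse > 0

data.frame(gam_score, gam_score_se, gam_score_90mse, gam_score_90pse, nominal)
```

In this toy example the first stage has a lower bound below 0, so only the second and third stages would count as nominally significant.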
Example 6: Significant drug safety signal class enrichments
Table of drug and event classes for significant drug safety signal enrichment in KidSIDES.
table_ <- "ade_nichd_enrichment"
df <-
tbl(con,table_) %>%
dplyr::filter(
is.na(atc_concept_class_id) &
meddra_concept_class_id=="SOC"
) %>%
dplyr::collect()
df %>% dim()
#> [1] 189 15
df %>% lobstr::obj_size()
#> 25.38 kB
df %>% head(1000) %>% DT::datatable(options = list(pageLength=5,scrollX = TRUE))
iter <-
dplyr::tbl(con,table_) %>%
select(atc_concept_class_id,meddra_concept_class_id) %>%
distinct() %>%
collect() %>%
drop_na()
gt <- purrr::map(1:nrow(iter),~{
df <-
tbl(con,table_) %>%
dplyr::filter(
atc_concept_class_id==!!iter[.x,
"atc_concept_class_id",
drop=T] &
meddra_concept_class_id==!!iter[.x,
"meddra_concept_class_id",
drop=T]
) %>%
dplyr::collect()
size <- df %>%
lobstr::obj_size() %>%
as.numeric()
dplyr::tibble(
"atc_concept_class_id" = iter[.x,
"atc_concept_class_id",
drop=T],
"meddra_concept_class_id" = iter[.x,
"meddra_concept_class_id",
drop=T],
"Object size" = size %>% prettyunits::pretty_bytes()
)
}) %>%
bind_rows() %>%
gt()
gt
atc_concept_class_id | meddra_concept_class_id | Object size |
---|---|---|
ATC1 | SOC | 261.36 kB |
ATC2 | SOC | 985.78 kB |
ATC3 | SOC | 1.54 MB |
ATC4 | SOC | 2.36 MB |
ATC5 | SOC | 3.56 MB |
ATC1 | HLGT | 1.56 MB |
ATC2 | HLGT | 2.89 MB |
ATC3 | HLGT | 3.48 MB |
ATC4 | HLGT | 4.16 MB |
ATC5 | HLGT | 4.74 MB |
ATC1 | HLT | 3.16 MB |
ATC2 | HLT | 4.18 MB |
ATC3 | HLT | 4.55 MB |
ATC4 | HLT | 4.96 MB |
ATC5 | HLT | 5.22 MB |
ATC1 | PT | 3.78 MB |
ATC2 | PT | 3.93 MB |
ATC3 | PT | 4.03 MB |
ATC4 | PT | 4.16 MB |
table | field | description | type |
---|---|---|---|
ade_nichd_enrichment | category | The category of enrichment. Either a MedDRA adverse event class, ATC drug class, or a combination of ATC and MedDRA classes. These categories are included in the manuscript results associated with this database. | character |
ade_nichd_enrichment | atc_concept_name | The ATC concept name. | character |
ade_nichd_enrichment | meddra_concept_name | The MedDRA concept name. | character |
ade_nichd_enrichment | nichd | This is the NICHD-defined child development stage. Defined in https://doi.org/10.1542/peds.2012-0055I. | character |
ade_nichd_enrichment | atc_concept_class_id | The ATC concept class identifier. | character |
ade_nichd_enrichment | meddra_concept_class_id | The MedDRA concept class identifier. | character |
ade_nichd_enrichment | a | The number of significant, by the null model, drug-events in both the stage and ATC/MedDRA concept category. | int |
ade_nichd_enrichment | b | The number of significant, by the null model, drug-events in the stage and not in the ATC/MedDRA concept category. | int |
ade_nichd_enrichment | c | The number of significant, by the null model, drug-events not in the stage but in the ATC/MedDRA concept category. | int |
ade_nichd_enrichment | d | The number of significant, by the null model, drug-events not in the stage and not in the ATC/MedDRA concept category. | int |
ade_nichd_enrichment | lwr | The 95% lower bound of the odds ratio. | float |
ade_nichd_enrichment | odds_ratio | The odds ratio for the category and stage enrichment. | float |
ade_nichd_enrichment | upr | The 95% upper bound of the odds ratio. | float |
ade_nichd_enrichment | pvalue | The p-value from the Fisher exact test. | float |
ade_nichd_enrichment | fdr | The FDR corrected pvalue. | float |
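The fields `a`, `b`, `c` and `d` are the four cells of a 2x2 contingency table, so an odds ratio, interval and p-value of this kind can be sketched with base R. The counts below are hypothetical, and whether the database values were produced exactly this way (for example, with `fisher.test()` defaults) is an assumption rather than something this vignette states.

```r
# Hypothetical counts of significant drug-events
n_a <- 30   # in the stage and in the category
n_b <- 70   # in the stage, not in the category
n_c <- 20   # not in the stage, in the category
n_d <- 180  # not in the stage, not in the category

# matrix() fills column by column: rows are stage, columns are category
counts <- matrix(
  c(n_a, n_c, n_b, n_d),
  nrow = 2,
  dimnames = list(stage = c("in", "out"), category = c("in", "out"))
)

# Sample odds ratio: (a*d)/(b*c)
sample_or <- (n_a * n_d) / (n_b * n_c)

# Fisher exact test; its estimate is a conditional MLE, so it can differ
# slightly from the sample odds ratio
ft <- fisher.test(counts, conf.level = 0.95)
ft$estimate   # analogous to odds_ratio
ft$conf.int   # analogous to lwr and upr
ft$p.value    # analogous to pvalue

# FDR correction across several tests (the other two p-values are made up)
p.adjust(c(ft$p.value, 0.20, 0.03), method = "fdr")
```

Naming the counts `n_a` to `n_d` avoids masking the base function `c()`.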
Example 7: Gene expression across childhood
Table of gene expression across childhood in KidSIDES.
table_ <- "gene_expression"
df <-
dplyr::tbl(con,table_) %>%
#select columns in the database so only these five are pulled into R
dplyr::select(sample,nichd,probe,gene_symbol,prediction) %>%
dplyr::collect()
df %>% dim()
#> [1] 194054 5
df %>% lobstr::obj_size()
#> 7.87 MB
df %>% head(1000) %>% DT::datatable(options = list(pageLength=5,scrollX = TRUE))
table | field | description | type |
---|---|---|---|
gene_expression | sample | The GEO sample identifier used in the GSE datasets. | character |
gene_expression | nichd | This is the NICHD-defined child development stage. Defined in https://doi.org/10.1542/peds.2012-0055I. | character |
gene_expression | probe | The probe identifier on the Affymetrix gene chip. | character |
gene_expression | gene_symbol | The gene symbol identifier from joining the UniProt identifier to the Entrez identifier from the microarray platform database package within Bioconductor. | character |
gene_expression | actual | The sample value from the stage-association GLM. See the manuscript for details. | float |
gene_expression | prediction | The sample predicted value from the stage-association GLM. See the manuscript for details. | float |
gene_expression | residual | The sample residual (actual - predicted) value from the stage-association GLM. See the manuscript for details. | float |
gene_expression | fdr | The F test FDR corrected pvalue. | float |
gene_expression | f_statistic | The F test, as summarized from the glm, statistic. | float |
gene_expression | f_pvalue | The F test, as summarized from the glm, pvalue. | float |
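Once a dataset is small enough, it can be written to disk so later sessions do not need the full database. The sketch below uses base R on a made-up data frame; the file locations are temporary paths chosen for illustration, and the rows are hypothetical.

```r
# Hypothetical extracted dataset standing in for any collected `df`
toy_df <- data.frame(
  sample = c("s1", "s2"),
  nichd = c("infancy", "toddler"),
  prediction = c(1.5, 2.5),
  stringsAsFactors = FALSE
)

# RDS keeps R column types exactly; CSV is portable to other tools
rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")
saveRDS(toy_df, rds_path)
write.csv(toy_df, csv_path, row.names = FALSE)

# Reload in a later session without connecting to the database
reloaded <- readRDS(rds_path)
identical(reloaded, toy_df)
```

RDS is convenient for R-only workflows, while CSV is the safer choice when the dataset will be shared with non-R users.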
Check out the references for dataset details
Detailing the information in each dataset is out of scope for this vignette. For questions, please contact Nick directly by sending an email, posting an issue on GitHub, tooting at Fosstodon, or sending a message via Carrier Pigeon. The best source for more information on the pediatric drug safety data is the Med paper. Hopefully these examples show how to extract manageable datasets for exploring what KidSIDES has to offer!
References
Giangreco, Nicholas. Mind the developmental gap: Identifying adverse drug effects across childhood to evaluate biological mechanisms from growth and development. 2022. Columbia University, PhD dissertation.
Giangreco NP, Tatonetti NP. A database of pediatric drug effects to evaluate ontogenic mechanisms from child growth and development. Med (N Y). 2022 Aug 12;3(8):579-595.e7. doi: 10.1016/j.medj.2022.06.001. Epub 2022 Jun 24. PMID: 35752163; PMCID: PMC9378670.
kidsides::disconnect_sqlite_db(con)