Extracting datasets from KidSIDES
Source: vignettes/articles/extracting_datasets.Rmd
library(tidyverse)
library(gt)
library(DT)
library(kidsides)
download_sqlite_db(force=TRUE) # downloads to the cache; force=TRUE re-downloads even if a copy is already there
con <- kidsides::connect_sqlite_db()
KidSIDES is large!
A ~900 MB database is unwieldy for most users, and several of the 17 tables in the database are large on their own:
KidSIDES table | Table size |
---|---|
ade_raw | 453.95 MB |
ade_nichd | 355.16 MB |
ade | 66.36 MB |
ade_nichd_enrichment | 65.48 MB |
gene_expression | 15.63 MB |
sider | 12.13 MB |
event | 4.41 MB |
ade_null_distribution | 2.48 MB |
drug_gene | 1.12 MB |
drug | 336.13 kB |
grip | 147.34 kB |
gene | 104.05 kB |
ryan | 58.08 kB |
dictionary | 32.76 kB |
cyp_gene_expression_substrate_risk_information | 18.19 kB |
atc_raw_map | 3.31 kB |
ade_null | 1.59 kB |
A subset of the data (up to 10 MB) is more manageable to work with. This vignette gives a non-exhaustive list of manageable datasets that can be extracted from the KidSIDES database.
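The 10 MB guideline can be checked programmatically before saving or sharing an extract. The following is a minimal sketch, not part of the kidsides package: `fits_budget()` is a hypothetical helper name, and it relies on base R's `object.size()`, which may report a somewhat different figure than the `lobstr::obj_size()` calls used later in this vignette.

```r
# Hypothetical helper (not part of kidsides): TRUE when an in-memory object
# occupies at most `max_mb` megabytes (1 MB taken as 1e6 bytes).
fits_budget <- function(x, max_mb = 10) {
  size_mb <- as.numeric(utils::object.size(x)) / 1e6
  size_mb <= max_mb
}

# Toy data frame standing in for a collected KidSIDES extract.
toy_extract <- data.frame(id = 1:1000)
fits_budget(toy_extract)
```

With the real database, `x` would be the result of a `dplyr::collect()` call such as the ones in the examples below.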
Extract smaller datasets by drugs and events
Extracting datasets from KidSIDES requires working with standard vocabularies for drugs and events. Drugs are represented by the Anatomical Therapeutic Chemical (ATC) classification (Reference from the WHO). Events are encoded in the Medical Dictionary for Regulatory Activities (MedDRA) vocabulary (Reference from MedDRA). You can explore these vocabularies to identify drugs and events using the PDSportal, a Shiny application for first identifying drugs and events of interest and then viewing their drug safety signals across childhood.
In this document, some example datasets are extracted using a specific drug and event:
drug_ <-
  tbl(con,"drug") %>%
  filter(atc_concept_name=="montelukast; oral") %>%
  collect() %>%
  pull(atc_concept_name)
drug_id_ <-
  tbl(con,"drug") %>%
  filter(atc_concept_name==drug_) %>%
  collect() %>%
  pull(atc_concept_id)
event_ <-
  tbl(con,"event") %>%
  filter(meddra_concept_name_1=="Suicidal ideation") %>%
  collect() %>%
  pull(meddra_concept_name_1)
event_id_ <-
  tbl(con,"event") %>%
  filter(meddra_concept_name_1==event_) %>%
  collect() %>%
  pull(meddra_concept_id)
drug_
#> [1] "montelukast; oral"
drug_id_
#> [1] 21603356
event_
#> [1] "Suicidal ideation"
event_id_
#> [1] 36919235
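The name-to-identifier lookup above can be wrapped in a small function that also guards against a name matching zero or several concepts. This is only a sketch under stated assumptions: `lookup_concept_id()` is a hypothetical helper, not a kidsides function, and `drug_toy` is a toy stand-in for the collected `drug` table whose second row is a placeholder rather than real KidSIDES data.

```r
# Hypothetical helper (not part of kidsides): return the single identifier
# whose name column exactly matches `name`, and fail loudly otherwise.
lookup_concept_id <- function(df, name_col, id_col, name) {
  ids <- unique(df[[id_col]][which(df[[name_col]] == name)])
  if (length(ids) != 1L) {
    stop("expected exactly one match for '", name, "', found ", length(ids))
  }
  ids
}

# Toy stand-in for the collected drug table. The first row reuses the name and
# identifier shown above; the second row is a placeholder, not a real concept.
drug_toy <- data.frame(
  atc_concept_name = c("montelukast; oral", "placeholder drug"),
  atc_concept_id = c(21603356, -1),
  stringsAsFactors = FALSE
)

lookup_concept_id(drug_toy, "atc_concept_name", "atc_concept_id", "montelukast; oral")
```

With the real database, `df` would be `tbl(con, "drug") %>% collect()` or the equivalent for the `event` table, using `meddra_concept_name_1` and `meddra_concept_id` as the name and identifier columns.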
Example 1: Drug dataset
Table of all drugs reported in KidSIDES, coded with the ATC vocabulary.
table_ <- "drug"
df <-
  dplyr::tbl(con,table_) %>%
  dplyr::collect()
df %>% dim()
#> [1] 1088 12
df %>% lobstr::obj_size()
#> 336.13 kB
df %>% head(1000) %>% DT::datatable(options = list(pageLength = 5, scrollX = TRUE))
table | field | description | type |
---|---|---|---|
drug | atc_concept_id | The ATC 5th level OMOP concept identifier. | int |
drug | atc_concept_name | The ATC 5th level OMOP concept name. In the ade_nichd_enrichment table, this ATC concept is from any level in the hierarchy. | character |
drug | atc_concept_code | The ATC 5th level OMOP concept code. | character |
drug | ndrugreports | The number of reports of the drug in Pediatric FAERS. | int |
drug | atc4_concept_name | The ATC 4th level OMOP concept name. | character |
drug | atc4_concept_code | The ATC 4th level OMOP concept code. | character |
drug | atc3_concept_name | The ATC 3rd level OMOP concept name. | character |
drug | atc3_concept_code | The ATC 3rd level OMOP concept code. | character |
drug | atc2_concept_name | The ATC 2nd level OMOP concept name. | character |
drug | atc2_concept_code | The ATC 2nd level OMOP concept code. | character |
drug | atc1_concept_name | The ATC 1st level OMOP concept name. | character |
drug | atc1_concept_code | The ATC 1st level OMOP concept code. | character |
Example 2: Event dataset
Table of all co-reported events in KidSIDES, coded with the MedDRA vocabulary.
table_ <- "event"
df <-
  dplyr::tbl(con,table_) %>%
  dplyr::collect()
df %>% dim()
#> [1] 16941 22
df %>% lobstr::obj_size()
#> 4.41 MB
df %>% head(1000) %>% DT::datatable(options = list(pageLength = 5, scrollX = TRUE))
table | field | description | type |
---|---|---|---|
event | meddra_concept_name_4 | The MedDRA system organ class concept name. | character |
event | meddra_concept_id | The MedDRA preferred term OMOP concept identifier. | int |
event | neventreports | The number of adverse event reports in Pediatric FAERS. | int |
event | meddra_concept_class_id_1 | The MedDRA preferred term concept class identifier. | character |
event | meddra_concept_class_id_2 | The MedDRA higher level concept class identifier. | character |
event | meddra_concept_class_id_3 | The MedDRA higher level greater term concept class identifier. | character |
event | meddra_concept_class_id_4 | The MedDRA system organ class concept class identifier. | character |
event | meddra_concept_code_1 | The MedDRA preferred term concept code identifier. | character |
event | meddra_concept_code_2 | The MedDRA higher level concept code identifier. | character |
event | meddra_concept_code_3 | The MedDRA higher level greater term concept code identifier. | character |
event | meddra_concept_code_4 | The MedDRA system organ class concept code identifier. | character |
event | meddra_concept_id_2 | The MedDRA higher level concept identifier. | int |
event | meddra_concept_id_3 | The MedDRA higher level greater term concept identifier. | int |
event | meddra_concept_id_4 | The MedDRA system organ class concept identifier. | int |
event | meddra_concept_name_1 | The MedDRA preferred term concept name. Same as 'meddra_concept_name' | character |
event | meddra_concept_name_2 | The MedDRA higher level concept name. | character |
event | meddra_concept_name_3 | The MedDRA higher level greater term concept name. | character |
event | relationship_id_12 | The relationship identifier between columns *1 and *2; should be 'Is a' denoting 1-to-1 mapping. | character |
event | relationship_id_23 | The relationship identifier between columns *2 and *3; should be 'Is a' denoting 1-to-1 mapping. | character |
event | relationship_id_34 | The relationship identifier between columns *3 and *4; should be 'Is a' denoting 1-to-1 mapping. | character |
event | soc_category | The customized category to represent meddra_concept_name_4 events more broadly as used in the manuscript. Developed in consultation with https://admin.new.meddra.org/sites/default/files/guidance/file/intguide_21_0_english.pdf. | character |
event | pediatric_adverse_event | Whether this event concept (meddra concept id) was defined by MedDRA 19th edition vocabulary as a pediatric-specific adverse event. One (1) indicates yes and zero (0) indicates no. The list of events were curated from this site: https://www.meddra.org/paediatric-and-gender-adverse-event-term-lists. | int |
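The numeric suffixes in the event table's column names follow the MedDRA hierarchy in the dictionary above: 1 is the preferred term and 4 is the system organ class. A minimal sketch of a helper that builds a column name for a given field and level is shown here. `meddra_column()` is a hypothetical name, not part of kidsides, and the naming rule is inferred only from the dictionary rows listed above, including the fact that the level-1 identifier column carries no suffix.

```r
# Hypothetical helper (not part of kidsides): build an event-table column name
# for a MedDRA field at hierarchy level 1 (preferred term) to 4 (system organ class).
meddra_column <- function(field = c("id", "name", "code", "class_id"), level = 1L) {
  field <- match.arg(field)
  stopifnot(length(level) == 1L, level %in% 1:4)
  # Per the dictionary, the level-1 identifier column has no numeric suffix.
  if (field == "id" && level == 1L) {
    return("meddra_concept_id")
  }
  paste0("meddra_concept_", field, "_", level)
}

meddra_column("name", 4)  # "meddra_concept_name_4"
meddra_column("id", 1)    # "meddra_concept_id"
```

The returned string could then be used to select columns from the collected table, for example `df[[meddra_column("name", 4)]]` for the system organ class names.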
Example 3: Drug safety report datasets
Table of report characteristics for drug reports with events in KidSIDES.
table_ <- "ade_raw"
# dataset size for the most frequently reported drug
dplyr::tbl(con,table_) %>%
  dplyr::filter(atc_concept_id==21603929) %>%
  dplyr::collect() %>%
  lobstr::obj_size()
#> 9.97 MB
# dataset size for the least frequently reported drug
dplyr::tbl(con,table_) %>%
  dplyr::filter(atc_concept_id==21600407) %>%
  dplyr::collect() %>%
  lobstr::obj_size()
#> 4.21 kB
# datasets using the pre-selected drug and event
# all reports for the pre-selected drug
df <-
  dplyr::tbl(con,table_) %>%
  dplyr::filter(
    atc_concept_id==drug_id_
  ) %>%
  dplyr::collect()
df %>% dim()
#> [1] 28300 23
df %>% lobstr::obj_size()
#> 5.69 MB
df %>% head(1000) %>% DT::datatable(options = list(pageLength = 5, scrollX = TRUE))
# reports for the pre-selected drug-event pair only
df <-
  dplyr::tbl(con,table_) %>%
  dplyr::filter(
    atc_concept_id==drug_id_ &
      meddra_concept_id==event_id_
  ) %>%
  dplyr::collect()
df %>% dim()
#> [1] 505 23
df %>% lobstr::obj_size()
#> 147.40 kB
df %>% head(1000) %>% DT::datatable(options = list(pageLength = 5, scrollX = TRUE))
table | field | description | type |
---|---|---|---|
ade_raw | safetyreportid | The unique identifier for the report. | character |
ade_raw | ade | Primary key. This is the unique identifier of an adverse drug event (drug-event). It is a combination of the atc_concept_id and the meddra_concept_id. | character |
ade_raw | atc_concept_id | The ATC 5th level OMOP concept identifier. | int |
ade_raw | meddra_concept_id | The MedDRA preferred term OMOP concept identifier. In the event table, this would be equivalent in 'meddra_concept_id_1'. | int |
ade_raw | nichd | This is the NICHD-defined child development stage. Defined in https://doi.org/10.1542/peds.2012-0055I. | character |
ade_raw | sex | The reported sex. | character |
ade_raw | reporter_qualification | The type of reporter. | character |
ade_raw | receive_date | The date the report was first submitted. | date |
ade_raw | XA | GAM covariate name for the ATC 1st level concept name 'ALIMENTARY TRACT AND METABOLISM' | float |
ade_raw | XB | GAM covariate name for the ATC 1st level concept name 'BLOOD AND BLOOD FORMING ORGANS' | float |
ade_raw | XC | GAM covariate name for the ATC 1st level concept name 'CARDIOVASCULAR SYSTEM' | float |
ade_raw | XD | GAM covariate name for the ATC 1st level concept name 'DERMATOLOGICALS' | float |
ade_raw | XG | GAM covariate name for the ATC 1st level concept name 'GENITO URINARY SYSTEM AND SEX HORMONES' | float |
ade_raw | XH | GAM covariate name for the ATC 1st level concept name 'SYSTEMIC HORMONAL PREPARATIONS, EXCL. SEX HORMONES AND INSULINS' | float |
ade_raw | XJ | GAM covariate name for the ATC 1st level concept name 'ANTIINFECTIVES FOR SYSTEMIC USE' | float |
ade_raw | XL | GAM covariate name for the ATC 1st level concept name 'ANTINEOPLASTIC AND IMMUNOMODULATING AGENTS' | float |
ade_raw | XM | GAM covariate name for the ATC 1st level concept name 'MUSCULO-SKELETAL SYSTEM' | float |
ade_raw | XN | GAM covariate name for the ATC 1st level concept name 'NERVOUS SYSTEM' | float |
ade_raw | XP | GAM covariate name for the ATC 1st level concept name 'ANTIPARASITIC PRODUCTS, INSECTICIDES AND REPELLENTS' | float |
ade_raw | XR | GAM covariate name for the ATC 1st level concept name 'RESPIRATORY SYSTEM' | float |
ade_raw | XS | GAM covariate name for the ATC 1st level concept name 'SENSORY ORGANS' | float |
ade_raw | XV | GAM covariate name for the ATC 1st level concept name 'VARIOUS' | float |
ade_raw | polypharmacy | The number of drugs reported. | int |
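The `X*` covariate columns in the dictionary above are named by prefixing the ATC 1st-level letter with `X`. The sketch below transcribes that mapping from the dictionary rows into a named vector and adds two small helpers for translating between codes, column names, and labels. The helper names are hypothetical and not part of kidsides.

```r
# ATC 1st-level codes with a GAM covariate column in ade_raw,
# transcribed from the dictionary rows above.
atc1_labels <- c(
  A = "ALIMENTARY TRACT AND METABOLISM",
  B = "BLOOD AND BLOOD FORMING ORGANS",
  C = "CARDIOVASCULAR SYSTEM",
  D = "DERMATOLOGICALS",
  G = "GENITO URINARY SYSTEM AND SEX HORMONES",
  H = "SYSTEMIC HORMONAL PREPARATIONS, EXCL. SEX HORMONES AND INSULINS",
  J = "ANTIINFECTIVES FOR SYSTEMIC USE",
  L = "ANTINEOPLASTIC AND IMMUNOMODULATING AGENTS",
  M = "MUSCULO-SKELETAL SYSTEM",
  N = "NERVOUS SYSTEM",
  P = "ANTIPARASITIC PRODUCTS, INSECTICIDES AND REPELLENTS",
  R = "RESPIRATORY SYSTEM",
  S = "SENSORY ORGANS",
  V = "VARIOUS"
)

# Hypothetical helper: covariate column name for an ATC 1st-level code.
atc1_to_covariate <- function(code) {
  stopifnot(all(code %in% names(atc1_labels)))
  paste0("X", code)
}

# Hypothetical helper: ATC 1st-level label for a covariate column name.
covariate_to_label <- function(column) {
  unname(atc1_labels[sub("^X", "", column)])
}

atc1_to_covariate("R")    # "XR"
covariate_to_label("XN")  # "NERVOUS SYSTEM"
```

Keeping the mapping in one named vector makes it straightforward to relabel the covariate columns when summarizing or plotting a collected `ade_raw` extract.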
Example 4: Pediatric drug safety signal dataset
Table of drug-event observations, including signal characteristics, in KidSIDES.
table_ <- "ade"
# significant signals only
df <-
  dplyr::tbl(con,table_) %>%
  dplyr::filter(gt_null_99==1) %>%
  dplyr::collect()
df %>% dim()
#> [1] 19438 9
df %>% lobstr::obj_size()
#> 2.81 MB
df %>% head(1000) %>% DT::datatable(options = list(pageLength = 5, scrollX = TRUE))
table | field | description | type |
---|---|---|---|
ade | ade | Primary key. This is the unique identifier of an adverse drug event (drug-event). It is a combination of the atc_concept_id and the meddra_concept_id. | character |
ade | atc_concept_id | The ATC 5th level OMOP concept identifier. | int |
ade | meddra_concept_id | The MedDRA preferred term OMOP concept identifier. In the event table, this would be equivalent in 'meddra_concept_id_1'. | int |
ade | cluster_id | The identifier for the cluster group assigned to a drug-event by our data-driven clustering approach. See the manuscript's methods for details. | character |
ade | gt_null_statistic | The boolean value indicating whether at least one stage's score was greater than nominal significance (the 90 percent confidence interval was above 0). | float |
ade | gt_null_99 | The boolean value indicating whether at least one stage's score was greater than significance by the null model, as referenced in the paper (the score was greater than the 99th percentile of the null distribution of randomly co-reported drugs and events). | float |
ade | max_score_nichd | The child development stage that had the highest risk score for the drug-event. | float |
ade | cluster_name | The dynamics name given to the identifier of a cluster group. This is descriptive of the risk trend across stages, from birth through adolescence. | character |
ade | ade_nreports | The number of reports of the drug and event co-occurring. | character |
Example 5: Pediatric drug safety signal time series dataset
Table of drug safety signals across childhood in KidSIDES.