Documentation
  1. Database
  2. Lab results importation example
  • Introduction
    • Purpose of this documentation
    • Contribute to this documentation
  • Database
    • Database structure
    • Lab results importation example
    • Database query
  • Data integration [FR]
    • Recensement des projets
    • Recensement des sites
    • Consolidation de la nomenclature des analytes
    • Description des contaminants
    • Déterminer la taille des échantillons par site
    • Injections des données

On this page

  • Data importation procedure
  • Database backup
  • Loading library dependancy
  • Loading the data report within the R environment
  • Initialize the database connection
  • Data injection
    • Report and project metadata importation
    • Data preparation
    • Site importation
  • Species importation
  • Lab measurements injection
    • Field samples
    • Lab samples
    • Lab field samples
    • Lab measurements
  1. Database
  2. Lab results importation example

Lab results importation example

Author

Steve Vissault

Published

April 29, 2024

Data importation procedure

To maintain the referential integrity of the data, the data injection must be done in a certain order. For example, all sites must be entered first before the associated samples can be declared.

Here is the expected injection order according to the database structure:

  1. Projects
  2. Reports
  3. Sites
  4. Species
  5. Field samples
  6. Lab samples
  7. Lab measurement

All the data intended to populate the tables was consolidated during the previous steps. We will now proceed with importing them into the database.

This document allows the import of laboratory measurements for the MET-Thg-23-19 report.

Database backup

The first step is to ensure that we can restore the database if the import introduces errors into the database.

db_path <- "Z:/07-Données BD/Database/contaminants-rlavoie-eccc.sqlite"

db_bak <- stringr::str_replace(
    db_path, 
    pattern = ".sqlite", 
    replacement = glue::glue("-{format(Sys.Date(), '%Y%m%d')}.bak.sqlite")
)

file.copy(db_path, db_bak)

Loading library dependancy

if(!require(devtools)) install.packages("devtools")
if(!require(toxbox)) devtools::install_github("ECCC-lavoie-ecotox/toxbox.git")

Loading the data report within the R environment

target_path_lab_results <- "Z:/03-Labo/Results Reports/À entrer dans la BD/MET-THg-23-19 - RALA01-2023.xlsx"

# Informations sur les échantillons
sampleInfoLab <- readxl::read_xlsx(target_path_lab_results, "SampleInfo")

# Informations sur les mesures
measurementsLab <- readxl::read_xlsx(target_path_lab_results, "SampleData") 

Initialize the database connection

con <- toxbox::init_con()

Data injection

Report and project metadata importation

report <- data.frame(
    id_report = "MET-Thg-23-19",
    id_project = "RALA01-2023",
    report_date = "2024-07-31",
    report_access_path = stringr::str_replace("Z:/03-Labo/Results Reports/À entrer dans la BD/MET-THg-23-19 - RALA01-2023.xlsx", "À entrer dans la BD", "Saisie dans la BD")
)

We first validate if the project ID already exists in the database.

toxbox::search_tbl(con, "project", id_project = "RALA01-2023")

It already exists in the database. Therefore, we can import the report metadata that will be attached to this project.

DBI::dbWriteTable(con, "report", report, append = TRUE)

Data preparation

Gather field sample informations

field_sample <- sampleInfoLab |> dplyr::select(
    id_lab_sample = SampleID,
    id_field_sample = ClientID,
    collection_date = CollectionDate,
    id_site = Location,
    id_species = Species,
    tissue = Tissue,
    age = Age
)

Gather lab meaurements

measurements <- measurementsLab |> dplyr::select(
    id_lab_sample = SampleID,
    id_field_sample = ClientID,
    pmoisture = `% Moisture`,
    value = `Total Mercury (µg/g (dry))`
) |> dplyr::mutate(
    id_analyte = "thg_dw"
)

Site importation

We must establish correspondance with existing sites in database and create the ones that are not already present in the database.

First, we isolate the sites present in the results sent by the laboratory.

(sitesLab <- dplyr::distinct(sampleInfoLab, Location, Latitude, Longitude))

We search the database for the existing sites using keywords.

betchouanes_site_db <- toxbox::search_tbl(con, "sites", id_site = "%betchouanes%")
mingan_site_db <- toxbox::search_tbl(con, "sites", id_site = "%mingan%")

After searching the database for sites, we can see that the Betchouanes site is already recorded in the database, but the Longue-Pointe-de-Mingan site is not. Therefore, we add this site.

add_site <- sitesLab |> 
    dplyr::filter(Location == "Longue-Pointe-de-Mingan") |> 
    dplyr::select(
        id_site = Location,
        lat = Latitude,
        lon = Longitude
    ) |>
    dplyr::mutate(
        province = 'Québec',
        srid = 4326
    )

DBI::dbWriteTable(con, "sites", add_site, append = TRUE)

We verify that the site has indeed been added to the database.

toxbox::search_tbl(con, "sites", id_site = "Longue-Pointe-de-Mingan")
field_sample <- field_sample |>
    dplyr::mutate(
        id_site = dplyr::case_when(
            stringr::str_detect(id_site, "Ile a Calculot des Betchouanes") == TRUE ~ betchouanes_site_db$id_site,
            .default = id_site
        )
    )

Species importation

We list the species, starting with those present in the laboratory results.

unique(field_sample$id_species)

We compare it with the list of species present in the database.

DBI::dbReadTable(con, "species")

We declare a data.frame with the missing species

new_species <- tibble::tribble(
   ~org_species, ~id_species, ~organism, ~genus, ~species, ~vernacular_fr, ~vernacular_en,
  "Alose sp.", "SHSP", "Fish", "Alosa",   "Alosa sp.", "Alose", "Shad",
  "Gadidae sp.", "COSP", "Fish", "Gadidae", "Gadidae sp.", "Morue", "Cod",
  "Lompenie tachete", "DASH", "Fish", "Gadidae", "Leptoclinus maculatus", "Lompénie tachetée", "Daubed shanny"
)

We add those missing species in the database

DBI::dbWriteTable(con, "species", dplyr::select(new_species, -org_species)
, append = TRUE)

We visually confirm that the new species are indeed present in the database.

DBI::dbReadTable(con, "species")

We match the species codes from the laboratory report with those in the database.

field_sample <- field_sample |> dplyr::mutate(
    id_species = dplyr::case_when(
        id_species == "Alose sp." ~ "SHSP",
        id_species == "Capelan" ~ "CAPE",
        id_species == "Gadidae sp." ~ "COSP",
        id_species == "Lancon" ~ "SAND",
        id_species == "Lompenie tachete" ~ "DASH",
        .default = NA
    )
) |> dplyr::filter(!is.na(id_species))

Lab measurements injection

The previous steps have updated the reference tables for sites and species.

We can now proceed to import the data on samples and measurements.

Field samples

dplyr::select(field_sample, id_field_sample, collection_date, id_site, id_species, age, tissue) |> 
    dplyr::distinct() |> 
    (\(data) DBI::dbWriteTable(con, "field_sample", data, append = TRUE))()

Lab samples

dplyr::select(field_sample, id_lab_sample) |> 
    dplyr::distinct() |> 
    (\(data) DBI::dbWriteTable(con, "lab_sample", data, append = TRUE))()

Lab field samples

This table links the laboratory samples with the field samples. This join table is necessary as lab samples can hold several field sample (often the case with eggs).

dplyr::select(field_sample, id_lab_sample, id_field_sample) |> 
    dplyr::distinct() |> 
    (\(data) DBI::dbWriteTable(con, "lab_field_sample", data, append = TRUE))()

Lab measurements

Finaly, all the parent table (field_sample, lab_sample, lab_field_sample) have been recorded in the database. We can now import the lab measurements record. We have to manualy declare the MDL/LOD value in order to assess if the measurement value is cencored or not.

# Set method detection limit
MDL <- 0.0001

dplyr::select(measurements, id_lab_sample, id_analyte, value, percent_moisture = pmoisture) |> 
    # The line below assess if the value is censored or not
    dplyr::mutate(is_censored = ifelse(value > MDL, 0, 1)) |>
    dplyr::mutate(id_report = report$id_report) |> 
    (\(data) DBI::dbWriteTable(con, "lab_measurement", data, append = TRUE))()
Source Code
---
title: "Lab results importation example"
author: "Steve Vissault"
date: "2024-04-29"
execute: 
  eval: false
---

## Data importation procedure

To maintain the referential integrity of the data, the data injection must be done in a certain order. For example, all sites must be entered first before the associated samples can be declared.

Here is the expected injection order according to the database structure:

1. Projects
2. Reports
5. Sites
6. Species
7. Field samples
8. Lab samples
9. Lab measurement

All the data intended to populate the tables was consolidated during the previous steps. We will now proceed with importing them into the database.

This document allows the import of laboratory measurements for the MET-Thg-23-19 report.

## Database backup

The first step is to ensure that we can restore the database if the import introduces errors into the database.

```{r}
db_path <- "Z:/07-Données BD/Database/contaminants-rlavoie-eccc.sqlite"

db_bak <- stringr::str_replace(
    db_path, 
    pattern = ".sqlite", 
    replacement = glue::glue("-{format(Sys.Date(), '%Y%m%d')}.bak.sqlite")
)

file.copy(db_path, db_bak)
```

## Loading library dependancy

```{r}
if(!require(devtools)) install.packages("devtools")
if(!require(toxbox)) devtools::install_github("ECCC-lavoie-ecotox/toxbox.git")
```

## Loading the data report within the R environment

```{r}
target_path_lab_results <- "Z:/03-Labo/Results Reports/À entrer dans la BD/MET-THg-23-19 - RALA01-2023.xlsx"

# Informations sur les échantillons
sampleInfoLab <- readxl::read_xlsx(target_path_lab_results, "SampleInfo")

# Informations sur les mesures
measurementsLab <- readxl::read_xlsx(target_path_lab_results, "SampleData") 
```

## Initialize the database connection

```{r}
con <- toxbox::init_con()
```

## Data injection

### Report and project metadata importation

```{r}
report <- data.frame(
    id_report = "MET-Thg-23-19",
    id_project = "RALA01-2023",
    report_date = "2024-07-31",
    report_access_path = stringr::str_replace("Z:/03-Labo/Results Reports/À entrer dans la BD/MET-THg-23-19 - RALA01-2023.xlsx", "À entrer dans la BD", "Saisie dans la BD")
)
```

We first validate if the project ID already exists in the database.  

```{r}
toxbox::search_tbl(con, "project", id_project = "RALA01-2023")
```

It already exists in the database. Therefore, we can import the report metadata that will be attached to this project.
```{r}
DBI::dbWriteTable(con, "report", report, append = TRUE)
```

### Data preparation

#### Gather field sample informations

```{r}
field_sample <- sampleInfoLab |> dplyr::select(
    id_lab_sample = SampleID,
    id_field_sample = ClientID,
    collection_date = CollectionDate,
    id_site = Location,
    id_species = Species,
    tissue = Tissue,
    age = Age
)
```

#### Gather lab meaurements

```{r}
measurements <- measurementsLab |> dplyr::select(
    id_lab_sample = SampleID,
    id_field_sample = ClientID,
    pmoisture = `% Moisture`,
    value = `Total Mercury (µg/g (dry))`
) |> dplyr::mutate(
    id_analyte = "thg_dw"
)
```

### Site importation

We must establish correspondance with existing sites in database and create the ones that are not already present in the database.

First, we isolate the sites present in the results sent by the laboratory.

```{r}
(sitesLab <- dplyr::distinct(sampleInfoLab, Location, Latitude, Longitude))
```

We search the database for the existing sites using keywords.

```{r}
betchouanes_site_db <- toxbox::search_tbl(con, "sites", id_site = "%betchouanes%")
mingan_site_db <- toxbox::search_tbl(con, "sites", id_site = "%mingan%")
```

After searching the database for sites, we can see that the Betchouanes site is already recorded in the database, but the Longue-Pointe-de-Mingan site is not. Therefore, we add this site.

```{r}
add_site <- sitesLab |> 
    dplyr::filter(Location == "Longue-Pointe-de-Mingan") |> 
    dplyr::select(
        id_site = Location,
        lat = Latitude,
        lon = Longitude
    ) |>
    dplyr::mutate(
        province = 'Québec',
        srid = 4326
    )

DBI::dbWriteTable(con, "sites", add_site, append = TRUE)
```

We verify that the site has indeed been added to the database.

```{r}
toxbox::search_tbl(con, "sites", id_site = "Longue-Pointe-de-Mingan")
```

```{r}
field_sample <- field_sample |>
    dplyr::mutate(
        id_site = dplyr::case_when(
            stringr::str_detect(id_site, "Ile a Calculot des Betchouanes") == TRUE ~ betchouanes_site_db$id_site,
            .default = id_site
        )
    )
``` 

## Species importation 

We list the species, starting with those present in the laboratory results.

```{r}
unique(field_sample$id_species)
```

We compare it with the list of species present in the database.

```{r}
DBI::dbReadTable(con, "species")
```

We declare a `data.frame` with the missing species

```{r}
new_species <- tibble::tribble(
   ~org_species, ~id_species, ~organism, ~genus, ~species, ~vernacular_fr, ~vernacular_en,
  "Alose sp.", "SHSP", "Fish", "Alosa",   "Alosa sp.", "Alose", "Shad",
  "Gadidae sp.", "COSP", "Fish", "Gadidae", "Gadidae sp.", "Morue", "Cod",
  "Lompenie tachete", "DASH", "Fish", "Gadidae", "Leptoclinus maculatus", "Lompénie tachetée", "Daubed shanny"
)
```

We add those missing species in the database

```{r}
DBI::dbWriteTable(con, "species", dplyr::select(new_species, -org_species)
, append = TRUE)
```

We visually confirm that the new species are indeed present in the database.

```{r}
DBI::dbReadTable(con, "species")
```

We match the species codes from the laboratory report with those in the database. 

```{r}
field_sample <- field_sample |> dplyr::mutate(
    id_species = dplyr::case_when(
        id_species == "Alose sp." ~ "SHSP",
        id_species == "Capelan" ~ "CAPE",
        id_species == "Gadidae sp." ~ "COSP",
        id_species == "Lancon" ~ "SAND",
        id_species == "Lompenie tachete" ~ "DASH",
        .default = NA
    )
) |> dplyr::filter(!is.na(id_species))
```

## Lab measurements injection

The previous steps have updated the reference tables for sites and species.

We can now proceed to import the data on samples and measurements.

### Field samples 

```{r}
dplyr::select(field_sample, id_field_sample, collection_date, id_site, id_species, age, tissue) |> 
    dplyr::distinct() |> 
    (\(data) DBI::dbWriteTable(con, "field_sample", data, append = TRUE))()
```

### Lab samples

```{R}
dplyr::select(field_sample, id_lab_sample) |> 
    dplyr::distinct() |> 
    (\(data) DBI::dbWriteTable(con, "lab_sample", data, append = TRUE))()
```

### Lab field samples

This table links the laboratory samples with the field samples. This join table is necessary as lab samples can hold several field sample (often the case with eggs).

```{R}
dplyr::select(field_sample, id_lab_sample, id_field_sample) |> 
    dplyr::distinct() |> 
    (\(data) DBI::dbWriteTable(con, "lab_field_sample", data, append = TRUE))()
```

### Lab measurements

Finaly, all the parent table (`field_sample`, `lab_sample`, `lab_field_sample`) have been recorded in the database. We can now import the lab measurements record. We have to manualy declare the MDL/LOD value in order to assess if the measurement value is cencored or not.

```{R}
# Set method detection limit
MDL <- 0.0001

dplyr::select(measurements, id_lab_sample, id_analyte, value, percent_moisture = pmoisture) |> 
    # The line below assess if the value is censored or not
    dplyr::mutate(is_censored = ifelse(value > MDL, 0, 1)) |>
    dplyr::mutate(id_report = report$id_report) |> 
    (\(data) DBI::dbWriteTable(con, "lab_measurement", data, append = TRUE))()
```