R: Convert Docker stats output into tabular form

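The docker stats command reports each value as a string with its unit attached (26.67MiB), and most values as a pair separated by a slash (26.67MiB / 5.651GiB), so the output has to be parsed before it can be analysed numerically.

You can choose the output format with the --format option, so it can be convenient to poll the command periodically and append each reading as one line of JSON (jsonl):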

{"cpu": "68.56%", "memory": "26.67MiB / 5.651GiB", "netIO": "0B / 0B", "blockIO": "0B / 0B"}
{"cpu": "0.25%", "memory": "26.67MiB / 5.651GiB", "netIO": "0B / 0B", "blockIO": "0B / 0B"}
{"cpu": "0.30%", "memory": "26.51MiB / 5.651GiB", "netIO": "0B / 0B", "blockIO": "0B / 0B"}
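
Lines like these can be collected from R itself. The following is a minimal sketch, not the method used to produce the sample above: the function name poll_docker_stats and its arguments are hypothetical, and it assumes a Unix-like shell and the docker CLI on the PATH. The template placeholders (.CPUPerc, .MemUsage, .NetIO, .BlockIO) are the ones docker stats accepts in --format.

```r
# Go template passed to `docker stats --format`; keys match the sample lines.
stats_format <- paste0(
  '{"cpu": "{{.CPUPerc}}", "memory": "{{.MemUsage}}", ',
  '"netIO": "{{.NetIO}}", "blockIO": "{{.BlockIO}}"}'
)

# Poll `docker stats` n times and append one JSON line per poll to out_file.
# `run` is a parameter so the loop can be tried without Docker installed.
poll_docker_stats <- function(container, out_file, n = 3, interval = 1,
                              run = function(args) system2("docker", args, stdout = TRUE)) {
  # --no-stream makes docker print a single reading and exit
  args <- c("stats", "--no-stream", "--format", shQuote(stats_format), container)
  for (i in seq_len(n)) {
    line <- run(args)
    cat(line, file = out_file, sep = "\n", append = TRUE)
    if (i < n) Sys.sleep(interval)
  }
  invisible(out_file)
}
```

A call such as poll_docker_stats("my_container", "stats.jsonl", n = 10, interval = 5) would append ten readings, five seconds apart; the container and file names here are placeholders.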

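For this jsonl format, the following R code converts each file into a data frame with numeric columns. It expects files to be a vector of file names and folder to be the directory that contains them: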

library(dplyr)
library(purrr)
library(jsonlite)
library(tidyr)
library(measurements)
library(stringr)

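# Convert a single size string such as "26.67MiB" or "1.2kB" to bytes.
convert_to_bytes <- function(value) {
    digits <- gsub("[^0-9.]", "", value)
    units <- gsub("[0-9.]", "", value)
    # docker stats prints memory in binary units (KiB, MiB, GiB)
    # and network/block I/O in decimal units (kB, MB, GB).
    multiplier <- switch (units,
                          'TiB' = 1024^4,
                          'GiB' = 1024^3,
                          'MiB' = 1024^2,
                          'KiB' = 1024,
                          'TB' = 1e12,
                          'GB' = 1e9,
                          'MB' = 1e6,
                          'kB' = 1e3,
                          'B' = 1,
                          NA_real_)  # unknown unit: return NA rather than NULL
    
    as.double(digits) * multiplier
}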

# One data frame per file; `files` and `folder` must be defined beforehand.
result <- file.path(folder, files) %>%
    map(function(file) { 
        # Each line is one JSON object; parse it into a named character vector
        lines <- readLines(file)
        lines <- lapply(lines, fromJSON)
        lines <- lapply(lines, unlist)
        stats <- bind_rows(lines)
        # Skip the conversion for files without the expected columns
        if (any(names(stats) == 'cpu')) {
            stats$index <- seq_len(nrow(stats))
            # docker stats reports memory as usage / limit,
            # netIO as received / sent and blockIO as read / written
            stats <- stats %>% 
                mutate(cpu = as.numeric(gsub("%", "", cpu))) %>%
                separate(col = memory, into = c("memory_timepoint", "max_memory"), sep = " / ") %>%
                separate(col = netIO, into = c("net_io_timeout", "max_net_io"), sep = " / ") %>%
                separate(col = blockIO, into = c("block_io_timepoint", "max_block_io"), sep = " / ") %>%
                # convert_to_bytes handles one value at a time, hence rowwise()
                rowwise() %>%
                mutate(memory_timepoint = convert_to_bytes(memory_timepoint)) %>%
                mutate(max_memory = convert_to_bytes(max_memory)) %>%
                mutate(net_io_timeout = convert_to_bytes(net_io_timeout)) %>%
                mutate(max_net_io = convert_to_bytes(max_net_io)) %>%
                mutate(block_io_timepoint = convert_to_bytes(block_io_timepoint)) %>%
                mutate(max_block_io = convert_to_bytes(max_block_io))
        }
        stats
    })
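
The rowwise() step is there because switch() only looks at one value at a time. A possible alternative, sketched below, replaces the switch with a named lookup vector so a whole column can be converted in one call. The names unit_multipliers and convert_to_bytes_vec are made up for this sketch; the decimal units (kB, MB, GB, TB) are included because docker stats uses them for network and block I/O.

```r
# Multipliers for the unit suffixes docker stats prints:
# binary units for memory, decimal units for network and block I/O.
unit_multipliers <- c(
  B = 1,
  KiB = 1024, MiB = 1024^2, GiB = 1024^3, TiB = 1024^4,
  kB = 1e3, MB = 1e6, GB = 1e9, TB = 1e12
)

# Vectorised conversion: accepts a character vector of size strings.
convert_to_bytes_vec <- function(value) {
  digits <- as.double(gsub("[^0-9.]", "", value))
  units <- gsub("[0-9. ]", "", value)
  # Units missing from the lookup index to NA, so they come back as NA
  unname(digits * unit_multipliers[units])
}

convert_to_bytes_vec(c("26.67MiB", "5.651GiB", "0B"))
```

With a function like this, the six conversions could be written as plain mutate() calls without rowwise(), at the cost of silently returning NA for any unit that is not in the table.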


Sample output for one of the files (the third element of result):

[[3]]
# A tibble: 3 × 8
# Rowwise: 
    cpu memory_timepoint  max_memory net_io_timeout max_net_io block_io_timepoint max_block_io index
  <dbl>            <dbl>       <dbl>          <dbl>      <dbl>              <dbl>        <dbl> <int>
1 68.6         27965522. 6067715047.              0          0                  0            0     1
2  0.25        27965522. 6067715047.              0          0                  0            0     2
3  0.3         27797750. 6067715047.              0          0                  0            0     3
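
Because result is a list with one tibble per file, it can be handy to stack the pieces into a single data frame before summarising. The sketch below uses a small made-up list (result_example, with hypothetical file names and only three of the columns) in place of the real result; ungroup() is applied first as a precaution, since the real tibbles are rowwise.

```r
library(dplyr)

# Stand-in for `result`: made-up values, named by hypothetical file names
result_example <- list(
  "a.jsonl" = tibble(cpu = c(68.56, 0.25),
                     memory_timepoint = c(27965522, 27965522),
                     index = 1:2),
  "b.jsonl" = tibble(cpu = 0.30,
                     memory_timepoint = 27797750,
                     index = 1L)
)

# Drop any rowwise grouping, then stack; .id records which file a row came from
combined <- bind_rows(lapply(result_example, ungroup), .id = "file")

# Per-file summary: mean CPU percentage and peak memory in MiB
summary_by_file <- combined %>%
  group_by(file) %>%
  summarise(mean_cpu = mean(cpu),
            peak_memory_mib = max(memory_timepoint) / 1024^2)
```

If the list has no names, bind_rows() fills the file column with the position of each element instead.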
