cgmanalysis: An R package for descriptive analysis of continuous glucose monitor data

Authors: Tim Vigers (1); Christine L. Chan (1); Janet Snell-Bergeon (2); Petter Bjornstad (1); Philip S. Zeitler (1); Gregory Forlenza (2); Laura Pyle (1)
Affiliations: (1) Section of Pediatric Endocrinology, University of Colorado School of Medicine, Aurora, Colorado, United States of America; (2) Barbara Davis Center, University of Colorado School of Medicine, Aurora, Colorado, United States of America; (3) Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, Colorado, United States of America
Published in the journal: PLoS ONE 14(10)
Category: Research Article
doi: https://doi.org/10.1371/journal.pone.0216851

Summary

Continuous glucose monitoring (CGM) is an essential part of diabetes care. Real-time CGM data help patients with daily glucose management, and aggregate summary statistics of CGM measures are valuable both for guiding insulin dosing and as research tools in clinical trials. Yet the various commercial systems still report CGM data in disparate, non-standard ways. Accordingly, there is a need for a standardized, free, open-source approach to CGM data management and analysis. A package titled cgmanalysis was developed in the free programming language R to provide a rapid, easy, and consistent methodology for CGM data management, summary measure calculation, and descriptive analysis. Variables calculated by our package compare well with those generated by proprietary CGM software, and our functions provide clinicians and researchers with a more comprehensive set of summary measures. Consistent handling of CGM data using our R package may facilitate collaboration between research groups and contribute to a better understanding of free-living glucose patterns.

Keywords:

Data management – Glucose – Blood sugar – Software tools – Open source software – Programming languages

Introduction

Continuous glucose monitoring (CGM) technology has transformed diabetes care over the past 15 years by allowing clinicians to measure free-living glucose patterns. During this period, CGM use has increased from < 5% of patients to almost 50% in some age groups [1]. With recent reports showing that CGM time-in-range metrics predict long-term vascular outcomes [2] and that CGM data can serve as an indicator of glucose management by estimating hemoglobin A1c (HbA1c) [3], CGM use will likely continue to increase in both research and clinical settings. Despite the increasing use of CGM for treatment and research, a standardized, free, open-source approach to data management and analysis is lacking [4].

CGM manufacturers use proprietary algorithms to create reports and calculate summary measures for patients and clinicians. As a result, it may be difficult to compare results obtained using different CGM devices and to understand the sources of variability that could influence CGM outcomes. In addition, research questions may require summary measures that are not available in the accompanying reports (e.g., a different cut-point for hyperglycemia). Furthermore, using the summary values provided by each CGM platform sometimes requires that data be entered by hand into a database or spreadsheet before analysis, a time-consuming and error-prone process that would benefit from automation. A free and open-source program that summarizes raw sensor glucose values would enable researchers to define their own variables of interest and to standardize the calculation of summary measures across different CGM devices.

There have already been a few attempts to develop such systems, including the EasyGV macro-enabled Excel workbook [5], AGP Report (agpreport.org), and Tidepool (tidepool.org). However, there are reports that EasyGV agrees poorly with other calculators of mean amplitude of glycemic excursion (MAGE) [6], and it does not permit alternative definitions of a significant excursion (e.g., greater than 1 standard deviation (SD), 2 SDs, etc.). Although Tidepool appears to be an excellent option for patients and clinicians, it is not free for research use, and many smaller investigator-initiated studies cannot afford the additional expense. Also, its open-source code requires substantial coding knowledge in multiple programming languages, which limits accessibility and widespread use. Finally, Zhang et al. [7] released the CGManalyzer package for R; however, the package was removed from the CRAN repository because problems with the software were not corrected.

To address this need, we have developed a package written entirely in the statistical programming language R (R Foundation for Statistical Computing, Vienna, Austria). R software is free and can be obtained at: https://www.r-project.org/. The package currently works with data from Diasend (www.diasend.com), Dexcom (www.dexcom.com), iPro 2 (http://professional.medtronicdiabetes.com/ipro2-professional-cgm), Libre (www.freestylelibre.us), and Carelink (www.medtronicdiabetes.com/products/carelink-personal-diabetes-software), with plans to add support for other platforms as CGM technology advances. Additionally, data can be manually formatted to work with these functions if necessary. The package is available on The Comprehensive R Archive Network (CRAN) under the name ‘cgmanalysis’ (https://cran.r-project.org/web/packages/cgmanalysis/index.html) and the source code can be found at https://github.com/childhealthbiostatscore/R-Packages, which allows for version control and forking if users need to modify the code to alter functionality. A short user guide (https://github.com/childhealthbiostatscore/R-Packages/blob/master/CGM%20Analysis/cgmanalysis%20New-User%20Guide.docx) explains how to install and run the software.

Summary measures of glycemia

Although CGM is not a new technology, there is still debate regarding the advantages and disadvantages of various CGM metrics for use in clinical care and as research outcomes. The American Diabetes Association (ADA) recently proposed a set of key metrics for reporting CGM data [8], all of which are calculated by our code, as are the glucose management indicator (GMI) [3], time in range [2], and other variables proposed by Hernandez et al. [4]. An easy way to calculate these important summary variables from a variety of CGM data sources could help standardize the use of these metrics. A list of summary variables produced by our default code is available in Table 1. Additional variables of interest can easily be added to the code and will be released in future version updates. Further, because the package is open source, individual users can create their own modifications.
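As a rough illustration of how two of these measures follow from raw sensor values, the base R sketch below computes GMI from the published formula GMI (%) = 3.31 + 0.02392 × mean glucose in mg/dL [3], and percent time in range using the consensus 70–180 mg/dL target. The helper name summarize_glucose() and the assumption of evenly spaced readings (so that the share of readings approximates the share of time) are ours for illustration; this is not the cgmanalysis implementation.

```r
# Hypothetical helper (not part of cgmanalysis): a few summary measures from
# a vector of sensor glucose values in mg/dL. Assumes evenly spaced readings,
# so the fraction of readings in range approximates the fraction of time.
summarize_glucose <- function(glucose, low = 70, high = 180) {
  glucose <- glucose[!is.na(glucose)]        # ignore missing readings
  mean_glucose <- mean(glucose)
  c(
    mean_glucose = mean_glucose,
    sd_glucose   = sd(glucose),
    cv_percent   = 100 * sd(glucose) / mean_glucose,
    # Glucose management indicator (Bergenstal et al. 2018, mg/dL version)
    gmi_percent  = 3.31 + 0.02392 * mean_glucose,
    # Percent of readings within [low, high]
    tir_percent  = 100 * mean(glucose >= low & glucose <= high)
  )
}

summarize_glucose(c(65, 110, 150, 190, 140, NA))
```

In this example the five non-missing readings average 131 mg/dL, and three of them fall within 70–180 mg/dL, giving a time in range of 60%.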

Methods

Package design

Our package consists of three simple functions: cleandata(), cgmvariables(), and cgmreport(). The data cleaning function iterates through a directory of CGM data exports and produces new files that then serve as input to the CGM variable calculator and the CGM report generator. The initial directory can contain files from different sources, because the function identifies the relevant timestamp and glucose values for each file format. By default, the cleaning function fills gaps in glucose data shorter than 20 minutes using linear interpolation. It also removes 24-hour periods containing gaps longer than 20 minutes, so that there is an equal number of daytime and nighttime values, which is important for calculating some variables, such as area under the curve (AUC). The user can specify a different maximum gap to fill by interpolation and can also choose whether to remove days with longer gaps. For example,

cleandata("path/to/inputdirectory",
          "path/to/outputdirectory")

will clean the data using the default settings, while

cleandata("path/to/inputdirectory",
          "path/to/outputdirectory",
          removegaps = FALSE, gapfill = TRUE, maximumgap = 30)

will fill in gaps shorter than 30 minutes but will not remove the 24-hour chunks containing larger gaps. Ideally, the CGM data should be exported and then cleaned using this package, and not manually edited. However, if a file does require manual data editing, these functions will work on the three-column format detailed in the package documentation. Examples of data pre- and post-cleaning are available on figshare (https://figshare.com/projects/cgmanalysis_An_R_package_for_descriptive_analysis_of_continuous_glucose_monitor_data/64973) and in the package’s “extdata” directory.
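The default gap-filling rule can be illustrated with base R's approx(). The sketch below assumes a data frame with a POSIXct timestamp column and a numeric glucose column already on a regular 5-minute grid, with NA marking missing readings, and it measures a gap as the time between the two readings that bracket it. The helper name fill_short_gaps(), the 5-minute grid, and that gap definition are assumptions for illustration; the sketch does not reproduce the package's removal of 24-hour periods or its parsing of each export format.

```r
# Hypothetical helper (not the cgmanalysis source): linearly interpolate runs
# of missing glucose values when the gap is shorter than `maximumgap` minutes.
# Assumes readings lie on a regular grid spaced `interval` minutes apart.
fill_short_gaps <- function(df, maximumgap = 20, interval = 5) {
  is_gap <- is.na(df$glucose)
  runs <- rle(is_gap)                        # consecutive NA / non-NA runs
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  # Interpolated value at every timestamp; approx() returns NA outside the
  # observed range, so leading and trailing gaps are never filled.
  interp <- approx(x = as.numeric(df$timestamp[!is_gap]),
                   y = df$glucose[!is_gap],
                   xout = as.numeric(df$timestamp))$y
  for (i in which(runs$values)) {
    # Gap length = time between the readings that bracket the NA run
    gap_minutes <- (runs$lengths[i] + 1) * interval
    if (gap_minutes < maximumgap) {
      idx <- starts[i]:ends[i]
      df$glucose[idx] <- interp[idx]
    }
  }
  df
}

ts <- as.POSIXct("2019-01-01 00:00:00", tz = "UTC") + 300 * (0:9)
cgm <- data.frame(timestamp = ts,
                  glucose = c(100, NA, 120, 130, NA, NA, NA, NA, 180, 190))
fill_short_gaps(cgm)
```

Here the single missing reading (a 10-minute gap) is filled with 110 mg/dL, while the run of four missing readings (a 25-minute gap) is left as NA.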

Once the data have been cleaned, the CGM variables described in Table 1 are calculated using the cgmvariables() function. By default, blood glucose must be above a threshold for at least 35 minutes or below a threshold for at least 10 minutes to count as an excursion, but users can change these parameters if necessary. Likewise, daytime (e.g., for daytime vs. nighttime AUC or maximum glucose) is defined as 6:00 to 22:00 by default, but these bounds can be changed to suit user needs. MAGE is calculated using Baghurst's algorithm [9], which we have implemented in R. By default, the function includes blood glucose excursions greater than 1 SD from the mean in the calculation of MAGE, but 1.5 SD and 2 SD options are also available. For example,

cgmvariables("path/to/inputdirectory",
             "path/to/outputdirectory")

will produce summary measures using the default settings above, while

cgmvariables("path/to/inputdirectory",
             "path/to/outputdirectory",
             daystart = 8, dayend = 23, magedef = "2sd")

will produce summary measures using 2 SD as the threshold for MAGE excursions, and daytime defined as 8:00 to 23:00.
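To illustrate the excursion rule, the sketch below counts runs of consecutive readings above (or below) a threshold that last at least a minimum duration, using base R's rle(). The helper name count_excursions(), the 5-minute sampling interval, the example thresholds of 180 and 70 mg/dL, and the convention that a run of k readings lasts k × 5 minutes are assumptions for illustration; cgmvariables() may measure excursion duration differently.

```r
# Hypothetical helper (not the cgmanalysis source): count glucose excursions,
# i.e. runs of consecutive readings beyond `threshold` lasting at least
# `min_minutes`. A run of k readings is treated as lasting k * interval min.
count_excursions <- function(glucose, threshold, min_minutes,
                             direction = c("above", "below"), interval = 5) {
  direction <- match.arg(direction)
  beyond <- if (direction == "above") glucose > threshold else glucose < threshold
  beyond[is.na(beyond)] <- FALSE             # a missing reading ends a run
  runs <- rle(beyond)
  # Count runs that are beyond the threshold and long enough
  sum(runs$values & runs$lengths * interval >= min_minutes)
}

g <- c(150, 190, 200, 210, 220, 230, 240, 250, 170, 60, 65, 68, 100, 185, 100)
count_excursions(g, threshold = 180, min_minutes = 35, direction = "above")
count_excursions(g, threshold = 70, min_minutes = 10, direction = "below")
```

With these defaults, the seven consecutive readings above 180 mg/dL (35 minutes) count as one high excursion, the single reading of 185 mg/dL does not, and the three readings below 70 mg/dL (15 minutes) count as one low excursion.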

Our code was originally written to produce data tables for upload to a Research Electronic Data Capture (REDCap) database [10], which influenced the selection of variable names in the final output. These names can be changed in the code itself or by simply editing the function’s output. These variables are stored in separate columns of a new data frame (the function’s output), with each record identified by the patient ID.

In addition to producing calculated variables, the package can plot CGM data in three ways using the cgmreport() function. First, the function concatenates all the CGM data in the specified directory into one data table and plots the aggregate data as an aggregate daily overlay (ADO), in the style of the standard AGP report (http://www.agpreport.org). This method rounds each timepoint to the nearest 10-minute mark, applies Tukey running median smoothing [11], and then plots the median, interquartile range, and 5th and 95th percentiles at each time of day (with plans to add more options in the future). Second, the package produces a similar aggregate plot with a Loess-smoothed (locally estimated scatterplot smoothing) average [12–14] overlaid on points representing every individual glucose value; for smaller data sets, this type of plot gives a meaningful overview of daily glucose trends. Finally, the third type of plot shows a Loess-smoothed average for each patient, with glucose values color-coded by participant. The current default y-axis range for each plot is 0–400 mg/dL, but this can be altered manually. For example,

cgmreport("path/to/inputdirectory",
          "path/to/outputdirectory", yaxis = c(70, 300))

will produce plots with a y-axis range of 70–300 mg/dL.
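The summary behind an ADO-style plot can be sketched in base R: round each reading's time of day to the nearest 10 minutes, compute the 5th, 25th, 50th, 75th, and 95th percentiles of all readings at each time point, and smooth each percentile curve with a Tukey running median. The helper name ado_percentiles(), the window width k = 3, and the use of stats::runmed() (rather than, say, stats::smooth()) are assumptions, since the package's exact smoothing call is not described here; the sketch also omits the plotting step that cgmreport() performs.

```r
# Hypothetical helper (not the cgmanalysis source): percentile curves by time
# of day for an aggregate daily overlay, smoothed with a running median.
ado_percentiles <- function(timestamp, glucose, k = 3) {
  lt <- as.POSIXlt(timestamp)
  minutes <- lt$hour * 60 + lt$min + lt$sec / 60   # minutes since midnight
  slot <- (round(minutes / 10) * 10) %% 1440        # nearest 10-minute mark
  probs <- c(0.05, 0.25, 0.50, 0.75, 0.95)
  # One row per 10-minute slot (sorted by time), one column per percentile
  q <- t(sapply(split(glucose, slot), quantile, probs = probs, na.rm = TRUE))
  # Tukey running-median smoothing of each percentile curve across slots
  smoothed <- apply(q, 2, runmed, k = k, endrule = "median")
  data.frame(minute = as.numeric(rownames(q)), smoothed, check.names = FALSE)
}

# Usage: three days of 5-minute readings
ts <- as.POSIXct("2019-01-01 00:00:00", tz = "UTC") + 300 * (0:(3 * 288 - 1))
ado <- ado_percentiles(ts, rep(100, length(ts)))
head(ado)
```

The resulting data frame has one row per 10-minute time of day, so the median and the 25th–75th and 5th–95th percentile bands can be drawn directly against the minute column.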

Comparison of cgmanalysis package and proprietary software

Our functions were compared with proprietary CGM software using clinically collected data from iPro 2, Carelink 670G, Dexcom Clarity, and Diasend. The data were exported from each platform, formatted using the cleandata() function, and then summarized and plotted using the cgmvariables() and cgmreport() functions, respectively. Apart from this formatting, the data were not cleaned before plotting and summary variable calculation, and summary variable parameters were changed from their defaults (e.g., defining an excursion as 15 minutes above or below threshold for iPro 2 data) to better match the CGM results. Because each CGM device provides different and limited summary variables, we were able to compare only a small subset of our package's output and could not directly test more complex variables, such as MAGE or CONGA.

Results

Fig 1 is an example of the ADO plot made using approximately 25,000 simulated CGM values, and Fig 2 is the version of the ADO with Loess smoothing, using the same data as in Fig 1. Fig 3 is the patient-specific plot, made with a subset of the simulated data.

**Fig. 1. Aggregate Daily Overlay (Tukey Smoothing).**

**Fig. 2. Aggregate Daily Overlay (Loess Smoothing).**

**Fig. 3. Daily Overlay per Subject (Loess Smoothing).**

Table 2 shows the results of summary variable comparisons between four proprietary CGM platforms and our cgmanalysis package. Most of the differences in these comparisons are small and result from rounding. Overall, the package appears capable of reproducing proprietary calculations when run with non-default settings, although in the comparison with the iPro 2, the number of high excursions differed by 1.

Figs 4–7 compare the graphical outputs produced by the proprietary software and the cgmanalysis package. In the graphs produced by the cgmanalysis package, glycemic patterns at each hour of the day are clearly visible and match the CGM device outputs well. However, some of the proprietary programs appear to apply different smoothing algorithms, resulting in slightly different patterns over time.

Discussion

The summary variables produced by the cgmanalysis package closely match those from the proprietary software for all platforms assessed, and the remaining differences are mainly due to rounding. The one exception was the iPro 2, for which the number of high excursions differed by 1. Without access to the iPro 2 algorithms, we are unable to determine why these counts disagree, but the difference is unlikely to be clinically significant. The graphical outputs from the cgmanalysis package are similar to the CGM device outputs in terms of glycemic patterns by hour of day, although there are small differences, likely due to different smoothing algorithms.

There are several limitations to our comparison of the cgmanalysis package with the proprietary software output. CGM devices calculate only a few summary variables, so it is difficult to test this package comprehensively. Also, gold-standard calculations do not exist for many of these variables, which makes verifying our results difficult. We hope that by making this package freely available and open source, these limitations will be minimized through widespread testing. Perhaps the greatest limitation of the software itself is the lack of an easy-to-use graphical user interface (GUI), which may prevent its use by clinicians with limited programming experience. We have included detailed documentation in the CRAN package, as well as a new-user guide on GitHub, but using the package still requires enough technical knowledge that it may be inaccessible to some users. None of the authors are software engineers, and the package is undoubtedly less efficient than it could be. Again, we hope that its free and open-source nature will contribute significantly to improving the code over time, both through outside contributions and through our own planned updates.

In conclusion, our software provides a standardized, free, open-source approach to manage and analyze CGM data, enabling sharing of data across technology platforms, collaboration between research groups, and more effective use of the growing pool of CGM data. The advantage of using R functions rather than licensed statistical software, or a web-based or desktop application, is that R is freely available and open source. Clinicians or investigators can alter the code according to their needs and anyone can contribute to the development of the program, as CGM research and technology advance.

References

1. DeSalvo DJ, Miller KM, Hermann JM, Maahs DM, Hofer SE, Clements MA, et al. Continuous glucose monitoring and glycemic control among youth with type 1 diabetes: International comparison from the T1D Exchange and DPV Initiative. Pediatr Diabetes 2018; 19(7): 1271–1275. doi: 10.1111/pedi.12711. PMID: 29923262.

2. Beck RW, Bergenstal RM, Riddlesworth TD, Kollman C, Li Z, Brown AS, et al. Validation of Time in Range as an Outcome Measure for Diabetes Clinical Trials. Diabetes Care 2019; 42(3): 400–405. doi: 10.2337/dc18-1444. PMID: 30352896.

3. Bergenstal RM, Beck RW, Close KL, Grunberger G, Sacks DB, Kowalski A, et al. Glucose Management Indicator (GMI): A New Term for Estimating A1C From Continuous Glucose Monitoring. Diabetes Care 2018; 41(11): 2275–2280. doi: 10.2337/dc18-1581. PMID: 30224348.

4. Hernandez TL, Barbour LA. A standard approach to continuous glucose monitor data in pregnancy for the study of fetal growth and infant outcomes. Diabetes Technol Ther 2013; 15(2): 172–9. doi: 10.1089/dia.2012.0223. PMID: 23268584.

5. Hill NR, Oliver NS, Choudhary P, Levy JC, Hindmarsh P, Matthews DR. Normal reference range for mean tissue glucose and glycemic variability derived from continuous glucose monitoring for subjects without diabetes in different ethnic groups. Diabetes Technol Ther 2011; 13(9): 921–8. doi: 10.1089/dia.2010.0247. PMID: 21714681.

6. Sechterberger MK, Luijf YM, Devries JH. Poor agreement of computerized calculators for mean amplitude of glycemic excursions. Diabetes Technol Ther 2014; 16(2): 72–5. doi: 10.1089/dia.2013.0138. PMID: 24191760.

7. Zhang XD, Zhang Z, Wang D. CGManalyzer: an R package for analyzing continuous glucose monitoring studies. Bioinformatics 2018; 34(9): 1609–1611. doi: 10.1093/bioinformatics/btx826. PMID: 29315360.

8. Danne T, Nimri R, Battelino T, Bergenstal R, Close KL, DeVries JH, et al. International Consensus on Use of Continuous Glucose Monitoring. Diabetes Care 2017; 40(12): 1631–1640. doi: 10.2337/dc17-1600. PMID: 29162583.

9. Baghurst PA. Calculating the mean amplitude of glycemic excursion from continuous glucose monitoring data: an automated algorithm. Diabetes Technol Ther 2011; 13(3): 296–302. doi: 10.1089/dia.2010.0090. PMID: 21291334.

10. Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research electronic data capture (REDCap): a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform 2009; 42(2): 377–81. doi: 10.1016/j.jbi.2008.08.010. PMID: 18929686.

11. Tukey JW. Exploratory data analysis. 1st ed. Reading, MA: Addison-Wesley; 1970.

12. Chambers JM, Hastie T. Statistical models in S. Boca Raton, FL: Chapman & Hall/CRC; 1992.

13. Wood SN. mgcv: GAMs and generalized ridge regression for R. R News 2001; 1(2): 20–25.

14. O'Sullivan F, Yandell BS, Raynor WJ. Automatic Smoothing of Regression Functions in Generalized Linear Models. J Am Stat Assoc 1986; 81(393): 96–103.
