`fgpm_factory.Rd`

This function explores the solution space of potential structural configurations of a funGp model and selects a high-quality configuration. funGp currently relies on an ant colony based algorithm to perform this task. The algorithm defines the solution space based on the levels of each structural parameter currently available in the fgpm function, and performs a smart exploration of it. More details on the algorithm are provided in a dedicated technical report. funGp might evolve in the future to include improvements to the current algorithm or alternative solution methods.
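To give a feel for why an exhaustive search quickly becomes impractical, the sketch below enumerates a naive solution space for a toy problem with two scalar inputs and one functional input. The parameter names and levels are illustrative assumptions that echo values appearing in the examples of this page; they are not the exact set of levels exposed by fgpm, and the count is only a naive upper bound because some combinations are redundant (for instance, the projection settings of an inactive functional input).

```r
# Hypothetical structural parameters and levels (assumptions for illustration only)
space <- expand.grid(
  x1.state = c("off", "on"),                      # scalar input x1 active or not
  x2.state = c("off", "on"),                      # scalar input x2 active or not
  f1.state = c("off", "on"),                      # functional input f1 active or not
  f1.dist  = c("L2_bygroup", "L2_byindex"),       # distance used for f1
  f1.basis = c("B-splines", "PCA"),               # projection basis for f1
  f1.dim   = 0:5,                                 # projection dimension for f1
  kernel   = c("gauss", "matern5_2", "matern3_2"), # kernel function
  stringsAsFactors = FALSE
)

# number of candidate configurations: the product of the numbers of levels
n.config <- nrow(space)
n.config

# a few of the candidate configurations
head(space, 3)
```

Each additional input multiplies the number of candidates, which is the motivation for a guided search rather than a full enumeration.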

    fgpm_factory(
      sIn = NULL,
      fIn = NULL,
      sOut = NULL,
      ind.vl = NULL,
      ctraints = list(),
      setup = list(),
      time.lim = Inf,
      nugget = 1e-08,
      n.starts = 1,
      n.presample = 20,
      par.clust = NULL,
      pbars = TRUE
    )

sIn | an optional matrix of scalar input values to train the model. Each column must match an input variable and each row a training point. Either scalar input coordinates (sIn), functional input coordinates (fIn), or both must be provided. |
---|---|

fIn | an optional list of functional input values to train the model. Each element of the list must be a matrix containing the set of curves corresponding to one functional input. Either scalar input coordinates (sIn), functional input coordinates (fIn), or both must be provided. |

sOut | a vector (or 1-column matrix) containing the values of the scalar output at the specified input points. |

ind.vl | an optional numerical matrix specifying which points in the three structures above should be
used for training and which for validation. If provided, the optimization will be conducted in terms of
the hold-out Q2, which comes from training the model with a subset of the points and then estimating the
prediction error at the remaining points. In that case, each column of |

ctraints | an optional list specifying the constraints of the structural optimization problem. Valid
entries for this list are: |

setup | an optional list indicating the value for some parameters of the structural optimization
algorithm. The ant colony optimization algorithm available at this time allows the following entries: |

time.lim | an optional number specifying a time limit in seconds to be used as stopping condition for the structural optimization. |

nugget | an optional variance value standing for the homogeneous nugget effect. A tiny nugget might help to overcome numerical problems related to the ill-conditioning of the covariance matrix. Default is 1e-8. |

n.starts | an optional integer indicating the number of initial points to use for the optimization of the hyperparameters. A parallel processing cluster can be exploited in order to speed up the evaluation of multiple initial points. More details in the description of the argument par.clust below. Default is 1. |

n.presample | an optional integer indicating the number of points to be tested in order to select the
n.starts initial points. The n.presample points will be randomly sampled from the hyper-rectangle defined by: |

par.clust | an optional parallel processing cluster created with the |

pbars | an optional boolean indicating if progress bars should be displayed. Default is TRUE. |
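The hold-out Q2 mentioned for the ind.vl argument can be illustrated with base R alone. The sketch below is not funGp code: it uses a polynomial regression as a stand-in model, a plain vector of validation indices (`i.vl`, a hypothetical name that does not reproduce the ind.vl format), and assumes the usual definition of Q2 as one minus the ratio of the squared prediction error to the total sum of squares on the validation points.

```r
set.seed(1)

# toy data: one scalar input and a noisy output
n <- 40
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.1)

# hypothetical split: 10 validation points, the rest for training
i.vl <- sample(n, 10)
i.tr <- setdiff(seq_len(n), i.vl)

# stand-in model (NOT a funGp model): cubic polynomial fitted on training points only
fit <- lm(y ~ poly(x, 3), data = data.frame(x = x[i.tr], y = y[i.tr]))

# predictions at the held-out points
y.hat <- predict(fit, newdata = data.frame(x = x[i.vl]))

# hold-out Q2, assuming the usual definition 1 - SSE / SST on the validation set
q2 <- 1 - sum((y[i.vl] - y.hat)^2) / sum((y[i.vl] - mean(y[i.vl]))^2)
q2
```

Values close to 1 indicate that the model predicts the held-out points well; the value can be negative when the model does worse than predicting the mean of the validation outputs.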

An object of class Xfgpm containing the data structures linked to the structural optimization
of a funGp model. It includes, as the main component, an object of class fgpm corresponding to the
optimized model. It is accessible through the `@model` slot of the Xfgpm object.

Betancourt, J., Bachoc, F., Klein, T., Idier, D., Pedreros, R., and Rohmer, J. (2020),
"Gaussian process metamodeling of functional-input code for coastal flood hazard assessment".
*Reliability Engineering & System Safety*, **198**, 106870.
[RESS]
[HAL]

Betancourt, J., Bachoc, F., Klein, T., and Gamboa, F. (2020),
Technical Report: "Ant Colony Based Model Selection for Functional-Input Gaussian Process Regression. Ref. D3.b (WP3.2)".
*RISCOPE project*.
[HAL]

Betancourt, J., Bachoc, F., and Klein, T. (2020),
R Package Manual: "Gaussian Process Regression for Scalar and Functional Inputs with funGp - The in-depth tour".
*RISCOPE project*.
[HAL]

Dubrule, O. (1983),
"Cross validation of kriging in a unique neighborhood".
*Journal of the International Association for Mathematical Geology*, **15**, 687-699.
[MG]

* plotX for diagnostic plots for a fgpm_factory output and selected model;
* plotEvol for a plot of the evolution of the model selection algorithm in fgpm_factory;
* get_active_in for post-processing of input data structures following a fgpm_factory call;
* predict for predictions based on a funGp model;
* simulate for simulations based on a funGp model;
* update for post-creation updates on a funGp model.

    # calling fgpm_factory with the default arguments__________________________________________
    # generating input and output data
    set.seed(100)
    n.tr <- 32
    sIn <- expand.grid(x1 = seq(0,1,length = n.tr^(1/5)), x2 = seq(0,1,length = n.tr^(1/5)),
                       x3 = seq(0,1,length = n.tr^(1/5)), x4 = seq(0,1,length = n.tr^(1/5)),
                       x5 = seq(0,1,length = n.tr^(1/5)))
    fIn <- list(f1 = matrix(runif(n.tr*10), ncol = 10), f2 = matrix(runif(n.tr*22), ncol = 22))
    sOut <- fgp_BB7(sIn, fIn, n.tr)

    # \donttest{
    # optimizing the model structure with fgpm_factory (~12 seconds)
    xm <- fgpm_factory(sIn = sIn, fIn = fIn, sOut = sOut)
    plotLOO(xm@model) # plotting the model

    # building the model with the default fgpm arguments to compare
    m1 <- fgpm(sIn = sIn, fIn = fIn, sOut = sOut)
    plotLOO(m1) # plotting the model

    # assessing the quality of the model
    # in the absolute and also w.r.t. the other explored models
    plotX(xm)

    # checking the evolution of the algorithm
    plotEvol(xm)
    # }

    # \donttest{
    # improving performance with more iterations_______________________________________________
    # generating input and output data
    set.seed(100)
    n.tr <- 32
    sIn <- expand.grid(x1 = seq(0,1,length = n.tr^(1/5)), x2 = seq(0,1,length = n.tr^(1/5)),
                       x3 = seq(0,1,length = n.tr^(1/5)), x4 = seq(0,1,length = n.tr^(1/5)),
                       x5 = seq(0,1,length = n.tr^(1/5)))
    fIn <- list(f1 = matrix(runif(n.tr*10), ncol = 10), f2 = matrix(runif(n.tr*22), ncol = 22))
    sOut <- fgp_BB7(sIn, fIn, n.tr)

    # default of 15 iterations (~12 seconds)
    xm15 <- fgpm_factory(sIn = sIn, fIn = fIn, sOut = sOut)

    # increasing to 25 iterations (~20 seconds)
    xm25 <- fgpm_factory(sIn = sIn, fIn = fIn, sOut = sOut, setup = list(n.iter = 25))

    # plotting both models
    plotLOO(xm15@model)
    plotLOO(xm25@model)
    # }

    # \donttest{
    # custom solution space____________________________________________________________________
    # generating input and output data
    set.seed(100)
    n.tr <- 32
    sIn <- expand.grid(x1 = seq(0,1,length = n.tr^(1/5)), x2 = seq(0,1,length = n.tr^(1/5)),
                       x3 = seq(0,1,length = n.tr^(1/5)), x4 = seq(0,1,length = n.tr^(1/5)),
                       x5 = seq(0,1,length = n.tr^(1/5)))
    fIn <- list(f1 = matrix(runif(n.tr*10), ncol = 10), f2 = matrix(runif(n.tr*22), ncol = 22))
    sOut <- fgp_BB7(sIn, fIn, n.tr)

    # setting up the constraints
    myctr <- list(s_keepOn = c(1,2), # keep scalar inputs x1 and x2 always on
                  f_keepOn = c(2), # keep f2 always active
                  f_disTypes = list("2" = c("L2_byindex")), # only use L2_byindex distance for f2
                  f_fixDims = matrix(c(2,4), ncol = 1), # f2 projected in dimension 4
                  f_maxDims = matrix(c(1,5), ncol = 1), # f1 projected in dimension max 5
                  f_basTypes = list("1" = c("B-splines")), # only use B-splines projection for f1
                  kerTypes = c("matern5_2", "gauss")) # test only Matern 5/2 and Gaussian kernels

    # calling the funGp factory with specific constraints (~17 seconds)
    xm <- fgpm_factory(sIn = sIn, fIn = fIn, sOut = sOut, ctraints = myctr)

    # verifying constraints with the log of some successfully built models
    cbind(xm@log.success@sols, "Q2" = xm@log.success@fitness)
    # }

    # \donttest{
    # custom heuristic parameters______________________________________________________________
    # generating input and output data
    set.seed(100)
    n.tr <- 32
    sIn <- expand.grid(x1 = seq(0,1,length = n.tr^(1/5)), x2 = seq(0,1,length = n.tr^(1/5)),
                       x3 = seq(0,1,length = n.tr^(1/5)), x4 = seq(0,1,length = n.tr^(1/5)),
                       x5 = seq(0,1,length = n.tr^(1/5)))
    fIn <- list(f1 = matrix(runif(n.tr*10), ncol = 10), f2 = matrix(runif(n.tr*22), ncol = 22))
    sOut <- fgp_BB7(sIn, fIn, n.tr)

    # defining the heuristic parameters
    mysup <- list(n.iter = 30, n.pop = 12, tao0 = .15, dop.s = 1.2, dop.f = 1.3, delta.f = 4,
                  dispr.f = 1.1, q0 = .85, rho.l = .2, u.gbest = TRUE, n.ibest = 2, rho.g = .08)

    # calling the funGp factory with a custom heuristic setup (~17 seconds)
    xm <- fgpm_factory(sIn = sIn, fIn = fIn, sOut = sOut, setup = mysup)

    # verifying heuristic setup through the details of the Xfgpm object
    unlist(xm@details$param)
    # }

    # \donttest{
    # stopping condition based on time_________________________________________________________
    # generating input and output data
    set.seed(100)
    n.tr <- 32
    sIn <- expand.grid(x1 = seq(0,1,length = n.tr^(1/5)), x2 = seq(0,1,length = n.tr^(1/5)),
                       x3 = seq(0,1,length = n.tr^(1/5)), x4 = seq(0,1,length = n.tr^(1/5)),
                       x5 = seq(0,1,length = n.tr^(1/5)))
    fIn <- list(f1 = matrix(runif(n.tr*10), ncol = 10), f2 = matrix(runif(n.tr*22), ncol = 22))
    sOut <- fgp_BB7(sIn, fIn, n.tr)

    # setting up a sufficiently large number of iterations
    mysup <- list(n.iter = 2000)

    # defining time budget
    mytlim <- 60

    # calling the funGp factory with time limit (~60 seconds)
    xm <- fgpm_factory(sIn = sIn, fIn = fIn, sOut = sOut, setup = mysup, time.lim = mytlim)
    # }

    # \donttest{
    # passing fgpm arguments through fgpm_factory______________________________________________
    # generating input and output data
    set.seed(100)
    n.tr <- 32
    sIn <- expand.grid(x1 = seq(0,1,length = n.tr^(1/5)), x2 = seq(0,1,length = n.tr^(1/5)),
                       x3 = seq(0,1,length = n.tr^(1/5)), x4 = seq(0,1,length = n.tr^(1/5)),
                       x5 = seq(0,1,length = n.tr^(1/5)))
    fIn <- list(f1 = matrix(runif(n.tr*10), ncol = 10), f2 = matrix(runif(n.tr*22), ncol = 22))
    sOut <- fgp_BB7(sIn, fIn, n.tr)

    # calling the funGp factory with custom fgpm parameters (~25 seconds)
    xm <- fgpm_factory(sIn = sIn, fIn = fIn, sOut = sOut,
                       nugget = 0, n.starts = 3, n.presample = 12)

    # NOTE: in the run above, some models crash. This happens because we set the nugget to 0
    #       and some input points become duplicates when some variables are removed from
    #       the model. We strongly recommend always running fgpm_factory with at least a
    #       small nugget in order to prevent loss of configurations. By default fgpm_factory
    #       runs with 1e-8, which is enough in most cases.
    xm@log.crashes
    # }

    # \donttest{
    # parallelization in the model factory_____________________________________________________
    # generating input and output data
    set.seed(100)
    n.tr <- 243
    sIn <- expand.grid(x1 = seq(0,1,length = n.tr^(1/5)), x2 = seq(0,1,length = n.tr^(1/5)),
                       x3 = seq(0,1,length = n.tr^(1/5)), x4 = seq(0,1,length = n.tr^(1/5)),
                       x5 = seq(0,1,length = n.tr^(1/5)))
    fIn <- list(f1 = matrix(runif(n.tr*10), ncol = 10), f2 = matrix(runif(n.tr*22), ncol = 22))
    sOut <- fgp_BB7(sIn, fIn, n.tr)

    # calling fgpm_factory in parallel
    cl <- parallel::makeCluster(2)
    xm.par <- fgpm_factory(sIn = sIn, fIn = fIn, sOut = sOut, par.clust = cl) # (~260 seconds)
    parallel::stopCluster(cl)

    # NOTE: in order to provide progress bars for the monitoring of time consuming processes
    #       run in parallel, funGp relies on the doFuture and future packages. Parallel processes
    #       suddenly interrupted by the user tend to leave corrupt connections. This problem
    #       originates outside funGp, which limits our control over it. In section 4.1 of the
    #       funGp manual, we provide a temporary solution to the issue and we remain attentive in
    #       case a more elegant way to handle it or a manner to suppress it appears.
    #
    #       funGp manual: https://hal.archives-ouvertes.fr/hal-02536624
    # }
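The note on `nugget = 0` in the examples can be illustrated without funGp. The sketch below builds a small covariance matrix with a squared-exponential kernel on inputs that contain a duplicate, which is what can happen when variables are removed from a model. The helper `gauss_cov` and its length-scale `theta` are hypothetical and do not reproduce the covariance construction used by funGp; the point is only that a duplicated input makes the matrix singular, and that adding the default nugget of 1e-8 to the diagonal restores positive definiteness.

```r
# hypothetical helper: squared-exponential covariance matrix for 1-D inputs
gauss_cov <- function(x, theta = 0.3) {
  exp(-outer(x, x, "-")^2 / (2 * theta^2))
}

# the second and third points are duplicates
x <- c(0, 0.5, 0.5, 1)
K <- gauss_cov(x)

# without nugget: two identical rows, so the smallest eigenvalue is numerically zero
ev0 <- min(eigen(K, symmetric = TRUE, only.values = TRUE)$values)

# with a small nugget on the diagonal, every eigenvalue is shifted up by 1e-8
K.nug <- K + diag(1e-8, length(x))
ev1 <- min(eigen(K.nug, symmetric = TRUE, only.values = TRUE)$values)

# the Cholesky factorization of the regularized matrix is well defined
chol1 <- chol(K.nug)

c(no.nugget = ev0, with.nugget = ev1)
```

The nugget is small enough to leave the covariance structure practically unchanged while keeping the matrix invertible.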