NEWS


qpcR 1.4

Changes and bug-fixes in functions:

qpcR 1.3

New functions: none Changes and bug-fixes in functions:

New functions: * included 'expSDM' as a new model that fits an exponential model from 1...SDM. The parameters obtained are => a = N0, Eff = exp(b). * added 8 different smoothers for smoothing qPCR data prior to fitting. These are are implemented in 'modlist' with parameters to be set with "smoothPAR": "lowess": Lowess smoothing, see lowess, parameter in smoothPAR: f. "supsmu": Friedman's SuperSmoother, see supsmu, parameter in smoothPAR: span. "spline": Smoothing spline, see smooth.spline, parameter in smoothPAR: spar. "savgol": Savitzky-Golay smoother, qpcR:::savgol, parameter in smoothPAR: none. "kalman": Kalman smoother, see arima, parameter in smoothPAR: none. "runmean": Running mean, see qpcR:::runmean, parameter in smoothPAR: wsize. "whit": Whittaker smoother, see qpcR:::whittaker, parameter in smoothPAR: lambda. "ema": Exponential moving average, see qpcR:::EMA, parameter in smoothPAR: alpha. Whittaker (taken from the 'ptw' package) and EMA are recursive filters that have been implemented in C and are found in /src/smth.c It can be debated if smoothers are of any avail. The author of this package advocates the use of "spline", "savgol" or "whit" because these three smoothers have the least influence on overall curve structure.

Changes and bug-fixes in functions: * 'modlist' now sets check = NULL when nonlinear models such as cm3/makX/spl3/lin2 are used. * 'propagate' has a new second-order Taylor expansion implemented which (partly) compensates for nonlinearities in error propagation. * fixed a bug in 'ratiobatch' and 'ratioPar' when sample sizes > 10 and single runs were supplied. * fixed a bug in 'getPar' when first run did not fit, added cm3 as model. * changed rlm to lm for starting parameters in all sigmoidal models (problems with 0 baseline). * included conversion of data.frame into matrix in 'ratiocalc' to eliminate strange behavior in case of no replicates (bug kindly provided by Lucy Ternent). * covariance matrix calculation within 'propagate' is now based on "pairwise.complete.obs", because "complete.obs" resulted in inconsistent results depending on sample order (bug kindly provided by Gottfried Martin). As this can result in covariance matrices that are not positive definite, a transformation using Matrix:::nearPD() has been included. * fixed a bug in 'ratiocalc' that resulted in unrealistic Monte Carlo Simulations when using mak/cm3 models.

Removals: * removed 'batchstat', 'curvemean', 'rep2mod', b3/l3 models, 'calib' function with "tilted threshold curve" (and renamed 'calib2' to 'calib'), 'HQIC', 'meanlist', 'neill.test' and 'pcropt2' because they were of more or less academic value but in practice barely useful.

Gargantuan update!

New functions: *pcrfit has been revamped and is now based solely on the nlsLM function of the "minpack.lm" package. Fitting by this Levenberg-Marquardt implementation is so fast and robust in its convergence characteristics that pre-optimization of starting values by optim could safely be removed. *pcrfit can also now use a "weighting expression" to obtain weighted fits. So in addition to using a vector with weights (such as "weights =?1:10"), five different expressions can be used to obtain specific weights:?"x" reflects the predictor values?(cycles), "y" the response (fluorescence) values, "error" the error (standard deviation) of replicates, "fitted" the fitted values and "resid" the residual values of the fit. In the latter two cases the data is fit unweighted, fitted/residual values are extracted and the data refit with these values as weights. Example (weighting by inverse variance): pcrfit(reps, 1, 2, model = l4, weights =?"1/error^2"). See 'Details' of this function in the qpcR?manual. The expression can also be used in a batch format with modlist. *mak2/mak3 models have been redesigned so that they fit without a grid-search over the parameters. This was made possible with the new pcrfit version. The older versions that conduct a grid-search have been renamed to mak2i/mak3i and should be used in cases where mak2/mak3 give convergence errors. They have also been optimized in respect to their starting parameters. Added four new models: lin2 is a bilinear model (two linear models connected by an exponential) from Peter Buchwald (see here) and which is fit from cycle 1 to first derivative max (FDM). The range can also be tweaked by the "offset" argument in pcrfit. cm3 is a recent model developed by Carr &?Moore (see here) which, like mak2/mak3, belongs to the family of mechanistic models (no estimation of Cq/efficiency values but direct calculation of initial template fluorescence (D0) from the fit). In contrast to mak2/mak3, cm3 fits the complete sigmoid and not a subset thereof. It can also be used as a model within ratiocalc/pcrbatch. spl3 is a new cubic spline function, which like all cubic splines treats every point in the curve as exact (gives zero residual value). It was implemented for comparison purposes. Finally, linexp was implemented, a linear-exponential model with a sloped baseline: a * exp(b??x) + k * x + c. This model can also be used in exponential fitting by expfit, by setting model =?"linexp". *propagate can now do Monte-Carlo simulations based on a truncated multivariate normal distribution. For each of the input variables, a 95% confidence interval is calculated. The bounds of these intervals are plugged into function qpcR:::tmvrnorm (taken from tmvtnorm:::rtmvnorm.rejection) and a truncated multivariate distribution is obtained which is then processed in the usual way. The reason for using a truncated form is that the distribution of the result will have less extreme values (that affect the confidence interval) and also often less skewness. *Added ratioPar, which is effectively a version of ratiobatch for external data (Cq/efficiency values). It conducts the complete batch ratio calculation and error propagation approach as ratiobatch, but is specifically tailored to work with vectors of external Cq and efficiency values, as obtained from many qPCR softwares. *Added 6 new datasets in the package and on the 'Datasets' page: karlen1, karlen2, karlen3 => dilution data from Karlen et al. (2007). reps384 (a 380-replicate set), dil4reps94 (four 10-fold dilutions with 94 replicates each) and competimer (a setup with 7 concentrations of inhibitor) from Ruijter et al. (2012).

Changes and bug-fixes in functions: *pcrfit has a new argument "offset", which defines an offset cycle from the second derivative max (SDM) and defines the cut-off value for fitting mak2/mak3 models. It replaces the parMAK function, which has been removed. *qpcR:::fetchData has been revamped so PRESS can be run on models with several predictor variables. *efficiency has a new "baseline" method ("baseline =?TRUE") which fits the sigmoidal model, extracts the baseline parameter ("c"), subtracts this value from the data and then refits the model. This is useful for qPCR runs with significant baseline values (TaqMan etc...) since this has substantial impact on efficiency estimation. "baseline" is now also present as an argument in modlist and pcrbatch and substitutes the former "backsub". *efficiency argument?"type" has been renamed to "method", which makes it compatible within the efficiency function (avoids double argument collision). *expGrowth model has been tweaked to add a small value (1E-12) to zero fluorescence data which avoids -Inf errors when logarithmizing. *A bug was removed in efficiency that resulted in Inf's and wrong positions for SDM. *resplot now displays standardized residuals by default. *expfit now fits an exponential model between cycle 1 and SDM?per default, and not between outlier cycle/SDM as before. This can be changed with argument "method", however. *Argument?"refmean" has been set to FALSE in function ratiobatch, so reference gene averaging is per default switched off. *A terminating check for 1?< Efficiency < 2 has been removed from ratiocalc, as some curves gave efficiencies of 2.02 or so... *Added the "basefac" argument to modlist/pcrbatch/efficiency. It is useful in tweaking the baseline value obtained from averaging the baseline cycles. Example: "baseline =?1:10" & "basefac =?0.95" will calculate 0.95 * mean(F1:F10). *Modified predict.pcrfit so that predictions from mak2/mak3/cm3/spl3 models don't give gradient errors (because they are not differentiable).

Gigantomaniac update!

New functions: *After many requests, a function for multiple reference gene normalization has been added (although I'm not really a fan of this approach): refmean. This function averages the expression of several reference genes for the same sample according to the method described by Vandesompele et al. (2002). The implementation in this package is based on ARITHMETIC averaging of the threshold cycles prior to ratio calculation, as opposed to the published method which conducts GEOMETRIC averaging of relative expression values (quantities). Both methods result in the same, due to logarithmic identity of the arithmetic and geometric mean: (prod(Exi))^(1/n) = (1/n) * logE(prod(E^xi)) = (1/n) * sum(xi). See here and the extensive documentation and 'Examples' in the manual that compare this implementation with results from the 'geNorm' software. Essentially, reference genes for the same sample such as "r1c1, r1c1, r2c1, r2c1" are replaced by a vector of same length and equidistant values which have the overall grand mean and standard deviation obtained from the reference set. This is done by internal function qpcR:::makeStat, which calculates a shifted and scaled Z-transformation. This resulting averaged vector is then used in ratiocalc to calculate ratios, hence a complete error propagation treatment is included in the reference gene averaging and ratio calculation (as advocated in Nordgard et al., 2006). The refmean function has been integrated into ratiobatch with refmean =?TRUE, so by default reference gene averaging per sample is turned on. *Added LRE, a function to analyze qPCR data by the "linear regression of efficiency" method from Rutledge et al. (2008). This is an iterative method similar to sliwin, but while sliwin regresses cycle number versus log(fluorescence), LRE regresses fluorescence versus efficiency. Hence, the former is based on assuming constant efficiency, while the latter assumes cycle-dependent efficiency. *Added the Hannan-Quinn Information Criterion as function HQIC. This function penalizes n more than BIC: -2 * logLik(model) + 2 * k * log(log(n)). This is just for comparison purposes... *Added 5 new datasets in the package and on the 'Datasets' page: vermeulen1 & vermeulen2 => High-throughput data (1280 runs of 64 genes) and the corresponding dilution data for calibration curves, taken from Vermeulen et al. (2009). lievens1, lievens2, lievens3 => 5-fold dilution and inhibition experiments with 18 replicates each from Lievens et al. (2012).

Changes and bug-fixes in functions: *modlist and pcrbatch have a new "exclude" parameter in which the user can define by some character vector which samples to automatically delete prior to fitting (i.e. by empty names or a regular expression). *pcrbatch now has a new parameter "method", in which the user can define which methods to use and concatenate as results. Example: method = c("sigfit", "sliwin", "mak3") will do consecutive fitting of a sigmoidal model, window-of-linearity and a mechanistic model. See new 'Examples' there. *sliwin has an added "verbose" parameter. *plot.pcrfit has a new plot type: "image". This is a "heatmap"-like display which is suitable for visualizing high-throughput qPCR data (> 200 runs). See 'Screenshots' page. The qPCR raw fluorescence values are "flattened" and the values displayed by a color map ("heat.colors"). The user is encouraged to try this function and email me for further modifications/improvements. *LR has been renamed to llratio to not confuse it with the new LRE function as described above. *The internal parameters in sliwin have been changed a bit, which results in better borders of the fit. *pcrimport and pcrimport2 have now by default set "check.names = FALSE", so that after import "r1c1, r1c1" is not transformed into "r1c1, r1c1.1" which made ratiocalc fail. *Modified the resplot function so that it displays a sum-of-residuals for replicate data. *pcrGOF now automatically calculates a (reduced) chi2 value by fitchisq in case of replicates and has its internal PRESS function made fail-safe by 'tryCatch'. *pcropt1 has been modified so that the result matrix is displayed as a rank-colored bubble plut, by use of the new internal function qpcR:::bubbleplot. *pcrsim now takes a fitted object of class "pcrfit" as input. *The takeoff function for calculating the "take-off point" according to Tichopad et al. (2003) has been revamped. *In pcrfit, the check for negative eigenvalues of the Hessian matrix has been removed because internal nls function does enough checking. *BIC is now included in evidence to calculate the evidence ratio. *replist has a new parameter "doFit". If it is set to "FALSE", the replicate data is just aggregated WITHOUT refitting. This can be useful for plotting only the replicate raw data. *The manual has been completely overworked for consistent syntax.

Tyrannosauric update!

* Added another mechanistic model: chag. This was developed by Alexander Chagovetz and presented as a poster at the qPCR 2011 meeting in Europe. The poster can be downloaded here. Similar to mak2/mak3, PCR data is modelled by a mechanistic recurrence relation, derived from solving partial differential equations describing PCR mechanistics such as annealing, elongation, reagent depletion, etc. As all other functions, it can be called by pcrfit(yourdata, 1, 2, chag) or likewise. In contrast to mak2/mak3, chag models the complete sigmoidal curve. Information about this model can be found with ?chag. It is still experimental...
* A normalized version of mak3 has been added: mak3n. Normalizing the qPCR data within [0, 1] results in the starting parameters for the fitting process being in a reasonably constant range. Hence, less iterations are needed for the convergence of the fit parameters (~ 150 ms/curve).
* All mak models can now be tweaked with function parMAK(). See help there.
* Dataset 'htPCR' has been added, courtesy of Roman Bruno. This is a qPCR dataset with 8858 (!) runs, coming from a Fluidigm system. It is ideal for testing high-throughput algorithms and methods for automatic fail-checking of qPCR data.
* replist has been upgraded so it automatically removes failed replicate fits instead of stopping.
* made cbind.na et al. invisible, as they will be transferred to another package. They can still be called with qpcR:::cbind.na().
* PRESS, pcrboot and pcrimport have been updated (minor bugs).
* Datasets 'sisti1' and 'sisti2' have been removed (they were partially redundant with the 'guescini' datasets).
* Function outlier (identification of the first significant cycle in the exponential phase) has been renamed to takeoff in order to not confuse it with sigmoidal or kinetic outliers.
* sliwin has been revamped. It can now do an iterative baseline optimization similar to the one described in Ruijter et al. (2009). The baseline is iterated in a user-defined range (100 steps) and for each baseline value, the best sliding window is found. The criterion for this has also been updated: either maximum R2 or the baseline delivering the least variation in the slope of all sliding windows. The sliding window size can also be iterated. Is much slower if baseline optimization is selected, but efficiency and F0 values are, according to Ruijter et al. (2009), estimated more realistically.
* KOD is now the main function for sigmoidal and kinetic outlier detection. While the former is useful for identifying if single PCR reactions failed to amplify (lack of sigmoidal structure), the latter is used in context of replicate runs for the identification of reactions that deviate significantly from the remaining replicates. The included methods are: uni1 => kinetic outlier according to Bar et al. (2003), based on efficiency; uni2 => sigmoidal structure detection by comparing first and second derivative maxima; multi1 => kinetic outlier method #1 according to Tichopad et al. (2010), using first and second derivative maxima; multi2 => method #2 according to Tichopad et al. (2010), using the slope at FDM and Fmax as parameters; multi3 => kinetic outlier method as in Sisti et al. (2010) using Fmax, slope at FDM and fluorescence at FDM as parameters. Please read the 'Details' section in ?KOD.
* SOD (sigmoidal outlier detection) has been removed and included in modlist. See below.
* modlist and replist have been significantly upgraded. A labeling vector can be supplied to modlist, which is automatically updated (that is, labels are removed), when remove = TRUE and certain PCR runs are sigmoidal outliers. This way, labels for defining control genes and genes-of-interest are automatically updated prior to using ratiocalc or ratiobatch. Sigmoidal outlier detection (SOD) is now included in modlist and switched on by default, so that failed runs are automatically tagged. If remove = "fit", they are removed from the final model list (and an optional label vector updated). Parameters can now be supplied to outlier detection (parKOD), smoothing (smoothPAR) and fitting (optPAR). Likewise, replist has now kinetic outlier detection (KOD) included, to either detect / tag kinetic outliers or remove them. The default is being switched off. A new names = "first" option has been added in order to name replicate models by the first run in the replicate set. 

Monstrous update!

* The code to propagate has been revamped.
* meltcurve now contains original and analyzed data. More plotting options have been added (i.e. when data was already in first derivative format). Peak areas (by internal function qpcR:::peakArea) can now be calculated and the user can define a cutoff value for the peaks as a quality criterion.
* 'minqa' has been removed as a fitting algorithm, because it was never used (Levenberg-Marquardt method is so robust, that 'minqa' was never called...).
* Added getPar as a function for quick extraction of efficiencies/threshold cycles from high-throughput data (8500 runs take about 150 seconds). Request from Roman Bruno.
* replist has been revamped in its code and now automatically removes unfitted runs and their entries in the 'group' definition.
* modlist can now do internal smoothing and scaling. It now tags failed (in respect to sigmoidal structure) fits, so that these, together with the original 'group' definition, are automatically omitted from ratiocalc or ratiobatch.
* ratiocalc and ratiobatch can now use vectors of external efficiencies/threshold cycles (as acquired by other methods...) for ratio calculation. Request from Prof. John Fowler.
* pcrfit can now handle replicates (as in modlist, but only for one set of replicates!).
* Removed baro5, w3 and w4 as sigmoidal models, because they turned out to be of minor importance in sigmoidal fitting to qPCR data => high residuals in fits.
* resplot can now do a fancy curve/residuals overlay plot with two y-axes.
* The most important changes:
* Function pcrimport was completely redesigned. By querying the user through a number of steps, any qPCR data independent of hardware/software can be imported. The data is transformed into the format necessary for qpcR by rotating data (if cycles are in rows), eliminating unneeded columns/rows, normalizing the reporter dye data (i.e. SybrGreen) by a reference dye (i.e. ROX) etc. This way, the user can define a 'format template' which is automatically stored on the hard drive and can be reloaded in future analyses. This also includes the option of batch analyzing many datasets in a directory using the same template. Several datasets (format01.txt and format02a-d, in directories 'add01' and 'add02' of the package) have been added to illustrate the features of the new design. Please read and try the examples given in the 'Examples' section of '?pcrimport'.
* Added the two seven-parameter models b7 and l7 (logistic and log-logistic) as a result from a collaboration with Prof Joel Tellinghuisen. These two new models are extremely interesting as they model the important parts of the amplification curve with outstanding small residuals and the overall goodness-of-fit (as measured by R-square, BIC, AICc etc) is significantly better than with all other models. The 7-parameter models are a mixture of a linear model (k1*x), a quadratic model (k2*x^2) and a 5-parameter sigmoidal (see '?b7'). Especially the quadratic term results in a very good fit with small residual values in the important exponential region. Accordingly, mselect has been modified to include these models in model selection. The user is encouraged to try these two new models on his/her data, and observe, if mselect chooses these two models. This is often the case: datasets 'rutledge': 62/120; reps: 20/28; guescini1: 34/84; batsch1: 7/15; sisti1: 28/72; boggy: 11/16.
* Through a collaboration with Gregory Boggy, his mechanistic 'mak2' model described here has been implemented within the usual fitting framework as mak2. As a further extension, we developed a mak3 model which improves the performance of mak2 by adding a slope parameter. See '?mak3' for details and also examples The advantage of these implicit models (meaning that the fluorescence is not modeled as being a function the cycle numbers but from the fluorescence value of the preceeding cycle: F(n) = F(n-1) + k*log(F(n-1)/k) ), is that F(0) is deduced from a nonlinear least-squares fitting and makes efficiencies/threshold cycles superfluous. The estimated quantity D0 can be directly used for calibration curve analysis (see paper) and even for direct calculation of expression ratios by R = D01/D02. Because of this, all important functions pcrfit, mselect, pcrbatch, ratiocalc and ratiobatch have been modified to include this model. Please read the documentation to these functions to get more information. 

*Added dataset 'boggy' from Boggy et al. (2010). *Implemented a function for the import of data obtained from ABI instruments (qpcR:::ABImport()). This is still experimental, hence no documentation. *nls.lm has now increase default iterations (1000). This might solve some convergence problems. *mselect has been changed to include the l6 and b6 six-parameter models. *data.frame.na has now default 'stringsAsFactors =?FALSE' to not automatically convert character vectors to factors. *ratiocalc has been altered to a less verbose output. *efficiency has been simplified in its code. *Added a function for melting curve analysis (meltcurve). It can automatically analyze melting fluorescence data from a dataframe containing temperature and corresponding fluorescence values. The melting curves are smoothed, first derivatives are calculated and peaks automatically detected. A graphical output is given as a plot matrix of the single runs and the Tm values of the products are returned as a list.

*Fixed a bug in pcrfit when NA's are present in the fluorescence values. The new version can now handle datasets with unequal number of fluorescence values (i.e. two qPCR runs with 40 and 45 cycles). *In order to handle unequal length data, I have implemented the functions cbind.na, rbind.na and data.frame.na, which can create and bind matrices from unequal length vectors. Instead of the R generic functions cbind, rbind and data.frame which replicate the vector values to length or give a stop, these three new functions fill to maximum occuring length with NA's before calling the generics. Actually, I think these are quite useful outside of qPCR?analysis... *Added function baseline which, if a six-parameter model of type l6 or b6 was used, subtracts the shift (parameter c) and baseline slope (parameter k) value from the original data and also optionally refits the baselined data with a new model. This is experimental... *Added function qpcR.news() which displays the latest changes (the same as written here).

*Fast update! Added 'minqa'?(minimum quadratic approximation) as a new option for pcrfit. This is essentially a derivative free optimization method that might succeed in fitting curves where other methods fail... *modlist now tags the names of runs in case of failed fitting (name) and can also optionally remove them. In addition, SOD also tags the names (name) in case that the runs lack sigmoidal curvature. *plot.pcrfit has been revamped, so that all graphical parameters can be tweaked. It now also displays failed fitting with RED names and outliers as defined by KOD or SOD with BLUE?names.

*Added more functionality in respect to 'outlier detection'. Function SOD identifies 'bad' runs (those that failed to really amplify) by inspecting if the amplification curve has any sigmoidal structure. It does so by checking if the first derivative maximum of the curve is within the next 10 cycles of the second derivative maximum. I had a look at several hundered curves and noticed that the FDM is usually never more than 3 cycles away from the SDM if amplification occurs (have a look at the documentation to this function!). It is unpublished, but works well anyhow. rep2mod is a new function to convert an object of class 'replist' back to a 'modlist'?preserving the grouping structure as an attribute. This can be useful sometimes... *To handle the new SOD function, modlist, replist, is.outlier and plot.pcrfit have been modified to show information on those runs that were tagged as 'outlier'.

qpcR 1.2

*sliwin failed when some fluorescence values were 0 (because of log(0)?=?-Inf). This was fixed. *ratiocalc now delivers an error message if propagate fails. *modlist has been modified so that functions such as PRESS or pcrboot which use an update function work in a batch format with 'lapply'. *Weighted nonlinear fitting has been added to pcrfit. This is useful as only the baseline region and cycles up to the second derivative cycle number (cpD2) are the really relevant parts of the curve. I am in the process of investigating the benefit of weighting in qPCR analysis... *Rsq now returns R2 for weighted fits (of class 'nls', 'lm', 'glm', 'nlme', 'drc' etc). Added l6 and b6 as six-parameter log-logistic and logistic models. The additional term k? x is supposed to model the baseline region better. See under ?b6. Again, I'm investigating this...

qpcR 1.1

qpcR 1.0