CPMGFit

The program cpmgfit is aLevenberg-Marquardt non-linear least squares fitting program designed for fitting CPMG relaxation dispersion data to characterize chemical exchange phenomena in NMR spectroscopy. It can be used in an interactive or batch mode. Output from the program is intended to be graphed using the GRACE program. Output is directed to standard output and can be redirected to a file or piped directly into XMGR.

Usage

The program is executed with the following command:

cpmgfit [-help -grid -raw -jack -noerror -debug -xmgr -r seed -m filename -f filename]

in which the bracketed entries are options described below. The output can be piped to XMGR using the command string:

cpmgfit [options] -xmgr -f filename | xmgr -type xydy -source stdin

The output can be simultaneously directed to a file and piped to XMGR using the command string:

cpmgfit [options] -xmgr -f filename | tee file.output | xmgr -type xydy -source stdin

-help
Using this option alone will provide a brief synopsis of the program.
-grid
By default, curvefit assumes that the user will provide initial guesses for the parameter values to be optimized. If the -grid option is specified, then the the initial values will be chosen by a grid search. The user must provide lower bound, upper bound and number of grid steps to use for each parameter.
-jack
By default, curvefit provides estimates of the uncertainties in the fitted parameters both from the covariance matrix and from a Monte Carlo simulation. If -jack is specified, then a jackknife simulation is performed instead of the Monte Carlo simulation.
-debug
Echo the input back to the terminal for debugging purposes.
 
-noerror
By default, curvefit assumes that data will be input as (x, y, dy) triples (where dy is the uncertainty in y). If -noerror is specified, then only (x,y) pairs will be read. -noerror implies -jack because the Monte Carlo simulations cannot be done without uncertainties being provided.
-xmgr
Produce output for plotting using the XMGR software package.
-raw
The data will be read from a plain file without any header information. The user will be prompted for other information.
-r seed
Specifies an initial seed for the random number generator. If USE_GETSEED is defined in the Makefile for the program, then entering a seed is not necessary because the program will use the system clock to generate the seed. The routine getseed.f may need to be re-written for use on systems other than Silicon Graphics in order to use the USE_GETSEED compiler option.
-m filename
Specifies a filename to be used for writing the values of the fitted paramters determined for each Monte Carlo step in the Monte Carlo simulation of uncertainties. This file is useful for performing error analysis during subsequent calculations using fitted parameters.
-f filename
Specifies a filename to be read for input, rather than interactive data entry from the terminal. The format of the input file is given below.

Input File Format

The input file format used with the -f flag is given below. In general, the input consists of a keyword (title, function, xmgr, or data) followed by input values (that may extend over more than one line as described below). Strings that contain spaces must be enclosed in double quotes. The program is case sensitive. Lines starting with "#" are treated as comments. Input is free format and blank lines, tabs, and blanks are ignored.
title        string
fields       #fields field_1 .... field_n

function     string
   string  [opt/fix] real     real     integer
                     .
                     .
                     .
   string   [opt/fix] real     real     integer
xmgr
   string
      .
      .
      .
   string
data
   real     real     real     real
              .
              .
              .
   real     real     real     real
The input associated with each keyword is described below:
title string
A title for the data of up to 80 characters
fields
This keyword is followed by the number of static magnetic fields used for data collection and then by the values of the static magnetic fields in units of tesla.
function string
String is one of the defined function names. The functions are listed using the command curvefit -help. The function keyword is followd by M input lines, one for each parameter in the function. In the default mode, each line consists of a string defining the name of the parameter and a real number defining the initial guess for the parameter. If -grid is specified, the parameter string is followed by three entries: a real number defining the lower bound for the parameter, a real number defining the upper bound for the parameter and an integer defining the number of search steps between the lower and upper bounds. The string defining the name of the parameter can be followed by an optional string with the value 'fix' or 'opt'. If the keyword 'opt' is specified, then the following values on the line are treated exactly as if the string 'opt' were omitted. If the string value is 'fix', then the next two fields are read as fixed values, not to be optimized, of the parameter and the error in the fixed parameter (which can be set to zero). The error is used to perform a Monte Carlo simulation of the uncertainties in the fitted parameters due to variations in the fixed parameters.
xmgr
This keyword is followed by options used for XMGR. Each of these inputs consists of the keyword @ followed by a string setting some XMGR option. See the sample input files provided or the XMGR help pages for more information. If the -xmgr option is not specified, these entries are ignored.
data
This keyword is followed by N lines of data, one line for each data point. The data is input as (x, y, dy, field) quadruples, unless the -noerror option is specified. In this case, the data is input as (x,y, field) triples. x = 1/tcp, y = R2(1/tcp), in which tcp is the delay between 180 degree pulses in the CPMG experiment and R2(1/tcp) is the relaxation rate constant; dy is the experimental uncertainty in R2(1/tcp). If no uncertainties are specified for the y values, then the data may need to be scaled to obtain good convergence of the program.

Pre-defined Functions

The following fitting functions are pre-defined in the program:
 General CPMG function for all time scales
     R2(1/tcp)=f(R2,papb,dw,kex)
     Function: Full_CPMG
     Parameters: R2, papb, dw, kex
 Fast limit cpmg function
     R2(1/tcp)=R2+Rex*(1 - (2*Tau/tcp)*tanh(1/(2*Tau/tcp)))
     Function: CPMG 
     Parameters: R2, Rex, Tau
 Ishima and Torchia approximation for skewed populations and all time scales
     R2(1/tcp)=R2+Rex/(1+Tau*sqrt(144/tcp**4+PaDw**4))
     Function: Ishima' 
     Parameters: R2, Rex, PaDw, Tau
 Full CPMG function without R2 offset
     R2(1/tcp)=f(papb,dw,kex)'
     Function: Full_Rex
     Parameters: papb, dw, kex
 Fast limit cpmg function without R2 offset
     R2(1/tcp)=Rex*(1 - (2*Tau/tcp)*tanh(1/(2*Tau/tcp)))
     Function: Rex 
     Parameters: Rex, Tau

Fast limit cpmg function for 3 sites
     R2(1/tcp)=R2+Sum{Rex*(1 - (2*Tau/tcp)*tanh(1/(2*Tau/tcp)))}
     Function: 3-site_CPMG 
     Parameters: R2,Rex1, Tau1, Rex2, Tau2

Fast limit cpmg function for 3 sites without R2 offset
     R2(1/tcp)=Sum{Rex*(1 - (2*Tau/tcp)*tanh(1/(2*Tau/tcp)))}
     Function: 3-site_CPMG 
     Parameters: R2,Rex1, Tau1, Rex2, Tau2

In these expressions, pa and pb are the site populations, dw is the chemical shift difference between sites (in units of 1/s), kex = 1/Tau is the exchange rate constant, and Rex = pa pb dw^2 / kex. References for the above fitting functions are given below. Instructions for adding additional functions are given below.

Sample Input and Output Files

A sample input file is given below:

title cys38


fields 2 11.74 14.1


function CPMG
R2 4 10 20
Rex 0 100.0 100
tex 0 10.0 100


xmgr
@ XAXIS LABEL "1/tcp (1/ms)"
@ YAXIS LABEL "R2(tcp) (1/s)"
@ XAXIS TICKLABEL FORMAT DECIMAL
@ YAXIS TICKLABEL FORMAT DECIMAL
@ XAXIS TICKLABEL CHAR SIZE 0.8
@ YAXIS TICKLABEL CHAR SIZE 0.8
@ WORLD XMIN 0.0

data
1.000	 7.39 	 0.07	11.74
0.667	 8.35 	 0.10	11.74
0.500	 9.29 	 0.06	11.74
0.250	 11.24 	 0.24	11.74
0.152	 12.49 	 0.23	11.74
0.0926	 13.23 	 0.15	11.74
0.0155	 15.07 	 0.07	11.74
#
1.000	 8.38 	 0.38	14.08
0.667	 9.64 	 0.14	14.08
0.500	 11.53 	 0.31	14.08
0.250	 14.08 	 0.30	14.08
0.152	 15.78 	 0.14	14.08
0.0926	 16.56 	 0.34	14.08
0.0155	 18.69 	 0.41	14.08


If the -raw flag is used, the input file format consists only of data in (x,y,dy,field) quadruples or (x,y,field) triples:
   real     real     real     real
              .
              .
              .
   real     real     real     real
A sample input file is given below:
1.000	 7.39 	 0.07	11.74
0.667	 8.35 	 0.10	11.74
0.500	 9.29 	 0.06	11.74
0.250	 11.24 	 0.24	11.74
0.152	 12.49 	 0.23	11.74
0.0926	 13.23 	 0.15	11.74
0.0155	 15.07 	 0.07	11.74

1.000	 8.38 	 0.38	14.08
0.667	 9.64 	 0.14	14.08
0.500	 11.53 	 0.31	14.08
0.250	 14.08 	 0.30	14.08
0.152	 15.78 	 0.14	14.08
0.0926	 16.56 	 0.34	14.08
0.0155	 18.69 	 0.41	14.08

The output from the program is given below for the above input file:

# Least-Squares Curve Fitting Results
#
# title    cys38                                                                           
# function CPMG                
# equation y=R2+Rex*(1 - 2*Tau*x*tanh(1/(2*Tau*x)))
# points         14
# X2               57.2108
# X2(red)           5.2010
#
# Parameter            Fitted_Value  Fitted_Error     Sim_value     Sim_error
# R2                           6.4222       0.1075          6.4234       0.1046
# Rex                          8.6279       0.1047          8.6274       0.1045
# Tau                          0.7754       0.0223          0.7760       0.0210
#
# %tile     X2
#          0.0500         4.8460
#          0.1000         5.6655
#          0.1500         6.7263
#          0.2000         7.2215
#          0.2500         8.0515
#          0.3000         8.5073
#          0.3500         9.0708
#          0.4000         9.6372
#          0.4500        10.1745
#          0.5000        10.6080
#          0.5500        11.1242
#          0.6000        11.5896
#          0.6500        12.3084
#          0.7000        12.9874
#          0.7500        13.7660
#          0.8000        14.6704
#          0.8500        16.0189
#          0.9000        17.8107
#          0.9500        20.1585
#

Additional output is appended to the above if the -xmgr option is specified. The output is mostly self-explanatory. The reduced chi-square variable [X2(red)] is given by X2/(N-M), in which X2 is the chi-square value for the best-fit model, N is the number of data points, and M is the number of parameters in the function. If Monte Carlo simulations are performed, the distribution of X2 is estimated and the percentiles of the distribution (from 5% to 95%) are reported. The XMGR output produced using the above input file appears below.

Batch processing

A simple script 'batch_curve' is provided with the distribution that will process all the data files in the current working directory with a designated extension as part of their file names. Output files will be written to disk and the fitted data will be displayed using XMGR. After viewing a given data set, exit XMGR to enable the script to proceed to the next data set. For example, the command:

batch_curve DATA

will fit all files with names of the form string.DATA and produce output files string.DATA.out.

References

The linear least squares algorithm, non-linear Levenberg-Marquardt fitting algorithm and Monte Carlo simulation procedure are adapted from the text Numerical Recipes (Press, et al, 1992). This source should be consulted for additional information concerning the algorithms. The jackknife simulation procedure is described in Data Analysis and Regression. A Second Course in Statistics (Mosteller and Tukey, 1977). References for the functional forms for curve-fitting relaxation dispersion data are found in the review article "Nuclear magnetic resonance methods for quantifying microsecond-to-millisecond motions in biological macromolecules" (Palmer, Kroenke, and Loria; Methods Enzymol. 339:204-238 (2001)).

The Function Library

The library of defined functions is maintained in the file funclib.f. This file contains three FORTRAN subprograms that must be modified to add additional functions the the library. The proper locations for modifying the funclib.f file are indicated in the file itself.

initf(nfuncs,fnames,feqs,nparms,parnam)

This subroutine provides basic information concerning the defined functions. To add a function, perform the following steps:

func(fname,x,a,M)

This subroutine returns the value of the function fname at a point x. New functions are added by adding an 'elseif' block in the subroutine:
      elseif (fname.eq.'string') then
          func='expression'
in which string is the function name and expression is the FORTRAN representation of the function. The parameters of the function are a(1) to a(M).

fgrad(fname,x,a,dyda,M)

This routine returns an array dyda containing the derivatives of the function fname with respect to the parameters a(1) to a(M) at the point x. New functions are added by adding an 'elseif' block in the subroutine:
      elseif (fname.eq.'string') then
          dyda(1)='expression'
                .
                .
                .
          dyda(M)='expression'
in which string is the function name and expression is the FORTRAN representation of the derivatives of the function with respect to a(1) to a(M).

Compilation

The program is written in FORTRAN 77 and is compiled simply by typing make on the command line. The compiler type and flags are set as necessary in the Makefile. As discussed below, the BLAS library must be installed on the workstation.

License

Copyright (C) 2001 Arthur G. Palmer

Arthur G. Palmer
Department of Biochemistry and Molecular Biophysics
Columbia University
630 West 168th Street
New York, NY 10032

email: agp6@columbia.edu

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the

Free Software Foundation, Inc.
59 Temple Place - Suite 330
Boston, MA 02111-1307, USA.

 

This program uses some software routines copyrighted by Numerical Recipes Software. You should obtain a license from Numerical Recipes if you do not have one already. You can obtain an academic workstation license by sending your name, address, email address, workstation hostname, workstation internet address, workstation brand and model number, and a check for $50.00 to

Numerical Recipes Software
P.O. Box 243
Cambridge, MA 02238

 

Be certain to state that you want the FORTRAN version of the software. You will also need the BLAS library installed on your workstation. This library is normally supplied by the workstation vendor, or can be obtained from www.netlib.org.

History