CPMGFit
The program cpmgfit is aLevenberg-Marquardt non-linear least squares fitting
program designed for fitting CPMG relaxation dispersion data to characterize chemical
exchange phenomena in NMR spectroscopy. It can be used in an interactive or batch
mode. Output from the program is intended to be graphed using the GRACE
program. Output is directed to standard output and can be redirected to a file
or piped directly into XMGR.
Usage
The program is executed with the following command:
cpmgfit [-help -grid -raw -jack -noerror -debug -xmgr -r seed -m
filename -f filename]
in which the bracketed entries are options described below.
The output can be piped to XMGR using
the command string:
cpmgfit [options] -xmgr -f filename | xmgr -type xydy -source stdin
The output can be simultaneously directed to a file and piped to XMGR using
the command string:
cpmgfit [options] -xmgr -f filename | tee file.output | xmgr -type xydy -source
stdin
- -help
- Using this option alone will provide a brief synopsis of the program.
- -grid
- By default, curvefit assumes that the user will provide initial guesses
for the parameter values to be optimized. If the -grid option is specified,
then the the initial values will be chosen by a grid search. The user must
provide lower bound, upper bound and number of grid steps to use for each
parameter.
- -jack
- By default, curvefit provides estimates of the uncertainties in the fitted
parameters both from the covariance matrix and from a Monte Carlo simulation.
If -jack is specified, then a jackknife simulation is performed instead of
the Monte Carlo simulation.
- -debug
- Echo the input back to the terminal for debugging purposes.
-
- -noerror
- By default, curvefit assumes that data will be input as (x, y, dy) triples
(where dy is the uncertainty in y). If -noerror is specified, then only (x,y)
pairs will be read. -noerror implies -jack because the Monte Carlo simulations
cannot be done without uncertainties being provided.
- -xmgr
- Produce output for plotting using the XMGR
software package.
- -raw
- The data will be read from a plain file without any header information.
The user will be prompted for other information.
- -r seed
- Specifies an initial seed for the random number generator. If USE_GETSEED
is defined in the Makefile for the program, then entering a seed is not necessary
because the program will use the system clock to generate the seed. The routine
getseed.f may need to be re-written for use on systems other than Silicon
Graphics in order to use the USE_GETSEED compiler option.
-
- -m filename
- Specifies a filename to be used for writing the values of the fitted paramters
determined for each Monte Carlo step in the Monte Carlo simulation of uncertainties.
This file is useful for performing error analysis during subsequent calculations
using fitted parameters.
- -f filename
- Specifies a filename to be read for input, rather than interactive data
entry from the terminal. The format of the input file is given below.
Input File Format
The input file format used with the -f flag is given below. In general, the
input consists of a keyword (title, function, xmgr, or data) followed by
input values (that may extend over more than one line as described below).
Strings that contain spaces must be enclosed in double quotes. The program is case sensitive.
Lines starting with "#" are treated as comments.
Input is free format and blank lines, tabs, and blanks are ignored.
title string
fields #fields field_1 .... field_n
function string
string [opt/fix] real real integer
.
.
.
string [opt/fix] real real integer
xmgr
string
.
.
.
string
data
real real real real
.
.
.
real real real real
The input associated with each keyword is described below:
- title string
- A title for the data of up to 80 characters
- fields
- This keyword is followed by the number of static magnetic fields used for
data collection and then by the values of the static magnetic fields in units
of tesla.
- function string
- String is one of the defined function names. The functions are listed using
the command curvefit -help. The function keyword is followd by M input
lines, one for each parameter in the function. In the default mode, each line
consists of a string defining the name of the parameter and a real number
defining the initial guess for the parameter. If -grid is specified,
the parameter string is followed by three entries: a real number defining
the lower bound for the parameter, a real number defining the upper bound
for the parameter and an integer defining the number of search steps between
the lower and upper bounds.
The string defining the name of the parameter can be followed by an optional string with the value 'fix' or 'opt'. If the keyword 'opt' is specified, then the following values on the line are treated exactly as if the string 'opt' were omitted. If the string value is 'fix', then the next two fields are read as fixed values, not to be optimized, of the parameter and the error in the fixed parameter (which can be set to zero). The error is used to perform a Monte Carlo simulation of the uncertainties in the fitted parameters due to variations in the fixed parameters.
- xmgr
- This keyword is followed by options used for XMGR. Each of these inputs
consists of the keyword @ followed by a string setting some XMGR option. See
the sample input files provided or the XMGR help pages for more information.
If the -xmgr option is not specified, these entries are ignored.
- data
- This keyword is followed by N lines of data, one line for each data point.
The data is input as (x, y, dy, field) quadruples, unless the -noerror
option is specified. In this case, the data is input as (x,y, field) triples.
x = 1/tcp, y = R2(1/tcp), in which tcp is the delay between 180 degree pulses
in the CPMG experiment and R2(1/tcp) is the relaxation rate constant; dy is
the experimental uncertainty in R2(1/tcp). If no uncertainties are specified
for the y values, then the data may need to be scaled to obtain good convergence
of the program.
Pre-defined Functions
The following fitting functions are pre-defined in the program:
General CPMG function for all time scales
R2(1/tcp)=f(R2,papb,dw,kex)
Function: Full_CPMG
Parameters: R2, papb, dw, kex
Fast limit cpmg function
R2(1/tcp)=R2+Rex*(1 - (2*Tau/tcp)*tanh(1/(2*Tau/tcp)))
Function: CPMG
Parameters: R2, Rex, Tau
Ishima and Torchia approximation for skewed populations and all time scales
R2(1/tcp)=R2+Rex/(1+Tau*sqrt(144/tcp**4+PaDw**4))
Function: Ishima'
Parameters: R2, Rex, PaDw, Tau
Full CPMG function without R2 offset
R2(1/tcp)=f(papb,dw,kex)'
Function: Full_Rex
Parameters: papb, dw, kex
Fast limit cpmg function without R2 offset
R2(1/tcp)=Rex*(1 - (2*Tau/tcp)*tanh(1/(2*Tau/tcp)))
Function: Rex
Parameters: Rex, Tau
Fast limit cpmg function for 3 sites
R2(1/tcp)=R2+Sum{Rex*(1 - (2*Tau/tcp)*tanh(1/(2*Tau/tcp)))}
Function: 3-site_CPMG
Parameters: R2,Rex1, Tau1, Rex2, Tau2
Fast limit cpmg function for 3 sites without R2 offset
R2(1/tcp)=Sum{Rex*(1 - (2*Tau/tcp)*tanh(1/(2*Tau/tcp)))}
Function: 3-site_CPMG
Parameters: R2,Rex1, Tau1, Rex2, Tau2
In these expressions, pa and pb are the site populations, dw is the chemical
shift difference between sites (in units of 1/s), kex = 1/Tau is the exchange
rate constant, and Rex = pa pb dw^2 / kex. References for the above fitting
functions are given below. Instructions for adding additional functions are
given below.
Sample Input and Output Files
A sample input file is given below:
title cys38
fields 2 11.74 14.1
function CPMG
R2 4 10 20
Rex 0 100.0 100
tex 0 10.0 100
xmgr
@ XAXIS LABEL "1/tcp (1/ms)"
@ YAXIS LABEL "R2(tcp) (1/s)"
@ XAXIS TICKLABEL FORMAT DECIMAL
@ YAXIS TICKLABEL FORMAT DECIMAL
@ XAXIS TICKLABEL CHAR SIZE 0.8
@ YAXIS TICKLABEL CHAR SIZE 0.8
@ WORLD XMIN 0.0
data
1.000 7.39 0.07 11.74
0.667 8.35 0.10 11.74
0.500 9.29 0.06 11.74
0.250 11.24 0.24 11.74
0.152 12.49 0.23 11.74
0.0926 13.23 0.15 11.74
0.0155 15.07 0.07 11.74
#
1.000 8.38 0.38 14.08
0.667 9.64 0.14 14.08
0.500 11.53 0.31 14.08
0.250 14.08 0.30 14.08
0.152 15.78 0.14 14.08
0.0926 16.56 0.34 14.08
0.0155 18.69 0.41 14.08
If the -raw flag is used, the input file format consists only of data in (x,y,dy,field)
quadruples or (x,y,field) triples:
real real real real
.
.
.
real real real real
A sample input file is given below:
1.000 7.39 0.07 11.74
0.667 8.35 0.10 11.74
0.500 9.29 0.06 11.74
0.250 11.24 0.24 11.74
0.152 12.49 0.23 11.74
0.0926 13.23 0.15 11.74
0.0155 15.07 0.07 11.74
1.000 8.38 0.38 14.08
0.667 9.64 0.14 14.08
0.500 11.53 0.31 14.08
0.250 14.08 0.30 14.08
0.152 15.78 0.14 14.08
0.0926 16.56 0.34 14.08
0.0155 18.69 0.41 14.08
The output from the program is given below for the above input file:
# Least-Squares Curve Fitting Results
#
# title cys38
# function CPMG
# equation y=R2+Rex*(1 - 2*Tau*x*tanh(1/(2*Tau*x)))
# points 14
# X2 57.2108
# X2(red) 5.2010
#
# Parameter Fitted_Value Fitted_Error Sim_value Sim_error
# R2 6.4222 0.1075 6.4234 0.1046
# Rex 8.6279 0.1047 8.6274 0.1045
# Tau 0.7754 0.0223 0.7760 0.0210
#
# %tile X2
# 0.0500 4.8460
# 0.1000 5.6655
# 0.1500 6.7263
# 0.2000 7.2215
# 0.2500 8.0515
# 0.3000 8.5073
# 0.3500 9.0708
# 0.4000 9.6372
# 0.4500 10.1745
# 0.5000 10.6080
# 0.5500 11.1242
# 0.6000 11.5896
# 0.6500 12.3084
# 0.7000 12.9874
# 0.7500 13.7660
# 0.8000 14.6704
# 0.8500 16.0189
# 0.9000 17.8107
# 0.9500 20.1585
#
Additional output is appended to the above if the -xmgr option
is specified. The output is mostly self-explanatory. The reduced chi-square
variable [X2(red)] is given by X2/(N-M), in which X2 is the chi-square value
for the best-fit model, N is the number of data points, and M is the number of
parameters in the function. If Monte Carlo simulations are performed, the
distribution of X2 is estimated and the percentiles of the distribution (from
5% to 95%) are reported.
The XMGR output produced using the above input file appears below.
Batch processing
A simple script 'batch_curve' is provided with the distribution that will
process all the data files in the current working directory
with a designated extension as part of their
file names. Output files will be written to disk and the fitted data will be displayed using XMGR. After viewing a given data set, exit XMGR to enable the script to proceed to the next data set.
For example, the command:
batch_curve DATA
will fit all files with names of the form string.DATA and produce output files string.DATA.out.
References
The linear least squares algorithm, non-linear Levenberg-Marquardt fitting algorithm
and Monte Carlo simulation procedure are adapted from the text Numerical Recipes
(Press, et al, 1992). This source should be consulted for additional information
concerning the algorithms. The jackknife simulation procedure is described in
Data Analysis and Regression. A Second Course in Statistics (Mosteller
and Tukey, 1977). References for the functional forms for curve-fitting relaxation
dispersion data are found in the review article "Nuclear magnetic resonance
methods for quantifying microsecond-to-millisecond motions in biological macromolecules"
(Palmer, Kroenke, and Loria; Methods Enzymol. 339:204-238 (2001)).
The Function Library
The library of defined functions is maintained in the file funclib.f.
This file contains three FORTRAN subprograms that must be modified to add
additional functions the the library.
The proper locations for modifying the funclib.f file are
indicated in the file itself.
initf(nfuncs,fnames,feqs,nparms,parnam)
This subroutine provides basic information concerning the defined functions.
To add a function, perform the following steps:
- Increase the variable nfuncs by one. This is the total number of
defined functions.
- Set fnames(i) = 'string', where string is the function name
without embedded spaces or tabs and i is the index of the function in the
library.
- Set feqs(i) = 'string', where string is the functional form y=f(x) without
embedded spaces and i is the index of the function in the library.
- Set nparms(i) = M, in which M is equal to the number of parameters in the function.
- For k=1 to M, set parnam(k,i) = 'string' in which string is the name of the parameter, without embedded spaces, and i is the index of the function in the
library.
func(fname,x,a,M)
This subroutine returns the value of the function fname
at a point x. New functions
are added by adding an 'elseif' block in the subroutine:
elseif (fname.eq.'string') then
func='expression'
in which string is the function name and expression
is the FORTRAN representation
of the function. The parameters of the function are a(1) to a(M).
fgrad(fname,x,a,dyda,M)
This routine returns an array dyda containing the derivatives of the
function fname
with respect to the parameters a(1) to a(M) at the point x. New
functions are added by adding an 'elseif' block in the subroutine:
elseif (fname.eq.'string') then
dyda(1)='expression'
.
.
.
dyda(M)='expression'
in which string is the function name and expression
is the FORTRAN representation
of the derivatives of the function with respect to a(1) to a(M).
Compilation
The program is written in FORTRAN 77 and is
compiled simply by typing make on the command line. The compiler type and flags are set as necessary in the Makefile. As discussed below, the BLAS library must be installed
on the workstation.
License
Copyright (C) 2001 Arthur G. Palmer
Arthur G. Palmer
Department of Biochemistry and Molecular Biophysics
Columbia University
630 West 168th Street
New York, NY 10032
email:
agp6@columbia.edu
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the
Free Software Foundation, Inc.
59 Temple Place - Suite 330
Boston, MA 02111-1307, USA.
This program uses some software routines copyrighted by Numerical Recipes Software.
You should obtain a license from Numerical Recipes if you do not have one already.
You can obtain an academic workstation license by sending your name, address,
email address, workstation hostname, workstation internet address, workstation
brand and model number, and a check for $50.00 to
Numerical Recipes Software
P.O. Box 243
Cambridge, MA 02238
Be certain to state that you want the FORTRAN version of the software. You
will also need the BLAS library installed on your workstation. This library
is normally supplied by the workstation vendor, or can be obtained from www.netlib.org.
History
- Version 1.0 - Initial release August 3, 2001
- Version 1.1 -Updated for Linux (4/26/02)
- Version 1.2 - December 15, 2006
- Version 1.21 - fixed error with Monte Carlo output - August 2, 2008
- Version 1.30 - added 3-site CPMG functions, but not released publically
- Version 1.40 - added option to fix parameter values - May 8, 2009
- Version 1.42 - minor update
- Version 1.43 - minor update, September 15, 2009