scala - scale together multiple observations of reflections
scala HKLIN foo_in.mtz HKLOUT foo_out.mtz
[Keyworded Input]
Keyworded input summary
References
Input and Output files
Examples
Release Notes
Scaling options
Control of flow through the program
Partially recorded reflections
Scaling algorithm
TAILS correction
Data from Denzo
Data harvesting
This program scales together multiple observations of reflections, and merges multiple observations into an average intensity.
Various scaling models can be used. The scale factor is a function of the primary beam direction, either as a smooth function of PHI (the rotation angle), or expressed as BATCH (image) number. In addition, the scale may be a function of the secondary beam direction, acting principally as an absorption correction, either expanded as spherical harmonics, or as a interpolated three-dimensional function of Phi and the spatial coordinates of the measured spot on the detector. Such three-dimensional scaling is typically ill-determined, see below for discussion of this. The secondary beam correction is related to the absorption anisotropy correction described by Blessing (Ref Blessing (1995) ), the interpolated three-dimensional correction is similar to that described by Kabsch (Ref Kabsch (1988)).
The merging algorithm analyses the data for outliers, and gives detailed analyses. It generates an weighted mean of the observations of the same reflection, after rejecting the outliers.
The program does three passes through the data:
Normally anomalous scattering is ignored during the scale determination (I+ & I- observations are treated together), but the merged file always contains I+ & I-, even if the the ANOMALOUS OFF command is used. Switching ANOMALOUS ON does affect the statistics and the outlier rejection (qv)
The optimum form of the scaling will depend a great deal on how the data were collected. It is not possible to lay down definitive rules, but some of the following hints may help.
Partially recorded reflections are by default included the scaling pass, as well as included in the final analysis and merging. They may optionally be excluded from the scaling (controlled by the command INTENSITIES), and excluded from the final analysis (controlled by the command FINAL). Note that this default has changed from earlier versions
The different options for the treatment of partials are set by either the PARTIALS command, effective for both scaling & merging stages; or separately for the scaling stage only (INTENSITIES command) or for the merging stage only (FINAL command).
Partials may either be summed or scaled : in the latter case, each part is treated independently of the others.
For datasets with few partials, with low mosaicity compared to the image widths, very few partials run over more than two images, & partial summation is not usually a problem. If you have many partials running over 3 or more images, you may need to tune the partial selection flags to accept or reject partial sets according to their reliability.
Summed partials:
All the parts are summed (after applying
scales) to give the total intensity, provided some checks are passed.
The number of reflections failing the checks is printed. You
should make sure that you are not losing too many reflections in these
checks.
Scaled partials:
In this option, each individual partial observation scaled up by the inverse
FRACTIONCALC, provided that the fraction is greater than <minimum_fraction>
[default = 0.5].
Data integrated with Denzo may be scaled and merged with Scala as an alternative to Scalepack, or unmerged output from scalepack may be used. Both have some limitations. See appendix 4 for more details.
Provided a Project Name and a Dataset Name are specified (either explicitly or from the MTZ file) and provided the NOHARVEST keyword is not given, the program will automatically produce a data harvesting file. This file will be written to
$HARVESTHOME/DepositFiles/<projectname>/ <datasetname>.scala
The environment variable $HARVESTHOME defaults to the user's home directory, but could be changed, for example, to a group project directory.
See also Data Harvesting.
In the definitions below "[]" encloses optional items, "|" delineates alternatives. All keywords are case-insensitive, but are listed below in upper-case. Anything after "!" or "#" is treated as comment. The available keywords are:
ANALYSE, ANOMALOUS, BINS, CYCLES, DAMP, DNAME, DUMP, EXCLUDE, FILTER, FINAL, HISTORY, INITIAL, INSCALE, INTENSITIES, LINK, NODUMP, NOHARVEST, NOSCALE, ONLYMERGE, OUTPUT, OVERLAPMAP, PARTIALS, PNAME, PRINT, PRIVATE, REJECT, RESOLUTION, RESTORE, RSIZE, RUN, SCALES, SDCORRECTION, SKIP, SMOOTHING, TIE, TITLE, [UN]FIX, UNLINK, USECWD, WIDTH, XYBINS
Define a "run" : Nrun is the Run number, with an arbitrary integer label (i.e. not necessarily 1,2,3 etc). A "run" defines a set of reflections which share a set of scale factors. Typically a run will be a continuous rotation around a single axis. The subkeys allow definition of a run in a flexible way. The definition of a run may use several RUN commands.
Examples:
RUN 1 BATCH 1 TO 10000 RUN 1 INCLUDE BATCH 1 TO 200 EXCLUDE 77 79 132 RUN 2 CRYSTAL 2 RUN 3 INCLUDE RANGE 0 TO 90 EXCLUDE RANGE 45 TO 48
Define layout of scales. Note that a layout may be defined for all runs (no RUN subkeyword), then overridden for particular runs by additional commands.
Modifiers for input standard deviations: these are modified to
sd(I) corrected = Sdfac * sqrt(sd(I)**2 + SdB*Ihl' + (Sdadd*Ihl)**2)
where Ihl is the intensity (SdB may be omitted in the input and its use is not recommended). Default values are 1.0, 0.0, 0.02.
RUN <run_number>
Define run to which this command applies: the run must have been previously
defined. If no run is defined, it applies to all runs. Different values
may be specified for fully recorded reflections (FULL) and for partially
recorded reflections (PARTIAL), or the same values may be used for both
(BOTH), e.g.
sdcorrection full 1.4 0.11 part 1.4 0.05
The keyword NOADJUST stops the automatic adjustment of the Sdfac parameters from the normal probability analysis at the beginning of the merge stage [default is ADJUST] (this applies to all runs)
With the output options SEPARATE or POSTREF, the modified Sds are written to the output file in columns SIGIC [& SIGIPRC if IPR is present]. These columns will be used by Postref but ignored on reinput to Scala.
Select the way in which partials are treated in both scaling and merging. These settings may be overridden separately for the scaling and merging steps with the INTENSITIES and FINAL commands respectively.
By default, partials are included (summed) in both scaling and in merging.
Intensities selection for scaling: which intensities to use, whether to keep Bijvoet pairs separate, and treatment of partials in scaling:
(a) Intensity selection options:
Set which intensity to use, of the integrated intensity (column I) or profile-fitted (column IPR), if both are present. Note this applies to all stages of the program, scaling & averaging.
(b) Treatment of Bijvoet-related observations
By default, all observations (I+ & I-) are treated alike in scaling. This is normally the correct thing to do, since the anomalous differences are usually small and randomly positive and negative. In a case with large anomalous differences and high redundancy, it may be better to keep the I+ & I- observations separate in the scaling. Note that typically this will severely reduce the scaling overlaps between different parts of the data, and is not recommended except in special cases.
(c) Options for treatment of partials in scaling (overrides options given under PARTIALS):
Set whether partially recorded reflections should be used in scaling, & if so, whether to use summed or scaled partials. By default summed partials are used in scaling as well as fulls. See introduction above for a description of the use of partially recorded reflections. Treatment of partials in the final averaging stage is defined with the FINAL command
Define rejection criteria for outliers: different criteria may be set for the scaling and for the merging (FINAL) passes. If neither SCALE nor MERGE are specified, the same values are used for both stages. If the BYRUN flag is set, rejection & deviation calculations are done only between observations from the same run. This is only sensible if each run is essentially a complete dataset, for instance a group of MAD datasets could be scaled together, treating each one as a RUN.
If ANOMALOUS ON is set, then by default the outlier test is done in the merging step only within the I+ & I- sets for that reflection, ie Bijvoet-related reflections are treated as independent. The ALL keyword here enables an additional test on all observations including I+ & I- observations. Observations rejected on this second check are flagged "@" in the ROGUES file. In the scaling step, the outlier check includes all observations, unless anomalous observations are kept separate in scaling (INTENSITIES ANOMALOUS: this is an unusual option for special cases only).
The test for outliers is described in Appendix 5
Controls the treatment of anomalous scattering information in the merging step. Note that the option of selecting matching anomalous pairs is not recommended for normal use: it is likely to lead to seriously incomplete data in many cases, and the results should be compared carefully with those with the MATCH option switched off.
Definition of symmetry:-
Set resolution limits in Angstrom, either order, optionally for individual runs (in which case this command MUST come after definition of the run). The keywords LOW or HIGH, followed by a number, may be used to set the low or high resolution limits explicitly: an unset limit will be set as in the input HKLIN file. If the RUN subkeyword is omitted, the limit applies to all runs. [Default use all data]
Set new title to replace the one taken from the input file. By default, the title is copied from hklin to hklout
Only do the merge step, no initial analysis, no scaling (== INITIAL NONE; NOSCALE). Note that this will usually need to be combined with a RESTORE command.
Read initial scales from a SCALES file from a previous run of Scala (scales are normally dumped on every cycle, see DUMP). The number of scales defined for each run this time should typically be the same as in the dump, although a set of scale factors along ROTATION or DETECTOR may be extrapolated to additional batches which were not present in the initial scaling. The file may contain scales for runs which are not used this time, but new runs may not be added. RESTORing from a scale file which does not properly correspond to the run which generated the file is liable to give silly results. No initial analysis pass will be done unless the command INITIAL ANALYSE is given.
Define method of setting initial scales
Define amount of printing
Define number of refinement cycles, convergence limit, and weighting scheme for scale refinement
Set intensity limits or positional limits for excluding observations.
I .lt. sd(I) * SDMIN
.or. I .gt. sd(I) * SDMAX
.or. I .gt. ABSMAX
Restrain pairs of neighbouring scale factors on rotation axis (ROTATION = primary beam) or in detector plane (DETECTOR = secondary beam) to have the same value, or surface spherical harmonic parameters to zero (for SECONDARY or SURFACE corrections, to keep the correction approximately spherical), with a standard deviation as given. This may be used if scales are varying too wildly, particularly in the detector plane. The default is no restraints on scales. A tie is recommended (a) if scales are varied across the detector, eg TIE DETECTOR 0.1, or (b) for SECONDARY or SURFACE corrections, eg TIE SURFACE 0.01
Control what goes in the output file. Three types of output file may be produced: (a) AVERAGE, average intensity for each hkl (I+ & I-). (b) SEPARATE, observations from input file with scale calculated, for re-input to Scala (or Postref, see POSTREF option) (c) UNMERGED, unaveraged observations, but with scales applied, partials summed or scaled, and outliers rejected.
A reference batch is always excluded from the final statistics, even if it is included in the output file (only possible with the SEPARATE option).
Other options:
Select whether or not to use summed or scaled partials in the final analysis after scale determination. If this command is missing, summed partials will be included if the input file contains a FRACTIONCALC column.
Option to fix or free TAILS parameters: by default V & A1 are free, A0 is fixed [default A0 = 0.0]. Fixing A0 may help for low resolution data particularly.
run_2 will use the same SURFACE (or SECONDARY) or TAILS parameters as run_1. This can be useful when different runs come from the same crystal, and may stabilize the parameters. LINK TAILS ALL will use the same tails parameters for all runs for which TAILS parameters are refined. The keyword ALL will be assumed if omitted.
Remove links set by LINK command (or by default). The keyword ALL will be assumed if omitted, e.g. UNLINK TAILS [ALL] will use seperate tails parameters for each run.
Allow a subset of reflections to be used during the initial cycles of scaling, to speed up the program. For the first N_skip_cycles, only every N_skip'th unique reflection will be used. N_skip_cycles defaults = Ncycle-2, and the program will force 2 more cycles with all data if convergence is reached while reflections are still being skipped. You should check that convergence has been reached with all observations, particularly if the number of observations used in the early cycles is small.
Define filter level, & damp level. In the minimization, shifts corresponding to eigenvalues .lt. <Filter> are removed, <Damp> is added to all eigenvalues. [Default 1.0e-6, 0.0]
Define number of resolution bins for analysis [default 10]
Define number of bins across detector, x (=XDET) and y (YDET). Only used if XDET, YDET columns are present in input file <Ny> defaults to <Nx>. XYBINS 0 turns off analysis [default Nx = Ny = 20]
Set smoothing factors ("variances" of weights)
Don't do any scaling, just the final analysis (equivalent to CYCLES 0)
Dump all scale factors to a file after each cycle. These can be used to restart scaling using the RESTORE option, or for rerunning the merge step. If no filename is given, the scales will be written to logical file SCALES, which may be assigned on the command line. DUMP is set by default, but may be turned off with the NODUMP command.
No dump of scales to file. Default is DUMP.
This command controls the normal probability analyses
Define optional line to be added to the history records in the file. This is in addition to a line giving the date and time of the run, which is always added. Only one optional history line may be added.
Write the overlap matrix from the initial analysis to a map file assigned to MAPOUT. Note that the initial analysis is not done if the RESTORE option is used or INITIAL NONE is set.
Select binning mode on intensity
In each case, <mid-point> is the limit for the middle bin.
Compulsory columns:
H K L indices
M/ISYM partial flag, symmetry number
BATCH batch number
I intensity (integrated intensity)
SIGI sd(intensity) (integrated intensity)
Optional columns:
XDET YDET position on detector of this reflection: these
may be in any units (e.g. mm or pixels), but the
range of values must be specified in the
orientation data block for each batch. If
these columns are absent, the scale may not be
varied across the detector (i.e. only SCALES
DETECTOR 1 is valid)
ROT rotation angle of this reflection ("Phi"). If
this column is absent, only SCALES BATCH is valid.
IPR intensity (profile-fitted intensity)
SIGIPR sd(intensity) (profile-fitted intensity)
SCALE previously calculated scale factor (e.g. from
previous run of Scala). This will be applied
on input
SIGSCALE sd(SCALE)
TIME time for B-factor variation (if this is
missing, ROT is used instead)
MPART partial flag from Mosflm
FRACTIONCALC calculated fraction, required to SCALE PARTIALS
LP Lorentz/polarization correction (already applied)
H K L IMEAN SIGIMEAN I(+) SIGI(+) I(-) SIGI(-)
Note that there are no M/ISYM or BATCH columns. I(+) & I(-) are the means of the Bijvoet positive and negative reflections respectively and are always present even for the option ANOMALOUS OFF.
SCALE & SIGSCALE - the calculated scale factor & its sd (this may be applied in another run of Scala). SCALE will be = 0.0 for reflections outside the resolution cutoff, if they are included in the output file (option OUTPUT KEEP) (see example)
SIGIC [, SIGIPRC] - the corrected standard deviations of I [and IPR], as altered by SDCORR commands. These columns are only written if a SDCORRECTION command is given to Scala.
If the OUTPUT POSTREF option is given, then also the columns IMEAN SIGIMEAN ISUM SIGISUM are added
IMEAN mean of fully-recorded reflections
ISUM summed partials (partials only)
Output columns:
H,K,L REDUCED or ORIGINAL indices (see OUTPUT options)
M/ISYM Symmetry number (REDUCED), = 1 for ORIGINAL indices
BATCH batch number as for input
I, SIGI scaled intensity & sd(I)
SCALE scale factor applied
SIGSCALE sd(SCALE)
NPART number of parts, = 1 for fulls, negated for scaled
partials, i.e. = -1 for scaled single part partial
TIME copied from input if present
XDET,YDET copied from input if present
ROT copied from input if present (averaged for
multi-part partials)
FRACTIONCALC total fraction (if present in input file)
LP copied from input if present
(I+ - I-)/sqrt[sd(I+)**2 + sd(I-)**2]
*** this is at present written is a format for plotting program xmgr ***
set crystal = "tfn2"
scala hklin ${crystal}_srs \
hklout ${crystal}_merge \
scales ${crystal}_${run}.scales \
rogues ${crystal}_${run}.rogues \
normplot ${crystal}_${run}.norm \
<< eof
run 1 all
intensities partial # we have few fulls
cycles 20
anomalous off # this is a native set
sdcorrection 1.3 0.02 # from a previous run
# try it with and without the tails correction: this is with
scales rotation spacing 10 bfactor on tails
reject 4 # reject outliers more than 4sd from mean
exclude eprob 1e-8 # reject very large observations, if probability
# .lt. 10**-8 (== Emax 4.3)
eof
#!/bin/csh -f
#
# Scale data from Mosflm, merge with Scala
#
scala hklin jpa_example hklout jpa_example_sc \
scales jpa.scales \
rogues jpa.rogues \
normplot jpa.norm \
anomplot jpa.anom \
<< eof-1
run 1 batch 2001 to 2049
run 2 batch 2051 to 2100
cycles 8
sdcorr 1.5 0.03
scales batch bfactor on
reject merge 4
anomalous on
partials test 0.95 1.05 maxwidth 6
eof-1
#!/bin/csh -f
#
#scala
#
cd /scr0/fm1/Temp
#
##
#==== Sort native output from Mosflm together
##
sort:
sortmtz hklout m6c8_sort.mtz << end_sort
H K L M/ISYM BATCH I SIGI
m6c8a1.mtz
m6c8a2.mtz
end_sort
#
##
#==== scale native data together, no Bfactor, smooth scale on rotation
#==== merge native
##
scala hklin m6c8_sort.mtz hklout m6c8_scala <<EOF
run 1 batch 1 to 90000
title frozen native monoclinic m6c8
scales bfactor off rotation spacing 5
resolution 25 6.1
anomalous off
reject merge 4
sdcorr 1.3 0.04
EOF
#
# Convert native data into form suitable for reinput to Scala
combat hklin m6c8_scala hklout m6c8_r << eof-r
input mtzi
labin I=IMEAN SIGI=SIGIMEAN
batch 1
eof-r
#
##
#==== Sort derivative data together
##
sort:
sortmtz hklout m6cb3_sort.mtz << end_sort
H K L M/ISYM BATCH I SIGI
m6cb3b.mtz
m6cb3c.mtz
end_sort
#
##
#==== Combine together merged native & sorted derivative data, by
# interleaving reflection records
# Must resort data after this step
##
mtzutils:
mtzutils hklin2 m6cb3_sort.mtz \
hklin1 m6c8_r \
hklout m6cb3_resort << eof-m
merge
eof-m
#
sortmtz hklin temp_m6cb3_resort hklout m6cb3_resort << eof-m
H K L M/ISYM BATCH
eof-m
#
##
#==== Scale and merge derivative data, using native data as reference (run 1)
# Allow scale factor to vary across detector, but with some restraints
# The reference data (native) is omitted from the output file
##
scala hklin m6cb3_resort.mtz hklout m6cb3_scala <<EOF
run 1 batch 1 reference
run 2 batches 10 to 23156 exclude 23152 # reject one duff batch
run 3 batches 23157 to 90000
title frozen native monoclinic m6cb3
scales bfactor off rotation spacing 5 detector 3 3
tie detector 0.1
resolution 25 6.1
reject merge 4
sdcorr 1.1 0.005
EOF
#
#
#
#exit
trunc:
truncate hklin m6cb3_scala \
hklout /ss3/fm1/Mutase/Derivs_FzM/m6cb3_F <<end-trunc
truncate
wilson
resolution 25 6.1
nresidue 1400
labout F=FM623 SIGF=SIGFM623 DANO=DANOM623 SIGDANO=SIGDANOM623
end-trunc
#!/bin/csh -f
# Define a base name for files created in this script
set name = dfxe_3d
# Input filenames for the 4 datasets at different wavelengths
set l1 = dfxe_1 # peak
set l2 = dfxe_2 # inflection
set l3 = dfxe_3 # hard remote
set l4 = dfxe_4 # 1A wavelength
# Angular spacing for smoothed Bfactor
set spacing = 10 # Bfactor spacing
### Optional branch point for restarts
###goto l1
# Sort together the initial data files
sortmtz hklout ${name}_all << eof-s
H K L M/ISYM BATCH
${l1}.mtz
${l2}.mtz
${l3}.mtz
${l4}.mtz
eof-s
###=== Step 1 ==========================================================
###=== Scale reference set to itself
###=== In this case, the reference set is all wavelengths
scale:
set ln = ref
set ll = ${name}_${ln}
scala hklin ${name}_s hklout ${ll}_sc \
scales ${ll}.scales \
rogues ${ll}.rogues \
normplot ${ll}.norm \
anomplot ${ll}.anom \
<< eof_sc${ll}
run 1 batch 1000 to 1999
run 2 batch 2000 to 2999
run 3 batch 3000 to 3999
run 4 batch 4000 to 4999
scales rotation spacing ${spacing}
anomalous off
eof_sc${ll}
###=== Step 2 ==========================================================
###=== Reformat reference data add to unmerged data
###===
rf:
# Reformat reference set for reinput (Imean only)
combat hklin ${ll}_sc hklout ${ll}_rr << eof-rr
input mtzi
batch 1
labin I=IMEAN SIGI=SIGIMEAN
eof-rr
# Put back together with main data
mtzutils hklin1 ${ll}_rr hklin2 ${name}_s hklout ${name}_all << eof-u
merge
eof-u
# Must resort after mtzutils
sortmtz hklin temp_${name}_all hklout ${name}_all << eof-m
H K L M/ISYM BATCH
eof-m
###=== Step 3 ==========================================================
###=== Scale all data relative to reference set
###=== In this case, this is done in two stages:-
###=== 1. batch scaling, to take out any discontinuities between images
###=== 2. smooth scaling, varying scales across the detector
###=== This 2nd step provides some correction for absorption
###=== If there are no discontinuities in scales between adjacent batches,
###=== step 1 should be omitted
###===
scale_1:
# 1. batch scaling
set run = rel1
set rel1 = ${name}_${run}
scala hklin ${name}_all hklout ${rel1} \
scales ${run}.scales \
normplot ${run}.norm \
anomplot ${run}.anom \
rogues ${run}.rogues \
<< eof-r1
title Batch scale against reference
# Define runs: run 10 (batch 1) is the merged reference
run 10 batch 1 reference
run 1 batch 1000 to 1999
run 2 batch 2000 to 2999
run 3 batch 3000 to 3999
run 4 batch 4000 to 4999
# Use partials in scaling, as there are not many fulls
intensities partial
#
scales batch brotation spacing ${spacing} bfactor on
output separate reference # output data as input, with scales added
# keep reference data for the second scaling stage
eof-r1
scale_2:
# 2. smoothed 3D scaling
set run = rel2
set rel2 = ${name}_${run}
scala hklin ${rel1} hklout ${rel2} \
scales ${run}.scales \
normplot ${run}.norm \
anomplot ${run}.anom \
rogues ${run}.rogues \
<< eof-r2
title Smoothed scale against reference
run 10 batch 1 reference
run 1 batch 1000 to 1999
run 2 batch 2000 to 2999
run 3 batch 3000 to 3999
run 4 batch 4000 to 4999
intensities partial
# Scales varying across the detector ( 3 x 3 scales)
scales rotation spacing ${spacing} detector 3 3
tie detector 0.2
output separate # Omit reference data this time
eof-r2
###=== Step 4 ==========================================================
###=== Split out each wavelength & merge
###=== No scaling is done here (it's already been done)
l1:
set ln = 1
set run = l${ln}
scala hklin ${name}_rel2 hklout ${name}_${run} \
normplot ${run}.norm \
anomplot ${run}.anom \
rogues ${run}.rogues \
<< eof-${run}
title Merged, lambda ${run}
run 1 batch 1000 to 1999
scales constant
anomalous on
onlymerge
reject 4
eof-${run}
###=== Step 5 ==========================================================
###=== Convert I to F, do Wilson plot
###===
truncate hklin ${name}_${run} hklout ${name}_${run}_f << eof_t${run}
nresidues 117
ranges 30
labout F=F${ln} SIGF=SIGF${ln} \
DANO=DANO${ln} SIGDANO=SIGDANO${ln} ISYM=ISYM${ln} \
F(+)=F${ln}(+) SIGF(-)=SIGF${ln}(+) F(-)=F${ln}(-) SIGF(-)=SIGF${ln}(-)
eof_t${run}
###=== Repeat step 4 & 5 for L2
###===
l2:
set ln = 2
set run = l${ln}
scala hklin ${name}_rel2 hklout ${name}_${run} \
normplot ${run}.norm \
anomplot ${run}.anom \
rogues ${run}.rogues \
<< eof-${run}
title Merged, lambda ${run}
run 1 batch 2000 to 2999
scales constant
anomalous on
onlymerge
reject 4
eof-${run}
truncate hklin ${name}_${run} hklout ${name}_${run}_f << eof_t${run}
nresidues 117
ranges 30
labout F=F${ln} SIGF=SIGF${ln} \
DANO=DANO${ln} SIGDANO=SIGDANO${ln} ISYM=ISYM${ln} \
F(+)=F${ln}(+) SIGF(-)=SIGF${ln}(+) F(-)=F${ln}(-) SIGF(-)=SIGF${ln}(-)
eof_t${run}
###=== Repeat step 4 & 5 for L2
###===
l3:
set ln = 3
set run = l${ln}
scala hklin ${name}_rel2 hklout ${name}_${run} \
normplot ${run}.norm \
anomplot ${run}.anom \
rogues ${run}.rogues \
<< eof-${run}
title Merged, lambda ${run}
run 1 batch 3000 to 3999
scales constant
anomalous on
onlymerge
reject 4
eof-${run}
truncate hklin ${name}_${run} hklout ${name}_${run}_f << eof_t${run}
nresidues 117
ranges 30
labout F=F${ln} SIGF=SIGF${ln} \
DANO=DANO${ln} SIGDANO=SIGDANO${ln} ISYM=ISYM${ln} \
F(+)=F${ln}(+) SIGF(-)=SIGF${ln}(+) F(-)=F${ln}(-) SIGF(-)=SIGF${ln}(-)
eof_t${run}
###=== Repeat step 4 & 5 for L2
###===
l4:
set ln = 4
set run = l${ln}
scala hklin ${name}_rel2 hklout ${name}_${run} \
normplot ${run}.norm \
anomplot ${run}.anom \
rogues ${run}.rogues \
<< eof-${run}
title Merged, lambda ${run}
run 1 batch 4000 to 4999
scales constant
anomalous on
onlymerge
reject 4
eof-${run}
truncate hklin ${name}_${run} hklout ${name}_${run}_f << eof_t${run}
nresidues 117
ranges 30
labout F=F${ln} SIGF=SIGF${ln} \
DANO=DANO${ln} SIGDANO=SIGDANO${ln} ISYM=ISYM${ln} \
F(+)=F${ln}(+) SIGF(-)=SIGF${ln}(+) F(-)=F${ln}(-) SIGF(-)=SIGF${ln}(-)
eof_t${run}
set name = dfxe_3d_l
###=== Step 6 ==========================================================
###=== Sort together merged data for all wavelength, outputting a
###=== single record for each hkl
###=== For each wavelength, store amplitude F & sigF,
###=== anomalous difference DANO (= F+ - F-) & sigDANO,
###=== and ISYM flag which shows if both F+ & F- were measured
cad hklout ${name}_fcad \
hklin1 ${name}1_f \
hklin2 ${name}2_f \
hklin3 ${name}3_f \
hklin4 ${name}4_f << eof-c
labin file_number 1 E1=F1 E2=SIGF1 E3=DANO1 E4=SIGDANO1 E5=ISYM1 E6=F1(+) E7=SIGF1(+) E8=F1(-) E9=SIGF1(-)
labin file_number 2 E1=F2 E2=SIGF2 E3=DANO2 E4=SIGDANO2 E5=ISYM2 E6=F2(+) E7=SIGF2(+) E8=F2(-) E9=SIGF2(-)
labin file_number 3 E1=F3 E2=SIGF3 E3=DANO3 E4=SIGDANO3 E5=ISYM3 E6=F3(+) E7=SIGF3(+) E8=F3(-) E9=SIGF3(-)
labin file_number 4 E1=F4 E2=SIGF4 E3=DANO4 E4=SIGDANO4 E5=ISYM4 E6=F4(+) E7=SIGF4(+) E8=F4(-) E9=SIGF4(-)
eof-c
###=== Step 7 ==========================================================
###=== Re-scale together all wavelengths, get statistics on dispersive
###=== and anomalous differences
scaleit:
scaleit hklin ${name}_fcad hklout dfxe_3d_fsc << eof-sci
labin FP=F4 SIGFP=SIGF4 \
FPH1=F1 SIGFPH1=SIGF1 DPH1=DANO1 SIGDPH1=SIGDANO1 \
FPH2=F2 SIGFPH2=SIGF2 DPH2=DANO2 SIGDPH2=SIGDANO2 \
FPH3=F3 SIGFPH3=SIGF3 DPH3=DANO3 SIGDPH3=SIGDANO3 \
FPH4=F1(+) SIGFPH4=SIGF1(+) \
FPH5=F1(-) SIGFPH5=SIGF1(-) \
FPH6=F2(+) SIGFPH6=SIGF2(+) \
FPH7=F2(-) SIGFPH7=SIGF2(-) \
FPH8=F3(+) SIGFPH8=SIGF3(+) \
FPH9=F3(-) SIGFPH9=SIGF3(-)
refine anisotropic
eof-sci
# Set a load of symbols
set crystal = m6d02
set ss3 = /nfs/al/ss3/fm1/Mad
set ss5 = /nfs/al/ss5/fm1/Mtz
set scr0 = /nfs/al/scr0/fm1/Mad
set scr1 = /nfs/al/scr1/fm1/Mad
set file = alldata
set resolution = 2.8
set title = " MMCM m6d02 - all the data scaled together"
# Editable branches for restarts
goto scala1
#goto scala2
#goto merging
sortmtz hklout Mtz/${crystal}_${file}_sc0 << eof1
H K L M/ISYM BATCH
... . . lots of files . .
eof1
# First scaling run - slope mode
# each batch has three parameters, a relative B-factor, and two scale
# factors, for the beginning & end of each batch. The scale for a
# particular reflection is interpolated between the 2 scales for that
# batch according to its "phi" value. This allows for a (linear) decay
# on incident beam intensity during the exposure
#
scala1:
scala hklin Mtz/${crystal}_${file}_sc0 \
hklout Mtz/${crystal}_${file}_sc1 \
scales Mtz/${crystal}_${file}_sc1.scales \
mapout Mtz/${crystal}_overlap_sc1 << eof-sc1
Title ${title} - 1st run
run 1 include batch 1000 to 1999
run 2 include batch 2000 to 2999
run 3 include batch 3000 to 3999
run 4 include batch 4000 to 4999
run 5 include batch 5000 to 5999 exclude 5190 5191 # exclude 2 duff batches
run 6 include batch 6000 to 6999
scales slope bfactor on detector 1 # batch slope scaling
tie rotation 0.1 # tie scale-factor pairs
resolution 20 ${resolution}
intensities scale_partial 0.5 # use scaled partials in scaling
sdcorrection 1.25 0.01 # these fudge factors were
# determined from a previous run in Scala
cycles 8
reject 8 # outlier test on 8sd
reject byrun
exclude sdmin 1 # omit weak reflections from scaling
overlapmap # write overlap matrix to MAPOUT
output separate # No merging
eof-sc1
# second run - across the detector
# We've now taken out any discontinuities of scale between batches
# Now apply a smoothly-varying scale factor along Phi & across the
# detector on top of the first batch scale. The input file (hklin) for
# this is the output file from the first scaling, and contains
# that result in the SCALE column which is applied here on input.
#
scala2:
scala hklin Mtz/${crystal}_${file}_sc1 \
hklout Mtz/${crystal}_${file}_sc2 << eof-sc2
Title ${title} - 2nd run
run 1 include batch 1000 to 1999
run 2 include batch 2000 to 2999
run 3 include batch 3000 to 3999
run 4 include batch 4000 to 4999
run 5 include batch 5000 to 5099
run 6 include batch 5100 to 5999 exclude 5190 5191
run 7 include batch 6000 to 6999
scales rotation spacing 5 bfactor off detector 4 4 # smooth scaling
tie detector 0.1 rotation 0.1 # restrain the scales from varying too much
initial none # skip initial pass through data
# set initial scales = 1.0
sdcorrection 1.25 0.01
cycles 10 reject 1 # reject outliers on all cycles
reject 8 # but not too tight
reject byrun
exclude sdmin 1
resolution ${resolution} 20
dump Mtz/${crystal}_${file}_sc2a.scales
output separate # Still no merging
eof-sc2
#exit
# third run - outlier rejection on individual datasets
# Read the output file from the 2nd scaling pass, and just pick out
# one data set (mostly one run). Outliers are removed from the
# output file here
# Outliers are listed in the file assigned to ROGUES
#
# This should be repeated for each dataset (run)
# (note that runs 5 and 6 belong to the same dataset)
# proceed to do individual merging
merging:
set reject = 10
set agr_reject = 10
set sdfac = 1.2
set sdadd = 0.002
set lambda = 1
set range = "1000 to 1999"
set title = " MMCM m6d02 wavelength Lambda ${lambda}"
# individual data set run - final outlier rejection
scala_l1:
scala hklin Mtz/${crystal}_${file}_sc3 \
hklout ${scr1}/${crystal}_l${lambda}_rj${reject}_sc4 \
rogues ${ss3}/${crystal}_l${lambda}_rj${reject}_sc4.rogues \
normplot ${ss3}/${crystal}_l${lambda}_rj${reject}_sc4.norm \
anomplot ${ss3}/${crystal}_l${lambda}_rj${reject}_sc4.anom \
<< eof-l1
Title ${title} - 4th scaling run
run 1 include batch ${range}
scales rotation spacing 5 bfactor off detector 4 4
sdcorrection $sdfac $sdadd
initial none
noscale # Skip scaling, just do merge
final partials # use summed partials in statistics (default)
resolution ${resolution}
reject scale $reject
reject merge ${agr_reject}
eof-l1
truncate:
truncate hklin ${scr1}/${crystal}_l${lambda}_rj${reject}_sc4 \
hklout ${ss3}/${crystal}_l${lambda}_trun_rj${reject}_sc4 << eof-l1
TITLE ${title}
labout F=FPTL${lambda} SIGF=SIGFPTL${lambda} DANO=DANOPTL${lambda} -
SIGDANO=SIGDANOPTL${lambda} ISYM=ISYMPT${lambda}
wilson
nresidue 2600
eof-l1
. . . repeat for other datasets . .
Partially recorded reflections may optionally be used in scaling (controlled by the command INTENSITIES), and in the final analysis (controlled by the command FINAL). The default is to include summed partials in both scaling and the final analysis and merging.
Different options for the treatment of partials are set for both scaling & merging stages by the PARTIALS command, or separately for the scaling stage (INTENSITIES command) and the merging stage (FINAL command). Partials may either be summed (subkeyword PARTIALS, with various options), or scaled (subkeyword SCALE_PARTIALS): in the latter case, each part is treated independently of the others. If summed partials are used in scaling with the SCALES BATCH option, the FRACTIONCALC is used to partition the effects of the different scales for the two halves. In the input file, partials are flagged with M=1 in the M/ISYM column, and have a calculated fraction in the FRACTIONCALC column. Data from Mosflm also has a column MPART which enumerates each part (e.g. for a reflection predicted to run over 3 images, the 3 parts are labelled 31, 32, 33), allowing a check that all parts have been found: MPART = 10 for partials already summed in MOSFLM.
For datasets with few partials, with low mosaicity compared to the image widths, very few partials run over more than two images, & partial summation is not usually a problem. If you have many partials running over 3 or more images, you may need to tune the partial selection flags below to accept or reject partial sets according to their reliability.
Summed partials:
All the parts are summed (after applying scales) to give the total intensity,
provided some checks are passed. The options to use partials as well as
fulls are defined separately for the scaling and merging steps on the INTENSITIES
and FINAL commands. The parameters for the checks are set by the PARTIALS
command for both stages, or separately on the INTENSITIES and FINAL commands.
The number of reflections failing the checks is printed. You should make
sure that you are not losing too many reflections in these checks.
By setting the TEST & CORRECT limits, you can control summation & scaling of partials, e.g .
TEST 1.2 1.2 CORRECT 0.5
will scale up all partials with a total fraction between 0.5 & 1.2
TEST 0.95 1.05
will accept summed partials 0.95->1.05, no scaling
TEST 0.95 1.05 CORRECT 0.4
will accept summed partials 0.95->1.05, and scale up those with fractions between 0.4 & 0.95
Note that a profile-fitted intensity, if present in the file as a separate IPR column, will not be used for a scaled partial, unless the PARTIALS USE_PROFILE flag is set.
Scaled partials:
In this option, each individual partial observation scaled up by the inverse
FRACTIONCALC, provided that the fraction is greater than <minimum_fraction>
[default = 0.5].
For each reflection h, we have a number of observations Ihl, with estimated standard deviation shl, which defines a weight whl. We need to determine the inverse scale factor ghl to put each observation on a common scale (as Ihl/ghl). This is done by minimizing
Sum( whl * ( Ihl - ghl * Ih )**2 ) Ref
where Ih is the current best estimate of the "true" intensity
Ih = Sum ( whl * ghl * Ihl ) / Sum ( whl * ghl**2)
Each observation is assigned to a "run", which corresponds to a set of scale factors. A run would typically consist of a continuous rotation of a crystal about a single axis.
The inverse scale factor ghl is derived as follows:
ghl = Thl * Chl * Shl
where Thl is an optional relative B-factor contribution, Chl is a scale factor (1-dimensional or 3-dimensional (ie DETECTOR option)), and Shl is a anisotropic correction expressed as spherical harmonics (ie SECONDARY or SURFACE options).
a) B-factor (optional)
For each run, a relative B-factor (Bi) is determined at intervals in "time" ("time" is normally defined as rotation angle if no independent time value is available), at positions ti (t1, t2, . ... tn). Then for an observation measured at time tl
B = Sum[i=1,n] ( p(delt) Bi ) / Sum (p(delt))
where Bi are the B-factors at time ti
delt = tl - ti
p(delt) = exp ( - (delt)**2 / Vt )
Vt is "variance" of weight, & controls the smoothness
of interpolation
Thl = exp ( + 2 s B )
s = (sin theta / lambda)**2
An alternative anisotropic B-factor may be used to correct for anisotropic fall-off of scattering. This is parameterized on the components of the scattering vector (divided by 2 for compatibility with the normal definition of B) in two directions perpendicular to the Xray beam (y & z in the "Cambridge" coordinate frame with x along the beam).
Thl = exp ( + 2[uy**2 Byy + 2 uy uz Byz + uz**2 Bzz])
where uy, uz are the components of d*/2
Byy, Byz, Bzz are functions of time ti or batch as for the isotropic Bfactor. The principle components of B (Bfac_min, Bfac_max) are also printed.
b) Scale factors
For each run, scale factors Cxyz are determined at positions (x,y) on the detector, at intervals on rotation angle z. Then for an observation at position (x0, y0, z0),
Chl(x0, y0, z0) =
Sum(z)[p(delz){Sum(xy)[q(delxy)*Cxyz]/Sum(xy)[q(delxy)]}/Sum(z)[p(delz)]
where delz = z - z0
p(delz) = exp(-delz**2/Vz)
q(delxy)= exp(-((x-x0)**2 + (y-y0)**2)/Vxy)
Vz, Vxy are the "variances" of the weight & control the smoothness
of interpolation
For the SCALES BATCH option, the scale along z is discontinuous: the normal option has one scale factor (or set of scale factors across the detector) for each batch. The SLOPE (not recommended) option has two scale factors per batch, with the scale interpolated linearly between the beginning and end according to the rotation angle of the reflection.
c) Anisotropy factor
The optional surface or anisotropy factor Shl is expressed as a sum of spherical harmonic terms as a function of the direction of either the secondary beam (SECONDARY correction) in the camera spindle frame, or the diffraction vector in the crystal frame (SURFACE option).
s = [Phi] [UB] h
s2 = s - s0
s2' = [-Phi] s2
Polar coordinates:
s2' = (x y z)
PolarTheta = arctan(sqrt(x**2 + y**2)/z)
PolarPhi = arctan(y/x)
where [Phi] is the spindle rotation matrix
[-Phi] is its inverse
[UB] is the setting matrix
h = (h k l)
(x y z) = [Q][B] h
Polar coordinates:
PolarTheta = arctan(sqrt(x**2 + y**2)/z)
PolarPhi = arctan(y/x)
where [Q] is a permutation matrix to put
h, k, or l along z (see POLE option)
[B] is the orthogonalization matrix
h = (h k l)
Shl = 1 + Sum[l=1,lmax] Sum[m=-l,+l] Clm Ylm(PolarTheta,PolarPhi)
where Ylm is the spherical harmonic function for
the direction given by the polar angles
Clm are the coefficients determined by
the program
Notes:For many crystals, the reflection profile on rotation ("phi") is not a simple closed curve, but has long tails due at least in part to thermal diffuse scattering (TDS): the amount of this depends on the crystal, and is larger at high resolution than at low resolution. If all reflections were scanned through the same angle, then equal amounts of this diffuse scattering would be included in each reflection. However, in typical "coarse sliced" data collection schemes, where the image rotation width is larger than the reflection width, reflections are recorded on a variable number of images, 1, 2, 3 etc, and different amounts of the tails are included in the integrated intensity. This generally leads to a negative "partial bias", increasing with resolution, i.e. the apparent intensities of partially recorded reflections are higher than equivalent fulls.
The TAILS correction is an attempt to correct for the different truncation of tails, by using a simple (crude) model of thermal diffuse scattering, although the correction only attempts to correct for the different truncation, and does not attempt to correct for diffuse scattering itself.
Some of the ideas used are based on suggestions by R.H.Blessing, Cryst. Reviews, 1, 3-58 (1987), but he should not be blamed for this.
This is a brief account of method (see code & comments in subroutine dffscn for more details):-
Dq = Dphi * xsi,
where xsi is the radius from the rotation axis
If the half width of the reflection (including tails) is v (another refineable parameter), and 2v > Dq, then part of the tails will be truncated.
Taking a simple model of the shape of the tails as a triangle of base width 2v, height in the middle h (h = J * alpha / v), then the area in the tails (= tail intensity) and the intensity truncated by the restricted scan range can be calculated. Then the corrected ("true") intensity J can be calculated
For full scan:
J = I / (1 + alpha)
For truncated scan (missing parts of tails C1 & C2)
J = I / (1 + alpha*(1 - C1 - C2))
The correction applied is thus
I' = I * (1 + alpha) / (1 + alpha*(1 - C1 - C2))
DENZO is often run refining the cell and orientation angles for each image independently, then postrefinement is done in Scalepack. It is essential that you do this postrefinement. Either then reintegrate the images with the cell parameters fixed, or use unmerged output from Scalepack as input to Scala (in which case the following keyword record has to be included in Scalepack: no merge original index).
Both of these options have some problems:
ALSO: the DENZO or SCALEPACK outputs will need to be converted to a multi-record MTZ file using program COMBAT.
The test for outliers is as follows:
Delta(hl) = |Chi|
|Ihl - ghl Iother| / sqrt[sigIhl**2 + (ghl*sdIother)**2]
against sdrej2, where Iother = the other observation
Delta(hl) =
|Ihl - ghl <I>n-1>| / sqrt[sigIhl**2 + (ghl*sd(<I>n-1))**2]
Many changes from version 1.x.x
Recommended usage:
FINAL PARTIALS CHECK TEST 0.95 1.05 # for Mosflm
FINAL PARTIALS TEST 0.95 1.05 # for Denzo (but FractionCalc
# is rather unreliable)
Phil Evans, MRC Laboratory of Molecular Biology, Cambridge (pre@mrc-lmb.cam.ac.uk) See above for Release Notes.