SCALA (CCP4: Supported Program)

NAME

scala - scale together multiple observations of reflections

SYNOPSIS

scala HKLIN foo_in.mtz HKLOUT foo_out.mtz
[Keyworded Input]

Keyworded input summary
References
Input and Output files
Examples
Release Notes

DESCRIPTION

Scaling options
Control of flow through the program
Partially recorded reflections
Scaling algorithm
TAILS correction
Data from Denzo
Data harvesting

This program scales together multiple observations of reflections, and merges multiple observations into an average intensity.

Various scaling models can be used. The scale factor is a function of the primary beam direction, either as a smooth function of PHI (the rotation angle) or as a function of BATCH (image) number. In addition, the scale may be a function of the secondary beam direction, acting principally as an absorption correction, either expanded as spherical harmonics or as an interpolated three-dimensional function of Phi and the spatial coordinates of the measured spot on the detector. Such three-dimensional scaling is typically ill-determined; see below for a discussion of this. The secondary beam correction is related to the absorption anisotropy correction described by Blessing (Ref Blessing (1995)), and the interpolated three-dimensional correction is similar to that described by Kabsch (Ref Kabsch (1988)).

The merging algorithm analyses the data for outliers and prints detailed analyses. After rejecting the outliers, it generates a weighted mean of the observations of each reflection.

The program does three passes through the data:

  1. a scaling pass: an initial estimate of the scales is made, then the scale parameters are refined
  2. an analysis pass to analyse discrepancies and adjust the standard deviation estimates
  3. a final pass to apply scales, analyse agreement & write the output file, usually with merged intensities, but alternatively as a copy of the input file with evaluated scales appended to each observation.

Normally anomalous scattering is ignored during the scale determination (I+ & I- observations are treated together), but the merged file always contains I+ & I-, even if the ANOMALOUS OFF command is used. Switching ANOMALOUS ON does affect the statistics and the outlier rejection (qv).

Scaling options

The optimum form of the scaling will depend a great deal on how the data were collected. It is not possible to lay down definitive rules, but some of the following hints may help.

  1. If successive images are collected with the same detector (on-line detector) or equivalent detectors, and the beam intensity is steady or smoothly varying, then use one of the smoothed scaling options. Only use the SCALES BATCH option if every image is different from every other one, i.e. off-line detectors (including film), or a rapidly changing incident beam flux. This may often be the case for synchrotron data if "dose" mode is not used.
  2. It is possible to "mix-and-match" options. For instance, the best option for synchrotron data may be SCALES BATCH BFACTOR ON BROTATION SPACING 5, which makes the B-factor variation smooth but the scales discontinuous by batch.
  3. If there is a discontinuity between one set of images and another (e.g. a change of exposure time), then flag them as different RUNs.
  4. The output from one run of Scala may be used as the input for a second (or further) round of scaling, with a different scale layout if desired. Thus a BATCH scaling could be followed by a smooth scaling varying across the detector. The first run appends a SCALE column to its output file (OUTPUT SEPARATE), and this scale is applied on input to the second scaling (using the first HKLOUT file as HKLIN for the second run). The SCALE column output from the second scaling will be the product of the input (1st) scale and the new (2nd) scale.
  5. Since SCALA can use different data sets for scaling and merging, it can be used to find well-parameterised "LOCAL" scales. For instance, if you have an existing data set and are now working with an isomorphous derivative set, it is sensible to include the existing data as a REFERENCE set, flagged as a different RUN. This can be used in the scaling pass but omitted from the merging pass. When scaling multiple MAD data sets, they should all be scaled in one pass, then each wavelength merged separately.
  6. Always use a B-factor correction, unless you know there is no significant radiation damage, or the data are only low-resolution, or you are allowing the scale to vary across the detector. The variation of scale across the detector is generally ill-determined, and it must not be combined with B-factor refinement (since the two partly model the same effect). The B-factor may be combined with the SECONDARY beam correction.
  7. It is only sensible to attempt to define scale factors varying with secondary beam direction or position on the detector if (a) the crystal has high symmetry, or (b) you have data from rotation about more than one axis, or (c) you are scaling to a reference data set (e.g. derivative to native scaling).
Other options are described in greater detail under the KEYWORDS.

Control of flow through the program

Each of the stages can be individually activated or suppressed. Particularly useful options are:

Partially recorded reflections

See appendix 1

Partially recorded reflections are by default included in the scaling pass, as well as in the final analysis and merging. They may optionally be excluded from the scaling (controlled by the command INTENSITIES) and from the final analysis (controlled by the command FINAL). Note that this default has changed from earlier versions.

The different options for the treatment of partials are set by either the PARTIALS command, effective for both scaling & merging stages; or separately for the scaling stage only (INTENSITIES command) or for the merging stage only (FINAL command).

Partials may be either summed or scaled: in the latter case, each part is treated independently of the others.

For datasets with few partials (low mosaicity compared to the image width), very few partials extend over more than two images, and partial summation is not usually a problem. If you have many partials running over 3 or more images, you may need to tune the partial selection flags to accept or reject partial sets according to their reliability.

Summed partials:
All the parts are summed (after applying scales) to give the total intensity, provided some checks are passed. The number of reflections failing the checks is printed. You should make sure that you are not losing too many reflections in these checks.

Scaled partials:
In this option, each individual partial observation is scaled up by the inverse of its FRACTIONCALC (i.e. divided by the calculated fraction), provided that the fraction is greater than <minimum_fraction> [default = 0.5].
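The two treatments can be illustrated with a minimal Python sketch. It assumes each part is given as a hypothetical (intensity, fraction) pair, where the intensity already has its scale applied and the fraction is the FRACTIONCALC value. The function names are illustrative, not Scala's, and the total-fraction window (0.95 to 1.05, the PARTIALS TEST default) stands in for the full set of checks.

```python
from typing import List, Optional, Tuple

# Hypothetical representation of one part: (scaled intensity, FRACTIONCALC)
Part = Tuple[float, float]


def sum_partial(parts: List[Part], lower: float = 0.95,
                upper: float = 1.05) -> Optional[float]:
    """Summed partial: add the already-scaled parts if the total fraction is acceptable."""
    total_fraction = sum(f for _, f in parts)
    if not (lower <= total_fraction <= upper):
        return None  # would be counted among the reflections failing the checks
    return sum(i for i, _ in parts)


def scale_partial(intensity: float, fraction: float,
                  minimum_fraction: float = 0.5) -> Optional[float]:
    """Scaled partial: one part scaled up by 1/FRACTIONCALC if the fraction is large enough."""
    if fraction <= minimum_fraction:
        return None  # fraction too small to be trusted
    return intensity / fraction
```

For a reflection split 40:60 over two images, summing gives the full intensity directly, while the scaled option would use only the 60% part and divide it by 0.6.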

Scaling algorithm

See appendix 2

TAILS correction

The TAILS (SCALES .. TAILS) correction may be used to improve poor partial bias: this is an attempt to allow for the difference in scan width between fulls and partials. A partial is measured across twice (or 3 times etc.) the rotation width of a full, so more of the diffuse scattering tails are included in its intensity, leading to an under-estimation of the fulls relative to the partials. This correction is not robust, and the parameters may be unstable: you should always try first without this correction, and check that it really does improve the data statistics without applying ridiculously large corrections. See appendix 3 for more details.

Data from Denzo

Data integrated with Denzo may be scaled and merged with Scala as an alternative to Scalepack, or unmerged output from scalepack may be used. Both have some limitations. See appendix 4 for more details.

Data harvesting

Provided a Project Name and a Dataset Name are specified (either explicitly or from the MTZ file) and provided the NOHARVEST keyword is not given, the program will automatically produce a data harvesting file. This file will be written to

$HARVESTHOME/DepositFiles/<projectname>/<datasetname>.scala

The environment variable $HARVESTHOME defaults to the user's home directory, but could be changed, for example, to a group project directory.
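As a sketch of where to look for the harvesting file, the path can be reproduced in Python. `harvest_file` is a hypothetical helper, and the home-directory fallback mirrors the default described above.

```python
import os


def harvest_file(project: str, dataset: str) -> str:
    """Location of the Scala harvesting file for a given project and dataset name."""
    # $HARVESTHOME if set, otherwise the user's home directory
    base = os.environ.get("HARVESTHOME", os.path.expanduser("~"))
    return os.path.join(base, "DepositFiles", project, dataset + ".scala")
```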

See also Data Harvesting.

KEYWORDED INPUT - SUMMARY

Summary classification of keywords

KEYWORDED INPUT - DESCRIPTION

In the definitions below "[]" encloses optional items, "|" delineates alternatives. All keywords are case-insensitive, but are listed below in upper-case. Anything after "!" or "#" is treated as comment. The available keywords are:

ANALYSE, ANOMALOUS, BINS, CYCLES, DAMP, DNAME, DUMP, EXCLUDE, FILTER, FINAL, HISTORY, INITIAL, INSCALE, INTENSITIES, LINK, NODUMP, NOHARVEST, NOSCALE, ONLYMERGE, OUTPUT, OVERLAPMAP, PARTIALS, PNAME, PRINT, PRIVATE, REJECT, RESOLUTION, RESTORE, RSIZE, RUN, SCALES, SDCORRECTION, SKIP, SMOOTHING, TIE, TITLE, [UN]FIX, UNLINK, USECWD, WIDTH, XYBINS

RUN <Nrun> [<subkeys>]

Define a "run": Nrun is the run number, an arbitrary integer label (i.e. not necessarily 1, 2, 3 etc). A "run" defines a set of reflections which share a set of scale factors. Typically a run will be a continuous rotation around a single axis. The subkeys allow a run to be defined flexibly, and the definition of a run may use several RUN commands.

Subkeys:
REFERENCE
This run is a reference set, i.e. it will be given a single scale factor = 1.0 (an input scale factor in the SCALE column will still be applied if present)
BATCH | <b1> <b2> <b3> ... | <b1> TO <b2> |
Define a list of batches, or a range of batches, to be included in or excluded from the run. If batches are included in more than one run definition, the last definition will take priority.
ALL
Include all batches
CRYSTAL | <c1> <c2> <c3> ..... | <c1> TO <c2> |
Define a list or a range of "crystal" numbers to be included in the run. Crystal numbers are not usually defined at present, so this option is not very useful.
INCLUDE | EXCLUDE
Set include/exclude flag for a following RANGE or BATCH keyword. Excluded batches or ranges will be omitted from the output file.
RANGE <r1> TO <r2>
Rotation range to include or exclude

Examples:

  RUN 1 BATCH 1 TO 10000
  RUN 1 INCLUDE BATCH 1 TO 200 EXCLUDE 77 79 132
  RUN 2 CRYSTAL 2
  RUN 3 INCLUDE RANGE 0 TO 90 EXCLUDE RANGE 45 TO 48
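The priority rule for overlapping batch definitions can be sketched in Python. This assumes each RUN clause has already been parsed into a hypothetical (run, include flag, batches) tuple; RANGE and CRYSTAL selections are left out, and it illustrates the documented rule rather than Scala's parser.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical parsed RUN clause: (run number, True for INCLUDE / False for EXCLUDE, batches)
Clause = Tuple[int, bool, Iterable[int]]


def assign_batches(clauses: List[Clause]) -> Dict[int, int]:
    """Map batch number -> run number, processing clauses in input order."""
    owner: Dict[int, int] = {}
    for run, include, batches in clauses:
        for batch in batches:
            if include:
                owner[batch] = run      # the last definition takes priority
            elif owner.get(batch) == run:
                del owner[batch]        # excluded: dropped from this run (and the output)
    return owner


# Equivalent of "RUN 1 INCLUDE BATCH 1 TO 200 EXCLUDE 77 79 132"
runs = assign_batches([(1, True, range(1, 201)), (1, False, [77, 79, 132])])
```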

SCALES [<subkeys>]

Define layout of scales. Note that a layout may be defined for all runs (no RUN subkeyword), then overridden for particular runs by additional commands.

Subkeys:
RUN <run_number>
Define run to which this command applies: the run must have been previously defined. If no run is defined, it applies to all runs
ROTATION <Nscales> | SPACING <delta_rotation>
Define layout of scale factors along rotation axis (i.e. primary beam), either as number of scales or (if SPACING keyword present) as interval on rotation [default SPACING 10]
BATCH
Set "Batch" mode, no interpolation along rotation (primary) axis. If only one scale factor is used on the detector (i.e. DETECTOR 1), this makes the scaling equivalent to Rotavata. This is compulsory if a ROT column is not present in the input file.
BFACTOR ON | OFF | ANISOTROPIC
Switch B-factors on or off [default ON]. ANISOTROPIC activates anisotropic B-factors (NOT RECOMMENDED): beware that the parameters for this option are likely to be poorly determined. Note that the anisotropic correction is centrosymmetric. B-factor refinement will be switched off by default if the scales are allowed to vary across the detector (qv DETECTOR).
BROTATION [|TIME] <Ntime> | SPACING <delta_time>
Define the number of B-factors or (if the SPACING keyword is present) the interval on "time": usually no time is defined in the input file, and the rotation angle is used. SCALES BATCH BROTATION SPACING 5 makes the B-factor variation smooth, but the scales discontinuous by batch.
SECONDARY [<Lmax>]
Secondary beam correction expanded in spherical harmonics up to maximum order Lmax. The number of parameters increases as (Lmax + 1)**2, so you should use the minimum order needed (e.g. 4 - 6). This correction would typically be combined with the usual primary beam correction and B-factor (e.g. ROTATION SPACING 10 BFACTOR ON SECONDARY 6).
SURFACE [<Lmax>] [POLE <h|k|l>]
Local correction expanded on direction in hkl space (i.e. crystal frame) in spherical harmonics up to maximum order Lmax. The number of parameters increases as (Lmax + 1)**2, so you should use the minimum order needed (e.g. 4 - 6). The polar axis may be specified here (e.g. POLE L), or will default to either the axis closest to the spindle (if known) or l (k for monoclinic space groups). If you want to do three-dimensional scaling, the SECONDARY or DETECTOR options are preferable: this option should only be used if the diffraction geometry information required to work out the beam directions is not available.
DETECTOR <Nscales_X> [<Nscales_Y>] | SPACING <delta_X> [<delta_Y>]
Define layout of scale factors on detector (i.e. secondary beam), either as number of scales in each direction (along XDET & YDET), or (if SPACING keyword present) as interval on XDET & YDET. The values for Y default to those for X if not specified. This option assumes that the detector positions are recorded in the input file (columns XDET, YDET), in any units (mm or pixels). If you allow the scale to vary across the detector (anything other than DETECTOR 1, the default), then by default B-factor refinement is switched off, since the combination is likely to be unstable [Default 1 scale, i.e. no variation of scale across detector]
CONSTANT
One scale for each run (equivalent to ROTATION 1 DETECTOR 1 1)
TAILS [<v> [<a0> [<a1>]]]
Apply correction for diffuse scattering (reflection tails) for this run. This can only be used with summed partials (INTENSITIES PARTIALS: this is the default). See introduction for explanation. Initial values for the parameters v, a0 & a1 may be given following the keyword.
v
width of tails in reciprocal space (A**-1) [default = 0.01]
a0
fraction of intensity in diffuse peak at theta = 0 [default = 0.0, fixed]
a1
slope of intensity fraction against (sin theta/lambda)**2 [default = 10]

Parameters may be fixed using the FIX command, or the same set used by different runs as defined by the LINK command. These controls may be required to avoid the parameters going wild.
SLOPE
NOT RECOMMENDED. Set "Slope" mode, like Batch, except that each batch has different scales at the beginning and end of the rotation range. The value used for each reflection is interpolated linearly according to the "Rotation" (phi) value. SLOPE implies BATCH mode. Be careful with this option: does it really improve the data? It is unlikely to work well if the mosaicity is large. TIE ROTATION may be used to restrain the difference in scales.
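To see how quickly the SECONDARY and SURFACE parameter counts grow, note that (Lmax + 1)**2 is the number of spherical-harmonic terms Y_lm with 0 <= l <= Lmax (2l + 1 terms for each order l). The Python below only counts terms; it is not derived from the program source, so the exact number of parameters Scala refines is not guaranteed to match.

```python
def n_sh_terms(lmax: int) -> int:
    """Number of spherical-harmonic terms Y_lm with 0 <= l <= lmax."""
    return sum(2 * l + 1 for l in range(lmax + 1))


for lmax in (4, 6, 8):
    # each line shows lmax, the term count, and (lmax + 1)**2 for comparison
    print(lmax, n_sh_terms(lmax), (lmax + 1) ** 2)
```

Going from SECONDARY 4 to SECONDARY 8 roughly triples the number of terms (25 to 81), which is why the minimum adequate order is recommended.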

SDCORRECTION [RUN <RunNumber>] [[NO]ADJUST]
[FULL | PARTIAL | BOTH] <Sdfac> [<SdB>] <Sdadd>

Modifiers for input standard deviations: these are modified to

        sd(I) corrected = Sdfac * sqrt(sd(I)**2 + SdB*Ihl + (Sdadd*Ihl)**2)

where Ihl is the intensity (SdB may be omitted in the input, and its use is not recommended). Default values are Sdfac = 1.0, SdB = 0.0, Sdadd = 0.02.
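The correction formula translates directly into Python; `corrected_sd` is a hypothetical name, and the defaults are those listed above.

```python
import math


def corrected_sd(sd: float, ihl: float, sdfac: float = 1.0,
                 sdb: float = 0.0, sdadd: float = 0.02) -> float:
    """sd(I) corrected = Sdfac * sqrt(sd(I)**2 + SdB*Ihl + (Sdadd*Ihl)**2)."""
    return sdfac * math.sqrt(sd ** 2 + sdb * ihl + (sdadd * ihl) ** 2)
```

With the defaults, a weak reflection keeps essentially its input sd, while for a strong one (Ihl = 1000, sd(I) = 10) the Sdadd term contributes 0.02 * 1000 = 20, giving a corrected sd of sqrt(500), about 22.4.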

RUN <run_number>
Define run to which this command applies: the run must have been previously defined. If no run is defined, it applies to all runs. Different values may be specified for fully recorded reflections (FULL) and for partially recorded reflections (PARTIAL), or the same values may be used for both (BOTH), e.g.

         sdcorrection full 1.4 0.11 part 1.4 0.05

The keyword NOADJUST stops the automatic adjustment of the Sdfac parameters from the normal probability analysis at the beginning of the merge stage [default is ADJUST] (this applies to all runs)

With the output options SEPARATE or POSTREF, the modified Sds are written to the output file in columns SIGIC [& SIGIPRC if IPR is present]. These columns will be used by Postref but ignored on reinput to Scala.

PARTIALS [[NO]CHECK] [[NO]TEST [<lower_limit> <upper_limit>]] [CORRECT [<minimum_fraction>]] [[NO]GAP] [MAXWIDTH <maximum_width>] [SCALE_PARTIAL <minimum_fraction>] [USE_PROFILE]

Select the way in which partials are treated in both scaling and merging. These settings may be overridden separately for the scaling and merging steps with the INTENSITIES and FINAL commands respectively.

By default, partials are included (summed) in both scaling and in merging.

Subkeys:
[NO]CHECK
do [not] check for consistency of MPART flags (if present, i.e. from Mosflm). Reflections failing this test are tested for total fraction (see TEST option) [default do if MPART is present]
[NO]TEST [<lower_limit> <upper_limit>]
do [not] accept partials only if total fraction (from FRACTIONCALC column) is in range lower_limit -> upper_limit [default if no MPART flag, limits 0.95, 1.05]
CORRECT [<minimum_fraction>]
Scale up partials whose predicted total fraction lies in the range minimum_fraction -> lower_limit (needs a reliable FRACTIONCALC) [default minimum = <lower_limit>]
[NO]GAP
do [not] accept partials with a gap, e.g. a partial over 3 parts with the middle one missing. GAP implies NOCHECK and TEST: CORRECT may also be set [default NOGAP]
MAXWIDTH <maximum_width>
maximum number of parts for an acceptable summed partial
SCALE_PARTIALS
use scaled partials greater than <Minimum_fraction> in the scaling. Only use this if the FRACTIONCALC column contains a good estimate of the partiality.
USE_PROFILE
use profile-fitted intensity even for scaled partials
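The acceptance logic of these subkeys for one summed partial can be sketched as follows. This is an illustration under stated assumptions, not Scala's code: parts are given as hypothetical (image number, FRACTIONCALC) pairs for the measured parts, a gap means non-consecutive image numbers, MAXWIDTH is compared with the number of parts, and the MPART consistency check (CHECK) is omitted.

```python
from typing import List, Optional, Tuple


def accept_summed_partial(parts: List[Tuple[int, float]],
                          lower: float = 0.95, upper: float = 1.05,
                          correct_min: Optional[float] = None,  # CORRECT <minimum_fraction>
                          allow_gap: bool = False,              # GAP / NOGAP
                          max_width: Optional[int] = None,      # MAXWIDTH
                          ) -> Tuple[bool, float]:
    """Return (accepted, factor to multiply the summed intensity by)."""
    images = sorted(image for image, _ in parts)
    has_gap = any(b - a > 1 for a, b in zip(images, images[1:]))
    if has_gap and not allow_gap:
        return False, 0.0                   # NOGAP: missing middle part
    if max_width is not None and len(parts) > max_width:
        return False, 0.0                   # too many parts
    total = sum(fraction for _, fraction in parts)
    if lower <= total <= upper:
        return True, 1.0                    # TEST passed: use the plain sum
    if correct_min is not None and correct_min <= total < lower:
        return True, 1.0 / total            # CORRECT: scale up by predicted fraction
    return False, 0.0
```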

INTENSITIES

[INTEGRATED | PROFILE | PR_PART]
[[NO]ANOMALOUS]
[FULLS | ONLYFULLS | SCALE_PARTIAL <minimum_fraction>
| PARTIALS [ [NO]CHECK | [NO]TEST [<lower_limit> <upper_limit>] [CORRECT <minimum_fraction> ] [ [NO]GAP ] [MAXWIDTH <maximum_width>] ] ]

Intensities selection for scaling: which intensities to use, whether to keep Bijvoet pairs separate, and treatment of partials in scaling:

(a) Intensity selection options:

Set which intensity to use, of the integrated intensity (column I) or profile-fitted (column IPR), if both are present. Note this applies to all stages of the program, scaling & averaging.

Subkeys:
INTEGRATED
summation integrated intensity I.
PROFILE
profile-fitted intensity IPR [default if present]. Note that this will not be used for scaled partials unless PARTIALS USE_PROFILE is set.
PR_PART
profile IPR for fulls, integrated I for partials

(b) Treatment of Bijvoet-related observations

By default, all observations (I+ & I-) are treated alike in scaling. This is normally the correct thing to do, since the anomalous differences are usually small and randomly positive and negative. In a case with large anomalous differences and high redundancy, it may be better to keep the I+ & I- observations separate in the scaling. Note that typically this will severely reduce the scaling overlaps between different parts of the data, and is not recommended except in special cases.

Subkeys:
ANOMALOUS
keep I+ and I- observations separate in scaling
NOANOMALOUS
use I+ and I- together in scaling [default]

(c) Options for treatment of partials in scaling (overrides options given under PARTIALS):

Set whether partially recorded reflections should be used in scaling, & if so, whether to use summed or scaled partials. By default summed partials are used in scaling as well as fulls. See introduction above for a description of the use of partially recorded reflections. Treatment of partials in the final averaging stage is defined with the FINAL command

Subkeys:
FULLS
use fully recorded observations only, & previously summed partials (from MOSFLM ADDPART)
ONLYFULLS
use fulls only: exclude previously summed partials (from MOSFLM)
SCALE_PARTIALS
use scaled partials greater than <Minimum_fraction> in the scaling. Only use this if the FRACTIONCALC column contains a good estimate of the partiality.
PARTIALS
use summed partials in scaling (if present) [this is the default]. The following flags are qualifiers of PARTIALS and will override those given on a previous PARTIALS command, for the scaling step only (not merging):
[NO]CHECK
do [not] check for consistency of MPART flags (if present). Reflections failing this test are tested for total fraction (see TEST option) [default do if MPART is present]
[NO]TEST [<lower_limit> <upper_limit>]
do [not] accept partials only if total fraction (from FRACTIONCALC column) is in range lower_limit -> upper_limit [default if no MPART flag, limits 0.95, 1.05]
CORRECT [<minimum_fraction>]
Scale up partials whose predicted total fraction lies in the range minimum_fraction -> lower_limit (needs a reliable FRACTIONCALC) [default minimum = <lower_limit>]
[NO]GAP
do [not] accept partials with a gap, e.g. a partial over 3 parts with the middle one missing. GAP implies NOCHECK and TEST: CORRECT may also be set [default NOGAP]
MAXWIDTH <maximum_width>
maximum number of parts for an acceptable summed partial

REJECT
[SCALE | MERGE] [BYRUN]
<Sdrej> [<Sdrej2>]
[ALL <Sdrej+-> [<Sdrej2+->]]
[KEEP | REJECT | LARGER | SMALLER]

Define rejection criteria for outliers: different criteria may be set for the scaling and for the merging (FINAL) passes. If neither SCALE nor MERGE is specified, the same values are used for both stages. If the BYRUN flag is set, rejection & deviation calculations are done only between observations from the same run. This is only sensible if each run is essentially a complete dataset; for instance, a group of MAD datasets could be scaled together, treating each one as a RUN.

If ANOMALOUS ON is set, then by default the outlier test is done in the merging step only within the I+ & I- sets for that reflection, ie Bijvoet-related reflections are treated as independent. The ALL keyword here enables an additional test on all observations including I+ & I- observations. Observations rejected on this second check are flagged "@" in the ROGUES file. In the scaling step, the outlier check includes all observations, unless anomalous observations are kept separate in scaling (INTENSITIES ANOMALOUS: this is an unusual option for special cases only).

Subkeys:
BYRUN
rejection & deviation calculations are done only between observations from the same run
SCALE
use these values for the scaling pass
MERGE
use these values for the merging (FINAL) pass
sdrej
sd multiplier for maximum deviation from weighted mean I [default 6.0]
[sdrej2]
special value for reflections measured twice [default = sdrej]
ALL
check outliers in merging step between as well as within I+ & I- sets (not relevant if ANOMALOUS OFF)
sdrej+-
sd multiplier for maximum deviation from weighted mean I including all I+ & I- observations (not relevant if ANOMALOUS OFF) [default check within I+ & I- sets only]
[sdrej2+-]
special value for reflections measured twice [default = sdrej+-]
KEEP
in merging, if two observations disagree, keep both of them [default]
REJECT
in merging, if two observations disagree, reject both of them
LARGER
in merging, if two observations disagree, reject the larger
SMALLER
in merging, if two observations disagree, reject the smaller

The test for outliers is described in Appendix 5
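Appendix 5 (not reproduced here) describes the actual test. As a simplified illustration only, the Python below assumes an inverse-variance weighted mean, compares each observation with the mean of the others, and rejects the worst-fitting observation one at a time until all remaining ones agree within sdrej; reflections measured exactly twice use sdrej2 and the KEEP / REJECT / LARGER / SMALLER policy. It ignores the I+/I- and BYRUN subdivisions, and all names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Obs = Tuple[float, float]  # (intensity, sd) for one observation of a reflection


def weighted_mean(obs: List[Obs]) -> Tuple[float, float]:
    """Inverse-variance weighted mean of the intensities, and its sd."""
    weights = [1.0 / (sd * sd) for _, sd in obs]
    total = sum(weights)
    mean = sum(w * i for w, (i, _) in zip(weights, obs)) / total
    return mean, math.sqrt(1.0 / total)


def pair_test(obs: List[Obs], sdrej2: float, policy: str) -> List[int]:
    """Reflections measured twice: apply KEEP / REJECT / LARGER / SMALLER."""
    (i1, s1), (i2, s2) = obs
    if abs(i1 - i2) <= sdrej2 * math.sqrt(s1 * s1 + s2 * s2):
        return []                      # the two observations agree
    if policy == "KEEP":
        return []
    if policy == "REJECT":
        return [0, 1]
    if policy == "LARGER":
        return [0] if i1 > i2 else [1]
    if policy == "SMALLER":
        return [0] if i1 < i2 else [1]
    raise ValueError(f"unknown policy {policy!r}")


def find_outliers(obs: List[Obs], sdrej: float = 6.0,
                  sdrej2: Optional[float] = None, policy: str = "KEEP") -> List[int]:
    """Indices of observations rejected as outliers (simplified illustration)."""
    if len(obs) == 2:
        return pair_test(obs, sdrej if sdrej2 is None else sdrej2, policy)
    active = list(range(len(obs)))
    rejected: List[int] = []
    while len(active) > 2:
        worst, worst_dev = -1, 0.0
        for k in active:
            mean, sd_mean = weighted_mean([obs[j] for j in active if j != k])
            i_k, sd_k = obs[k]
            dev = abs(i_k - mean) / math.sqrt(sd_k * sd_k + sd_mean * sd_mean)
            if dev > worst_dev:
                worst, worst_dev = k, dev
        if worst_dev <= sdrej:
            break                      # all remaining observations agree
        active.remove(worst)           # reject the worst one, then re-test the rest
        rejected.append(worst)
    return sorted(rejected)
```

Removing one observation at a time matters: a single wild measurement pulls the mean of the others, so testing all observations at once could also flag good ones.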

ANOMALOUS [OFF] [ON | ALL]

[RUN <Nrun>]
[MATCH [SPINDLE | INVERT | <hkl symmetry>]]
[PHIDIF <maximum Phi difference>]

Controls the treatment of anomalous scattering information in the merging step. Note that the option of selecting matching anomalous pairs is not recommended for normal use: it is likely to lead to seriously incomplete data in many cases, and the results should be compared carefully with those with the MATCH option switched off.

Subkeys:
OFF [default]
no anomalous used, I+ & I- observations averaged together in merging
ON | ALL
separate anomalous observations in the final output pass, for statistics & merging: this is also selected by the keyword ANOMALOUS on its own
RUN <run number>
set run for this MATCH option to apply to, otherwise it applies to all runs [default]
MATCH
use only matching I+ & I- pairs in merging
Matching pairs are :-
  • (a) in same run
  • (b) related by defined symmetry (if given as SPINDLE | INVERT | <hkl symmetry>)
  • (c) not more than DeltaPhi apart (if given by PHIDIF)
    Definition of symmetry:

    SPINDLE
    related by negation of reciprocal index closest to spindle: this option requires full orientation data to be present in the file
    INVERT
    related by inversion of indices, i.e. -h, -k, -l
    <hkl symmetry>
    specified hkl symmetry (e.g. h, -k, l)
    PHIDIF <DeltaPhi>
    maximum difference in Phi (ROT) between matching pairs
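The three MATCH conditions can be sketched as a Python predicate. This assumes each observation is a hypothetical (run, original hkl, phi) record with unreduced indices; SPINDLE is omitted because it needs the orientation data, and the Phi difference ignores wrap-around at 360 degrees. Function names are illustrative.

```python
from typing import Callable, Optional, Tuple

HKL = Tuple[int, int, int]
Obs = Tuple[int, HKL, float]  # hypothetical record: (run number, original hkl, phi in degrees)


def invert(hkl: HKL) -> HKL:
    """INVERT: related by inversion of indices, -h, -k, -l."""
    h, k, l = hkl
    return (-h, -k, -l)


def h_minus_k_l(hkl: HKL) -> HKL:
    """An explicit <hkl symmetry>, here 'h, -k, l'."""
    h, k, l = hkl
    return (h, -k, l)


def is_matching_pair(plus: Obs, minus: Obs,
                     relate: Optional[Callable[[HKL], HKL]] = None,
                     phidif: Optional[float] = None) -> bool:
    """Apply conditions (a) same run, (b) defined symmetry, (c) PHIDIF."""
    run_p, hkl_p, phi_p = plus
    run_m, hkl_m, phi_m = minus
    if run_p != run_m:                                        # (a)
        return False
    if relate is not None and relate(hkl_p) != hkl_m:         # (b)
        return False
    if phidif is not None and abs(phi_p - phi_m) > phidif:    # (c)
        return False
    return True
```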

    RESOLUTION [RUN <Nrun>] [[LOW] <Resmin>] [[HIGH] <Resmax>]

    Set resolution limits in Angstrom, in either order, optionally for individual runs (in which case this command MUST come after the definition of the run). The keywords LOW or HIGH, followed by a number, may be used to set the low or high resolution limits explicitly: an unset limit will be set as in the input HKLIN file. If the RUN subkeyword is omitted, the limit applies to all runs. [Default use all data]

    TITLE <new title>

    Set new title to replace the one taken from the input file. By default, the title is copied from hklin to hklout

    ONLYMERGE

    Only do the merge step, no initial analysis, no scaling (== INITIAL NONE; NOSCALE). Note that this will usually need to be combined with a RESTORE command.

    RESTORE [<Scale_file_name>]

    Read initial scales from a SCALES file from a previous run of Scala (scales are normally dumped on every cycle, see DUMP). The number of scales defined for each run this time should typically be the same as in the dump, although a set of scale factors along ROTATION or DETECTOR may be extrapolated to additional batches which were not present in the initial scaling. The file may contain scales for runs which are not used this time, but new runs may not be added. RESTORing from a scale file which does not properly correspond to the run which generated the file is liable to give silly results. No initial analysis pass will be done unless the command INITIAL ANALYSE is given.

    INITIAL MEAN | UNITY | RUN <RunNumber> <InitialScale> | NONE | ANALYSE

    Define method of setting initial scales

    Subkeys:
    MEAN
    from mean intensities by rotation range [default]
    UNITY
    set all scales = 1.0
    RUN <RunNumber> <InitialScale>
    set initial scale factor for this run. If this option is used, any runs whose scales are not set explicitly will have their scales set = 1.0
    NONE
    no initial analysis pass, set all scales to unity
    ANALYSE
    force initial analysis pass even if RESTORE option is used

    PRINT [<subkey>]

    Define amount of printing

    Subkeys:
    NONE
    almost none
    BRIEF
    some more [default]
    CYCLES
    more information about each minimization cycle
    FULL
    quite a lot
    DEBUG [<reflection_interval>]
    far too much: also define reflection interval for printing
    ALLOVERLAP
    print all numbers in overlap matrix after initial pass, rather than the default condensed table
    NOOVERLAP
    no printing of overlap matrix after initial pass

    CYCLES [[NUMBER] <Ncycle>] [CONVERGE <Conv_limit>] [REJECT <Rej_cycle>] [WEIGHT VARIANCE | UNIT ]

    Define number of refinement cycles, convergence limit, and weighting scheme for scale refinement

    Subkeys:
    [NUMBER]
    maximum number of cycles [default 10]
    CONVERGE
    convergence limit (multiple of sd(param)) [default 0.3]
    REJECT
    1st cycle number for rejection of outliers [default 2]. The default is not to reject outliers on the first cycle, when the scales may be a long way off, but if the initial scales are reasonable (particularly if they come from a previous run) it is probably better to exclude outliers from the first cycle as well
    WEIGHT VARIANCE | UNIT
    Weighting scheme for scale refinement: VARIANCE weighting is default and usual; UNIT weights may help if the scale-factors vary over a large range (unit weights have not been much tested)

    EXCLUDE [RUN <Nrun>]
    [EMAX <maximum_E> | EPROB <minimum_probability>]
    [SDMIN <value>] [SDMAX <value>] [ABSMAX <value>]
    [ARC INSIDE|OUTSIDE <X1> <Y1> <X2> <Y2> <X3> <Y3> ... <Xn> <Yn>]
    [RECTANGLE <Xmin> <Xmax> <Ymin> <Ymax>]

    Set intensity limits or positional limits for excluding observations.

    Limits for scaling and merging passes:-
    EMAX or EPROB, ARC and RECTANGLE limits apply to all stages of the program

    Limits for scaling pass only:-
    If an observation is considered too weak (I .lt. sd(I) * SDMIN), or if an observation is too strong (I .gt. sd(I) * SDMAX .or. I .gt. ABSMAX), then all observations of that reflection are omitted from the scaling. Exclusions are not applied to a Reference run. [Default EXCLUDE SDMIN 3.0]
    These exclusions do not apply to the initial scale calculation (INITIAL MEAN), nor to the output statistics, only to the scaling. The test is only done on fully recorded observations, and against the input standard deviations (i.e. unmodified by SDCORRECTION parameters)
    Subkeys:
    RUN <Nrun>
    defines a run number (previously defined) for these exclusion parameters to apply to: else applies to all runs (the EMAX|EPROB limit applies to all runs)
    EMAX <maximum_E> | EPROB <minimum_probability>
    Define maximum normalized amplitude E allowed: this may be given either as the maximum E-value EMAX for an acentric reflection, e.g. 8 - 10, or as the minimum allowed probability EPROB, e.g. 1e-8, where Eprob = exp(-Emax**2). Excluded reflections are listed in the log file, and in the ROGUES file. See R.Read, CCP4 Study Weekend, Sheffield 1999. [Default EMAX 10]. NOEMAX switches this test off.
    SDMIN
    minimum sd multiple for inclusion
    SDMAX
    maximum sd multiple for inclusion
    ABSMAX
    maximum absolute value
    i.e. observations are excluded if:-
                    I  .lt. sd(I) * SDMIN
            .or.    I  .gt. sd(I) * SDMAX
            .or.    I  .gt. ABSMAX
    
    ARC
    defines an area of detector coordinates (XDET, YDET) to be excluded from all calculations, both scaling and merging, as a circular arc. Data are excluded either INSIDE (lower radius) or OUTSIDE (higher radius) the arc. The arc is defined by fitting a circle to the coordinates of 3 or more points: points 1 (X1,Y1) and 2 (X2,Y2) define the ends of the arc (in either order). If X1,Y1 = X2,Y2 a complete circle is excluded. A series of arcs may be defined. This option allows for the exclusion of shadows on the detector from eg backstop or cryocooler etc
    RECTANGLE
    defines a rectangular area of detector coordinates (XDET, YDET) to be excluded from all calculations, both scaling and merging. A series of rectangles may be defined.
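The scaling-pass intensity limits and the EMAX/EPROB relation can be written as a short Python sketch (hypothetical function names). The first function tests a single fully recorded observation against its input sd; in Scala, failing it removes all observations of that reflection from the scaling.

```python
import math
from typing import Optional


def excluded_from_scaling(i: float, sd: float, sdmin: float = 3.0,
                          sdmax: Optional[float] = None,
                          absmax: Optional[float] = None) -> bool:
    """True if I < sd*SDMIN, or I > sd*SDMAX, or I > ABSMAX (unset limits are skipped)."""
    if i < sd * sdmin:
        return True
    if sdmax is not None and i > sd * sdmax:
        return True
    if absmax is not None and i > absmax:
        return True
    return False


def eprob_from_emax(emax: float) -> float:
    """Eprob = exp(-Emax**2)."""
    return math.exp(-emax ** 2)


def emax_from_eprob(eprob: float) -> float:
    """Inverse of the relation above."""
    return math.sqrt(-math.log(eprob))
```

By this relation EPROB 1e-8 corresponds to EMAX of about 4.3, a much stricter limit than the default EMAX 10.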

    TIE [ROTATION [<Sd_z>]][DETECTOR [<Sd_xy>]] [SURFACE [<Sd_srf>]]

    Restrain pairs of neighbouring scale factors on rotation axis (ROTATION = primary beam) or in detector plane (DETECTOR = secondary beam) to have the same value, or surface spherical harmonic parameters to zero (for SECONDARY or SURFACE corrections, to keep the correction approximately spherical), with a standard deviation as given. This may be used if scales are varying too wildly, particularly in the detector plane. The default is no restraints on scales. A tie is recommended (a) if scales are varied across the detector, eg TIE DETECTOR 0.1, or (b) for SECONDARY or SURFACE corrections, eg TIE SURFACE 0.01
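One common way such ties enter a least-squares refinement is as squared differences divided by the square of the given sd, added to the target function. The sketch below assumes that form (Scala's actual target function is described in Appendix 2); names are hypothetical, and the rotation tie is shown for a one-dimensional chain of scales only. For a detector grid, the same term would be summed over horizontal and vertical neighbours.

```python
from typing import Sequence


def tie_penalty(scales: Sequence[float], sd: float) -> float:
    """TIE ROTATION-style restraint: neighbouring scales pulled towards equal values."""
    return sum(((b - a) / sd) ** 2 for a, b in zip(scales, scales[1:]))


def surface_penalty(coeffs: Sequence[float], sd: float) -> float:
    """TIE SURFACE-style restraint: spherical-harmonic coefficients pulled towards zero."""
    return sum((c / sd) ** 2 for c in coeffs)
```

Halving the sd quadruples the penalty for the same step between neighbours, so smaller values tie the scales more tightly.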

    OUTPUT <subkeywords>

    Control what goes in the output file. Three types of output file may be produced: (a) AVERAGE, average intensity for each hkl (I+ & I-). (b) SEPARATE, observations from input file with scale calculated, for re-input to Scala (or Postref, see POSTREF option) (c) UNMERGED, unaveraged observations, but with scales applied, partials summed or scaled, and outliers rejected.

    A reference batch is always excluded from the final statistics, even if it is included in the output file (only possible with the SEPARATE option).

    File format options:
    NONE
    no output file written
    AVERAGE
    [default] output averaged intensities, <I+> & <I-> for each hkl
    SEPARATE
    output observations as input, but with added columns for SCALE etc. This file may be reinput to Scala for further scaling (e.g. with a different scaling model)
    POSTREF
    append columns for Postref. This option implies SEPARATE. The added columns are IMEAN, SIGIMEAN, ISUM and SIGISUM, where IMEAN is the mean of the fully-recorded reflections and ISUM is the summed partials (partials only).
    UNMERGED
    apply scales, sum or scale partials, reject outliers, but do not average observations
    POLISH
    Also write reflections to a formatted file (logical name SCALEPACK), in addition to the MTZ file, in the rather obscure format written by "scalepack" (or my best approximation to it). Why would anyone want to do this? If the UNMERGED option is also selected, the output matches the scalepack "output nomerge original index"; otherwise it is the "normal" scalepack output, with either I, sigI or I+, sigI+, I-, sigI-, depending on the "anomalous" flag.

    Other options:

    (a) UNMERGED options:
    ORIGINAL
    write original indices hkl: M/ISYM = 1 for all reflections
    REDUCED
    [default] hkl indices are reduced to asymmetric unit, as in input file
    (b) SEPARATE (POSTREF) options
    the following apply only to the SEPARATE (POSTREF) option, and must not precede that switch:-
    REFERENCE
    write reference batch (if present) to output file
    NOREFERENCE
    [default] omit reference batch (if present) from output file
    KEEP
    [default unless AVERAGE] keep reflections outside the resolution limits. The SCALE column will be set to 0.0
    KEEP SCALE
    keep reflections outside resolution limits, and calculate scales for them. This is dangerous unless the proportion of reflections omitted from scaling is small
    EXCLUDE
    [default if AVERAGE] exclude reflections outside resolution limits
    OMIT OUTLIERS
    omit rejected outliers from output file (SEPARATE & POSTREF options only). In this case a ROGUES file is written (see below) [default keep them in, but flagged in the FLAG column]
    OMIT PARTIALS [RUN <Nrun>]
    omit partially recorded reflections from the output file. If no run number is given, this applies to all runs. Multiple runs may be specified on successive OUTPUT OMIT PARTIALS RUN commands
    ROGUES
    write a list of rejected reflections to the file ROGUES, which may be assigned on the command line. A ROGUES file is always written for the AVERAGE & UNMERGED options. [for SEPARATE, by default no ROGUES file is written unless the OMIT OUTLIERS option is used]

    FINAL [ NONE | FULLS | ONLYFULLS
          | SCALE_PARTIAL <Minimum_fraction>
          | PARTIALS [[NO]CHECK] | [NO]TEST [<lower_limit> <upper_limit>] [CORRECT <minimum_fraction>] [[NO]GAP] [MAXWIDTH <maximum_width>] ]

    Select whether or not to use summed or scaled partials in the final analysis after scale determination. If this command is missing, summed partials will be included if the input file contains a FRACTIONCALC column.

    Subkeys:
    NONE
    no final analysis/output pass
    FULLS
    use fulls only (& previously summed partials, e.g. from MOSFLM ADDPART or Scalepack) [default if no FRACTIONCALC column]
    ONLYFULLS
    use fulls only: exclude previously summed partials (from MOSFLM)
    SCALE_PARTIALS
    use scaled partials whose total fraction is greater than <Minimum_fraction> in the merging. Only use this if the FRACTIONCALC column contains a good estimate of the partiality.
    PARTIALS
    use summed partials in final analysis (if present). See introduction above for a description of the use of partially recorded reflections. [this is the default if FRACTIONCALC column is present] The following flags are qualifiers of PARTIALS and will override those given on a previous PARTIALS command, for the merging step only (not scaling):
    [NO]CHECK
    do [not] check for consistency of MPART flags (if present). Reflections failing this test are tested for total fraction (see TEST option) [default do if MPART is present]
    [NO]TEST [<lower_limit> <upper_limit>]
    do [not] accept a partial only if its total fraction (from the FRACTIONCALC column) is in the range lower_limit -> upper_limit [default TEST if there is no MPART flag; default limits 0.95, 1.05]
    CORRECT [<minimum_fraction>]
    Scale up partials whose predicted total fraction lies in the range minimum_fraction -> lower_limit (needs a reliable FRACTIONCALC column) [default minimum = <lower_limit>]
    [NO]GAP
    do [not] accept partials with a gap, e.g. a partial over 3 parts with the middle one missing. GAP implies NOCHECK and TEST; CORRECT may also be set. [default GAP]
    MAXWIDTH <maximum_width>
    maximum number of parts for an acceptable summed partial
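    The acceptance rules for summed partials can be sketched as follows. The sketch assumes that each part carries its batch number, intensity and FRACTIONCALC value, that a summed partial's intensity is the sum of its parts and its total fraction the sum of their fractions, and that CORRECT scales the sum up by dividing by the total fraction. The MPART consistency check ([NO]CHECK) is not modelled, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Part:
    batch: int        # batch (image) number of this part
    intensity: float  # intensity measured on this batch
    fraction: float   # calculated fraction (FRACTIONCALC) for this part


def combine_partials(parts: List[Part], lower: float = 0.95, upper: float = 1.05,
                     min_fraction: Optional[float] = None, allow_gap: bool = True,
                     max_width: Optional[int] = None
                     ) -> Optional[Tuple[float, float, bool]]:
    """Return (intensity, total_fraction, scaled) or None if rejected.

    lower/upper play the role of the TEST limits, min_fraction of CORRECT,
    allow_gap of [NO]GAP and max_width of MAXWIDTH.
    """
    if not parts:
        return None
    parts = sorted(parts, key=lambda p: p.batch)
    if max_width is not None and len(parts) > max_width:
        return None  # too many parts
    batches = [p.batch for p in parts]
    has_gap = any(b2 - b1 > 1 for b1, b2 in zip(batches, batches[1:]))
    if has_gap and not allow_gap:
        return None  # NOGAP: reject partials with a missing middle part
    total = sum(p.fraction for p in parts)
    summed = sum(p.intensity for p in parts)
    if lower <= total <= upper:
        return summed, total, False  # accepted as a summed partial
    if min_fraction is not None and min_fraction <= total < lower:
        return summed / total, total, True  # scaled up to a full equivalent
    return None
```

    A partial seen on only part of its rotation range is therefore either rejected or, with CORRECT, scaled up; the latter is only as good as the FRACTIONCALC estimate, which is why the documentation warns that it needs a reliable calculated fraction.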

    [UN]FIX [V] [A0] [A1]

    Option to fix or free TAILS parameters: by default V & A1 are free, A0 is fixed [default A0 = 0.0]. Fixing A0 may help for low resolution data particularly.

    LINK [SURFACE|TAILS] ALL | <run_2> TO <run_1>

    run_2 will use the same SURFACE (or SECONDARY) or TAILS parameters as run_1. This can be useful when different runs come from the same crystal, and may stabilize the parameters. LINK TAILS ALL will use the same tails parameters for all runs for which TAILS parameters are refined. The keyword ALL will be assumed if omitted.

    UNLINK [SURFACE|TAILS] ALL | <run_2> TO <run_1>

    Remove links set by the LINK command (or by default). The keyword ALL will be assumed if omitted, e.g. UNLINK TAILS [ALL] will use separate tails parameters for each run.

    SKIP <N_skip> [[FOR] <N_skip_cycles>]

    Allow a subset of reflections to be used during the initial cycles of scaling, to speed up the program. For the first N_skip_cycles, only every N_skip'th unique reflection will be used. N_skip_cycles defaults to Ncycle-2, and the program will force 2 more cycles with all data if convergence is reached while reflections are still being skipped. You should check that convergence has been reached with all observations, particularly if the number of observations used in the early cycles is small.
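    A minimal sketch of which unique reflections enter each cycle under SKIP, assuming the subset simply starts with the first unique reflection. The function name is hypothetical, and the forced extra cycles after early convergence are not modelled.

```python
def reflections_for_cycle(n_unique, cycle, n_cycles, n_skip, n_skip_cycles=None):
    """Indices of unique reflections used in scaling cycle `cycle` (1-based)."""
    if n_skip_cycles is None:
        n_skip_cycles = max(n_cycles - 2, 0)  # documented default: Ncycle-2
    step = n_skip if cycle <= n_skip_cycles else 1
    return list(range(0, n_unique, step))
```

    With SKIP 3 and CYCLES 8, the first six cycles use about a third of the unique reflections and the last two use them all.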

    FILTER <Filter> [<Damp>]

    Define the filter level and damping level. In the minimization, shifts corresponding to eigenvalues less than <Filter> are removed, and <Damp> is added to all eigenvalues. [Default 1.0e-6, 0.0]

    DAMP [NONE] | <Damp> <NcycDamp>

    Set damping level for shifts. <Damp> is added to all eigenvalues for the first <NcycDamp> cycles. This may be useful if the scales vary over a wide range, particularly if the scale refinement diverges at first, but is not normally recommended, as it seems to slow convergence. Default is DAMP NONE. If <NcycDamp> is omitted, the damping applies to all cycles.
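    The effect of FILTER and DAMP on a single refinement step can be sketched with an eigendecomposition of the normal matrix. This is an illustration under assumptions: the eigenvalues are compared with <Filter> directly (this section does not say whether Scala normalizes them first), the filter test is applied before damping, and the function name is hypothetical. It uses NumPy.

```python
import numpy as np


def filtered_shifts(normal_matrix, rhs, filter_level=1.0e-6, damp=0.0):
    """Solve normal_matrix @ shift = rhs via its eigenvectors.

    Components along eigenvectors with eigenvalue < filter_level are dropped;
    damp is added to every remaining eigenvalue.
    """
    h = np.asarray(normal_matrix, dtype=float)
    g = np.asarray(rhs, dtype=float)
    evals, evecs = np.linalg.eigh(h)   # h is symmetric; columns are eigenvectors
    shift = np.zeros_like(g)
    for lam, vec in zip(evals, evecs.T):
        if lam < filter_level:
            continue                   # ill-determined direction: no shift
        shift += (vec @ g) / (lam + damp) * vec
    return shift
```

    For DAMP <Damp> <NcycDamp>, the same call would be made with damp set to <Damp> for cycles up to <NcycDamp> and 0.0 afterwards. Adding damp shrinks every shift, which is why damping tends to slow convergence.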

    BINS <Nsrange>

    Define number of resolution bins for analysis [default 10]

    XYBINS <Nx> [<Ny>]

    Define the number of bins across the detector in x (= XDET) and y (= YDET). Only used if XDET, YDET columns are present in the input file. <Ny> defaults to <Nx>. XYBINS 0 turns off the analysis [default Nx = Ny = 20]
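    A sketch of assigning a reflection to an XYBINS cell, assuming equal-width bins over the XDET and YDET ranges given in the orientation data block for the batch. The function name is hypothetical.

```python
def detector_bin(xdet, ydet, x_range, y_range, nx=20, ny=None):
    """Return (ix, iy), 0-based, for a spot on an nx by ny detector grid."""
    if ny is None:
        ny = nx                       # <Ny> defaults to <Nx>
    (xmin, xmax), (ymin, ymax) = x_range, y_range
    ix = int((xdet - xmin) / (xmax - xmin) * nx)
    iy = int((ydet - ymin) / (ymax - ymin) * ny)
    # clamp so that spots on the upper edge (or just outside) fall in an edge bin
    return min(max(ix, 0), nx - 1), min(max(iy, 0), ny - 1)
```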

    SMOOTHING <subkeyword> <value>

    Set smoothing factors ("variances" of weights)

    Subkeys:
    TIME <Vt>
    smoothing of B-factors [default 1.0]
    ROTATION <Vz>
    smoothing of scale along rotation [default 1.0]
    DETECTOR <Vxy>
    smoothing of scale on detector [default 1.0]
    PROB_LIMIT <DelMax_t> <DelMax_z> <DelMax_xy>
    maximum values of normalized squared deviation (del**2/V) to include a scale [default set automatically]
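    The SMOOTHING factors act as variances of weights. A common form, assumed here, is a Gaussian weight exp(-del**2/V) on each scale parameter, where del is the distance from the reflection to the parameter position (in units of the parameter spacing), with parameters for which del**2/V exceeds the PROB_LIMIT value dropped. The hypothetical function below computes a smoothed scale this way; it is a sketch, not Scala's code.

```python
import math


def smoothed_scale(x, centres, scales, variance=1.0, prob_limit=None):
    """Gaussian-weighted average of scale parameters around position x.

    x and centres are in units of the parameter spacing; variance is V.
    """
    num = den = 0.0
    for c, s in zip(centres, scales):
        d2v = (x - c) ** 2 / variance   # normalized squared deviation del**2/V
        if prob_limit is not None and d2v > prob_limit:
            continue                    # too far away to contribute
        w = math.exp(-d2v)
        num += w * s
        den += w
    if den == 0.0:
        raise ValueError("no scale parameters within PROB_LIMIT")
    return num / den
```

    A larger V spreads each parameter's influence over more neighbours, giving a smoother scale curve.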

    INSCALE OFF | ON

    Switch OFF or ON application of an input SCALE column. By default, if the input file contains a column called SCALE (e.g. from a previous run of Scala), it will be applied.

    NOSCALE

    Don't do any scaling, just the final analysis (equivalent to CYCLES 0)

    DUMP [<Scale_file_name>]

    Dump all scale factors to a file after each cycle. These can be used to restart scaling using the RESTORE option, or for rerunning the merge step. If no filename is given, the scales will be written to logical file SCALES, which may be assigned on the command line. DUMP is set by default, but may be turned off with the NODUMP command.

    NODUMP

    No dump of scales to file. Default is DUMP.

    ANALYSE [[NO]NORMAL] [[NO]PLOT] [MAXDENSITY <maximum point density>]

    This command controls the normal probability analyses

    Subkeys:
    [NO]NORMAL
    do [not] do normal probability analyses [default do them]
    [NO]PLOT
    do [not] write normal probability plot to output file with logical name DELTA [default do write file]. This file contains pairs of delta(expected), delta(observed) for fulls, then summed partials, then scaled partials
    MAXDENSITY
    maximum point density for normal probability plot. This plot includes a point for every observation, so in large datasets it can get very big. This parameter allows the sampling of the plot, so that in the central crowded part only some of the points are included in the plotfile [default 25]
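    A normal probability plot pairs each sorted observed delta with the value expected for that rank from a unit normal distribution. The sketch below assumes the plotting position (rank - 0.5)/n and takes the observed deltas as already normalized; their exact definition is not given in this section, and the function name is hypothetical.

```python
from statistics import NormalDist


def normal_probability_points(deltas):
    """Return (expected, observed) pairs for a normal probability plot."""
    n = len(deltas)
    unit_normal = NormalDist(0.0, 1.0)
    observed = sorted(deltas)
    # 0-based i, so (i + 0.5) / n is the plotting position (rank - 0.5) / n
    expected = [unit_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(expected, observed))
```

    If the standard deviations are well estimated, the points should lie close to a straight line of slope 1 through the origin; a slope greater than 1 suggests that the standard deviations are underestimated.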

    HISTORY <history line>

    Define optional line to be added to the history records in the file. This is in addition to a line giving the date and time of the run, which is always added. Only one optional history line may be added.

    OVERLAPMAP

    Write the overlap matrix from the initial analysis to a map file assigned to MAPOUT. Note that the initial analysis is not done if the RESTORE option is used or INITIAL NONE is set.

    WIDTH WILSON | LINEAR | SQUARE [<mid-point>]

    Select binning mode on intensity

    Subkeys:
    WILSON
    [default] exponential bins
    LINEAR
    linear bins
    SQUARE
    quadratic bins

    In each case, <mid-point> is the limit for the middle bin.

    PNAME <project_name>

    Project Name. In most cases, this will be inherited from the MTZ file.
    A dataset, as listed in the MTZ header, is specified by a project-name/dataset-name pair. The project-name specifies a particular structure solution project, while the dataset-name specifies a particular dataset contributing to the structure solution. An entry in the PNAME keyword should therefore be accompanied by a corresponding entry in the DNAME keyword.

    DNAME <dataset_name>

    Dataset Name. In most cases, this will be inherited from the MTZ file.
    A dataset, as listed in the MTZ header, is specified by a project-name/dataset-name pair. The project-name specifies a particular structure solution project, while the dataset-name specifies a particular dataset contributing to the structure solution. An entry in the DNAME keyword should therefore be accompanied by a corresponding entry in the PNAME keyword.

    PRIVATE

    Set the directory permissions to '700', i.e. read/write/execute for the user only (default '755').

    USECWD

    Write the deposit file to the current directory, rather than a subdirectory of $HARVESTHOME. This can be used to send deposit files from speculative runs to the local directory rather than the official project directory, or can be used when the program is being run on a machine without access to the directory $HARVESTHOME.

    RSIZE <row_length>

    Maximum width of a row in the deposit file (default 80). <row_length> should be between 80 and 132 characters.

    NOHARVEST

    Do not write out a deposit file; default is to do so provided Project and Dataset names are available.

    INPUT AND OUTPUT FILES

    Input

    HKLIN
    The input file must be sorted on H K L M/ISYM BATCH

    Compulsory columns:

            H K L           indices
            M/ISYM          partial flag, symmetry number
            BATCH           batch number
            I               intensity  (integrated intensity)
            SIGI            sd(intensity)   (integrated intensity)
    

    Optional columns:

            XDET YDET       position on detector of this reflection: these
                            may be in any units (e.g. mm or pixels), but the
                            range of values must be specified in the
                            orientation data block for each batch. If
                            these columns are absent, the scale may not be
                            varied across the detector (i.e. only SCALES
                            DETECTOR 1 is valid)
            ROT             rotation angle of this reflection ("Phi"). If
                            this column is absent, only SCALES BATCH is valid.
            IPR             intensity  (profile-fitted intensity)     
            SIGIPR          sd(intensity)   (profile-fitted intensity)
            SCALE           previously calculated scale factor (e.g. from
                            previous run of Scala). This will be applied
                            on input
            SIGSCALE        sd(SCALE)
            TIME            time for B-factor variation (if this is
                            missing, ROT is used instead)
            MPART           partial flag from Mosflm
            FRACTIONCALC    calculated fraction, required to SCALE PARTIALS
            LP              Lorentz/polarization correction (already applied)
    

    Output

    HKLOUT
    (a) Option AVERAGE
    The output file contains columns
    H K L  IMEAN SIGIMEAN  I(+) SIGI(+)  I(-) SIGI(-)
    
    

    Note that there are no M/ISYM or BATCH columns. I(+) & I(-) are the means of the Bijvoet positive and negative reflections respectively and are always present even for the option ANOMALOUS OFF.

    (b) Option SEPARATE
    The output file contains the same columns as the input, with some columns added if not previously present:-

    SCALE & SIGSCALE - the calculated scale factor & its sd (this may be applied in another run of Scala). SCALE will be 0.0 for reflections outside the resolution cutoff, if they are included in the output file (option OUTPUT KEEP) (see example)

    SIGIC [, SIGIPRC] - the corrected standard deviations of I [and IPR], as altered by SDCORR commands. These columns are only written if a SDCORRECTION command is given to Scala.

    If the OUTPUT POSTREF option is given, then also the columns IMEAN SIGIMEAN ISUM SIGISUM are added

            IMEAN    mean of fully-recorded reflections
            ISUM     summed partials (partials only)
    
    
    (c) Option UNMERGED
    As for SEPARATE, but with scales applied, with no partials (i.e. partials have been summed or scaled, unmatched partials removed), & outliers rejected. If a separate profile-fitted intensity column IPR, SIGIPR is present in the input file as well as columns I, SIGI, only one set will be chosen, as specified. Columns defining the diffraction geometry (e.g. XDET YDET ROT TIME LP FRACTIONCALC) will be preserved in the output file.

    Output columns:

            H,K,L     REDUCED or ORIGINAL indices (see OUTPUT options)
            M/ISYM    Symmetry number (REDUCED), = 1 for ORIGINAL indices
            BATCH     batch number as for input
            I, SIGI   scaled intensity & sd(I)
            SCALE     scale factor applied
            SIGSCALE  sd(SCALE)
            NPART     number of parts, = 1 for fulls, negated for scaled
                       partials, i.e. = -1 for scaled single part partial
            TIME      copied from input if present
            XDET,YDET copied from input if present
            ROT       copied from input if present (averaged for
                        multi-part partials)
            FRACTIONCALC total fraction (if present in input file)
            LP        copied from input if present
    
    
    SCALES
    scale factors from DUMP, used by RESTORE option
    ROGUES
    list of bad agreements
    PLOT
    If SCALES SECONDARY or SURFACE options are used, graph of correction surface (Plot84 format)
    NORMPLOT
    normal probability plot from merge stage
    *** this is at present written in a format for the plotting program xmgr ***
    ANOMPLOT
    normal probability plot of anomalous differences
                (I+ - I-)/sqrt[sd(I+)**2 + sd(I-)**2]
    

    *** this is at present written in a format for the plotting program xmgr ***

    SCALEPACK
    Formatted output selected by the command OUTPUT POLISH
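    The quantity plotted in ANOMPLOT is straightforward to compute. If there is no anomalous signal and the standard deviations are correct, these values should follow a unit normal distribution, so a systematic departure from it indicates anomalous signal (or mis-estimated standard deviations). The helper name is hypothetical.

```python
import math


def anomalous_delta(i_plus, sd_plus, i_minus, sd_minus):
    """(I+ - I-) / sqrt(sd(I+)**2 + sd(I-)**2), as plotted in ANOMPLOT."""
    return (i_plus - i_minus) / math.sqrt(sd_plus ** 2 + sd_minus ** 2)


print(anomalous_delta(110.0, 3.0, 100.0, 4.0))  # 2.0
```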

    EXAMPLES

    Simple smoothed scaling
      set crystal = "tfn2"
      scala hklin     ${crystal}_srs  \
            hklout    ${crystal}_merge \
            scales    ${crystal}_${run}.scales \
            rogues    ${crystal}_${run}.rogues \
            normplot  ${crystal}_${run}.norm \
                  << eof 
      
      run  1 all
      
      intensities partial     # we have few fulls
      
      cycles 20
      
      anomalous off           # this is a native set
      
      sdcorrection 1.3 0.02   # from a previous run
      
      # try it with and without the tails correction: this is with
      scales   rotation spacing 10  bfactor on    tails
      
      reject 4              # reject outliers more than 4sd from mean
      exclude eprob 1e-8    # reject very large observations, if probability
                            #    .lt. 10**-8  (== Emax 4.3)
      
      eof
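    The comment "(== Emax 4.3)" in this example can be checked from Wilson statistics. For acentric reflections the probability of a normalized amplitude exceeding E0 is exp(-E0**2), and for centric reflections it is erfc(E0/sqrt(2)); the 4.3 quoted matches the acentric case. This sketch only reproduces that arithmetic and does not claim to be how Scala applies EPROB internally; the function names are hypothetical.

```python
import math


def emax_acentric(prob):
    """E0 such that P(|E| > E0) = prob for acentric data: exp(-E0**2) = prob."""
    return math.sqrt(-math.log(prob))


def emax_centric(prob, lo=0.0, hi=20.0):
    """E0 such that erfc(E0 / sqrt(2)) = prob for centric data (bisection)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid / math.sqrt(2.0)) > prob:
            lo = mid   # tail probability still too large: E0 lies higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(round(emax_acentric(1e-8), 2))  # 4.29
```

    The centric threshold for the same probability is higher (roughly 5.7), because centric intensities have a longer-tailed distribution.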
      
    Simple batch scaling
      #!/bin/csh -f
      #
      # Scale data from Mosflm, merge with Scala
      #
      scala hklin jpa_example hklout jpa_example_sc \
            scales   jpa.scales \
            rogues   jpa.rogues \
            normplot jpa.norm \
            anomplot jpa.anom \
      << eof-1
      run 1 batch 2001 to 2049
      run 2 batch 2051 to 2100
      cycles 8
      sdcorr  1.5  0.03
      scales batch  bfactor on 
      reject merge 4
      anomalous on
      partials test 0.95 1.05 maxwidth 6
      eof-1
      
    A more complicated example: smooth scaling of the native, then scaling of the derivative to the native
      #!/bin/csh -f
      #
      #scala
      #
      cd /scr0/fm1/Temp
      #
      ##
      #==== Sort native output from Mosflm together
      ##
      sort:
      sortmtz hklout m6c8_sort.mtz  << end_sort
      H K L M/ISYM BATCH I SIGI
      m6c8a1.mtz
      m6c8a2.mtz
      end_sort
      #
      ##
      #==== scale native data together, no Bfactor, smooth scale on rotation
      #==== merge native
      ##
      scala hklin m6c8_sort.mtz hklout m6c8_scala <<EOF
      run 1 batch 1 to 90000
      title frozen native monoclinic m6c8 
      scales bfactor off  rotation spacing 5
      resolution 25 6.1
      anomalous off
      reject merge  4
      sdcorr  1.3  0.04
      EOF
      #
      # Convert native data into form suitable for reinput to Scala
      combat hklin m6c8_scala hklout m6c8_r << eof-r
      input mtzi
      labin I=IMEAN SIGI=SIGIMEAN
      batch 1
      eof-r
      #
      ##
      #==== Sort derivative data together
      ##
      sort:
      sortmtz hklout m6cb3_sort.mtz  << end_sort
      H K L M/ISYM BATCH I SIGI
      m6cb3b.mtz
      m6cb3c.mtz
      end_sort
      #
      ##
      #==== Combine together merged native & sorted derivative data, by
      #     interleaving reflection records
      #     Must resort data after this step
      ##
      mtzutils:
      mtzutils hklin2 m6cb3_sort.mtz \
               hklin1 m6c8_r \
               hklout m6cb3_resort << eof-m
      merge
      eof-m
      #
      sortmtz hklin temp_m6cb3_resort hklout m6cb3_resort << eof-m
      H K L M/ISYM BATCH
      eof-m
      #
      ##
      #==== Scale and merge derivative data, using native data as reference (run 1)
      #     Allow scale factor to vary across detector, but with some restraints
      #     The reference data (native) is omitted from the output file
      ##
      scala hklin m6cb3_resort.mtz hklout m6cb3_scala <<EOF
      run 1 batch 1 reference
      run 2 batches 10 to 23156 exclude 23152          #  reject one duff batch
      run 3 batches 23157 to 90000
      title frozen native monoclinic m6cb3 
      scales bfactor off  rotation spacing 5 detector 3 3 
      tie detector 0.1
      resolution 25 6.1
      reject merge  4
      sdcorr  1.1  0.005
      EOF
      #
      #
      #
      #exit
      trunc:
      truncate  hklin  m6cb3_scala \
                hklout /ss3/fm1/Mutase/Derivs_FzM/m6cb3_F <<end-trunc
      truncate 
      wilson
      resolution  25 6.1
      nresidue   1400
      labout  F=FM623 SIGF=SIGFM623 DANO=DANOM623 SIGDANO=SIGDANOM623
      end-trunc
      
    Relative scaling of MAD data with a reference dataset
      #!/bin/csh -f
      
      # Define a base name for files created in this script
      set name = dfxe_3d
      
      # Input filenames for the 4 datasets at different wavelengths
      set l1 = dfxe_1   # peak
      set l2 = dfxe_2   # inflection
      set l3 = dfxe_3   # hard remote
      set l4 = dfxe_4   # 1A wavelength
      
      # Angular spacing for smoothed Bfactor
      set spacing = 10   # Bfactor spacing
      
      ### Optional branch point for restarts
      ###goto l1
      
      # Sort together the initial data files
      sortmtz hklout ${name}_s << eof-s
      H K L M/ISYM BATCH
      ${l1}.mtz
      ${l2}.mtz
      ${l3}.mtz
      ${l4}.mtz
      eof-s
      
      ###=== Step 1 ==========================================================
      ###===    Scale reference set to itself
      ###===    In this case, the reference set is all wavelengths
      scale:
      set ln = ref
      set ll = ${name}_${ln}
      
      scala hklin ${name}_s  hklout ${ll}_sc \
        scales   ${ll}.scales \
        rogues   ${ll}.rogues \
        normplot ${ll}.norm \
        anomplot ${ll}.anom \
           << eof_sc${ll}
      run 1 batch 1000 to 1999
      run 2 batch 2000 to 2999
      run 3 batch 3000 to 3999
      run 4 batch 4000 to 4999
      scales rotation spacing ${spacing}
      anomalous off
      eof_sc${ll}
      
      ###=== Step 2 ==========================================================
      ###===    Reformat reference data add to unmerged data
      ###===
      rf:
      # Reformat reference set for reinput (Imean only)
      combat  hklin  ${ll}_sc  hklout  ${ll}_rr << eof-rr
      input mtzi
      batch 1
      labin I=IMEAN SIGI=SIGIMEAN
      eof-rr
      
      # Put back together with main data
      mtzutils hklin1 ${ll}_rr hklin2 ${name}_s  hklout ${name}_all << eof-u
      merge
      eof-u
      
      # Must resort after mtzutils
      sortmtz hklin temp_${name}_all hklout ${name}_all << eof-m
      H K L M/ISYM BATCH
      eof-m
      
      ###=== Step 3 ==========================================================
      ###===    Scale all data relative to reference set
      ###===    In this case, this is done in two stages:-
      ###===      1. batch scaling, to take out any discontinuities between images
      ###===      2. smooth scaling, varying scales across the detector
      ###===         This 2nd step provides some correction for absorption
      ###===    If there are no discontinuities in scales between adjacent batches,
      ###===    step 1 should be omitted
      ###===
      scale_1:
      #   1. batch scaling
      set run = rel1
      set rel1 = ${name}_${run}
      scala hklin ${name}_all  hklout ${rel1} \
            scales   ${run}.scales \
            normplot ${run}.norm \
            anomplot ${run}.anom \
            rogues   ${run}.rogues \
                                            << eof-r1
      title Batch scale against reference
      #  Define runs: run 10 (batch 1) is the merged reference
      run 10 batch 1 reference
      run 1 batch 1000 to 1999
      run 2 batch 2000 to 2999
      run 3 batch 3000 to 3999
      run 4 batch 4000 to 4999
      # Use partials in scaling, as there are not many fulls
      intensities partial
      #
      scales batch brotation spacing ${spacing}  bfactor on
      output separate reference   # output data as input, with scales added
                                  # keep reference data for the second scaling stage
      eof-r1
      
      scale_2:
      #   2. smoothed 3D scaling
      set run = rel2
      set rel2 = ${name}_${run}
      scala hklin ${rel1}  hklout ${rel2} \
            scales   ${run}.scales \
            normplot ${run}.norm \
            anomplot ${run}.anom \
            rogues   ${run}.rogues \
                                            << eof-r2
      title Smoothed scale against reference
      run 10 batch 1 reference
      run 1 batch 1000 to 1999
      run 2 batch 2000 to 2999
      run 3 batch 3000 to 3999
      run 4 batch 4000 to 4999
      intensities partial
      #  Scales varying across the detector ( 3 x 3 scales)
      scales rotation spacing ${spacing} detector 3 3
      tie detector 0.2
      output separate   # Omit reference data this time
      eof-r2
      
      ###=== Step 4 ==========================================================
      ###===   Split out each wavelength & merge
      ###===   No scaling is done here (it's already been done)
      l1:
      set ln = 1
      set run = l${ln}
      scala hklin ${name}_rel2  hklout ${name}_${run} \
            normplot ${run}.norm \
            anomplot ${run}.anom \
            rogues   ${run}.rogues \
                                            << eof-${run}
      title Merged, lambda ${run} 
      run 1 batch 1000 to 1999
      scales constant
      anomalous on
      onlymerge
      reject 4
      eof-${run}
      
      ###=== Step 5 ==========================================================
      ###===   Convert I to F, do Wilson plot
      ###===   
      truncate hklin ${name}_${run} hklout ${name}_${run}_f << eof_t${run}
      nresidues 117
      ranges 30
      labout F=F${ln} SIGF=SIGF${ln} \
         DANO=DANO${ln}  SIGDANO=SIGDANO${ln}  ISYM=ISYM${ln} \
         F(+)=F${ln}(+) SIGF(+)=SIGF${ln}(+)  F(-)=F${ln}(-) SIGF(-)=SIGF${ln}(-)
      eof_t${run}
      
      ###===    Repeat step 4 & 5 for L2
      ###===   
      l2:
      set ln = 2
      set run = l${ln}
      scala hklin ${name}_rel2  hklout ${name}_${run} \
            normplot ${run}.norm \
            anomplot ${run}.anom \
            rogues   ${run}.rogues \
                                            << eof-${run}
      title Merged, lambda ${run} 
      run 1 batch 2000 to 2999
      scales constant
      anomalous on
      onlymerge
      reject 4
      eof-${run}
      
      truncate hklin ${name}_${run} hklout ${name}_${run}_f << eof_t${run}
      nresidues 117
      ranges 30
      labout F=F${ln} SIGF=SIGF${ln} \
         DANO=DANO${ln}  SIGDANO=SIGDANO${ln}  ISYM=ISYM${ln} \
         F(+)=F${ln}(+) SIGF(+)=SIGF${ln}(+)  F(-)=F${ln}(-) SIGF(-)=SIGF${ln}(-)
      eof_t${run}
      
      ###===    Repeat step 4 & 5 for L3
      ###===   
      l3:
      set ln = 3
      set run = l${ln}
      scala hklin ${name}_rel2  hklout ${name}_${run} \
            normplot ${run}.norm \
            anomplot ${run}.anom \
            rogues   ${run}.rogues \
                                            << eof-${run}
      title Merged, lambda ${run} 
      run 1 batch 3000 to 3999
      scales constant
      anomalous on
      onlymerge
      reject 4
      eof-${run}
      
      truncate hklin ${name}_${run} hklout ${name}_${run}_f << eof_t${run}
      nresidues 117
      ranges 30
      labout F=F${ln} SIGF=SIGF${ln} \
         DANO=DANO${ln}  SIGDANO=SIGDANO${ln}  ISYM=ISYM${ln} \
         F(+)=F${ln}(+) SIGF(+)=SIGF${ln}(+)  F(-)=F${ln}(-) SIGF(-)=SIGF${ln}(-)
      eof_t${run}
      
      ###===    Repeat step 4 & 5 for L4
      ###===   
      l4:
      set ln = 4
      set run = l${ln}
      scala hklin ${name}_rel2  hklout ${name}_${run} \
            normplot ${run}.norm \
            anomplot ${run}.anom \
            rogues   ${run}.rogues \
                                            << eof-${run}
      title Merged, lambda ${run} 
      run 1 batch 4000 to 4999
      scales constant
      anomalous on
      onlymerge
      reject 4
      eof-${run}
      
      truncate hklin ${name}_${run} hklout ${name}_${run}_f << eof_t${run}
      nresidues 117
      ranges 30
      labout F=F${ln} SIGF=SIGF${ln} \
         DANO=DANO${ln}  SIGDANO=SIGDANO${ln}  ISYM=ISYM${ln} \
         F(+)=F${ln}(+) SIGF(+)=SIGF${ln}(+)  F(-)=F${ln}(-) SIGF(-)=SIGF${ln}(-)
      eof_t${run}
      
      set name = dfxe_3d_l
      
      ###=== Step 6 ==========================================================
      ###===   Sort together merged data for all wavelengths, outputting a 
      ###===   single record for each hkl
      ###===   For each wavelength, store amplitude F & sigF, 
      ###===   anomalous difference DANO (= F+ - F-) & sigDANO,
      ###===   and ISYM flag which shows if both F+ & F- were measured
      cad  hklout ${name}_fcad  \
           hklin1 ${name}1_f \
           hklin2 ${name}2_f \
           hklin3 ${name}3_f \
           hklin4 ${name}4_f       << eof-c
      labin  file_number 1  E1=F1 E2=SIGF1 E3=DANO1 E4=SIGDANO1 E5=ISYM1 E6=F1(+) E7=SIGF1(+) E8=F1(-) E9=SIGF1(-)
      labin  file_number 2  E1=F2 E2=SIGF2 E3=DANO2 E4=SIGDANO2 E5=ISYM2 E6=F2(+) E7=SIGF2(+) E8=F2(-) E9=SIGF2(-)
      labin  file_number 3  E1=F3 E2=SIGF3 E3=DANO3 E4=SIGDANO3 E5=ISYM3 E6=F3(+) E7=SIGF3(+) E8=F3(-) E9=SIGF3(-)
      labin  file_number 4  E1=F4 E2=SIGF4 E3=DANO4 E4=SIGDANO4 E5=ISYM4 E6=F4(+) E7=SIGF4(+) E8=F4(-) E9=SIGF4(-)
      eof-c
      
      ###=== Step 7 ==========================================================
      ###===   Re-scale together all wavelengths, get statistics on dispersive
      ###===   and anomalous differences
      scaleit:
      scaleit hklin ${name}_fcad  hklout dfxe_3d_fsc  << eof-sci
      labin FP=F4 SIGFP=SIGF4  \
       FPH1=F1 SIGFPH1=SIGF1 DPH1=DANO1 SIGDPH1=SIGDANO1 \
       FPH2=F2 SIGFPH2=SIGF2 DPH2=DANO2 SIGDPH2=SIGDANO2 \
       FPH3=F3 SIGFPH3=SIGF3 DPH3=DANO3 SIGDPH3=SIGDANO3  \
       FPH4=F1(+) SIGFPH4=SIGF1(+) \
       FPH5=F1(-) SIGFPH5=SIGF1(-) \
       FPH6=F2(+) SIGFPH6=SIGF2(+) \
       FPH7=F2(-) SIGFPH7=SIGF2(-) \
       FPH8=F3(+) SIGFPH8=SIGF3(+) \
       FPH9=F3(-) SIGFPH9=SIGF3(-) 
      refine anisotropic
      eof-sci
      
    Scale 6 data sets together in two steps, then pick out one of them
      #  Set a load of symbols
      set crystal = m6d02
      
      set ss3 = /nfs/al/ss3/fm1/Mad
      set ss5 = /nfs/al/ss5/fm1/Mtz
       
      set scr0 = /nfs/al/scr0/fm1/Mad
      set scr1 = /nfs/al/scr1/fm1/Mad
      
      set file = alldata 
      set resolution = 2.8
      set title = " MMCM m6d02 - all the data scaled together"
      
      # Editable branches for restarts
      goto scala1
      #goto scala2
      #goto merging
      
      sortmtz hklout Mtz/${crystal}_${file}_sc0 << eof1
      H K L M/ISYM BATCH
      ... . . lots of files . . 
      eof1
      
      # First scaling run - slope mode
      # each batch has three parameters, a relative B-factor, and two scale
      # factors, for the beginning & end of each batch. The scale for a
      # particular reflection is interpolated between the 2 scales for that
      # batch according to its "phi" value. This allows for a (linear) decay
      # on incident beam intensity during the exposure
      #
      scala1:
      scala hklin  Mtz/${crystal}_${file}_sc0  \
            hklout Mtz/${crystal}_${file}_sc1 \
            scales Mtz/${crystal}_${file}_sc1.scales \
            mapout Mtz/${crystal}_overlap_sc1 << eof-sc1
      Title  ${title} - 1st run
      run 1 include batch 1000 to 1999
      run 2 include batch 2000 to 2999
      run 3 include batch 3000 to 3999
      run 4 include batch 4000 to 4999
      run 5 include batch 5000 to 5999 exclude 5190 5191  # exclude 2 duff batches
      run 6 include batch 6000 to 6999
      scales slope bfactor on detector 1      # batch slope scaling
      tie rotation 0.1                        # tie scale-factor pairs
      resolution 20 ${resolution} 
      intensities scale_partial 0.5           # use scaled partials in scaling
      sdcorrection 1.25  0.01         # these fudge factors were
                                      # determined from a previous run in Scala
      cycles 8
      reject 8                        # outlier test on 8sd
      reject byrun
      exclude sdmin 1                 # omit weak reflections from scaling
      overlapmap                              # write overlap matrix to MAPOUT
      output separate    # No merging
      eof-sc1
      
      # second run - across the detector
      # We've now taken out any discontinuities of scale between batches
      # Now apply a smoothly-varying scale factor along Phi & across the
      # detector on top of the first batch scale. The input file (hklin) for
      # this is the output file from the first scaling, and contains 
      # that result in the SCALE column which is applied here on input.
      #
      scala2:
      scala hklin  Mtz/${crystal}_${file}_sc1  \
            hklout Mtz/${crystal}_${file}_sc2 << eof-sc2
      Title  ${title} - 2nd run
      run 1 include batch 1000 to 1999
      run 2 include batch 2000 to 2999
      run 3 include batch 3000 to 3999
      run 4 include batch 4000 to 4999 
      run 5 include batch 5000 to 5099
      run 6 include batch 5100 to 5999 exclude 5190 5191
      run 7 include batch 6000 to 6999
      scales rotation spacing 5 bfactor off detector 4 4     # smooth scaling
      tie detector 0.1 rotation 0.1      # restrain the scales from varying too much
      initial none                       # skip initial pass through data
                                         # set initial scales = 1.0
      sdcorrection 1.25 0.01
      cycles 10 reject 1                 # reject outliers on all cycles
      reject 8                           #   but not too tight
      reject byrun
      exclude sdmin 1
      resolution ${resolution} 20
      dump  Mtz/${crystal}_${file}_sc2a.scales
      output separate  # Still no merging
      eof-sc2
      
      #exit
      
      # third run - outlier rejection on individual datasets
      #  Read the output file from the 2nd scaling pass, and just pick out
      #  one data set (mostly one run). Outliers are removed from the
      #  output file here.
      # Outliers are listed in the file assigned to ROGUES
      #
      # This should be repeated for each dataset (run) 
      #  (note that runs 5 and 6 belong to the same dataset)
      # proceed to do individual merging 
      
      merging:
      
      set reject = 10
      set agr_reject = 10
      set sdfac = 1.2
      set sdadd = 0.002
      
      set lambda = 1
      set range = "1000 to 1999" 
      set title = " MMCM m6d02 wavelength Lambda ${lambda}"
      
      # individual data set run - final outlier rejection
      scala_l1:
      scala hklin  Mtz/${crystal}_${file}_sc2 \
            hklout ${scr1}/${crystal}_l${lambda}_rj${reject}_sc4 \
            rogues ${ss3}/${crystal}_l${lambda}_rj${reject}_sc4.rogues    \
            normplot ${ss3}/${crystal}_l${lambda}_rj${reject}_sc4.norm    \
            anomplot ${ss3}/${crystal}_l${lambda}_rj${reject}_sc4.anom    \
        << eof-l1
      Title  ${title} - 4th scaling run
      run 1 include batch ${range}
      scales rotation spacing 5 bfactor off detector 4 4    
      sdcorrection $sdfac $sdadd
      initial none
      noscale                         # Skip scaling, just do merge
      final partials                  # use summed partials in statistics (default)
      resolution ${resolution}
      reject scale $reject
      reject merge ${agr_reject}
      eof-l1
      
      
      
      truncate:
      truncate hklin ${scr1}/${crystal}_l${lambda}_rj${reject}_sc4 \
              hklout ${ss3}/${crystal}_l${lambda}_trun_rj${reject}_sc4 << eof-l1
      TITLE ${title}
      labout F=FPTL${lambda} SIGF=SIGFPTL${lambda} DANO=DANOPTL${lambda} -
            SIGDANO=SIGDANOPTL${lambda} ISYM=ISYMPT${lambda}
      wilson
      nresidue  2600
      eof-l1
      
      # . . . repeat for other datasets . . .
      
      

    REFERENCES

    1. W. Kabsch, J. Appl. Cryst. 21, 916-924 (1988)
    2. P.R. Evans, "Data reduction", Proceedings of CCP4 Study Weekend, 1993, on Data Collection & Processing, pages 114-122
    3. P.R. Evans, "Scaling of MAD Data", Proceedings of CCP4 Study Weekend, 1997, on Recent Advances in Phasing
    4. R. Read, "Outlier rejection", Proceedings of CCP4 Study Weekend, 1999, on Data Collection & Processing
    5. Hamilton, Rollett & Sparks, Acta Cryst. 18, 129-130 (1965)
    6. R.H. Blessing, Acta Cryst. A51, 33-38 (1995)
    7. K. Diederichs & P.A. Karplus, "Improved R-factors for diffraction data analysis in macromolecular crystallography", Nature Structural Biology 4, 269-275 (1997)

    Appendix 1: Partially recorded reflections

    Partially recorded reflections may optionally be used in scaling (controlled by the command INTENSITIES), and in the final analysis (controlled by the command FINAL). The default is to include summed partials in both scaling and the final analysis and merging.

    Different options for the treatment of partials are set for both scaling & merging stages by the PARTIALS command, or separately for the scaling stage (INTENSITIES command) and the merging stage (FINAL command). Partials may either be summed (subkeyword PARTIALS, with various options), or scaled (subkeyword SCALE_PARTIALS): in the latter case, each part is treated independently of the others. If summed partials are used in scaling with the SCALES BATCH option, the FRACTIONCALC values are used to partition the effects of the different batch scales between the parts. In the input file, partials are flagged with M=1 in the M/ISYM column, and have a calculated fraction in the FRACTIONCALC column. Data from Mosflm also have a column MPART which enumerates each part (e.g. for a reflection predicted to run over 3 images, the 3 parts are labelled 31, 32, 33), allowing a check that all parts have been found: MPART = 10 for partials already summed in MOSFLM.

    For datasets with low mosaicity compared to the image width, few partials run over more than two images, & partial summation is not usually a problem. If you have many partials running over 3 or more images, you may need to tune the partial selection flags below to accept or reject partial sets according to their reliability.

    Summed partials:
    All the parts are summed (after applying scales) to give the total intensity, provided some checks are passed. The options to use partials as well as fulls are defined separately for the scaling and merging steps on the INTENSITIES and FINAL commands. The parameters for the checks are set by the PARTIALS command for both stages, or separately on the INTENSITIES and FINAL commands. The number of reflections failing the checks is printed. You should make sure that you are not losing too many reflections in these checks.

    (a)
    At least two parts must be present (unless the CORRECT option is set, see (e) below)
    (b)
    not more than <maximum_width> parts may be present (MAXWIDTH <maximum_width>) [default maximum_width = 5]
    (c)
    if the CHECK option is set (the default if an MPART column is present), the MPART flags are examined. If they are consistent, the summed intensity is accepted. If they are inconsistent (quite common), the total fraction is checked instead, unless NOTEST is specified, in which case the partial set is rejected. NOCHECK switches off this check.
    (d)
    if the TEST option is set (default if no MPART column), the summed reflection is accepted if the total fraction (the sum of the FRACTIONCALC values) lies between <lower_limit> and <upper_limit> [default limits = 0.95 1.2]
    (e)
    if the CORRECT option is set, the total intensity is scaled by the inverse total fraction for total fractions between <minimum_fraction> and <lower_limit>. This also works for a single unmatched partial. As for the scaled partial option, this correction relies on accurate FRACTIONCALC values, so beware.
    (f)
    if the GAP option is set, partial sets containing a gap are accepted, e.g. a partial over 3 parts with the middle one missing. The GAP option implies TEST & NOCHECK, & the CORRECT option may also be set.

    By setting the TEST & CORRECT limits, you can control summation & scaling of partials, e.g.

          TEST 1.2 1.2 CORRECT 0.5 
    

    will scale up all partials with a total fraction between 0.5 & 1.2

          TEST 0.95 1.05           
    

    will accept summed partials 0.95->1.05, no scaling

          TEST 0.95 1.05 CORRECT 0.4  
    

    will accept summed partials 0.95->1.05, and scale up those with fractions between 0.4 & 0.95
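
    The acceptance rules (a), (b), (d) and (e) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not Scala's own code: `PartialOptions` and `sum_partials` are hypothetical names, the MPART CHECK (c) and GAP (f) options are not modelled, the parts are assumed to be already scaled, and the treatment of a total fraction lying exactly on a limit is a guess.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PartialOptions:
    """Hypothetical container for the limits set by PARTIALS/INTENSITIES/FINAL."""
    max_width: int = 5                    # MAXWIDTH <maximum_width>
    lower: float = 0.95                   # TEST <lower_limit>
    upper: float = 1.2                    # TEST <upper_limit>
    correct_min: Optional[float] = None   # CORRECT <minimum_fraction>; None means CORRECT off


def sum_partials(intensities: Sequence[float], fractions: Sequence[float],
                 opts: PartialOptions) -> Optional[float]:
    """Return the summed intensity of one set of parts, or None if it is rejected.

    `intensities` are the already-scaled parts, `fractions` their FRACTIONCALC values.
    """
    n = len(intensities)
    if n == 0 or n > opts.max_width:                 # (b) too many parts
        return None
    if n < 2 and opts.correct_min is None:           # (a) need two parts unless CORRECT
        return None
    total_i = sum(intensities)
    total_f = sum(fractions)
    if opts.lower <= total_f <= opts.upper:          # (d) TEST: accept unchanged
        return total_i
    if opts.correct_min is not None and opts.correct_min <= total_f < opts.lower:
        return total_i / total_f                     # (e) CORRECT: scale up by 1/fraction
    return None                                      # fails the fraction test
```

    With `PartialOptions(lower=0.95, upper=1.05, correct_min=0.4)`, i.e. the third example above, two parts with FRACTIONCALC 0.25 + 0.25 and intensities 30 + 20 are accepted and scaled up to 100; with `correct_min=None` the same set is rejected.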

    Note that a profile-fitted intensity, if present in the file as a separate IPR column, will not be used for a scaled partial, unless the PARTIALS USE_PROFILE flag is set.

    Scaled partials:
    In this option, each individual partial observation is scaled up by the inverse FRACTIONCALC, provided that the fraction is greater than <minimum_fraction> [default = 0.5].
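
    A sketch of the scaled-partial rule; `scale_partial` is a hypothetical helper name, and the strict "greater than" comparison follows the wording above.

```python
from typing import Optional


def scale_partial(intensity: float, fraction_calc: float,
                  min_fraction: float = 0.5) -> Optional[float]:
    """Scale one partial observation up to a full-equivalent intensity.

    Returns None when FRACTIONCALC does not exceed min_fraction,
    i.e. the part is too small to be used on its own.
    """
    if fraction_calc > min_fraction:
        return intensity / fraction_calc
    return None
```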


    Appendix 2: Scaling algorithm

    For each reflection h, we have a number of observations Ihl, with estimated standard deviation shl, which defines a weight whl. We need to determine the inverse scale factor ghl to put each observation on a common scale (as Ihl/ghl). This is done by minimizing

     
            Sum( whl * ( Ihl - ghl * Ih )**2 )          (Ref Hamilton, Rollett & Sparks (1965))
    
    

    where Ih is the current best estimate of the "true" intensity

            Ih = Sum ( whl * ghl * Ihl ) / Sum ( whl * ghl**2)
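
    The two formulas can be written in a few lines of numpy. This sketch (hypothetical function names) only evaluates the target for given inverse scales; it does not refine the scale parameters. The weights are passed in directly: taking whl = 1/shl**2 is the usual choice but is an assumption here. For fixed ghl, Ih is the value that minimises the contribution of reflection h, since setting the derivative of Sum( whl * ( Ihl - ghl * Ih )**2 ) with respect to Ih to zero gives exactly the expression above.

```python
import numpy as np


def best_intensity(I, g, w):
    """Ih = Sum(w*g*I) / Sum(w*g**2) for the observations of one reflection h."""
    I, g, w = (np.asarray(a, dtype=float) for a in (I, g, w))
    return float(np.sum(w * g * I) / np.sum(w * g ** 2))


def scaling_target(reflections):
    """Sum over reflections h of Sum_l whl * (Ihl - ghl * Ih)**2.

    `reflections` is an iterable of (I, g, w) sequences, one tuple per unique reflection.
    """
    total = 0.0
    for I, g, w in reflections:
        I, g, w = (np.asarray(a, dtype=float) for a in (I, g, w))
        Ih = best_intensity(I, g, w)
        total += float(np.sum(w * (I - g * Ih) ** 2))
    return total
```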
    
    

    Each observation is assigned to a "run", which corresponds to a set of scale factors. A run would typically consist of a continuous rotation of a crystal about a single axis.

    The inverse scale factor ghl is derived as follows:

            ghl = Thl * Chl * Shl
    
    

    where Thl is an optional relative B-factor contribution, Chl is a scale factor (1-dimensional, or 3-dimensional, i.e. the DETECTOR option), and Shl is an anisotropic correction expressed as spherical harmonics (i.e. SECONDARY or SURFACE options).

    a) B-factor (optional)

    For each run, a relative B-factor (Bi) is determined at intervals in "time" ("time" is normally defined as rotation angle if no independent time value is available), at positions ti (t1, t2, ..., tn). Then for an observation measured at time tl

            B = Sum[i=1,n] ( p(delt) Bi ) / Sum (p(delt))
    
            where   Bi  are the B-factors at time ti
                    delt    = tl - ti
                    p(delt) = exp ( - (delt)**2 / Vt )
                    Vt  is "variance" of weight, & controls the smoothness
                            of interpolation
    
            Thl = exp ( + 2 s B )
                    s = (sin theta / lambda)**2
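
    A minimal numpy sketch of the smoothed B-factor and the resulting Thl term (function names hypothetical). `s_from_d` uses Bragg's law, sin theta / lambda = 1/(2d), so s = 1/(4 d**2).

```python
import numpy as np


def smoothed_b(t_obs, t_knots, b_knots, vt):
    """B at time t_obs: Gaussian-weighted average of the knot values Bi."""
    t_knots = np.asarray(t_knots, dtype=float)
    b_knots = np.asarray(b_knots, dtype=float)
    p = np.exp(-(t_obs - t_knots) ** 2 / vt)     # p(delt), with delt = tl - ti
    return float(np.sum(p * b_knots) / np.sum(p))


def s_from_d(d):
    """s = (sin theta / lambda)**2 = 1 / (4 d**2), from Bragg's law."""
    return 1.0 / (4.0 * d ** 2)


def b_term(b, s):
    """Thl = exp(+2 s B)."""
    return float(np.exp(2.0 * s * b))
```

    A larger Vt spreads each knot's influence over a wider range of "time", giving a smoother B(t).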
    
    

    An alternative anisotropic B-factor may be used to correct for anisotropic fall-off of scattering. This is parameterized on the components of the scattering vector (divided by 2 for compatibility with the normal definition of B) in two directions perpendicular to the X-ray beam (y & z in the "Cambridge" coordinate frame with x along the beam).

            Thl = exp ( + 2[uy**2 Byy + 2 uy uz Byz + uz**2 Bzz])
    
            where  uy, uz are the components of d*/2
    
    

    Byy, Byz, Bzz are functions of time ti or batch as for the isotropic Bfactor. The principal components of B (Bfac_min, Bfac_max) are also printed.
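
    The anisotropic term and its principal components might be computed as below. Reading the "principal components" (Bfac_min, Bfac_max) as the eigenvalues of the symmetric 2x2 tensor [[Byy, Byz], [Byz, Bzz]] is an assumption; the function names are hypothetical.

```python
import numpy as np


def aniso_b_term(uy, uz, byy, byz, bzz):
    """Thl = exp(+2[uy**2 Byy + 2 uy uz Byz + uz**2 Bzz]); uy, uz are components of d*/2."""
    return float(np.exp(2.0 * (uy ** 2 * byy + 2.0 * uy * uz * byz + uz ** 2 * bzz)))


def principal_b(byy, byz, bzz):
    """(Bfac_min, Bfac_max) as eigenvalues of [[Byy, Byz], [Byz, Bzz]] (assumed reading)."""
    evals = np.linalg.eigvalsh(np.array([[byy, byz], [byz, bzz]], dtype=float))
    return float(evals[0]), float(evals[1])   # eigvalsh returns eigenvalues in ascending order
```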

    b) Scale factors

    For each run, scale factors Cxyz are determined at positions (x,y) on the detector, at intervals on rotation angle z. Then for an observation at position (x0, y0, z0),

            Chl(x0, y0, z0) =
               Sum(z)[ p(delz) * {Sum(xy)[q(delxy)*Cxyz] / Sum(xy)[q(delxy)]} ] / Sum(z)[p(delz)]
    
    where   delz    = z - z0
            p(delz) = exp(-delz**2/Vz)
            q(delxy)= exp(-((x-x0)**2 + (y-y0)**2)/Vxy)
            Vz, Vxy are the "variances" of the weight & control the smoothness
                    of interpolation
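
    The interpolation formula for Chl can be sketched with numpy as follows. The array layout (`xy_grid`, `z_knots`, `C`) is a hypothetical choice for illustration, not Scala's internal storage: the inner sum interpolates across the detector at each rotation position, and the outer sum interpolates along z.

```python
import numpy as np


def detector_scale(x0, y0, z0, xy_grid, z_knots, C, vxy, vz):
    """Chl at detector position (x0, y0) and rotation z0.

    xy_grid : (M, 2) detector positions (x, y) at which scales are defined
    z_knots : (K,) rotation positions z
    C       : (K, M) scale factors Cxyz
    """
    xy = np.asarray(xy_grid, dtype=float)
    z = np.asarray(z_knots, dtype=float)
    C = np.asarray(C, dtype=float)
    q = np.exp(-((xy[:, 0] - x0) ** 2 + (xy[:, 1] - y0) ** 2) / vxy)   # q(delxy), shape (M,)
    per_z = C @ q / q.sum()                  # detector-interpolated scale at each z, shape (K,)
    p = np.exp(-(z - z0) ** 2 / vz)          # p(delz), shape (K,)
    return float(np.sum(p * per_z) / p.sum())
```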
    
    

    For the SCALES BATCH option, the scale along z is discontinuous: the normal option has one scale factor (or set of scale factors across the detector) for each batch. The SLOPE (not recommended) option has two scale factors per batch, with the scale interpolated linearly between the beginning and end according to the rotation angle of the reflection.
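
    The difference between the normal BATCH scale and the SLOPE interpolation is sketched below for a single scale per batch (no detector dependence); `batch_scale` is a hypothetical name.

```python
def batch_scale(phi, phi_start, phi_end, c_start, c_end=None):
    """Scale for one reflection in a batch covering phi_start..phi_end.

    With c_end None this is the normal SCALES BATCH case (one scale per batch);
    otherwise it is the SLOPE case: linear interpolation between the scales at
    the beginning and end of the batch according to the reflection's rotation angle.
    """
    if c_end is None:
        return c_start
    frac = (phi - phi_start) / (phi_end - phi_start)
    return c_start + frac * (c_end - c_start)
```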

    c) Anisotropy factor

    The optional surface or anisotropy factor Shl is expressed as a sum of spherical harmonic terms as a function of the direction of either the secondary beam (SECONDARY correction) in the camera spindle frame, or the diffraction vector in the crystal frame (SURFACE option).

    1. SECONDARY beam direction
               s  =  [Phi] [UB] h
               s2 = s - s0       
               s2' = [-Phi] s2
      Polar coordinates:
               s2' = (x y z)
               PolarTheta = arctan(sqrt(x**2 + y**2)/z)
               PolarPhi   = arctan(y/x)
                                   where [Phi] is the spindle rotation matrix
                                         [-Phi] is its inverse
                                         [UB]  is the setting matrix
                                         h = (h k l)
      
    2. Crystal frame vector
               (x y z) = [Q][B] h
      Polar coordinates:
               PolarTheta = arctan(sqrt(x**2 + y**2)/z)
               PolarPhi   = arctan(y/x)
                                   where [Q] is a permutation matrix to put
                                             h, k, or l along z (see POLE option)
                                         [B]  is the orthogonalization matrix
                                         h = (h k l)
      
    then
     Shl = 1  +  Sum[l=1,lmax] Sum[m=-l,+l] Clm  Ylm(PolarTheta,PolarPhi)
    
                                 where Ylm is the spherical harmonic function for
                                           the direction given by the polar angles
                                       Clm are the coefficients determined by
                                           the program
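
    The polar angles used in the SECONDARY option might be computed as below. Assumptions: `rotation_about_z` is a hypothetical spindle matrix that puts the rotation axis along z (the real [Phi] depends on the goniostat definition), and `arctan2` is used so that the angles fall in the correct quadrant, which the plain arctan of the formulas leaves implicit. For the SURFACE option, pass [Q][B]h directly to `polar_angles`. Evaluating Ylm and the coefficients Clm is not shown.

```python
import numpy as np


def polar_angles(v):
    """(PolarTheta, PolarPhi) in radians for a vector (x, y, z)."""
    x, y, z = np.asarray(v, dtype=float)
    theta = np.arctan2(np.hypot(x, y), z)   # arctan(sqrt(x**2 + y**2) / z), quadrant-safe
    phi = np.arctan2(y, x)                  # arctan(y / x), quadrant-safe
    return float(theta), float(phi)


def rotation_about_z(angle_deg):
    """Hypothetical spindle matrix [Phi], assuming the rotation axis lies along z."""
    a = np.radians(angle_deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def secondary_beam_polar(phi_mat, ub, hkl, s0):
    """SECONDARY option: s = [Phi][UB]h, s2 = s - s0, s2' = [-Phi] s2."""
    s = phi_mat @ ub @ np.asarray(hkl, dtype=float)
    s2 = s - np.asarray(s0, dtype=float)
    s2_spindle = phi_mat.T @ s2             # the inverse of a rotation matrix is its transpose
    return polar_angles(s2_spindle)
```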
    
    
    Notes:

    Appendix 3: TAILS correction

    For many crystals, the reflection profile on rotation ("phi") is not a simple closed curve, but has long tails due at least in part to thermal diffuse scattering (TDS): the amount of this depends on the crystal, and is larger at high resolution than at low resolution. If all reflections were scanned through the same angle, then equal amounts of this diffuse scattering would be included in each reflection. However, in typical "coarse sliced" data collection schemes, where the image rotation width is larger than the reflection width, reflections are recorded on a variable number of images (1, 2, 3, etc.), and different amounts of the tails are included in the integrated intensity. This generally leads to a negative "partial bias", increasing with resolution, i.e. the apparent intensities of partially recorded reflections are higher than those of equivalent fulls.

    The TAILS correction uses a simple (crude) model of thermal diffuse scattering to correct for the different truncation of the tails; it does not attempt to correct for the diffuse scattering itself.

    Some of the ideas used are based on suggestions by R.H.Blessing, Cryst. Reviews, 1, 3-58 (1987), but he should not be blamed for this.

    This is a brief account of the method (see code & comments in subroutine dffscn for more details):-

    1. I = J ( 1 + alpha)
      where J is the Bragg intensity (true intensity) & I is the measured intensity, i.e. the TDS intensity is proportional to the Bragg intensity
    2. alpha = alpha0 + alpha1 * (sin theta / lambda)**2
      where alpha0 & alpha1 are refinable parameters. This is a simple linear isotropic model of the amount of TDS. alpha0 should be 0.0, and may be fixed as such, but allowing it to vary seems to help sometimes. Both alpha0 & alpha1 are reset if they go negative in the refinement. An extension of the model would be to make alpha anisotropic.
    3. each reflection is scanned over an angle Dphi, which is an integral multiple of the image width (Dphi = Nimages * DelPhi). A rotation by Dphi moves the reflection a distance in reciprocal space

              Dq = Dphi * xsi

      where xsi is the radius from the rotation axis.

      If the half width of the reflection (including tails) is v (another refinable parameter), and 2v > Dq, then part of the tails will be truncated.

      Taking a simple model of the shape of the tails as a triangle of base width 2v and height h in the middle (h = J * alpha / v), the area in the tails (= tail intensity = J * alpha) and the intensity truncated by the restricted scan range can be calculated. The corrected ("true") intensity J can then be calculated.

      For full scan:

              J = I / (1 + alpha)

      For truncated scan (missing parts of tails C1 & C2):

              J = I / (1 + alpha*(1 - C1 - C2))

    4. because this model is very crude, it seems insufficiently trustworthy to use as a proper correction for TDS. It does, however, seem reasonable to correct for the different amounts of tail truncation, C1 & C2 (>= 0.0)
    5. The correction applied is thus

              I' = I * (1 + alpha) / (1 + alpha*(1 - C1 - C2))

    6. the parameters refined are v, alpha0 (A0) and alpha1 (A1). By default, the same parameters are used for all runs (see LINK, UNLINK). Refinement of the parameters often seems to be unstable. If they are being reset from negative values, try setting A0 = 0.0 (e.g. SCALES . . TAILS 0.005 0.0 30.0) and fixing A0 (FIX A0; this is the default)
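
    The TAILS formulas can be collected into a small Python sketch (function names hypothetical). Two assumptions go beyond the text: Dphi is taken in degrees and converted to radians before multiplying by xsi, and each truncation fraction is computed from the triangle model for a scan edge at distance d from the peak centre, where the triangular tail area beyond d on one side is h*(v - d)**2/(2v) out of a total tail area v*h. How Scala locates the peak within the scan is not reproduced here.

```python
import math


def tails_alpha(alpha0, alpha1, s):
    """alpha = alpha0 + alpha1 * s, with s = (sin theta / lambda)**2."""
    return alpha0 + alpha1 * s


def scan_width(dphi_deg, xsi):
    """Dq = Dphi * xsi, with Dphi converted from degrees to radians (assumed units)."""
    return math.radians(dphi_deg) * xsi


def truncated_tail_fraction(d, v):
    """Fraction of the triangular tail (half-width v) lying beyond a scan edge
    at distance d from the peak centre, on one side only.

    Area beyond d is h*(v - d)**2 / (2v); total tail area is v*h.
    d is clamped to [0, v] (assumption).
    """
    d = min(max(d, 0.0), v)
    return (v - d) ** 2 / (2.0 * v ** 2)


def tails_corrected(intensity, alpha, c1, c2):
    """I' = I * (1 + alpha) / (1 + alpha * (1 - C1 - C2))."""
    return intensity * (1.0 + alpha) / (1.0 + alpha * (1.0 - c1 - c2))
```

    For example, with v = 2 and each scan edge 1 unit from the peak centre, C1 = C2 = 0.125; with alpha = 0.2 the correction factor is 1.2/1.15, raising the truncated reflection by about 4%.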

    Appendix 4: Data from Denzo

    DENZO is often run refining the cell and orientation angles for each image independently, with postrefinement then done in Scalepack. It is essential that you do this postrefinement. Then either reintegrate the images with the cell parameters fixed, or use unmerged output from Scalepack as input to Scala (in which case the following keyword record has to be included in Scalepack: no merge original index).

    Both of these options have some problems:

    Also, the DENZO or SCALEPACK output will need to be converted to a multi-record MTZ file using the program COMBAT.


    Appendix 5: Outlier algorithm

    The test for outliers is as follows:

    (1)
    if there are 2 observations (left), then
    (a)
    for each observation Ihl, calculate the deviation

            Delta(hl) = (Ihl - ghl Iother) / sqrt[sigIhl**2 + (ghl*sdIother)**2]

    and test |Delta(hl)| against sdrej2, where Iother = the other observation

    (b)
    if either |Delta(hl)| > sdrej2, then
    1. in scaling, reject reflection. Or:
    2. in merging,
      1. keep both (default or if KEEP subkey given) or
      2. reject both (subkey REJECT) or
      3. reject larger (subkey LARGER) or
      4. reject smaller (subkey SMALLER).
    (2)
    if there are 3 or more observations left, then
    (a)
    for each observation Ihl,
    1. calculate the weighted mean of all other observations <I>n-1 & its sd(<I>n-1)
    2. calculate the deviation

              Delta(hl) = (Ihl - ghl <I>n-1) / sqrt[sigIhl**2 + (ghl*sd(<I>n-1))**2]

    3. find the largest deviation max|Delta(hl)|
    4. count the number of observations for which Delta(hl) .ge. 0 (ngt), & for which Delta(hl) .lt. 0 (nlt)
    (b)
    if max|Delta(hl)| > sdrej, then reject one observation, but which one?
    1. if ngt == 1 .or. nlt == 1, then one observation is a long way from the others, and this one is rejected
    2. else reject the one with the worst deviation max|Delta(hl)|
    (3)
    iterate from beginning
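
    The procedure can be sketched in Python as below. This illustrates the logic as described, not Scala's code: the function names are hypothetical, <I>n-1 and its sd are computed with the Ih formula of Appendix 2 using weights 1/sigma**2 (an assumption), Iother in the two-observation case is taken as the other observation on the common scale, and `pair_action` mirrors the scaling behaviour ("reject") and the merging subkeys KEEP, REJECT, LARGER and SMALLER.

```python
import math
from typing import List, Sequence, Tuple


def mean_and_sd(idx: Sequence[int], I, sig, g) -> Tuple[float, float]:
    """Weighted mean on the common scale, Sum(w g I)/Sum(w g**2), with w = 1/sig**2 (assumed)."""
    num = sum(g[j] * I[j] / sig[j] ** 2 for j in idx)
    den = sum(g[j] ** 2 / sig[j] ** 2 for j in idx)
    return num / den, 1.0 / math.sqrt(den)


def deviation(i: int, others: Sequence[int], I, sig, g) -> float:
    """Signed Delta(hl) of observation i against the weighted mean of the others."""
    mean, sd = mean_and_sd(others, I, sig, g)
    return (I[i] - g[i] * mean) / math.sqrt(sig[i] ** 2 + (g[i] * sd) ** 2)


def reject_outliers(I, sig, g, sdrej, sdrej2, pair_action="keep") -> List[int]:
    """Return the indices of the observations that survive the outlier test."""
    keep = list(range(len(I)))
    while len(keep) >= 2:
        devs = {i: deviation(i, [j for j in keep if j != i], I, sig, g) for i in keep}
        worst = max(keep, key=lambda i: abs(devs[i]))
        if len(keep) == 2:                       # step (1): two observations left
            if abs(devs[worst]) <= sdrej2 or pair_action == "keep":
                break
            if pair_action == "reject":          # scaling, or REJECT in merging
                return []
            a, b = keep
            larger = a if I[a] / g[a] >= I[b] / g[b] else b
            smaller = b if larger == a else a
            keep.remove(larger if pair_action == "larger" else smaller)
            break
        if abs(devs[worst]) <= sdrej:            # step (2b): nothing to reject
            break
        ngt = [i for i in keep if devs[i] >= 0]
        nlt = [i for i in keep if devs[i] < 0]
        if len(ngt) == 1:                        # one observation far above the rest
            keep.remove(ngt[0])
        elif len(nlt) == 1:                      # one observation far below the rest
            keep.remove(nlt[0])
        else:
            keep.remove(worst)
        # step (3): iterate from the beginning with the remaining observations
    return keep
```

    Each pass either stops or removes one observation, so the loop always terminates.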

    RELEASE NOTES

    Version 2.7.5

    Version 2.7.4

    Version 2.7.3

    Version 2.7.2

    Version 2.7.1

    Version 2.6.4

    Version 2.6.3

    Version 2.6.2

    Version 2.6.1

    Version 2.6.0

    Version 2.5.5

    Version 2.5.4

    Version 2.5.3

    Version 2.5.2

    Version 2.5.1

    Version 2.5.0

    Version 2.4.3

    Version 2.4.2

    Version 2.4.1

    Version 2.3.2

  • Phi out of range is a warning, not fatal
  • Check for M>0 (flag set in Postref) for partials: previously didn't work with data from Postref
  • Correct labels for UNMERGED output option
  • DAMP keyword added
  • Bug fix to avoid normal probability analysis problem if no fulls

    Version 2.3.1

  • Output labels for SEPARATE option changed to conform with CCP4 3.3 convention, i.e. I(+) and I(-) etc

    Version 2.3.0

  • added "anomalous match" options for selecting matched I+ & I-
  • EXCLUDE does not check reference batch

    Version 2.2.3

    1. fixed bug in summed partials in case of "scales batch": this combination is still dubious, but awaits proper analysis
    2. added PARTIALS keyword
    3. fixed bug in calculation of Rfull: this was completely wrong if anomalous data was present
    4. added INTENSITIES ANOMALOUS option to keep I+ & I- separate in scaling (not normally recommended)
    5. allow incomplete orientation data in certain cases

    Version 2.2.2, November 1996

    1. defaults on partial summation improved (and again 18/12/96)
    2. analysis on fulls only even when partials are used
    3. bug fix in random number routine (thanks to Adam)
    4. ONLYMERGE option
    5. If scaling across detector (e.g. "scales detector 3 3"), checks on valid Xdet, Ydet (within limits in file header)
    6. Rogues file lists Xdet, Ydet, Phi
    7. default in scaling is "exclude sdmin 6" (omitting weak observations speeds scaling)
    8. default FIX A0
    9. reject outliers on every cycle if scales "restored" (else previous scaling gets messed up)
    10. analysis by position on detector
    11. fixed bug affecting "reject byrun" & deviations with anomalous on

    Version 2.2.1, November 1996

    Many changes from version 1.x.x

    1. this version by default merges multiple measurements and thus replaces Agrovata. See the keyword OUTPUT for further description of the output options:-

         AVERAGE    [default] merged I (as from Agrovata)
         SEPARATE   separate scaled measurements (as from older Scala versions), for reinput into Scala, or input into Agrovata [not recommended]
         POSTREF    scaled file for input to POSTREF
         UNMERGED   scaled, partials summed (or scaled), but not merged

    2. by default, the SDCORRECTION parameter SdFac (multiplier) will be automatically adjusted from the normal probability analysis of deviations. This is done in a separate pass through the data before the final merging pass. The command SDCORRECTION NOADJUST disables this adjustment.
    3. The scaling option TAILS has been introduced. This makes some attempt to correct for the different truncation of the tails of diffuse scattering between fulls & partials. This option comes with a health warning: it should be treated with caution. Try with & without. (see commands SCALES . . TAILS, FIX, [UN]LINK)
    4. the way of putting data (e.g. native) back into the scaling as a reference set has changed. See example.
    5. treatment of summed partials has been elaborated (see FINAL & INTENSITIES keywords above). In 2.2.1, the defaults are not set optimally (whatever that means!): this is improved in 2.2.2
    6. Recommended usage:

      FINAL PARTIALS CHECK TEST 0.95 1.05     # for Mosflm

      FINAL PARTIALS TEST 0.95 1.05         # for Denzo (but FractionCalc
                                            #  is rather unreliable)

    7. Scales are dumped to the file SCALES by default (see DUMP & RESTORE)
    8. Normal probability analyses done, plots output to files NORMPLOT and ANOMPLOT in a format suitable for xmgr (from your favourite ftp server)
    9. by default scaling now excludes weak data (EXCLUDE SDMIN 3.0)

    AUTHOR

    Phil Evans, MRC Laboratory of Molecular Biology, Cambridge (pre@mrc-lmb.cam.ac.uk)

    SEE ALSO

    truncate, postref, Data Harvesting