CASA Toolkit Block Library: ColourMask


The ColourMask block performs a `pseudo-grouping' on a discrete 1/0 missing data mask to form a set of numbered groups. The output is a time-frequency map in which each point carries an integer label indicating the group to which it has been assigned. This map may be used as input to the multisource decoder.

The `pseudo-grouping' is based on locating and labelling contiguous regions of 1's in the missing data mask. Time-frequency points are considered contiguous if they are adjacent in either time or frequency (i.e. joined either `horizontally' or `vertically' but not `diagonally' in the 2-D time-frequency map). Each separated region is assigned a unique group number. As an option the missing data mask may be split into sub-bands which are treated independently when searching for contiguous regions.
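The contiguous-region labelling described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the CTK implementation: the function name `label_mask`, the use of label 0 for ungrouped points, the equal-width split into sub-bands, and the processing of the whole mask at once (rather than through a running buffer) are all choices made for this example.

```python
from collections import deque


def label_mask(mask, num_subbands=1):
    """Label 4-connected regions of 1s in a time x frequency mask.

    mask is a list of frames, each a list of 0/1 values.
    Returns a map of the same shape; 0 means 'not in any group'.
    """
    n_frames = len(mask)
    n_chans = len(mask[0]) if n_frames else 0
    labels = [[0] * n_chans for _ in range(n_frames)]
    # Assumption: sub-bands are equal-width blocks of adjacent channels.
    band_of = [c * num_subbands // n_chans for c in range(n_chans)]
    next_label = 1
    for t0 in range(n_frames):
        for f0 in range(n_chans):
            if mask[t0][f0] != 1 or labels[t0][f0] != 0:
                continue
            # Flood-fill a new region starting from this unlabelled point.
            labels[t0][f0] = next_label
            queue = deque([(t0, f0)])
            while queue:
                t, f = queue.popleft()
                # Horizontal and vertical neighbours only, no diagonals.
                for dt, df in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    u, g = t + dt, f + df
                    if (0 <= u < n_frames and 0 <= g < n_chans
                            and mask[u][g] == 1 and labels[u][g] == 0
                            and band_of[g] == band_of[f]):
                        labels[u][g] = next_label
                        queue.append((u, g))
            next_label += 1
    return labels


mask = [[1, 0, 1],
        [1, 0, 0],
        [0, 1, 0]]
print(label_mask(mask))  # [[1, 0, 2], [1, 0, 0], [0, 3, 0]]
```

In the example the point in the last frame touches group 1 only diagonally, so it receives its own label.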

The block may also perform additional group splitting based on optional pitch and voicing estimate inputs. If the voicing input is connected then groups will be split at frames where the voicing parameter crosses the VOICING_THRESHOLD (i.e. at such frames all existing groups will end and new groups will begin). If the pitch input is supplied then groups will also be split at voiced frames in which the pitch changes by more than the DELTA_PITCH_THRESHOLD. (If the voicing input is not connected then all frames will be treated as though they are voiced.)
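A minimal sketch of how the split frames might be located is given below. It is an illustration, not the CTK code: the function name `find_split_frames` is hypothetical, a frame is assumed to be voiced when its voicing value is strictly above the threshold, and the pitch change is assumed to be measured between consecutive frames.

```python
def find_split_frames(n_frames, voicing=None, pitch=None,
                      voicing_threshold=0.5, delta_pitch_threshold=10.0):
    """Return the frame indices at which existing groups would be ended."""
    if voicing is None:
        # No voicing input: every frame is treated as voiced.
        voiced = [True] * n_frames
    else:
        # Assumption: 'voiced' means strictly above the threshold.
        voiced = [v > voicing_threshold for v in voicing]
    splits = []
    for t in range(1, n_frames):
        # The voicing parameter crossed the threshold between t-1 and t.
        crossed = voiced[t] != voiced[t - 1]
        # The pitch jumped by more than the allowed amount at a voiced frame.
        jumped = (pitch is not None and voiced[t]
                  and abs(pitch[t] - pitch[t - 1]) > delta_pitch_threshold)
        if crossed or jumped:
            splits.append(t)
    return splits


voicing = [0.1, 0.2, 0.9, 0.8, 0.7, 0.3]
pitch = [0, 0, 100, 105, 130, 0]
print(find_split_frames(6, voicing, pitch))  # [2, 4, 5]
```

Frames 2 and 5 are threshold crossings; frame 4 is a pitch jump of 25 within a voiced stretch. A labeller could then refuse to join points across the boundary before each returned frame, so that groups end there and new ones begin.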

After the contiguous region grouping, a higher level of common onset/common offset grouping may be applied to the groups that have been located. In this case disconnected groups that start (or end) at the same time frame are merged into a common group.
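The common onset/common offset stage could be sketched as below, operating on a label map in which 0 means "no group". The function name `merge_common`, the rule that a merged group keeps the smallest of its labels, and the transitive merging when both options are enabled are assumptions made for this example, not documented CTK behaviour.

```python
def merge_common(labels, onset=True, offset=False):
    """Merge groups that share a first (onset) or last (offset) frame."""
    first, last = {}, {}
    for t, frame in enumerate(labels):
        for g in frame:
            if g:
                first.setdefault(g, t)  # first frame in which g appears
                last[g] = t             # overwritten until the final frame
    parent = {g: g for g in first}      # union-find forest over group labels

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    def union_by(key):
        seen = {}
        for g in sorted(parent):
            k = key[g]
            if k in seen:
                a, b = find(seen[k]), find(g)
                # Assumption: the merged group keeps the smaller label.
                parent[max(a, b)] = min(a, b)
            else:
                seen[k] = g

    if onset:
        union_by(first)
    if offset:
        union_by(last)
    return [[find(g) if g else 0 for g in frame] for frame in labels]


labels = [[1, 0, 2],
          [1, 0, 0],
          [0, 3, 0]]
print(merge_common(labels))  # [[1, 0, 1], [1, 0, 0], [0, 3, 0]]
```

Groups 1 and 2 both start in frame 0, so they are merged even though they are not connected; group 3 starts later and is left alone.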

The block's inputs and outputs, and the 9 parameters that control the grouping algorithm, are listed below:

Inputs   Meaning                        Sample   1-D frame   ≥2-D frame
in1      1/0 missing data mask frames   No       Yes         No
(in2)    Degree of voicing              Yes      No          No
(in3)    Pitch estimate                 Yes      No          No

Outputs  Meaning
out1     labelled group frames

Parameters              Type      Default   Meaning
WINDOW_SIZE             Integer   5         Number of frames in running buffer
ONSET_GROUPING          Boolean   False     Perform common onset grouping?
OFFSET_GROUPING         Boolean   False     Perform common offset grouping?
MIN_GROUP_SIZE          Integer   0         (see above)
MAX_GROUP_SIZE          Integer   0         0 = no max size (see above)
NUM_SUBBANDS            Integer   1         Number of subbands
HAS_DELTAS              Boolean   False     Should be set to TRUE if input data includes deltas
VOICING_THRESHOLD       Float     0.5       Threshold for discriminating voiced/unvoiced frames
DELTA_PITCH_THRESHOLD   Float     10        Max pitch change allowed before groups will be split

Documentation for CTKv1.1.4 - Last modified: Thu Jun 28 12:12:04 BST 2001