|
Chinese character pattern data

The pattern data contains the precise definition of which strokes are required to match a given character. This data is the core of Tibi's character recognition algorithm. The match quality is determined by the presence of all strokes with their defined directions, the correctness of the stroke order, the correctness of the stroke direction, and the length of unmatched redundant strokes. The current version of the pattern data can be downloaded here (utf8).

Format Documentation

The format is designed to be compact and quickly editable.

Rules

Rules assign a pattern to a character. If the pattern is matched, the character will be presented to the user.
Character : Pattern ;
Furthermore, there are invisible rules. Some patterns do not correspond to any character but do occur as components of more complex characters. More about invisible rules in the section about Macros.
{ Name } : Pattern ;
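The two rule forms above can be split into a head and a pattern with very little code. The following is a minimal sketch, not Tibi's actual parser: the names `Rule` and `parse_rule` are hypothetical, and the pattern text is left unparsed. It assumes one complete rule per input string.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    name: str        # the character, or the name of an invisible rule
    invisible: bool  # True for rules of the form "{ Name } : Pattern ;"
    pattern: str     # raw pattern text, left unparsed in this sketch


def parse_rule(text: str) -> Rule:
    """Split one rule into its head and its pattern (hypothetical helper)."""
    body = text.strip()
    if not body.endswith(";"):
        raise ValueError("a rule must end with ';'")
    # Split on the first colon only: locators inside the pattern
    # may contain further colons.
    head, sep, pattern = body[:-1].partition(":")
    if not sep:
        raise ValueError("a rule needs ':' between head and pattern")
    head = head.strip()
    invisible = head.startswith("{") and head.endswith("}")
    name = head[1:-1].strip() if invisible else head
    return Rule(name, invisible, pattern.strip())


print(parse_rule("中 : 口[x 3:7,:] S[x :,2:8];"))
print(parse_rule("{上下} : {1}[x :,8:20] {2}[x 5,-20:2];"))
```

Splitting on the first colon only is the one design choice that matters here, since the pattern part may itself contain colons.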
Comments

Comments start with a percent sign % and last to the end of the line.

Pattern

A pattern defines a character or a part of a character and the way it is correctly written.

Characters as Pattern

Characters defined elsewhere can be used as patterns to build up a more complex character.

Orientations

Orientations are the most primitive form of pattern. The following orientations are allowed:
Orientation ranges

Multiple orientations can be combined with a star "*". The resulting pattern is matched if any of the provided orientations is found. Example for a stroke that goes either right or up:
E*NE*N
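Read this way, an orientation range is simply a set of alternatives. The sketch below illustrates that reading; the function names are hypothetical, and it assumes the observed stroke orientation has already been reduced to a single orientation name.

```python
def parse_orientation_range(pattern: str) -> frozenset[str]:
    """Return the set of orientations accepted by a range such as 'E*NE*N'."""
    names = [part.strip() for part in pattern.split("*")]
    if any(not name for name in names):
        raise ValueError(f"empty orientation in {pattern!r}")
    return frozenset(names)


def range_matches(pattern: str, observed: str) -> bool:
    """True if the observed orientation is one of the listed alternatives."""
    return observed in parse_orientation_range(pattern)


print(range_matches("E*NE*N", "NE"))  # True
print(range_matches("E*NE*N", "S"))   # False
```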
Connected orientations

If orientations or orientation ranges are prefixed with a minus "-", there must be a connecting stroke to the previous orientation. A correct stroke sequence for the character 乙 is:
E-SW-SE-E-NE
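One possible reading of this notation is that orientations joined by "-" form a single connected chain, while whitespace separates independent chains. The sketch below splits a pattern on that assumption. `split_chains` is a hypothetical helper, and it only handles patterns without groups or locators, since locators may contain negative numbers that would be confused with the connector.

```python
def split_chains(pattern: str) -> list[list[str]]:
    """Split a locator-free pattern into chains of connected orientations.

    Assumption: whitespace separates chains, and '-' joins the
    orientations inside one chain.
    """
    chains = []
    for token in pattern.split():
        parts = token.split("-")
        if any(not part for part in parts):
            raise ValueError(f"dangling '-' in {token!r}")
        chains.append(parts)
    return chains


# The stroke sequence for 乙 is a single chain of five orientations.
print(split_chains("E-SW-SE-E-NE"))
# Three chains: a single S, a connected E-S, and a single E.
print(split_chains("S E-S E"))
```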
Groups

Parentheses can be used to group multiple patterns together. The importance of groups will become clear in the subsequent section on locators. A possible group definition for the strokes of the character 口 is given below. However, without locators, completely different shapes will be matched too.
(S (E-S) (E))
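Nested groups map naturally onto nested lists. The following sketch parses the parenthesised form with a simple stack; `parse_groups` is a hypothetical name, and the sketch assumes a pattern without locators, treating every whitespace-separated token that is not a parenthesis as an opaque pattern.

```python
def parse_groups(pattern: str) -> list:
    """Parse parenthesised groups into nested lists (locator-free input)."""
    tokens = pattern.replace("(", " ( ").replace(")", " ) ").split()
    stack: list[list] = [[]]  # stack[-1] is the group currently being filled
    for token in tokens:
        if token == "(":
            stack.append([])
        elif token == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            group = stack.pop()
            stack[-1].append(group)
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


print(parse_groups("(S (E-S) (E))"))
```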
Locators

Locators define the relative positions of individual strokes. Checking the order of orientations alone is not sufficient; the graphical position is also needed to correctly recognize the written character. Locators are appended directly after a pattern and have the following forms:
[ Name xmin : xmax , ymin : ymax ]
[ Name x, y ]
[ Name ]
The name is required to identify corresponding locators. Equally named locators must have overlapping ranges. A range is defined relative to the pattern's extent. Zero is the pattern's leftmost or topmost position; ten is the rightmost or bottommost position. If the minimum or maximum value is omitted, 0 or 10 respectively is assumed. If only a single number is given, the range contains only that one point.
Example for a 口 with locators. Each stroke must pass at least through the center of the corresponding edge of the bounding box.
口 : (S[left] E[up]-S[right] E[down]) [left 0,5][up 5,0][right 10,5][down 5,10];
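The range rules for locators (defaults of 0 and 10, a single number as a one-point range) can be sketched as follows. This is only an illustration under stated assumptions: `Range`, `parse_range`, `parse_locator`, and `overlaps` are hypothetical names, only the two locator forms that carry coordinates are handled, the minimum is assumed not to exceed the maximum, and the mapping between the bounding boxes of different patterns is ignored.

```python
from typing import NamedTuple


class Range(NamedTuple):
    lo: float
    hi: float


def parse_range(text: str) -> Range:
    """Parse 'min:max', ':max', 'min:', ':' or a single number."""
    text = text.strip()
    if ":" not in text:
        value = float(text)
        return Range(value, value)  # a single number is a one-point range
    lo, hi = text.split(":", 1)
    return Range(
        float(lo) if lo.strip() else 0.0,   # omitted minimum defaults to 0
        float(hi) if hi.strip() else 10.0,  # omitted maximum defaults to 10
    )


def parse_locator(text: str) -> tuple[str, Range, Range]:
    """Parse '[ Name xrange , yrange ]' into (name, x range, y range)."""
    inner = text.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        raise ValueError("a locator is written in square brackets")
    name, _, ranges = inner[1:-1].strip().partition(" ")
    x_text, sep, y_text = ranges.partition(",")
    if not sep:
        raise ValueError("this sketch needs both an x and a y range")
    return name, parse_range(x_text), parse_range(y_text)


def overlaps(a: Range, b: Range) -> bool:
    """True if two ranges on the same axis share at least one point."""
    return a.lo <= b.hi and b.lo <= a.hi


print(parse_locator("[left 0,5]"))
print(parse_locator("[x 3:7,:]"))
print(overlaps(parse_range("3:7"), parse_range("5")))
```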
When the second component of a range is smaller than the first, the range must be completely contained rather than merely overlapping.
中 : 口[x 3:7,:] S[x :,2:8];
Macros

Macros are defined as invisible rules. They are called with curly braces that enclose the macro's name and its arguments. The macro 上下 defines an above-below relationship between its two arguments. It can be called to define 否:
否 : {上下 不 口};
A macro definition can access its arguments by number in curly braces, e.g. {1} and {2}.
{上下} : {1}[x :,8:20] {2}[x 5,-20:2];
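Macro calls can be understood as textual substitution of the numbered placeholders. The sketch below expands a single call on that assumption. `expand_call` and the `macros` dictionary are hypothetical, arguments are assumed to be separated by whitespace, and nested calls are not handled.

```python
import re


def expand_call(call: str, macros: dict[str, str]) -> str:
    """Expand one macro call such as '{上下 不 口}' by substituting {1}, {2}, ..."""
    inner = call.strip()
    if not (inner.startswith("{") and inner.endswith("}")):
        raise ValueError("a macro call is written in curly braces")
    name, *args = inner[1:-1].split()
    body = macros[name]

    def substitute(match: re.Match) -> str:
        index = int(match.group(1))  # placeholders are numbered from 1
        return args[index - 1]

    return re.sub(r"\{(\d+)\}", substitute, body)


macros = {"上下": "{1}[x :,8:20] {2}[x 5,-20:2]"}
print(expand_call("{上下 不 口}", macros))
# prints: 不[x :,8:20] 口[x 5,-20:2]
```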
Conclusion

The pattern definition is specified in a simple bounding-box-based language that allows the definition of high-level positional relations. |