

Function (ccmp, Glyph Composition/Decomposition): To minimize the number of glyph alternates, it is sometimes desirable to decompose a character into two glyphs. Additionally, it may be preferable to compose two characters into a single glyph for better glyph processing. This feature permits such composition and decomposition. It should be the first feature processed, and it should be processed only when it is called.

Function (blws, Below-base Substitutions): Produces ligatures that consist of a base glyph and below-base forms. In the Malayalam script (Indic), for example, the conjunct Kla requires a ligature formed from the base glyph Ka and the below-base form of the consonant La; below-base forms are represented by non-spacing mark glyphs. This feature can also be used to substitute ligatures formed from base glyphs and below-base matras in Indic scripts.

Function (blwf, Below-base Forms): Substitutes the below-base form of a consonant in conjuncts, that is, the form a consonant takes when it appears below the base glyph. In complex scripts like Oriya (Indic), the consonant Va has a below-base form that is used to generate conjuncts. Similarly, consonants in below-base form appear in Bengali syllables after the consonants that form the base glyph: given the sequence Gha, Virama (Halant), Va, the below-base form of Va would be substituted to form the conjunct GhVa.

Function (abvs, Above-base Substitutions): Substitutes a ligature for a base glyph and the mark above it. In complex scripts like Kannada (Indic), the vowel sign for the vowel I, which is a mark, is positioned above base consonants; this mark combines with the consonant Ga to form a ligature. A related case is the Khmer vowel OE, whose above-base form would be substituted to form the correct piece of the letter that is displayed above the base consonant.
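The composition, decomposition, and ligature behaviour described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how a real shaping engine applies GSUB lookups: glyphs are plain strings, the glyph names (`ka`, `la.below`, `kla`, `virama`, `va`, `va.below`, `e.acute`, `acutecomb`), the two tables, and the helpers `decompose` and `ligate` are all hypothetical. In OpenType terms, decomposition corresponds to a multiple substitution and composition to a ligature substitution; here ligatures are matched greedily, longest sequence first.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical ligature table: glyph sequence -> single ligature glyph.
LIGATURES: Dict[Tuple[str, ...], str] = {
    ("ka", "la.below"): "kla",     # base Ka + below-base La -> conjunct Kla
    ("virama", "va"): "va.below",  # Virama + Va -> below-base form of Va
}

# Hypothetical decomposition table: one glyph -> several glyphs.
DECOMPOSITIONS: Dict[str, List[str]] = {
    "e.acute": ["e", "acutecomb"],  # precomposed letter -> base + combining mark
}


def decompose(glyphs: Sequence[str], table: Dict[str, List[str]]) -> List[str]:
    """Replace each glyph that has an entry in `table` by its components."""
    out: List[str] = []
    for glyph in glyphs:
        out.extend(table.get(glyph, [glyph]))
    return out


def ligate(glyphs: Sequence[str], table: Dict[Tuple[str, ...], str]) -> List[str]:
    """Scan left to right, replacing the longest matching sequence by its ligature."""
    max_len = max((len(key) for key in table), default=0)
    out: List[str] = []
    i = 0
    while i < len(glyphs):
        # Try the longest candidate first, down to two-glyph sequences.
        for n in range(min(max_len, len(glyphs) - i), 1, -1):
            key = tuple(glyphs[i:i + n])
            if key in table:
                out.append(table[key])
                i += n
                break
        else:
            # No ligature starts at this position: copy the glyph unchanged.
            out.append(glyphs[i])
            i += 1
    return out


print(ligate(["ka", "la.below", "a"], LIGATURES))  # ['kla', 'a']
print(ligate(["gha", "virama", "va"], LIGATURES))  # ['gha', 'va.below']
print(decompose(["e.acute"], DECOMPOSITIONS))      # ['e', 'acutecomb']
```

The second call mirrors the Gha, Virama, Va example: the Virama and Va collapse into the below-base form of Va, which then sits under Gha to render the conjunct.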

Function (abvf, Above-base Forms): Substitutes the above-base form of a vowel. In complex scripts like Khmer, the vowel OE must be split into a pre-base form and an above-base form.

Function (clig, Contextual Ligatures): Replaces a sequence of glyphs with a single glyph that is preferred for typographic purposes. Unlike other ligature features, clig specifies the context in which the ligature is recommended. This capability is important in some script designs and for swash ligatures. For example, the glyph for ft replaces the sequence f t in Bickham Script, except when preceded by an ascending letter.

The typographic design process is structured and systematic: letterforms are visually related in weight, contrast, space, alignment, and style. To create a new typeface family, type designers generally start by designing a few key characters, such as o, h, p, and v, incorporating the most important structure elements such as vertical stems, round parts, diagonal bars, arches, and serifs (see Figure 1). They can then use the design features embedded in these structure elements (stem width, behavior of curved parts, contrast between thick and thin shape parts, and so on) to design the font's remaining characters. Today's industrial font description standards, such as Adobe Type 1 or TrueType, represent typographic characters by their shape outlines, because the contours of well-designed, large-size master characters are simple to digitize. However, outline characters only implicitly incorporate the design tradition, the rules related to visual appearance, and the design ideas of a skilled character designer.

Chinese characters have a complex and hierarchical graphical structure carrying both semantic and phonetic information. We use this structure to enhance the text model and obtain better results in standard NLP operations. First of all, to tackle the problem of graphical variation, we define allographic classes of characters. The characters' hierarchical structure then provides us with a directed graph of allographic classes. We provide this graph with two weights, semanticity (the semantic relation between subcharacter and character) and phoneticity (the phonetic relation), and calculate "most semantic subcharacter paths" for each character. Finally, by adding the information contained in these paths to unigrams, we claim to increase the efficiency of text mining methods. On a text classification task over two corpora (Chinese and Japanese) totaling 18 million characters, we obtain an improvement of 3% on an already high baseline of ?9.6% precision, obtained by a linear SVM classifier. Other possible applications and perspectives of the system are discussed.
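The abstract does not say how a "most semantic subcharacter path" is scored. As one possible reading, and purely an assumption rather than the authors' method, the sketch below treats the allographic-class graph as acyclic, scores a path by the product of its edge semanticity weights, and returns the highest-scoring path from a character down to a subcharacter with no further components. The function name `most_semantic_path`, the node labels, and the weights in the toy graph are all invented placeholders; the same function could be run on phoneticity weights instead.

```python
from typing import Dict, List, Tuple

# Assumed representation: each node maps to its direct subcharacters,
# each paired with a semanticity weight in [0, 1]. The graph must be acyclic.
Graph = Dict[str, List[Tuple[str, float]]]


def most_semantic_path(graph: Graph, start: str) -> Tuple[float, List[str]]:
    """Return (score, path) for the max-product path from `start` to a leaf."""
    memo: Dict[str, Tuple[float, List[str]]] = {}

    def best(node: str) -> Tuple[float, List[str]]:
        if node in memo:
            return memo[node]
        candidates = []
        for child, weight in graph.get(node, []):
            child_score, child_path = best(child)
            candidates.append((weight * child_score, [node] + child_path))
        # A node without subcharacters ends the path with a neutral score of 1.0.
        result = max(candidates, key=lambda c: c[0]) if candidates else (1.0, [node])
        memo[node] = result
        return result

    return best(start)


# Invented toy graph: labels and weights are placeholders, not real data.
toy: Graph = {
    "C": [("S1", 0.9), ("S2", 0.4)],
    "S1": [("R1", 0.5)],
    "S2": [("R2", 1.0)],
}
score, path = most_semantic_path(toy, "C")
print(path)  # ['C', 'S1', 'R1']
```

Memoization keeps the cost linear in the number of edges, since each subcharacter's best path is computed once even when several characters share it. Note that a product of weights at most 1 tends to favor shorter paths; a different aggregation (for example, an average) would be an equally plausible assumption.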

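For the contextual ligature (clig) feature, the Bickham Script ft example can be sketched as a substitution guarded by a backtrack context. This is a minimal sketch under stated assumptions: the ligature glyph name `f_t`, the helper `apply_ft_clig`, and the set of ascending letters are hypothetical (the text does not give Bickham Script's actual rules), and a real font would express such a rule as a chaining contextual GSUB lookup rather than in Python.

```python
from typing import FrozenSet, List, Sequence

# Assumed set of ascending lowercase letters; a real font defines its own class.
ASCENDERS: FrozenSet[str] = frozenset("bdfhklt")


def apply_ft_clig(glyphs: Sequence[str], ascenders: FrozenSet[str] = ASCENDERS) -> List[str]:
    """Replace 'f' 't' with the hypothetical ligature 'f_t' unless an ascender precedes it."""
    out: List[str] = []
    i = 0
    while i < len(glyphs):
        is_ft = glyphs[i] == "f" and i + 1 < len(glyphs) and glyphs[i + 1] == "t"
        # Backtrack context: inspect the input glyph just before the 'f'.
        after_ascender = i > 0 and glyphs[i - 1] in ascenders
        if is_ft and not after_ascender:
            out.append("f_t")
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out


print(apply_ft_clig(list("after")))  # ['a', 'f_t', 'e', 'r']
print(apply_ft_clig(list("Delft")))  # ['D', 'e', 'l', 'f', 't']
```

In "Delft" the f is preceded by the ascender l, so the ligature is suppressed, which is exactly the kind of context that distinguishes clig from an unconditional ligature feature.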