External Tools Not Only for ArabTeX Documents
Karel Mokry
Otakar Smrz
Faculty of Mathematics and Physics
Charles University in Prague
July 7, 2015
Processing of Arabic at FLM
1
… which include



ArabCode – nontrivial conversion of encoding standards of Arabic script
ArabSpell – rule-driven spelling system suited especially for vocalized Arabic encoded in ArabTeX notation
acolor.sty – package for control over coloring in ArabTeX and LaTeX typesetting systems
ArabTeX encoding concept




Lower ASCII, human-readable, rather phonetic
Algorithmic determination of several phenomena of Arabic script
Evaluation of context, parametric interpretation
Contemporary and historical orthography
<iqra' h_a_dA an-na.s.sa bi-intibAhiN>
versus
Aiqora>o h`*aA {ln~aS~a bi{notibaAhK
Ordinary graphemic approach

Unicode / Unicode Transformation Format (UTF) with great descriptive scope
U+0639 / 0xD8 0xB9 (Arabic `ayn)
0000 0110 0011 1001 / 1101 1000 1011 1001
U+004C / 0x4C (Latin L)
0000 0000 0100 1100 / 0100 1100
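The code point and byte values quoted above can be reproduced with a few lines of Python. This is only an illustrative sketch; the helper name `describe` is hypothetical and not part of ArabCode.

```python
def describe(ch):
    """Return code point, UTF-8 bytes, and both in binary for one character."""
    cp = ord(ch)
    utf8 = ch.encode("utf-8")
    return (
        f"U+{cp:04X}",                          # Unicode code point
        " ".join(f"0x{b:02X}" for b in utf8),   # UTF-8 byte sequence
        f"{cp:016b}",                           # code point as 16 bits
        " ".join(f"{b:08b}" for b in utf8),     # UTF-8 bytes as bits
    )

ayn = describe("\u0639")    # Arabic `ayn
latin_l = describe("L")     # Latin L
print(ayn)
print(latin_l)
```

The Latin letter keeps its single ASCII byte, while the Arabic letter needs two bytes in UTF-8.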


Windows CP 1256, ISO 8859-6, ASMO 449 etc.
Buckwalter Transliteration using lower ASCII
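Because Buckwalter transliteration is graphemic, each ASCII symbol stands for exactly one Arabic character, so conversion to Unicode is a plain table lookup. The sketch below assumes the commonly published Buckwalter table; the names `BUCKWALTER`, `_run` and `from_buckwalter` are hypothetical and this is not the ArabCode implementation.

```python
def _run(first, symbols):
    """Map consecutive Buckwalter symbols to consecutive code points."""
    return {sym: chr(first + i) for i, sym in enumerate(symbols)}

BUCKWALTER = {
    **_run(0x0621, "'|>&<}AbptvjHxd*rzs$SDTZEg"),  # hamza .. ghayn
    "_": "\u0640",                                  # tatweel
    **_run(0x0641, "fqklmnhwYyFNKaui~o"),           # feh .. sukun
    "`": "\u0670",                                  # dagger 'alif
    "{": "\u0671",                                  # 'alif wasla
}

def from_buckwalter(text):
    """Convert Buckwalter ASCII to Arabic script; other characters pass through."""
    return "".join(BUCKWALTER.get(ch, ch) for ch in text)

word = from_buckwalter("h`*aA")   # second word of the sample sentence
print(word)
```

No context has to be evaluated here, which is exactly what distinguishes this approach from the ArabTeX notation.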
ArabCode solution





Set of subroutines and scripts in Perl
Complex ArabTeX → UTF / Unicode
Documented Unicode → UTF
Quite easy UTF / Unicode → Windows → ISO → ASMO → Buckwalter → etc.
Currently ArabTeX → Windows and Windows → UTF → ISO → ASMO → Buckwalter
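The "quite easy" conversions among graphemic encodings on the ArabCode solution slide amount to decoding from one character set and encoding into another. A minimal sketch using Python's built-in codecs follows; the `CODECS` table and the `recode` helper are hypothetical names. ASMO 449 and Buckwalter are left out because, as far as we know, Python ships no codec for them.

```python
# Slide names mapped to Python codec names.
CODECS = {
    "UTF-8": "utf-8",
    "Windows CP 1256": "cp1256",
    "ISO 8859-6": "iso8859_6",
}

def recode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one graphemic encoding to another via Unicode."""
    return data.decode(CODECS[source]).encode(CODECS[target])

ayn_utf8 = b"\xd8\xb9"                                      # Arabic `ayn in UTF-8
ayn_iso = recode(ayn_utf8, "UTF-8", "ISO 8859-6")
ayn_win = recode(ayn_iso, "ISO 8859-6", "Windows CP 1256")
back = recode(ayn_win, "Windows CP 1256", "UTF-8")
print(ayn_iso, ayn_win, back)
```

Unicode serves as the pivot, so each additional encoding needs only one table rather than one per pair.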
ArabCode method


Considering problem ArabTeX → UTF / Unicode
Present:
Regular expressions – system tool, fast and safe
→ Rules wired into the code – hard to maintain, inflexible …
Future:
Finite-state transducer – most adequate, use of own implementation may slow computation down
→ External grammar – clear and extensible rules

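The trade-off named on the ArabCode method slide can be illustrated with a regular-expression converter whose rules live in a table rather than in the code. This is a deliberately tiny sketch, not the ArabCode implementation: `RULES`, `TOKEN` and `convert` are hypothetical names, only consonants and short vowels are covered, and everything that needs context (hamza carriers, long vowels, the article, doubling) is omitted.

```python
import re

# ArabTeX notation -> Unicode, for a context-free subset only.
RULES = {
    "_t": "\u062B", "^g": "\u062C", ".h": "\u062D", "_h": "\u062E",
    "_d": "\u0630", "^s": "\u0634", ".s": "\u0635", ".d": "\u0636",
    ".t": "\u0637", ".z": "\u0638", "`": "\u0639", ".g": "\u063A",
    "b": "\u0628", "t": "\u062A", "d": "\u062F", "r": "\u0631",
    "z": "\u0632", "s": "\u0633", "f": "\u0641", "q": "\u0642",
    "k": "\u0643", "l": "\u0644", "m": "\u0645", "n": "\u0646",
    "h": "\u0647", "w": "\u0648", "y": "\u064A",
    "a": "\u064E", "u": "\u064F", "i": "\u0650",   # short vowels
}

# Longest notation first, so that "_t" is not read as "_" followed by "t".
TOKEN = re.compile(
    "|".join(re.escape(key) for key in sorted(RULES, key=len, reverse=True))
)

def convert(text):
    """Convert the supported ArabTeX subset; raise on anything else."""
    out, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unsupported notation at position {pos}: {text[pos:]!r}")
        out.append(RULES[match.group()])
        pos = match.end()
    return "".join(out)

bread = convert("_hubz")
print(bread)
```

Keeping the rules in a data table is a small step toward the external grammar the slide proposes; a finite-state transducer would additionally handle the context-dependent cases.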
ArabSpell motivation




Spell-checking of entries of human-edited lexical database
Supervision over misuse of notation, document consistency requirement
Teaching it by trial and error
One version already applied to educational documents and a book of Arabic proverbs
ArabSpell novel concept



Separation of the definition of the language and the response from the spell-checking engine
Right Linear Grammar and convenient syntax
source :<code>: <text>target <text>
Nondeterministic Finite Automaton and its construction from the grammar
[Figure: automaton with states source and target, transitions labelled t, e, x, t and “”, and the action :<code>:]
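The construction of a nondeterministic automaton from right-linear rules can be sketched as follows. Each rule is given as a triple (source, text, target), one transition is created per character of the text, and an empty text yields an empty transition. The function names and the `ACCEPT` state are hypothetical, and the :<code>: actions are left out, so this shows only the recognition part, not the ArabSpell engine itself.

```python
from collections import defaultdict
from itertools import count

ACCEPT = "<accept>"   # hypothetical name for the single accepting state

def build_nfa(rules):
    """Build transitions from rules (source, text, target).

    target=None means the rule ends the derivation. The label ""
    stands for an empty transition.
    """
    delta = defaultdict(set)
    fresh = count()
    for source, text, target in rules:
        goal = ACCEPT if target is None else target
        state = source
        for ch in text[:-1]:                 # intermediate states inside the text
            nxt = ("q", next(fresh))
            delta[(state, ch)].add(nxt)
            state = nxt
        delta[(state, text[-1:])].add(goal)  # last character, or "" if text is empty
    return delta

def closure(delta, states):
    """Add every state reachable through empty transitions."""
    stack, seen = list(states), set(states)
    while stack:
        for nxt in delta.get((stack.pop(), ""), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def accepts(delta, start, word):
    """Simulate the automaton on all paths at once."""
    current = closure(delta, {start})
    for ch in word:
        moved = set()
        for state in current:
            moved |= delta.get((state, ch), set())
        current = closure(delta, moved)
    return ACCEPT in current

nfa = build_nfa([("source", "text", "target"), ("target", "", None)])
print(accepts(nfa, "source", "text"))   # True
print(accepts(nfa, "source", "tex"))    # False
```

Tracking a set of states avoids backtracking, which is one reason the automaton view suits a grammar with overlapping clusters.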
Grammar of Arabic syllable


Nonterm generative rules
syllable :< "Unruly input!" >:
[C][V][C+empty]syllable [C][V][C+empty]
[C][ending]
Cluster definition rules …
[C] :<>: <'> <b> <t> <_t> <^g> <.h> <_h>
<d> <_d> <r> <z> <s> <^s> <.s> <.d> <.t>
<.z> <`> <.g> <f> <q> <k> <l> <m> <n> <h>
<w> <y>
[V] :<>: <a> <i> <u> <A> <I> <U> :<>:
… continuation
<_a> :< "Dagger 'alif occurred." >:
<aa> :< "Use <A> instead!" >:
<iy> :< "Use <I> instead!" >:
<uw> :< "Use <U> instead!" >:
[ending] :< "Invalid ending?" >: <uN>
<iN> <aN> <aNY> <Y> :<>: <aNA> <UA> <aW>
<aWA> :< "Silent 'alif enforced." >:
[empty] :<>: <> # see [C+empty] above
Multi-functionality of the :<>: operator
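To see how the syllable grammar behaves on real input, here is a small backtracking recognizer over the same clusters. It rests on assumptions about the notation that the slides do not spell out: a report is taken to apply to the group of clusters listed before it, and "Unruly input!" is issued when no parse exists. The "Invalid ending?" report is not modelled, the first parse found wins, and all names are hypothetical. It is not the ArabSpell engine.

```python
from functools import lru_cache

OK = ""   # empty report, playing the role of :<>:

C = dict.fromkeys(
    ["'", "b", "t", "_t", "^g", ".h", "_h", "d", "_d", "r", "z", "s", "^s",
     ".s", ".d", ".t", ".z", "`", ".g", "f", "q", "k", "l", "m", "n", "h",
     "w", "y"], OK)
C_OR_EMPTY = {**C, "": OK}
V = {"a": OK, "i": OK, "u": OK, "A": OK, "I": OK, "U": OK,
     "_a": "Dagger 'alif occurred.", "aa": "Use <A> instead!",
     "iy": "Use <I> instead!", "uw": "Use <U> instead!"}
ENDING = {"uN": OK, "iN": OK, "aN": OK, "aNY": OK, "Y": OK,
          "aNA": "Silent 'alif enforced.", "UA": "Silent 'alif enforced.",
          "aW": "Silent 'alif enforced.", "aWA": "Silent 'alif enforced."}

def clusters(options, word, pos):
    """Yield (next_pos, report) for every cluster of `options` found at `pos`."""
    for text, report in options.items():
        if word.startswith(text, pos):
            yield pos + len(text), report

def check(word):
    """Return the reports for `word`; an empty list means nothing to report."""
    @lru_cache(maxsize=None)
    def syllable(pos):
        # Alternative [C][ending], which must finish the word.
        for p1, _ in clusters(C, word, pos):
            for p2, report in clusters(ENDING, word, p1):
                if p2 == len(word):
                    return (report,) if report else ()
        # Alternative [C][V][C+empty], then end of word or another syllable.
        for p1, _ in clusters(C, word, pos):
            for p2, report in clusters(V, word, p1):
                for p3, _ in clusters(C_OR_EMPTY, word, p2):
                    rest = () if p3 == len(word) else syllable(p3)
                    if rest is not None:
                        return ((report,) if report else ()) + rest
        return None   # no parse from this position

    result = syllable(0)
    return ["Unruly input!"] if result is None else list(result)

print(check("kitAbuN"))    # []
print(check("kitaabuN"))   # ["Use <A> instead!"]
```

The overlap between <iy> as a vowel cluster and <y> as a consonant is why the recognizer has to try several alternatives; the automaton described on the earlier slide explores them in parallel instead.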
ArabSpell features


Clusters enable substantial optimization of the network
Spelling :< Perl subroutines >: extend the class of languages beyond regular ones
→ Bracket matching, word repetition
→ Control over long-distance dependencies
→ Easy counting, e.g. word and sentence length
→ Reports in different language versions
Detailed yet flexible grammar for Arabic, models of other formalizable languages
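Bracket matching and word repetition are not regular properties, which is why code attached to rules is needed for them. The sketch below shows the idea in Python rather than Perl: a checker that keeps state (a stack and the previous word) while tokens stream past. The function `check_tokens` and its report wording are hypothetical and only illustrate the principle.

```python
def check_tokens(tokens):
    """Report unmatched brackets and immediately repeated words."""
    closers = {")": "(", "]": "["}
    openers = set(closers.values())
    stack, reports, previous = [], [], None
    for index, token in enumerate(tokens):
        if token in openers:
            stack.append((token, index))          # remember where it opened
        elif token in closers:
            if stack and stack[-1][0] == closers[token]:
                stack.pop()
            else:
                reports.append(f"Unmatched {token} at token {index}")
        elif token == previous:
            reports.append(f"Repeated word {token!r} at token {index}")
        previous = token
    reports.extend(f"Unclosed {token} at token {index}" for token, index in stack)
    return reports

print(check_tokens(["(", "kitAb", "kitAb", ")"]))
```

The stack is what takes the checker beyond a finite automaton: its depth is not bounded in advance.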
Using acolor.sty





Typesetting Arabic script in color with ArabTeX
Text marking, hide-and-check of diacritics
Primers, textbooks, educational purposes
Coloring commands combined with original ArabTeX vocalization control
No modification of the input data themselves
… for any diacritics
\coldia{red}\fullvocalize\accentshigh
\nocolshadda\colother{blue}\vocalize
\nocolall\colhamza{green}\vocalize
… for other marking
\nocolall\colbeginning{blue}\novocalize
\nocolall\colshadda{white}\novocalize
\colisolated{red}\vocalize\accentslow
Acknowledgement

Arabic script displays in this presentation were typeset using the ArabTeX package for TeX and LaTeX by Prof. Dr. Klaus Lagally of the University of Stuttgart. The existence of this system has been the principal inspiration for our work.