External Tools Not Only for
ArabTeX Documents
Karel Mokry
Otakar Smrz
Faculty of Mathematics and Physics
Charles University in Prague
July 7, 2015
Processing of Arabic at FLM
1
… which include
ArabCode – nontrivial conversion of encoding standards of Arabic script
ArabSpell – rule-driven spelling system suited especially for vocalized Arabic encoded in ArabTeX notation
acolor.sty – package for control over coloring in ArabTeX and LaTeX typesetting systems
ArabTeX encoding concept
Lower ASCII, human-readable, rather phonetic
Algorithmic determination of several phenomena of Arabic script
Evaluation of context, parametric interpretation
Contemporary and historical orthography
<iqra' h_a_dA an-na.s.sa bi-intibAhiN>
versus
Aiqora>o h`*aA {ln~aS~a bi{notibaAhK
Ordinary graphemic approach
Unicode / Unicode Transformation Format (UTF) with great descriptive scope
U+0639 / UTF-8 0xD8 0xB9 (Arabic `ayn)
0000 0110 0011 1001 / 1101 1000 1011 1001
U+004C / UTF-8 0x4C (Latin L)
0000 0000 0100 1100 / 0100 1100
Windows CP 1256, ISO 8859-6, ASMO 449 etc.
Buckwalter Transliteration using lower ASCII
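The relation between a code point and its UTF-8 bytes shown above can be checked with a short Python sketch. The function name describe is an illustrative assumption, not part of the presented tools.

```python
def describe(ch):
    """Return the code point and UTF-8 form of one character (illustrative helper)."""
    cp = ord(ch)                    # Unicode code point as an integer
    utf8 = ch.encode("utf-8")       # UTF-8 byte sequence
    return {
        "code_point": f"U+{cp:04X}",
        "code_point_bits": f"{cp:016b}",
        "utf8_bytes": " ".join(f"0x{b:02X}" for b in utf8),
        "utf8_bits": " ".join(f"{b:08b}" for b in utf8),
    }


# Arabic `ayn needs two UTF-8 bytes, Latin L only one.
for ch in ("\u0639", "L"):
    print(describe(ch))
```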
ArabCode solution
Set of subroutines and scripts in Perl
Complex: ArabTeX → UTF / Unicode
Documented: Unicode → UTF
Quite easy: UTF / Unicode → Windows, ISO, ASMO, Buckwalter etc.
Currently: ArabTeX → Windows, and Windows → UTF, ISO, ASMO, Buckwalter
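The "quite easy" conversions are essentially table lookups. The sketch below illustrates this in Python rather than Perl; it is not ArabCode itself. The names transcode and buckwalter_to_unicode are assumptions, and the Buckwalter table is deliberately partial, covering only the symbols needed for one word.

```python
# Partial Buckwalter table (assumption: only the symbols used in the example).
BUCKWALTER = {
    "b": "\u0628", "t": "\u062A", "n": "\u0646", "h": "\u0647",
    "A": "\u0627", "{": "\u0671",
    "a": "\u064E", "i": "\u0650", "o": "\u0652", "K": "\u064D",
}


def buckwalter_to_unicode(text):
    """Map each Buckwalter symbol to its Unicode character; unknown symbols pass through."""
    return "".join(BUCKWALTER.get(ch, ch) for ch in text)


def transcode(data, source, target):
    """Re-encode bytes from one character encoding to another via Unicode."""
    return data.decode(source).encode(target)


word = buckwalter_to_unicode("bi{notibaAhK")
print([f"U+{ord(c):04X}" for c in word])

ayn_utf8 = "\u0639".encode("utf-8")
print(transcode(ayn_utf8, "utf-8", "cp1256"))      # Windows CP 1256
print(transcode(ayn_utf8, "utf-8", "iso8859_6"))   # ISO 8859-6
```

Going through Unicode as the pivot keeps the number of tables linear in the number of encodings.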
ArabCode method
Considering the problem ArabTeX → UTF / Unicode
Present:
Regular expressions – system tool, fast and safe
Rules wired into the code – hard to maintain, inflexible …
Future:
Finite-state transducer – most adequate, though using our own implementation may slow computation down
External grammar – clear and extensible rules
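The contrast between rules wired into the code and rules kept as data can be sketched as follows. This is a simplified Python illustration, not the ArabCode implementation: the table RULES holds only a few ArabTeX consonant codes, and context-dependent phenomena such as hamza carriers are ignored entirely.

```python
import re

# A few ArabTeX consonant codes and their Unicode letters (partial, illustrative).
RULES = {
    "_t": "\u062B", "^g": "\u062C", ".h": "\u062D", "_h": "\u062E",
    "^s": "\u0634", ".s": "\u0635", ".t": "\u0637", "`": "\u0639",
    "b": "\u0628", "t": "\u062A", "k": "\u0643", "l": "\u0644", "m": "\u0645",
}

# Longer codes come first so that "_t" is not read as "_" followed by "t".
PATTERN = re.compile(
    "|".join(re.escape(code) for code in sorted(RULES, key=len, reverse=True))
)


def convert(skeleton):
    """Replace every known code by its letter; other characters stay unchanged."""
    return PATTERN.sub(lambda m: RULES[m.group(0)], skeleton)


print(convert("ktb"))
print(convert("_h.t"))
```

Because the regular expression is generated from the table, extending the rule set means editing data only, which is the maintainability argument behind an external grammar.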
ArabSpell motivation
Spell-checking of entries of a human-edited lexical database
Supervision over misuse of notation, document consistency requirement
Trial-and-error way of teaching it
One version already applied to educational documents and a book of Arabic proverbs
ArabSpell novel concept
Separation of the definition of the language and the response from the spell-checking engine
Right Linear Grammar and convenient syntax
source :<code>: <text>target <text>
Nondeterministic Finite Automaton and its construction from the grammar
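[Figure: automaton fragment constructed from a grammar rule, with the states source and target, transitions over the characters of <text>, and the action :<code>:]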
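The construction of a nondeterministic finite automaton from a right linear grammar can be sketched in Python as below. The rule representation (source, text, target) and the function names are assumptions made for illustration; the :<code>: actions of ArabSpell are left out.

```python
from collections import defaultdict

FINAL = "<final>"  # single accepting state (illustrative convention)


def build_nfa(rules):
    """Build an NFA from rules (source, text, target); target None ends the word."""
    delta = defaultdict(set)   # (state, char) -> set of next states
    eps = defaultdict(set)     # state -> states reachable without input
    fresh = 0
    for source, text, target in rules:
        dest = FINAL if target is None else target
        if text == "":
            eps[source].add(dest)
            continue
        state = source
        for i, ch in enumerate(text):
            if i == len(text) - 1:
                nxt = dest
            else:
                fresh += 1
                nxt = ("q", fresh)     # intermediate state inside <text>
            delta[(state, ch)].add(nxt)
            state = nxt
    return delta, eps


def closure(states, eps):
    """Extend a set of states by everything reachable through empty moves."""
    stack, seen = list(states), set(states)
    while stack:
        for nxt in eps.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def accepts(nfa, start, word):
    """Simulate the NFA on word, tracking all states reached in parallel."""
    delta, eps = nfa
    current = closure({start}, eps)
    for ch in word:
        step = set()
        for state in current:
            step |= delta.get((state, ch), set())
        current = closure(step, eps)
    return FINAL in current


# Toy grammar: S -> "ab" S | "c"
nfa = build_nfa([("S", "ab", "S"), ("S", "c", None)])
print(accepts(nfa, "S", "ababc"), accepts(nfa, "S", "ab"))
```

Each rule contributes a chain of states spelling out its text, so the size of the network grows with the total length of the rule texts.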
Grammar of Arabic syllable
Nonterm generative rules
syllable :< "Unruly input!" >:
[C][V][C+empty]syllable [C][V][C+empty]
[C][ending]
Cluster definition rules …
[C] :<>: <'> <b> <t> <_t> <^g> <.h> <_h> <d> <_d> <r> <z> <s> <^s> <.s> <.d> <.t> <.z> <`> <.g> <f> <q> <k> <l> <m> <n> <h> <w> <y>
[V] :<>: <a> <i> <u> <A> <I> <U> :<>:
… continuation
<_a> :< "Dagger 'alif occurred." >:
<aa> :< "Use <A> instead!" >:
<iy> :< "Use <I> instead!" >:
<uw> :< "Use <U> instead!" >:
[ending] :< "Invalid ending?" >: <uN> <iN> <aN> <aNY> <Y> :<>: <aNA> <UA> <aW> <aWA> :< "Silent 'alif enforced." >:
[empty] :<>: <> # see [C+empty] above
Multi-functionality of the :<>: operator
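The behaviour of the syllable grammar can be approximated by a small backtracking checker in Python. This is a sketch under assumptions, not the ArabSpell engine: it reads a message as belonging to the items listed before it, reports "Unruly input!" when no analysis exists, and does not model the default message of [ending]. The names parse and check are hypothetical.

```python
C = ["'", "b", "t", "_t", "^g", ".h", "_h", "d", "_d", "r", "z", "s", "^s",
     ".s", ".d", ".t", ".z", "`", ".g", "f", "q", "k", "l", "m", "n", "h", "w", "y"]
# Vowels and endings map to a message, or to None when accepted silently.
V = {"a": None, "i": None, "u": None, "A": None, "I": None, "U": None,
     "_a": "Dagger 'alif occurred.", "aa": "Use <A> instead!",
     "iy": "Use <I> instead!", "uw": "Use <U> instead!"}
SILENT = "Silent 'alif enforced."
ENDING = {"uN": None, "iN": None, "aN": None, "aNY": None, "Y": None,
          "aNA": SILENT, "UA": SILENT, "aW": SILENT, "aWA": SILENT}


def starts(tokens, s, i):
    """All tokens that occur in s at position i."""
    return [t for t in tokens if s.startswith(t, i)]


def note(message):
    return [message] if message else []


def parse(s, i=0):
    """Return the messages for s[i:] if it is a syllable sequence, else None."""
    for c in starts(C, s, i):
        j = i + len(c)
        for e in starts(ENDING, s, j):          # [C][ending] closes the word
            if j + len(e) == len(s):
                return note(ENDING[e])
        for v in starts(V, s, j):               # [C][V][C+empty] ...
            k = j + len(v)
            for coda in [""] + starts(C, s, k):
                end = k + len(coda)
                rest = [] if end == len(s) else parse(s, end)
                if rest is not None:
                    return note(V[v]) + rest
    return None


def check(word):
    result = parse(word)
    return ["Unruly input!"] if result is None else result


for word in ("kitAb", "kitaab", "kitAbuN", "ktb"):
    print(word, check(word))
```

Plain vowels are tried before the flagged clusters, so a report such as "Use <A> instead!" appears only when no silent analysis of the word exists.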
ArabSpell features
Clusters enable substantial optimization of the network
Spelling :< Perl subroutines >: extend the class of languages beyond regular ones
Bracket matching, word repetition
Control over long-distance dependencies
Easy counting, e.g. word and sentence length
Reports in different language versions
Detailed yet flexible grammar for Arabic, models of other formalizable languages
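Bracket matching and word repetition are checks that a finite automaton alone cannot express, which is why attached subroutines are needed. The sketch below shows the kind of bookkeeping such a subroutine might do; it is written in Python as a stand-in for the Perl subroutines, and all names and message texts are assumptions.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def check_brackets(text):
    """Report unmatched brackets as (position, message), using a stack."""
    stack, reports = [], []
    for pos, ch in enumerate(text):
        if ch in PAIRS:
            stack.append((ch, pos))
        elif ch in CLOSERS:
            if stack and PAIRS[stack[-1][0]] == ch:
                stack.pop()
            else:
                reports.append((pos, f"Unexpected {ch!r}"))
    reports.extend((pos, f"Unclosed {ch!r}") for ch, pos in stack)
    return reports


def repeated_words(text):
    """Report (index, word) wherever a word immediately repeats the previous one."""
    words = text.split()
    return [(i, w)
            for i, (prev, w) in enumerate(zip(words, words[1:]), start=1)
            if prev == w]


print(check_brackets("(a [b] c)"))
print(check_brackets("(a ]"))
print(repeated_words("the the cat"))
```

The stack in check_brackets is the unbounded memory that takes the check beyond regular languages; counting word or sentence length would need only a single counter in the same place.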
Using acolor.sty
Typesetting Arabic script in color with ArabTeX
Text marking, hide-and-check of diacritics
Primers, textbooks, educational purposes
Coloring commands combined with original ArabTeX vocalization control
No modification of the input data themselves
… for any diacritics
\coldia{red}\fullvocalize\accentshigh
\nocolshadda\colother{blue}\vocalize
\nocolall\colhamza{green}\vocalize
… for other marking
\nocolall\colbeginning{blue}\novocalize
\nocolall\colshadda{white}\novocalize
\colisolated{red}\vocalize\accentslow
Acknowledgement
Arabic script displays in this presentation were typeset using the ArabTeX package for TeX and LaTeX by Prof. Dr. Klaus Lagally of the University of Stuttgart. The existence of this system was the principal inspiration for our work.