Transcript Slide 1

From PICLE search to IFAConc
Corpora in Poland PLM Session
Poznań 8 Sept. 2012
Przemysław Kaszubski
Faculty of English
Outline
• Research goal(s) and questions
• Disclaimers and challenges
• The story 2006-2012:
– evolving solutions (incl. demos)
• PICLE search and Perl Concordancer
• bigram and trigram tools
• Error concordancer
• IFA Student concordancer
• IFAConc
• Some conclusions and plans
Overall research goal(s)
• Pedagogic “action-like” research:
– probing the potential of corpus-based e-learning for (my) EAP
writing instruction
– Explore options for “seamless” integration with coursework
• Some questions:
– Can students be (successful ?) (corpus) explorers of language
(for their own sake) ?
– (meta-)linguistic background knowledge (or remnants of it):
facilitator or inhibitor for data-driven learning (DDL)?
– can (controlled ?) corpus-exploration facilitate constructivist
(practical) knowledge making (= knowing how to write (this)
better / best ?)
– bottom-up exploration or / and top-down instruction?
Some challenges and disclaimers
• small corpora (100 – 300 k words)
• non-indexed text search
• self-made online tools
• questions of time:
– speed of search
– time of satisfactory data analysis (any tool!)
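A rough illustration of what “non-indexed text search” implies: every query scans the raw text from start to finish, which is workable for small corpora but ties search speed to corpus size. The sketch below is a minimal keyword-in-context (KWIC) scan in Python; the function name `kwic`, the `width` parameter and the sample sentences are hypothetical and are not taken from the tools described in these slides.

```python
import re

def kwic(text, pattern, width=30):
    """Scan raw text linearly (no index) and return (left, node, right) tuples."""
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines

# Hypothetical two-sentence "corpus", for illustration only.
sample = ("It gives us the possibility of changing the plan. "
          "Everyone had the same possibility of being included.")

hits = kwic(sample, r"\bpossibility\b", width=20)
for left, node, right in hits:
    print(f"{left:>20} [{node}] {right}")
```

Each call re-reads the whole text, so search time grows with corpus size; an indexed tool would trade that for an up-front indexing step.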
Why like this?
• “experimental” assumption:
– if tools with these limitations can work with learners, then ...
• fun of creation
• flexibility and freedom of development
• availability of man-power:
– student programmers
– seminar students as corpora collectors
– student writing groups as testers / testees
• EAP / EGAP / ESAP context
– special(ist) corpora
The start: Briefly about PICLE
• Polish sub-corpus of the International Corpus of
Learner English (ICLE)
• 330,000 words of running text (over 500 essays)
• Major part (c. 230,000 words) published on ICLE
CD ROM in 2002 (2006, 2nd ed), together with
comparable English learner corpora collected in
other EFL countries.
• A 50,000-word sampler has been error-tagged
• Can be (re-)searched online, unlike most other
learner corpora
Some (lexical) research insights from
PICLE (1)
Misuse
• 'HAVE/GIVE (sb) possibility to <do sth>' = *'MIEĆ/DAĆ (komuś)
możliwość / sposobność <zrobienia czegoś>'
– ... the adoptive parents have influenced their child, without giving him
any choice or *possibility to "try out" other options.
– For this reason we should reread a story because it gives us *possibility
to look at the literary work from a perspective.
• BNC (chance, likelihood of):
– [...] ... led him to the perception that man has the possibility of
changing his state of consciousness.
– The sample was so arranged as to be fully representative over the
country as a whole, and everyone had the same possibility of being
included.
Some (lexical) research insights from
PICLE (2)
Overuse
• High frequency vocabulary
• adverbs of stance (boosters): definitely, certainly, undoubtedly, for sure
• “favourite” phraseology:
• BE full of <sth>:
– Our television is full of programmes unsuitable for young viewers, ...
• that/this BE why:
– Since imagination belongs to one of the most important of our features we
cannot deprive ourselves of it. That is why, many of us are (...).
• TAKE care (of <sb/sth>):
– ... duties on the side of a woman, who is now expected not only to take care of
her house and family but also to find time for professional work.
Some (lexical) research insights from
PICLE (3)
Underuse / avoidance
• E.g.: collocation breadth: attributive adjectives before attitude
• To be sure:
Exclusive NS use:
The motivations for both sexes, to be sure, are different.
There is plenty of violence, to be sure, but it is a nice violence and
no one gets killed.
Beginnings
• How (better ?) to share (and discover ?) such
learner-corpus insights with learners?
– items for (passive) study (usage alerts etc.)
– items for study AND the corpus method ..?
Let’s try !
• potential usefulness of DDL assumed
PICLE search / Perl Concordancer
• From one corpus to a range of (comparable)
corpora
• Tool hub:
– http://ifa.amu.edu.pl/~kprzemek/PICLE_search.php
• “Perl Concordancer(s)”:
– http://ifa.amu.edu.pl/~kprzemek/concord2advr/search_adv_new.html
Bigram and trigram tools
• towards more “search-worthy” items
• bigrams:
– http://ifa.amu.edu.pl/~kprzemek/concord3/bigram.html
• trigrams counter:
– http://ifa.amu.edu.pl/~kprzemek/concord3/trigram.php
• problems:
– “geek” tools
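As a rough idea of what a bigram or trigram counter does, the Python sketch below tokenises a text, slides a window of n words across it and tallies the resulting sequences. It is an assumption-laden illustration: the tokeniser, the function name `ngram_counts` and the sample sentence are hypothetical, and nothing here reproduces the tools linked above.

```python
from collections import Counter
import re

def ngram_counts(text, n):
    """Count word n-grams using a simple lowercase letter/apostrophe tokeniser."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # zip over n staggered copies of the token list yields every n-word window
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)

# Hypothetical sample text, for illustration only.
sample = "that is why we take care of it and that is why we stay"

bigrams = ngram_counts(sample, 2)
trigrams = ngram_counts(sample, 3)
print(bigrams.most_common(3))
print(trigrams.most_common(2))
```

Frequent sequences such as "that is why" surface at the top of the list, which is one way of arriving at more “search-worthy” items for a concordance query.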
Error concordancing
• List-driven:
– http://ifa.amu.edu.pl/~kprzemek/concord2adv/errors/errors.htm
• “direct error concordancer”:
– http://ifa.amu.edu.pl/~kprzemek/concord2advr/error-builder.php
• problems:
– direct interpretability?
– away from the “error” corpus evidence towards
exposure to, and noticing, NS usage...
IFA Student Concordancer
• IFAConc’s predecessor:
– http://ifa.amu.edu.pl/~kprzemek/concord2login/index.html
• Problems:
– interface issues
– search syntax
– getting students to do it, e.g.:
• need for integration of prompted and spontaneous work
• integration with other (online) (writing) course tasks
IFAConc inspirations
• Tim Johns’ ‘Kibbitzer’
• Tom Cobb’s URL-driven concordance feedback
• Aston’s corpora for ESP + Hyland’s emphasis on ESAP
• Linguistic theory:
– Hoey’s lexical priming
– also: Sinclair’s ‘extended unit of meaning’, Stubbs’ ‘phrasal schemas’,
Goldberg’s CCxG, Halliday’s metafunctions
• SLA and CALL theory: ‘default path’, DDL, constructionism and Web
2.0
• current DDL: Gavioli’s ‘samples’ vs ‘examples’, Widdowson’s
authentication
• Coniam's "concordancing oneself"
• online Cobuild Sampler (friendly search syntax)
• web search engines
IFAConc conceptions
• friendly (enough), but demanding (some) deep-level
processing (noticing, interpretation, adoption/rejection):
– patterns of use / meaning
– variation
• bottom-up and top-down access
• recommended theoretical platform (cf. ‘default path’ in
CALL)
• relevance
– authentication
– personalisation
• collaborative as well as individual
IFAConc success target: A good
human concordancer
• initiates searches
• adjusts searches
• interprets searches
– specific linguistic insights
– awareness
• applies interpretation (authenticates)
– on a task (e.g. revision, vocab learning activities)
– personal record (annotation)
– discusses / shares findings – co-annotation, discussion
• personalizes the tool
– annotations
– personal corpora
IFAConc initial technologies (1)
• each search is a web link
– concordances are interactive, not static
• each search is recorded (user-logging)
– user interface
– teacher-admin interface
• each search can be annotated
– possible interaction with admin/teacher
• contrasting corpora along a cline of specialisation
– EGP -> EGAP -> ESAP
– EFL varieties
– personal(ised) corpora
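One way to picture “each search is a web link” is to encode the whole query (search string, selected corpora, sample size) in a URL, so that a link pasted into feedback or an annotation reproduces the same search. The Python sketch below assumes a placeholder address and hypothetical parameter names (`q`, `corpora`, `sample`); it does not describe IFAConc's real URL scheme.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder address, NOT the IFAConc URL; parameter names are hypothetical.
BASE = "http://example.org/concordancer"

def search_link(query, corpora, sample=20):
    """Encode a search as a shareable URL."""
    params = {"q": query, "corpora": ",".join(corpora), "sample": sample}
    return f"{BASE}?{urlencode(params)}"

def parse_link(link):
    """Recover the search settings from such a URL."""
    qs = parse_qs(urlsplit(link).query)
    return {
        "q": qs["q"][0],
        "corpora": qs["corpora"][0].split(","),
        "sample": int(qs["sample"][0]),
    }

link = search_link("take care of", ["EGAP", "ESAP"])
print(link)
print(parse_link(link))
```

Because the link carries the full search state, the same string can be logged per user, attached to an annotation, or embedded in teacher feedback.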
IFAConc initial technologies (2)
• corpora easily switchable on and off
• random sampling (e.g. 20 lines – Sinclair, after
Hunston 2002)
• wider context view (cf. ‘shunting’, Halliday)
• Not only corpus search interface:
– History – past work, possibly annotated
– Resources - repository with recommended useful content
and / or tasks
• enhancing web integration and publicity: RSS, blog,
Moodle, traditional CALL
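The random sampling mentioned above (e.g. 20 lines) can be sketched as drawing a fixed number of concordance lines without replacement and keeping them in corpus order. This is a minimal Python illustration; the function name `sample_lines`, the `seed` parameter and the dummy lines are assumptions, not details of IFAConc itself.

```python
import random

def sample_lines(lines, k=20, seed=None):
    """Return up to k concordance lines chosen at random, in original order."""
    if len(lines) <= k:
        return list(lines)
    rng = random.Random(seed)  # a seed makes the sample reproducible
    picked = sorted(rng.sample(range(len(lines)), k))
    return [lines[i] for i in picked]

# Hypothetical concordance output: 100 dummy lines.
all_lines = [f"line {i}" for i in range(100)]
subset = sample_lines(all_lines, k=20, seed=1)
print(len(subset))
```

A small sample keeps the reading load manageable for learners, and re-sampling gives a different set of 20 lines to check whether a pattern holds.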
IFAConc – brief phase 1 to phase 2
change log
• Tests (tasks, monitoring, questionnaires) performed up to 2010 showed:
– Anybody can conduct a reasonably successful analysis (cf. > 200 extramurals)
– Returning users (gradually) search and interpret better
– procedure too teacher-intensive
– need to increase breadth of searches / analyses
• 2010 Improvements in UX (user experience), e.g.:
– system more interactive
– clearer highlighting of error-prone learner data
– optimization of training and teacher-student interactions => towards an e-learning
environment
– Encouraging boost in annotation quality after the changes
• Latest enhancements (after 2011):
– context reading mode
– more sharing options (History entries / corpora)
• CAVEAT: General dev problem:
– new features vs. (pedagogical / research) focus
IFAConc – some HIGHLIGHTS ...
IFAConc highlights:
Corpora Search
IFAConc highlights:
Feedback link in student text
(by comparison)
IFAConc highlights:
Shared entry task prompt
IFAConc highlights:
Resources training page
IFAConc highlights:
User monitoring (1)
IFAConc highlights:
User monitoring (2) – S-T
collaborative annotation
IFAConc highlights:
User monitoring (3) – email
notification
H-98307 'devoted' - annotation update by 'jagodawasik'
IFAConc highlights:
Output: potential lexical primings:
literary criticism vs. linguistics
IFAConc highlights:
Output: Likely overuse / underuse
cases
IFAConc – demo of today’s look
• http://ifa.amu.edu.pl/~ifaconc
• student interface
• teacher / admin interface
• Pardon the imperfections (server changes)
Some IFAConc lessons learned
• The system CAN work and could be self-sustainable, but:
• Enforced mode => free-use
– research goal: fine-tuning of automatic student-tool interaction
– problematic peer-feedback-based constructivism (“collectivist”
culture? But: Is students’ web experience not changing that?)
• Prolonging the “novelty effect”
– at least within one assumed user (learner) cycle
• Steady user experience enhancements:
– increasing the ease of use
• improvement of ways-in without sacrificing learner effort (interpretation,
authentication)
• facilitation of annotations (integration of Corpora Search and History)
– enhancing interpretation options
• new corpora, new / improved training tasks etc.
My relevant earlier presentations
• PALC (Lodz) 2007
• TALC (Lisbon) 2008
• PALC (Lodz) 2009
• CL (Liverpool) 2009
• TALC (Brno) 2010
• ALL (Tuebingen) 2011