From PICLE search to IFAConc
Corpora in Poland PLM Session
Poznań, 8 Sept. 2012
Przemysław Kaszubski
Faculty of English

Outline
• Research goal(s) and questions
• Disclaimers and challenges
• The story 2006-2012:
  – evolving solutions (incl. demos)
    • PICLE search and Perl Concordancer
    • bigram and trigram tools
    • Error concordancer
    • IFA Student concordancer
    • IFAConc
• Some conclusions and plans

Overall research goal(s)
• Pedagogic “action-like” research:
  – probing the potential of corpus-based e-learning for (my) EAP writing instruction
  – exploring options for “seamless” integration with coursework
• Some questions:
  – Can students be (successful?) (corpus) explorers of language (for their own sake)?
  – (meta-)linguistic background knowledge (or remnants of it): facilitator or inhibitor for data-driven learning (DDL)?
  – Can (controlled?) corpus exploration facilitate constructivist (practical) knowledge making (= knowing how to write (this) better / best)?
  – bottom-up exploration or / and top-down instruction?

Some challenges and disclaimers
• small corpora (100–300 k words)
• non-indexed text search
• self-made online tools
• questions of time:
  – speed of search
  – time of satisfactory data analysis (any tool!)

Why like this?
• “experimental” assumption:
  – if tools with these limitations can work with learners, then ...
• fun of creation
• flexibility and freedom of development
• availability of man-power:
  – student programmers
  – seminar students as corpora collectors
  – student writing groups as testers / testees
• EAP / EGAP / ESAP context
  – special(ist) corpora

The start: Briefly about PICLE
• Polish sub-corpus of the International Corpus of Learner English (ICLE)
• 330,000 words of running text (over 500 essays)
• Major part (c. 230,000 words) published on ICLE CD ROM in 2002 (2006, 2nd ed), together with comparable English learner corpora collected in other EFL countries.
• A 50-thousand-word sampler has been error-tagged
• Can be (re-)searched online, unlike most other learner corpora

Some (lexical) research insights from PICLE (1): Misuse
• 'HAVE/GIVE (sb) possibility to <do sth>' = *'MIEĆ/DAĆ (komuś) możliwość / sposobność <zrobienia czegoś>' [Polish: 'have/give (sb) the possibility / opportunity of <doing sth>']
  – ... the adoptive parents have influenced their child, without giving him any choice or *possibility to "try out" other options.
  – For this reason we should reread a story because it gives us *possibility to look at the literary work from a perspective.
• BNC (chance, likelihood of):
  – [...] ... led him to the perception that man has the possibility of changing his state of consciousness.
  – The sample was so arranged as to be fully representative over the country as a whole, and everyone had the same possibility of being included.

Some (lexical) research insights from PICLE (2): Overuse
• High frequency vocabulary
• adverbs of stance (boosters): definitely, certainly, undoubtedly, for sure
• “favourite” phraseology:
  • BE full of <sth>:
    – Our television is full of programmes unsuitable for young viewers, ...
  • that/this BE why:
    – Since imagination belongs to one of the most important of our features we cannot deprive ourselves of it. That is why, many of us are (...).
  • TAKE care (of <sb/sth>):
    – ... duties on the side of a woman, who is now expected not only to take care of her house and family but also to find time for professional work.

Some (lexical) research insights from PICLE (3): Underuse / avoidance
• E.g.: collocation breadth: attributive adjectives before attitude
• To be sure: Exclusive NS use:
  – The motivations for both sexes, to be sure, are different.
  – There is plenty of violence, to be sure, but it is a nice violence and no one gets killed.

Beginnings
• How (better?) to share (and discover?) such learner-corpus insights with learners?
  – items for (passive) study (usage alerts etc.)
  – items for study AND the corpus method ..? Let’s try!
• potential usefulness of DDL assumed

PICLE search / Perl Concordancer
• From one corpus to a range of (comparable) corpora
• Tool hub:
  – http://ifa.amu.edu.pl/~kprzemek/PICLE_search.php
• “Perl Concordancer(s)”:
  – http://ifa.amu.edu.pl/~kprzemek/concord2advr/search_adv_new.html

Bigram and trigram tools
• towards more “search-worthy” items
• bigrams:
  – http://ifa.amu.edu.pl/~kprzemek/concord3/bigram.html
• trigrams counter:
  – http://ifa.amu.edu.pl/~kprzemek/concord3/trigram.php
• problems:
  – “geek” tools

Error concordancing
• List-driven:
  – http://ifa.amu.edu.pl/~kprzemek/concord2adv/errors/errors.htm
• “direct error concordancer”:
  – http://ifa.amu.edu.pl/~kprzemek/concord2advr/error-builder.php
• problems:
  – direct interpretability?
  – away from the “error” corpus evidence towards exposure to, and noticing, NS usage...

IFA Student Concordancer
• IFAConc’s predecessor:
  – http://ifa.amu.edu.pl/~kprzemek/concord2login/index.html
• Problems:
  – interface issues
  – search syntax
  – getting students to do it, e.g.:
    • need for integration of prompted and spontaneous work
    • integration with other (online) (writing) course tasks

IFAConc inspirations
• Tim Johns’ ‘Kibbitzer’
• Tom Cobb’s URL-driven concordance feedback
• Aston’s corpora for ESP + Hyland’s emphasis on ESAP
• Linguistic theory:
  – Hoey’s lexical priming
  – also: Sinclair’s ‘extended unit of meaning’, Stubbs’ ‘phrasal schemas’, Goldberg’s CCxG, Halliday’s metafunctions
• SLA and CALL theory: ‘default path’, DDL, constructionism and Web 2.0
• current DDL: Gavioli’s ‘samples’ vs ‘examples’, Widdowson’s authentication
• Coniam’s “concordancing oneself”
• online Cobuild Sampler (friendly search syntax)
• web search engines

IFAConc conceptions
• friendly (enough), but demanding (some) deep-level processing (noticing, interpretation, adoption/rejection):
  – patterns of use / meaning
  – variation
• bottom-up and top-down access
• recommended theoretical platform (cf. ‘default path’ in CALL)
• relevance – authentication – personalisation
• collaborative as well as individual

IFAConc success target: A good human concordancer
• initiates searches
• adjusts searches
• interprets searches
  – specific linguistic insights
  – awareness
• applies interpretation (authenticates)
  – on a task (e.g. revision, vocab learning activities)
  – personal record (annotation)
• discusses / shares findings
  – co-annotation, discussion
• personalizes the tool
  – annotations
  – personal corpora

IFAConc initial technologies (1)
• each search is a web link
  – concordances are interactive, not static
• each search is recorded (user-logging)
  – user interface
  – teacher-admin interface
• each search can be annotated
  – possible interaction with admin/teacher
• contrasting corpora along a cline of specialisation
  – EGP -> EGAP -> ESAP
  – EFL varieties
  – personal(ised) corpora

IFAConc initial technologies (2)
• corpora easily switchable on and off
• random sampling (e.g. 20 lines – Sinclair, after Hunston 2002)
• wider context view (cf. ‘shunting’, Halliday)
• Not only corpus search interface:
  – History – past work, possibly annotated
  – Resources – repository with recommended useful content and / or tasks
• enhancing web integration and publicity: RSS, blog, Moodle, traditional CALL

IFAConc – brief phase 1 to phase 2 change log
• Tests (tasks, monitoring, questionnaires) performed up to 2010 showed:
  – Anybody can conduct a reasonably successful analysis (cf. > 200 extramurals)
  – Returning users (gradually) search and interpret better
  – procedure too teacher-intensive
  – need to increase breadth of searches / analyses
• 2010 Improvements in UX (user experience), e.g.:
  – system more interactive
  – clearer highlighting of error-prone learner data
  – optimization of training and teacher-student interactions => towards an e-learning environment
  – Encouraging boost in annotation quality after the changes
• Latest enhancements (after 2011):
  – context reading mode
  – more sharing options (History entries / corpora)
• CAVEAT: General dev problem:
  – new features vs. (pedagogical / research) focus

IFAConc – some HIGHLIGHTS ...

IFAConc highlights: Corpora Search

IFAConc highlights: Feedback link in student text (by comparison)

IFAConc highlights: Shared entry task prompt

IFAConc highlights: Resources training page

IFAConc highlights: User monitoring (1)

IFAConc highlights: User monitoring (2) – S-T collaborative annotation

IFAConc highlights: User monitoring (3) – email notification
  H-98307 'devoted' - annotation update by 'jagodawasik'

IFAConc highlights: Output: potential lexical primings: literary criticism vs. linguistics

IFAConc highlights: Output: Likely overuse / underuse cases

IFAConc – demo of today’s look
• http://ifa.amu.edu.pl/~ifaconc
• student interface
• teacher / admin interface
• Pardon the imperfections (server changes)

Some IFAConc lessons learned
• The system CAN work and could be self-sustainable, but:
• Enforced mode => free-use
  – research goal: fine-tuning of automatic student-tool interaction
  – problematic peer-feedback-based constructivism (“collectivist” culture? But: Is students’ web experience not changing that?)
• Prolonging the “novelty effect”
  – at least within one assumed user (learner) cycle
• Steady user experience enhancements:
  – increasing the ease of use
    • improvement of ways-in without sacrificing learner effort (interpretation, authentication)
    • facilitation of annotations (integration of Corpora Search and History)
  – enhancing interpretation options
    • new corpora, new / improved training tasks etc.

My relevant earlier presentations
• PALC (Lodz) 2007
• TALC (Lisbon) 2008
• PALC (Lodz) 2009
• CL (Liverpool) 2009
• TALC (Brno) 2010
• ALL (Tuebingen) 2011
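
Illustration: non-indexed search with random sampling
To make two points from the slides concrete ("non-indexed text search" on small corpora, and "random sampling (e.g. 20 lines)"), here is a minimal Python sketch. It is not the code behind PICLE search or IFAConc, which the slides do not show. The function names `kwic` and `sample_lines`, the context width, and the toy corpus (abridged from the example sentences quoted earlier) are all assumptions made for illustration.

```python
import random
import re


def kwic(text, pattern, width=30):
    """Return keyword-in-context lines for every regex match in text.

    Hypothetical helper: the text is scanned linearly on every call
    (no index is built), mirroring a non-indexed search.
    """
    flat = " ".join(text.split())  # collapse line breaks and extra spaces
    lines = []
    for m in re.finditer(pattern, flat, flags=re.IGNORECASE):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines


def sample_lines(lines, n=20, seed=None):
    """Return at most n concordance lines picked at random, in corpus order."""
    if len(lines) <= n:
        return list(lines)
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(len(lines)), n))
    return [lines[i] for i in chosen]


# Toy corpus (assumption): abridged versions of sentences quoted in the slides.
corpus = (
    "It gives us possibility to look at the literary work from a perspective. "
    "Man has the possibility of changing his state of consciousness. "
    "Everyone had the same possibility of being included."
)

hits = kwic(corpus, r"\bpossibility (to|of)\b")
for line in sample_lines(hits, n=20, seed=1):
    print(line)
```

With a corpus of 100–300 k words, a linear scan like this stays simple to build and maintain, at the cost of search speed; sampling a fixed number of lines keeps the output small enough for a learner to read and interpret.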