CLARIN: the idea


CLARIN-EU: Where do we stand?

Steven Krauwer Utrecht institute of Linguistics UiL OTS CLARIN-EU Coordinator

Steven Krauwer CLARIN-NL Launch 27-05-2009 1

Overview

• General
  – The idea
  – Characteristics
  – Overall plan
  – Division of labour
  – Why should NL participate
  – Priority activities
• Where we stand
  – Financial
  – Technical
  – Users
  – Language resources and tools
  – Dissemination
  – IPR and legal issues
  – Governance and funding
• Concluding remarks

CLARIN: the idea

Common Language Resources and Technology Infrastructure

The basic idea is
– to create a European federation of existing digital repositories that include language-based data (e.g. text, audio, video, gesture, multimodal)
– with uniform access to the data wherever it is
– with easy access to existing language and speech technology tools as web services, to retrieve, manipulate, enhance, explore and exploit the data
– with as its primary target audience researchers in social sciences and humanities (i.e. in general people without technological background)

CLARIN: some characteristics

• Highly distributed, very much bottom-up
• CLARIN should eventually cover all EU and associated countries
• CLARIN should eventually be jointly funded by all participating countries (plus some EC contribution)
• All languages (spoken or studied in participating countries) equally important
• At this moment 33 consortium partners in 23 countries, more countries to join
• Participation (unpaid for non-partners) in Preparatory Phase by >150 member organisations in 32 countries

Overall plan for CLARIN-EU

Preparatory phase (2008-2010):
• Put everything in place before we start (design, governance, funding, …) (and build a prototype)
Construction phase (2011-2015):
• Build the infrastructure and populate it with tools and resources (and start using it)
Exploitation phase (2016 …):
• CLARIN in full service

Division of labour

In the preparatory phase:
– CLARIN-EU takes care of overall design
– CLARIN-National takes care of interests of own language(s) and research communities
– But we are not supposed to build the infrastructure yet!

In the construction phase:
– Each country builds its own part of the generic CLARIN infrastructure (and pays for it)
– Each country populates CLARIN with resources and services (and pays for it)
– And we hope for some funding from the EC to contribute to the general costs

Why should NL participate at all

• access for HSS scholars to accumulated knowledge and expertise and to digital resources and tools all over Europe
• the achievements of the NL research community, and Dutch language and culture will be and remain visible and accessible all over Europe
• if we don’t take care of the interests of “our” languages and our research communities in CLARIN no one else will do it for us

Priority activities for now (1)

• Start building the national component of the generic federation and service infrastructure, connected to the European CLARIN infrastructure (identifying data and service centres, centres of expertise, and connecting them)
• Start populating the infrastructure with existing resources and tools by conversion and encapsulation so that they comply with CLARIN supported standards for representation and interoperability

Priority activities for now (2)

• Start planning and (if possible) building new resources and tools according to own priorities following from existing and future research programmes for the social sciences and humanities
• Start reaching out to the national humanities and social sciences research communities in order to
  – identify their needs and requirements
  – familiarize them with the benefits and innovative potential of the use of human language technologies in their work
  – encourage them to use CLARIN to cross national and discipline borderlines

Where we stand

• We are now almost halfway through the preparatory phase (19 months to go)
• Just a quick look at some aspects:
  – Financial
  – Technical infrastructure
  – User requirements
  – Language resources and tools
  – Dissemination
  – IPR and legal issues
  – Funding and governance

Financial (1)

Budget prep phase:
• 4.1 M€ from EC (“CLARIN-EU”)
• at this moment in addition ca. 8 M€ from countries (“CLARIN-national”)
• and some 7.3 M€ for parallel activities

Overall budget estimate until 2020: ca. 146 M€
• to be paid by participating countries
• looks like a lot of money
• but (compare with other research infrastructures!):
  – < 12 M€ per year
  – < 0.5 M€ per country per year
  – < ?? M€ per language per year
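
The per-year and per-country bounds can be checked with a rough calculation. The sketch below is illustrative only and rests on two assumptions not stated on the slide: that the 146 M€ estimate covers 2008 through 2020 inclusive (13 years), and that it is shared by the 23 countries currently in the consortium. Under those assumptions it gives roughly 11.2 M€ per year and roughly 0.49 M€ per country per year, consistent with the bounds quoted.

```python
# Back-of-the-envelope check of the budget bounds.
# ASSUMPTIONS (not stated in the slides): the estimate spans 2008-2020
# inclusive, and costs are spread evenly over 23 consortium countries.
TOTAL_BUDGET_MEUR = 146.0
FIRST_YEAR, LAST_YEAR = 2008, 2020
COUNTRIES = 23

years = LAST_YEAR - FIRST_YEAR + 1            # number of budget years
per_year = TOTAL_BUDGET_MEUR / years          # M€ per year
per_country_per_year = per_year / COUNTRIES   # M€ per country per year

print(f"{years} years")
print(f"{per_year:.1f} M€ per year")
print(f"{per_country_per_year:.2f} M€ per country per year")
```

The per-language figure is left open here, as it is in the slide, because the number of languages covered is not given.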

Financial (2)

National contributions:
• NL only country with commitments for preparatory phase, construction phase and start of exploitation phase (9 M€)
• 16 countries have made commitments to CLARIN preparatory phase (26 K€ - 3.1 M€)
• 19 countries have launched parallel projects that might feed into CLARIN (11 K€ - 2.5 M€)
• No supported activity at all in 3 countries (out of 23)

Technical infrastructure

Done:
– Definition of centre types
– Requirements for the federation
– Persistent identifiers
– Registry requirements
– Web services and workflow requirements
To do:
– Take design decisions
– Select initial centres
– Prototype implementation
– Cost estimates for next phase

User requirements

Done:
• Survey (ongoing)
• Overview of humanities projects (ongoing)
• Call for humanities projects
To do:
• Execution of humanities projects
• Continuation of surveys and impact studies
• Strategic plan for supporting SSH research

Language resources and tools (1)

Done:
• Survey and taxonomy
• Interoperability requirements
• Analysis of current LRT coverage
• Criteria and priorities for integration
• Usage scenarios and interoperability case studies

Language resources and tools (2)

To do:
• Continuation of survey & taxonomy
• Quality criteria
• BLARK description for the humanities and analysis of what exists per language
• LRT workflows
• Interoperability standards (not one, not new, but based on a set of supported existing standards)
• Validation of standards and prototype
• Integration of existing LRT
• Plan to fill LRT gaps

Dissemination

Done:
– First version of website: www.clarin.eu
– Issues 1-5 of the newsletter published: www.clarin.eu/newsletter
– Design of helpdesk and registry prototype
To do:
– Enhancing the website
– Continuation of quarterly newsletters
– Plan for helpdesk and advice infrastructure for next phases

IPR and legal issues

Done (and still ongoing):
– Collecting info on existing licenses (models, problems)
– Collecting info on existing legislation
– Model for authorization and authentication
To do:
– Small set of licensing templates
– Collaboration with other, related services
– Federation agreements

Governance and funding

Done:
• Requirements and best practice for transnational coordination and governance
To do:
• Provide cost estimates to funding agencies
• Agree (with funding agencies) on governance model and implement it
• Agree (with funding agencies) on sustainable funding for the next phases
• Design and implement operational structure

Concluding remarks

• First 17 months mostly spent on collecting information and requirements and creating a common ‘conceptual framework’; now time for convergence and firm decisions
• Less than 19 months to achieve this, because
  – governments take far more time for funding decisions than anticipated
  – national activities are already ongoing, so standards for representation and interoperability are needed NOW
• Target: all the important decisions in place by the end of 2009 / early 2010 to ensure continuity after 2010!
• CLARIN-NL can play a strong role because we are the only ones that are guaranteed to exist as a funded activity on Jan 1st 2011!