CLARIN-EU: Where do we stand?
Steven Krauwer, Utrecht Institute of Linguistics UiL OTS, CLARIN-EU Coordinator
Steven Krauwer, CLARIN-NL Launch, 27-05-2009
Overview
• General
  – The idea
  – Characteristics
  – Overall plan
  – Division of labour
  – Why should NL participate
  – Priority activities
• Where we stand
  – Financial
  – Technical
  – Users
  – Language resources and tools
  – Dissemination
  – IPR and legal issues
  – Governance and funding
• Concluding remarks
CLARIN: the idea
• Common Language Resources and Technology Infrastructure: the basic idea is
  – to create a European federation of existing digital repositories that include language-based data (e.g. text, audio, video, gesture, multimodal)
  – with uniform access to the data, wherever it is
  – with easy access to existing language and speech technology tools as web services, to retrieve, manipulate, enhance, explore and exploit the data
  – with, as its primary target audience, researchers in the social sciences and humanities (i.e. in general, people without a technological background)
CLARIN: some characteristics
• Highly distributed, very much bottom-up
• CLARIN should eventually cover all EU and associated countries
• CLARIN should eventually be jointly funded by all participating countries (plus some EC contribution)
• All languages (spoken or studied in participating countries) are equally important
• At present 33 consortium partners in 23 countries, with more countries to join
• Participation (unpaid for non-partners) in the Preparatory Phase by >150 member organisations in 32 countries
Overall plan for CLARIN-EU
Preparatory phase (2008-2010):
• Put everything in place before we start (design, governance, funding, …) (and build a prototype)
Construction phase (2011-2015):
• Build the infrastructure and populate it with tools and resources (and start using it)
Exploitation phase (2016 onwards):
• CLARIN in full service
Division of labour
In the preparatory phase:
  – CLARIN-EU takes care of the overall design
  – CLARIN-National takes care of the interests of its own language(s) and research communities
  – But we are not supposed to build the infrastructure yet!
In the construction phase:
  – Each country builds its own part of the generic CLARIN infrastructure (and pays for it)
  – Each country populates CLARIN with resources and services (and pays for it)
  – And we hope for some EC funding to contribute to the general costs
Why should NL participate at all?
• Access for HSS scholars to accumulated knowledge and expertise, and to digital resources and tools, all over Europe
• The achievements of the NL research community, and Dutch language and culture, will be and remain visible and accessible all over Europe
• If we don’t take care of the interests of “our” languages and our research communities in CLARIN, no one else will do it for us
Priority activities for now (1)
• Start building the national component of the generic federation and service infrastructure, connected to the European CLARIN infrastructure (identifying data and service centres and centres of expertise, and connecting them)
• Start populating the infrastructure with existing resources and tools, by conversion and encapsulation, so that they comply with the CLARIN-supported standards for representation and interoperability
Priority activities for now (2)
• Start planning and (if possible) building new resources and tools according to our own priorities, following from existing and future research programmes for the social sciences and humanities
• Start reaching out to the national humanities and social sciences research communities in order to
  – identify their needs and requirements
  – familiarize them with the benefits and innovative potential of using human language technologies in their work
  – encourage them to use CLARIN to cross national and disciplinary borders
Where we stand
• We are now almost halfway through the preparatory phase (19 months to go)
• Just a quick look at some aspects:
  – Financial
  – Technical infrastructure
  – User requirements
  – Language resources and tools
  – Dissemination
  – IPR and legal issues
  – Funding and governance
Financial (1)
Budget for the preparatory phase:
• 4.1 M€ from the EC (“CLARIN-EU”)
• at this moment, in addition, ca. 8 M€ from the countries (“CLARIN-national”)
• and some 7.3 M€ for parallel activities
Overall budget estimate until 2020: ca. 146 M€
• to be paid by the participating countries
• looks like a lot of money
• but (compare with other research infrastructures!):
  – < 12 M€ per year
  – < 0.5 M€ per country per year
  – < ?? M€ per language per year
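A quick check of how the per-year and per-country bounds follow from the 146 M€ estimate. This assumes (not stated on the slide) that the budget is spread evenly over the 13 years from 2008 to 2020 inclusive and across the 23 countries currently represented in the consortium:

```latex
\[
\frac{146\ \mathrm{M\,EUR}}{13\ \text{years}} \approx 11.2\ \mathrm{M\,EUR/year} < 12\ \mathrm{M\,EUR/year},
\qquad
\frac{11.2\ \mathrm{M\,EUR/year}}{23\ \text{countries}} \approx 0.49\ \mathrm{M\,EUR/country/year} < 0.5\ \mathrm{M\,EUR}.
\]
```

As more countries join, the per-country share would fall further.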
Financial (2)
National contributions:
• NL is the only country with commitments for the preparatory phase, the construction phase and the start of the exploitation phase (9 M€)
• 16 countries have made commitments to the CLARIN preparatory phase (26 K€ to 3.1 M€)
• 19 countries have launched parallel projects that might feed into CLARIN (11 K€ to 2.5 M€)
• No supported activity at all in 3 countries (out of 23)
Technical infrastructure
Done:
  – Definition of centre types
  – Requirements for the federation
  – Persistent identifiers
  – Registry requirements
  – Web services and workflow requirements
To do:
  – Take design decisions
  – Select initial centres
  – Prototype implementation
  – Cost estimates for next phase
User requirements
Done:
• Survey (ongoing)
• Overview of humanities projects (ongoing)
• Call for humanities projects
To do:
• Execution of humanities projects
• Continuation of surveys and impact studies
• Strategic plan for supporting SSH research
Language resources and tools (1)
Done:
• Survey and taxonomy
• Interoperability requirements
• Analysis of current LRT coverage
• Criteria and priorities for integration
• Usage scenarios and interoperability case studies
Language resources and tools (2)
To do:
• Continuation of survey and taxonomy
• Quality criteria
• BLARK (Basic Language Resource Kit) description for the humanities, and analysis of what exists per language
• LRT workflows
• Interoperability standards (not a single standard, not new, but based on a set of supported existing standards)
• Validation of standards and prototype
• Integration of existing LRT
• Plan to fill LRT gaps
Dissemination
Done:
  – First version of website: www.clarin.eu
  – Issues 1-5 of the newsletter published: www.clarin.eu/newsletter
  – Design of helpdesk and registry prototype
To do:
  – Enhancing the website
  – Continuation of quarterly newsletters
  – Plan for helpdesk and advice infrastructure for next phases
IPR and legal issues
Done (and still ongoing):
  – Collecting info on existing licenses (models, problems)
  – Collecting info on existing legislation
  – Model for authorization and authentication
To do:
  – Small set of licensing templates
  – Collaboration with other related services
  – Federation agreements
Governance and funding
Done:
• Requirements and best practice for transnational coordination and governance
To do:
• Provide cost estimates to funding agencies
• Agree (with funding agencies) on a governance model and implement it
• Agree (with funding agencies) on sustainable funding for the next phases
• Design and implement the operational structure
Concluding remarks
• The first 17 months were mostly spent on collecting information and requirements and creating a common ‘conceptual framework’; now it is time for convergence and firm decisions
• Less than 19 months to achieve this, and it is urgent because
  – governments take far more time for funding decisions than anticipated
  – national activities are already ongoing, so standards for representation and interoperability are needed NOW
• Target: all the important decisions in place by the end of 2009 / early 2010 to ensure continuity after 2010!
• CLARIN-NL can play a strong role because we are the only ones guaranteed to exist as a funded activity on 1 January 2011!