Building a Service Centre for Mass-Digitisation of Natural History Collections
GBIF European Nodes Annual Meeting, Joensuu, Finland, 6-8 March 2013
Hannu Saarenmaa & Team
Funding and organisation
• Started in 2010
• Digitisation Centre of the Finnish Museum of Natural History (FMNH) and the University of Eastern Finland
• Located in Joensuu Science Park, 440 km north-east of Helsinki
• Staff of 11
• Ten trainees
• Funded by the European Structural Funds with 1.7 M€ for 2010-13
• Other EU projects, such as the Biodiversity Virtual e-Laboratory (www.biovel.eu) and EU BON / GEO BON
• Operates the GBIF National Node on behalf of the FMNH
6.11.2015

It has been a learning exercise…
• Preparatory project 2009-10 to build momentum
  – Funding acquired
• European Social Fund project 2010-13 to accumulate know-how
  – People trained, digitisation process created, services defined
• European Regional Development Fund project 2010-13 to build the infrastructure
  – Hardware, software, room space, automation
• TEKES (the Finnish Funding Agency for Technology and Innovation) project 2013-16 to commercialise the results; proposal pending
  – Logistics, automation, customer relations, spin-off company
• Really, we are in the business of building the market for digitisation services…

Products and services
• Efficient, distributed digitisation process and workflow
• Automated mass-digitisation
• Research on advanced digitisation methods ("living lab")
• Training programme
• Knowledge base
• Repository services
• Logistics / receiving services

Software solutions used
[Figure not reproduced in transcript]

Distributed digitisation process
[Figure: process diagram. Elements shown: Customer, Physical specimen, Receiving, Tagging, Imaging, Data entry, Georeferencing, GIS (known locations), Validation, Filtering, Publishing, Archiving, Delivering, Flow control, Specimen repository, XML document and image pool, XML database, Morphbank, GBIF, KDK PAS. Red = physical specimen handling; green = data only.]
• Each step in the process can be distributed to the best available specialist or automatic service

Digitisation service can be customised
• A1: All steps in the process
• A2: All steps, plus deposition of the collection in Digitarium's storage
• B1: Minimal set of process steps: tagging and imaging
• B2: As B1, plus minimal data entry
• C: Custom process
More details of the service at http://sites.digitarium.fi/wiki/?w=en

Consequence of outsourcing: transportation of collections
• Digitarium is a service centre for outsourcing; it has no local collections
• Logistical challenges
  – All material must be packed, fetched and, optionally, returned
  – Turnaround time is an important question
  – Prioritisation of digitisation involves many aspects
• Disinfestation by freezing upon arrival and/or departure
• The receiving service for new donations is a popular function. Steps:
  1. Retrieve the collection from the donor
  2. Digitise the collection
  3. Unpack the specimens into units that fit the customer museum
  4. Deliver to the museum
  5. Maintain the original collection virtually

Costs
• In the manual process:
  – 3.99 € per imaged specimen
  – 4.52 € per data entry
  – Sum: 9.60 € per fully digitised specimen
• Based on 40,000 images and 10,000 fully digitised specimens
• Conclusion: automation must be introduced and costs brought down.

Towards industrial-scale mass-digitisation
• Minimise manual work
• Goal: 1,000 tagged and imaged samples per day with 1-3 workers
• Explored on several fronts:
  – Machine vision and pattern recognition
  – Conveyor-belt system and automated imaging
  – Transcribing on demand, not until needed

Automated capture of label data
• We have explored the possibility of taking several pictures of labels from different angles and joining them using image pattern recognition.
• An algorithm has been developed but is not yet generic.
• More work is still needed.

A conveyor-belt system with automated imaging
• Asynchronous process:
  1. Samples are placed on the belt by a person
  2. Imaging by robot
  3. Pick-up by robot? Safe?
  4. Data entry later, using crowdsourcing and remote experts
(Tero Mononen)

Automatic digitisation line
• In production
• World's best: 1,000 images per day at 450 dpi with 2 persons
• Works for herbarium sheets; a smaller system for insects is in the works

Digitising the worldwide Pteridophyta collection of the FMNH
• About 20,000 samples retrieved from Helsinki on 6-8 Feb 2013, packed and catalogued by 3 Digitarium staff with a professional removal company
• Protocol for handling the specimens agreed with the Botanical Museum
• Quarantine and deep-freezing upon arrival
• Monitoring of progress and quality control on the web
• On average, about 448 samples per day imaged and transcribed for taxon, continent, and a URI-based GUID of the form H.1234567
• Average personnel cost 1.10 € per sample; efficiency is being scaled up

Hardware deployment is a possibility
• The mass-digitisation solution can be deployed as a hardware and support package
• Optionally, it can also be operated on a remote site by Digitarium staff
• These products and services will be developed in the TEKES project

Conclusions
• Distributed workflow made possible by full imaging
• Manual work is not going to be sufficient. Automation for mass-digitisation is promising and cuts costs by a factor of ten.
• Joint research and development of digitisation technology is needed, in particular for insect collections.
• A service centre for outsourced digitisation introduces unique challenges and possibilities.
• The market for digitisation services is emerging but needs stimulation.
• Finding funding channels for digitisation requires creativity.

Thanks!
Also see: www.digitarium.fi and the ZooKeys 2012 special issue on digitisation

Acknowledgements
• Finnish Museum of Natural History
• City of Joensuu
• European Social Fund and European Regional Development Fund
• UEF / SIB Labs
• Morphbank team at Florida State University, led by Greg Riccardi
• Dedicated staff and trainees of Digitarium!
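The "Distributed digitisation process" and "Digitisation service can be customised" slides describe a workflow in which each step can be routed to the best available specialist or automatic service, and service packages (A1, A2, B1, B2, C) that select which steps a customer buys. The Python sketch below illustrates that idea under stated assumptions: the step names, their order, and the functions `plan` and `run` are hypothetical illustrations, not Digitarium's actual software.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical step names and order: the process diagram lists these steps,
# but the sequence below is an assumption made for illustration.
ALL_STEPS: List[str] = [
    "receiving", "tagging", "imaging", "data_entry", "georeferencing",
    "validation", "filtering", "publishing", "archiving", "delivering",
]

# Service packages as listed on the slide; step names are hypothetical.
PACKAGES: Dict[str, List[str]] = {
    "A1": ALL_STEPS,
    "A2": ALL_STEPS + ["deposit_in_storage"],
    "B1": ["tagging", "imaging"],
    "B2": ["tagging", "imaging", "minimal_data_entry"],
}

# A handler performs one step: a specialist's work queue or an automatic
# service. In this sketch it is simply a function from record to record.
Handler = Callable[[dict], dict]


def plan(package: str, custom_steps: Optional[List[str]] = None) -> List[str]:
    """Return the ordered steps for a package; package 'C' takes a custom list."""
    if package == "C":
        if not custom_steps:
            raise ValueError("package C needs an explicit list of steps")
        return list(custom_steps)
    return list(PACKAGES[package])


def run(record: dict, steps: List[str], handlers: Dict[str, Handler]) -> dict:
    """Flow control: route each step to the handler registered for it."""
    done: List[str] = []
    for step in steps:
        if step not in handlers:
            raise KeyError(f"no specialist or service registered for {step!r}")
        record = handlers[step](record)
        done.append(step)
    return {**record, "completed_steps": done}


if __name__ == "__main__":
    # Hypothetical handlers for a B1 (tagging + imaging) job.
    handlers = {
        "tagging": lambda r: {**r, "tag": "H.1234567"},
        "imaging": lambda r: {**r, "image": r["tag"] + ".tif"},
    }
    print(run({"specimen": "fern sheet"}, plan("B1"), handlers))
```

Keeping the handler registry separate from the package definitions reflects the slide's point that any step can be reassigned, to a remote expert or to an automated service, without changing the workflow itself.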
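The "A conveyor-belt system with automated imaging" slide describes an asynchronous process: a person places samples on the belt, a robot images them, they are picked up, and data entry happens later. One way to model that decoupling is a chain of bounded queues. The sketch below uses Python's `asyncio.Queue`; the belt capacity, record fields and function names are hypothetical and chosen only to show that placing, imaging and pick-up can proceed concurrently while transcription is deferred to a backlog.

```python
import asyncio
from typing import List, Optional

# Hypothetical belt capacity: how many samples fit between two stations.
BELT_CAPACITY = 5


async def place_on_belt(sample_ids: List[str], belt: asyncio.Queue) -> None:
    """Step 1: a person places samples on the belt, one at a time."""
    for sample_id in sample_ids:
        await belt.put(sample_id)  # waits while the belt is full
    await belt.put(None)           # sentinel: no more samples


async def image_station(belt: asyncio.Queue, out: asyncio.Queue) -> None:
    """Step 2: the imaging robot photographs each sample as it arrives."""
    while True:
        sample_id: Optional[str] = await belt.get()
        if sample_id is None:
            await out.put(None)    # pass the sentinel downstream
            return
        await out.put({"id": sample_id, "image": f"{sample_id}.jpg"})


async def pick_up(out: asyncio.Queue, backlog: List[dict]) -> None:
    """Step 3: pick-up; step 4 (data entry) is deferred to a backlog."""
    while True:
        record = await out.get()
        if record is None:
            return
        backlog.append({**record, "transcribed": False})


async def run_line(sample_ids: List[str]) -> List[dict]:
    """Run all stations concurrently and return the untranscribed backlog."""
    belt: asyncio.Queue = asyncio.Queue(maxsize=BELT_CAPACITY)
    imaged: asyncio.Queue = asyncio.Queue(maxsize=BELT_CAPACITY)
    backlog: List[dict] = []
    await asyncio.gather(
        place_on_belt(sample_ids, belt),
        image_station(belt, imaged),
        pick_up(imaged, backlog),
    )
    return backlog


if __name__ == "__main__":
    done = asyncio.run(run_line([f"S{i}" for i in range(12)]))
    print(len(done), "samples imaged; awaiting data entry")
```

The bounded queues make a slow station hold back the one before it, which is the practical constraint on a physical belt; the backlog corresponds to the slide's "data entry later" step, where crowdsourcing or remote experts would pick up the images.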
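The Pteridophyta slide gives about 20,000 samples, an average of about 448 samples per day, and an average personnel cost of 1.10 € per sample. A back-of-the-envelope estimate, assuming those averages hold across the whole collection, can be computed as follows. The helper names are hypothetical, and the 1,000-per-day goal from the "Towards industrial-scale mass-digitisation" slide covers tagging and imaging only, so the two day counts are not strictly comparable.

```python
import math


def working_days(total_samples: int, samples_per_day: float) -> int:
    """Whole working days needed at a constant daily throughput."""
    if samples_per_day <= 0:
        raise ValueError("samples_per_day must be positive")
    return math.ceil(total_samples / samples_per_day)


def personnel_cost_eur(total_samples: int, eur_per_sample: float) -> float:
    """Total personnel cost, rounded to cents."""
    return round(total_samples * eur_per_sample, 2)


TOTAL_SAMPLES = 20_000  # "about 20,000 samples" (approximate)
OBSERVED_RATE = 448     # average samples/day, imaged and transcribed
GOAL_RATE = 1_000       # goal: tagged and imaged samples/day
EUR_PER_SAMPLE = 1.10   # average personnel cost per sample

if __name__ == "__main__":
    print("days at observed rate:", working_days(TOTAL_SAMPLES, OBSERVED_RATE))
    print("days at goal rate:", working_days(TOTAL_SAMPLES, GOAL_RATE))
    print("personnel cost (EUR):", personnel_cost_eur(TOTAL_SAMPLES, EUR_PER_SAMPLE))
```

Under these assumptions the collection takes 45 working days at the observed rate and roughly 22,000 € in personnel cost.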