Building a Service Centre for Mass-Digitisation of Natural History Collections GBIF European Nodes Annual Meeting Joensuu, Finland, 6-8 March 2013 Hannu Saarenmaa & Team.
Building a Service Centre
for Mass-Digitisation of
Natural History Collections
GBIF European Nodes Annual Meeting
Joensuu, Finland, 6-8 March 2013
Hannu Saarenmaa
& Team
Funding and organisation
• Started in 2010
• Digitisation Centre of the Finnish Museum of Natural History and
the University of Eastern Finland
• Located in Joensuu Science Park, 440 km north-east of Helsinki
• Staff of 11
• Ten trainees
• Funded with 1.7 M€ from the European Structural Funds for 2010-13
• Other EU projects such as Biodiversity Virtual e-Laboratory
www.biovel.eu, EU BON / GEO BON
• Operates the GBIF National Node on behalf of the FMNH
6.11.2015
It has been a learning exercise…
• Preparatory project 2009-10 to build momentum
– Funding acquired
• European Social Fund project 2010-13 to accumulate the know-how
– People trained, digitisation process created, services defined
• European Regional Development Fund project 2010-13 to build the
infrastructure
– Hardware, software, room space, automation
• TEKES – the Finnish Funding Agency for Technology and Innovation –
project 2013-16 to commercialise the results – proposal pending
– Logistics, automation, customer relations, spin-off company
• Really, we are in the business of building the markets for
digitisation services…
Products and services
• Efficient, distributed digitisation process and workflow
• Automated mass-digitisation
• Research on advanced digitisation methods – “Living lab”
• Training programme
• Knowledge base
• Repository services
• Logistics / receiving services
Software solutions used
Distributed digitisation process
[Diagram: process flow; labels listed below]
• Process steps: Receiving, Tagging, Imaging, Archiving, Delivering, Data entry, Georeferencing, Filtering, Validation, Publishing, with Flow control
• Stores and external services: Physical specimen, Specimen repository, XML document and image pool, XML database, GIS (known locations), Morphbank, GBIF, KDK PAS
• Other label: Customer
• Colour key: red = physical specimen handling, green = data only
• Each step in the process can be distributed to the best available specialist or automatic service
Digitisation service can be customised
• A1: All steps in the process
• A2: All steps plus deposition of collection in Digitarium’s storage
• B1: Minimal set of process steps: Tagging, imaging
• B2: As above plus minimal data entry
• C: Custom process
More details of the service at http://sites.digitarium.fi/wiki/?w=en
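The service levels above can be read as selections from the list of process steps. The following is a minimal sketch only: the step names are taken from the process diagram, while the step order, the extra `storage` step for deposition, and the helper `steps_for` are illustrative assumptions, not Digitarium's actual configuration.

```python
# Step names come from the process diagram; their order here is an assumption.
ALL_STEPS = [
    "receiving", "tagging", "imaging", "data entry", "georeferencing",
    "filtering", "validation", "publishing", "archiving", "delivering",
]

# Hypothetical mapping of service levels to process steps.
PACKAGES = {
    "A1": list(ALL_STEPS),
    "A2": list(ALL_STEPS) + ["storage"],  # plus deposition in Digitarium's storage
    "B1": ["tagging", "imaging"],
    "B2": ["tagging", "imaging", "data entry"],  # minimal data entry
}


def steps_for(package, custom_steps=None):
    """Return the process steps for a service level; level 'C' takes a custom list."""
    if package == "C":
        if not custom_steps:
            raise ValueError("package C needs an explicit list of steps")
        return list(custom_steps)
    return list(PACKAGES[package])
```

For example, `steps_for("B1")` returns only tagging and imaging, while `steps_for("C", [...])` passes through whatever steps the customer has agreed.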
Consequence of outsourcing: Transportation of collections
• Digitarium is a service centre for outsourcing – no local collections
• Logistical challenges
– All material must be packed, fetched and, optionally, returned
– Turnaround time is an important question
– Prioritisation of digitisation – many aspects
• Disinfestation by freezing upon arrival and/or departure
• Receiving service for new donations is a popular function. Steps:
1. Retrieve collection from donor
2. Digitise the collection
3. Unpack the specimens in units fit for the customer museum
4. Delivery to the museum
5. Maintain the original collection virtually
Costs
• In the manual process:
– 3,99 € / imaged specimen
– 4,52 € / data entry
– Sum: 9,60 € / specimen full digitisation
• Based on 40,000 images and 10,000 fully digitised specimens
• Conclusion: Automation must be introduced and costs brought
down.
Towards industrial scale mass-digitisation
• Minimise manual work
• Goal: 1000 tagged and imaged samples per day with 1-3 workers
• Explored on several fronts:
– Machine vision and pattern recognition
– Conveyor-belt system and automated imaging
– Transcribing on demand: not until needed
Automated capture of label data
• We have explored the possibility of taking several pictures of labels from different angles and joining them using image pattern recognition.
• An algorithm has been developed but is not yet generic.
• More work is still needed.
A conveyor belt system with automated imaging
• Asynchronous process:
1. Samples are placed on belt by a person
2. Imaging by robot
3. Pick-up by robot? Safe?
4. Data entry later using crowdsourcing and remote experts
Tero Mononen
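The asynchronous process above decouples imaging from data entry: once a sample is imaged, transcription can happen later from the image alone. The following is a minimal sketch of that decoupling, assuming two in-memory queues; the `Sample` class, the function names and the file-naming scheme are hypothetical and do not describe the actual conveyor software.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sample:
    sample_id: str                # hypothetical identifier read from the tag
    image: Optional[str] = None   # file name of the captured image
    data: dict = field(default_factory=dict)


belt = deque()                   # samples placed on the belt by a person
transcription_backlog = deque()  # imaged samples awaiting data entry


def place_on_belt(sample_id):
    belt.append(Sample(sample_id))


def image_next():
    """Imaging station: photograph the next sample, then queue it for data entry."""
    sample = belt.popleft()
    sample.image = f"{sample.sample_id}.jpg"
    transcription_backlog.append(sample)
    return sample


def transcribe_next(fields):
    """Data entry happens later, from the image, e.g. by a remote expert."""
    sample = transcription_backlog.popleft()
    sample.data.update(fields)
    return sample
```

Because the two queues are independent, the belt can run at imaging speed while the transcription backlog is worked through at whatever pace crowdsourcing and remote experts allow.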
Automatic digitisation line
• In production.
• World’s best: 1000 images in a day at 450 dpi with 2 persons.
• Works for herbarium sheets; a smaller system for insects is in the works.
Digitising the worldwide Pteridophyta collection of the FMNH
• About 20000 samples retrieved from Helsinki 6-8
Feb 2013. Packed and catalogued by 3
Digitarium staff. Professional removal company.
• Protocol of handling the specimens agreed with
the Botanical Museum.
• Quarantine and deep-freezing upon arrival.
• Monitoring of progress and quality control on
the web.
• On average about 448 samples/day imaged, and transcribed for taxon, continent, and a URI-based GUID of the form H.1234567
• Personnel cost average 1.10 € / sample.
Efficiency is being scaled up.
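The figures above allow a rough back-of-the-envelope projection. The sketch below uses only numbers quoted in these slides; note that the manual figure of 9,60 € covers full digitisation, whereas the 1.10 € covers imaging plus minimal transcription, so the ratio is indicative rather than a like-for-like comparison.

```python
import math

samples = 20_000                  # "about 20000 samples"
rate_per_day = 448                # average samples imaged per day
cost_per_sample = 1.10            # personnel cost, EUR per sample
manual_cost_per_specimen = 9.60   # EUR, manual full digitisation (Costs slide)

# Working days needed at the average rate, rounded up to whole days.
working_days = math.ceil(samples / rate_per_day)

# Total personnel cost for the whole collection.
personnel_cost = round(samples * cost_per_sample, 2)

# How much more the manual process costs per item (different scope, see above).
cost_ratio = manual_cost_per_specimen / cost_per_sample

print(f"{working_days} working days, {personnel_cost:.0f} EUR personnel cost")
print(f"manual process costs about {cost_ratio:.1f}x more per item")
```

At the stated average rate the collection takes about 45 working days, and the per-item cost gap is close to an order of magnitude.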
Hardware deployment is a possibility
• The mass-digitisation solution can be deployed as a hardware and support package
• Optionally, it can also be operated at a remote site by Digitarium staff
• These products and services will be developed in the TEKES project
Conclusions
• Distributed workflow made possible by full imaging.
• Manual work is not going to be sufficient. Automation for mass-digitisation is promising and cuts costs by a factor of ten.
• Joint research and development of digitisation technology is
needed, in particular for insect collections.
• A service centre for outsourcing digitisation introduces unique
challenges and possibilities.
• Market for digitisation services is emerging, but needs stimulation.
• Finding funding channels for digitisation requires creativity.
Thanks!
Also see:
www.digitarium.fi
ZooKeys 2012 special issue on digitisation
Acknowledgements
• Finnish Museum of Natural History
• City of Joensuu
• European Social Fund and European Regional Development Fund
• UEF / SIB Labs
• Morphbank team at Florida State University, led by Greg Riccardi
• Dedicated staff and trainees of Digitarium!