Introduction to supportDM - University of East London


Data Management for Geoinformatics
A short course on good data management for taught postgraduate students
in geoinformatics and related data sciences.
John Murtagh, UEL
Data Management
What is research data management?
• Looking after data throughout the data
lifecycle (from conception to destruction)
• Good documentation and record-keeping
• Transfer of responsibility after project ends
• Keeping safe and possibly confidential
• Access, preservation and re-use
• Destruction
• “It’s just good research”
Preparing your data
The following slides are taken from the Research Data MANTRA online
course by Data Library and EDINA, University of Edinburgh & is licensed
under a Creative Commons Attribution 2.5 UK: Scotland License.
File labelling
Research data files and folders need to be labelled and organised in a
systematic way so that they are both identifiable and accessible for
current and future users.
The benefits of consistent data file labelling:
• Data files are distinguishable from each other within their containing
folder
• Data file naming prevents confusion when multiple people are working on
shared files
• Data files are easier to locate and browse
• Data files can be retrieved not only by the creator but by other users
• Data files can be sorted in logical sequence
• Data files are not accidentally overwritten or deleted
• Different versions of data files can be identified
• If data files are moved to another storage platform their names will
retain useful context
There are three main criteria to consider regarding the naming and
labelling of research data files, namely:
1. Organisation - important for future access and retrieval
2. Context - this could include content specific or descriptive
information independent of where the data is stored
3. Consistency - choose a naming convention and ensure that the rules are
followed systematically by always including the same information (such as
date and time) in the same order (e.g. YYYYMMDD)
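As an illustration of the consistency criterion, here is a minimal Python sketch that always builds file names from the same pieces in the same order. The function name build_filename, the chosen fields (project, description, date, version) and the separators are assumptions made for this example, not part of the MANTRA guidance.

```python
import re
from datetime import date


def build_filename(project: str, description: str, created: date,
                   version: int, extension: str) -> str:
    """Build a name of the form project_description_YYYYMMDD_vNN.ext."""

    def clean(text: str) -> str:
        # Lower-case the text and replace runs of other characters with hyphens
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    stamp = created.strftime("%Y%m%d")  # YYYYMMDD sorts in date order
    ext = extension.lstrip(".").lower()
    return f"{clean(project)}_{clean(description)}_{stamp}_v{version:02d}.{ext}"


print(build_filename("Thames", "Flood depth", date(2015, 7, 16), 2, "csv"))
# prints: thames_flood-depth_20150716_v02.csv
```

Because the date is written as YYYYMMDD and the version is zero-padded, an ordinary alphabetical sort of a folder also puts the files in date and version order.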
The following video is from a talk given by Dave Anderson from the
National Oceanic and Atmospheric Administration's (NOAA) National
Climatic Data Center at the Data Management workshop sponsored by
the Earth Science Information Partners (ESIP).
It highlights some of the research data organisation issues such as
proprietary formats, cryptic labelling and vague filenames.
If you need to rename data file names in bulk there are a number of tools
available. Here are some examples for different operating systems:
• Windows:
Ant Renamer (http://www.antp.be/software/renamer)
RenameIT (http://www.bulkrenameutility.co.uk/)
• Mac:
Renamer4Mac (http://renamer4mac.com/)
Name Changer
(http://web.mac.com/mickeyroberson/MRR_Software/NameChanger.html)
• Linux:
GNOME Commander (http://www.nongnu.org/gcmd/)
GPRename (http://gprename.sourceforge.net/)
• Unix:
The use of the grep command to search for regular expressions
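If none of these tools is to hand, a bulk rename can also be scripted. The sketch below is one possible approach in Python using a regular expression. The helper names plan_renames and apply_renames, the example file names and the DD-MM-YYYY starting convention are all assumptions for illustration. It previews the changes first and refuses to overwrite existing files.

```python
import re
import tempfile
from pathlib import Path


def plan_renames(folder: Path, pattern: str, replacement: str):
    """Return (old, new) path pairs for files whose names match the pattern."""
    regex = re.compile(pattern)
    plan = []
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        new_name = regex.sub(replacement, path.name)
        if new_name != path.name:
            plan.append((path, path.with_name(new_name)))
    return plan


def apply_renames(plan, dry_run: bool = True):
    """Print each planned rename; only touch the disk when dry_run is False."""
    renamed = []
    for old, new in plan:
        if new.exists():
            # Never overwrite a file that is already there
            print(f"SKIP {old.name}: {new.name} already exists")
            continue
        print(f"{old.name} -> {new.name}")
        if not dry_run:
            old.rename(new)
            renamed.append(new)
    return renamed


# Demonstration in a throwaway folder: DD-MM-YYYY_name becomes YYYYMMDD_name
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("16-07-2015_gps.csv", "02-01-2015_gps.csv", "readme.txt"):
        (folder / name).write_text("example")
    plan = plan_renames(folder, r"^(\d{2})-(\d{2})-(\d{4})_(.+)$", r"\3\2\1_\4")
    apply_renames(plan, dry_run=True)   # preview only
    apply_renames(plan, dry_run=False)  # perform the renames
    print(sorted(p.name for p in folder.iterdir()))
```

Running with dry_run=True first is a cheap safeguard: the printed list can be checked by eye before any file name is changed.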
Backing up & storing your data
Data loss will happen to you
• Dropping your laptop
• Hard drive failures
• …are updates
• Research trends (follow the money consequences)
• Overwriting data/versioning
• Obsolescence/upgrades
• Poorly described data (metadata)
• File formats
• Media degradation (CDR’s, memory sticks, SSD’s)
• Theft of equipment
• People move on
Slide from Data Management Planning and Storage for Psychology (DMSPpsych)
The University of Sheffield
16/07/2015
Research data loss – read this article!
December 2012
• The laptop was left by a graduate student in the back seat of a car
parked outside a downtown restaurant. Someone broke into the car and
stole the computer.
• It contained a vast amount of a trophic ecologist's experimental data
from tracked fish (cost $50,000 CAD).
• “Unfortunately none of the data had been backed up yet. If we don’t get
this laptop back, that data is lost forever.”
HOWEVER
You can prevent total loss of your data by backing up.
It is recommended that you keep at least 3 copies of your data. For
example, original, external (locally), and external (remotely), and have
a policy for maintaining regular backups.
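The three-copy recommendation can be sketched in a few lines of Python. This is an illustrative example only: the function names sha256_of and back_up and the folder names standing in for a local external drive and a remote share are assumptions. Each copy is checked against the original with a SHA-256 checksum so that a failed copy is noticed straight away.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(original: Path, destinations):
    """Copy the original into each destination folder and verify every copy."""
    expected = sha256_of(original)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / original.name
        shutil.copy2(original, target)  # copy2 also keeps timestamps
        if sha256_of(target) != expected:
            raise OSError(f"Copy at {target} does not match the original")
        copies.append(target)
    return copies


# Demonstration: temporary folders stand in for the external locations
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    original = root / "working" / "tracks_20150716_v01.csv"
    original.parent.mkdir()
    original.write_text("fish_id,lat,lon\n1,51.5,0.05\n")
    copies = back_up(original, [root / "external_drive", root / "remote_share"])
    print(len(copies) + 1, "copies in total")
```

In practice the second destination would sit on a different device and the third in a different building or service, otherwise one theft or failure can still take every copy.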
A guide to backing up your data
Questions to ask yourself
• How will I back up my data?
• How regularly will backups be made?
• Will all data, or only changed data, be backed up?
• (A backup of changed data is known as an
"incremental backup", while a backup of all
data is known as a "full backup").
• How often will full and incremental
backups be made?
• How long will backups be stored?
• How much hard drive space or number
of Digital Video Discs (DVDs) will I
require to maintain this backup
schedule?
• If the data are sensitive, how will they be
secured and (possibly) destroyed?
• What backup services are available
that meet these needs and, if none,
what will be done about it?
• Who will be responsible for ensuring
backups are available?
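To make the difference between a full and an incremental backup concrete, here is a hedged Python sketch. The function names files_to_back_up and estimate_storage_gb are invented for this example, and the figures in the estimate are made up. It assumes that "changed" can be judged from each file's modification time, and that every incremental backup is roughly the same size.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def files_to_back_up(folder: Path, last_backup_time: Optional[float] = None):
    """List files for a full backup (no time given) or an incremental one."""
    selected = []
    for path in sorted(folder.rglob("*")):
        if not path.is_file():
            continue
        # Incremental: keep only files modified after the previous backup
        if last_backup_time is None or path.stat().st_mtime > last_backup_time:
            selected.append(path)
    return selected


def estimate_storage_gb(full_size_gb: float, change_per_increment_gb: float,
                        fulls_kept: int, incrementals_kept: int) -> float:
    """Rough space needed to keep a given number of backups of each kind."""
    return fulls_kept * full_size_gb + incrementals_kept * change_per_increment_gb


# Demonstration with two files whose modification times are set by hand
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    old_file, new_file = folder / "old.csv", folder / "new.csv"
    old_file.write_text("unchanged")
    new_file.write_text("edited since last backup")
    os.utime(old_file, (1000, 1000))
    os.utime(new_file, (2000, 2000))
    print([p.name for p in files_to_back_up(folder)])          # full backup
    print([p.name for p in files_to_back_up(folder, 1500.0)])  # incremental

# 4 full backups of 50 GB plus 24 incrementals of 2 GB each
print(estimate_storage_gb(50, 2, 4, 24), "GB")
```

The estimate is only a starting point for the storage question above; compression and the real rate of change in the data will move the figure in either direction.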
In the following video Professor Lynn Jamieson from the
University of Edinburgh talks about the importance of
keeping regular backups of research data.
Storing it in the Cloud
“Cloud storage is a model of
networked enterprise
storage where data is stored not
only in the user's computer, but in
virtualized pools of storage which
are generally hosted by third
parties, too.”
http://en.wikipedia.org/wiki/Cloud_storage
Cloud services
Fortunately, the University of Hertfordshire has reviewed the most
popular cloud storage services.
26 Online Backup Services have been reviewed.
It has also analysed the pros and cons of their data and security
policies as well as their costs and access.
You can read it here:
http://sitem.herts.ac.uk/rdm/files/Cloud_Storage_Review_v.1.2.pdf
Cloud Storage: Advantages and Disadvantages
The following slide is taken from the Research Data MANTRA online
course by Data Library and EDINA, University of Edinburgh & is licensed
under a Creative Commons Attribution 2.5 UK: Scotland License.
Advantages
• No user intervention is required (change tapes, label CDs, perform
manual tasks).
• Remote backup maintains data offsite.
• Most provide versioning and encryption.
• They are multi-platform.
Disadvantages
• Restoration of data may be slow (dependent upon network bandwidth).
• Stored data may not be entirely private (thus pre-encryption).
• Service provider may go out of business.
• Protracted intellectual property rights/copyright/data protection
licences.
Access control
Data security is the means of ensuring that research data is kept safe
from corruption and that access is suitably controlled.
It is important to consider the security of your data to prevent:
• Accidental or malicious damage/modification to data.
• Theft of valuable data.
• Breach of confidentiality agreements & privacy laws.
• Premature release of data, which can void intellectual property claims.
• Release before data have been checked for accuracy and authenticity.
Access control
You need to consider the following questions for securing your research
data:
• How will you manage access arrangements and data security?
• How will you enforce permissions, restrictions and embargoes?
• Other security issues such as sensitive data, off-network storage,
storage on mobile devices (laptops, smartphones, flash drives, etc.),
policy on making copies of data, etc. where relevant.
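One simple way to enforce permissions on a shared Unix-like system is through file modes. The Python sketch below is an example under that assumption (POSIX permissions; Windows handles access differently), and the function names restrict_to_owner and is_private are invented for illustration. It limits a file to its owner and then checks that nobody else has access.

```python
import os
import stat
import tempfile
from pathlib import Path


def restrict_to_owner(path: Path) -> None:
    """Make a file readable and writable by its owner only (mode 600)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


def is_private(path: Path) -> bool:
    """True if neither the group nor other users have any permission."""
    mode = stat.S_IMODE(path.stat().st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


# Demonstration on a temporary file that starts out readable by everyone
with tempfile.TemporaryDirectory() as tmp:
    sensitive = Path(tmp) / "interviews_20150716_v01.csv"
    sensitive.write_text("participant,response\n")
    os.chmod(sensitive, 0o644)
    print("private before:", is_private(sensitive))
    restrict_to_owner(sensitive)
    print("private after:", is_private(sensitive))
```

File modes only protect data on that one machine; copies on laptops, flash drives or cloud services need their own controls, such as encryption.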
Encryption
There are a number of ways to encrypt your
data where it is stored. There are many
software programs which allow you to do
this easily and which are also free.
See the following Wikipedia page: Comparison
of disk encryption software
Encryption - TrueCrypt
One of the most popular encryption tools is
TrueCrypt. You can see why…
Other sessions as part of Data Management in Geoinformatics:
• Data Collection
• Data Integration
• Data Sharing
Data Management for Geoinformatics by John Murtagh, as part of the
Jisc funded project TraD (University of East London), is licensed under
a Creative Commons Attribution Share Alike Licence