Presentation Title - Indiana University Northwest

Research Data Storage
Resources at IU
Anurag Shankar
University Information Technology Services
Indiana University
March 2, 2012
University Information Technology Services
July 7, 2015
Outline
• Data Storage Use Cases
- Types of research data and the storage they require
• Data Storage Services
- Where/how to store your data
• Storage of HIPAA Regulated Data
- Storing sensitive data
• Real World Examples
- How people are using the storage services
Data Types and Desired Storage Characteristics
Type of Data | Volume | Throughput | Access Speed | Criticality
Data being acquired | MB - TB | MB/second | Fast | High (not easy to reproduce)
Data in analysis | MB - TB | MB-GB/s | Very fast | Low - High (reproducible)
Data being published/shared | MB - GB | KB-MB/s | Moderate | Low (reproducible)
Archival data | MB - PB | MB-GB/s | Slow | High if not also stored elsewhere
Research Data Storage Services
• Data Capacitor
• Research File System (RFS)
• Scholarly Data Archive (SDA)
• Research Database Complex (RDC)
• Alfresco Share
• REDCap
• Slashtmp
How IU’s Research Data Storage Services Fit Data Types
Type of Data | Resource/Service | Space Available | Eligibility | Duration
Data being acquired | RFS, Data Capacitor | GB - 100s of TB | IU | Days - Months
Data in analysis | RFS, Data Capacitor on Big Red/Quarry | MB - TB | IU | Days - Months
Data being published/shared | Server disk, Alfresco Share, REDCap, Slashtmp | MB - GB | IU, PU, ND, outside users | Months - Years
Archival data | SDA | GB - PB | IU | Years
Data Storage Services by Use
Use | Service | Access | Backed Up?
High Performance Storage | Data Capacitor | File System on Big Red/Quarry | No
Storage for In-Work Data | RFS | Mapped Drive, Web, SFTP, OpenAFS client | Yes
Structured Data Storage | RDC (Oracle, MySQL) | Applications | Yes
Shared Document Storage | Alfresco Share | Web, WebDAV | Yes
Shared Storage for HIPAA data | REDCap | Web | Yes
Archival Storage | SDA | Web, Mapped Drive, SFTP, Parallel FTP | No
Service | Targeted For | Not Good For
RFS | Storing relatively small files that are updated and/or accessed frequently or need group access | Storing database files, backups
SDA | Storing large files, or small files aggregated (zipped) into large files; long-term storage | Storing small files, files requiring frequent/quick access, in-work data
RDC | Relational databases | Storing unstructured data
Alfresco Share | Sharing Word, Excel, PDF, text files | Storing data
REDCap | Storing & sharing HIPAA data | General storage
Data Capacitor | Temporary data being read or written on Big Red/Quarry requiring the fastest speeds available | General storage
Slashtmp | Temporary space to exchange files too large to send as email attachments | General storage
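The SDA guidance above (aggregate small files into large ones before archiving) can be sketched with Python's standard library. This is only an illustration: the function name and paths are hypothetical, and it uses a gzip-compressed tar archive where the slide says "zipped". It does not transfer anything to the SDA.

```python
import tarfile
from pathlib import Path


def bundle_directory(src_dir, archive_path):
    """Pack every file under src_dir into one gzip-compressed tar archive.

    Hypothetical helper: the name and arguments are illustrative only.
    Write the archive outside src_dir so it does not include itself.
    """
    src = Path(src_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps member paths relative to the source folder's name
        tar.add(src, arcname=src.name)
    return Path(archive_path)
```

A call such as bundle_directory("scans_2011", "scans_2011.tar.gz") would produce a single large file, which is the shape of data the table lists as suited to the SDA.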
Storage Resource/Service Details
Service | Technology | Capacity
RFS | OpenAFS | 60TB
SDA | High Performance Storage System (HPSS) | 15 PB tape, 150TB disk
RDC | Oracle, MySQL | 200TB
Alfresco Share | Alfresco Share | 1TB
REDCap | REDCap | 1TB
Data Capacitor | Lustre | 360TB
Storage Resource/Service Details
Service | Default Quota | Account Request
RFS | 100GB | http://itaccounts.iu.edu
SDA | None | http://itaccounts.iu.edu
RDC | 10GB | http://itaccounts.iu.edu
Alfresco Share | None | http://www.indianactsi.org/alfrescorequest
REDCap | None | http://www.indianactsi.org/redcapacr
Data Capacitor | None | Big Red/Quarry account
Slashtmp | 4GB | No Slashtmp account needed, only IU login to use
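As a rough illustration of how the default quotas above might be used when picking a service, the sketch below checks a file size against them. The dictionary copies the table; the function name is hypothetical, and treating 1GB as 10^9 bytes is an assumption, since the slides do not say which unit is meant.

```python
# Default quotas from the table above, in GB. None means the table lists
# no default quota; it does not mean the service has unlimited capacity.
DEFAULT_QUOTA_GB = {
    "RFS": 100,
    "SDA": None,
    "RDC": 10,
    "Alfresco Share": None,
    "REDCap": None,
    "Data Capacitor": None,
    "Slashtmp": 4,
}

GB = 10**9  # assumption: decimal gigabytes


def fits_default_quota(service, size_bytes):
    """Return True if size_bytes is within the service's default quota."""
    quota = DEFAULT_QUOTA_GB[service]
    if quota is None:
        return True  # no default quota listed for this service
    return size_bytes <= quota * GB
```

For example, fits_default_quota("Slashtmp", 5 * GB) is False, which suggests a 5GB file needs a different service or a larger allocation.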
Storage Resource/Service Details
Service | Web Access URL | More Help at
RFS | http://rfsweb.iu.edu | http://kb.iu.edu/aroz.html
SDA | http://www.sdarchive.iu.edu | http://kb.iu.edu/aiyi.html
RDC | Application specific | http://kb.iu.edu/awmv.html
Alfresco Share | http://alfresco.uits.iu.edu | http://www.indianactsi.org/kb/alfresco
REDCap | http://redcap.uits.iu.edu | http://www.indianactsi.org/kb/redcap
Data Capacitor | N/A (accessed from the Unix command line) | http://kb.iu.edu/data/avvh.html
Slashtmp | http://slashtmp.iu.edu | http://kb.iu.edu/data/angt.html
Storage of HIPAA Regulated Data
• HIPAA (Health Insurance Portability and
Accountability Act) Security Rule
regulates electronic protected health
information (ePHI), i.e. identifiable patient
information
• It mandates physical, administrative, and
technical controls for storing ePHI
HIPAA Data …
• To support IU School of Medicine (IUSM)
researchers, RT initiated a project in 2008 to
align its systems and services with HIPAA
• The project was overseen by a committee
consisting of IU’s compliance office, IT security
and policy offices, the IUSM CIO, faculty, and
IT staff
• Alignment included gap and risk analyses by
an outside expert, filling gaps, and the creation
of an ongoing risk management plan
HIPAA Data …
• In 2009, the compliance office blessed RT as
capable of handling ePHI
• As of Dec. 31, 2011, this has resulted in the
following (starting from zero):
- Number of biomedical user accounts on RT systems: 2800
- Volume of biomedical data stored on RT systems: 500TB
- Use of computing cycles on RT supercomputers: 1 million SUs
- Number of biomedical databases: 450
- Number of new RT services developed specifically for biomedical researchers: 10
- Number of major NIH grants we are written into: 5
- Number of FTEs these grants have funded: 6
Real World Examples
• A research group in the IUSM Dept. of
Radiology was running out of space in
the department to archive digital X-ray
images (100-200MB/image). They were
able to use the SDA to store tens of
thousands of these images and now rely
solely on SDA as their image archive.
They have over 10TB of data currently
stored.
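The figures in this example can be cross-checked with a little arithmetic. The sketch below assumes decimal units (1MB = 10^6 bytes, 1TB = 10^12 bytes) and uses a hypothetical helper name; it only estimates how many images of the quoted size fit in 10TB.

```python
MB = 10**6
TB = 10**12  # assumption: decimal units, which the slides do not specify


def image_count_range(total_bytes, min_image_bytes, max_image_bytes):
    """Return (fewest, most) whole images that fit in total_bytes."""
    fewest = total_bytes // max_image_bytes  # every image at the largest size
    most = total_bytes // min_image_bytes    # every image at the smallest size
    return fewest, most


fewest, most = image_count_range(10 * TB, 100 * MB, 200 * MB)
print(fewest, most)  # 50000 100000
```

At 100-200MB per image, 10TB corresponds to roughly 50,000 to 100,000 images, which is consistent with the "tens of thousands" described above.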
Real World Examples …
• A research group needed to use an
application to view a certain collection of
data at the same time. They stored it in a
group area in RFS, mapped it to drive R:
on their individual Windows desktops,
and accessed it simultaneously from
various campus locations as well as from
home or while traveling (using VPN).
Real World Examples …
• The state of Indiana collected geospatial
data when they flew the state in 2005.
Because of its size, no one in the state
had the capacity to make these data
available to the public. The SDA was
used to store all 20TB of orthoquads and
serves them currently over the web (see
http://gis.iu.edu).
Real World Examples …
• A researcher in the School of Library and
Information Science at IUB wanted to
explore relationships between fields of
science in scientific journal publications.
She was able to use the RDC to host
nearly a TB of data in Oracle and do her
relational work.
Real World Examples …
• An IU researcher had an urgent need for a
shared space to store work documents for
collaborators within and outside IU. He could
not wait for affiliate accounts to be created for
external users. He was able to use Alfresco
Share to set up multiple collaboration “sites” for
the project within the hour and invite
collaborators to these sites. Each space
provides not only a shared document library, but
also shared wiki, blogs, etc.
Real World Examples …
• The Division of Biostatistics at IUPUI wanted
to help a clinical researcher at IUSM migrate
data in spreadsheets to a central, web-accessible database, and allow her to share
the database with collaborators at University
of Maryland who needed to add patient data
(ePHI) they were acquiring. They used
REDCap to accomplish all this AND export
the data in a format ready for the SAS
statistics package to analyze.
Contact
Your single point of contact for all things RT:
Anurag Shankar
[email protected]
812-325-8629
Local Contact:
Carol Wood
[email protected]
219-980-7758