Selling a Product or Service - HDF-EOS

HDF Update
Mike Folk
The HDF Group
HDF and HDF-EOS Workshop XI
November 7, 2007
7/18/2015
The HDF Group
1
Outline
• What is The HDF Group?
• HDF Software Update
• Other Activities of Interest
What is The HDF Group (THG)?
THG, the Company
• Spun off from University of Illinois, July 2006
• Non-profit
• 20+ scientific, technology, professional staff
• Intellectual property:
 THG owns HDF4 and HDF5
 HDF formats and libraries to remain open
 Libraries have BSD-type license
• Continue ties to U of I and NCSA
The mission of The HDF Group
is to ensure long-term
accessibility of HDF data through
sustainable development and
support of HDF technologies.
Goals
• Maintain, evolve HDF for sponsors and
communities that depend on it
• Do consulting, training, tuning, development,
research
• Sustain The HDF Group for long term to assure
data access over time
THG Services
• Helpdesk and Mailing Lists
 Available to all users as a first level of support
• Standard Support
 Rapid issue resolution support
• Consulting
 Needs assessment, troubleshooting, design reviews, etc.
• Enterprise Support
 Coordinating HDF activities across divisions
• Special Projects
 Adapting customer applications to HDF
 New features and tools, with changes normally incorporated into
open source product
 Research and Development
• Training
 Tutorials and hands-on practical experience
HDF Software Update
HDF4 update
HDF 4.2r2
Released in October
New features and changes
• New APIs added to the SD and GR interfaces:
 SDreset_maxopenfiles, SDget_maxopenfiles: Modify and report the
maximum allowable number of open files
 SDget_numopenfiles: Gets the number of open files
 SDgetcompinfo, GRgetcompinfo: Get compression info
 SDgetfilename: Retrieves name of file, given its ID
 SDgetnamelen: Retrieves length of object name, given its ID
• SZIP compression
 Now can be invoked by Fortran API
 Now available for raster images via GR interface
• SDS, Vgroup names no longer limited to 64 characters
New features and changes
• HDF configuration changes
 --enable-netcdf flag introduced
 Autotools versions updated
• Many bug fixes made to hrepack and hdiff
• See RELEASE.txt for a full list of changes
Platforms to drop/add next release
• Drop
 Windows XP with MSVC++ 6.0
 Linux 2.4
 IRIX64 6.5
 SunOS 5.8, 5.9
• Add
 Windows 64-bit (32- and 64-bit binaries)
Platforms tested
• Systems
 AIX 5.3 (32-bit, 64-bit)
 FreeBSD 6.2 (32-bit, 64-bit)*
 HP-UX B.11.23 (32-bit, 64-bit)*
 IRIX64 6.5 (32-bit, 64-bit)
 Linux 2.4, 2.6*
 Linux ia64
 Linux x86_64
 SunOS 5.8, 5.10* (32-bit, 64-bit)
 SunOS 5.10 on Intel
 Windows XP, Vista
 Mac OS X Intel*
• Compilers
 IBM C and Fortran compilers
 GNU gcc 3.4* and GNU Fortran
 HPUX C and Fortran compilers
 GNU gcc 3.4 and 4.*
 Intel C and Fortran versions 9.1 and 10.00
 SUN WorkShop C and Fortran
 Visual Studio .NET and 2005 and Intel Fortran
 Visual Studio 2005 (no Fortran)
 GNU gcc 4.0.1 with gfortran and g95
* New platforms
For detailed info, see RELEASE.txt
HDF5 Update
HDF5 1.6.6
HDF5 1.6.6 release
• Primarily a bug-fix release
• Some tool changes (see later slide)
• http://hdfgroup.org/HDF5/release/obtain5.html
Platforms dropped
• Operating systems
 AIX 5.3
 Solaris 2.8 and 2.9
 OSF1
 Windows XP with MSVC++ 6.0
• Compilers
 PGI 6.5-*
http://www.hdfgroup.org/HDF5/release/alpha/obtain518.html
Platforms added
• Systems
 Alpha Open VMS
 MAC OSX 10.4 (Intel)
 Solaris 2.* on Intel
 Cray XT3
 Windows 64-bit (32- and 64-bit)
 BG/L
• Compilers
 PGI V. 7.*
 Intel 10.*
 MPICH 1.2.7
 MPICH2
HDF5 1.8
HDF5 1.8 new library features
• Datatype and dataspace features
 Create datatype from text description
 Integer to float conversions during I/O
 Compact storage for N-bit datatypes
 Offset+size storage filter, saving space
 “Null” dataspace – datasets with no elements
 Data transformation filter
HDF5 1.8 – new library features
• Group improvements
 Creation order access
 Compact groups – small groups take less space
 Large group storage improvements
 Intermediate group creation
• Link improvements
 Unicode names allowed
 External links – to objects in another file
 User defined links – create own kinds of links
HDF5 1.8 – new library features
• Attribute improvements
 Improved storage for large number of attributes
 Iterate or look up by creation order
 Unicode names allowed
• Support for Unicode UTF-8 character set
• Shared header information, possibly saving space
• Metadata cache improvements – faster I/O on
files with many objects
• Better UNIX/Linux portability
HDF5 1.8 – new APIs
• New extendible error-handling API
• New APIs to copy objects between files quickly
• Dimension scale model and API
• “HDFpacket” API, to read/write packets efficiently
HDF5 1.8 – Backward and Forward Compatibility
HDF5 1.8 and 1.6
• Differences between 1.8 and 1.6.x
 Some file format changes
 Several new routines added
 Old APIs deprecated – may be removed in later
release
• Consequences
 Applications requiring 1.8 format changes will
generate objects that cannot be read by 1.6 library
 To exploit 1.8 changes, applications need to be
rewritten
“The art of progress is to
preserve order amid change, and
to preserve change amid order.”
Alfred North Whitehead
Principle of Maximum File Format Compatibility
Unless instructed otherwise, the HDF5 library will write objects
using the earliest version of the format possible for describing
the information.
Assures older library versions are forward compatible whenever
possible:
 Objects in new files can be read with old versions of the library,
if the objects are “known” to the old libraries.
 New versions of the library can always read objects in files
written with older versions.
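The principle above can be illustrated with a small, self-contained model. This is not HDF5 code: the feature flags, the two format version numbers, and the function names are all hypothetical, chosen only to show the rule "write the earliest format that can describe the object unless told otherwise" and its effect on older readers.

```c
#include <stdbool.h>

/* Hypothetical feature flags an object might need (illustrative only). */
enum {
    FEAT_NONE           = 0,
    FEAT_CREATION_ORDER = 1 << 0,  /* assumed to require the newer format */
    FEAT_EXTERNAL_LINK  = 1 << 1   /* assumed to require the newer format */
};

/* Hypothetical format versions: V1 is known to old and new libraries,
 * V2 is known only to the new library. */
enum { FORMAT_V1 = 1, FORMAT_V2 = 2 };

/* Pick the earliest format version able to describe the object,
 * unless the caller explicitly asks for the latest one. */
int choose_format_version(unsigned features, bool force_latest)
{
    if (force_latest)
        return FORMAT_V2;
    return (features == FEAT_NONE) ? FORMAT_V1 : FORMAT_V2;
}

/* A library can read an object only if the object's format version
 * is one that the library knows about. */
bool reader_can_read(int max_known_version, int object_version)
{
    return object_version <= max_known_version;
}
```

In this model an object that uses no new features is written as `FORMAT_V1`, so a reader limited to `FORMAT_V1` can still open it, while a newer reader can open both versions.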
Command Line Tools
New features for existing tools
• -V option for all tools
 Prints HDF5 library version number used by tool
• h5repack: -L option
 Use latest version of file format to create objects
• h5dump: dumps groups/attributes in creation or
name order
 -q Q, --sort_by=Q Sort groups and attributes by index Q
 -z Z, --sort_order=Z Sort groups and attributes by order Z
New command line tools
• h5mkgrp
 Creates new groups and group hierarchies in an HDF5 file
• h5stat
 Provides statistics regarding the file, such as number of
objects per group, sizes of datasets, amount of free space in
file
• h5copy
 Copies objects within a file or across files
• h5check
 Verifies an HDF5 file against the defined HDF5 File Format
Specification
 Completed for 1.6.
 In progress for 1.8
Tool work in the pipeline
• Export numeric data in several different formats
(such as MS Excel, XML, etc.)
• Import ASCII data that conforms to a certain format
• Use a common text format for h5import and
h5dump
• Support NaN in tools such as h5diff.
Challenges:
 NaN is platform specific
 NaN can have different bit representations on the
same machine
 Checking NaN can be a performance hit
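The NaN challenges listed above can be made concrete with a short C sketch. It assumes IEEE 754 binary64 doubles; `nan_with_payload` and `values_match` are hypothetical helper names, not part of h5diff or the HDF5 library. The sketch shows that two NaNs can carry different bit patterns, that `==` never treats NaNs as equal, and one possible rule a comparison tool could apply.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of a double. */
uint64_t double_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Build a quiet NaN carrying a chosen payload
 * (assumes IEEE 754 binary64 doubles). */
double nan_with_payload(uint64_t payload)
{
    /* all exponent bits set, quiet bit set, payload in the low bits */
    uint64_t bits = UINT64_C(0x7FF8000000000000)
                  | (payload & UINT64_C(0x0007FFFFFFFFFFFF));
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Hypothetical NaN-aware comparison: two NaNs count as a match,
 * a NaN never matches a number, otherwise compare the values. */
bool values_match(double a, double b)
{
    if (isnan(a) || isnan(b))
        return isnan(a) && isnan(b);
    return a == b;
}
```

The extra `isnan` calls on every element are the kind of per-value cost the performance bullet refers to.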
HDF Java Products
HDF5 Java is Growing Up
HDFView changes
• HDFView 2.4 released
• Many new features, such as
 Support for compound datatypes of 2D+ arrays
 Support for "filtering fill value" in Image Viewer
 Effective handling of large 3D images
 Support large fonts in GUI components
 New autogain algorithm for image Brightness/Contrast
• New platforms
 Mac Intel
 Linux 64-bit AMD
 Solaris 64-bit
Other Java products
• 36 new enhancements and 44 bugs fixed
• Test suite (using junit testing framework)
 Tests all public methods in the object package
 Added “make check” to run the test suite
• Enhanced documentation
 All public methods in the object package are fully
documented
Future work for Java
• Update HDF5 JNI APIs for HDF5 1.8 release
• Release HDFView with bug fixes/new features
with HDF5 1.8 release
• Port HDF5-SRB model to HDF5-iRODS model
• Writing capability for HDF5-iRODS model
Other Activities of Interest
New THG Website
HDF Performance Framework
Goals
• A framework for performance regression testing
• A tool for
 Testing on multiple platforms
 Testing different versions
 Long term regression testing
 Assistance in debugging
Solution
[Architecture diagram. Components: HDF5 1.6, HDF5 1.8, cron, a user’s benchmark, performance library, database, PHP web server, www, graph/text output.]
Sample Usage
H5Perf_startTimer(&time);
for (i = 0; i < 1000; i++) {
    // Add groups
    H5Gcreate(fileid, group_name, (size_t)0);
}
H5Perf_endTimer(&time);
H5Perf_addInstance(db_host, date, time);

Cron entry:
00 21 * * * /home/local/hyoklee/src/chicago/test-perf-hdfdap-3.sh

Sample database record:
| 4384 | 178820 | 2007-08-17 21:51:14 | 10000 groups | creating 10000 empty groups | 1.8.0 | hdfdap | 0.670198 |
Labeled fields: Timestamp (2007-08-17 21:51:14), Instance Name (creating 10000 empty groups), Version (1.8.0), Platform (hdfdap), Time (0.670198)
Improved Crash Survivability in the HDF5 Library
Crash Survivability in HDF5
• Problem:
 Data in HDF5 files susceptible to corruption in the
event of an application or system crash.
 Corruption possible if structural metadata is being
written when the crash occurs.
• Initial Objective:
 Guarantee an HDF5 file with consistent metadata
can be reconstructed in the event of a crash.
 No guarantee on state of raw data – contains
whatever made it to disk prior to crash.
Crash Survivability in HDF5
• Approach: Metadata Journaling
 When a piece of metadata is modified and in a
consistent state, make a journal note.
 If the application crashes, a recovery program can
replay the journal by applying in order all metadata
writes until the end of the last completed
transaction written to the journal file.
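The replay step described above can be sketched in a few lines of C. This is a simplified model, not the HDF5 journal format: the entry layout, the `journal_entry` type, and `replay_journal` are hypothetical, metadata is reduced to an array of integer slots, and transactions are assumed to be written one after another without interleaving.

```c
#include <stddef.h>

/* Hypothetical journal entry kinds (not the HDF5 journal format). */
typedef enum { J_BEGIN, J_WRITE, J_END } entry_kind;

typedef struct {
    entry_kind kind;
    int trans_id;   /* transaction this entry belongs to */
    size_t offset;  /* J_WRITE only: metadata slot to overwrite */
    int value;      /* J_WRITE only: new contents of that slot */
} journal_entry;

/* Replay metadata writes in order, stopping at the end of the last
 * completed transaction. Returns the number of transactions applied. */
int replay_journal(const journal_entry *journal, size_t n_entries,
                   int *metadata, size_t n_slots)
{
    size_t last_end = 0;  /* one past the last J_END entry seen */
    int applied = 0;

    /* Pass 1: find where the last completed transaction ends. */
    for (size_t i = 0; i < n_entries; i++) {
        if (journal[i].kind == J_END)
            last_end = i + 1;
    }

    /* Pass 2: apply every write up to that point, in journal order. */
    for (size_t i = 0; i < last_end; i++) {
        if (journal[i].kind == J_WRITE && journal[i].offset < n_slots)
            metadata[journal[i].offset] = journal[i].value;
        else if (journal[i].kind == J_END)
            applied++;
    }
    return applied;
}
```

Writes belonging to a transaction that never reached its end marker are ignored, which is what keeps the recovered metadata consistent in this model.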
Faster HDF5 Data Appends
Fast Data Appends
• Problem: Metadata operations limit the rate at
which HDF5 can append data to datasets.
• Solution: new data structure for indexing chunks:
 Allows constant time extend, shrink and lookup of
chunks in datasets with single unlimited dimension
 # of metadata I/O operations to append to dataset
is independent of # of chunks
 Allows single-writer/multiple-reader access
• Details at:
http://www.hdfgroup.uiuc.edu/RFC/HDF5/SkipListChunkIndex/SkipListChunkIndex.html
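As a rough in-memory analogy for constant-time extend, shrink, and lookup along a single unlimited dimension, consider the sketch below. It is not the structure described in the RFC and says nothing about the on-disk layout; the `chunk_index` type and its functions are hypothetical, and the append is constant time only in the amortized sense because the array doubles its capacity when full.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical index for a dataset with one unlimited dimension:
 * chunk i along that dimension lives at file address addr[i]. */
typedef struct {
    uint64_t *addr;
    size_t count;     /* chunks currently in the dataset */
    size_t capacity;  /* allocated slots */
} chunk_index;

/* Append one chunk; amortized O(1) thanks to capacity doubling.
 * Returns 0 on success, -1 if memory runs out. */
int chunk_index_append(chunk_index *ix, uint64_t file_addr)
{
    if (ix->count == ix->capacity) {
        size_t new_cap = ix->capacity ? ix->capacity * 2 : 8;
        uint64_t *p = realloc(ix->addr, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        ix->addr = p;
        ix->capacity = new_cap;
    }
    ix->addr[ix->count++] = file_addr;
    return 0;
}

/* Drop the last chunk (shrink); O(1). */
void chunk_index_shrink(chunk_index *ix)
{
    if (ix->count > 0)
        ix->count--;
}

/* Look up a chunk's address by position; O(1).
 * Returns 0 if the position is out of range. */
uint64_t chunk_index_lookup(const chunk_index *ix, size_t i)
{
    return (i < ix->count) ? ix->addr[i] : 0;
}
```

The point of the analogy is that the cost of appending or finding a chunk does not grow with the number of chunks already present.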
netCDF-4
netCDF-4 Project
• Enhanced NetCDF-4 Interface to HDF5
 Combine features of netCDF and HDF5
 Take advantage of their separate strengths
• Collaboration between NCSA, THG, Unidata
• Currently in beta release
• Will be released after HDF5 1.8
NetCDF-4 Architecture
[Architecture diagram. Components: netCDF-3 applications, netCDF-4 applications, HDF5 applications, netCDF-3 interface, netCDF-4 library, HDF5 library, netCDF files, netCDF-4 HDF5 files, HDF5 files.]
• Supports access to netCDF files and HDF5
files created through netCDF-4 interface
HDF5 OPeNDAP Project
Project description
• Investigate integrated DAP-aware HDF5 library
that can provide seamless access to both
local and remote data
• A NASA ROSES NRA project
• See Kent Yang’s talk and poster
NOAA – Science Data Stewardship
NOAA – Science Data Stewardship
• Use HDF5 Archival Information Package (AIP) to
archive HDF-EOS2 data
• A collaboration between NSIDC and THG
• See Ruth Duerr and Kent Yang’s poster
HDF5 and .NET Framework
Why .NET?
• The Microsoft .NET framework is used by most
new applications created for Windows.
 Makes it easier to develop applications
 Reduces application vulnerability to security threats
• Supports development in multiple programming
languages, in particular C#.
• Increased level of interest in .NET from users of
HDF5.
HDF and .NET Status
• Received funding to implement prototype .NET
wrapper API for Windows XP
 Based on HDF5 C API
 Focus on C# binding
 Functionality limited to subset of API routines
• If funded, we would like to move beyond the
prototype to
 Create .NET wrappers for all HDF C functions
 Offer full support for .NET wrappers with HDF5 1.8
Bioinformatics
caacaagccaaaactcgtacaa
Cgagatatctcttggaaaaact
gctcacaatattgacgtacaag
gttgttcatgaaactttcggta
Acaatcgttgacattgcgacct
aatacagcccagcaagcagaat
Managing genomic data
Electron tomography
25-80Å resolution
4k x 4k x 500 images now
8k x 8k x 1k images soon (256 GB)
Sequencing
• Next Gen Sequencing platforms produce ~1500× more data than
CE (Sanger)
• A single Next Gen instrument can produce 20 times more data in a
single run than a day’s operation of a genome center with 100 CE
instruments
An email on Sept 21…
“… A little background, we're doing genetic
association studies, these result in large 2-d matrices
(40K x 1M before applying threshholds). Each of
the cells in this matrix has ~10 numerical
statistics (e.g. some sort of pvalue)… ”
40K x 1M x 10 x 4 = 1,600,000,000,000 (1.6 TB)
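The size estimate above can be checked with a one-line calculation. The sketch below uses a hypothetical helper, `matrix_bytes`, and reads the figures as decimal (40K = 40,000 and 1M = 1,000,000) with 4 bytes per statistic, which is how the slide's product appears to be formed; 64-bit arithmetic is needed because the result does not fit in 32 bits.

```c
#include <stdint.h>

/* Storage for a dense rows x cols matrix in which every cell holds
 * stats_per_cell values of bytes_per_stat bytes each. */
uint64_t matrix_bytes(uint64_t rows, uint64_t cols,
                      uint64_t stats_per_cell, uint64_t bytes_per_stat)
{
    return rows * cols * stats_per_cell * bytes_per_stat;
}
```

Calling `matrix_bytes(40000, 1000000, 10, 4)` reproduces the 1,600,000,000,000 bytes on the slide; storing each statistic as an 8-byte double instead would double that.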
Product Data
STEP
Product data
• HDF5 proposed to ISO as binary representation
for product data representation and exchange
• Would be a binary option to the STEP format
• ISO/NWI-CD 10303-026, STEP Part 26
SQL Server and HDF5
SQL Server and HDF5
• THG discussing possible project with Microsoft
• Microsoft envisions a dream environment for
scientists that would encompass both computing
and data management
• Possible SQL Server solution
 Combine RDBMS and scientific analysis tools in a
single integrated system
 Use HDF5 to manage scientific objects not handled
well by traditional database
HDF5 in SQL server
[Architecture diagram. Components: visualization, libraries (MATLAB, …), web services (XML, REST, RSS), OLAP and data mining, reporting, .NET languages with Language Integrated Query, Entity Framework (EDM, eSQL, O-R mapping), HDF5 EDM model, SQL Server, HDF5 TVFs, HDF5 index, HDF5 type, HDF5 files, HDF5 FS blob.]
Thank You All
and
Thank You NASA!
Acknowledgement
This report is based upon work supported in part by a
Cooperative Agreement with NASA under NASA
NNG05GC60A. Any opinions, findings, and conclusions
or recommendations expressed in this material are
those of the author(s) and do not necessarily reflect the
views of the National Aeronautics and Space
Administration.
Questions/comments?
Information Sources
• HDF website
http://hdfgroup.org/
• HDF5 Information Center
http://hdfgroup.org/HDF5/
• HDF Helpdesk
[email protected]
• HDF users mailing list
[email protected]
coming soon: [email protected]