sql-hive-meetup201210

Download Report

Transcript sql-hive-meetup201210

SQL (92 and Beyond) Support for
Hive
Jason Dai
Principal Engineer
Intel SSG (Software and Services Group)
Software and Services Group
What SQL support is needed?
More SQL-92 support for analytics
•
Complete SQL data type system
–
•
Full subquery support
–
–
•
Subquery in WHERE clauses, correlated subquery, scalar subquery, etc.
New expressions (EXISTS, ALL, ANY, etc.)
Complete Set operators
–
•
•
Data types (e.g., Datetime, fixed precision numbers), type conversion rules & function
(CAST), Datetime expressions and functions (e.g. extract, +/- interval), etc.
DISTINCT UNION, INTERSECT, EXCEPT, etc.
Multiple-table SELECT statement
Update/delete?
–
On HBase only?
(Almost) SQL-92 compliance?
•
How about transaction?
Software and Services Group
2
‹#›
What SQL support is needed (continued)?
Additional analytics support (beyond SQL-92)
•
Advanced OLAP functions for analysis & reporting
– E.g., rank, rollup, cube, window function (SQL 2003), etc.
•
Advanced SQL syntax
– E.g. WITH clause (SQL-99)
•
Procedural extensions
– E.g., Begin, End, If…Then...Else, Loop/Exit/Continue, etc.
Software and Services Group
3
‹#›
Workload Analysis
TPC-H
TPC-DS
Complex Subquery
Y
Y
Multiple-table SELECT
Y
Y
Set operators
SQL data types (especially
Datetime)
Y
Y
Y
Advanced OLAP functions (e.g., rank,
grouping and window functions)
Y
WITH clause (SQL-99)
Y
UPDATE/DELETE
Y
Software and Services Group
4
‹#›
Let’s Get Our Hands Dirty
Query
Parser
AST (Abstract
Syntax Tree)
Semantic Analyzer
(Optimizer)
Execution Plan
Execution
(Almost) SQL-compliant Hive parser
•
A lot of work: SQL much more complex than HiveQL
–
–
•
HiveQL grammar file: ~61KB with 2487 lines
SQL (with PL/SQL extensions) grammar file: ~524KB with 8583 lines
Also complex: many existing Hive grammar rules need to be changed
–
To support more complex SQL constructs (e.g., subquery)
UDF/UDAF/UDTF
•
For some operators (e.g., rank)
Software and Services Group
5
‹#›
Let’s Get Our Hands Dirty
Query
Parser
AST (Abstract
Syntax Tree)
Semantic Analyzer
(Optimizer)
Execution Plan
Execution
Analysis, transformation & optimization
•
•
•
•
•
•
SQL data type system
Subquery support (incl. subquery unnestting)
Multiple-table SELECT
Set operations
Advanced OLAP functions
…
Software and Services Group
6
‹#›
How to Leverage Existing Works?
Project Panthera:
Our open source efforts to enable better analytics capabilities on Hadoop/HBase
https://github.com/intel-hadoop/project-panthera
Query
(Open SQLSQL Source) AST
Driver
SQL
Parser*
SQL-AST Analyzer
& Translator
Subquery Multi-Table
SELECT
Unnesting
HiveAST
…
HiveQL
Hive
Parser
Hive Semantic
Analyzer
INTERSECT MINUS
Support
Support
Hadoop
MR
…
•
•
Hive-AST
*https://github.com/porcelli/plsql-parser
A SQL engine for Hive MapReduce
 Goal: full analytical SQL support for OLAP




Subquery in WHERE clause
Correlated subquery
Multiple-table SELECT statement
…
Software and Services Group
7
‹#›
How to Leverage Existing Works?
NextR Hive UDFs
•
•
https://github.com/nexr/hive-udf
UDFs for Oracle db extensions (rank, decode, nvl, etc.)
SQL windowing functions for Hive
•
https://github.com/hbutani/SQLWindowing
Software and Services Group
8
‹#›
Software and Services Group
‹#›
9