Relational Efficiencies: Part 1

Download Report

Transcript Relational Efficiencies: Part 1

Relational Efficiencies: Part I
Renee Teatro
Information Builders
Copyright 2007, Information Builders. Slide 1
Relational Efficiencies
Agenda
 Optimization Overview
 JOINs
 Sorting
 Aggregation
 Expressions
 Direct SQL Passthru
Relational Efficiencies
Layers of Processing
User executes a FOCUS request...
 FOCUS calls the appropriate module
(TABLE, GRAPH)
 Reads and parses the MFD
 Parses the request
 Calls the Interface with information on
FOCUS / WebFOCUS
TABLE
TABLEF
GRAPH
FOCUS
MATCH
FILE
FRL
FOCUS Reporting Modules
DEFINE
JOIN
ANALYSE
the MFD and the request available in
work areas
Data Adapter
 Reads and parses the AFD
 Checks the module specifics for
optimization
 Analyzes and optimizes the request
 Generates SELECT statement(s)
 Passes SELECT statements
to the appropriate ‘physical’ module
 Prepares, allocates, and opens cursor(s)
 Calls the RDBMS to FETCH rows of data
SQL Generation and Execution
(‘logical layer’ – GNTINT )
Report is displayed
 FOCUS reads a row from the answer set
and processes the remaining actions on
that row (IF/WHERE, DEFINEs…)
 Puts the valid row into the Internal Matrix
 Reads the next row and repeats the
process until the end of the answer set
 Processes the Internal Matrix and
displays the report
Data Adapter
SQL modules
(‘physical layer’)
Teradata
DBTFOC
DB2
SQL/DS
RRSET
Oracle
ORAFOC
 The Interface receives a row and/or an
SQL status code
 Converts non-standard data into FOCUS
format, making it available to FOCUS
DB2
SQL/DS
Teradata
Oracle
 Asks RDBMS for the next row (FETCH),
until the end of the answer set is reached
(SQL status code +100)
 Analyzes and optimizes the SQL
translation of the FOCUS application
request
 Chooses the appropriate access path
and retrieval method
 Retrieves data
 Creates the answer set
 Sends back a row and/or an SQL status
code to the Interface
DATA
Copyright 2007, Information Builders. Slide 3
Relational Efficiencies
RDBMS Optimization
Interface optimization is the degree to which a TABLE request
is translated to SQL
In other words, the process in which the interface translates
projection, selection, JOIN, sort, and aggregation operations
of a report request into its SQL equivalent and passes it to
the RDBMS for processing
 TABLE, MODIFY, MAINTAIN 
 Direct SQL Passthru
 Passing JOINs
SQL
Relational Efficiencies
The Optimization Command
TSO
CMS
SQL
DB2/SQLDS
SQLDBC
SQLORA
SQLMSS
SET
Components
Target Database Engine
OPTIMIZATION
SQLJOIN
OUTER
ON
OFF
SQL
FOCUS
*
Description
DB2 / SQLDS
SQLDBC
SQLORA
SQLMSS
for DB2
for Teradata
for Oracle
for SQL Server
Copyright 2007, Information Builders. Slide 5
Relational Efficiencies
Optimization Settings
In the SQL engine SET OPTIMIZATION ON is the default, so
the interface attempts to fully optimize all requests
 RDBMS is preferred engine for processing
 More RDBMS processing is usually beneficial
Types of optimization
 ON – default
 OFF – Lets FOCUS handle all JOINs, sorts, and
aggregations
 SQL – pass SQL even if multiplicative effect found
 FOCUS – pass SQL only if results are identical to
FOCUS processing
Relational Efficiencies
Data Adapter TRACE Facility
 SET TRACEON=component//destination
 Component
 SQLDI
- FSTRACE - All Interface-RDBMS activity
 SQLAGGR - FSTRACE3 - Optimization messages
 STMTRACE - FSTRACE4 - SQL only
 SQLCALL - commands and data exchange between

the physical and the logical layers of the data adapter
Destination
 FSTRACE - allocation for the ddname of FSTRACE
 CLIENT
- displays client session to the screen
Relational Efficiencies
XRETRIEVAL Option
SET XRETRIEVAL=[ON | OFF]
 ON – the interface sends the request to the RDBMS
and it processes the request
 OFF – the interface attempts to optimize the request,
but no RDBMS processing is done
Relational Efficiencies
Fully Optimized Query
SET TRACEOFF=ALL
SET TRACEUSER=CLIENT
SET TRACEON=STMTRACE//FSTRACE
TABLE FILE EMPLOYEE
COUNT EMP_ID BY DEPARTMENT
WHERE CURR_SAL GT 10000
END
STMTRACE:
SELECT DEPARTMENT, COUNT(*) FROM EMPLOYEE
WHERE (CURR_SAL > 10000)
GROUP BY DEPARTMENT ORDER BY DEPARTMENT;
Relational Efficiencies
Non-Optimized Query
DEFINE FILE EMPDB2
CATEGORY/A4 = IF CSAL LT 10000 THEN 'LOW'
ELSE 'HIGH';
END
TABLE FILE EMPDB2
SUM CSAL CATEGORY BY EID
END
SQLAGGR:
(FOC2590) AGGREGATION NOT DONE FOR THE FOLLOWING REASON:
(FOC2597) USE OF DEFINE FIELD THAT CANNOT BE AGGREGATED: CATEGORY
STMTRACE:
SELECT T1.EID,T1.CSAL
FROM "PMSSAE"."EMPINFO" T1
ORDER BY T1.EID FOR FETCH ONLY;
Relational Efficiencies
Now Optimized Query
SET
SET
SET
SET
TRACEOFF=ALL
TRACEUSER=CLIENT
TRACEON=SQLAGGR//CLIENT
TRACEON=STMTRACE//CLIENT
TABLE FILE EMPDB2
SUM CSAL COMPUTE CATEGORY/A4=IF CSAL LT 10000 THEN
'LOW' ELSE 'HIGH';
BY EID
END
SQLAGGR:
AGGREGATION DONE ...
STMTRACE:
SELECT T1.EID, SUM(T1.CSAL) FROM "PMSSAE"."EMPINFO" T1
GROUP BY T1.EID ORDER BY T1.EID FOR FETCH ONLY;
Copyright 2007, Information Builders. Slide 11
Projection and Selection
Copyright 2007, Information Builders. Slide 12
Relational Efficiencies
Projection
Projection is the retrieval of specific columns only
 Projection is always optimized
 Interface retrieves columns referenced in
 Print/sum/count commands
 Objects of JOINs and DEFINE statements
 PRINT * and SEG.fieldname
 Will return all columns in master file only
 SELECT * never produced
TABLE FILE EMPLOYEE
PRINT *
END
SELECT EID, LN, FN
FROM EMPLOYEE;
Relational Efficiencies
Projection
A master file can be considered a dynamic RDBMS view.
A master can contain:
 One or more columns of a relational table
 Multiple relational tables – called an embedded
MFD
 Real relational views
Main advantages of a master file
 SQL JOIN syntax hidden from user
 View not stored in RDBMS catalog
 Activation of only necessary tables (segments)
 Note: Not the case with dynamic JOIN
Relational Efficiencies
Selection: Translatable Screening
Type of Expressions
Expression Components
Examples
Arithmetic-Valued
Expressions
(Expressions that
return a single
number)
• Real fields of datatype I, P, D, or F
• Numeric constants
• Arithmetic operators (+, -, *, / )
• Aggregation operators (SUM.,
CNT., AVE., MIN., MAX.)
WHERE TOTAL (AVE.CSAL *
0.10) +AVE.CSAL
GT 55000;
Character StringValued Expressions
(Expressions that
return a character
string)
• Real fields of datatype A
• String constants
• Concatenation operator (I)
• EDIT of alphanumeric fields
WHERE EDIT(FN, ‘9.$’) |LN
EQ ‘J.WANG’ ;
• Real fields with any FOCUS
datatype
Logical Expressions
WHERE (CDIV EQ ‘CORP’ OR
•
Constants
of
consistent
datatype
(Expressions that
‘NE’) AND ((CSAL*
return a single value, • Relational operator
0.10) +CSAL GT 55000);
(EQ, NE, GT, LE ..)
True or False)
• Logical operators (AND, OR, NOT)
• Valued expression operands
NOTE
Screening conditions on DEFINEd fields, which calculate the above type of expressions, are
passed to the RDBMS.
Copyright 2007, Information Builders. Slide 15
Relational Efficiencies
Non-Translatable Screening Conditions
Example
Expressions using
User-written subroutines
DEFINE... FNL/I3 = ARGLEN(15,LN,FNL);
TABLE... IF FNL LE 6
IF–THEN–ELSE expressions
*** optimized
DEFINE... DIVISION/A11=IF CDIV EQ ‘CORP’ THEN ‘CORPORATE’ ELSE
IF CDIV EQ ‘NE’
THEN ‘NORTH-EAST’ ELSE ‘NA’;
TABLE... IF DIVISION EQ ‘CORPORATE’ OR ‘NORTH-EAST”
Self-referential expressions
DEFINE... CPT/I2=CPT+1;
TABLE... IF CPT NE 0
EDIT for field format conversions
WHERE EDIT(ID) GT 20
Strong concatenation (II)
DEFINE... NAME/A27=FN||(‘ ‘ | LN);
TABLE... IF NAME EQ ‘DANIEL VALINO’
DECODE function
Non-SQL relational operators
(INCLUDES, EXCLUDES)
DEFINE... DEVISION/A11=DECODE CDIV (‘CORP’ ‘CORPORATE’ ‘NE’
‘NORTH-EAST’ ELSE ‘NA’);
TABLE... IF DIVISION EQ ‘CORPORATE’ OR ‘NORTH-EAST’
IF LN INCLUDES ‘VALINO’
FOCUS subroutines
WHERE SQRT(CSAL) GT 260
(ABS, INT, MAX, MIN, LOG, SQRT)
Expressions using fields with
ACTUAL=DATE
DEFINE... HDAT2/YYMD=HDAT+365;
TABLE... IF HDAT2 GT ‘1990/03/01’
Copyright 2007, Information Builders. Slide 16
JOIN Processing
Copyright 2007, Information Builders. Slide 17
Relational Efficiencies
JOIN Optimization
 Interface attempts to generate ONE SELECT statement to
JOIN all tables
 Applies to dynamic or embedded JOINs
 One OPEN cursor operation
 JOIN optimized more readily by RDBMS
 An optimized JOIN enables sorts and aggregations to be
passed
 Limits interface <==> RDBMS communications
Relational Efficiencies
JOIN Optimization
 If JOIN is not passed to RDBMS
 Termed: FOCUS-managed JOIN
 One SELECT statement for EACH table
 FOCUS executes a nested loop JOIN
 Parent table (HOST) is the outer table
 One inner table OPEN for each row returned from the


outer table
 Outer (host) table – choose the one with fewest rows
returned
Sorts and aggregations are not passed
SQLAGGR/FSTRACE3 displays reason
Relational Efficiencies
Optimized JOIN vs. Non-Optimized JOIN
JOIN F1SSN IN TABLE1 TO F2SSN IN TABLE2
TABLE FILE TABLE1
PRINT F2SSN
END
Optimized JOIN:
SELECT T1.F1SSN,T2.F2SSN FROM "PMSNJC".TABLE1 T1,
PMSNJC.TABLE2 T2 WHERE (T2.F2SSN = T1.F1SSN)
FOR FETCH ONLY;
Non-Optimized JOIN:
(FOC2510) FOCUS-MANAGED JOIN SELECTED FOR FOLLOWING
REASON(S):
SELECT T1.F1SSN FROM "PMSNJC".TABLE1 T1 FOR FETCH ONLY;
SELECT T2.F2SSN FROM PMSNJC.TABLE2 T2 WHERE (T2.F2SSN = ?)
FOR FETCH ONLY;
Relational Efficiencies
Special JOINs
In earlier releases, these types of JOINs disabled optimization:
 Multiplicative effect encountered for aggregated requests
 Termed: Interface-managed native JOIN
 Check results, FOCUS managed may be more efficient
(SET OPTIMIZATION=OFF)
 Outer JOIN (SET ALL=ON)
 Missing cross-referenced rows are processed
 RDBMS specific syntax in SQL SELECT statement
 SQL sqlengine SET SQLJOIN OUTER OFF|ON
 SET ALL=PASS not supported
 WHERE field EQ ‘$*’ OR field IS-MISSING
 Create HOLD files/JOIN/SET ALL=PASS
 Heterogeneous JOIN
 Differing file types (e.g., flat file, IMS, etc.)
Relational Efficiencies
JOIN Considerations
When a JOIN is not passed to RDBMS, make sure:
 The KEYS= parameter is defined correctly
 The JOIN command (unique or non-unique) corresponds
to the AFD KEYS= parameter
Some other considerations:
 (Over) normalized vs. non-normalized data
 Ensure referenced tables on same retrieval path
 Consider use of indices
 If Interface optimization is disabled, consider choice of
parent table, use of HOLD files
 JOIN on same data type and length
Sort Processing
Copyright 2007, Information Builders. Slide 23
Relational Efficiencies
Sort Optimization
FOCUS BY/ACROSS translated to SQL ORDER BY
Translating sort phrases (BY/ACROSS) to SQL is important
 Relational sort is usually more efficient
 RDBMS uses indices
 Sort enables RDBMS to perform aggregation
 FOCUS retrieves the answer set in sorted order
 Reduced I/O since answer set is aggregated
Relational Efficiencies
Sort Optimization
FOCUS sort phrases are NOT translated to SQL and
Optimization is disabled when:
 Optimization was set OFF by user
 JOINs were not passed to RDBMS (and consequently
Optimization was disabled by the interface)
 A FOCUS sort phrase uses an FRL command:
BY field ROWS value1 OVER value2...
FOR field ROWS value1 OVER value2…
FOCUS sort phrases are not fully translated to SQL and
aggregation and optimization is automatically disabled when:
 FOCUS BY/ACROSS…IN-GROUPS-OF is requested
Relational Efficiencies
Sort Optimization
To get FOCUS Sort phrases translated to SQL
 Sort on real fields & use COMPUTEs instead of DEFINEs
 Sorts on most DEFINEd fields are now optimized
 Use SQLAGGR/STMTRACE to evaluate if DEFINE
fields are being translated. If not, reformulate if
possible
 With FST. and LST. ensure access file KEYS and
KEYORDER parameters are correct
Considerations
 Consider indexes on sort objects
 SET OPTIMIZATION OFF/TABLEF/External Sort
 Consider using TABLEF if sort is passed
Relational Efficiencies
Using TABLEF
Use TABLEF when all FOCUS sort phrases are translated to SQL
 Faster than TABLE
 Does not generate an internal matrix (FOCSORT)
 Eliminates FOCUS sorting
You cannot use TABLEF when FOCUS has to process some of the
sorting with
 ACROSS
 Direct operators requiring the FOCUS internal matrix
(TOT., PCT., or RPCT.)
 COMPUTE expressions using direct operators
 Multi-verb requests
 RETYPE
Note: Locks are held with TABLEF until report is complete (commit
issued)
Aggregation Processing
Copyright 2007, Information Builders. Slide 28
Relational Efficiencies
Efficient Aggregation
Aggregation translation is important
 RDBMS aggregation is more efficient: indices
 An aggregated answer set reduces FOCUS-to-RDBMS
communication
 A smaller answer set reduces FOCUS local processing
FOCUS
SQL
SUM ..., WRITE ... BY field
SELECT SUM(...)
GROUP BY column
ORDER BY column
SUM., CNT., MIN., MAX., AVE.
SUM(...), COUNT(*), MIN(...), MAX(...), AVG(...)
Relational Efficiencies
Translatable Aggregation
 Verbs: SUM, COUNT, WRITE
 Direct operators: MIN., MAX., AVE.
 Aggregating DEFINEd fields:
 Constant DEFINEd fields translated with CNT.
The following defined expressions can be translated
Type of expressions
Expression components
Arithmetic Valued
• Real fields of datatype I, P, D, or F
(Expressions that return • Numeric constants
• Arithmetic operators (+, -, *, / )
a single number)
Character String Valued • Real fields of datatype A
(Expressions that return • String constants
• Concatenation operator (I)
a character string)
• EDIT of alphanumeric fields
Examples
DEFINE FILE ORAEMP
NEW_SAL/D12.2=(CSAL * 0.10)
+ CSAL ;
END
DEFINE FILE ORAEMP
NAME/A18=EDIT(FN,‘9.$’)|LN;
END
Relational Efficiencies
Non-Translatable Aggregation
Aggregation is not translated to SQL and optimization is
automatically disabled when:
 Optimization was set off by user
 JOINs were not passed to RDBMS (and consequently
optimization was disabled by the interface)
 FOCUS sort phrase is not translated
 Some screening conditions not passed to RDBMS
 Some non-SQL operators are used
 Multi-verb requests
 COUNT with MISSING=ON
NOTE
If the verbs PRINT or LIST are used, no aggregation is requested and FSTRACE3 returns the
following message:
(FOC2590) AGGREGATION NOT DONE FOR THE FOLLOWING REASON:
(FOC2594) AGGREGATION IS NOT APPLICABLE TO THE VERB USED
Relational Efficiencies
Aggregation Considerations
 Possible index-only processing
 Possibly aggregate in RDBMS index
 Explicit or implicit (e.g., in heading/footing) FST. and LST.
can be optimized using MIN and MAX
 Aggregate on real fields
 Use COMPUTE in place of DEFINE
 Create aggregated extract files (HOLD files) in cases
where aggregation is not optimized
Virtual Field Processing
Copyright 2007, Information Builders. Slide 33
Relational Efficiencies
Virtual Field (DEFINE) Optimization
 DEFINE fields can be optimized as part of aggregation or
record selection
 Aggregation or record selection can optimize:
 Arithmetic-valued expressions
 Character string-valued expressions
 Logical expressions (selection only)
 Aggregation cannot be optimized for logical expressions
 Single segment DEFINEs passed when JOIN is not
 IF-THEN-ELSE DEFINEs capable of being passed
Relational Efficiencies
IF-THEN-ELSE DEFINE Example – Optimized
DEFINE FILE EMPINFO
SAL_FLAG = IF (CURRENT_SALARY LT 10000) AND
(DEPARTMENT_CD EQ 'MIS') THEN 1 ELSE 0;
END
TABLE FILE EMPINFO
PRINT EMP_ID LAST_NAME FIRST_NAME
IF SAL_FLAG EQ 1
END
STMTRACE:
SELECT T1.EID,T1.LN,T1.FN,
T1.DEPARTMENT_CD,T1.CURRENT_SALARY
FROM "USER1"."EMPINFO" T1
WHERE ((((T1.CURRENT_SALARY < 10000)
AND (T1.DEPARTMENT_CD = 'MIS')))) FOR FETCH ONLY;
Relational Efficiencies
Aggregation DEFINE Example – Optimized
DEFINE FILE EMPDB2
CATEGORY/A4 = IF CSAL LT 10000 THEN 'LOW' ELSE 'HIGH';
CATEGORY1/I4 = IF CSAL LT 10000 THEN 0 ELSE 1 ;
CATEGORY2/D10 = CSAL * 1.3;
END
TABLE FILE EMPDB2
SUM CSAL CATEGORY2
BY EID
END
STMTRACE:
SELECT T1.EID, SUM(T1.CSAL), SUM((T1.CSAL * 1.3))
FROM "PMSSAE"."EMPINFO" T1
GROUP BY T1.EID ORDER BY T1.EID FOR FETCH ONLY;
Relational Efficiencies
Sort Expression Example – Optimized
Aggregation by Expression
 Allows named expression to be used in ORDER BY clause
DEFINE FILE DB2FILE
TAX = 0.08 * PRICE
END
TABLE FILE DB2FILE
SUM PRICE TAX
BY TAX NOPRINT
AGGREGATION DONE ...
SELECT SK001, SUM(VB001), SUM(VB002) FROM
(SELECT (.08 * T1.PRICE) AS SK001,T1.PRICE AS VB001,(.08 * T1.PRICE) AS VB002
FROM USER.DB2FILE T1 ) X GROUP BY SK001 ORDER BY SK001
FOR FETCH ONLY;
In the past: (FOC2597) USE OF DEFINED FIELD THAT CANNOT BE AGGREGATED :
TAX
Copyright 2007, Information Builders. Slide 37
Relational Efficiencies
Direct SQL Passthru
Best of both worlds
 If the most efficient SQL is not generated or
 Optimized SQL code already exists
 SQL sqlengine {any valid SQL statement} END
SET SQLENGINE=SQLORA
SQL PREPARE result FOR
SELECT * FROM DQAORA01;
TABLE FILE result
PRINT F1SSN
ON TABLE HOLD AS HOLD1
END
TABLE FILE HOLD1
PRINT F1SSN
END
SQL DB2 SELECT
C.CLIENT_ID,J.CLIENT_ID,
C.CASE_NO,J.REST
FROM CLIENT C,
CLIENTJ J
WHERE C.CLIENT_ID=J.CLIENT_ID;
TABLE FILE SQLOUT
PRINT *
ON TABLE HOLD
END
Copyright 2007, Information Builders. Slide 38
Copyright 2007, Information Builders. Slide 39