Relational Efficiencies: Part 1
Download
Report
Transcript Relational Efficiencies: Part 1
Relational Efficiencies: Part I
Renee Teatro
Information Builders
Copyright 2007, Information Builders. Slide 1
Relational Efficiencies
Agenda
Optimization Overview
JOINs
Sorting
Aggregation
Expressions
Direct SQL Passthru
Relational Efficiencies
Layers of Processing
User executes a FOCUS request...
FOCUS calls the appropriate module
(TABLE, GRAPH)
Reads and parses the MFD
Parses the request
Calls the Interface with information on
FOCUS / WebFOCUS
TABLE
TABLEF
GRAPH
FOCUS
MATCH
FILE
FRL
FOCUS Reporting Modules
DEFINE
JOIN
ANALYSE
the MFD and the request available in
work areas
Data Adapter
Reads and parses the AFD
Checks the module specifics for
optimization
Analyzes and optimizes the request
Generates SELECT statement(s)
Passes SELECT statements
to the appropriate ‘physical’ module
Prepares, allocates, and opens cursor(s)
Calls the RDBMS to FETCH rows of data
SQL Generation and Execution
(‘logical layer’ – GNTINT )
Report is displayed
FOCUS reads a row from the answer set
and processes the remaining actions on
that row (IF/WHERE, DEFINEs…)
Puts the valid row into the Internal Matrix
Reads the next row and repeats the
process until the end of the answer set
Processes the Internal Matrix and
displays the report
Data Adapter
SQL modules
(‘physical layer’)
Teradata
DBTFOC
DB2
SQL/DS
RRSET
Oracle
ORAFOC
The Interface receives a row and/or an
SQL status code
Converts non-standard data into FOCUS
format, making it available to FOCUS
DB2
SQL/DS
Teradata
Oracle
Asks RDBMS for the next row (FETCH),
until the end of the answer set is reached
(SQL status code +100)
Analyzes and optimizes the SQL
translation of the FOCUS application
request
Chooses the appropriate access path
and retrieval method
Retrieves data
Creates the answer set
Sends back a row and/or an SQL status
code to the Interface
DATA
Copyright 2007, Information Builders. Slide 3
Relational Efficiencies
RDBMS Optimization
Interface optimization is the degree to which a TABLE request
is translated to SQL
In other words, the process in which the interface translates
projection, selection, JOIN, sort, and aggregation operations
of a report request into its SQL equivalent and passes it to
the RDBMS for processing
TABLE, MODIFY, MAINTAIN
Direct SQL Passthru
Passing JOINs
SQL
Relational Efficiencies
The Optimization Command
TSO
CMS
SQL
DB2/SQLDS
SQLDBC
SQLORA
SQLMSS
SET
Components
Target Database Engine
OPTIMIZATION
SQLJOIN
OUTER
ON
OFF
SQL
FOCUS
*
Description
DB2 / SQLDS
SQLDBC
SQLORA
SQLMSS
for DB2
for Teradata
for Oracle
for SQL Server
Copyright 2007, Information Builders. Slide 5
Relational Efficiencies
Optimization Settings
In the SQL engine SET OPTIMIZATION ON is the default, so
the interface attempts to fully optimize all requests
RDBMS is preferred engine for processing
More RDBMS processing is usually beneficial
Types of optimization
ON – default
OFF – Lets FOCUS handle all JOINs, sorts, and
aggregations
SQL – pass SQL even if multiplicative effect found
FOCUS – pass SQL only if results are identical to
FOCUS processing
Relational Efficiencies
Data Adapter TRACE Facility
SET TRACEON=component//destination
Component
SQLDI
- FSTRACE - All Interface-RDBMS activity
SQLAGGR - FSTRACE3 - Optimization messages
STMTRACE - FSTRACE4 - SQL only
SQLCALL - commands and data exchange between
the physical and the logical layers of the data adapter
Destination
FSTRACE - allocation for the ddname of FSTRACE
CLIENT
- displays client session to the screen
Relational Efficiencies
XRETRIEVAL Option
SET XRETRIEVAL=[ON | OFF]
ON – the interface sends the request to the RDBMS
and it processes the request
OFF – the interface attempts to optimize the request,
but no RDBMS processing is done
Relational Efficiencies
Fully Optimized Query
SET TRACEOFF=ALL
SET TRACEUSER=CLIENT
SET TRACEON=STMTRACE//FSTRACE
TABLE FILE EMPLOYEE
COUNT EMP_ID BY DEPARTMENT
WHERE CURR_SAL GT 10000
END
STMTRACE:
SELECT DEPARTMENT, COUNT(*) FROM EMPLOYEE
WHERE (CURR_SAL > 10000)
GROUP BY DEPARTMENT ORDER BY DEPARTMENT;
Relational Efficiencies
Non-Optimized Query
DEFINE FILE EMPDB2
CATEGORY/A4 = IF CSAL LT 10000 THEN 'LOW'
ELSE 'HIGH';
END
TABLE FILE EMPDB2
SUM CSAL CATEGORY BY EID
END
SQLAGGR:
(FOC2590) AGGREGATION NOT DONE FOR THE FOLLOWING REASON:
(FOC2597) USE OF DEFINE FIELD THAT CANNOT BE AGGREGATED: CATEGORY
STMTRACE:
SELECT T1.EID,T1.CSAL
FROM "PMSSAE"."EMPINFO" T1
ORDER BY T1.EID FOR FETCH ONLY;
Relational Efficiencies
Now Optimized Query
SET
SET
SET
SET
TRACEOFF=ALL
TRACEUSER=CLIENT
TRACEON=SQLAGGR//CLIENT
TRACEON=STMTRACE//CLIENT
TABLE FILE EMPDB2
SUM CSAL COMPUTE CATEGORY/A4=IF CSAL LT 10000 THEN
'LOW' ELSE 'HIGH';
BY EID
END
SQLAGGR:
AGGREGATION DONE ...
STMTRACE:
SELECT T1.EID, SUM(T1.CSAL) FROM "PMSSAE"."EMPINFO" T1
GROUP BY T1.EID ORDER BY T1.EID FOR FETCH ONLY;
Copyright 2007, Information Builders. Slide 11
Projection and Selection
Copyright 2007, Information Builders. Slide 12
Relational Efficiencies
Projection
Projection is the retrieval of specific columns only
Projection is always optimized
Interface retrieves columns referenced in
Print/sum/count commands
Objects of JOINs and DEFINE statements
PRINT * and SEG.fieldname
Will return all columns in master file only
SELECT * never produced
TABLE FILE EMPLOYEE
PRINT *
END
SELECT EID, LN, FN
FROM EMPLOYEE;
Relational Efficiencies
Projection
A master file can be considered a dynamic RDBMS view.
A master can contain:
One or more columns of a relational table
Multiple relational tables – called an embedded
MFD
Real relational views
Main advantages of a master file
SQL JOIN syntax hidden from user
View not stored in RDBMS catalog
Activation of only necessary tables (segments)
Note: Not the case with dynamic JOIN
Relational Efficiencies
Selection: Translatable Screening
Type of Expressions
Expression Components
Examples
Arithmetic-Valued
Expressions
(Expressions that
return a single
number)
• Real fields of datatype I, P, D, or F
• Numeric constants
• Arithmetic operators (+, -, *, / )
• Aggregation operators (SUM.,
CNT., AVE., MIN., MAX.)
WHERE TOTAL (AVE.CSAL *
0.10) +AVE.CSAL
GT 55000;
Character StringValued Expressions
(Expressions that
return a character
string)
• Real fields of datatype A
• String constants
• Concatenation operator (I)
• EDIT of alphanumeric fields
WHERE EDIT(FN, ‘9.$’) |LN
EQ ‘J.WANG’ ;
• Real fields with any FOCUS
datatype
Logical Expressions
WHERE (CDIV EQ ‘CORP’ OR
•
Constants
of
consistent
datatype
(Expressions that
‘NE’) AND ((CSAL*
return a single value, • Relational operator
0.10) +CSAL GT 55000);
(EQ, NE, GT, LE ..)
True or False)
• Logical operators (AND, OR, NOT)
• Valued expression operands
NOTE
Screening conditions on DEFINEd fields, which calculate the above type of expressions, are
passed to the RDBMS.
Copyright 2007, Information Builders. Slide 15
Relational Efficiencies
Non-Translatable Screening Conditions
Example
Expressions using
User-written subroutines
DEFINE... FNL/I3 = ARGLEN(15,LN,FNL);
TABLE... IF FNL LE 6
IF–THEN–ELSE expressions
*** optimized
DEFINE... DIVISION/A11=IF CDIV EQ ‘CORP’ THEN ‘CORPORATE’ ELSE
IF CDIV EQ ‘NE’
THEN ‘NORTH-EAST’ ELSE ‘NA’;
TABLE... IF DIVISION EQ ‘CORPORATE’ OR ‘NORTH-EAST”
Self-referential expressions
DEFINE... CPT/I2=CPT+1;
TABLE... IF CPT NE 0
EDIT for field format conversions
WHERE EDIT(ID) GT 20
Strong concatenation (II)
DEFINE... NAME/A27=FN||(‘ ‘ | LN);
TABLE... IF NAME EQ ‘DANIEL VALINO’
DECODE function
Non-SQL relational operators
(INCLUDES, EXCLUDES)
DEFINE... DEVISION/A11=DECODE CDIV (‘CORP’ ‘CORPORATE’ ‘NE’
‘NORTH-EAST’ ELSE ‘NA’);
TABLE... IF DIVISION EQ ‘CORPORATE’ OR ‘NORTH-EAST’
IF LN INCLUDES ‘VALINO’
FOCUS subroutines
WHERE SQRT(CSAL) GT 260
(ABS, INT, MAX, MIN, LOG, SQRT)
Expressions using fields with
ACTUAL=DATE
DEFINE... HDAT2/YYMD=HDAT+365;
TABLE... IF HDAT2 GT ‘1990/03/01’
Copyright 2007, Information Builders. Slide 16
JOIN Processing
Copyright 2007, Information Builders. Slide 17
Relational Efficiencies
JOIN Optimization
Interface attempts to generate ONE SELECT statement to
JOIN all tables
Applies to dynamic or embedded JOINs
One OPEN cursor operation
JOIN optimized more readily by RDBMS
An optimized JOIN enables sorts and aggregations to be
passed
Limits interface <==> RDBMS communications
Relational Efficiencies
JOIN Optimization
If JOIN is not passed to RDBMS
Termed: FOCUS-managed JOIN
One SELECT statement for EACH table
FOCUS executes a nested loop JOIN
Parent table (HOST) is the outer table
One inner table OPEN for each row returned from the
outer table
Outer (host) table – choose the one with fewest rows
returned
Sorts and aggregations are not passed
SQLAGGR/FSTRACE3 displays reason
Relational Efficiencies
Optimized JOIN vs. Non-Optimized JOIN
JOIN F1SSN IN TABLE1 TO F2SSN IN TABLE2
TABLE FILE TABLE1
PRINT F2SSN
END
Optimized JOIN:
SELECT T1.F1SSN,T2.F2SSN FROM "PMSNJC".TABLE1 T1,
PMSNJC.TABLE2 T2 WHERE (T2.F2SSN = T1.F1SSN)
FOR FETCH ONLY;
Non-Optimized JOIN:
(FOC2510) FOCUS-MANAGED JOIN SELECTED FOR FOLLOWING
REASON(S):
SELECT T1.F1SSN FROM "PMSNJC".TABLE1 T1 FOR FETCH ONLY;
SELECT T2.F2SSN FROM PMSNJC.TABLE2 T2 WHERE (T2.F2SSN = ?)
FOR FETCH ONLY;
Relational Efficiencies
Special JOINs
In earlier releases, these types of JOINs disabled optimization:
Multiplicative effect encountered for aggregated requests
Termed: Interface-managed native JOIN
Check results, FOCUS managed may be more efficient
(SET OPTIMIZATION=OFF)
Outer JOIN (SET ALL=ON)
Missing cross-referenced rows are processed
RDBMS specific syntax in SQL SELECT statement
SQL sqlengine SET SQLJOIN OUTER OFF|ON
SET ALL=PASS not supported
WHERE field EQ ‘$*’ OR field IS-MISSING
Create HOLD files/JOIN/SET ALL=PASS
Heterogeneous JOIN
Differing file types (e.g., flat file, IMS, etc.)
Relational Efficiencies
JOIN Considerations
When a JOIN is not passed to RDBMS, make sure:
The KEYS= parameter is defined correctly
The JOIN command (unique or non-unique) corresponds
to the AFD KEYS= parameter
Some other considerations:
(Over) normalized vs. non-normalized data
Ensure referenced tables on same retrieval path
Consider use of indices
If Interface optimization is disabled, consider choice of
parent table, use of HOLD files
JOIN on same data type and length
Sort Processing
Copyright 2007, Information Builders. Slide 23
Relational Efficiencies
Sort Optimization
FOCUS BY/ACROSS translated to SQL ORDER BY
Translating sort phrases (BY/ACROSS) to SQL is important
Relational sort is usually more efficient
RDBMS uses indices
Sort enables RDBMS to perform aggregation
FOCUS retrieves the answer set in sorted order
Reduced I/O since answer set is aggregated
Relational Efficiencies
Sort Optimization
FOCUS sort phrases are NOT translated to SQL and
Optimization is disabled when:
Optimization was set OFF by user
JOINs were not passed to RDBMS (and consequently
Optimization was disabled by the interface)
A FOCUS sort phrase uses an FRL command:
BY field ROWS value1 OVER value2...
FOR field ROWS value1 OVER value2…
FOCUS sort phrases are not fully translated to SQL and
aggregation and optimization is automatically disabled when:
FOCUS BY/ACROSS…IN-GROUPS-OF is requested
Relational Efficiencies
Sort Optimization
To get FOCUS Sort phrases translated to SQL
Sort on real fields & use COMPUTEs instead of DEFINEs
Sorts on most DEFINEd fields are now optimized
Use SQLAGGR/STMTRACE to evaluate if DEFINE
fields are being translated. If not, reformulate if
possible
With FST. and LST. ensure access file KEYS and
KEYORDER parameters are correct
Considerations
Consider indexes on sort objects
SET OPTIMIZATION OFF/TABLEF/External Sort
Consider using TABLEF if sort is passed
Relational Efficiencies
Using TABLEF
Use TABLEF when all FOCUS sort phrases are translated to SQL
Faster than TABLE
Does not generate an internal matrix (FOCSORT)
Eliminates FOCUS sorting
You cannot use TABLEF when FOCUS has to process some of the
sorting with
ACROSS
Direct operators requiring the FOCUS internal matrix
(TOT., PCT., or RPCT.)
COMPUTE expressions using direct operators
Multi-verb requests
RETYPE
Note: Locks are held with TABLEF until report is complete (commit
issued)
Aggregation Processing
Copyright 2007, Information Builders. Slide 28
Relational Efficiencies
Efficient Aggregation
Aggregation translation is important
RDBMS aggregation is more efficient: indices
An aggregated answer set reduces FOCUS-to-RDBMS
communication
A smaller answer set reduces FOCUS local processing
FOCUS
SQL
SUM ..., WRITE ... BY field
SELECT SUM(...)
GROUP BY column
ORDER BY column
SUM., CNT., MIN., MAX., AVE.
SUM(...), COUNT(*), MIN(...), MAX(...), AVG(...)
Relational Efficiencies
Translatable Aggregation
Verbs: SUM, COUNT, WRITE
Direct operators: MIN., MAX., AVE.
Aggregating DEFINEd fields:
Constant DEFINEd fields translated with CNT.
The following defined expressions can be translated
Type of expressions
Expression components
Arithmetic Valued
• Real fields of datatype I, P, D, or F
(Expressions that return • Numeric constants
• Arithmetic operators (+, -, *, / )
a single number)
Character String Valued • Real fields of datatype A
(Expressions that return • String constants
• Concatenation operator (I)
a character string)
• EDIT of alphanumeric fields
Examples
DEFINE FILE ORAEMP
NEW_SAL/D12.2=(CSAL * 0.10)
+ CSAL ;
END
DEFINE FILE ORAEMP
NAME/A18=EDIT(FN,‘9.$’)|LN;
END
Relational Efficiencies
Non-Translatable Aggregation
Aggregation is not translated to SQL and optimization is
automatically disabled when:
Optimization was set off by user
JOINs were not passed to RDBMS (and consequently
optimization was disabled by the interface)
FOCUS sort phrase is not translated
Some screening conditions not passed to RDBMS
Some non-SQL operators are used
Multi-verb requests
COUNT with MISSING=ON
NOTE
If the verbs PRINT or LIST are used, no aggregation is requested and FSTRACE3 returns the
following message:
(FOC2590) AGGREGATION NOT DONE FOR THE FOLLOWING REASON:
(FOC2594) AGGREGATION IS NOT APPLICABLE TO THE VERB USED
Relational Efficiencies
Aggregation Considerations
Possible index-only processing
Possibly aggregate in RDBMS index
Explicit or implicit (e.g., in heading/footing) FST. and LST.
can be optimized using MIN and MAX
Aggregate on real fields
Use COMPUTE in place of DEFINE
Create aggregated extract files (HOLD files) in cases
where aggregation is not optimized
Virtual Field Processing
Copyright 2007, Information Builders. Slide 33
Relational Efficiencies
Virtual Field (DEFINE) Optimization
DEFINE fields can be optimized as part of aggregation or
record selection
Aggregation or record selection can optimize:
Arithmetic-valued expressions
Character string-valued expressions
Logical expressions (selection only)
Aggregation cannot be optimized for logical expressions
Single segment DEFINEs passed when JOIN is not
IF-THEN-ELSE DEFINEs capable of being passed
Relational Efficiencies
IF-THEN-ELSE DEFINE Example – Optimized
DEFINE FILE EMPINFO
SAL_FLAG = IF (CURRENT_SALARY LT 10000) AND
(DEPARTMENT_CD EQ 'MIS') THEN 1 ELSE 0;
END
TABLE FILE EMPINFO
PRINT EMP_ID LAST_NAME FIRST_NAME
IF SAL_FLAG EQ 1
END
STMTRACE:
SELECT T1.EID,T1.LN,T1.FN,
T1.DEPARTMENT_CD,T1.CURRENT_SALARY
FROM "USER1"."EMPINFO" T1
WHERE ((((T1.CURRENT_SALARY < 10000)
AND (T1.DEPARTMENT_CD = 'MIS')))) FOR FETCH ONLY;
Relational Efficiencies
Aggregation DEFINE Example – Optimized
DEFINE FILE EMPDB2
CATEGORY/A4 = IF CSAL LT 10000 THEN 'LOW' ELSE 'HIGH';
CATEGORY1/I4 = IF CSAL LT 10000 THEN 0 ELSE 1 ;
CATEGORY2/D10 = CSAL * 1.3;
END
TABLE FILE EMPDB2
SUM CSAL CATEGORY2
BY EID
END
STMTRACE:
SELECT T1.EID, SUM(T1.CSAL), SUM((T1.CSAL * 1.3))
FROM "PMSSAE"."EMPINFO" T1
GROUP BY T1.EID ORDER BY T1.EID FOR FETCH ONLY;
Relational Efficiencies
Sort Expression Example – Optimized
Aggregation by Expression
Allows named expression to be used in ORDER BY clause
DEFINE FILE DB2FILE
TAX = 0.08 * PRICE
END
TABLE FILE DB2FILE
SUM PRICE TAX
BY TAX NOPRINT
AGGREGATION DONE ...
SELECT SK001, SUM(VB001), SUM(VB002) FROM
(SELECT (.08 * T1.PRICE) AS SK001,T1.PRICE AS VB001,(.08 * T1.PRICE) AS VB002
FROM USER.DB2FILE T1 ) X GROUP BY SK001 ORDER BY SK001
FOR FETCH ONLY;
In the past: (FOC2597) USE OF DEFINED FIELD THAT CANNOT BE AGGREGATED :
TAX
Copyright 2007, Information Builders. Slide 37
Relational Efficiencies
Direct SQL Passthru
Best of both worlds
If the most efficient SQL is not generated or
Optimized SQL code already exists
SQL sqlengine {any valid SQL statement} END
SET SQLENGINE=SQLORA
SQL PREPARE result FOR
SELECT * FROM DQAORA01;
TABLE FILE result
PRINT F1SSN
ON TABLE HOLD AS HOLD1
END
TABLE FILE HOLD1
PRINT F1SSN
END
SQL DB2 SELECT
C.CLIENT_ID,J.CLIENT_ID,
C.CASE_NO,J.REST
FROM CLIENT C,
CLIENTJ J
WHERE C.CLIENT_ID=J.CLIENT_ID;
TABLE FILE SQLOUT
PRINT *
ON TABLE HOLD
END
Copyright 2007, Information Builders. Slide 38
Copyright 2007, Information Builders. Slide 39