Advanced PowerCenter
V 7.1.1
by Infocrest
Copyright ©2004
Table of Contents
Introduction and Objectives
Mapping Techniques
Local Variables
Expression Transformation
Aggregator
Aggregate Functions
De-Normalizing
Lookup Transformation
Dynamic Lookups
Transaction Control
Transaction Based Processing
Joiner
Stored Procedure
Debugger Tips
Mapping Error Handling
Duplicate Keys
Audits
Workflow Manager
Design
Sessions & Tasks
Worklets
Workflows
Triggers and Schedulers
Testing
Session and Server Variables
Parameter Files
Incremental Loads
Partitioning
Architecture Review
Guidelines and Definitions
Partition Methods
Cache Partitioning
Limitations
Demo
Performance & Tuning
Tuning 101
Target Bottlenecks
Source Bottlenecks
Mapping Bottlenecks
Session Bottlenecks
System Bottlenecks
Using Performance Counter
Command Line Utilities
pmcmd
pmrep
pmrepagent
Repository MX Views
Viewing Dependencies
Performance Historical Data
Documentation
Introduction
Founded in 1993
Leader in Enterprise Data Integration
Platform
More than 1500 customers
Global presence
High growth
Public since April 1999
Symbol: INFA
Training & Knowledge Transfer:
Creator of the Informatica Boot Camp
INFA Project reviews
PM/PC Implementation methodologies
Founded in 1997 in Los Angeles, CA
Principal: Jean Abi-Jaoudé
Informatica Partner since 1997
Fortune 100 clients (GE, Qualcomm, Disney ..)
Class Objectives








Advanced Mapping Techniques
Mapping Error Handling
Workflow Techniques
Parameter Files
Partitioning
Performance and Tuning
Command Line Utilities
MX Views
Mapping Techniques
Transformation Objects

• Expression Transformation
  - Using local variables, forward and self references
  - Using an expression with a sequence generator to achieve very large primary key sequences
• Aggregator Transformation
  - Sorted input
  - De-normalizer techniques
  - Aggregate functions
• Lookup Transformation
  - Lookup Caches
  - SQL overrides
  - Dynamic lookups
• Transaction Control
  - Dynamic commits
  - Transaction Based Processing
• Joiner Transformation
  - Using sorted input
  - Self-joins
• Stored Procedures
  - Pre and post load procedures
  - Calling unconnected stored procedures
Local Variables

• Self Reference
  - A local variable can refer to itself, as in: keep the current value of v_EmployeeID IF v_IsDupe is true
• Forward Reference
  - A local variable can refer to a variable that is further down the port list
  - v_CustomerIDLagged takes the value of v_CustomerID before v_CustomerID is evaluated
  - Using this method, you can compare data from 2 or more consecutive rows
• Un-initialized Values
  - Strings: empty (not a NULL value!)
  - Numbers: 0
  - Dates: 01/01/1753
• Transformations
  - Local variables are available in Expression, Aggregator and Rank
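A minimal port-layout sketch of both techniques (hypothetical port names; ports are evaluated top to bottom):

  in_CustomerID       (input)
  in_EmployeeID       (input)
  v_CustomerIDLagged  (variable) = v_CustomerID                    -- forward reference: still holds the previous row's value
  v_CustomerID        (variable) = in_CustomerID                   -- now takes the current row's value
  v_IsDupe            (variable) = (v_CustomerID = v_CustomerIDLagged)
  v_EmployeeID        (variable) = IIF(v_IsDupe, v_EmployeeID, in_EmployeeID)   -- self reference: keep the current value when the row is a duplicate
  o_IsDupe            (output)   = v_IsDupe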
Expression Transformation

• Lagger Technique
  - When you need to make a decision or set a value based on the contents of 2 or more consecutive rows
  - Expression holds one or more field values in memory for one or more consecutive rows
  - Makes use of the top to bottom order of evaluation for variable ports
  - One input port, two variable ports and one output port needed for each lagged field
  - Add one variable port for each additional row you wish to hold in memory
  - Use a variable forward reference to assign v_ShipCountry to v_ShipCountry_Lagged before v_ShipCountry's value is changed
• Usage Example
  - One row holds a start time, the next row holds an end time; hold the first row in memory to compute the elapsed time
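A sketch of the lagger port layout for one lagged field (hypothetical names; ports evaluated top to bottom):

  in_ShipCountry        (input)
  v_ShipCountry_Lagged  (variable) = v_ShipCountry        -- forward reference: still holds the previous row's value
  v_ShipCountry         (variable) = in_ShipCountry       -- overwritten with the current row's value
  o_ShipCountry_Lagged  (output)   = v_ShipCountry_Lagged -- previous row's ShipCountry (un-initialized, i.e. empty, on the first row)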
Expression Transformation

• Lagger Mapping Needs
  - Dummy 'pusher' row to push the last row of data out of the lagger expression
    - Flat files: add an extra row using a pre-session command task
    - Relational: add a UNION ALL statement in a SQL override
  - Filter or Router group to remove the first row coming out of the lagger expression
    - This row holds empty (un-initialized) values from the expression
• Mapping diagram callouts
  - Source Qualifier with UNION ALL SQL override
  - Lagger expression to detect duplicates; has a flag with 3 values: dupe, not dupe, first row
  - Router to direct the flow to the dupe and non-dupe targets; the Router rejects the first row based on the flag value
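A sketch of the pusher-row override (hypothetical Orders source; Oracle-style DUAL shown, adjust for your database, and make sure the dummy row arrives last):

  SELECT OrderID, ShipCountry FROM Orders
  UNION ALL
  SELECT NULL, NULL FROM DUAL   -- dummy 'pusher' row that flushes the last real row out of the lagger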
Expression Transformation

• Generating Very Large Surrogate Key Values
  - The Sequence Generator is limited to 2,147,483,647 (2 billion +)
  - Use an expression to increment a multiplier when the sequence reaches a given threshold
  - Use a mapping variable to store the multiplier value in the repository after each run
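A sketch of the key-composition expression (hypothetical ports; $$KeyMultiplier is a mapping variable holding the current multiplier, bumped with a variable function when the threshold is reached and saved to the repository after each run):

  o_SurrogateKey (output) = ($$KeyMultiplier * 2000000000) + in_NEXTVAL   -- in_NEXTVAL comes from the Sequence Generator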
Aggregator

• Sorted Input
  - Use it whenever you can: it improves pipeline performance
    - Only the rows needed for one group are kept in memory
    - All aggregations are done using RAM, not disk cache
  - You need to pre-sort the data using either a SQL override in your source qualifier, a Sorter Transformation or pre-sorted flat files
  - You need to sort the data in the order of the 'group by' ports in your aggregator
  - The Informatica Server will fail the session if the data is not sorted in strict ascending or descending order
  - The Informatica Server will revert to unsorted input behavior when:
    - There is no 'group by' port in your aggregator
    - You select 'Incremental Aggregation' in the session properties
    - You have nested aggregate functions in your aggregator's expressions
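For example, if the Aggregator groups by CustomerID and OrderDate (hypothetical ports), the source qualifier override must sort on those same ports, in the same order:

  SELECT CustomerID, OrderDate, Amount
  FROM   Orders
  ORDER BY CustomerID, OrderDate   -- must match the order of the 'group by' ports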
Aggregator

• Why Sorted Input Improves Performance
  - The Informatica Server processes data in buffered stages
  - Read, Transform and Write stages normally operate simultaneously, feeding each other's buffers
  - An unsorted Aggregator is a bottleneck, as the write stage must wait until all aggregation is done before processing any row
  - (Diagram: with unsorted input the Write stage starts only after the Transform stage finishes; with sorted input Read, Transform and Write overlap, shortening the elapsed time)
Aggregate Functions

• Nesting functions
  - You can nest aggregate functions as in MAX(SUM(order_amount))
  - The Informatica Server ignores 'sorted input' when you nest aggregate functions
  - You cannot mix nested and non-nested functions in the same aggregator
• Conditional clause
  - Use instead of an IIF statement for aggregate expressions
  - Syntax is: FUNCT(input, [Boolean expression])
  - Example: AVG(price, price > 2000) computes the average price value when price is above 2000
  - The input will be taken into account for the aggregation if and only if the Boolean expression returns TRUE
• Availability
  - With output ports only; no variable port can have an aggregate expression
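Two more conditional-clause sketches (hypothetical ports):

  COUNT(OrderID, ShipCountry = 'France')   -- counts only the rows where the condition is TRUE
  SUM(Amount, NOT ISNULL(ShipDate))        -- sums only the orders that have shipped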
De-Normalizing

• Definition
  - Pivot the data to arrange multiple fields in a single row rather than one field in multiple rows

    BookID  Language            BookID  EnglishFlag  FrenchFlag  SpanishFlag
    1324    English      Pivot  1324    Y            Y           Y
    1324    French       ---->  1325    Y            N           N
    1324    Spanish
    1325    English

• Methods
  - Using the first, last, min or max functions with a conditional clause
    - For simple cases
  - Using local variables and forward and self reference techniques
    - For more complex aggregations or concatenation of string values
• Limitations
  - Use only when the input has a known number of repeating rows or a known maximum number of repeating rows
De-Normalizing

• Using Aggregate Functions
  - Use with a conditional expression to extract the value of the first or last row that matches the condition

    Input:                           Output (one row per Year):
    Amount   Year  Quarter           Year  Q1_Amount  Q2_Amount  Q3_Amount  Q4_Amount
    254,556  2003  first             2003  254,556    546,332    129,034    490,223
    546,332  2003  second            2004  165,768    265,443    510,412    435,690
    129,034  2003  third
    490,223  2003  fourth
    165,768  2004  first
    265,443  2004  second
    510,412  2004  third
    435,690  2004  fourth

  - In this case, it does not matter which aggregate function is used: Last, Max or Avg would do just as well, since each Year/Quarter combination has a single row
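The corresponding output-port expressions in the Aggregator (grouped by Year) would look like:

  Q1_Amount = FIRST(Amount, Quarter = 'first')
  Q2_Amount = FIRST(Amount, Quarter = 'second')
  Q3_Amount = FIRST(Amount, Quarter = 'third')
  Q4_Amount = FIRST(Amount, Quarter = 'fourth')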
De-Normalizing

• Using Local Variables
  - Local variables inside an aggregator are evaluated at each row
  - In the debugger, you can only see values for each group, and only if the aggregator uses sorted input
  - Local variables are needed for custom aggregation, like string concatenations
• Mapping notes (for the BookID/Language example above)
  - Group by BookID
  - Use the forward reference technique to identify new groups
  - Use expressions to set each language flag: set the flag to 'Y' if the book is published in that language; use self reference to keep the previous setting unless it is a new group
  - Output ports hold the value of each flag for the group
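A sketch of the variable ports (hypothetical names; grouped by BookID, ports evaluated top to bottom):

  v_IsNewGroup   (variable) = IIF(in_BookID != v_PrevBookID, 1, 0)   -- forward reference: v_PrevBookID still holds the previous row's BookID
  v_PrevBookID   (variable) = in_BookID
  v_EnglishFlag  (variable) = IIF(in_Language = 'English', 'Y', IIF(v_IsNewGroup = 1, 'N', v_EnglishFlag))   -- self reference keeps the previous setting within a group
  v_FrenchFlag   (variable) = IIF(in_Language = 'French',  'Y', IIF(v_IsNewGroup = 1, 'N', v_FrenchFlag))
  v_SpanishFlag  (variable) = IIF(in_Language = 'Spanish', 'Y', IIF(v_IsNewGroup = 1, 'N', v_SpanishFlag))
  o_EnglishFlag  (output)   = v_EnglishFlag   -- a non-aggregate output port returns the last value evaluated for the group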
Lookup Transformation

• Lookup Caches
  1 - The Informatica engine issues a SELECT statement against the database when the first row arrives at the lookup. The result set is split into 2 caches
  2 - Data cache holds the columns in the output ports
  3 - Index cache holds the columns used in the lookup condition
  4 - Caches are queried internally for each row passing through the lookup. The query is based on the lookup condition and the values of the lookup input ports
Lookup Transformation

• Lookup query
  - Always look at the session log and review the SQL query issued to populate lookup caches
  - This query always includes an ORDER BY clause
    - This clause is generated automatically by Informatica and cannot be modified directly
    - Since every column is included in this ORDER BY, it can become very expensive
    - The only way around is to override the SQL query, add your own ORDER BY and finish the statement with a comment delimiter (such as -- )
    - Your custom ORDER BY must include the columns used in the lookup condition, in the order they appear in the condition tab
    - The engine expects rows to be sorted in the caches, so an ORDER BY clause is mandatory
TRANSF_1_1_1> DBG_21097 Default sql to create lookup cache: SELECT OrderID,CustomerID FROM
Orders ORDER BY CustomerID,OrderID
TRANSF_1_1_1> DBG_21079 Creating Lookup Cache : (Thu Sep 02 07:58:34 2004)
TRANSF_1_1_1> DBG_21682 Lookup table row count : 830
TRANSF_1_1_1> DBG_21297 Lookup cache row count : 830
Session log extract
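A sketch of the override, assuming (hypothetically) that the lookup condition uses CustomerID only; the trailing comment delimiter hides the ORDER BY that Informatica appends automatically:

  SELECT OrderID, CustomerID
  FROM   Orders
  ORDER BY CustomerID --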
Lookup Transformation

• Lookup query
  - If you only need a subset of the lookup table, override the SQL and write a WHERE clause
  - You can perform joins in the SQL override as long as the joined tables come from the same database
  - You must use the syntax SELECT <column name> AS <port name>; your session will fail otherwise, even though the query may validate in the transformation
  - You can use mapping variables and parameters in your override, for dynamic queries
  - You have to cache the lookup when you have a SQL override, otherwise you will get a run-time error
  - (Screenshot callout: use the condition field to set a lookup condition such as OrderCount >= Zero and leave the 'Zero' port unconnected)
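A sketch of a subset override (hypothetical table and ports); each column must be aliased to its port name:

  SELECT OrderID     AS OrderID,
         CustomerID  AS CustomerID,
         ShipCountry AS ShipCountry
  FROM   Orders
  WHERE  ShipCountry = '$$Country'   -- $$Country: hypothetical mapping parameter expanded at run time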
Lookup Transformation

• Sharing Lookup Caches
  - Un-named
    - For multiple lookups on the same table in the same mapping
    - Automatically shared if condition ports are the same (although operators can be different)
    - The first lookup transformation must extract the same data set as subsequent lookups, or a superset
    - Cannot share a dynamic lookup
  - Named
    - To share lookups among sessions
    - Lookup structures must be identical (same conditions, same output ports)
    - SQL overrides must be identical
    - Session's high precision setting must be the same for all sessions sharing the lookup files
    - You can mix dynamic and static shared lookups, but you can't run them in simultaneous sessions
• Tip
  - Make your shared lookup transformation reusable
Lookup Transformation

• Shared Lookup Usage
  - Data Warehousing
    - Dimensions may need to be looked up several times during the load process
    - Optimize your workflow by having the first lookup create a named persistent lookup file
    - Alter your other lookups to read from these cache files
    - Example: an Orders lookup used 3 times, in sessions 1, 2 and 3; the first session must re-cache from the source
  - Debugging
    - Sessions with large lookups are time-consuming to debug
    - If you can't use a subset of the lookup data, make the cache persistent
    - The first run will have to create the cache files but subsequent runs will be much faster
Dynamic Lookups

• Dynamic Lookups
  - Use when looking up a target table, when you need to keep lookup cache and target contents in sync
  - Dynamic lookups can insert new rows or update existing rows in the cache (but cannot delete rows)
  - Updated, inserted or unchanged rows are flagged by the lookup transformation via the NewLookupRow port
  - The row type is not changed by the dynamic lookup; you need to set up Update Strategies downstream to change the actual row type
• Ports
  - Select 'ignore in comparison' for fields you don't need to compare to determine the update status
  - Select 'ignore null in comparison' so null source values don't trigger an update
  - Lookup fields not used in the condition must be manually associated with their corresponding source field
Dynamic Lookups

• Ports (cont.)
  - You can associate a lookup field holding a surrogate key (integer or small integer) with the special Sequence-ID value
  - The startup value of Sequence-ID is the highest field value found in the initial cache data for the surrogate key field, or 1 if the initial cache is empty
  - This value is incremented every time a new row is added to the lookup cache
  - You can use this feature in lieu of a Sequence Generator transformation
  - (Screenshot callout: select Sequence-ID from the drop-down menu)
• NewLookupRow Port
  - Pass this field to a router to branch out the data flow based on the insert/update/no-change status
  - 3 possible values:
    - 0 = no change
    - 1 = row inserted in the lookup cache
    - 2 = row updated in the lookup cache
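A sketch of the downstream branching (hypothetical group names) driven by NewLookupRow:

  Router group INSERTS: NewLookupRow = 1    -- followed by an Update Strategy with DD_INSERT
  Router group UPDATES: NewLookupRow = 2    -- followed by an Update Strategy with DD_UPDATE
  Default group (NewLookupRow = 0): unchanged rows, typically not loaded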
Dynamic Lookups

• Properties
  - Lookup caching must be enabled
  - The 'Dynamic Lookup Cache' property toggles dynamic behavior and automatically adds the NewLookupRow port when on
  - When the lookup updates a row in the cache, you can choose to return either old (before the update) or new (after the update) values; useful for slowly changing dimensions, when you need both values
Dynamic Lookups

• Properties (cont.)
  - Updating the cache
    - Lookup cache can be updated if the incoming source row exists in the cache but the values of one or more fields differ between the source row and the cached row
    - By default, the dynamic lookup does not perform both insert and update operations on the lookup cache
    - You have to turn on the 'insert else update' or 'update else insert' property to enable this, depending on the row type coming into the transformation
  - Insert Else Update
    - Applies when the row type is 'insert' (the default)
    - Off: lookup cache is not updated, only new source rows are inserted into the cache
    - On: both inserts and updates are performed
  - Update Else Insert
    - Applies when the row type is 'update' (you need an Update Strategy transformation upstream to change the row type to update)
    - Off: no lookup cache inserts are performed, only updates
    - On: both inserts and updates are performed
Dynamic Lookups

• Caveats
  - SQL override with a WHERE clause
    - When using a subset of data for building the lookup cache
    - Lookup cache and target may get out of sync if your source stream does not use the same filter as your WHERE clause (if the override excludes some productIDs, a Filter must exclude the same productIDs from the source)
    - When using the Sequence-ID feature to generate surrogate keys, you may get duplicate key values if your lookup cache does not hold the entire data set
  - Ignore NULLS in comparison
    - You may need an Expression Transformation after the dynamic lookup to make sure you are not passing NULL values from the source into the target
  - Speed Penalty
    - Dynamic lookups are a bit slower than regular lookups because they have to update their cache contents on the fly
    - Biggest hit when the lookup cache is paging to disk and there are many updated or inserted rows
Transaction Control

• Overview
  - Active (row level) transformation
  - Enables dynamic commit or rollback points
  - Works for relational targets or MQSeries targets
  - Naming convention: TC_<what it does>
Transaction Control

• Setting Up the Mapping
  - Define a Boolean condition expression and a transaction type; the transaction type is executed when the expression evaluates to TRUE
• Transaction constants
  - TC_COMMIT_BEFORE, TC_ROLLBACK_BEFORE:
    - Commit/rollback rows for the current transaction, not including the current row
    - Initiate a new transaction starting at the current row
  - TC_COMMIT_AFTER, TC_ROLLBACK_AFTER:
    - Commit/rollback rows for the current transaction, including the current row
    - Initiate a new transaction starting after the current row
  - TC_CONTINUE_TRANSACTION:
    - Do not commit or rollback at this point
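A sketch of a transaction control condition (hypothetical port NewOrderFlag, set in an upstream Expression when the order id changes):

  IIF(NewOrderFlag = 1, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)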
Transaction Control

• Setting Up the Session
  - The commit type is set to 'User Defined' automatically when you have a valid transaction control transformation
  - 'Commit On End Of File' is checked by default; otherwise the server rolls back left-over transactions if the last row does not coincide with a commit point
  - You can tell the server to roll back transactions from the last commit point when it encounters non-fatal errors
Transaction Based Processing

• Transformation Scope Property
  - For Aggregator, Sorter, Joiner and Rank transformations
  - If 'Transaction' is selected, the transformation logic will only apply to rows within the same transaction
• Example scenario
  - Source:
    - A very large flat file containing order details, at the line item level
    - This file is already sorted by order id
  - Target:
    - A 'shipments' table, containing one row per order per ship date
  - Challenge:
    - You need to aggregate the data by order id and ship date to populate this table
    - The file is huge and you want to avoid either sorting the entire data set in the aggregator or sorting the data in a pre-process
  - Transaction Based Solution:
    - The data is already sorted by order id; any additional sorting can be done within the relatively small set of order lines that comprise each order
    - Each set of line items will trigger a new transaction, and the scope of the sorter and the aggregator will be a single transaction
Transaction Processing Example

• Mapping (diagram callouts, in pipeline order)
  - Expression to detect a change in order ids: compare the current order id with the previous order id; a change means we have the first row of a new order
  - Transaction control: set the type to TC_COMMIT_BEFORE every time a new order id is encountered
  - Sort by order id and ship date, scope set to 'Transaction'
  - Aggregate by order id and ship date, sorted input is On
  - Transaction control to reset the commit point to target based
  - Sample input: sorted by order id but not by ship date
  - Sample output: one row per distinct order id and ship date, ordered by order id and ship date
Joiner

• Sorted Input
  - Not selected
    - Caches the Master table rows in memory (or disk)
    - Brings in detail rows one at a time and compares them to the master rows depending on your join type and condition
    - You get better performance if the master table is the smaller of the two sources, even if logically this table is not the master
  - Selected
    - Works only if both sources are pre-sorted
    - Saves time by not having to cache the entire master source
    - If your sources are not sorted, you can use a SQL override or a Sorter transformation before the Joiner
    - Pulls rows from both sources and compares them using the join type and condition
    - Fastest way to join if your sources are already sorted; otherwise weigh the cost of using a database sort or a Sorter transformation
Joiner

• Caching
  - Specify a cache directory on a separate disk to reduce I/O contention
  - Use RAM cache only for best performance
  - Monitor cache files on the server while the session is running
TRANSF_1_1_1> DBG_21077 Create joiner cache on master relation : (Wed Sep 08 05:31:04 2004)
TRANSF_1_1_1> CMN_1690 Created new data file [D:\Program Files\Informatica PowerCenter
7.1\Server\Cache\PMJNR530_19.dat] and index file [D:\Program Files\Informatica PowerCenter
7.1\Server\Cache\PMJNR530_19.idx] for Joiner [JNR_OldNew].
TRANSF_1_1_1> DBG_21214 Finished joiner cache on master relation : (Wed Sep 08 05:31:04 2004)
TRANSF_1_2_1> DBG_21603 Open master relation cache for detail joiner : (Wed Sep 08 05:31:04 2004)
TRANSF_1_2_1> DBG_21215 Finished processing detail relation : (Wed Sep 08 05:31:04 2004)
Joiner

• Self-Join (diagram callouts)
  - Join two instances of the same source
  - Join two branches of the same pipeline: aggregate dollars per company, then join the source data to the aggregated data
Joiner

• Self-Join
  - To join two branches of the same pipeline, you must use sorted data and turn on the sorted input property
  - If you cannot use sorted data, you must read your source twice and join the two pipelines
  - You can also join output from multi-group transformations such as Router, XML Source Qualifier or Custom transformation
  - If you join 2 branches with an Aggregator in one branch, the Aggregator must use sorted input as well
Stored Procedure

• Pre and Post Load Stored Procedures
  - Connections
    - $Source alias is automatically selected for Source Pre and Post-Load types
    - $Target alias is selected for Target Pre and Post-Load types
    - When changing from a Source to a Target type, the corresponding connection alias is restored, overwriting any custom relational connections you might have selected
    - (Set the type on the Properties tab, Stored Procedure Type)
  - Call Text
    - Accepts only hard-coded values; mapping parameters or variables will not be expanded
    - Use Pre/Post SQL if you want dynamic stored procedure calls
    - Use double quotes if your input parameters include spaces; single quotes are ignored
Stored Procedure

• Pre and Post Load Stored Procedures
  - Execution Plan
    - To select the execution order of procedures having the same type
    - Interface is similar to the Target Load Plan function
    - 1 - Choose Stored Procedures Plan… in the Mappings menu
    - 2 - Move stored procedures within a group using the up and down arrows
    - 3 - The Execution order property is updated to reflect the execution plan
Stored Procedure

• Pre and Post Load Stored Procedures
  - Source Pre Load and Target Pre Load types
    - Run before the mapping starts reading rows
    - Similar to Pre SQL statements
  - Source Post Load and Target Post Load types
    - Run after the mapping has finished writing to the target(s)
    - Similar to Post SQL statements
  - Debugging
    - You need to use a session instance (within a workflow) to have the debugger execute your Pre/Post Load procedures
  - Session error handling
    - Either Stop or Continue (under Config Objects > Error Handling)
    - If you choose Stop and your session fails on a Post Load procedure, your target rows are already committed
Stored Procedure

Calling Normal Unconnected Stored Procedures
 Using PROC_RESULT, one output parameter, return value is lost

Using a local variable, one output parameter, return value is kept
Local variable and associated output parameter must have the same datatype

Using PROC_RESULT and a local variable, two output parameters
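Hedged sketches of the three call styles (hypothetical procedure and port names, using the :SP syntax in an Expression port):

  :SP.GET_CUSTOMER_NAME(in_CustomerID, PROC_RESULT)
      -- one output parameter returned into the port; the procedure's return value is lost
  :SP.GET_CUSTOMER_NAME(in_CustomerID, v_Name)
      -- output parameter captured in local variable v_Name (same datatype); the procedure's return value is kept by the port
  :SP.GET_CUSTOMER(in_CustomerID, PROC_RESULT, v_City)
      -- two output parameters: one returned via PROC_RESULT, one captured in v_City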

Stored Procedure

Calling Normal Unconnected Stored Procedures

Nesting Stored Procedure calls using PROC_RESULT


The output of the innermost procedure is passed as an input parameter to the
outermost procedure
You can use PROC_RESULT once per procedure call

Within a conditional expression
 Your expression must evaluate to TRUE for the procedure to be called

Within an expression attached to an output port
 The port must be connected forward to the data flow or your procedure will not
run
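A sketch of a nested call (hypothetical procedures): the inner procedure's output becomes an input parameter of the outer one, and PROC_RESULT appears once per call:

  :SP.GET_TAX_RATE(:SP.GET_STATE(in_ZipCode, PROC_RESULT), PROC_RESULT)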
Stored Procedure

• Normal Stored Procedures
  - Session Error Handling
    - Fatal errors in connected or unconnected normal procedures will always cause the session to fail
    - Non-fatal errors will increment the session error counter; your session will fail when this counter reaches the threshold set in the session properties
    - Non-fatal error rows are skipped by the server and written to the session log and reject file
Debugger Tips
Debugger Tips

• Reusable sessions
  - Create a reusable session to debug your mapping when practical
    - Use it for both the debugger and the workflow
    - Cuts down on development time, especially if you have lots of sources and targets
    - Many parameters are not available in plain debug sessions
• Using Existing Sessions
  - Choose a session instance to run the debugger within the context of a workflow
    - If you want to test mapping or workflow parameters and variables
    - If you want your debug session to execute Pre or Post Load stored procedures
  - Choose a session instance or a reusable session to test all session attributes not available with a simple debug session
    - To run session components such as command tasks or emails
Debugger Tips

Drawback

Remember to validate and save the reusable session when you make a big
change in your mapping





Adding source or target
Adding transformation
Sometimes, you’ll have to disconnect and reopen your folder in the Workflow
Manager to register mapping changes
Beware of overridden parameters, like SQL overrides
 Session override takes precedence over the mapping
 Session instance overrides take precedence over the reusable session
object
Configuration objects


You can’t specify a line buffer size or a block buffer size in a debug session
But you can create a configuration object with the settings you want and use it
with your debug session
Debugger Tips

• Workspace
  - Organize your workspace to display
    - Debug & session logs at the bottom, using the full window length
    - Transformation instances and target instance above, side by side
  - Remember to switch to the session log pane after the debug session is initialized
• First row
  - Monitor the time it takes to bring in the first row of data
    - Is it acceptable?
    - If not, review the SQL query
    - Or, a flat file source may be rejecting all the rows
      - Line buffer too small
      - Wrong date/time conversions
Debugger Tips

Source data

Always examine source data closely during the first run




Look for bad or unexpected data formats (date/time, amounts)
Look for truncated data
Make sure the data is indeed what you expect
• Data movement
  - Follow data through each transformation during the first run
    - Pay attention to unnecessary type conversions and back and forth type conversions
    - Verify the logic in each complex expression
  - Look at how the data moves within your mapping
    - Do you have too many data paths?
    - Is there a way to simplify the logic?
  - Record the time it takes to load cached lookups
    - Review the SQL query in the session log
    - Do you need to bring in all the fields in the lookup?
    - You may want to override the ORDER BY clause inserted automatically by the server
    - Do you have a complex join condition?
      - Make sure the conditions with an equal sign '=' are put at the top of the list
Debugger Tips

• Evaluate Expression
  - Find out the value of a port or a variable, or enter your own expression to evaluate
  - Not available for every transformation, only for:
    - Aggregator
    - Expression
    - Filter
    - Rank
    - Router
    - Update Strategy
  - Results are displayed between square brackets, as in [VALUE]
    - Easy to spot unexpected padded char values, such as [VALUE ]
    - Unexpected padded values cause trouble in comparisons
      - Lookups
      - Expressions
  - Find out the value of a mapping parameter or variable
    - Start value: enter the name of the variable or parameter
    - Current value: use the variable or parameter with a variable function
      - SETMAXVARIABLE
      - SETMINVARIABLE
      - SETCOUNTVARIABLE
Debugger Tips

• Discard or Not Discard?
  - When discarding target data, the Informatica server does NOT:
    - execute pre or post SQL commands against the target database
    - write sequence generator values to the repository
    - truncate target tables
    - prepare insert or delete SQL statements against the targets
    - verify if the target tables actually exist in the target database
  - Choose to write target data for a true test of the target system
  - The debugger only commits rows to the database at the end of a completed (successful) debug session
• Debugger shutdown
  - If running, click first on the 'Break Now' icon for a faster response
Mapping Error Handling
Error Handling

1. Guidelines
2. Overview of most common errors
3. Handling data errors
4. Handling duplicate key errors
5. Example of an error table
6. Performing audits
Error Handling Guidelines

• Data quality
  - Develop a set of standards or adopt existing company standards
  - Standards should include
    - Definition of an error
      - What data gets rejected and why
    - Error handling procedures in your mappings and sessions
      - How data gets rejected or corrected
    - What to do with rejected data
      - Ignore
      - Save for manual review
      - Save for automated reload
• Define your error strategy
  - Errors are inevitable
  - A good error strategy tries to:
    - Catch errors and reject bad rows
    - Store these rows somewhere
    - Set a threshold for an acceptable number of errors
    - Provide a way to report on rejected rows as part of the load process
Typical Errors

• Data errors
  - Incorrect or corrupted source data
  - Unexpected values
  - Data conflicts with business rules
• Database errors
  - Row violates primary key constraints
  - Inserting a NULL value in a non-null field
  - Referential integrity failure
• Session errors
  - Exceeds error threshold
  - Wrong connection settings or line buffer too small
  - Insufficient resources (disk space or memory)
• Dependencies errors
  - A session or a job on which your session depends did not run properly
  - Missing trigger file
• Time constraints
  - The load process did not terminate within the allotted time window
• Audit/Balance
  - Row counts or sum totals do not match
  - Data mismatch between production systems and data warehouse
• Server errors
  - Informatica server or database server goes down
  - Bad network connections
  - Hardware failure
Data Errors

• Handling in mappings
  - At the field level
    - Program defensively, don't leave open doors or loose ends
    - Handle NULL values, blanks or zeroes consistently and in accordance with your error strategy
    - Use consistent default values for cases such as
      - Missing or blank data
      - Invalid or NULL lookup return value
      - Invalid or corrupted source data
    - Make sure these default values are understood and approved by your users
    - Override the default ERROR() function for output ports
      - Use a constant value or expression
  - At the row level
    - Use a custom ERROR() function
      - To force a row to be skipped
      - To describe the details of an error condition in the session log
      - Beware, this method can throw off a Joiner Transformation downstream
      - Rows rejected this way will not be written to the reject file
    - Transfer rejected rows to a dedicated error table or error file
      - For automated reloads
      - For audit purposes
      - Most flexible solution but also the most expensive in terms of work and processing overhead
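A sketch of a row-level rejection expression (hypothetical ports); ERROR() skips the row and writes the message to the session log:

  o_Amount (output) = IIF(ISNULL(in_Amount) OR in_Amount < 0,
                          ERROR('Invalid amount for order ' || TO_CHAR(in_OrderID)),
                          in_Amount)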
Duplicate Keys

• Informatica Server behavior
  - During normal load (row by row)
    - Rows that violate database constraints will be rejected by the database
    - Rejected rows are written to the reject file (.bad)
    - Details also appear in the session log (normal tracing)
    - Error handling is automatic but there is a big performance hit if there are more than a few errors
  - Using external loaders
    - Load may succeed but indexes will be left in an unusable state
    - Rejected rows will be written to the .ldrreject file
    - Error details may be found in the .ldrlog file
• Handling
  - If performing full refreshes
    - Whenever possible, eliminate duplicates at the source, before the load
  - If performing updates
    - Load in normal mode and let Informatica handle it
    - Run a post-process audit to correct rejected rows
  - Alternative, when performing full refreshes or updates
    - Be proactive and catch potential duplicates inside your mapping
    - Most expensive solution, reserve for critical tables
Duplicate Keys

• Mapping solutions for full refresh (diagram callouts)
  - 1 - Relational source: duplicate rows routed to an error table; normal mapping processing continues downstream
  - 2 - Flat file source: duplicate rows destroyed, last row of each duplicate group sent to the target; normal mapping processing continues downstream
Audits

• Create a versatile error table
  - Should provide storage for all types of audit errors
    - Primary key errors
    - Data errors
    - Row count errors
    - Control totals errors
  - Not tied to one particular source or target table format
  - One row per error
  - All data values displayed as strings
  - Includes a concatenated value of the primary key fields to reference the source or target system
  - Source and target names are fully qualified
  - Identifies the process (mapping) that wrote rows to the table
  - Should be reused by all mappings that perform audits

    Field name         Data type   Precision
    SourceName         String      32
    TargetName         String      32
    SourceFieldName    String      32
    SourceFieldValue   String      500
    TargetFieldName    String      32
    TargetFieldValue   String      500
    PrimaryKeyConcat   String      100
    ErrorType          String      16
    AuditProcess       String      64
    InsertDate         Date/time   19

• Truncate the error table before each load
  - If you need to keep data for several days, archive the daily table at the end of each load process
• Run an automated report
  - After each load
    - Total number of errors per error type
    - List of sources and targets having triggered an error
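A possible DDL sketch for such a table (hypothetical name, Oracle-style types):

  CREATE TABLE LOAD_ERRORS (
      SourceName        VARCHAR2(32),
      TargetName        VARCHAR2(32),
      SourceFieldName   VARCHAR2(32),
      SourceFieldValue  VARCHAR2(500),
      TargetFieldName   VARCHAR2(32),
      TargetFieldValue  VARCHAR2(500),
      PrimaryKeyConcat  VARCHAR2(100),
      ErrorType         VARCHAR2(16),
      AuditProcess      VARCHAR2(64),
      InsertDate        DATE
  );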
Audits

• Row by row audit
  - For critical tables, when extracting data from production systems to a data warehouse
  - Source must be unchanged or staged for the audit to work
  - A Joiner is needed only when source and target are in different databases, or the source is a flat or VSAM file
Workflow Manager
Assembling Workflows

• Step 1
  - Design flow, analyze dependencies and devise error strategy
• Step 2
  - Gather unit-tested mappings into reusable sessions
• Step 3
  - Assemble load stages and/or subject areas into worklets
• Step 4
  - Assemble worklets into workflow
  - Implement workflow level extra functionality
• Step 5
  - Implement trigger mechanisms and scheduler
• Step 6
  - Test worklets, workflow, triggers and scheduler
Step 1 - Design

Large Workflows





Complex workflows handling complete loads offer more flexibility and functionality
than a collection of small workflows
The new workflow manager interface makes it easy to design large and complex load
processes
The workflow manager workspace window is self documenting
The workflow monitor will give you a very good idea of the load’s progress, using the
Gantt chart view
• Exception
  - If you have an external scheduler, you can either:
    - Run single session workflows and let the scheduler handle Informatica job dependencies
    - Run more complex workflows from your scheduler and let the workflows handle the job dependencies
Step 1 - Design

The Big Picture




Analyze your sources and targets to determine your dependencies at the
mapping level
Devise triggering mechanisms to start the process when the sources are ready
Use a diagram to visualize the flow of data
Design your error strategy at the session level and workflow level



What to do when a session fails
Audit and balancing procedures to capture data discrepancies
Think about production issues
  - Who will get notified when an error occurs, and by what means
  - Design restart-ability into your workflows, worklets and mappings
Step 2 - Sessions

Reusable Sessions


After a session is unit-tested, move it to the project folder as a
reusable object
Reusable sessions are easier to manage




Easy to copy between folders
Can be exported/imported as XML objects
Easy to integrate into worklets and nested worklets
All reusable sessions conveniently listed in the Navigator, under the
‘Sessions’ folder.
Reusable session instance
Step 2 - Sessions

Overriding Reusable Sessions Instances


Reusable sessions provides an additional level of override at the instance
level, either in a workflow or a worklet
The instance override takes precedence over the master object



If you change a property in the master object and that property is overridden at the
instance level, the override will still be in effect
You must use the revert button to cancel an override, even if the value of the
overridden property is the same as the value in the master object
Transformation properties cannot be overridden at the instance level
(Screenshot callout: the target connection in this instance has been changed; click Revert to return to the connection specified in the original session object)
Step 2 - Sessions

• Error Strategies
  - Use a reusable session configuration to:
    - set the error threshold
    - set the number of session logs to save
    - set the log tracing level…
  - 1 - Define common error handling behavior in a session configuration object
  - 2 - Click to pick your reusable session config from a menu of available configurations
Step 2 - Sessions

• Error Strategies
  - Use reusable email objects in your session components to send pre-formatted messages on failure or success
  - 1 - Create an email task you can reuse for each session failure
  - 2 - Attach this email to your sessions as a reusable component
Step 2 – Sessions And Tasks

• Error Strategies
  - Use task general properties to implement error handling at the task level
  - Fail parent…
    - Simply marks the enclosing Workflow or Worklet as failed
    - Does not stop or abort the enclosing Workflow or Worklet
    - Use when your scheduler relies on a failed status code to flag an error at the workflow level
  - Fail parent if this task fails
    - Changes the end status of the enclosing Workflow to failed when the task fails
  - Fail parent if this task does not run
    - Changes the end status of the enclosing Workflow to failed when the task does not run
    - For instance, if a link condition between the previous task and this task evaluates to FALSE