Overview
Data cleansing and standardization is an important aspect of any Master Data Management (MDM) project. Informatica MDM Multi-Domain Edition (MDE) provides reasonable number of cleanse functions out-of-the-box. However, there are requirements when the OOTB cleanse functions are not enough and there is a need for comprehensive functions to achieve data cleansing and standardization, for e.g. address validation, sequence generation. (IDQ) provides an extensive array of cleansing and standardization options. IDQ can easily be used along with .
This blog post describes the various options to integrate Informatica MDM and IDQ, explains the advantages and disadvantages of each approach to aid in deciding the optimal approach based on the requirements.
Webinar
Nexus of a successful data-driven bank
Informatica MDM-IDQ Integration Options
There are three options through which IDQ can be integrated with Informatica MDM.
Informatica Platform stagingIDQ Cleanse LibraryInformatica MDM as target
Option 1: Informatica Platform Staging
Starting with Informatica MDM's Multi-Domain Edition (MDE) version 10.x, Informatica has introduced a new feature called "Informatica Platform Staging" within MDM to integrate with IDQ (Developer Tool). This feature enables to directly stage/cleanse data using IDQ mappings to MDM's Stage tables bypassing Landing tables.
FIGURE 1: INFORMATICA PLATFORM STAGING PROCESS
Advantages
Stage tables are immediately available to use in the Developer tool after synchronization eliminating the need to manually create physical data objects.Changes to the synchronized structures are reflected into the Developer tool automatically.Enables loading data into Informatica MDM's staging tables bypassing the landing tables.
Disadvantages
Creating connection for each Base Object folder in Developer tool can be cumbersome to maintain.Hub Stage options like Delta detection, hard delete detection, and audit trails are not available.System generated columns need to be populated manually.Rejected records are not captured in _REJ table of the corresponding stage table but get captured in .bad file.Invalid lookup values are not rejected while data loads to stage unlike in Hub Stage Process. The record with invalid value gets rejected and captured by the Hub Load process.
Option 2: IDQ Cleanse Library
IDQ allows creating functions as operation mappings and deploy them as web service which can then be imported in Informatica MDM Hub implementation as a new type of cleanse library defined as IDQ cleanse library. This functionality allows usage of the imported IDQ cleanse functions, just like any other out-of-the-box cleanse function. Informatica MDM Hub acts as a Web service client application that consumes IDQ's web services.
FIGURE 2: HUB STAGE PROCESS DESIGN USING IDQ IMPORTED FUNCTION
Advantages
Easily build transformations in IDQ's Informatica Developer tool rather than creating complex java functions.Unlike Informatica Platform staging, Hub Stage process options such as delta detection, hard delete detection, audit trail are available for use.
Disadvantages
Physical data objects need to be manually created for each staging table and manually updated for any changes to the table.IDQ function must contain all transformation logic to leverage batching of records. If any transformation logic is additionally defined in the MDM map, then calls to the IDQ will be single record leading to performance issue.Web service invocations are synchronous only, which can be a concern for large data volume.
Option 3: Informatica MDM as target
3.1 Loading data landing tables
Informatica MDM can be used as target for loading the data to landing tables in Informatica MDM.
FIGURE 3: USING IDQ AS ETL TOOL FOR LOADING DATA INTO LANDING TABLE
Advantages
Single connection created in Developer tool for Informatica MDM is less cumbersome when compared to creating multiple connections with Informatica platform staging.No need to standardize data in the Hub Stage Process.Unlike Informatica Platform staging, Hub Stage process options – delta detection, hard delete detection, audit trail are available to use.
Disadvantages
Physical data objects need to be manually created for each landing table and manually updated for any changes to the table.Need to create mappings at two levels (i) source to landing and (ii) landing to staging (direct mapping).
3.2 Loading data staging tables (bypassing landing tables)
Informatica MDM can be used as target for loading the directly to staging tables in Informatica MDM, bypassing landing tables.
FIGURE 4: USING IDQ AS ETL TOOL TO LOAD DATA DIRECTLY INTO STAGING TABLES
Advantages
Single connection created in Developer tool for Informatica MDM is less cumbersome when compared to creating multiple connections with Informatica platform staging.Can be used for lower version of Informatica MDM where Informatica Platform staging option is not available.
Disadvantages
Physical data objects need to be manually created for each staging table and manually updated for any changes to the table.Hub Stage Delta detection, hard delete detection, and audit trails options are not available.System generated columns need to be populated manually.Rejected records are not captured in _REJ table of the corresponding stage table but get captured in .bad file.Invalid lookup values are not rejected while data loads to stage unlike in Hub Stage Process. The record with invalid value gets rejected and captured by the Hub Load process.Conclusion
While there are multiple options to integrate IDQ with Informatica MDM, based on client's requirements, the ideal approach is analyzed and recommended.
YOU ARE READING
How to integrate Informatica Data Quality (IDQ) with Informatica MDM
Short StoryData cleansing and standardization is an important aspect of any Master Data Management (MDM) project. Informatica MDM Multi-Domain Edition (MDE) provides reasonable number of cleanse functions out-of-the-box. However, there are requirements when th...