T: +353 (0)1 7071750
e: info@dbase.ie
Data Deduplication and Cleansing
Dealing with millions of poorly structured, misspelt, abbreviated or incomplete records is a daunting but unavoidable data healthcare task.
Poorly maintained data destroys credibility, reduces loyalty and adversely affects your brand. Maintaining the all-round accuracy and format of your data:
• Avoids the waste caused by duplicate mailings
• Saves you time and money
• Ensures a single view of your customers
• Links disparate data sources to deliver a single version of the truth
• Delivers accurate information for better informed decision making
A holistic treatment for all-round wellbeing
Match is a batch remedy with the power to perform a number of healthy data treatments, however large and infected the data set. With Match you can interrogate all your data in one go, search for duplicate records, link data in disparate systems, amend poorly formatted names and addresses, transfer information between matching records (merge) and remove out-of-date information (purge).
Features
Direct database connectivity
Avoid error-prone import/export routines by connecting directly into your data sources. Native drivers are available for:
• Oracle
• SQL Server
• Pivotal
• SalesLogix
• Goldmine
• Paradox
• dbase
• CSV
Alternatively, Excel or UDL connections are available as fallback options.
Drag and drop
Quickly review possible duplicates and enhance the records with drag and drop technology.
Intelligent phonetic matching
Unsurpassed phonetic matching capabilities, using DQ’s own Fonetix™ algorithms, which consistently outperform Soundex.
Intra-matching technology searches across an unlimited number of database fields to identify potential matches, even when data is misaligned by column.
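Fonetix™ is proprietary and its rules are not described here, so the sketch below shows only the Soundex baseline it is compared against. It is a minimal implementation of standard American Soundex, included to illustrate what a phonetic key is; the function name `soundex` is our own.

```python
def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    # Map each consonant to its Soundex digit.
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for letter in group:
            codes[letter] = digit

    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""

    result = letters[0]                    # the first letter is kept as-is
    previous = codes.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue                       # H and W do not separate equal codes
        digit = codes.get(c, "")           # vowels map to "" and reset 'previous'
        if digit and digit != previous:
            result += digit
        previous = digit
    return (result + "000")[:4]            # pad or truncate to four characters
```

Names that sound alike share a key: `soundex("Smith")` and `soundex("Smyth")` both give `S530`, so the two records would be proposed as a potential match.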
Customisable match definitions
Stop extraneous data from skewing the overall ‘match score’: users define the criteria by which Match returns potential duplicates.
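As an illustration of how user-defined criteria keep extraneous fields out of the score, here is a minimal sketch of a weighted percentage score. It is not the product's scoring method: the field names and weights are hypothetical, and `difflib.SequenceMatcher` from the Python standard library stands in for the real comparison logic.

```python
from difflib import SequenceMatcher


def field_similarity(a, b) -> float:
    """Similarity between two field values, from 0.0 to 1.0."""
    a = str(a or "").strip().lower()
    b = str(b or "").strip().lower()
    if not a or not b:
        return 0.0                      # a missing value never counts as a match
    return SequenceMatcher(None, a, b).ratio()


def match_score(rec_a: dict, rec_b: dict, weights: dict) -> float:
    """Percentage score computed only over the fields named in 'weights'."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(w * field_similarity(rec_a.get(f), rec_b.get(f))
                   for f, w in weights.items())
    return 100.0 * weighted / total


# Hypothetical records: same person, unrelated free-text notes.
a = {"name": "John Murphy", "postcode": "D02 X285", "notes": "called Monday"}
b = {"name": "John Murphy", "postcode": "D02 X285", "notes": "prefers email"}

strict = match_score(a, b, {"name": 2, "postcode": 1})               # notes ignored
noisy = match_score(a, b, {"name": 2, "postcode": 1, "notes": 1})    # notes included
```

Leaving `notes` out of the definition lets the two records score 100; including it drags the score down even though they clearly describe the same customer.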
Record survivorship
The One2One Manager creates the perfect record by dragging and dropping fields from duplicates. A great prepackaged example of the integrated Visual Basic capability.
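In the product, survivorship is an interactive drag-and-drop step. The idea can be sketched in code with one simple, hypothetical rule: the master record keeps its own values, and any blank field is filled from the first duplicate that has a value. The function name `build_survivor` and the sample records are our own.

```python
def build_survivor(master: dict, duplicates: list) -> dict:
    """Fill blank fields of the master record from its duplicates."""
    def is_blank(value) -> bool:
        # None and whitespace-only strings count as blank.
        return value is None or str(value).strip() == ""

    survivor = dict(master)             # never modify the caller's record
    for dup in duplicates:
        for field, value in dup.items():
            if is_blank(survivor.get(field)) and not is_blank(value):
                survivor[field] = value
    return survivor


master = {"id": 1, "name": "J. Murphy", "phone": ""}
dupes = [{"id": 7, "name": "John Murphy", "phone": "01 555 0100",
          "email": "j.murphy@example.com"}]
survivor = build_survivor(master, dupes)
```

Here the survivor keeps the master's `id` and `name` and gains the duplicate's `phone` and `email`. A real rule set would usually be richer, for example preferring the most recently updated value.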
Modular approach
Create the application of your choice by choosing the Match base module and adding the VB Developer module or multiple interactive user seats as required.
Isolate, flag or delete duplicate records
Users have complete control over the treatment of identified duplicates. Match can flag duplicates directly in the database and generate an instantly cleaned file, group multiple matches together for further interaction, or delete the duplicate data.
User-defined merge/append/update functionality
Process potential duplicates with ease: review, enhance, flag, link, update or delete as required.
Generating a match results list for reference
Match generates a result list of master and duplicate ID relationships, along with their percentage match score and groupings. These files are ideal for passing back to data owners as a linking reference of which records are duplicates of others, or for manually checking potential duplicate records at a lower percentage ‘match score’.
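The actual file layout Match produces is not specified here. As a rough picture of such a list, the sketch below writes master and duplicate ID pairs with their score and a group number to CSV; the column names and sample IDs are assumptions.

```python
import csv
import io


def write_results(pairs, out) -> None:
    """Write (master_id, duplicate_id, score_pct) tuples as a grouped CSV list."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["group", "master_id", "duplicate_id", "score_pct"])
    groups = {}                          # master_id -> group number
    # Sort by master, then highest score first within each group.
    for master_id, duplicate_id, score in sorted(pairs, key=lambda p: (p[0], -p[2])):
        group = groups.setdefault(master_id, len(groups) + 1)
        writer.writerow([group, master_id, duplicate_id, f"{score:.1f}"])


pairs = [(440, 512, 88.0), (101, 317, 78.0), (101, 205, 92.5)]
buffer = io.StringIO()
write_results(pairs, buffer)
lines = buffer.getvalue().splitlines()
```

Each row links one duplicate to its master, so a data owner can filter on `score_pct` to pick out the lower-scoring pairs that deserve a manual check.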
Reassign Orphaned Records
To ensure that database integrity is maintained in relational databases, a ‘Reassign Orphaned Records’ Wizard guides the user through the process of reassigning potentially orphaned data before duplicate records are deleted.
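The wizard itself is interactive, but the underlying operation can be sketched with Python's built-in `sqlite3` module: child rows are repointed at the master record before the duplicate is deleted, inside one transaction. The `customers` and `orders` tables are hypothetical examples, not part of the product.

```python
import sqlite3


def merge_customer(conn, master_id: int, duplicate_id: int) -> None:
    """Repoint child rows at the master, then delete the duplicate."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "UPDATE orders SET customer_id = ? WHERE customer_id = ?",
            (master_id, duplicate_id),
        )
        conn.execute("DELETE FROM customers WHERE id = ?", (duplicate_id,))


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # enforce the reference below
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id)
    );
    INSERT INTO customers VALUES (1, 'John Murphy'), (2, 'Jon Murphy');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 2);
""")

merge_customer(conn, master_id=1, duplicate_id=2)
orders_for_master = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE customer_id = 1").fetchone()[0]
customers_left = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

With foreign keys enforced, deleting customer 2 first would fail because orders 11 and 12 still reference it; reassigning them first leaves all three orders attached to the surviving record.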
Data standardisation and enhancement
Apply intelligent name-case, business/address elements, and gender transformations to data. Create customised transformations that will standardise address information and screen out salacious or ‘junk’ data.
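The product's transformation rules are not listed here. As a small illustration of what a name-case transformation involves, the sketch below normalises spacing and capitalisation and handles two common prefixes; the rules chosen and the function name `name_case` are assumptions, and real name-casing needs many more exceptions.

```python
import re


def name_case(raw: str) -> str:
    """Normalise spacing and capitalisation of a personal name."""
    def fix(word: str) -> str:
        word = word.capitalize()         # "o'BRIEN" -> "O'brien"
        # Re-capitalise the letter that follows a "Mc" or "O'" prefix.
        return re.sub(r"^(Mc|O')([a-z])",
                      lambda m: m.group(1) + m.group(2).upper(), word)

    cleaned = " ".join(raw.split())      # trim and collapse runs of whitespace
    parts = re.split(r"([ -])", cleaned) # keep spaces and hyphens as separators
    return "".join(p if p in (" ", "-") else fix(p) for p in parts)
```

For example, `name_case("  o'BRIEN ")` gives `O'Brien` and `name_case("mcdonald-SMITH")` gives `McDonald-Smith`.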
Visual Basic capability
Powerful VB scripting enables developers to define and implement their own matching criteria, extending the product by writing scripts and including them directly in it. You can integrate virtually any other application and perform specific pre- or post-processing functions. Integrate with data warehouses, ETL, CRM, ERP, GIS, EIS, BI and any other data visualisation or migration application; the possibilities are endless.
International capability
Match supports five languages: English, French, German, Italian and Spanish.
Facts and figures
• Interactive GUI
• Fast loading review screen
• Process 3 million records per hour
• B2B and B2C compliant
• Export of screens via XML
• Unique security module and audit trail
• Full integration with CRM, ETL, ERP and BI tools
Minimum technical requirements
             Minimum      Recommended
CPU          P4 1.7 GHz   P4 2.4 GHz
RAM          256 MB       1 GB
Disk space   512 MB       2 GB
Benchmark    14.4 mins    9 mins
The larger the database being deduplicated, the more disk space will be required.
Benchmark based on multiple sessions of between 120k and 1.3m records (average 250k) on SQL Server 7 and SQL Server 2000 databases.
Supported operating systems
Operating system     Version
Microsoft™ Windows   98
Microsoft™ Windows   98SE
Microsoft™ Windows   NT4
Microsoft™ Windows   2000
Microsoft™ Windows   XP
Microsoft™ Windows   2003 Server
Supported databases
Database      Version
CSV/TXT       Not applicable
Excel         97 or higher
Access        2000 or higher
dBASE         IV or higher
FoxPro        6 or higher
Oracle        8 or higher
SQL Server    7 or higher
SalesLogix™   6.x via OLEDB Provider
GoldMine      XBase and SQL via APIs