Keywords: data quality; record linkage; matching; edit distance; Levenshtein algorithm; Jaro-Winkler algorithm
String Matching Algorithms Based on Character Editing
and Their Implementation
ABSTRACT
With the rapid development of information technology and the widespread use of data generation and acquisition equipment, the amount of data available to people is growing exponentially. However, this abundance of data has not made access to useful information correspondingly more effective. One reason is that data quality has declined significantly and no longer meets application requirements.
This paper explains why research on data quality is necessary and surveys the current research topics in the field, then focuses on record linkage as a means of improving data quality. Record linkage is achieved with two matching techniques, the edit distance algorithm and the Jaro-Winkler algorithm, and the principles and implementation of each are described. The paper shows how these algorithms are used and implemented, and how computing the similarity of two records with character-based string matching enables the detection of duplicate records. Finally, it looks ahead to future research on matching techniques for data quality.
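To make the two similarity measures named above concrete, the following is a minimal Python sketch of the textbook Levenshtein edit distance and Jaro-Winkler similarity. It is an illustration, not the implementation developed in this paper: the function names, the normalization of edit distance into a [0, 1] similarity, the prefix weight p = 0.1 with a 4-character prefix cap, and the 0.9 duplicate threshold are all assumptions chosen for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)  # allowed match offset
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2  # transpositions
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro similarity boosted for a common prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1[:4], s2[:4]):
        if c1 != c2:
            break
        prefix += 1
    return base + prefix * p * (1.0 - base)


def is_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    """Hypothetical duplicate test: flag two field values as the same record."""
    return jaro_winkler(a, b) >= threshold


print(levenshtein("kitten", "sitting"))
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

In a record linkage setting, a comparison such as `is_duplicate` would typically be applied field by field (name, address, and so on) and the per-field scores combined; how the scores are combined and which threshold is appropriate depend on the data and are not specified here.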