Probabilistic Record Linkage and Address Standardization

November 17, 2015

11:25am - 12:40pm

SB 220

Host

Lulu Kang

Speaker

Sou-Cheng Choi
Senior Statistician, NORC at the University of Chicago; Research Assistant Professor, Applied Mathematics, IIT

Description

Probabilistic record linkage (PRL) is the process of matching records that refer to the same entity across different data sources, such as database tables whose primary keys are missing or incomplete. It can be used to join or deduplicate records or to impute missing data, improving data quality in each case. An important subproblem in PRL is parsing or standardizing a text field such as an address into its component fields, e.g., street number, street name, city, state, zip code, and country. Modern data analysis techniques such as natural language processing and machine learning are often employed in both PRL and address standardization to achieve high linkage or prediction accuracy. In this study, we compare the performance of a few widely used open-source PRL packages, namely FRIL, Link Plus, R RecordLinkage, and SERF. In addition, we evaluate the baseline performance and sensitivity of a number of address-parsing web services, including the U.S. address parser, Google Maps APIs, Geocoder.us, and Data Science Toolkit. We will present the strengths and limitations of the software and services we have evaluated. This is joint work with Edward Mulrow, NORC at the University of Chicago.
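
For readers new to the topic, the idea of scoring a candidate record pair can be illustrated with a minimal sketch in the style of the classical Fellegi-Sunter model, one common formulation of PRL. This is not the method used by the speakers or by any of the packages named above. The field names, the m- and u-probabilities, and the similarity threshold are all hypothetical values chosen for illustration; in practice they would be estimated from data.

```python
import math
from difflib import SequenceMatcher

# Hypothetical per-field probabilities (illustrative values, not estimates):
#   m = P(field agrees | records refer to the same entity)
#   u = P(field agrees | records refer to different entities)
FIELD_PARAMS = {
    "name": (0.95, 0.01),
    "street": (0.90, 0.05),
    "zip": (0.98, 0.10),
}


def agrees(a, b, threshold=0.85):
    """Return True/False for agreement, or None when either value is missing."""
    if not a or not b:
        return None  # a missing value contributes no evidence either way
    ratio = SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()
    return ratio >= threshold


def match_weight(rec_a, rec_b):
    """Sum log2 likelihood ratios over fields; higher means more likely a match."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        outcome = agrees(rec_a.get(field), rec_b.get(field))
        if outcome is None:
            continue
        if outcome:
            total += math.log2(m / u)  # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


# Made-up records: rec_b is a plausible duplicate of rec_a with a missing zip code.
rec_a = {"name": "Jon Smith", "street": "10 W 35th St", "zip": "60616"}
rec_b = {"name": "John Smith", "street": "10 W 35th St", "zip": None}
rec_c = {"name": "Mary Jones", "street": "55 E Monroe St", "zip": "60603"}

weight_ab = match_weight(rec_a, rec_b)
weight_ac = match_weight(rec_a, rec_c)
```

Here the pair (rec_a, rec_b) receives a positive weight despite the missing zip code, while (rec_a, rec_c) receives a negative one. A real linkage workflow would also choose thresholds that separate matches, non-matches, and pairs needing clerical review.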

Event Topic

Data Science

Seminar

