2022-10-24 Affiliation matching improvements (API only)
Based on user feedback, we've released a set of tweaks to ROR's affiliation matching service focused on improving precision and reducing false positive results. The updated affiliation matching service is now available in production at the existing affiliation matching endpoint, https://api.ror.org/organizations?affiliation=. While the search behavior has changed slightly, the request and response formats remain unchanged. Many thanks to the ROR integrators who helped beta test these improvements! Read the rest of this post for details about the changes.
Check for exact matches first
- Previously, the entered string was split into multiple substrings and many searches were performed without checking to see if an exact match of the entered string existed in a ROR record.
- Now, a search for an exact match of the entered string in the name, aliases, labels and acronyms fields is performed before any additional searches. If there's an exact match with a perfect score of 1.0, the result is returned immediately and no further searches are performed. A new matching_type, EXACT, is returned with any matches made using this method.
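The exact-match-first flow can be sketched as below. This is a minimal illustration, not ROR's implementation: the flattened record layout, the record contents and the helpers `exact_match`, `fallback_search` and `match` are all hypothetical, and plain string equality stands in for whatever normalization the real service applies.

```python
from typing import Dict, List, Optional

# Hypothetical records in a simplified, flattened layout (NOT the real ROR schema).
RECORDS: List[Dict] = [
    {"id": "rec-1", "name": "Example University", "aliases": ["Example Univ"],
     "labels": [], "acronyms": ["EXU"]},
    {"id": "rec-2", "name": "Sample Institute of Technology", "aliases": [],
     "labels": [], "acronyms": ["SIT"]},
]


def fallback_search(query: str, records: List[Dict]) -> List[Dict]:
    """Hypothetical stand-in for the slower multi-substring search."""
    return []


def exact_match(query: str, records: List[Dict]) -> Optional[Dict]:
    """Return a score-1.0 EXACT result if the query equals a searchable value."""
    q = query.strip()
    for rec in records:
        # Fields named in the post: name, aliases, labels and acronyms.
        values = [rec["name"], *rec["aliases"], *rec["labels"], *rec["acronyms"]]
        if q in values:
            return {"organization": rec, "score": 1.0, "matching_type": "EXACT"}
    return None


def match(query: str, records: List[Dict]) -> List[Dict]:
    hit = exact_match(query, records)
    if hit is not None:
        # Perfect score: return at once and skip every further search.
        return [hit]
    return fallback_search(query, records)
```

For example, `match("EXU", RECORDS)` returns a single EXACT result for the hypothetical record `rec-1` without running the fallback search.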
De-prioritize acronym search results
- Previously, results generated using matching_type: ACRONYM, which extracts and searches for any sets of 3 or more capitalized letters (except ISO3 country codes) from the original search string, were weighted similarly to other matching types. This produced many false positive results.
- Now, matching_type: ACRONYM does not produce results with chosen set to True.
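The acronym extraction described above can be approximated with a regular expression. This sketch assumes that "3 or more capitalized letters" means a whole word made only of uppercase ASCII letters (hence the `\b` word boundaries); the abbreviated `ISO3_CODES` set and the function name `extract_acronyms` are hypothetical.

```python
import re
from typing import List, Set

# Hypothetical, heavily abbreviated set of ISO 3166-1 alpha-3 codes.
ISO3_CODES: Set[str] = {"USA", "GBR", "DEU", "FRA"}

# Whole words of 3 or more uppercase letters (an assumption about the pattern).
ACRONYM_RE = re.compile(r"\b[A-Z]{3,}\b")


def extract_acronyms(text: str, iso3: Set[str] = ISO3_CODES) -> List[str]:
    """Return candidate acronyms, skipping ISO3 country codes."""
    return [m for m in ACRONYM_RE.findall(text) if m not in iso3]
```

With this sketch, `extract_acronyms("Dept of Physics, MIT, Cambridge, MA, USA")` keeps `MIT`, drops the two-letter `MA` and skips the country code `USA`.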
Don't mark multiple results as chosen = True
- Previously, multiple results with a value of True in the chosen field were sometimes returned.
- Now, chosen is only set to True if there is a single result that is a highly probable match. Multiple results with high scores indicate ambiguity, so chosen is set to False for all results in that case.
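A minimal sketch of the "single highly probable match" rule follows. The post does not say what score counts as "highly probable", so `HIGH_SCORE = 0.9` is a hypothetical value, and `mark_chosen` is an illustrative name rather than part of the ROR API.

```python
from typing import Dict, List

# Hypothetical cut-off for a "highly probable" match; the real value is not stated.
HIGH_SCORE = 0.9


def mark_chosen(results: List[Dict], high_score: float = HIGH_SCORE) -> List[Dict]:
    """Set chosen=True only when exactly one result is highly probable."""
    # Copy each result and default chosen to False.
    out = [{**r, "chosen": False} for r in results]
    strong = [r for r in out if r["score"] >= high_score]
    if len(strong) == 1:
        strong[0]["chosen"] = True
    # Two or more strong results are ambiguous, so none is chosen.
    return out
```

Returning copies keeps the caller's result list unchanged, which makes the rule easy to test in isolation.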
Generate fewer search substrings
- Previously, for search strings containing multiple words, many substrings were generated and searched independently, resulting in many false positive results (irrelevant substrings with high matching scores).
- Now, search strings are split into substrings only at , (comma), ; (semicolon) and : (colon) characters. The full original search string is also included.
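The new splitting rule fits in a few lines. In this sketch (the function name is hypothetical), the full string comes first, followed by the non-empty pieces between separators. Removing duplicates, which happens when the string contains no separator, is an assumption of the sketch.

```python
import re
from typing import List


def generate_substrings(affiliation: str) -> List[str]:
    """Full string plus the pieces split at commas, semicolons and colons."""
    parts = [p.strip() for p in re.split(r"[,;:]", affiliation)]
    subs = [affiliation.strip()] + [p for p in parts if p]
    # Drop duplicates while preserving order (assumption, e.g. no separators).
    return list(dict.fromkeys(subs))
```

For `"Dept of Chemistry; Example University, Springfield"`, this yields the full string plus `"Dept of Chemistry"`, `"Example University"` and `"Springfield"`, rather than every multi-word combination.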
Exclude search strings that match specific common phrases, country names, country codes and city names
- Previously, substrings which matched common phrases (such as "University Hospital"), country names/codes or city names produced many false positive results
- Now, substrings that match common phrases, country names, city names, or ISO2 or ISO3 country codes are ignored.
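The exclusion step amounts to a lookup against stop lists. The lists below are tiny hypothetical stand-ins for the real phrase, country and city data, and the case-insensitive comparison is an assumption.

```python
from typing import Iterable, List, Set

# Hypothetical, heavily abbreviated stop lists for illustration only.
COMMON_PHRASES = {"university hospital"}
COUNTRY_NAMES = {"germany", "united states"}
COUNTRY_CODES = {"de", "us", "deu", "usa"}  # ISO2 and ISO3
CITY_NAMES = {"berlin", "springfield"}

STOP: Set[str] = COMMON_PHRASES | COUNTRY_NAMES | COUNTRY_CODES | CITY_NAMES


def drop_ignored(substrings: Iterable[str], stop: Set[str] = STOP) -> List[str]:
    """Keep only substrings that are not a stop-listed phrase, place or code."""
    return [s for s in substrings if s.strip().casefold() not in stop]
```

Only whole-substring matches are dropped here, so `"Example University Hospital"` would still be searched while a bare `"University Hospital"` would not.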
Strip special chars (except &) from search strings
- Previously, special characters were left untouched in search strings and generated substrings, which occasionally resulted in errors or "missed" matches.
- Now, special characters (except &) are stripped from search strings and substrings.
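A minimal sketch of the stripping step follows, assuming that "special characters" means anything other than word characters, whitespace and `&`. Python's `\w` also keeps digits, underscores and accented letters. Collapsing repeated spaces is an extra assumption. Because commas, semicolons and colons are stripped too, a pipeline like this would apply the step after splitting the string into substrings.

```python
import re

# Anything that is not a word character, whitespace or "&" (assumed definition).
_SPECIAL = re.compile(r"[^\w\s&]")


def strip_special(text: str) -> str:
    """Remove special characters except &, then tidy the whitespace."""
    cleaned = _SPECIAL.sub("", text)
    return re.sub(r"\s+", " ", cleaned).strip()
```

Note that plain removal joins hyphenated words (`"Max-Planck"` becomes `"MaxPlanck"`); replacing such characters with a space is an alternative design, and the post does not say which the service uses.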
Only return results with a score of >= .5
- Previously, all results were included in the result set.
- Now, only results with a matching score >= .5 are included in the result set.
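The score cut-off is a simple filter. The `MIN_SCORE` constant and the `apply_threshold` name are illustrative. Note the inclusive comparison: a score of exactly 0.5 is kept.

```python
from typing import Dict, List

MIN_SCORE = 0.5  # threshold stated in the post


def apply_threshold(results: List[Dict], min_score: float = MIN_SCORE) -> List[Dict]:
    """Keep only results whose matching score is >= min_score."""
    return [r for r in results if r["score"] >= min_score]
```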
12 character minimum no longer required
- Previously, 0 results were returned for search strings of fewer than 12 characters unless an exact match was found.
- Now, there is no minimum character threshold.
The changes above are available in the existing affiliation matching endpoint, https://api.ror.org/organizations?affiliation=.
The only usage changes that have been made are:
- A minimum of 12 characters is no longer required
- In results, a new matching_type, EXACT, has been added
All other usage remains as described in the REST API > Search ROR Records > Affiliation parameter section of the ROR documentation.
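A minimal client sketch using only the Python standard library might look like this. It builds the query URL for the endpoint named in this post and picks out the result marked as chosen. `SAMPLE_RESPONSE` is a hypothetical, trimmed response that assumes the `items`, `score`, `matching_type`, `chosen` and `organization` fields; check the API documentation for the full shape. The helper names are illustrative.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.ror.org/organizations"

# Hypothetical, trimmed response used for illustration only.
SAMPLE_RESPONSE = """{"number_of_results": 1, "items": [
  {"substring": "Example University", "score": 1.0, "matching_type": "EXACT",
   "chosen": true, "organization": {"name": "Example University"}}]}"""


def affiliation_url(affiliation: str) -> str:
    """Build an affiliation-matching query URL (no minimum length any more)."""
    return f"{BASE_URL}?{urlencode({'affiliation': affiliation})}"


def fetch(affiliation: str) -> dict:
    """Call the live endpoint and decode the JSON body."""
    with urlopen(affiliation_url(affiliation), timeout=30) as resp:
        return json.load(resp)


def pick_chosen(response: dict) -> Optional[dict]:
    """Return the single item marked chosen, or None if the match is ambiguous."""
    for item in response.get("items", []):
        if item.get("chosen"):
            return item
    return None
```

In practice you would call `pick_chosen(fetch("Example University"))` and treat `None` as "no confident match". Clients that branch on `matching_type` should also accept the new `EXACT` value.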