2024-04-11 Schema & API v2

Introducing v2

After more than a year of planning and many rounds of community input, we are thrilled to release ROR's first major schema and API update (version 2.0)! Per ROR's versioning policy, v1 will remain available in the API and data dump through at least April 2025, and will likely be available for 6-12 months beyond that date.

Below is a list of changes deployed to the schema, API, data dump and search user interface. For complete v2 documentation, see https://ror.readme.io/v2/docs .

Schema changes

The following changes have been implemented in schema v2.0, based on input received through multiple rounds of community feedback (see Schema v2 feedback documents). The previous schema (which was originally unversioned but is now referred to as v1.0) remains unchanged. For additional details and examples, see the schema v2 documentation and the v2.0 JSON schema document.

  • Name information previously in the name, acronyms, aliases, and labels fields is now contained in a single parent field, names, with subfields lang, value and types. Please note that the lang subfield has only been populated for names with labels in their types. The curation team will be working on adding language codes to other name types over the coming months.
  • Location information previously in the addresses field is now in the locations field, with subfields geonames_id and geonames_details. Many fields containing very granular information derived from Geonames have been removed, as this information is available directly from Geonames. Additionally, country code and name information previously in the country field has been moved to locations.geonames_details.country_code and locations.geonames_details.country_name.
  • Website/domain information previously in links and wikipedia_url has been combined into a single parent field, links, with subfields type and value. The ip_addresses field has been removed (it was not populated by GRID for any records). A domains field has been added; however, please note that this field has not yet been populated. The curation team will be working on this over the coming months.
  • External identifier information has been restructured within the existing external_ids field. Each item in external_ids now has subfields type, all and preferred. The data type of all is now a list for every external_ids item, whereas it was previously a string for GRID IDs and a list for other ID types.
  • Administrative information was not included previously. A new parent field admin has been added, which contains subfields created and last_modified. Each of those subfields contains additional subfields date and schema_version. Created date for each record was extracted from previous GRID and ROR releases. Last modified dates were extracted from ROR releases only, as, at a minimum, each record in ROR has been modified by the ROR curation team to add a ROR ID in the id field.
  • Controlled lists previously had variations in casing. For example, values in the types and relationships.type fields began with an uppercase character, while values in status were lowercase and external ID types contained a variety of casings. In v2, allowed values in controlled lists are consistently lowercase, with the exception of country codes derived from ISO-3166, which are uppercase per the standard.
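As a rough illustration of the restructuring described above, the sketch below folds v1-style name and external ID data into the v2 shapes. It is not ROR's own migration code. The sample record is hypothetical, the v1 label keys ("label", "iso639") and the type strings "alias", "acronym" and "label" are assumptions, and the function names are made up for this example. Consult the v2.0 JSON schema document for the authoritative field definitions.

```python
# Hypothetical v1-style record; every value here is invented for illustration.
SAMPLE_V1 = {
    "name": "Example University",
    "aliases": ["Example Uni"],
    "acronyms": ["EU"],
    # Assumption: v1 labels carry the name under "label" and a language code under "iso639".
    "labels": [{"label": "Université Exemple", "iso639": "fr"}],
    "external_ids": {
        "GRID": {"preferred": "grid.000000.0", "all": "grid.000000.0"},
        "ISNI": {"preferred": None, "all": ["0000 0000 0000 0000"]},
    },
}


def names_v1_to_v2(v1_record):
    """Fold v1 name, aliases, acronyms and labels into one v2-style names list."""
    names = [{"value": v1_record["name"], "types": ["ror_display"], "lang": None}]
    for alias in v1_record.get("aliases", []):
        names.append({"value": alias, "types": ["alias"], "lang": None})
    for acronym in v1_record.get("acronyms", []):
        names.append({"value": acronym, "types": ["acronym"], "lang": None})
    for label in v1_record.get("labels", []):
        # Only names that came from labels have a language code to carry over.
        names.append(
            {"value": label["label"], "types": ["label"], "lang": label.get("iso639")}
        )
    return names


def external_ids_v1_to_v2(v1_external_ids):
    """Turn the v1 mapping keyed by ID type into a v2-style list of items."""
    items = []
    for id_type, entry in v1_external_ids.items():
        all_ids = entry["all"]
        if isinstance(all_ids, str):
            # v1 stored GRID IDs as a single string; v2 always uses a list.
            all_ids = [all_ids]
        items.append(
            {
                "type": id_type.lower(),  # controlled values are lowercase in v2
                "all": list(all_ids),
                "preferred": entry.get("preferred"),
            }
        )
    return items
```

Calling names_v1_to_v2(SAMPLE_V1) yields four name items, of which only the one derived from labels has a non-empty lang, which mirrors the note above about language codes.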

Important notes about v2 record data

There are several new fields/subfields in v2, and the dataset used in the beta has not been fully updated with values in all new fields/subfields. In particular:

  • Created/last modified dates HAVE been added to all records, using actual dates from GRID and ROR data releases.
  • Domains HAVE NOT been added. This field is currently an empty list for all records. This field requires careful curation to ensure accuracy. We plan to add data to this field over the coming months.
  • Language codes for items in the names field are only included for names inherited from the labels field in the v1 schema. Language codes HAVE NOT been added for names inherited from the name and aliases fields in the v1 schema. We plan to add language codes over the coming months, with the goal of ensuring that (minimally) each name with “ror_display” in its types has a language code.
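Because some new fields are not yet fully populated, code that consumes v2 records may want to handle missing values defensively. The following is a minimal sketch under stated assumptions: the sample record is hypothetical, the date format and the "acronym" and "label" type strings are assumptions, and the helper names are invented for this example.

```python
# Hypothetical v2-style record; values are invented for illustration.
SAMPLE_V2 = {
    "names": [
        {"value": "EU", "types": ["acronym"], "lang": None},
        {"value": "Example University", "types": ["ror_display"], "lang": None},
        {"value": "Université Exemple", "types": ["label"], "lang": "fr"},
    ],
    "domains": [],  # currently an empty list for all records
    "admin": {
        # Assumption: dates are ISO 8601 strings.
        "created": {"date": "2019-01-01", "schema_version": "1.0"},
        "last_modified": {"date": "2024-04-11", "schema_version": "2.0"},
    },
}


def display_name(record):
    """Return the value of the first name with "ror_display" in its types, or None."""
    for name in record.get("names", []):
        if "ror_display" in name.get("types", []):
            return name["value"]
    return None


def names_missing_lang(record):
    """List name values that have no language code yet."""
    return [n["value"] for n in record.get("names", []) if not n.get("lang")]


def has_domains(record):
    """True only when the domains list exists and is non-empty."""
    return bool(record.get("domains"))
```

With the sample above, names_missing_lang returns the acronym and the display name, and has_domains returns False, so callers should not rely on either field being filled in yet.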

API changes

  • The API now supports versioning, with v1 or v2 supplied in the path portion of a request, e.g. https://api.ror.org/v2/organizations. The same data is available in both versions; responses are formatted according to the version in the request path.
  • If no version is supplied, a default version is used. v1 will remain the default through April 2025.
  • For v2, in addition to following the v2 schema, values in fields that contain multiple values are sorted by Unicode value, which is alphabetical for characters in the Basic Latin set.
  • A new organization type, funder, is available when filtering results based on organization type.
  • Because v2 contains different fields from v1, fields available to search using the advanced query functionality https://api.ror.org/v2/organizations?query.advanced= are different from v1. See v2 advanced query documentation. A notable addition is the ability to search by created or last modified date!
  • All other API functionality is identical to v1; records in responses are simply returned in v2 format. Records added or last updated in v1 are mapped to v2, and created/last modified dates are populated based on changelogs from previous data dump releases.
  • v1 API functionality is unchanged. Records added or last updated in v2 are mapped to v1 and contain empty or null values for fields that don't exist in v2.
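To make the versioned paths concrete, here is a small sketch that builds request URLs without sending them. The helper name organizations_url is invented for this example, and the query values shown in the usage note (the names.value:Example advanced query and the types:funder filter) are assumptions about query syntax; see the v2 advanced query documentation for the supported fields.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.ror.org"


def organizations_url(version=None, params=None):
    """Build an /organizations URL, optionally pinned to an API version.

    version: "v1", "v2", or None to let the API apply its default version.
    params:  optional dict of query parameters, e.g. {"query.advanced": "..."}.
    """
    if version not in (None, "v1", "v2"):
        raise ValueError(f"unsupported API version: {version!r}")
    parts = [API_ROOT]
    if version:
        parts.append(version)
    parts.append("organizations")
    url = "/".join(parts)
    if params:
        # urlencode percent-encodes reserved characters such as ":".
        url += "?" + urlencode(params)
    return url
```

For example, organizations_url("v2", {"query.advanced": "names.value:Example"}) pins the request to v2, while organizations_url(params={"filter": "types:funder"}) omits the version and therefore depends on the default. Pinning the version explicitly avoids a change in response format when the default moves to v2.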

Data dump changes

ROR data dumps continue to be available in Zenodo at https://doi.org/10.5281/zenodo.6347574. Beginning with release v1.45 on 11 April 2024, the following changes have been made to the data dump:

  • Data releases contain JSON and CSV files formatted according to both schema v1 and schema v2. This means that there are now 4 files in each data release instead of 2.
  • v2 files have _schema_v2 appended to the end of the filename, e.g. v1.45-2024-04-11-ror-data_schema_v2.json .
  • In order to maintain compatibility with previous releases, v1 files have no version information in the filename, e.g. v1.45-2024-04-11-ror-data.json .
  • For both versions, the CSV file contains a subset of fields from the JSON file, some of which have been flattened for easier parsing. As ROR records and the ROR schema are maintained in JSON, CSVs are for convenience only. JSON remains the format of record.
  • In v2 dump files, values in fields that contain multiple values are sorted by Unicode value, which is alphabetical for characters in the Basic Latin set.
  • In v2 dump files, records added or last updated in v1 are mapped to v2 and created/last modified are populated based on changelogs from previous data dump releases.
  • In v1 dump files, records added or last updated in v2 are mapped to v1 and contain empty or null values for fields that don't exist in v2.
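The naming convention and sort order described above can be sketched as follows. The regular expression is inferred only from the two example filenames in this list, and it assumes CSV files follow the same pattern as JSON files; the function names are invented for this example.

```python
import re

# Pattern inferred from the example filenames above; an assumption, not a spec.
DUMP_NAME = re.compile(
    r"^(?P<release>v\d+\.\d+)-(?P<date>\d{4}-\d{2}-\d{2})-ror-data"
    r"(?P<v2>_schema_v2)?\.(?P<ext>json|csv)$"
)


def describe_dump_file(filename):
    """Return release, date, schema version and format for a dump filename, or None."""
    match = DUMP_NAME.match(filename)
    if match is None:
        return None
    return {
        "release": match["release"],
        "date": match["date"],
        # Files without the _schema_v2 suffix are v1 files.
        "schema": "v2" if match["v2"] else "v1",
        "format": match["ext"],
    }


def is_code_point_sorted(values):
    """Check that strings are ordered by Unicode code point.

    Python compares str values code point by code point, so sorted() gives
    this ordering directly.
    """
    values = list(values)
    return values == sorted(values)
```

Note that code point order places all uppercase Basic Latin letters before lowercase ones, and accented characters after both: sorted(["alpha", "Éclair", "Zeta"]) gives ["Zeta", "alpha", "Éclair"]. Consumers expecting locale-aware alphabetical order should re-sort values themselves.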

Release versioning has not been changed. The ROR API default version remains v1 and will be changed to v2 in April 2025. To align with the API, the data dump major version will remain at 1 until the API default version is changed to v2. At that time, the data dump major version will be incremented to 2 as noted in metadata for https://doi.org/10.5281/zenodo.6347574.

Search UI changes

  • The ROR search UI now uses API v2.
  • Sub-headings have been added to the Other names section to identify name types (acronyms, aliases, labels).
  • A link to the JSON view is included at the bottom of each record.