2024-04-11 Schema & API v2
After more than a year of planning and many rounds of community input, we are thrilled to release ROR's first major schema and API update (version 2.0)! Per ROR's versioning policy, v1 will remain available in the API and data dump through at least April 2025, and will likely be available for 6-12 months beyond that date.
Below is a list of changes deployed to the schema, API, data dump and search user interface. For complete v2 documentation, see https://ror.readme.io/v2/docs.
Schema changes
The following changes have been implemented in schema v2.0, based on input received through multiple rounds of community feedback (see Schema v2 feedback documents). The previous schema (which was originally unversioned but is now referred to as v1.0) remains unchanged. For additional details and examples, see the schema v2 documentation and the v2.0 JSON schema document.
- Name information previously in the name, acronyms, aliases, and labels fields is now contained in 1 parent field, names, with subfields lang, value and types. Please note that the lang subfield has only been populated for names with labels in their types. The curation team will be working on adding language codes to other name types over the coming months.
- Location information previously in the addresses field is now in the locations field with subfields geonames_id and geonames_details. Many fields containing very granular information derived from Geonames have been removed, as this information is available directly from Geonames. Additionally, country code and name information previously in the country field has been moved to locations.geonames_details.country_code and locations.geonames_details.country_name.
- Website/domain information previously in links and wikipedia_url has been combined into 1 parent field, links, with subfields type and value. The ip_addresses field has been removed (it was not populated by GRID for any records). The domains field has been added; however, please note that this field has not yet been populated. The curation team will be working on this over the coming months.
- External identifier information has been restructured within the existing external_ids field. Each item in external_ids now has subfields type, all and preferred. The data type for all is a list for each external_ids item, whereas it was previously a string for GRID IDs and a list for other ID types.
- Administrative information was not included previously. A new parent field, admin, has been added, which contains subfields created and last_modified. Each of those subfields contains additional subfields date and schema_version. The created date for each record was extracted from previous GRID and ROR releases. Last modified dates were extracted from ROR releases only, as, at a minimum, each record in ROR has been modified by the ROR curation team to add a ROR ID in the id field.
- Controlled lists previously had variations in casing. For example, values in the types and relationships.type fields began with an uppercase character, while values in status were lowercase and external ID types contained a variety of casings. In v2, allowed values in controlled lists are consistently lowercase, with the exception of country codes derived from ISO-3166, which are uppercase per the standard.
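To make the name and external identifier restructuring more concrete, here is a minimal Python sketch of how a v1-style record could be reshaped into the v2 layout described above. It is an illustration only, not ROR's actual migration code. The function names, the sample record, the type strings ("ror_display", "acronym", "alias", "label"), the assumed v1 label shape (label plus iso639) and the use of None for a missing language code are all assumptions made for this example.

```python
def names_from_v1(record):
    """Collect the v1 name fields into one v2-style `names` list.

    The type strings used here are illustrative assumptions.
    """
    names = [{"value": record["name"], "types": ["ror_display"], "lang": None}]
    for acronym in record.get("acronyms", []):
        names.append({"value": acronym, "types": ["acronym"], "lang": None})
    for alias in record.get("aliases", []):
        names.append({"value": alias, "types": ["alias"], "lang": None})
    for label in record.get("labels", []):
        # Assumed v1 label shape: {"label": "...", "iso639": "fr"}.
        names.append(
            {"value": label["label"], "types": ["label"], "lang": label.get("iso639")}
        )
    return names


def external_ids_from_v1(external_ids):
    """Turn a v1-style mapping of ID type to entry into a v2-style list of items."""
    items = []
    for id_type, entry in external_ids.items():
        all_ids = entry.get("all") or []
        if isinstance(all_ids, str):
            # v1 stored GRID IDs as a single string; v2 always uses a list.
            all_ids = [all_ids]
        items.append(
            {
                "type": id_type.lower(),  # controlled values are lowercase in v2
                "all": list(all_ids),
                "preferred": entry.get("preferred"),
            }
        )
    return items


# Made-up record used only to exercise the functions above.
v1_record = {
    "name": "Example University",
    "acronyms": ["EU"],
    "aliases": [],
    "labels": [{"label": "Université Exemple", "iso639": "fr"}],
    "external_ids": {
        "GRID": {"preferred": "grid.0000.0", "all": "grid.0000.0"},
        "ISNI": {"preferred": None, "all": ["0000 0000 0000 0000"]},
    },
}

v2_names = names_from_v1(v1_record)
v2_external_ids = external_ids_from_v1(v1_record["external_ids"])
```

In this sketch only the name that came from the labels field ends up with a language code, which mirrors the note above that lang has so far been populated only for those names.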
Important notes about v2 record data
There are several new fields/subfields in v2, and the dataset used in the beta has not been fully updated with values in all new fields/subfields. In particular:
- Created/last modified dates HAVE been added to all records, using actual dates from GRID and ROR data releases.
- Domains HAVE NOT been added. This field is currently an empty list for all records. This field requires careful curation to ensure accuracy. We plan to add data to this field over the coming months.
- Language codes for items in the names fields are only included for names inherited from the labels field in the current schema. Language codes HAVE NOT been added for names inherited from the name and aliases fields in the current schema. We plan to add language codes over the coming months, with the goal of ensuring that (minimally) each name with “ror_display” in its types has a language code.
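Because some v2 fields are not yet fully populated, consumers may want to check records defensively. The following Python sketch shows one way to do that. It assumes that a missing language code appears as null (None in Python) or an empty string, and that an unpopulated domains field is an empty list. The helper names and the sample record are hypothetical.

```python
def names_missing_lang(record, required_type="ror_display"):
    """Return the values of names that carry `required_type` but have no language code."""
    return [
        name["value"]
        for name in record.get("names", [])
        if required_type in name.get("types", []) and not name.get("lang")
    ]


def has_domains(record):
    """True only if the record's domains list exists and is non-empty."""
    return bool(record.get("domains"))


# Made-up v2-style record for illustration.
sample_record = {
    "names": [
        {"value": "Example University", "types": ["ror_display"], "lang": None},
        {"value": "Université Exemple", "types": ["label"], "lang": "fr"},
    ],
    "domains": [],
}

missing = names_missing_lang(sample_record)
populated = has_domains(sample_record)
```

Treating an absent lang or an empty domains list as "not yet curated" rather than "does not exist" keeps downstream code working as these fields are filled in over time.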
API changes
- API now supports versioning, with v1 or v2 supplied in the path portion of a request, e.g. https://api.ror.org/v2/organizations. The same data is available in both versions; responses are formatted according to the version in the request path.
- If no version is supplied, a default version is used. v1 will remain the default through April 2025.
- For v2, in addition to following the v2 schema, values in fields that contain multiple values are sorted by Unicode value, which is alphabetical for characters in the Basic Latin set.
- A new organization type, funder, is available when filtering results based on organization type.
- Because v2 contains different fields from v1, fields available to search using the advanced query functionality (https://api.ror.org/v2/organizations?query.advanced=) are different from v1. See v2 advanced query documentation. A notable addition is the ability to search by created or last modified date!
- All other v2 API functionality is identical to v1; records in responses are simply returned in v2 format. Records added or last updated in v1 are mapped to v2 and created/last modified are populated based on changelogs from previous data dump releases.
- v1 API functionality is unchanged. Records added or last updated in v2 are mapped to v1 and contain empty or null values for fields that don't exist in v2.
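As a rough illustration of the versioned paths and the sorting behaviour listed above, the Python sketch below builds request URLs and sorts a list of strings by Unicode code point. The helper name organizations_url, the advanced query value and the sample strings are assumptions made for this example, and the sketch assumes that "sorted by Unicode value" means plain code point ordering. It makes no network requests.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.ror.org"


def organizations_url(version=None, params=None):
    """Build an organizations URL, optionally pinned to an API version.

    With version=None the path carries no version, so the API's default applies.
    """
    if version not in (None, "v1", "v2"):
        raise ValueError(f"unsupported API version: {version!r}")
    path = f"/{version}/organizations" if version else "/organizations"
    query = f"?{urlencode(params)}" if params else ""
    return f"{API_ROOT}{path}{query}"


pinned_url = organizations_url("v2")
default_url = organizations_url()
# The query value below is a hypothetical placeholder, not documented syntax.
advanced_url = organizations_url("v2", {"query.advanced": "placeholder"})

# Python compares strings by code point, so uppercase Basic Latin letters sort
# before lowercase ones, and accented letters sort after both.
values = ["university", "Zentrum", "École", "archive"]
sorted_values = sorted(values)
```

Pinning the version in the path, rather than relying on the default, means a client's response format will not change when the default version changes.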
Data dump changes
ROR data dumps continue to be available in Zenodo at https://doi.org/10.5281/zenodo.6347574. Beginning with release v1.45 on 11 April 2024, the following changes have been made to the data dump:
- Data releases contain JSON and CSV files formatted according to both schema v1 and schema v2. This means that there are now 4 files in each data release instead of 2.
- v2 files have _schema_v2 appended to the end of the filename, e.g. v1.45-2024-04-11-ror-data_schema_v2.json.
- In order to maintain compatibility with previous releases, v1 files have no version information in the filename, e.g. v1.45-2024-04-11-ror-data.json.
- For both versions, the CSV file contains a subset of fields from the JSON file, some of which have been flattened for easier parsing. As ROR records and the ROR schema are maintained in JSON, CSVs are for convenience only. JSON remains the format of record.
- In v2 dump files, values in fields that contain multiple values are sorted by Unicode value, which is alphabetical for characters in the Basic Latin set.
- In v2 dump files, records added or last updated in v1 are mapped to v2 and created/last modified are populated based on changelogs from previous data dump releases.
- In v1 dump files, records added or last updated in v2 are mapped to v1 and contain empty or null values for fields that don't exist in v2.
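The filename convention above can be used to pick the right file from a release programmatically. The Python sketch below is one way to do this. The helper names are hypothetical, and it assumes the CSV files follow the same naming pattern as the JSON examples given above.

```python
from pathlib import PurePosixPath

V2_SUFFIX = "_schema_v2"


def dump_schema_version(filename):
    """Return "v2" if the filename stem ends with _schema_v2, otherwise "v1"."""
    stem = PurePosixPath(filename).stem  # drops only the final extension
    return "v2" if stem.endswith(V2_SUFFIX) else "v1"


def group_by_schema(filenames):
    """Split the files of one data release by schema version."""
    groups = {"v1": [], "v2": []}
    for name in filenames:
        groups[dump_schema_version(name)].append(name)
    return groups


# JSON names are taken from the examples above; the CSV names are assumed
# to follow the same pattern.
release_files = [
    "v1.45-2024-04-11-ror-data.json",
    "v1.45-2024-04-11-ror-data.csv",
    "v1.45-2024-04-11-ror-data_schema_v2.json",
    "v1.45-2024-04-11-ror-data_schema_v2.csv",
]

grouped = group_by_schema(release_files)
```

Note that the leading v1.45 in each filename is the release version, not the schema version, which is why the check looks only at the end of the stem.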
Release versioning has not been changed. The ROR API default version remains v1 and will be changed to v2 in April 2025. To align with the API, the data dump major version will remain at 1 until the API default version is changed to v2. At that time, the data dump major version will be incremented to 2 as noted in metadata for https://doi.org/10.5281/zenodo.6347574.
Search UI changes
- The ROR search UI now uses API v2
- Sub-headings have been added to the Other names section to identify name types (acronyms, aliases, labels)
- A link to the JSON view is included at the bottom of each record
