2024-04-11 Schema & API v2
Introducing v2
After more than a year of planning and many rounds of community input, we are thrilled to release ROR's first major schema and API update (version 2.0)! Per ROR's versioning policy, v1 will remain available in the API and data dump through at least April 2025, and will likely be available for 6-12 months beyond that date.
Below is a list of changes deployed to the schema, API, data dump and search user interface. For complete documentation in v2, see https://ror.readme.io/v2/docs .
Schema changes
The following changes have been implemented in schema v2.0, based on input received through multiple rounds of community feedback (see Schema v2 feedback documents). The previous schema (which was originally unversioned but is now referred to as v1.0) remains unchanged. For additional details and examples, see the schema v2 documentation and the v2.0 JSON schema document.
- Name information previously in `name`, `acronyms`, `aliases`, and `labels` fields is now contained in one parent field, `names`, with subfields `lang`, `value` and `types`. Please note that the `lang` subfield has only been populated for names with `labels` in their `types`. The curation team will be working on adding language codes to other name types over the coming months.
- Location information previously in the `addresses` field is now in the `locations` field with subfields `geonames_id` and `geonames_details`. Many fields containing very granular information derived from Geonames have been removed, as this information is available directly from Geonames. Additionally, country code and name information previously in the `country` field has been moved to `locations.geonames_details.country_code` and `locations.geonames_details.country_name`.
- Website/domain information previously in `links` and `wikipedia_url` has been combined into one parent field, `links`, with subfields `type` and `value`. The `ip_addresses` field has been removed (it was not populated by GRID for any records). The `domains` field has been added; however, please note that this field has not yet been populated. The curation team will be working on this over the coming months.
- External identifier information has been restructured within the existing `external_ids` field. Each item in `external_ids` now has subfields `type`, `all` and `preferred`. The data type for `all` is a list for each `external_ids` item, whereas it was previously a string for GRID IDs and a list for other ID types.
- Administrative information was not included previously. A new parent field, `admin`, has been added, which contains subfields `created` and `last_modified`. Each of those subfields contains additional subfields `date` and `schema_version`. The created date for each record was extracted from previous GRID and ROR releases. Last modified dates were extracted from ROR releases only, as, at a minimum, each record in ROR has been modified by the ROR curation team to add a ROR ID in the `id` field.
- Controlled lists previously had variations in casing. For example, values in the `types` and `relationships.type` fields began with an uppercase character, while values in `status` were lowercase and external ID types contained a variety of casings. In v2, allowed values in controlled lists are consistently lowercase, with the exception of country codes derived from ISO-3166, which are uppercase per the standard.
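To make the name and external identifier changes concrete, here is a minimal Python sketch of how a v1-style record might be reshaped into the v2 layout described above. It is not ROR's own migration code. The function names, the sample record, the v1 shapes (`labels` items with `label` and `iso639` keys, `external_ids` as a dictionary keyed by ID type) and the type strings `acronym`, `alias` and `label` are assumptions made for illustration; check them against the v2 schema documentation before relying on them.

```python
def names_v1_to_v2(v1_record):
    """Collect the four v1 name fields into one v2-style `names` list."""
    names = [{"value": v1_record["name"], "types": ["ror_display"], "lang": None}]
    for acronym in v1_record.get("acronyms", []):
        names.append({"value": acronym, "types": ["acronym"], "lang": None})
    for alias in v1_record.get("aliases", []):
        names.append({"value": alias, "types": ["alias"], "lang": None})
    for label in v1_record.get("labels", []):
        # Only names that came from `labels` carry a language code.
        names.append(
            {"value": label["label"], "types": ["label"], "lang": label.get("iso639")}
        )
    return names


def external_ids_v1_to_v2(v1_external_ids):
    """Turn the v1 mapping into a v2-style list where `all` is always a list."""
    items = []
    for id_type, entry in v1_external_ids.items():
        all_ids = entry.get("all") or []
        if isinstance(all_ids, str):
            # v1 stored GRID IDs as a bare string rather than a list.
            all_ids = [all_ids]
        items.append(
            {
                "type": id_type.lower(),  # controlled values are lowercase in v2
                "all": sorted(all_ids),
                "preferred": entry.get("preferred"),
            }
        )
    return sorted(items, key=lambda item: item["type"])


# Made-up record used only to exercise the functions above.
v1_record = {
    "name": "Example University",
    "acronyms": ["EU"],
    "aliases": [],
    "labels": [{"label": "Université Exemple", "iso639": "fr"}],
    "external_ids": {
        "GRID": {"preferred": "grid.0000.0", "all": "grid.0000.0"},
        "ISNI": {
            "preferred": None,
            "all": ["0000 0000 0000 0002", "0000 0000 0000 0001"],
        },
    },
}

v2_names = names_v1_to_v2(v1_record)
v2_external_ids = external_ids_v1_to_v2(v1_record["external_ids"])
```

Normalizing `all` to a list in every case means downstream code no longer needs a special branch for GRID IDs.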
Important notes about v2 record data
There are several new fields/subfields in v2, and the dataset used in the beta has not been fully updated with values in all new fields/subfields. In particular:
- Created/last modified dates HAVE been added to all records, using actual dates from GRID and ROR data releases.
- Domains HAVE NOT been added. This field is currently an empty list for all records. This field requires careful curation to ensure accuracy. We plan to add data to this field over the coming months.
- Language codes for items in the names fields are only included for names inherited from the labels field in the current schema. Language codes HAVE NOT been added for names inherited from the name and aliases fields in the current schema. We plan to add language codes over the coming months, with the goal of ensuring that (minimally) each name with “ror_display” in its types has a language code.
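Because some v2 fields are not yet fully populated, consumers may want to guard against missing values. The following Python sketch shows one way to do that, assuming a v2 record has been loaded into a dictionary. The helper names and the sample record are hypothetical, and the `label` type string is an assumption.

```python
def display_names_missing_lang(record):
    """Return values of names typed `ror_display` that have no language code."""
    return [
        name["value"]
        for name in record.get("names", [])
        if "ror_display" in name.get("types", []) and not name.get("lang")
    ]


def has_domains(record):
    """True only when the `domains` list exists and is non-empty."""
    return bool(record.get("domains"))


# Made-up v2-style record for illustration.
record = {
    "names": [
        {"value": "Example University", "types": ["label", "ror_display"], "lang": None},
        {"value": "Université Exemple", "types": ["label"], "lang": "fr"},
    ],
    "domains": [],
}

missing = display_names_missing_lang(record)
populated = has_domains(record)
```

Treating an empty `domains` list and a missing `lang` as normal cases keeps client code working while curation of these fields is still in progress.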
API changes
- The API now supports versioning, with `v1` or `v2` supplied in the path portion of a request, e.g. https://api.ror.org/v2/organizations . The same data is available in both versions; responses are formatted according to the version in the request path.
- If no version is supplied, a default version is used. `v1` will remain the default through April 2025.
- For `v2`, in addition to following the v2 schema, values in fields that contain multiple values are sorted by Unicode value, which is alphabetical for characters in the Basic Latin set.
- A new organization type, `funder`, is available when filtering results based on organization type.
- Because v2 contains different fields from v1, the fields available to search using the advanced query functionality (`https://api.ror.org/v2/organizations?query.advanced=`) are different from v1. See v2 advanced query documentation. A notable addition is the ability to search by created or last modified date!
- All other API functionality is identical to v1; records in responses are simply returned in v2 format. Records added or last updated in v1 are mapped to v2 and created/last modified are populated based on changelogs from previous data dump releases.
- v1 API functionality is unchanged. Records added or last updated in v2 are mapped to v1 and contain empty or null values for fields that don't exist in v2.
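As an illustration of the versioned paths described above, this Python sketch builds request URLs for either version, or for the unversioned default. The helper name `organizations_url` is hypothetical, and the `field:value` form of the example advanced query is an assumption; consult the v2 advanced query documentation for the supported syntax. The code only constructs URLs and makes no network requests.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.ror.org"


def organizations_url(version=None, params=None):
    """Build an organizations URL; omit `version` to use the API default."""
    if version not in (None, "v1", "v2"):
        raise ValueError(f"unsupported API version: {version!r}")
    path = "/organizations" if version is None else f"/{version}/organizations"
    url = BASE_URL + path
    if params:
        # urlencode percent-encodes reserved characters such as ":".
        url += "?" + urlencode(params)
    return url


default_url = organizations_url()
v2_url = organizations_url("v2")
# Assumed query syntax: search on the created date added in v2.
created_url = organizations_url(
    "v2", {"query.advanced": "admin.created.date:2024-04-11"}
)
```

Pinning the version in the path, rather than relying on the default, means a client keeps receiving the same response format when the default changes.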
Data dump changes
ROR data dumps continue to be available in Zenodo at https://doi.org/10.5281/zenodo.6347574. Beginning with release v1.45 on 11 April 2024, the following changes have been made to the data dump:
- Data releases contain JSON and CSV files formatted according to both schema v1 and schema v2. This means that there are now 4 files in each data release instead of 2.
- v2 files have `_schema_v2` appended to the end of the filename, e.g. `v1.45-2024-04-11-ror-data_schema_v2.json`.
- In order to maintain compatibility with previous releases, v1 files have no version information in the filename, e.g. `v1.45-2024-04-11-ror-data.json`.
- For both versions, the CSV file contains a subset of fields from the JSON file, some of which have been flattened for easier parsing. As ROR records and the ROR schema are maintained in JSON, CSVs are for convenience only. JSON remains the format of record.
- In v2 dump files, values in fields that contain multiple values are sorted by Unicode value, which is alphabetical for characters in the Basic Latin set.
- In v2 dump files, records added or last updated in v1 are mapped to v2 and created/last modified are populated based on changelogs from previous data dump releases.
- In v1 dump files, records added or last updated in v2 are mapped to v1 and contain empty or null values for fields that don't exist in v2.
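The filename convention above can be sketched as a small Python helper. The function name `dump_filename` is hypothetical, and the CSV filenames are assumed to follow the same pattern as the JSON examples given above; verify against the files in an actual release.

```python
def dump_filename(release, schema_version, extension="json"):
    """Return the expected data dump filename for a release and schema version."""
    if schema_version not in (1, 2):
        raise ValueError("schema_version must be 1 or 2")
    if extension not in ("json", "csv"):
        raise ValueError("extension must be 'json' or 'csv'")
    # v1 files carry no version suffix, for compatibility with older releases.
    suffix = "_schema_v2" if schema_version == 2 else ""
    return f"{release}-ror-data{suffix}.{extension}"


# The four files expected in one release (CSV names are assumed).
release_files = [
    dump_filename("v1.45-2024-04-11", version, ext)
    for version in (1, 2)
    for ext in ("json", "csv")
]
```

Scripts that previously picked up "the JSON file" from a release should select by exact name, since each release now contains two JSON files.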
Release versioning has not been changed. The ROR API default version remains v1 and will be changed to v2 in April 2025. To align with the API, the data dump major version will remain at 1 until the API default version is changed to v2. At that time, the data dump major version will be incremented to 2 as noted in metadata for https://doi.org/10.5281/zenodo.6347574.
Search UI changes
- The ROR search UI now uses API v2
- Sub-headings have been added to the Other names section to identify name types (acronyms, aliases, labels)
- A link to the JSON view is included at the bottom of each record