Changelog
Source: NEWS.md
scholid 0.2.0
CRAN release: 2026-06-04
New identifier types
The package now supports 20 identifier types (up from 7 in 0.1.1). Each type provides structural validation, normalization from URLs and labels, and extraction from free text via the existing is_scholid(), normalize_scholid(), extract_scholid(), classify_scholid(), and detect_scholid_type() APIs.
New types in this release:
- ROR — Research Organization Registry IDs (checksum-validated)
- RRID — Research Resource Identifiers
- SWHID — Software Heritage persistent identifiers
- OpenAlex — OpenAlex entity keys (W, A, S, …)
- bibcode — SAO/NASA ADS bibliographic codes
- ISNI — International Standard Name Identifier (compact form; hyphenated ORCID-shaped strings remain orcid)
- ARK — Archival Resource Keys (ark:/NAAN/Name)
- UniProt — UniProtKB accessions
- refseq — NCBI RefSeq accessions (versioned)
- sra — INSDC Sequence Read Archive accessions (SRR, SRX, SRP, …)
- geo — NCBI GEO accessions (GSE, GSM, GPL, GDS)
- bioproject — INSDC BioProject accessions (PRJNA, PRJEB, …)
- assembly — INSDC genome assembly accessions (GCA_, GCF_, versioned)
Identifier definitions and validation rules are documented in the scholid_definitions vignette.
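To make "checksum-validated" concrete for ROR, here is a minimal base R sketch of the publicly documented ROR scheme: a leading 0, six Crockford base32 characters, and a two-digit check value computed as 98 - ((n * 100) mod 97). The helper name ror_checksum_ok() is hypothetical and is not part of scholid; the package's own rules are those described in the vignette and may differ in detail.

```r
# Crockford base32 alphabet (digits plus letters, without i, l, o, u).
crockford <- strsplit("0123456789abcdefghjkmnpqrstvwxyz", "")[[1]]

# Hypothetical helper for illustration only; not the package's implementation.
ror_checksum_ok <- function(x) {
  x <- tolower(x)
  # Shape check: leading 0, six base32 characters, two checksum digits.
  if (!grepl("^0[0-9a-hjkmnp-tv-z]{6}[0-9]{2}$", x)) return(FALSE)
  digits <- match(strsplit(substr(x, 2, 7), "")[[1]], crockford) - 1
  # Decode the base32 part, reducing mod 97 at each step to keep numbers small.
  n <- 0
  for (d in digits) n <- (n * 32 + d) %% 97
  expected <- 98 - (n * 100) %% 97
  expected == as.integer(substr(x, 8, 9))
}

ror_checksum_ok("02mhbdp94")  # TRUE
ror_checksum_ok("02mhbdp95")  # FALSE: last digit altered
```

The sketch expects a bare identifier; stripping a URL prefix such as https://ror.org/ would be a separate normalization step.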
Internal improvements
- Introduced a central identifier registry as the single source of truth for type names, classification order, extraction patterns, and per-type metadata.
- Refactored per-type implementations to reduce duplication; exported APIs dispatch by naming convention (is_<type>, normalize_<type>, extract_<type>).
- Optimized classify_scholid() and detect_scholid_type() to avoid redundant work when resolving types.
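The registry and naming-convention dispatch described above can be pictured with a small base R sketch. Everything here is an assumption made for illustration: the toy registry, the helpers is_type() and classify_one(), and the deliberately simplified regular expressions are hypothetical and do not reproduce the package's internals or its real validation rules.

```r
# Hypothetical per-type validators following the is_<type> convention.
# The patterns are simplified placeholders, not the package's rules.
is_geo <- function(x) grepl("^(GSE|GSM|GPL|GDS)[0-9]+$", x)
is_bioproject <- function(x) grepl("^PRJ(NA|EB)[0-9]+$", x)

# Toy registry: entry names are the type names, and their order
# doubles as the classification order.
registry <- list(
  geo        = list(label = "NCBI GEO accession"),
  bioproject = list(label = "INSDC BioProject accession")
)

# Look up is_<type> by name and apply it to x.
is_type <- function(x, type) {
  stopifnot(type %in% names(registry))
  validator <- get(paste0("is_", type), mode = "function")
  validator(x)
}

# Return the first registry type whose validator accepts x, else NA.
classify_one <- function(x) {
  for (type in names(registry)) {
    if (is_type(x, type)) return(type)
  }
  NA_character_
}

classify_one("GSE12345")     # "geo"
classify_one("PRJNA257197")  # "bioproject"
classify_one("hello")        # NA
```

With this layout, adding a type means adding one registry entry and the matching is_<type> function; the generic helpers stay unchanged.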