§ 1.2 Schema Best Practices

The use of characters and data types can be updated dynamically as the needs of the schema change. These two portions of the schema are under less onerous change management rules than the other portions. Therefore, coverage of them here is abbreviated.

Acceptable Characters

1.2.1 All characters used in federated compliance mapping data structures will conform to the GRCschema.org character table’s rules.

Mapping products must enter text into several different JSON data types. Some of those datatypes, such as term definitions for the dictionary, or URLs, only accept certain characters. Therefore, the federated compliance mapping model maintains a character table of acceptable characters to be used in the framework.

Instead of listing all the characters here (since this list can be updated from time to time), we’ll present one row of the table and explain that row. The full list is maintained online at https://grcschema.org/CharTable. The table below is a representation and is comprised of the following fields:

Char

Unicode

Dec

Hex

Binary

HTML

Definition

!

U+0021

33

21

100001

!

The grapheme for the exclamation mark character. It is represented by the following codes: Unicode U+0021; Dec 33; Hex 21; HTML !. This character is compatible with all known dictionary entries.

Char – the actual grapheme or special character code (such as TAB for a tab character) of the character in question.

Unicode through HTML – the various representations of the character in different formats.

Definition – describes how the character is used, and whether it should be substituted when using it in dictionary entries, URLs, or Markdown language.

Data Types

1.2.2 All data types used in federated compliance mapping model will conform to the datatypes documented at GRCschema.org.

There are various datatypes that each of the JSON objects must follow. The federated compliance mapping model has moved beyond the basic JSON datatypes of null, text, string, etc. and also allows specific datatypes for such things as HostName, TelephoneNumber, Datetime-with-Timezone, etc.

The list of datatypes is maintained online at http://grcschema.org/DataType.

Schema naming

Schema naming will follow a standardized approach to field and table names, and will also follow all restrictions for SQL keywords as SQL databases are the most used in the industry.

1.2.3 Schema property nodes start with a lowercase character.

1.2.4 Schema data types begin with an uppercase character.

1.2.5 Properties are lowercase.

1.2.6 DataTypes and Things are camelCase.

1.2.7 Properties may separate words with underscores “_”.

1.2.7.a Refrain from using spaces “ “ and dashes “-“ in any node name.

Reserved Keywords

1.2.7.b Refrain from using reserved keywords in any JSON “thing” or “datatype”.

SQL Servers use reserved keywords for defining, manipulating, and accessing databases. Re-served keywords are part of the grammar of the Transact-SQL language that is used by SQL Servers to parse and understand Transact-SQL statements and batches. While it is syntactically possible to use reserved keywords by adding special identifiers in Transact-SQL scripts, it is not advisable. Therefore, the following strings are forbidden as field, object, types, and table names.

TERM

TERM

TERM

ABSOLUTE

FETCH

PRIVILEGES

ACTION

FILE

PROC

ADD

FILLFACTOR

PROCEDURE

ALL

FIRST

PUBLIC

ALLOCATE

FLOAT

RAISERROR

ALTER

FOR

READ

AND

FOREIGN

READTEXT

ANY

FOUND

REAL

ARE

FREETEXT

RECONFIGURE

AS

FREETEXTTABLE

REFERENCES

ASC

FROM

RELATIVE

ASSERTION

FULL

REPLICATION

AT

FUNCTION

RESTORE

AUTHORIZATION

GET

RESTRICT

AVG

GLOBAL

RETURN

BACKUP

GO

REVERT

BEGIN

GOTO

REVOKE

BETWEEN

GRANT

RIGHT

BINARY

GROUP

ROLLBACK

BIT

HAVING

ROUND

BIT_LENGTH

HOLDLOCK

ROW

BLOB

HOUR

ROWCOUNT

BOOLEAN

IDENTITY

ROWGUIDCOL

BOTH

IDENTITY_INSERT

ROWID

BREAK

IDENTITYCOL

ROWS

BROWSE

IF

RTRIM

BULK

IMMEDIATE

RULE

BY

IN

SAVE

CASCADE

INDEX

SCHEMA

CASCADED

INDICATOR

SCROLL

CASE

INITIALLY

SECOND

CAST

INNER

SECTION

CATALOG

INPUT

SECURITYAUDIT

CHAR

INSENSITIVE

SELECT

CHAR_LENGTH

INSERT

SEMANTICKEYPHRASETABLE

CHARACTER

INT

SEMANTICSIMILARITYDETAILSTABLE

CHARACTER_LENGTH

INTEGER

SEMANTICSIMILARITYTABLE

CHECK

INTERSECT

SESSION

CHECKPOINT

INTERVAL

SESSION_USER

CHR

INTO

SET

CLOSE

IS

SETUSER

CLUSTERED

ISOLATION

SHUTDOWN

COALESCE

JOIN

SIZE

COLLATE

KEY

SMALLINT

COLLATION

KILL

SOME

COLUMN

LANGUAGE

SPACE

COMMIT

LAST

SQL

COMPUTE

LEADING

SQLCODE

CONNECT

LEFT

SQLERROR

CONNECTION

LENGTH

SQLSTATE

CONSTRAINT

LEVEL

STATISTICS

CONSTRAINTS

LIKE

STRVAL

CONTAINS

LINENO

SUBSTRING

CONTAINSTABLE

LOAD

SUM

CONTINUE

LOCAL

SYSTEM_USER

CONVERT

LONGVARBINARY

TABLE

CORRESPONDING

LOWER

TABLESAMPLE

COUNT

LTRIM

TEMPORARY

CREATE

MATCH

TEXTSIZE

CROSS

MAX

THEN

CURDATE

MERGE

TIES

CURRENT

MIN

TIME

CURRENT_DATE

MINUTE

TIMESTAMP

CURRENT_TIME

MODULE

TIMESTAMPVAL

CURRENT_TIMESTAMP

MONTH

TIMEVAL

CURRENT_USER

MONTHNAME

TIMEZONE_HOUR

CURSOR

NAMES

TIMEZONE_MINUTE

CURTIME

NATIONAL

TO

CURTIMESTAMP

NATURAL

TODAY

DATABASE

NCHAR

TOP

DATE

NEXT

TRAILING

DATEVAL

NO

TRAN

DAY

NOCHECK

TRANSACTION

DAYNAME

NONCLUSTERED

TRANSLATE

DAYOFWEEK

NOT

TRANSLATION

DBCC

NULL

TRIGGER

DEALLOCATE

NULLIF

TRIM

DEC

NUMERIC

TRUNCATE

DECIMAL

NUMVAL

TRY_CONVERT

DECLARE

OCTET_LENGTH

TSEQUAL

DEFAULT

OF

UNION

DEFERRABLE

OFF

UNIQUE

DEFERRED

OFFSET

UNKNOWN

DELETE

OFFSETS

UNPIVOT

DENY

ON

UPDATE

DESC

ONLY

UPDATETEXT

DESCRIBE

OPEN

UPPER

DESCRIPTOR

OPENDATASOURCE

USAGE

DIAGNOSTICS

OPENQUERY

USE

DISCONNECT

OPENROWSET

USER

DISK

OPENXML

USERNAME

DISTINCT

OPTION

USING

DISTRIBUTED

OR

VALUE

DOMAIN

ORDER

VALUES

DOUBLE

OUTER

VARBINARY

DROP

OUTPUT

VARCHAR

DUMP

OVER

VARYING

ELSE

OVERLAPS

VIEW

END

PAD

WAITFOR

END_EXEC

PART

WHEN

ERRLVL

PARTIAL

WHENEVER

ESCAPE

PERCENT

WHERE

EVERY

PIVOT

WHILE

EXCEPT

PLAN

WITH

EXCEPTION

POSITION

WITHIN GROUP

EXEC

PRECISION

WORK

EXECUTE

PREPARE

WRITE

EXISTS

PRESERVE

WRITETEXT

EXIT

PRIMARY

YEAR

EXTERNAL

PRINT

ZONE

EXTRACT

PRIOR

Schema Structure

Establishing any data format for an API is a common problem – especially when trying to make the API easy to use so anyone can use short strings to reference common things, while also providing a robust interchange format to use in federated projects. Leveraging information from multiple APIs becomes problematic due to namespace or document format conventions that may differ between API endpoints. Moreover, the same principles are often repeated across different endpoints using arbitrary identifiers (name, email, website, etc.); a federated project community needs to stop repeating itself (DRY concept) and reuse common conventions. This model is based on the principles of separation of data model from syntax, the use of discoverable identifiers describing document contents, and general organizing principles that allow documents to be machine understandable (read, interpreted as JSON-LD using Linked Data). Key among these is the notion of vocabulary re-use, so that each endpoint does not need to separately describe the properties and structure of their JSON documents.

Schema Context

1.2.8 All keys used within a JSON document can have unambiguous meaning, being bound to URLs which describe their meaning.

1.2.8.a The context for the schema structure will be defined at grcschema.org, with the top level context being “Thing”.

An example of the bound, unambiguous context is shown below that uses IRIs (Internationalized Resource Identifiers as described in RFC3987) for unambiguous identification. The idea is to use IRIs to assign unambiguous identifiers to data that may be of use to other developers:

{ "@context":{ "schema": "http://grcschema.org/", "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#", "rdfa": "http://www.w3.org/ns/rdfa#", "rdfs": "http://www.w3.org/2000/01/rdf-schema#" }, "@id": "Thing", "@type": "schema:Thing", "rdfs:comment": "The most generic type of item.", "rdfs:label": "Thing" }

1.2.9 Each JSON object will be placed in a hierarchy with a single top-level object (type) with cascading, subordinate objects (types) as necessary.

1.2.10 Multiple array values are presumed unordered.

1.2.11 Every JSON object will have core meta data (coreMetaData) associated with it.

1.2.11.a Core meta data will include date_created to record the creation date of the record.

1.2.11.b Core meta data will include date_modified to record the date the record was last modi-fied (this will be the same as the creation date when an object is first created).

1.2.11.c Core meta data will include created_by to record the id of the person or agent that cre-ated the object.

1.2.11.d Core meta data will include modified_by to record the id of the person or agent that last modified the object.

1.2.11.e Core meta data will include a live status (live_status) that is a Boolean value, and if the object is no longer live, it will not be removed from the database but only marked as dep-recated (Boolean value of 0).

1.2.11.f Core meta data will include the name of the table that contains the object.

When creating the checksum for the RESTful record validation, the Verhoeff calculation will be used to create the initial checksum from a combination of these three fields:

  • date_modified: This is the most recent modification date of the record.

  • id: This is the ID field of the record in question.

  • live_status: This is a Boolean field indicating if the record is live (1) or deprecated (0).

In combination, these three tell the user which record they are dealing with, when it was last modified, and whether it remains live and ready for use.

The checksum will return a dihedral check integer of those three values that can be used on the receiving end to verify that the record of that ID was last modified on that date and has that live status.

When creating or updating a record in the system

When creating or updating a record, the system will calculate the checksum based on the input and will record the checksum in the checksum node. Given the modification_date of 6/17/2020, ID of 1 and live_status of 1, Verhoeff ( 6/17/2020 & 1 & 1 ; 0 ; 0 ) returns the value 5.

Testing the checksum

Upon receiving data from the system, the user will be to calculate a boolean value by simply reversing the calculation. Given the modification_date of 6/17/2020, ID of 1 and live_status of 1, NOT Verhoeff ( 6/17/2020 & 1 & 1 ; 0 ; 0 ) returns the value 1. Modification of any of the values (date, ID, live_status) returns 0.

The checksum calculation

Returns the Verhoeff dihedral check digit of numericString. Use this function to verify a numeric string protected by Verhoeff check digit, or to generate the correct Verhoeff check digit for a given numeric string.

Format

Verhoeff ( numericString ; index ; checkSum )

Parameters

numericString - a string of numeric characters (digits) or field containing numeric characters. In this case, the string is comprised of the following: date_created & id & live_status.

index - indicates the digit position of the current iteration - needs to be initialized to zero (0) when calling the function.

checkSum - indicates the check digit of the current iteration. This needs to be initialized to zero (0) when calling the function

Data type returned = number

Calculation

Let ( [ n = Right ( numericString ; 1 ) ; p = Let ( [ array = "01234567891576283094580379614289160435279453126870428657390127938064157046913258" ; start = 10 * Mod ( index ; 8 ) + n + 1 ] ; Middle ( array ; start ; 1 ) ) ; d = Let ( [ array = "0123456789123406789523401789563401289567401239567859876043216598710432765982104387659321049876543210" ; start = 10 * checkSum + p + 1 ] ; Middle ( array ; start ; 1 ) ) ; len = Length ( numericString ) ; nextString = Left ( numericString ; len - 1 ) ] ; Case ( len > 1 ; Verhoeff ( nextString ; index + 1 ; d ) ; d ) )

1.2.11.g Core meta data will include a Verhoeff checksum (checksum) that is a positive integer.

1.2.12 All nested arrays will include the unique identifier (id) for each nested object as well as the id of the related table in the format of the foreign table name followed by an underscore and "fk" (foreigntablename_fk).

An example of a nested table with its id as well as the id of the related table is shown below.

{ "@context": { "schema": "http://grcschema.org/", "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#", "rdfa": "http://www.w3.org/ns/rdfa#", "rdfs": "http://www.w3.org/2000/01/rdf-schema#" }, "id": "96", "@type": "Person", "name": "Barack Obama", "givenName": "Barack", "familyName": "Obama", "jobTitle": "44th President of the United States", "spouse": { "id": "13133", "@type": "Person", "name": "Michelle Obama", "Person_fk": "96" } }

§ 1.2.13 When using a property intended to reference another entity or external reference, those properties will be represented as full URLs.

Specification of Arrays

Arrays are specified by the keyword @set followed by JSON’s array brackets “[ ]”. The objects that follow denote the individual objects (fields) within the array.

"@set": [ { "@type": "NonStandardName", "name": "String", "id": "Integer", "organization_fk": "Integer" } ]

§ 1.2.14 When including arrays within objects, the @set will precede the array brackets “[ ]”.

Specification of Types

The @type keyword is used to set the type of a node or the datatype of a typed value (a value, which is a string, and a type, which is an IRI). In Linked Data, it is common to specify the type of a graph node; in many cases, this can be inferred based on the properties used within a given node object, or the property for which a node is a value.

1.2.15 The use of @type immediately following @context sets the 5.15 The use of @type immediately following @context sets the name of the table.

An example of this is shown below:

"@context":"http://grcschema.org/" , "@type":"Role"

1.2.16 When @type follows an object name and isn’t preceding an array it names a predefined group of fields.

Example below:

"coreMetaData": { "@type": "CoreMetaData" "date_created":"2014-10-05" "date_modified":"2014-10-05" "live_status":1 "table":"Language" }

1.2.17 When @type follows an object name and precedes an array it names the related table that should be added.

Example below:

"nonStandardNames": { "@type":"NonStandardNames" "Names": [ 0: { "id":1 "name":"افغانستان" } ] }

Here is an example where you’ve got @type meaning all of the above.

"geoLocation": { "@type":"GeoLocation" <- That describes this grouping of data "borders": { "@type":"Borders" <— That describes the related subtable and fields "border": [...] } "term": { "@type":"Term" <- That just describes a field from a related table "term.id":32131 } "callingCodes": { "@type":"CallingCodes" <— That describes the related subtable and fields "calling_codes": [ 0: {... } ] "timeZones": {... } "latitude":"33" "longitude":"65" }

Local Reference Identifiers

When creating new records (POST), or updating existing records (PATCH), an additional level of validation and non-repudiation is presented as an optional key-value pair. We suggest using a hash-based algorithm such as an md5 hash based on the contents of the fields being passed to the API for creation or updating, such as the one below:

local_reference_id": "81dc9bdb52d04dc20036dbd8313ed055"

§ 1.2.19 For POST and PATCH operations, you may send an optional local_reference_id for the core object and any sub-objects in unordered lists (\@set arrays). This lo-cal_reference_id will be returned (like an echo) for that object.

§ 1.2.20 It is recommended you use a Type 4 UUID which is unique for that object in your sys-tem and tie the federated ID to your record for later use in querying the federated system for that object.

POST Operation Example

For the core object, you may place a local_reference_id at the object root level, and it will be re-turned in the response.

SEND

{ "@context": "http://grcschema.org/", "@type": "NameSuffix", "abbreviation": "Adm", "local_reference_id": "81dc9bdb52d04dc20036dbd8313ed055", "suffix": "Admiral" }

RESPONSE

{ "@context":"http://grcschema.org/", "@type": "NameSuffix", "abbreviation": "Adm", "id": 2, "suffix": "Admiral", "local_reference_id": "81dc9bdb52d04dc20036dbd8313ed055" }

PATCH Operation Example

For the core object, you may place a local_reference_id at the object root level, and it will be returned in the response. Do not add local_reference_id to the container objects like CoreMetaDa-ta.

SEND & RESPONSE

{ "@context":"http://grcschema.org/", "@type": "NameSuffix", "CoreMetaData": { "@type": "CoreMetaData", "date_created": "Date", "date_modified": "Date", "created_by": "Integer", "modified_by": "Integer", "live_status": "Boolean", "checksum": "Integer", "validated": "Boolean" }, "abbreviation": "Adm.", "id": 2, "suffix": "Admiral", "local_reference_id": "81dc9bdb52d04dc20036dbd8313ed055" }

Last updated