Warning: Work in progress! Leave feedback on Zulip or Github if you'd like this doc to be updated.

JSON / J8 Notation

This chapter in the Oils Reference describes JSON, and its J8 Notation superset.

See the J8 Notation doc for more background. This doc is a quick reference, not the official spec.

Table of Contents
J8 Strings
json-string "hi"
json-escape \" \n \u1234
surrogate-pair \ud83e\udd26
u-prefix u'hi'
b-prefix b'hi'
j8-escape \u{1f926} \yff
no-prefix 'hi'
JSON8
json8-num
json8-str
json8-list
json8-dict
json8-comment
TSV8
column-attrs
column-types

J8 Strings

J8 strings are an upgrade of JSON strings that solve the JSON-Unix Mismatch.

That is, Unix deals with byte strings, but JSON can't represent byte strings.
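
The mismatch can be seen from any language with a strict JSON library. Here is a minimal sketch in Python, which is used only for illustration and is not part of Oils. The helper names are made up for this example. The standard json module refuses to encode a byte string, and a byte like 0xff can't be decoded as UTF-8 to get a text string first:

```python
import json

raw = b"\xff"  # a legal Unix byte string, e.g. part of a filename


def json_rejects_bytes(data: bytes) -> bool:
    """Return True if the json module refuses to encode the byte string."""
    try:
        json.dumps(data)
    except TypeError:
        return True
    return False


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes can be decoded as UTF-8 text."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(json_rejects_bytes(raw))  # True
print(is_valid_utf8(raw))       # False
```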

json-string "hi"

All JSON strings are valid J8 strings!

This is important. Encoders often emit JSON-style "" strings rather than u'' or b'' strings.

Example:

"hi μ \n"

json-escape \" \n \u1234

As a reminder, the backslash escapes valid in JSON strings are:

\" \\
\b \f \n \r \t
\u1234

Additional J8 escapes are valid in u'' and b'' strings, described below.

surrogate-pair \ud83e\udd26

JSON's \u1234 escapes have only four hex digits, so they can't represent code points of U+10000 (2^16) and above. For those, JSON also has a "surrogate pair hack".

That is, there are special code points in the "surrogate range" that can be paired to represent larger numbers.

See the Surrogate Pair Blog Post for an example:

"\ud83e\udd26"

Because JSON strings are valid J8 strings, surrogate pairs are also part of J8 notation. Decoders must accept them, but encoders should avoid them.

You can emit u'\u{1f926}' or b'\u{1f926}' instead of "\ud83e\udd26".
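
To make the pairing concrete, here is a small Python sketch of the standard UTF-16 arithmetic that turns the two halves into one code point. The function name combine_surrogates is hypothetical, and Python stands in for whatever language a decoder is written in:

```python
import json


def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = combine_surrogates(0xD83E, 0xDD26)
print(hex(code_point))  # 0x1f926

# Python's json module performs the same pairing when decoding.
decoded = json.loads('"\\ud83e\\udd26"')
print(decoded == chr(code_point))  # True
```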

u-prefix u'hi'

A type of J8 string.

u'hi μ \n'

It's never necessary to emit a u'' string, but the prefix can be used to express that a string is valid Unicode. JSON strings can represent strings that aren't Unicode because they may contain surrogate halves.

In contrast, u'' strings can only have escapes like \u{1f926}, with no surrogate pairs or halves.
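
As an illustration of why surrogate halves are a problem, this Python sketch decodes a JSON string that holds a lone high surrogate. Python's json module accepts it, but the result can't be encoded as UTF-8. The helper encodes_as_utf8 is made up for this example:

```python
import json

# A lone high surrogate: valid JSON syntax, but not valid Unicode text.
half = json.loads('"\\ud83e"')


def encodes_as_utf8(s: str) -> bool:
    """Return True if the string can be encoded as UTF-8."""
    try:
        s.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


print(len(half))                     # 1
print(encodes_as_utf8(half))         # False
print(encodes_as_utf8("\U0001f926"))  # True
```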

Escaping:

b-prefix b'hi'

Another type of J8 string. These b'' strings are identical to u'' strings, but they can also contain \yff escapes, which denote raw bytes.

Examples:

b'hi μ \n'
b'this isn\'t a valid unicode string \yff\yfe \u{3bc}'

j8-escape \u{1f926} \yff

To summarize, the valid J8 escapes are:

\'
\yff   # only valid in b'' strings
\u{3bc} \u{1f926} etc.
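
The three escapes above can be sketched as a tiny decoder. This is a hypothetical Python helper, not the Oils implementation. It assumes the surrounding quotes and prefix have already been removed, it handles only these three escapes (not JSON-style ones such as \n or \\), and it returns bytes so that \yff has something to decode into:

```python
import re

# Matches \' or \yHH or \u{H...}; groups 1 and 2 capture the hex digits.
_J8_ESCAPE = re.compile(r"\\(?:'|y([0-9a-fA-F]{2})|u\{([0-9a-fA-F]{1,6})\})")


def decode_j8_escapes(body: str, is_bytes: bool) -> bytes:
    """Decode the three J8 escapes in a string body, returning bytes."""
    out = bytearray()
    pos = 0
    for m in _J8_ESCAPE.finditer(body):
        # Literal text between escapes is emitted as UTF-8.
        out += body[pos:m.start()].encode("utf-8")
        if m.group(1) is not None:
            # \yHH is one raw byte, allowed only in b'' strings.
            if not is_bytes:
                raise ValueError("\\yHH is only valid in b'' strings")
            out.append(int(m.group(1), 16))
        elif m.group(2) is not None:
            # \u{...} is a code point, emitted as UTF-8.
            cp = int(m.group(2), 16)
            if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
                raise ValueError("not a valid Unicode scalar value")
            out += chr(cp).encode("utf-8")
        else:
            # \' is a literal single quote.
            out += b"'"
        pos = m.end()
    out += body[pos:].encode("utf-8")
    return bytes(out)


print(decode_j8_escapes(r"isn\'t valid \yff\yfe \u{3bc}", is_bytes=True))
```

Rejecting surrogate code points in \u{...} mirrors the rule that u'' strings have no surrogate pairs or halves.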

no-prefix 'hi'

Single-quoted strings without a u or b prefix are implicitly u''.

u'hi μ \n'  
 'hi μ \n'  # same as above, no \yff escapes accepted

They should be avoided in contexts where "" strings may also appear, because it's easy to confuse single quotes and double quotes.

JSON8

JSON8 is JSON with 4 more things allowed:

  1. J8 strings in addition to JSON strings
  2. Comments
  3. Unquoted keys (TODO)
  4. Trailing commas (TODO)

json8-num

Decoding detail, specific to Oils:

If there's a decimal point or an exponent suffix like e-10, then the number is decoded into a YSH Float. Otherwise it's decoded into a YSH Int.

42       # decoded to Int
42.0     # decoded to Float
42e1     # decoded to Float
42.0e1   # decoded to Float
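
The rule above can be sketched in a few lines of Python, with Python's int and float standing in for YSH Int and Float. The function name decode_json8_number is hypothetical, and the sketch assumes the text has already been validated as a number literal:

```python
def decode_json8_number(text: str):
    """Return a float if the literal has '.' or an exponent, else an int."""
    if any(ch in text for ch in ".eE"):
        return float(text)
    return int(text)


for literal in ["42", "42.0", "42e1", "42.0e1"]:
    value = decode_json8_number(literal)
    print(literal, type(value).__name__)
```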

json8-str

JSON8 strings are exactly J8 strings:

"hi 🤦 \u03bc"
u'hi 🤦 \u{3bc}'
b'hi 🤦 \u{3bc} \yff'

json8-list

Like JSON lists, but can have a trailing comma. Examples:

[42, 43]
[42, 43,]   # same as above
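
For contrast, a strict JSON parser rejects the second form. This Python sketch uses the standard json module to show that; the helper is_strict_json is made up for this example:

```python
import json


def is_strict_json(text: str) -> bool:
    """Return True if the text parses as plain JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_strict_json("[42, 43]"))   # True
print(is_strict_json("[42, 43,]"))  # False
```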

json8-dict

Like JSON "objects", but keys can be unquoted and a trailing comma is allowed (both TODO).

Examples:

{"json8": "message"}
{json8: "message"}     # same as above
{json8: "message",}    # same as above

json8-comment

End-of-line comments in the same style as shell:

{"json8": "message"}   # comment

TSV8

These are the J8 Primitives (Bool, Int, Float, Str), separated by tabs.

column-attrs

!tsv8    name    age
!type    Str     Int
!other   x       y
         Alice   42
         Bob     25

column-types

The primitives:

Note: Can null be in all cells? Maybe except Bool?

It can stand in for NA?


Generated on Wed, 13 Mar 2024 14:59:38 -0400