S-expressions the wrong way

A bunch of stuff that led me to SJON

S-expressions are attracted towards Lisp

S-expressions give us a natural syntax for nested hierarchical trees, and once our program is written as trees, it becomes tempting to let those trees rewrite themselves. That path leads naturally to macros, where syntax expands into more syntax. Such is the Lisp deal.

What if we went the other way around?

Forget macros. Let's keep S-expressions as the way to represent things in a clean declarative way, but instead of letting them generate new forms, how can I constrain them into very specific ones?

This is what led me to SJON. I don't know why I called it SJON; it looks like JSON and, yeah, it looks funny. The idea is to use S-expressions where the central tool is not the macro, but the schema.

Macros expand the syntax while schemas constrain it. They are not mutually exclusive, but they can be opposing forces that I won't even try to balance: macros are out of SJON, and I want to go all in on a declarative, hierarchical, schema-based thingy and use it for other stuff.

Ponzi schema

A parser, a validator and an evaluator walk into a bar and ask for editor support. This bar is where I find myself drinking more often than not: I keep redoing these same little things whenever I want a textual representation of my app config that can be manipulated.

It has happened to me with the GPU declarative configuration (pngine) and with the WGSL minifier and validator (wgslender and miniray), and now I have come to the realization that S-expressions might be a product-market fit for such a thing. The more I look at S-expressions as the smallest data representation that can also be code, the more obvious this fit seems.

Instead of badly rebuilding a parser, a validator and an evaluator, and wrapping them with half-baked editor support, I can leverage S-expressions to provide a canonical, general-purpose way of doing all of these for declarative configurations.

If I constrain S-expressions with a schema, they become the smallest thing bound to it that is a language rather than a document, and a language is what my tools actually need (not a document).

Symbolic pipelines

You can see this everywhere: these kinds of multi-domain configs that should have been S-expression languages cripple our daily lives. They show up not only in my attempts at building demo engines and higher-level manipulations of shader pipelines, but really all around you.

But my point goes a little further. I'm not going to stop at "your configs are S-expression wannabes"; I want to show that great leverage can be achieved by combining schemas with S-expressions, which is what SJON is.

Shaders and pipelines

I mean, let's start with a programming language example before the config ones.

Skim quickly through the JS code for a render pipeline in WebGPU:

const pipeline = device.createRenderPipeline({
  layout,
  vertex: {
    module: shaderModule,
    entryPoint: "vs_main",
    buffers: [...]
  },
  fragment: {
    module: shaderModule,
    entryPoint: "fs_main",
    targets: [...]
  },
  primitive: {
    topology: "triangle-list",
    cullMode: "back"
  },
  depthStencil: {...}
})

It is a tree dressed up in a plain old JS object. Things get a bit clearer if we bring S-expressions to the rescue:

(shader-module :name shader)

(render-pipeline
  (vertex
    (module shader)
    (entry vs_main))
  (fragment
    (module shader)
    (entry fs_main)
    (target bgra8unorm))
  (primitive
    (topology triangle-list)
    (cull back))
)

It might or might not read better than the previous one (taste varies), and honestly this can be seen as just dressing things up in parens.
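To make the "it is a tree" claim concrete, here is a minimal sketch of a reader that turns parenthesised text into nested JavaScript arrays. The names `tokenize` and `parse` are mine and purely illustrative, not SJON's actual API; the sketch only understands parens, bare symbols and `;` comments, which is enough for the pipeline above.

```javascript
// Minimal S-expression reader (illustrative sketch, not SJON's real parser).
// Lists become arrays; symbols stay as plain strings.
function tokenize(source) {
  return source
    .replace(/;[^\n]*/g, "")   // drop ; line comments
    .replace(/[()]/g, " $& ")  // pad parens so they split into their own tokens
    .trim()
    .split(/\s+/)
    .filter((token) => token.length > 0);
}

function parse(source) {
  const tokens = tokenize(source);
  let position = 0;

  // Read one form: either an atom or a parenthesised list of forms.
  function readForm() {
    if (position >= tokens.length) throw new SyntaxError("unexpected end of input");
    const token = tokens[position++];
    if (token === ")") throw new SyntaxError("unexpected ')'");
    if (token !== "(") return token; // atom
    const list = [];
    while (tokens[position] !== ")") {
      if (position >= tokens.length) throw new SyntaxError("missing ')'");
      list.push(readForm());
    }
    position++; // consume the closing ")"
    return list;
  }

  const forms = [];
  while (position < tokens.length) forms.push(readForm());
  return forms;
}

const forms = parse(`
(render-pipeline
  (vertex
    (module shader)
    (entry vs_main))
  (primitive
    (topology triangle-list)
    (cull back)))
`);

console.log(JSON.stringify(forms[0], null, 2));
```

The whole reader fits on one screen, and nothing in it knows about WebGPU. That is the part every other config format has to rebuild from scratch.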

Now let me indulge in bringing a schema to the plot, by adding this to the above:

; schema sketch
(schema webgpu
  (enum topology [triangle-list triangle-strip])
  (enum cull-mode [none front back])
  (enum texture-format [bgra8unorm rgba16float])

  (ref shader-module :by name)

  (form render-pipeline
    (layout    :type symbol :default auto)
    (vertex    :type vertex-stage)
    (fragment  :type fragment-stage)
    (primitive :type primitive-state :default (primitive
                                                (topology triangle-list)
                                                (cull back)))
    (lower-to js-render-pipeline-descriptor)
  )
)

This turns our authoring into a domain object that can be checked, with specific diagnostics and defaults. We now have something similar to what TypeScript gives us when paired with tools like Zod.
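Here is a rough sketch of what "checked, with diagnostics and defaults" could mean for the `primitive` form. Everything in it is an assumption for illustration: the schema is hand-translated into a JavaScript `Map` (in SJON it would itself be S-expression data), the field names follow the authored form rather than the enum names, and `checkPrimitive` is a hypothetical helper, not something SJON ships.

```javascript
// Hand-translated slice of the schema sketch above (assumed shape, illustration only).
const enums = new Map([
  ["topology", ["triangle-list", "triangle-strip"]],
  ["cull", ["none", "front", "back"]],
]);
const primitiveDefaults = { topology: "triangle-list", cull: "back" };

// Validate a parsed form such as ["primitive", ["topology", "triangle-strip"]].
// Returns the field values with defaults filled in, plus a list of diagnostics.
function checkPrimitive(form) {
  const [head, ...fields] = form;
  const value = { ...primitiveDefaults };
  const diagnostics = [];
  if (head !== "primitive") {
    diagnostics.push(`expected (primitive ...), found (${head} ...)`);
    return { value, diagnostics };
  }
  for (const field of fields) {
    if (!Array.isArray(field) || field.length !== 2) {
      diagnostics.push(`expected a (name value) pair, found ${JSON.stringify(field)}`);
      continue;
    }
    const [name, argument] = field;
    const allowed = enums.get(name);
    if (!allowed) {
      diagnostics.push(`unknown field '${name}' in primitive`);
    } else if (!allowed.includes(argument)) {
      diagnostics.push(`'${argument}' is not a valid ${name} (expected one of: ${allowed.join(", ")})`);
    } else {
      value[name] = argument;
    }
  }
  return { value, diagnostics };
}

const ok = checkPrimitive(["primitive", ["topology", "triangle-strip"]]);
const bad = checkPrimitive(["primitive", ["cull", "sideways"]]);
console.log(ok.value);        // cull falls back to its default
console.log(bad.diagnostics); // one diagnostic naming the allowed values
```

The validator is a loop over data. Swap the `Map` for a different schema and the same loop checks a different domain.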

But let's take this a bit further. WebGPU has a lot of things that are only known at runtime, and some of them come directly from the shader you write, such as the names of the entry point functions for pipelines. These come from the shader string, which by TypeScript and Zod standards is only a string.

If you bring a layer of reflection to the shader string and lift it into known values and types at runtime, these can then be injected into the type system and used to validate that the pipelines are cool (with the right entry points, bindings, etc.).

; reflected from the WGSL string (miniray/wgslender)
(reflect shader (entry-points [vertex_main fragment_main]))

(render-pipeline
  (vertex (module shader) (entry vs_main)))   ; vs_main is not in the reflected set
; => error: shader 'shader' has no entry point 'vs_main' (found: vertex_main, fragment_main)

In the browser you could do this with a Zod schema built at runtime, but that validator never leaves JavaScript and it's code you maintain by hand. SJON's schema is data, derived from reflection once, and the same check runs in a native wgpu or Zig build.
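Because both the pipeline and the reflected facts are plain data, the check itself is small in any host language. A sketch in JavaScript, with the reflection result written by hand: the shape of `reflected` and the helpers `child` and `checkEntryPoints` are my assumptions, not the output format of miniray or wgslender.

```javascript
// Reflected facts, as they might arrive from a WGSL reflection step.
// Hand-written here; the shape of this data is an assumption.
const reflected = new Map([
  ["shader", { entryPoints: ["vertex_main", "fragment_main"] }],
]);

// Find the first child list with the given head, e.g. child(stage, "entry").
function child(form, head) {
  return form.slice(1).find((item) => Array.isArray(item) && item[0] === head);
}

// Check every stage of a parsed (render-pipeline ...) form against the reflected set.
function checkEntryPoints(pipeline, reflection) {
  const errors = [];
  for (const stage of pipeline.slice(1)) {
    if (!Array.isArray(stage)) continue;
    const moduleForm = child(stage, "module");
    const entryForm = child(stage, "entry");
    if (!moduleForm || !entryForm) continue; // e.g. (primitive ...) has no entry point
    const moduleName = moduleForm[1];
    const entryName = entryForm[1];
    const facts = reflection.get(moduleName);
    if (!facts) {
      errors.push(`unknown shader module '${moduleName}'`);
    } else if (!facts.entryPoints.includes(entryName)) {
      errors.push(
        `shader '${moduleName}' has no entry point '${entryName}' (found: ${facts.entryPoints.join(", ")})`
      );
    }
  }
  return errors;
}

const pipeline = ["render-pipeline", ["vertex", ["module", "shader"], ["entry", "vs_main"]]];
const errors = checkEntryPoints(pipeline, reflected);
console.log(errors);
```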

The strudel is real

We've got a lot of systems that secretly needed a language, but instead of sharing one substrate, each one of these systems rebuilt the stack and none of it composes. The story goes like "oh we are not building a programming language, we are just adding variables, expressions, conditionals, references, templates, scopes, plugins, and editor support in a customized way".

And this is everywhere. CI/CD pipelines? GitLab and GitHub workflows and Terraform HCL all carry their loops, references and the occasional opaque string that needs to be parsed by their tools. JavaScript? The package.json is just a JSON file, but it also carries an implicit dependency graph as well as script strings with shell interpretation. Your favourite code editor's settings config? Paths, selectors, language filters, plugin APIs, and UI preferences, all jammed into JSON or some other format. And it gets worse: I heard that some companies like to place these things in distributed databases with real ACID properties on top of them, because reasons or something.

I cannot treat these things as composable and we keep carrying special parsers, validators, evaluators and other sorts of tooling just to help us out with the worse-off language that was intended to make our life easier in the first place.

Save the trees

The problem is that the domain inevitably grows a language, and this is ok and expected, but then instead of admitting that, it hides the language across strings, keys, templates, and plugins. Take a look at Tailwind strings, those are opaque, hard to parse, and carry style variants, breakpoints, behaviours and naturally hide precedence and conflicts.

The fact that they live in strings is what makes them hostile to refactoring and traversing. The structure exists, but you need Tailwind’s parser to see it and work through it. Same thing for every other format we love to use: they grow a language and then hide its AST.

And guess what? Expose the AST and the tooling generalizes.

I'm not claiming that parentheses are good; S-expressions are not attractive because they look nice. They are attractive because they make the AST the surface syntax, and once the AST is the surface syntax, parsing, transformation, validation, editor support, and composition become general tools instead of bespoke infrastructure for every new config format.

We can pair the AST with a tiny schema validator (in that same AST) and get a very powerful tiny engine for such things.
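As a small illustration of tooling that generalizes, here is a rename refactoring that knows nothing about the domain it is applied to. It is a sketch under assumptions: forms are the nested arrays a reader would produce, `mapAtoms` and `renameSymbol` are hypothetical helper names, and the CI-style form is made up for the example.

```javascript
// Rebuild a tree, passing every atom through `visit` along with its index in the parent list.
function mapAtoms(form, visit, index = 0) {
  if (!Array.isArray(form)) return visit(form, index);
  return form.map((item, itemIndex) => mapAtoms(item, visit, itemIndex));
}

// Rename a symbol wherever it appears as an argument (index > 0), leaving form heads alone.
function renameSymbol(form, from, to) {
  return mapAtoms(form, (atom, index) => (index > 0 && atom === from ? to : atom));
}

// Two unrelated domains, one refactoring tool.
const gpu = ["render-pipeline", ["vertex", ["module", "shader"], ["entry", "vs_main"]]];
const ci = ["job", ["name", "build"], ["needs", "shader"]]; // made-up CI form

console.log(JSON.stringify(renameSymbol(gpu, "shader", "main_shader")));
console.log(JSON.stringify(renameSymbol(ci, "shader", "main_shader")));
```

A real tool would consult the schema to rename only true references, but the traversal would stay the same.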

Conclusion

So that is the whole "the wrong way" of it. Lisp points the tree outward and lets the forms rewrite themselves into more forms, and that road ends at macros. I wanted to point it the other way and instead of asking "what else can this become?", pin it down and ask "what exactly is this allowed to be?"

That pinning down is the schema, and the trick that makes SJON more than yet another parenthesised JSON is that the schema is written in SJON too. The thing that decides what's valid lives in the same AST as the thing being validated. This means that the grammar of your domain stops living in your head or in some checker bolted on the side and goes directly into the data, where it can travel and throw real diagnostics, and validate the same way in JS or Rust or Zig.

With SJON, documents can become languages that carry both the shape that gets parsed and a set of rules about what that shape is allowed to mean. This second half is exactly what every format on this page kept reinventing badly. If we expose the AST, we get the parser for free; by adding a schema in that same AST, we get the validator, the evaluator, and the editor support all out of one declaration instead of four hand-rolled subsystems.

So that bar I keep drinking at, where the parser and the validator and the evaluator came in asking for editor support? It turns out they were ordering the same drink the whole time: a tiny constrained S-expression with a schema, which is SJON.

There is even a site for it too: SJON.