At the lexical level J is fairly unsurprising. A program consists of a series of sentences, one sentence per line (I think we can already see how the issue of breaking long lines never comes up). Comments are introduced with
NB. and finish at the end of the line. Sentences are decomposed into words.
blue blue_green 7 _7 - =: 'foo' i. $ " \\ [:
Names and numbers are pretty standard (but note that negative numbers are denoted using
_, much as ML uses ~, because
- is a verb), but apart from those there is a bewildering array of punctuation available, including items that are usually found as brackets, delimiters, and escapes in other languages.
There’s an additional quirk. A sequence of numbers separated by spaces forms a single word (a list). This only works for numbers. So
3 1 2 is a single word (a length 3 list), but
a b c is not.
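The numeric-list quirk can be sketched in a few lines of Python. This is only a toy model, not J's real word formation: it assumes the sentence is already separated by whitespace, and the names `NUMBER` and `words` are made up for the illustration.

```python
import re

# J writes negative numbers with a leading underscore, e.g. _7.
NUMBER = re.compile(r"_?\d+(\.\d+)?$")

def words(sentence):
    """Split a whitespace-separated sentence into words.

    A run of adjacent numbers is merged into one word (a Python list);
    every other token stays a word on its own.
    """
    result = []
    for token in sentence.split():
        is_number = NUMBER.match(token) is not None
        if is_number and result and isinstance(result[-1], list):
            result[-1].append(token)      # extend the current numeric word
        elif is_number:
            result.append([token])        # start a new numeric word
        else:
            result.append(token)          # names, verbs, and so on
    return result

print(words("3 1 2"))        # one word: a three-item list
print(words("a b c"))        # three separate words
print(words("2 3 + 4 _5"))   # list, verb, list
```

Note that in this model a lone number is a one-item list, which glosses over the difference between an atom and a list in J.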
Each word is categorised into a part of speech. J deliberately uses terminology from natural language grammar.
- Nouns: values to you and me.
2 3 4,
'foo', and so on.
- Verbs: functions or procedures.
- Adverbs: monadic higher order functions.
- Conjunctions: dyadic higher order functions.
", and perhaps lots more.
- Punctuation: ( and ) (which have their conventional meaning) and maybe some more.
- Copula: What J calls
=: and =. for assigning names to things.
As we’ve seen, verbs can either be monadic or dyadic, so that’s: term verb term or verb term. I’m being deliberately vague about what a term is (mostly because I don’t know).
Adverbs are always monadic and follow the verb: term adverb.
Conjunctions are always dyadic: term conjunction term.
Let c be a conjunction, v be a verb, and a be an adverb.
There’s an obvious
x v y v z ambiguity, as in
2^3^4. As previously discussed things group to the right, so this is
   2^3^4  NB. 2^(3^4), much larger than the 4096 that (2^3)^4 would give
2.41785e24
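To check the arithmetic outside J, here is a small Python sketch of right-to-left grouping. The helper `group_right` is a made-up name for this illustration; it folds a dyadic function over a list of values starting from the right, which is how a chain like x v y v z groups.

```python
import operator
from functools import reduce

def group_right(values, verb):
    """Evaluate x v y v z ... with the rightmost verb applied first."""
    # Walk the values from the right; each partial result becomes the
    # right-hand argument of the next application.
    return reduce(lambda right, left: verb(left, right), reversed(values))

right_grouped = group_right([2, 3, 4], operator.pow)   # 2 ** (3 ** 4)
left_grouped = (2 ** 3) ** 4                           # the other reading

print(right_grouped)             # 2 ** 81, about 2.41785e24
print(left_grouped)              # 4096
print(group_right([10, 4, 3], operator.sub))   # 10 - (4 - 3)
```

The same helper shows why 10 - 4 - 3 would give 9 rather than 3 under this grouping.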
Conjunction Adverb Precedence
What about v c v a? Is that
(v c v) a or
v c (v a)?
Let’s find out. Consider
*: @ + / 1 2 3. There are two new symbols I need to explain. Monadic
*: is square (it squares its argument).
@ is a conjunction called Atop, it’s a kind of compose operator.
*: @ +/ 1 2 3 could mean either
(*: @ +) / 1 2 3 which would fold the dyadic
(*: @ +) over the list
1 2 3 or it could mean
*: @ (+/) 1 2 3 which would apply the monadic
*: @ (+/) to the list
1 2 3. (It could also mean
*: @ (+/ 1 2 3), but it doesn’t.) The
@ conjunction is defined so that when used dyadically (as in the first possible interpretation)
x (*: @ +) y means the same as
*: (x + y) (square the sum); when used monadically
*: @ (+/) y means the same as
*: (+/) y.
So the first interpretation would have the meaning
1 (*:@+) 2 (*:@+) 3 which is 676 (26²). The second interpretation would have the meaning
*: (+/ 1 2 3) which is 36. Let’s see:
   *:@(+/) 1 2 3
36
   (*:@+)/ 1 2 3
676
   *:@+/ 1 2 3  NB. Same as second example.
676
So we can see that the
@ conjunction binds more tightly than the
/ adverb. Conjunctions have higher precedence.
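The two candidate groupings can be modelled in Python to confirm the numbers. This is a rough sketch: `atop` and `insert` are made-up stand-ins for the @ conjunction and the / adverb, and the model ignores J's rank rules entirely, which does not matter for this particular example.

```python
from functools import reduce

def square(y):
    """Monadic *: (square)."""
    return y * y

def plus(x, y):
    """Dyadic + (addition)."""
    return x + y

def atop(u, v):
    """Model of u @ v: apply v to one or two arguments, then u to the result."""
    return lambda *args: u(v(*args))

def insert(v):
    """Model of v / : put dyadic v between the items, grouping to the right."""
    return lambda ys: reduce(lambda right, left: v(left, right), reversed(ys))

ys = [1, 2, 3]
first = insert(atop(square, plus))(ys)    # (*: @ +) / 1 2 3
second = atop(square, insert(plus))(ys)   # *: @ (+/) 1 2 3

print(first)    # 676: the grouping J actually chooses for *:@+/ 1 2 3
print(second)   # 36
```

In the first case the fold computes square(2 + 3) = 25 and then square(1 + 25) = 676; in the second the list is summed to 6 before squaring.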
Operator precedence is not enough
Sadly, whilst it’s tempting to think that you can parse J using an operator precedence parser, you can’t. That’s because the reductions to apply depend not on the syntactic category of the items being parsed but on their runtime class.
Consider what might happen if you had a conjunction whose result was sometimes a verb and sometimes a noun. This is extremely unconventional, but possible. I introduce the notion here to show how certain methods of parsing are not possible. I borrow from the future and show you my evil conjunction:
ec =: conjunction define
if. n = m do. *: else. 4 end.
)
The evil conjunction ec takes two (noun) arguments and has the result
*: (which is a verb) if they are equal, and the result 4 (which is a noun) if not. Now consider
n ec m - 3. Is this
((n ec m)(- 3)) (monadic verb applied to
- 3) or
((n ec m) - 3) (dyadic
- applied to two nouns)? Actually it could be either:
   7 ec 0 - 3  NB. 7 ec 0 evaluates to the noun 4
1
   7 ec 7 - 3  NB. 7 ec 7 evaluates to the verb *: (square)
9
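The same trick can be imitated in Python, where a function may also return either a callable or a number. This is just an illustration of the idea, not a model of how J evaluates the sentence; `evaluate` is a made-up helper that hard-codes the shape m ec n - y.

```python
def square(y):
    """Stand-in for the verb *: (square)."""
    return y * y

def ec(m, n):
    """Model of the evil conjunction: a verb if the arguments match, else the noun 4."""
    return square if m == n else 4

def evaluate(m, n, y):
    """Evaluate 'm ec n - y', picking the parse from the class of (m ec n)."""
    left = ec(m, n)
    if callable(left):
        # (m ec n) is a verb: - is monadic, and the verb applies to (- y).
        return left(-y)
    # (m ec n) is a noun: - is dyadic between two nouns.
    return left - y

print(evaluate(7, 0, 3))   # 4 - 3
print(evaluate(7, 7, 3))   # square of -3
```

The branch on `callable(left)` is the whole point: nothing in the text of the sentence tells you which branch applies until `ec` has actually run.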
So the parsing of J is mingled with its execution. The way it actually works is that the sentence forms a queue of words. Words are moved from the right-hand end of the queue onto a stack. After every move the top four elements of the stack are considered and one of 9 possible reductions is applied. When no more reductions are possible another word is moved across to the top of the stack and the cycle repeats. This is all reasonably well described by the J dictionary appendix E.
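Here is a heavily cut-down Python sketch of that queue-and-stack cycle. It assumes only nouns (integers) and three verbs, and it implements just the monad and dyad reductions out of the nine; the edge marker, the `VERBS` table, and all function names are inventions for this sketch rather than anything taken from the J dictionary.

```python
MARK = "<edge>"   # hypothetical marker for the left-hand end of the sentence

# Each verb maps to a (monadic, dyadic) pair; None means "not modelled".
VERBS = {
    "+": (lambda y: y, lambda x, y: x + y),
    "-": (lambda y: -y, lambda x, y: x - y),
    "*:": (lambda y: y * y, None),
}

def is_noun(word):
    return isinstance(word, int)

def is_verb(word):
    return isinstance(word, str) and word in VERBS

def reduce_once(stack):
    """Apply one reduction to the top of the stack (index 0), if any matches."""
    if len(stack) < 3:
        return False
    a, b, c = stack[0], stack[1], stack[2]
    # Monad at the edge: EDGE VERB NOUN
    if a == MARK and is_verb(b) and is_noun(c):
        stack[1:3] = [VERBS[b][0](c)]
        return True
    if len(stack) > 3:
        d = stack[3]
        # Monad: anything VERB VERB NOUN (the second verb applies to the noun)
        if is_verb(b) and is_verb(c) and is_noun(d):
            stack[2:4] = [VERBS[c][0](d)]
            return True
        # Dyad: anything NOUN VERB NOUN
        if is_noun(b) and is_verb(c) and is_noun(d):
            stack[1:4] = [VERBS[c][1](b, d)]
            return True
    return False

def parse(words):
    """Move words from the right of the queue onto the stack, reducing as we go."""
    queue = [MARK] + list(words)
    stack = []
    while True:
        while reduce_once(stack):
            pass
        if not queue:
            break
        stack.insert(0, queue.pop())   # rightmost word becomes the new top
    if len(stack) != 2 or stack[0] != MARK:
        raise ValueError("sentence did not reduce to a single noun")
    return stack[1]

print(parse([2, "-", 3, "-", 4]))   # 2 - (3 - 4)
print(parse(["*:", "-", 3]))        # square of (- 3)
print(parse([2, "+", "-", 3]))      # 2 + (- 3)
```

Because reductions are chosen by looking at what is actually on the stack at that moment, a word that turned out to be a verb or a noun only after evaluation would slot into the same loop without any change to the grammar.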
It strikes me as a mix of elegant simplicity, stomach-churning madness, and pragmatic considerations of actually implementing something on 1950s hardware.