The Squeak Language stated semi-formally
The following more or less formal specification of the Squeak Language is derived
from an EBNF definition of Squeak 2.7alpha by Dwight Hughes. It comes from the Swiki, and should be considered only a snapshot of Squeak at this version.
By "EBNF" is meant "Extended Backus-Naur Form". This is a way to write
formal grammars that is widely used in computer science and linguistics (and
elsewhere), and it is named after its originators, John Backus and Peter Naur.
The
EBNF used here is defined as:
[ ... ] apply zero or one times;
[ ... ]* apply zero or more times;
[ ... ]+ apply one or more times;
... | ... choose one of the alternatives;
"..." use the literal characters enclosed;
( ... ) used for grouping.
In what follows the defined terms are set in bold-face red, and I divide
the specification into some convenient groups, namely the numbered sections below.
This
sequence works from the smallest and simplest expressions in Squeak
(letters and digits) to the longest and most complex expressions in Squeak
(methods, messages, blocks).
1. Letters and digits
Since Squeak is a formal language like ordinary algebra (which I suppose you
to be somewhat familiar with) and is made up for a computer from strings of
characters, it makes sense to start by saying from what letters and other
squiggles the well-formed strings of the Squeak Language (SL) are composed:
|
letter = uppercase | lowercase.
|
Letters in SL are either uppercase or lowercase, and specifically these are:
|
uppercase = "A" | "B" | "C" | "D"
| "E" | "F" | "G" | "H" |
"I" | "J" | "K" | "L" |
"M" | "N" | "O" | "P" |
"Q" | "R" | "S" | "T" |
"U" | "V" | "W" | "X" |
"Y" | "Z".
|
|
lowercase = "a"
| "b" | "c" | "d" | "e" |
"f" | "g" | "h" | "i" |
"j" | "k" | "l" | "m" |
"n" | "o" | "p" | "q" |
"r" | "s" | "t" | "u" |
"v" | "w" | "x" | "y" |
"z".
|
So the "letters" of Squeak are just what an English-speaking person
would expect and what can be found on a standard English qwerty-keyboard as
such. In any case, also in what follows, the vertical stroke "|"
without quotation marks is like an exclusive or. Thus in uppercase the
intended meaning is: any one of "A" ... "Z" (without the
surrounding quotation marks) is an uppercase letter, and nothing else is.
There are more letter-like squiggles in English and on a qwerty-keyboard, and
for Squeak these are named a bit differently from standard English:
|
character = ("[" | "]" | "{" | "}"
| "(" | ")" | "_" | "^" |
";" | "$" | "#" | ":" |
"-" | "|" | ".") | decimal_digit | letter |
special_character.
|
|
special_character = ("+" | "*" | "/" |
"\" | "~" | "<" | ">" |
"=" | "@" | "%" | "&" |
"?" | "!" | "`" | "," ).
|
What
are called the "characters" of SL are mostly used in Squeak
as grouping terms, and the special_characters indeed have special roles,
mostly in arithmetical or logical statements.
The
numerical symbols in Squeak are again what English speakers would expect:
|
decimal_digit = "0" | "1" | "2" | "3"
| "4" | "5" | "6" | "7" |
"8" | "9".
|
A decimal_digit in Squeak is precisely what one would expect it to be, and
more generally digits are sequences of one or more digits. So "digits" are
sequences made up of one or more digit-characters, but a terminological oddity
(probably due to the use of letters in arithmetical systems that have more
than 10 basic digits, for which capital letters are normally used) is that a
digit is not just a decimal_digit, but may also be an uppercase letter:
|
digit = decimal_digit |
uppercase.
|
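The character classes of this section can be restated as plain sets. The following is only a sketch in Python, not Squeak code; the names `classify` and the set constants are made up for the illustration and simply mirror the EBNF rules above.

```python
import string

# Sets mirroring the EBNF rules of section 1 (the Python names are illustrative only).
UPPERCASE = set(string.ascii_uppercase)
LOWERCASE = set(string.ascii_lowercase)
LETTER = UPPERCASE | LOWERCASE
DECIMAL_DIGIT = set(string.digits)
DIGIT = DECIMAL_DIGIT | UPPERCASE              # digit = decimal_digit | uppercase
SPECIAL_CHARACTER = set("+*/\\~<>=@%&?!`,")
CHARACTER = set("[]{}()_^;$#:-|.") | DECIMAL_DIGIT | LETTER | SPECIAL_CHARACTER

CLASSES = {
    "uppercase": UPPERCASE,
    "lowercase": LOWERCASE,
    "letter": LETTER,
    "decimal_digit": DECIMAL_DIGIT,
    "digit": DIGIT,
    "special_character": SPECIAL_CHARACTER,
    "character": CHARACTER,
}


def classify(ch):
    """Return the names of all grammar classes that contain the character ch."""
    return [name for name, members in CLASSES.items() if ch in members]


print(classify("F"))   # an uppercase letter also counts as a digit
print(classify("+"))
```

Note that the single quote-mark and the blank belong to none of these classes, which is why `classify("'")` returns an empty list.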
2. Interpunctions
If you look closely at any normal English written text of a page or more in
length, you'll find that something like a third of it may consist of
interpunction, like blanks, dots, commas and the like. Interpunctions serve as
a means of grouping characters and terms and help the human reader. Much of
it is often referred to as "whitespace", precisely because in printing
practice so much is indeed made up of white space without any character.
In
SL there is interpunction as well, and indeed its purpose is to help the
human readers of the Squeak Language:
|
whitespace = [space | tab
| newline]+.
|
This
is the whitespace in SL, in fact mostly defined by reference to a
standard keyboard, with a space and a tab key (both of which are represented
in the computer by specific numbers: computers have no use for
whitespace). Note that whitespace consists of one or more of space |
tab | newline, as indeed conforms to human writing and typing practice.
There
is a tricky bit involved in newlines, that correspond to the Enter-key on the
keyboard:
|
newline =
cr | lf | crlf.
|
The tricky bit arises from the desire to cater to many operating systems: on a
Mac newline is cr; on Unix newline is lf; and on DOS newline is crlf (a
sequence of the previous two).
In
Squeak the standard newline is cr on all platforms, but this concerns only
text written inside Squeak, and not text written on other systems and filed
into Squeak, for which reason the lf and crlf exist in Squeak.
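A minimal sketch in Python of how text filed in from another system could be brought to the single cr convention described above. The function name `normalize_newlines` is an assumption made for this illustration; it is not taken from Squeak.

```python
import re


def normalize_newlines(text):
    """Replace every newline form (crlf, cr or lf) with a single cr."""
    # crlf is listed first so that it is not read as a cr followed by an lf.
    return re.sub(r"\r\n|\r|\n", "\r", text)


sample = "dos line\r\nunix line\nmac line\r"
print(repr(normalize_newlines(sample)))
```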
Finally,
the general function of interpunction is to separate a term from
surrounding terms:
|
separator = whitespace |
comment.
|
This only adds comment, defined below as anything occurring between two
double quote-marks. Comments occur in SL to help the user. When Squeak parses
a user's input it skips all comments, effectively treating them as whitespace,
with which it also does nothing (except permit its use).
3.1 Terms 1: Variable Identifiers of various kinds
So far we have in fact considered the smallest well-formed expressions in
Squeak: characters, digits and interpunction. Terms of a language are
well-formed expressions that are intermediate between characters and
statements, and that have some sort of meaning on their own. In English what
are called "terms" here are often also called "words" or "phrases".
In
Squeak, most of the terms are called "identifiers", which is a
fancy name for "name". There are several kinds of them. In this
section I deal with the various kinds of variable identifiers in Squeak, used
for different purposes in different contexts, which the user may introduce for his own ends:
|
identifier =
letter [letter | decimal_digit]*.
|
This defines the general set of identifiers: any string that begins with a
letter and is otherwise made up of letters or decimal_digits (and so without
whitespace etc.). Likewise there is in Squeak:
|
capital_identifier =
uppercase [letter | decimal_digit]*.
|
This
is just like an identifier, except that it starts definitely with an
uppercase letter. Squeak has both of them because of its convention (chosen
but not imposed by the Squeak parser, in most cases) to have
capital_identifiers for common names (names for possibly many things) and
other identifiers for individual names (names for precisely one thing).
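The two identifier rules translate directly into regular expressions. This is a sketch in Python under the rules exactly as stated here; the helper names `is_identifier` and `is_capital_identifier` are invented for the example.

```python
import re

# identifier         = letter [letter | decimal_digit]*.
# capital_identifier = uppercase [letter | decimal_digit]*.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")
CAPITAL_IDENTIFIER = re.compile(r"[A-Z][A-Za-z0-9]*")


def is_identifier(text):
    """True if the whole text is an identifier in the sense of the rule above."""
    return IDENTIFIER.fullmatch(text) is not None


def is_capital_identifier(text):
    """True if the whole text is a capital_identifier."""
    return CAPITAL_IDENTIFIER.fullmatch(text) is not None


for name in ["myArray1", "Object", "1abc", "my_var"]:
    print(name, is_identifier(name), is_capital_identifier(name))
```

The underscore is rejected because in this grammar "_" is not a letter (it appears later as an assignment operator).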
Next,
there are in Squeak two explicit ways to mark special terms:
|
character_constant =
"$"(character | "'" | """).
|
|
symbol_constant = "#" symbol.
|
The
character-constants exist, among other things, to be able to deal with terms
like "+" without making Squeak regard them as instructions to
add.
The symbol_constants exist, among other things, to make sure that Squeak
treats the identifier that follows "#" as a unique name in the system, and
they are used everywhere in its code to name parts of it. What is a
"symbol" in Squeak is defined below:
|
symbol = identifier |
binary_selector | [keyword]+ | string.
|
"Binary_selector"
and "keyword" are defined further down, and "string"
immediately below. The general point of symbols in Squeak (as in many other
computer-languages) is to have unique names for things in the system.
One important point to notice (that may differ from conventions in other
languages) is that in this grammar a string is one of the forms a symbol may
take, so that a string prefixed with "#" is a kind of complex constant.
Strings are explicitly defined as follows:
|
string = "'"
[character | "'" "'" | """]*
"'".
|
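The point of this definition is that a string is a sequence of characters
enclosed in single quote-marks. A single quote-mark inside the string is
written twice, and the double quote-mark (which elsewhere delimits comments,
very useful inside programming code to explain what happens and may be
problematic) may occur inside a string as an ordinary character.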
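The doubling of the single quote-mark inside a string is the only subtle part of the rule. Here is a sketch of a reader for it in Python; `read_string` is a made-up name, and for simplicity the sketch accepts any character between the quotes, which is more lenient than the character rule of section 1.

```python
def read_string(source, start=0):
    """Read a string constant beginning at source[start].

    Returns a pair (value, next_index), where next_index is the position just
    past the closing quote-mark.
    """
    if start >= len(source) or source[start] != "'":
        raise ValueError("a string must begin with a single quote-mark")
    chars = []
    i = start + 1
    while i < len(source):
        if source[i] == "'":
            if source[i + 1:i + 2] == "'":
                chars.append("'")          # a doubled quote-mark stands for one
                i += 2
                continue
            return "".join(chars), i + 1   # a lone quote-mark closes the string
        chars.append(source[i])
        i += 1
    raise ValueError("unterminated string")


print(read_string("'it''s'"))
print(read_string("'say \"hi\"' rest"))
```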
3.2 Terms 2: Arrays
At
this point we have defined the letters, digits, interpunctions and variable
identifiers of the Squeak Language, but in fact have not yet introduced any
wherewithal to do anything useful. This we start now, with arrays.
An
array is a sequence of distinct components that can be stored and recalled as
a unit by a computer. It occurs in most computer-languages, since it provides
a basic way of storing and retrieving information.
|
array = "("
[number | symbol | string | character_constant | array]* ")".
|
Thus an array is written as a bracketed sequence of items, that may in
general be about anything, including arrays. One main limitation on arrays,
also in Squeak, is that they must be pre-declared and have a fixed length. A
nice thing about the notation for arrays in Squeak is that the separator used
is not the comma, as in most languages, but whitespace. This is
easier to read, especially in long arrays.
Next, often the most convenient way to store something in computer-memory
is in an array of constants. In Squeak, this is defined with help of the
following term:
|
literal = (number |
symbol_constant | character_constant | string | array_constant).
|
The
literals consist of those items that are constants for Squeak. These
are used in the next definition:
|
array_constant = "#" array of literals
|
Here we see a convention from the previous section at work, namely the use
of the prefix "#" to indicate that the rest of the string
following it is a symbol and so a constant.
3.3 Terms 3: Numerical Terms
In
Section (1) of this specification, digits were defined, which enables
Squeak to deal with terms for simple natural numbers. But there are
many more kinds of numbers in mathematics, and Squeak provides for
these as follows, in a way differing from other computer languages:
|
number =
["-"][radix"r"]["-"]digits["."digits]["e"["-"]exponent].
|
The first character is an optional "-" for negative numbers. The
radix (or base, in standard mathematics in English) is written in
decimal_digits. It is the number-base of the numerical expression following it
(explained in beginning algebra). NOTE:
In fact radix is between 2 and 36 inclusive (actually, Squeak checks only the
lower bound - you may make the upper bound as large as you wish, but you can
represent only the first 36 digits of the larger base; numbers entered using
them are interpreted correctly however). Also, the set of digits allowed in a number of radix N is the first N
characters of the string '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ', while the
default radix is 10, as in ordinary mathematics, and need NOT be explicitly
written if this is intended. In case you want to calculate in an alternative base, you
must explicitly specify the radix, and then Squeak will use it. Note that the
radix is followed by the letter "r", marking its
end and the beginning of the actual digits of the number.
Having
written the radix and the number, one may follow this by the letter
"e" for "exponent", followed by more digits:
|
exponent = decimal_digits.
|
This
is also as in elementary mathematics.
Note that the given definition of "number" in Squeak allows Squeak to
represent and indeed calculate with very many different kinds of numbers,
including binary (radix 2), octal (radix 8) and hexadecimal (radix 16). If
you are neither a mathematician nor a computer scientist much of this will
probably be useless, but since computers in fact store everything in powers of
2, these may be very handy.
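To make the radix notation concrete, here is a sketch in Python of a reader for the sign, radix, digits and fractional part of the number rule. It is an illustration only: `parse_number` is a hypothetical name, the "e" exponent part is deliberately left out, and unlike Squeak the sketch rejects a radix above 36.

```python
DIGIT_CHARS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def parse_number(text):
    """Read ["-"][radix "r"]["-"]digits["." digits]; exponents are not handled."""
    negative = text.startswith("-")
    if negative:
        text = text[1:]
    radix = 10                                   # default radix
    if "r" in text:
        radix_text, text = text.split("r", 1)
        if not radix_text or any(c not in "0123456789" for c in radix_text):
            raise ValueError("the radix must be written in decimal digits")
        radix = int(radix_text)
        if not 2 <= radix <= 36:
            raise ValueError("this sketch only supports a radix from 2 to 36")
        if text.startswith("-"):                 # the rule also allows "-" after "r"
            negative = True
            text = text[1:]
    allowed = DIGIT_CHARS[:radix]                # the first N characters for radix N
    whole, dot, fraction = text.partition(".")
    if not whole or (dot and not fraction):
        raise ValueError("digits are required on both sides of the point")
    if any(c not in allowed for c in whole + fraction):
        raise ValueError("digit not allowed in radix %d" % radix)
    value = 0
    for c in whole:
        value = value * radix + allowed.index(c)
    place = 1.0
    for c in fraction:
        place /= radix
        value += allowed.index(c) * place
    return -value if negative else value


print(parse_number("16r1F"), parse_number("2r1010"), parse_number("-8r17"))
```

Integers come back as Python integers and numbers with a fractional part as floats, which is a choice of the sketch and says nothing about how Squeak represents them.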
3.4. Terms 4: Logical Constants
Now we have finally arrived at the beginning of Squeak's processing. First,
there are the all-important:
|
pseudo_variable =
"true" | "false" | "nil" | "self" |
"super" | "thisContext" | "homeContext".
|
These
terms are called "pseudo-variables" because they have on the one
hand a variable meaning in variable contexts (rather like pronouns in
English) but on the other hand cannot be assigned a different content, as all
variables can be in Squeak.
It need here only be noted that "true", "false" and
"nil" are used for logical processing, while "nil" is
also used for assignments (see below). The "true" and
"false" are Squeak's Booleans, while "nil" is a
very convenient addition to these.
Next,
"self" and "super" are used to deal with Squeak's
so-called inheritance, and basically instruct Squeak where to find certain
methods (see below for methods).
Finally,
"thisContext" and "homeContext" are used mostly
internally by Squeak in the processing of blocks and other code, to keep
track of what is where and belongs to what.
I
noted above that "true" and "false" are Squeak's
Booleans. These are implemented in Squeak in another way than in other
languages, and these come with the following:
|
special_keyword =
"ifTrue:" | "ifTrue:ifFalse:" | "ifFalse:" |
"ifFalse:ifTrue:" | "whileTrue" |
"whileTrue:" | "whileFalse" | "whileFalse:" |
"and:" | "or:" | "to:do:" |
"to:by:do:".
|
Except for the last two, all of these are used for dealing with logical
alternatives and possibilities, while the last two are used for counted loops
over a range of numbers.
Three remarks should be made:
First, in Squeak, the colon ending a term is used to indicate that a parameter
follows. In the case of terms dealing with Booleans, these parameters often
are blocks (more of which below).
Second,
there are some more terms in the category special_keyword, namely the
Booleans involving nil, such as "ifNil:". In general, it seems not
completely clear at present what is counted as belonging to the very basis of
Squeak, and what doesn't, and one reason is that in fact the logical
processing in Squeak is somewhat differently implemented from other
languages.
Third, in fact the operations indicated by these keywords are normally
optimized/inlined by Squeak into tests and jumps and are not sent as actual
messages. This happens for speed reasons, and because these tests and jumps
are quite simple and very universal. However, you can have the effect of
ordinary messages (useful for debugging) by using
|
#perform:,
#perform:with:, #perform:with:with:, and #perform:with:with:with:.
|
This
is another series of special_keyword. (Not recommended, unless you are
debugging.) The "with:" etc. is a general way to pass 1, 2 or 3
parameters in Squeak.
3.5. Terms 5: Logical Variables
The
parts in the previous section are concerned with processing by Squeak and are
constants. This section treats variables, and it should first be noted
that in Squeak variables are rather special and different from other
languages.
In Squeak, variables are as it were named slots used for arbitrary storage.
That is, for Squeak a variable is processed as made up of a string which is a
Squeak identifier, and a contents, which has been assigned by Squeak or the
user. Such a pair of a name and the contents it refers to is called a
"variable" because the contents can be changed, while the name
remains the same.
Here there are two points of importance. First, Squeak initializes anything it
recognizes as a variable to nil, until this is undone. (So all variables refer
to some contents, if only nil.)
Second, and most important for Squeak: The contents of a variable are anything
that can be represented by a well-formed Squeak-expression. This gives Squeak
very great power, and it also liberates the user from having to add
type-declarations for variables, as is usual in other computing languages,
where there are variables but normally of a specific kind, that needs
explicit declaration, like "integer", "float" or
"string".
So the expressions for variables that follow are in fact handles for
storage-spaces for the user of Squeak. They come in several kinds, depending
on the purpose they serve in Squeak:
|
variable_name = identifier.
|
This concerns the names for storing the contents of the "variable" of
that name, by assignment (see below). By convention variable names begin with
lowercase, but this is not enforced by the system, though when one uses an
initial uppercase it may ask whether the variable is to be stored as Global,
i.e. accessible to the whole system, and not just by the part in which it is
declared.
|
temporaries = "|" [variable_name]* "|".
|
Temporaries are variables that are only accessible and maintained by Squeak
when processing the code in which they occur. They are declared by the user by
means of writing them between two bars, separated by whitespace. (Like
before, variables are initialized to nil when declared: As soon as you've
written "| blab blub |" in a Workspace Squeak has somewhere stored
"blab" and "blub" pointing to nil as long as nothing else
is assigned to them.)
|
class_name =
capital_identifier.
|
In
fact Squeak's classes are Squeak's programs, and one must refer to these by
identifiers starting with an uppercase letter. (Note new class_names are
usually declared and added in a browser.)
There are several types of variables, all named by identifiers, which I list
here without explanation:
|
class_variable_name =
capital_identifier.
instance_variable_name =
identifier.
class_instance_variable_name
= identifier.
|
By
convention both instance variables and class instance variables begin with
lowercase, but this is not enforced by the system.
So far, we have dealt with names for the programs in Squeak, and now we turn
to parts of programs of Squeak:
|
argument_name = identifier.
|
This term is easy to misread: an argument_name is not the name of a method
but the name of a parameter of a method or block (see below: What are
called "methods" in Squeak are in fact Squeak's programs, that are
collected in classes, where a class is a collection of programs for a
specific purpose).
There is a somewhat important NOTE: argument_names cannot be assigned to (at
least it should be disallowed). By convention argument names begin with
lowercase, but this is not enforced by the system (though the parser may
complain when beginning with upper case). Thus, an argument_name in Squeak is
not an ordinary variable, because nothing can be assigned to it.
Now
the general approach of Squeak towards getting things done is to have written
or gotten somehow a class of behaviors, named by methods, which may be
executed by naming the class and the method and sending both to
Squeak.
The class has an identifier, and the method a selector and possibly some
parameters. In Squeak, there are three basic kinds of messages: unary messages
with no parameters, binary messages with exactly one parameter, and keyword
messages with one or more parameters.
These are distinguished by the following terms:
|
unary_selector = identifier.
|
By convention unary message names begin with lowercase, but this is not
enforced by the system. A simple instance is: "2 sin", which when sent to
Squeak will be calculated as the sine of the number 2. Here "2" is
the receiver of the message (the number 2 in this case) while "sin" is
a unary_selector.
There
are quite a few unary_selectors in Squeak, for quite a few different
purposes. It is a bit different with the next kind of message:
|
binary_selector = (special_character [special_character]) | ("-"
[special_character]) | "|".
|
The
difference is that binary_selectors are mostly used in mathematical contexts,
and are mostly the standard mathematical arithmetical terms like +, -, * etc.
Here
it should be remarked once more (without explanation) that in Squeak numbers
are represented in a somewhat different way than in other programming
languages. (This needs some getting used to, but Squeak is remarkably
powerful with numbers as well.)
|
keyword =
identifier ":".
|
This is used to define keyword messages (below), which correspond to the
methods with one or more named parameters - of which there are very many in
Squeak. By convention keywords begin with lowercase, but this is not enforced
by the system.
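The three selector rules can be told apart mechanically. The following Python sketch follows the rules exactly as written in this section; `selector_kind` and the pattern names are invented for the illustration.

```python
import re

SPECIAL = r"[+*/\\~<>=@%&?!`,]"                 # special_character
UNARY = re.compile(r"[A-Za-z][A-Za-z0-9]*")      # unary_selector = identifier
# binary_selector = (special_character [special_character])
#                 | ("-" [special_character]) | "|"
BINARY = re.compile(rf"{SPECIAL}{SPECIAL}?|-{SPECIAL}?|\|")
# one or more keywords, each an identifier followed by ":"
KEYWORD = re.compile(r"(?:[A-Za-z][A-Za-z0-9]*:)+")


def selector_kind(selector):
    """Return 'unary', 'binary' or 'keyword', or None if no rule matches."""
    if UNARY.fullmatch(selector):
        return "unary"
    if BINARY.fullmatch(selector):
        return "binary"
    if KEYWORD.fullmatch(selector):
        return "keyword"
    return None


for s in ["sin", "+", "<=", "-", "to:do:", "==>"]:
    print(s, selector_kind(s))
```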
4. Statements
We
arrive finally at the statements of Squeak, that the user needs to make
Squeak do anything. I start with the very basic one:
|
assignment_op = ":=" | "_".
|
This
is a binary term, used e.g. thus: myArray := #(5 'a' #(5 'a')). This
declares the variable "myArray" and assigns it the constant array
"#(5 'a' #(5 'a'))" (showing a reflexive feature possible in
Smalltalk that may interest logicians).
The
term ":=" is the classical Smalltalk operator of assignment. In
Squeak one can also write instead an underscore: "_" which is displayed as
left-arrow (but at present not in all fonts of Squeak).
|
message_expression =
unary_expression | binary_expression | keyword_expression.
|
These are the three basic kinds of messages described in the previous section.
The basis for these expressions is:
|
primary = variable_name |
argument_name | literal | block | brace_expression | "("
expression ")".
|
The first three of these are names and constants in Squeak; the last three
are expressions in Squeak that Squeak can calculate a value for.
|
unary_object_description =
primary | unary_expression.
unary_expression =
unary_object_description unary_selector.
|
These two define the first of the three kinds of message_expression in Squeak.
For the second kind there are the following definitions:
|
binary_object_description =
unary_object_description | binary_expression.
binary_expression =
binary_object_description binary_selector unary_object_description.
|
Next,
there is the last kind of message_expression, for which we need the
following:
|
keyword_expression =
binary_object_description [keyword binary_object_description]+.
|
At
this point the three kinds of messages of Squeak are defined.
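One consequence of the binary_expression rule is that a chain of binary messages nests to the left, with no arithmetical precedence between the selectors, so that "3 + 4 * 2" comes out as 14 and not 11. A small Python sketch of that evaluation order follows; it handles only integers and the selectors +, - and *, separated by blanks, and `eval_binary_chain` is a made-up name.

```python
import operator

# Only three binary selectors are supported in this sketch.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_binary_chain(source):
    """Evaluate 'n op n op n ...' strictly from left to right."""
    tokens = source.split()
    value = int(tokens[0])
    # Each (selector, argument) pair is applied to the result so far,
    # which is how binary_expression nests on its left-hand side.
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        value = OPS[op](value, int(operand))
    return value


print(eval_binary_chain("3 + 4 * 2"))
```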
The
next point is to define sequences of messages and relate them to methods. The
first is done as follows:
|
message_pattern =
unary_selector | binary_selector argument_name | [keyword argument_name]+.
|
The
second thus:
|
method = message_pattern
[temporaries] [primitive_declaration] [statements].
|
The extra in method compared to message_pattern consists of the wherewithal to
make more complicated calculations and logical decisions, and is defined as
follows, insofar as the necessary definitions have not been given yet:
|
primitive_declaration = "<" "primitive:" decimal_digits
">".
|
Squeak's Virtual Machine comes with a considerable number of basic operations
implemented by primitives, all of which have a unique identifying number. (There
are efforts to add names to these, so that users have a better idea what they
do, but so far this has not been done. To change primitives one has to change
and recompile Squeak's Virtual Machine, and indeed manage some programming in
C or C++.)
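The primitive_declaration rule is simple enough to check with one regular expression. This Python sketch only extracts the number; `primitive_number` is a hypothetical helper, and the numbers used below are arbitrary examples, not references to particular primitives.

```python
import re

# primitive_declaration = "<" "primitive:" decimal_digits ">".
PRIMITIVE = re.compile(r"<\s*primitive:\s*([0-9]+)\s*>")


def primitive_number(text):
    """Return the number in a primitive declaration, or None if it is malformed."""
    match = PRIMITIVE.fullmatch(text.strip())
    return int(match.group(1)) if match else None


print(primitive_number("<primitive: 60>"))
```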
To define statements we need to define the following:
|
expression =
[variable_name assignment_op]* (primary | message_expression |
cascaded_message_expression)
|
which
is the somewhat misleading term used for a single statement of Squeak. The
only undefined term in it is defined thus:
|
cascaded_message_expression = message_expression [";" ( unary_selector |
binary_selector unary_object_description | [keyword
binary_object_description]+ ) ]+.
|
Note this is much like message_expression.
The basic difference is related to the ";", which sends each message that
follows it to the same receiver as the last message of the initial
message_expression. (This is explained elsewhere in more detail.)
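The effect of a cascade, several messages sent to one and the same receiver with the value of the last one as the result, can be imitated in Python. This is only an analogy: `cascade` is an invented helper, and Python method names stand in for Squeak selectors.

```python
def cascade(receiver, *messages):
    """Send each (selector, arguments...) message to the same receiver.

    Returns the result of the last message, as a cascade does.
    """
    result = None
    for selector, *args in messages:
        result = getattr(receiver, selector)(*args)
    return result


items = []
last = cascade(items, ("append", 1), ("append", 2), ("count", 2))
print(items, last)
```

All three messages go to `items`; none of them is sent to the result of the message before it.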
We arrive at statements, which in fact are sequences of expressions:
|
statements = [expression "."]* ["^"] expression
["."].
|
The "^" is a constant of Squeak that in fact assures that Squeak
returns the value it has calculated. This is always the last statement
executed in a block or method (but because of logical alternatives it need not
be the last line in the block or method).
Note
also that expressions are separated by dots, and that it is a convention not
to write a dot behind the last expression in statements.
Finally,
we come to a powerful implementation of a basic method in mathematical
logic, called lambda-conversion. In Squeak this is implemented by so-called
blocks, defined thus:
|
block = "["
[[":" argument_name]+ "|"] [temporaries] [statements]
"]".
|
It
should be noted that as of version 2.6, Squeak has block local temporaries in
a somewhat limited form. Squeak does not yet handle blocks as full closures
-- block arguments are actually compiled as "hidden" temporaries
and block local temps have the same name scope as the method temps.
To finish this semi-formal specification of the Squeak Language, it remains to
mention:
|
comment =
""" [character | """ """ |
"'"]* """.
|
A
comment may appear anywhere in Squeak code, and simply acts the same as
whitespace as far as Squeak is concerned.
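As a closing illustration, here is a sketch in Python of how comments can be reduced to whitespace without disturbing strings or character constants. It is not how Squeak's own parser is written; `strip_comments` and `_end_of_quoted` are names made up for this example.

```python
def _end_of_quoted(source, start, quote):
    """Index just past the closing quote of the quoted run that begins at start."""
    j = start + 1
    while j < len(source):
        if source[j] == quote:
            if source[j + 1:j + 2] == quote:   # a doubled quote stays inside the run
                j += 2
                continue
            return j + 1
        j += 1
    return len(source)                          # unterminated: run to the end


def strip_comments(source):
    """Replace every comment by one blank; strings and $-constants are kept."""
    out = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "$" and i + 1 < len(source):   # character constant such as $" or $'
            out.append(source[i:i + 2])
            i += 2
        elif ch == "'":                          # string: copy it unchanged
            end = _end_of_quoted(source, i, "'")
            out.append(source[i:end])
            i = end
        elif ch == '"':                          # comment: acts like whitespace
            i = _end_of_quoted(source, i, '"')
            out.append(" ")
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(strip_comments('x := 3 "set x". y := \'a "quoted" word\''))
```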