Welcome to the Squeak  pages of Maarten Maartensz. See:  Map + Tour + Tips + Notes + News + Home


The Squeak Language stated semi-formally

The following more or less formal specification of the Squeak Language is derived from an EBNF definition of Squeak 2.7alpha by Dwight Hughes. It comes from the Swiki, and should be considered only a snapshot of Squeak at this version.

By "EBNF" is meant "Extended Backus-Naur Form". This is a way to write formal grammars widely used in computer science (and elsewhere) and is named after its originators.

The EBNF used here is defined as:

[ ... ] apply zero or one times;
[ ... ]* apply zero or more times;
[ ... ]+ apply one or more times;
... | ... choose one of the alternatives;
"..." use the literal characters enclosed; 
( ... ) used for grouping.

In what follows I divide the specification into some convenient groups, namely:


1. Letters and digits
2. Interpunctions
3.1 Terms: Variable identifiers of various kinds
3.2 Terms: Arrays
3.3 Terms: Numerical terms
3.4 Terms: Logical Constants
3.5 Terms: Logical Variables
4. Statements

This sequence works from the smallest and simplest expressions  in Squeak (letters and digits) to the longest and most complex expressions in Squeak (methods, messages, blocks).


1. Letters and digits

Since Squeak is a formal language like ordinary algebra - which I suppose you to be somewhat familiar with - and is made up for a computer from strings of characters, it makes sense to start with saying from what letters and other squiggles the well-formed strings of the Squeak Language (SL) are composed:

letter = uppercase | lowercase.

Letters in SL are either uppercase or lowercase, and specifically these are

uppercase = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z".


lowercase = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z".


So the "letters" of Squeak are just what an English-speaking person would expect and what can be found on a standard English qwerty-keyboard. Here, and in what follows, the vertical stroke "|" without quotation marks is like an exclusive or. Thus in uppercase the intended meaning is: Any one of "A" ... "Z" (without the surrounding quotation marks) is an uppercase letter, and nothing else is.

There are more letter-like squiggles in English and on a qwerty-keyboard, and for Squeak these are named a bit differently from standard English:

 

character = ("[" | "]" | "{" | "}" | "(" | ")" | "_" | "^" | ";" | "$" | "#" | ":" | "-" | "|" | ".") | decimal_digit | letter | special_character.


special_character = ("+" | "*" | "/" | "\" | "~" | "<" | ">" | "=" | "@" | "%" | "&" | "?" | "!" | "`" | "," ).

What are called the "characters" of  SL are mostly used in Squeak as grouping terms, and the special_characters indeed have special roles, mostly in arithmetical or logical statements.

The numerical symbols in Squeak are again what English speakers would expect:

decimal_digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9".

A decimal_digit in Squeak is precisely what one would expect it is, and more generally

digits = [digit]+.

are sequences of one or more digits. So "digits" are sequences made up of one or more digit-characters, but a terminological oddity (probably due to the use of letters in arithmetical systems that have more than 10 basic digits, for which capital letters are normally used) is that a digit is not just a decimal_digit, but may also be an uppercase letter:

digit = decimal_digit | uppercase.
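
The rules of this section can be tried out mechanically. What follows is only a minimal sketch in Python (which is not part of the original specification), assuming that each EBNF rule may be mirrored by a regular expression; the names UPPERCASE, DIGITS, is_digits etc. are my own and hypothetical.

```python
import re

# Regex fragments mirroring the grammar rules of this section.
UPPERCASE = "[A-Z]"
LOWERCASE = "[a-z]"
LETTER = f"(?:{UPPERCASE}|{LOWERCASE})"     # letter = uppercase | lowercase.
DECIMAL_DIGIT = "[0-9]"
DIGIT = f"(?:{DECIMAL_DIGIT}|{UPPERCASE})"  # digit = decimal_digit | uppercase.
DIGITS = f"(?:{DIGIT})+"                    # digits = [digit]+


def is_letter(text: str) -> bool:
    """Return True if text is exactly one letter."""
    return re.fullmatch(LETTER, text) is not None


def is_digits(text: str) -> bool:
    """Return True if text matches the rule digits = [digit]+."""
    return re.fullmatch(DIGITS, text) is not None
```

Note that under these rules "FF" counts as digits while "ff" does not, since only uppercase letters may serve as a digit.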

2. Interpunctions

If you look closely at any normal English written text of a page or more in length, you'll find that something like a third of it may consist of interpunction, like blanks, dots, commas and the like. Interpunction serves as a means of grouping characters and terms and helps the human reader. It is often referred to as "whitespace", precisely because in printing practice so much of it is indeed made up of white space without any character.

In  SL there is interpunction as well, and indeed its purpose is to help the human readers of the Squeak Language:

whitespace = [space | tab | newline]+.

This  is the whitespace  in SL, in fact mostly defined by reference to a standard keyboard, with a space and a tab key (both of which are represented in the computer by specific numbers: computers have no use  for whitespace).  Note that whitespace consists of one or more of space | tab | newline, as indeed conforms to human writing and typing practice.

There is a tricky bit involved in newlines, that correspond to the Enter-key on the keyboard: 

newline = cr | lf | crlf.

The tricky bit arises from the desire to cater to many operating systems: On a Mac newline is cr; on Unix newline is lf; and on DOS newline is crlf (a sequence of the previous two).

In Squeak the standard newline is cr on all platforms, but this concerns only text written inside Squeak, and not text written on other systems and filed into Squeak, for which reason the lf and  crlf exist in Squeak.
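
The newline rule may be illustrated by a small sketch in Python (not part of the original specification) that turns each of the three kinds of newline into a single cr. The function name to_squeak_newlines is hypothetical, and this is not a claim about how Squeak itself files in text.

```python
import re

CR = "\r"  # carriage return, the newline used inside Squeak


def to_squeak_newlines(text: str) -> str:
    """Replace every crlf, lf or cr in text by a single cr."""
    # crlf is listed first, since otherwise it would be counted as two newlines.
    return re.sub(r"\r\n|\n|\r", CR, text)
```

For instance, to_squeak_newlines("a\r\nb\nc\rd") gives "a\rb\rc\rd": three newlines of three different kinds become three cr characters.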

Finally, the general  function of interpunction is to separate a term from surrounding terms:

separator = whitespace | comment.

This only adds comment, defined below as anything occurring between two double quote-marks. Comments occur in the SL to help the user. When Squeak parses a human user's input it skips all comments, effectively treating them as whitespace, with which it also does nothing (except permit its use).

3.1 Terms 1: Variable Identifiers of various kinds

So far we have in fact considered the smallest well-formed expressions in Squeak: characters, digits and interpunction. Terms of a language are well-formed expressions that are intermediate between characters and statements, and that have some sort of meaning on their own. In English what are called "terms" here are often also called "words" or "phrases".

In Squeak, most of the terms are called "identifiers", which is a fancy name for "name". There are several kinds of them. In this section I deal with the various kinds of variable identifiers in Squeak, used for different purposes in different contexts, which the user may introduce for his own ends:

identifier = letter [letter | decimal_digit]*.

This defines the general set of identifiers: Any string that starts with a letter and is otherwise made up of letters or decimal_digits (and so without whitespace etc.). Likewise there is in Squeak:

capital_identifier = uppercase [letter | decimal_digit]*.

This is just like an identifier, except that it starts definitely with an uppercase letter. Squeak has both of them because of its convention (chosen but not imposed by the Squeak parser, in most cases) to have capital_identifiers for common names (names for possibly many things) and other identifiers for individual names (names for precisely one thing).
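
The two rules for identifiers can be checked with a minimal sketch in Python (not part of the original specification), assuming a regular expression per rule; the names IDENTIFIER, is_identifier etc. are hypothetical.

```python
import re

# identifier = letter [letter | decimal_digit]*.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")
# capital_identifier = uppercase [letter | decimal_digit]*.
CAPITAL_IDENTIFIER = re.compile(r"[A-Z][A-Za-z0-9]*")


def is_identifier(text: str) -> bool:
    """Return True if text is an identifier according to the rule above."""
    return IDENTIFIER.fullmatch(text) is not None


def is_capital_identifier(text: str) -> bool:
    """Return True if text is an identifier starting with an uppercase letter."""
    return CAPITAL_IDENTIFIER.fullmatch(text) is not None
```

So "myArray" and "Transcript" are both identifiers, only "Transcript" is a capital_identifier, and "2fast" and "my_array" are neither.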

Next, there are in Squeak two explicit ways to mark special terms:

character_constant = "$"(character | "'" | """).


symbol_constant = "#" symbol.

The character-constants exist, among other things, to be able to deal with terms like "+" without making Squeak regard them as instructions to add. 

The symbol_constants exist, among other things, to make sure that Squeak treats the identifier that follows "#" as a unique name in the system, and they are used everywhere in its code to name parts of it. What is a "symbol" in Squeak is defined below:

symbol = identifier | binary_selector | [keyword]+ | string.

"Binary_selector" and "keyword" are defined further down, and "string" immediately below. The general point of symbols in Squeak (as in many other computer-languages) is to have unique names for things in the system.

One important point to notice (that may differ from conventions in other languages) is that for Squeak a string may serve as a symbol: by the definition of symbol, a string that follows "#" is a symbol_constant, as in #'hello world'. Strings are explicitly defined as follows:

string = "'" [character | "'" "'" | """]* "'".

The point of this definition is that a string is a sequence of characters enclosed between two single quote-marks. A single quote-mark inside the string must be written twice, while a double quote-mark may occur inside a string as it is, and then does not start a comment (comments are defined below).
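
The rule for strings may be tried with a minimal sketch in Python (not part of the original specification). It assumes, more liberally than the rule for character, that anything except a single quote-mark may occur inside a string; the names STRING and string_value are hypothetical.

```python
import re

# A string: a single quote-mark, then any characters in which a single
# quote-mark is written twice, then a closing single quote-mark.
# Assumption: any character other than "'" is allowed inside, which is
# wider than the grammar's rule for character.
STRING = re.compile(r"'(?:[^']|'')*'")


def string_value(literal: str) -> str:
    """Return the text that a string literal stands for."""
    if STRING.fullmatch(literal) is None:
        raise ValueError(f"not a string literal: {literal!r}")
    # Drop the enclosing quote-marks and undouble the embedded ones.
    return literal[1:-1].replace("''", "'")
```

Thus string_value("'it''s'") yields it's, and a double quote-mark inside the literal is simply kept as part of the string.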

3.2 Terms 2: Arrays

At this point we have defined the letters, digits, interpunctions and variable identifiers of the Squeak Language, but in fact have not yet introduced any wherewithal to do anything useful. This we start now, with arrays.

An array is a sequence of distinct components that can be stored and recalled as a unit by a computer. It occurs in most computer-languages, since it provides a basic way of storing and retrieving information. 

array = "(" [number | symbol | string | character_constant | array]* ")".

Thus an array is written as a bracketed sequence of items, that may in general be about anything, including arrays. One main limitation on arrays, also in Squeak, is that they must be pre-declared and have a fixed length. A nice thing about the notation for arrays in Squeak is that the separator used is not the comma, as in most languages, but whitespace. This is easier to read, especially in long arrays.

Next, often the most convenient way to store something in computer-memory is in an array of constants. In Squeak, this is defined with help of the following term:

literal = (number | symbol_constant | character_constant | string | array_constant).

The literals consist of those items that are constants for Squeak. These are used in the next definition:

array_constant = "#" array of literals

Here we see at work a convention from the previous section, namely the use of the prefix "#" to indicate that what follows it is a literal constant.

3.3 Terms 3: Numerical Terms

In Section (1) of this  specification, digits were defined, which enables Squeak  to deal with terms for simple natural numbers. But there are many more kinds of  numbers in mathematics, and Squeak provides for these as follows, in a way differing from other computer languages:

number = ["-"][radix"r"]["-"]digits["."digits]["e"["-"]exponent].

The first character is an optional "-" for negative numbers. The radix (or base, in standard mathematics in English) is specified thus

radix = decimal_digits.

It is the number-base of the numerical expression following it (explained in beginning algebra). Here decimal_digits, which is not defined separately in this specification, stands for a sequence of one or more decimal_digit. NOTE: In fact radix is between 2 and 36 inclusive (actually, Squeak checks only the lower bound - you may make the upper bound as large as you wish, but you can represent only the first 36 digits of the larger base; numbers entered using them are interpreted correctly however). Also, the set of digits allowed in a number of radix N is the first N characters of the string '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'. The default radix is 10, as in ordinary mathematics, and need NOT be explicitly written if this is intended. In case you want to calculate on an alternative base, you must explicitly specify the radix, and then Squeak will use it. Note that the radix in decimal_digits is followed by the letter "r", marking its end and the beginning of the actual digits of the number.
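
How a radix works may be shown by a minimal sketch in Python (not part of the original specification). It covers only whole numbers, without fraction or exponent, and it assumes that a "-" on either side of the radix makes the number negative; the names INTEGER and parse_integer are hypothetical.

```python
import re

# The digits allowed in a number of radix N are the first N of these.
DIGIT_CHARS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# ["-"] [radix "r"] ["-"] digits   (whole numbers only in this sketch)
INTEGER = re.compile(r"(-)?(?:([0-9]+)r)?(-)?([0-9A-Z]+)")


def parse_integer(literal: str) -> int:
    """Return the value of a whole-number literal such as 16rFF or 2r1010."""
    match = INTEGER.fullmatch(literal)
    if match is None:
        raise ValueError(f"not a whole number: {literal!r}")
    sign_before, radix_text, sign_after, digits = match.groups()
    radix = int(radix_text) if radix_text else 10  # the default radix is 10
    if radix < 2:
        raise ValueError("the radix must be at least 2")
    value = 0
    for char in digits:
        digit = DIGIT_CHARS.index(char)
        if digit >= radix:
            raise ValueError(f"digit {char!r} is not allowed in radix {radix}")
        value = value * radix + digit
    # Assumption: a "-" in either position makes the number negative.
    negative = sign_before is not None or sign_after is not None
    return -value if negative else value
```

So parse_integer("16rFF") gives 255, parse_integer("2r1010") gives 10, and parse_integer("2r12") is refused, because "2" is not among the first 2 digits.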

Having written the radix and the number, one may follow this by the letter "e" for "exponent", followed by more digits:

exponent = decimal_digits.

This is also as in elementary mathematics.

Note that the given definition of "number" in Squeak allows Squeak to represent and indeed calculate very many different kinds of numbers, including binary (radix = 2), octal (radix = 8) and hexadecimal (radix = 16). If you are neither a mathematician nor a computer scientist much of this will probably be useless, but since computers in fact store everything in powers of 2, these may be very handy.

3.4 Terms 4: Logical Constants

Now we have finally arrived at the beginning of Squeak's processing. First, there are the all-important:

pseudo_variable = "true" | "false" | "nil" | "self" | "super" | "thisContext" | "homeContext".

These terms are called "pseudo-variables" because they have on the one hand a variable meaning in variable  contexts (rather like pronouns in English) but on the other hand cannot be assigned a different content, as all variables can be in Squeak.

It need here only be noted that "true", "false" and "nil" are used for logical processing, while "nil" is also used for assignments (see below). The "true" and "false" are Squeak's Booleans, while "nil" is a very convenient addition to these.

Next, "self" and "super" are used to deal with Squeak's so-called inheritance, and basically instruct Squeak where to find certain methods (see below for methods).

Finally, "thisContext" and "homeContext" are used mostly internally by Squeak in the processing of blocks and other code, to keep track of what is where and belongs to what.

I noted above that "true" and "false" are Squeak's Booleans. These are implemented in Squeak in another way than in other languages, and these come with the  following:

special_keyword = "ifTrue:" | "ifTrue:ifFalse:" | "ifFalse:" | "ifFalse:ifTrue:" | "whileTrue" | "whileTrue:" | "whileFalse" | "whileFalse:" | "and:" | "or:" | "to:do:" | "to:by:do:".

Except for the last two, all of these are used for dealing with logical alternatives and possibilities, including repetition while a condition holds, while the last two are used for counted loops over a range of numbers.

Three remarks should be made: 

First, In Squeak, the colon ending a term is used to indicate that a parameter follows. In the case of terms dealing with Booleans, these often are blocks (more of which below).

Second, there are some more terms in the category special_keyword, namely the Booleans involving nil, such as "ifNil:". In general, it seems not completely clear at present what is counted as belonging to the very basis of Squeak, and what doesn't, and one reason is that in fact the logical processing in Squeak is somewhat differently implemented from other languages. 

Third, in fact the operations indicated by these keywords are normally optimized/inlined by Squeak into tests and jumps and are not sent as actual messages. This happens for speed reasons, and because these tests and jumps are quite simple and very universal. However, you can have the effect of ordinary messages (useful for debugging) by using 

#perform:, #perform:with:, #perform:with:with:, and #perform:with:with:with:.

This is another series of special_keyword. (Not recommended, unless you are debugging.) The "with:" etc. is a general way to pass 1, 2 or 3 parameters in Squeak.
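
The idea behind these keywords, seen as ordinary messages, is that in Smalltalk "true" and "false" are themselves objects that receive a message with blocks as parameters. What follows is only a sketch of that idea in Python (not part of the original specification, and not how Squeak is implemented): the classes TrueObject and FalseObject, the method names, and the function perform are all hypothetical stand-ins, with Python functions playing the role of blocks.

```python
class TrueObject:
    """Stand-in for Squeak's true."""

    def if_true_if_false(self, true_block, false_block):
        # Like ifTrue:ifFalse: only the first block is evaluated.
        return true_block()

    def and_(self, block):
        # Like and: the result depends on the block.
        return block()

    def or_(self, block):
        # Like or: the block is not evaluated at all.
        return self


class FalseObject:
    """Stand-in for Squeak's false."""

    def if_true_if_false(self, true_block, false_block):
        return false_block()

    def and_(self, block):
        return self

    def or_(self, block):
        return block()


TRUE, FALSE = TrueObject(), FalseObject()


def perform(receiver, selector, *arguments):
    """Rough analogue of perform:with: look up the message by name and send it."""
    return getattr(receiver, selector)(*arguments)


answer = TRUE.if_true_if_false(lambda: "yes", lambda: "no")
```

Here answer is "yes", and perform(FALSE, "if_true_if_false", lambda: 1, lambda: 2) yields 2. The point of passing blocks rather than values is that only the chosen alternative is ever evaluated.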

3.5 Terms 5: Logical Variables

The parts in the previous section are concerned with processing by Squeak and are constants. This section treats variables, and it should  first be noted that in Squeak variables are rather special and different from other languages.

In Squeak, variables are as it were named slots used for arbitrary storage. That is, for Squeak a variable is processed as made up of a string which is a Squeak identifier, and a contents, which has been assigned by Squeak or the user. Such a pair of a name and the contents it refers to is called a "variable" because the contents can be changed, while the name remains the same.

Here there are two points of importance. First, Squeak initializes anything it recognizes as a variable as nil, until this is undone. (So all variables refer to some contents, if only nil.)

Second, and most important for Squeak: The contents of a variable are anything that can be represented by a well-formed Squeak-expression. This gives Squeak very great power, and it also liberates the user from having to add type-declarations for variables, as is usual in other computing languages, where there are variables but normally of a specific kind, that needs explicit declaration, like "integer", "float" or "string".

So the expressions for variables that follow are in fact handles for storage-spaces for the user of Squeak. They come in several kinds, depending on the purpose they serve in Squeak:

variable_name = identifier.

This concerns the names for storing the contents of the "variable" of that name, by assignment (see below). By convention variable names begin with lowercase, but this is not enforced by the system, though it may ask when one uses an initial uppercase whether the variable is to be stored as Global, i.e. accessible to the whole system, and not just by the part in which it is declared.

temporaries = "|" [variable_name]* "|".

Temporaries are variables that are only accessible and maintained by Squeak when processing the code in which they occur. They are declared by the user by means of writing them between two bars, separated by whitespace. (Like before, variables are initialized to nil when declared: As soon as you've written "| blab blub |" in a Workspace Squeak has somewhere stored "blab" and "blub" pointing to nil as long as nothing else is assigned to them.)
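
The declaration of temporaries can be pictured by a minimal sketch in Python (not part of the original specification), in which the named slots are the keys of a dictionary and Python's None stands in for nil. The function name declare_temporaries is hypothetical.

```python
import re

# identifier = letter [letter | decimal_digit]*.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def declare_temporaries(declaration: str) -> dict:
    """Turn a declaration like '| blab blub |' into slots that hold None (nil)."""
    text = declaration.strip()
    if len(text) < 2 or not (text.startswith("|") and text.endswith("|")):
        raise ValueError("temporaries must be written between two bars")
    names = text[1:-1].split()  # the names are separated by whitespace
    for name in names:
        if IDENTIFIER.fullmatch(name) is None:
            raise ValueError(f"not a variable_name: {name!r}")
    return {name: None for name in names}


slots = declare_temporaries("| blab blub |")
slots["blab"] = 42  # an assignment changes the contents, not the name
```

After this, slots is {"blab": 42, "blub": None}: "blub" still points to nil, since nothing else was assigned to it.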

class_name = capital_identifier.

In fact Squeak's classes are Squeak's programs, and one must refer to these by identifiers starting with an uppercase letter. (Note new class_names are usually declared and added in  a browser.)

There are several types of variables, all named by identifiers I list here without explications:

class_variable_name = capital_identifier.
instance_variable_name = identifier.
class_instance_variable_name = identifier.

By convention both instance variables and class instance variables begin with lowercase, but this is not enforced by the system.

So far, we have dealt with names for the programs in Squeak, and now we turn to parts of programs of Squeak:

argument_name = identifier.

An argument_name names a parameter of a method or of a block (see below for methods: What are called "methods" in Squeak are in fact Squeak's programs, that are collected in classes, where a class is a collection of programs for a specific purpose).

There is a somewhat important NOTE: argument_names cannot be assigned to (at least it should be disallowed). By convention argument names begin with lowercase, but this is not enforced by the system (though the parser may complain when beginning with upper case). Thus, an argument_name in Squeak is not "an object", because nothing can  be assigned to it.

Now the general approach of Squeak towards getting things done is to have written or gotten somehow a class of behaviors, named by methods, which may be executed by naming the class and  the method and sending both to Squeak. 

The class has an identifier, and the method a selector and possibly some parameters. In Squeak, there are three basic kinds of messages: unary messages with no parameters, binary messages with exactly one parameter, and keyword messages with one or more parameters. These are distinguished by the following terms:

unary_selector = identifier.

By convention unary message names begin with lowercase, but this is not enforced by the system. A simple instance is: "2 sin" that when sent to Squeak will be calculated as the sine of the number 2. Here "2" names the receiver of the message (the number 2 in this case) while "sin" is a unary_selector.

There are quite a few unary_selectors in Squeak, for quite a few different purposes. It is a bit different with the next kind of message:

binary_selector = (special_character [special_character]) | ("-" [special_character]) | "|".

The difference is that binary_selectors are mostly used in mathematical contexts, and are mostly the standard mathematical arithmetical terms like +, -, * etc.

Here it should be remarked once more (without explanation) that in Squeak numbers are represented in a somewhat different way than in other programming languages. (This needs some getting used to, but Squeak is remarkably powerful with numbers as well.)

keyword = identifier ":".

This is used to define keyword messages (below), that correspond to the methods with one or more parameters, each preceded by a keyword - of which there are very many in Squeak. By convention keywords begin with lowercase, but this is not enforced by the system.
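
The three kinds of selector can be told apart mechanically. What follows is a minimal sketch in Python (not part of the original specification), assuming one regular expression per rule; a keyword selector is taken to be one or more keywords in a row, as in the rule for symbol. The names SPECIAL and selector_kind are hypothetical.

```python
import re

# special_character, written as a regex character class
SPECIAL = r"[+*/\\~<>=@%&?!`,]"

# unary_selector = identifier.
UNARY = re.compile(r"[A-Za-z][A-Za-z0-9]*")
# binary_selector = (special_character [special_character])
#                 | ("-" [special_character]) | "|".
BINARY = re.compile(rf"{SPECIAL}{SPECIAL}?|-{SPECIAL}?|\|")
# a keyword selector: one or more keywords, each an identifier plus ":"
KEYWORD = re.compile(r"(?:[A-Za-z][A-Za-z0-9]*:)+")


def selector_kind(selector: str):
    """Return 'unary', 'binary' or 'keyword', or None if it is no selector."""
    if UNARY.fullmatch(selector):
        return "unary"
    if BINARY.fullmatch(selector):
        return "binary"
    if KEYWORD.fullmatch(selector):
        return "keyword"
    return None
```

So "sin" is unary, "+" and "<=" are binary, and "at:put:" is a keyword selector taking two parameters, while "at:put" is no selector at all, because the last part lacks its colon.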

4. Statements

We arrive finally at the statements of Squeak, that the user needs to make Squeak do anything. I start with the very basic one:

assignment_op = ":=" | "_".

This is a binary term, used e.g.  thus: myArray := #(5 'a' #(5 'a')). This declares the variable "myArray" and assigns it the constant array "#(5 'a' #(5 'a'))" (showing a reflexive feature possible in Smalltalk that may interest logicians).

The term ":="  is the classical Smalltalk operator of assignment. In Squeak one can also write instead an underscore: "_" which is displayed as left-arrow (but at present not in all fonts of Squeak).

message_expression = unary_expression | binary_expression | keyword_expression.

These are the three basic kinds of messages described in the previous section. The basis for these expressions is

primary = variable_name | argument_name | literal | block | brace_expression | "(" expression ")".

The first three of these are names and constants in Squeak; the last three are expressions that Squeak can calculate a value for. (Note that brace_expression is not defined in this specification.)

unary_object_description = primary | unary_expression.
unary_expression = unary_object_description unary_selector.

These two define the first of the three kinds of message_expression in Squeak. For the second kind there are the following definitions:

binary_object_description = unary_object_description | binary_expression.
binary_expression = binary_object_description binary_selector unary_object_description.

Next, there is the last kind of message_expression, for which we need the following:

keyword_expression = binary_object_description [keyword binary_object_description]+.

At this point the three kinds of messages of Squeak are defined.

The next point is to define sequences of messages and relate them to methods. The first  is done as follows:

message_pattern = unary_selector | binary_selector argument_name | [keyword argument_name]+.

The second thus:

method = message_pattern [temporaries] [primitive_declaration] [statements].

The extra in method compared to message_pattern consists of the wherewithal to make more complicated calculations and logical decisions, and is defined as follows, insofar as the necessary definitions have not been given yet:

primitive_declaration = "<" "primitive:" decimal_digits ">".

Squeak's Virtual Machine comes with a considerable number of basic operations implemented by primitives, all of which have a unique identifying number. (There are efforts to add names to these, so that users have a better idea what they do, but so far this has not been done. To change primitives one has to change and recompile Squeak's Virtual Machine, and indeed manage some programming in C or C++.)

To define statements we need to define the following

expression = [variable_name assignment_op]* (primary | message_expression | cascaded_message_expression).

which is the somewhat misleading term used for a single statement of Squeak. The only undefined term in it is defined thus:

cascaded_message_expression = message_expression [";" ( unary_selector | binary_selector unary_object_description | [keyword binary_object_description]+ ) ]+.

Note this is much like message_expression. The basic difference is related to the ";", which sends each message that follows it to the same receiver as the last message of the initial message_expression. (This is explained elsewhere in more detail.)
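
What a cascade does can be sketched in Python (not part of the original specification): several messages are sent one after the other to one and the same receiver, and the value of the whole is the value of the last message. The function cascade and the way messages are written as tuples are hypothetical; a Python list plays the receiver.

```python
def cascade(receiver, *messages):
    """Send each (selector, argument, ...) message to the same receiver.

    The value of the cascade is the result of the last message sent.
    """
    result = None
    for selector, *arguments in messages:
        result = getattr(receiver, selector)(*arguments)
    return result


items = []
# Roughly in the spirit of: items add: 3; add: 4; occurrencesOf: 3
found = cascade(items, ("append", 3), ("append", 4), ("count", 3))
```

Afterwards items is [3, 4] and found is 1: both "append" messages went to the same list, although the first "append" itself returned nothing of use.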

We arrive at statements, which in fact are sequences of expressions:

statements = [expression "."]* ["^"] expression ["."].

The "^" is the return operator of Squeak: it assures that Squeak returns the value it has calculated. It always occurs in the last statement of a sequence of statements in a block or method (but because of logical alternatives this need not be the last line in the block or method).

Note also that expressions are separated by dots, and that it is a convention not to write a dot behind the last expression in statements.

Finally, we come to a  powerful implementation of a basic method in mathematical logic, called lambda-conversion. In Squeak this is implemented by so-called blocks, defined thus:

block = "[" [[":" argument_name]+ "|"] [temporaries] [statements] "]".

It should be noted that as of version 2.6, Squeak has block local temporaries in a somewhat limited form. Squeak does not yet handle blocks as full closures -- block arguments are actually compiled as "hidden" temporaries and block local temps have the same name scope as the method temps.

To finish this semi-formal specification of the Squeak Language, it remains to mention

comment = """ [character | """ """ | "'"]* """.

A comment may appear anywhere in Squeak code, and simply acts the same as whitespace as far as Squeak is concerned.
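
That a comment acts the same as whitespace can be shown by a minimal sketch in Python (not part of the original specification) that replaces each comment by one space. It assumes, following the rules for string, comment and character_constant, that a doubled quote-mark does not end a string or comment and that "$" takes the next character literally; the function names are hypothetical.

```python
def _closing_quote(source: str, start: int, quote: str) -> int:
    """Index of the quote-mark that closes the literal opened at start."""
    i = start + 1
    while i < len(source):
        if source[i] == quote:
            if source[i + 1:i + 2] == quote:  # doubled quote-mark: skip both
                i += 2
                continue
            return i
        i += 1
    return len(source) - 1  # unterminated: run to the end of the source


def strip_comments(source: str) -> str:
    """Replace every comment in source by a single space."""
    pieces = []
    i = 0
    while i < len(source):
        char = source[i]
        if char == "$":  # character_constant: copy "$" and the next character
            pieces.append(source[i:i + 2])
            i += 2
        elif char == "'":  # string: copy it unchanged, double quotes and all
            end = _closing_quote(source, i, "'")
            pieces.append(source[i:end + 1])
            i = end + 1
        elif char == '"':  # comment: treat it as whitespace
            end = _closing_quote(source, i, '"')
            pieces.append(" ")
            i = end + 1
        else:
            pieces.append(char)
            i += 1
    return "".join(pieces)
```

The strings and character constants have to be recognized as well, since a double quote-mark inside 'say "hi"' or after "$" does not start a comment.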

 

 

