Forth Meets Smalltalk (FMS)
version 3.0
October 2010
Douglas B. Hoffman

=============================
WHAT IS FMS?
=============================

FMS stands for Forth Meets Smalltalk. Presented here is a "specification" for the FMS Forth object programming extension. The specification can be implemented in different ways, as long as the behaviors adhere to it. I provide two example reference implementations. Each uses a different implementation technique, and each technique has its own relative strengths and weaknesses. The *use* of each implementation, however, is identical.

Note that this is not a proposal for a Forth ANS standard, although I suppose it could be used as one.

=============================
BACKGROUND
=============================

In the past few years I have written several different object extensions and looked closely at a few written by others. What I present here is a specification for what I believe to be an excellent and practical Forth object extension. FMS is a distillation of what I have found to be an easy to use yet complete object system. There are strong opinions about what such a specification should be, so I try to provide a rationale for the specifications chosen.

Andrew McKewan presented a Neon-like ANS Forth extension in 1997 (Object Oriented programming in ANS Forth, Forth Dimensions, March 1997). FMS syntax often resembles Neon, and the example reference implementations are loosely based on McKewan's work. But there are significant differences with FMS. The FMS class building syntax and behaviors bear a strong resemblance to Neon, with the notable exceptions of object-message ordering and the use of methodless instance variable access (and a few other features as detailed below).

Object-message ordering, as opposed to message-object, seems to be a requirement for a Forth object extension in order to gain wide acceptance. I agree that this ordering holds some inherent advantages, though not overwhelmingly so. There are also advantages to message-object ordering; in particular, it makes it easy to do away with wordlists. But using wordlists is not a large disadvantage. With at least one other object extension the use of wordlists presents a more serious problem (whether or not this problem can be resolved is unknown to me). See "THE PROBLEM WITH WORDLISTS" below. Basically, the problem with wordlists is that there is no ANS way to have wordlists persist after a dictionary save, so each Forth will have its own, likely unique, way of doing this. On balance I chose object-message ordering for FMS.

Instance variables (ivars) can easily be declared as embedded objects or declared using a generic ivar primitive. This is the way Neon, Mops, SwiftForth's SWOOP, McKewan-97, and Win32Forth behave, and perhaps others. The alternative would be either to not use objects-as-ivars, which I think would be a mistake, or to create the desired ivar object at instantiation and manually store it in a container-type ivar. The latter is inconvenient and tedious, and, perhaps most importantly, it does not allow using embedded objects-as-ivars in dictionary-based objects (unless one issues some kind of renew message at each programming session or each time a turnkey is launched).

When defining a class, messages sent to self can be either early bound or late bound by using the SELF or [SELF] pseudo ivars, respectively. Some believe that all messages to self should be late bound, and I understand the rationale for this belief. But in practice I think this is best specified by the programmer during the design of the classes. Always using late binding to self is certainly the most general technique, as it allows maximum code reuse in subclasses. But there is a performance cost, because late binding is always slower than early binding.
Also, it is easy to redefine specific superclass methods to use late binding to self when an opportunity for code reuse in a subclass becomes apparent. This is the way Neon, Mops, SwiftForth's SWOOP, McKewan-97, and Win32Forth behave. Besides, when methodless ivar access is used instead of sending a message to the ivar, we have in essence "early bound" that part of the method definition. So the only way to remain pure, in the sense of always having late-bound messages to self, is to use only messages when accessing instance variables. This could be done, but methodless ivar access, for non-complex ivar interaction, offers the advantages of speed and simplicity and also seems to be a feature desired by most Forth programmers. I do not disagree.

Note that messages sent to public objects are always late bound. However, sending a message to an embedded ivar-as-object will usually be early bound, because we do not expect the class of an ivar to change. But if we are using an ivar as a container of an object, then messages sent to the "contents" of that container will, and must, be late bound.

Some claim that if we just make late binding fast enough then we should not care whether late or early binding is used. This sounds reasonable, but in practice early binding will always have a significant efficiency advantage.

=============================================
THE SPECIFICATION of FMS 3.0 IN GENERAL TERMS
=============================================

We will use Smalltalk terminology for most object related discussion and descriptions, especially the following terms:

  Class
  Object
  Instance Variable (or ivar for short)
  Subclass
  Superclass
  Inheritance
  Message
  Method
  SELF (pseudo ivar only used in method definitions)
  SUPER (pseudo ivar only used in method definitions)
  Message Sending
  Early and Late binding of messages to methods
  Polymorphism

- FMS is a class-based (as opposed to prototype-based) single inheritance model. Full inheritance of all methods and data (ivars) is achieved in subclasses.

- A class is not an object. It is best thought of as an object template or an "object factory".

- Smalltalk-like duck typing is used. Any message can be sent to any object. If the message is not valid for that object, then an error occurs.

- Messages can be ticked and the resulting xt used in the expected way on objects.

- When defining a class, creating a message name, creating the associated method code, and binding that message name to that code are all done in one step, as in Smalltalk and SWOOP. There is no need to declare interfaces; interfaces are not used.

- If a new method (message) overrides an existing method, then the object system just handles that implicitly, as in Smalltalk. No manual programmer declaration of "override" or the like is required; the intent should be obvious. Overriding methods is a very common thing to do when defining new classes.

- Declaring instance variables when defining a class includes the ability to simply declare the ivar as an object of an existing class. The alternative would be to first declare a container ivar, then instantiate an object of the desired class, and finally store that object in the container. However, this can easily be done with FMS if desired.

- If an instance variable is declared to be of a certain class, then messages sent to that ivar will be early bound for efficiency (speed). After all, we do not expect that the class of the ivar will change. But late-bound messages can certainly be sent to the "contents" of an ivar if that ivar is some sort of container of an object.

- When defining a class, it is the choice of the programmer to use early or late binding when sending a message to SELF. If one insists that all messages to SELF be late bound, then what are we to make of methodless ivar access? Giving the programmer full control over whether to use early binding, late binding, or no binding at all seems to be more in the spirit of Forth programming.

  Note that there is nothing inherent about FMS that prevents us from making all messages to self late bound. Admittedly, making all messages to self late bound could make it easier to reuse method definitions later on in subclasses. This is a real and valid benefit. The downside, in addition to slower program execution due to many more late binds, is the greater likelihood of infinite loops when a subclass method uses a superclass method which references back to the subclass via late binding. In practice it has been the author's experience that when an opportunity for code reuse via late binding to self occurs, it is a simple matter to change SELF to [SELF] in just the particular superclass method involved, taking care not to produce the infinite loop mentioned above.

- Sending a message is a "special" event, and it should be clear, when reading source code, when a message send occurs. There are many ways to do this. In FMS we require that all message names end in a colon ( message: ). Some seem fixated on wanting to hide the use of a message send, thus making it indistinguishable from a normal Forth call. This is a mistake, in my opinion. For example, if we define "@" to be a message, then we can wind up with code that looks like the following:

    foo @   \ a normal fetch from the VARIABLE foo, or the address presented by the execution of foo.
    bar @   \ a message send (here, @ is a message) to the OBJECT bar, or the object presented by the execution of bar.

  What if we wish to apply the "normal" Forth word @ to whatever object bar leaves on the stack? Perhaps more importantly, how do we know, when reading the source code, whether @ is a message or just the normal Forth word fetch?
  Bar may be, for example, a rectangle object, and the @ message may return four items on the stack (top, left, bottom, and right coordinates). Such confusion is mostly avoided simply by adopting the recommended message naming convention. If there is a name conflict with another word that ends in a colon, then that conflict should be handled as you would any naming conflict. For example, if we name a selector FIELD: and our Forth system reports that we have just redefined FIELD:, then we can either ignore the redefinition or choose another name for our message. Simple.

- Methodless ivar access. Conventional OOP practice would dictate that the data contained in an object must *only* be accessed via a message send to the object. This provides a level of security for a program's data. At one time I felt strongly about this and always required message sends. But others have convinced me that it is probably more in the Forth tradition to allow data access without having to define accessor messages, so I have provided for methodless access in FMS. I would simply say that one should use methodless ivar access with care.

- There is a way to declare instance variables as members of records with a REC{ ... }REC syntax. Normally the ivars in an object each have a header that identifies the class of the ivar. But you may want to use the ivar list as you would use a structure list, that is, with nothing but data in contiguous cells (with perhaps padding for alignment). Records allow you to easily do this. The only restriction on ivars declared within a record is that one cannot send messages to a record ivar that in turn involve sending late-bound messages to SELF as defined in the class of the ivar (if the record ivar is declared as an embedded object). For clarity, this is illustrated in the following example.

  Consider a class, named VAR, that is defined as:

    :class VAR
      cell bytes data
      :m !: ( n -- ) data ! ;m
      :m @: ( -- n ) data @ ;m
      :m print: self @: . ;m        \ early bound @: message sent to self
      :m printLate: [self] @: . ;m  \ late bound @: message sent to self
    ;class

  Note that method print: uses an early-bound message to self, but method printLate: invokes a late-bound message to self. Now consider a class, named TEST, that uses two VAR embedded objects as ivars:

    :class TEST
      var x             \ x is a normal ivar embedded object
      rec{ var y }rec   \ y is an ivar embedded object as a record
      :m printX: x printLate: ;m
      :m printY: y printLate: ;m   \ this method will fail
      :m printY2: y print: ;m      \ this method will work
    ;class

  Note that if ivar y above were used as a *container* of a VAR object, then one could use the printLate: message, i.e.,

    :m printY3: y @ printLate: ;m  \ this method will work if y contains an object

- If a new message name is created that is the same as an existing (non-message) public name, then that name conflict is handled just like any other name conflict in your Forth. You may want to observe the "redefined" warnings issued by your Forth system when compiling an FMS class or making a new FMS selector.

- Objects may be instantiated as named dictionary-based objects or as nameless heap-based objects.

- Implicit initialization of objects and all of an object's ivars (including ivars of any superclasses) is supported.

- Indexed (one-dimensional) objects/ivars are explicitly supported.

- Error checking for invalid message sends and out-of-bounds indices is provided.

- When defining a class, the default superclass will be OBJECT if no superclass is explicitly provided.

- Arrays of a class of objects can be easily created in the dictionary using objArray(). Example:

    :class point
      var x
      var y
      :m show: x @ . y @ . ;m
    ;class

    20 objArray() point points()
    0 points() show:
    1 points() show:
    ...

- PUBLIC and PRIVATE declarations for methods or ivars are not supported (but they could be).
  Especially with methodless ivar access, I do not see the point in trying to make code more "secure" by using PUBLIC and PRIVATE.

- Class instance variables are not supported (but they could be).

- Multiple inheritance is not supported (but it could be).

========
WHY OOP?
========

Some claim that the main value of object programming is for creating graphical user interfaces. While useful for that, there is no reason to limit OOP use to GUIs. Some of the important benefits of object programming include (but are not limited to): a higher order of information hiding, and so easier handling of program complexity; a consistent way to organize code; a higher frequency of code reuse; and a significant reduction in the number of Forth words to be remembered. For example, the *same* descriptive names for messages, such as size: or search:, can be reused for an unlimited variety of data types (objects).

==================
THE FMS USER WORDS
==================

The FMS user words are often the same as in Neon, but there are some very distinct differences in the usage of some of the words. Below is a list of the FMS user words with a brief description of their purpose. Stack effects are not shown. A detailed usage glossary for these words, with stack effects, follows the list.

 1) :CLASS   \ begin a class definition
 2)          \ directs message to a specific superclass
11) OBJECT   \ the root class for all classes
12) HEAP>    \ instantiate an object in the heap
13)
14) IV       \ methodless access to an ivar while interpreting
15) [IV]     \ methodless access to an ivar while compiling
16) " -- ) Begins the definition of a new class. This is a defining word. The name of the new class must come directly after :CLASS. The class name is later used in the following four ways:

  1) At compile time to instantiate a dictionary object. Simply execute the class name and an unnamed dictionary object is returned on the stack. The object can be, and usually is, then stored as a constant or in a value (or any place the programmer wishes to put it).
  2) At run time to instantiate an unnamed object in the heap. See HEAP>.
  3) To create ivar definitions in other class definitions. These ivars then behave as embedded objects of the given class, responding to messages appropriately.
  4) As a pseudo object. A message can be sent to a superclass name when the message is preceded by SUPER>. This results in use of that superclass's method.

" -- ) Declares the superclass of a new class being defined.

" -- ) The sole primitive for defining new ivars. Bytes requires the size of the ivar, in addressable units, and must be followed by the name of the ivar. Only used inside a class definition.

:M ( "spaces" -- methodXT )
Begins a new method definition. Can only be used inside a class definition. Performs three functions simultaneously:
  1) Defines a new message name, but only if it has not previously been defined in any class (message names have global scope). Note two things about message names:
     a. All message names must end in a colon.
     b. The method compiler will detect and warn of name collisions between a message name and an already-defined non-message name.
  2) Defines the method that is to be invoked when an object or ivar of that class receives the given message.
  3) Implicitly overrides, if necessary, existing methods in the superclass chain that are associated with the given message.

;M ( methodXT -- )
Ends a method definition and stores the method's XT where it can be subsequently retrieved.

SELF ( "spaces" -- ) or, if no following message, ( -- addr )   addr = base address of object
A pseudo ivar only used in method definitions. When it is the receiver of a message, it compiles an early-bound message send using the method that has already been defined either in the current class or in the superclass chain hierarchy. When used without a following message, it returns the base address of the object.
[SELF] ( "spaces" -- ) or, if no following message, ( -- addr )   addr = base address of object
A pseudo ivar only used in method definitions. When it is the receiver of a message, it compiles a late-bound message send. The method is looked up at run time in the class of the actual receiver, so it may be one defined in the current class or one redefined further down the subclass chain hierarchy. When used without a message, it returns the base address of the object, exactly as SELF does.

SUPER ( "spaces" -- )
A pseudo ivar only used in method definitions. As the receiver of a message, it compiles the method that has already been defined one level up in the superclass chain hierarchy, skipping over the method already defined in the current class definition. SUPER must be followed by a message name. If the method has not been redefined in the current class, then this use of SUPER is equivalent to the use of SELF.

SUPER> ( "spaces" "spaces" -- )
Similar to SUPER, except that the method of *any* superclass in the inheritance chain can be used. SUPER> must be followed by a superclass name and a message name.

HEAP> ( "spaces" -- ^obj )
Instantiates a nameless object on the heap. Must be followed by the name of a class, and returns an object pointer.

is a compile time only word. It cannot be used outside a word definition. above.

IV ( ^obj -- ivar-addr )   \ input stream: "spaces"
Provides methodless access to the ivars of an object. Interpret use only; cannot be compiled. Essentially a programmer's convenience tool.

[IV] ( ^obj -- ivar-addr )   \ input stream: "spaces"
Version of IV to be used in compilation state.

" -- ) The recommended way to create a new selector that is not immediately associated with any method. Can be used inside or outside a class definition. As long as the message is only used late bound (such as ' [self] message: '), compilation will proceed as expected. But before the message is actually sent at run time, there must be a method defined for the selector (or message). See example class SEQUENCE for how this may be used.

ERRORCHECK ( -- flag )
A constant used as a compiler directive. If set to TRUE, which is recommended during development, then error checking is performed on the validity of messages sent to objects/ivars, and on the validity of an index sent to an indexed object via ?IDX. After a program has been debugged, it should be recompiled with ERRORCHECK set to FALSE for somewhat faster program execution.

REC{ ( -- )
Used in the ivar declaration list in a class definition. Marks the beginning of a record. The following list of ivars will comprise a contiguous list of data, essentially as in a structure list. The record list is ended with }REC. It is important to know that late-bound messages may *not* be sent to members of a record.

}REC ( -- )
Used to mark the end of a record list. See REC{ above.

INIT: ( -- )   \ strongly recommend that INIT: not consume or return stack items
I believe that a proper object system should have a default initialization method that is automatically invoked whenever an object is instantiated. In FMS we use the INIT: message. Class OBJECT, the root class of all classes, has a default INIT: method which does nothing, so it is not necessary to define an INIT: method unless you need it.

Also, years ago in Mops it was observed that any explicit call to INIT: was always preceded by a call to SUPER INIT: (actually, in Mops/PowerMops it would be CLASSINIT: SUPER ). Therefore it was decided that whenever an INIT: method was defined, the object system would also automatically call SUPER INIT: "prior" to sending the INIT: message to the newly created object. If the SUPER INIT: call was not wanted, then one could undo what it did in the INIT: method of the object's class. This behavior is perhaps best explained by an example:

    :class var1
      cell bytes data1
      :m init: -1 data1 ! ;m
      :m @: ( -- n ) data1 @ ;m
    ;class

    var1 v1
    v1 @: .  -1 ok   \ object v1 will automatically be initialized to -1

Now consider a subclass of var1:

:class var2 className" "name()" run time execution of name(): ( idx -- ^obj(idx) )

Note that if the errorCheck flag is true, then the index passed to "name()" will be checked for validity.

=========
USING FMS
=========

The following FMS code example should give you an idea of what programming in FMS is like. Note the syntax similarities to Neon, including simple use of embedded objects as ivars in class definitions. Also note the object-message syntax.

    \ Begin definition of a new class, named var. The implicit superclass is class OBJECT.
    :class var
      cell bytes data              \ define an ivar, named data, using the bytes primitive
      :m !: ( n -- ) data ! ;m     \ ivar addr is obtained by executing its name
      :m +: ( n -- ) data +! ;m
      :m @: ( -- n ) data @ ;m
      :m p: ( -- ) [self] @: . ;m  \ print self, late bound
    ;class

    var x      \ Instantiate an object in the dictionary named x.
    45 x !:    \ Store 45 in object x by sending the !: message.
    x p:       \ Print object x by sending the p: message.
    45

    :class point
      var x    \ ivar as embedded object of class var
      var y
      :m show: x p:     \ access ivar via message send
               y @ .    \ methodless ivar access inside a class definition
      ;m
      :m dot: ." Point at " [self] show: ;m   \ this is a late bind of show: to self
    ;class

    point origin   \ instantiate a point object named origin
    \ methodless ivar access outside a class definition
    5 origin IV x !
    8 origin IV y !
    origin dot:    \ send the dot: message to origin
    Point at 5 8

    :class rectangle
      point ul
      point lr
      :m show: ul dot: lr dot: ;m
      :m dot: ." Rectangle, " self show: ;m   \ this is an early bind of show: to self
    ;class

    rectangle r
    r dot:
    Rectangle, Point at 0 0 Point at 0 0

    :class label-point and myClass to myHeapObject ;  \ instantiate an object in the heap
    myHeapObject myMessage:   \ send a message to the heap object
    myHeapObject