Discrete Bindings Implementation Notes

The X3J20 standard requires quite flexible and dynamic discrete variable bindings, especially when it comes to global variables. This document describes some of the aspects and workings of discrete bindings.

Global Bindings

These are probably the worst. X3J20 mandates a few things:

  1. A <<program element>> may reference any global name regardless of whether the definition of the global name precedes or follows the <<program element>>.
  2. It is erroneous if two or more <<program element>> definitions use the same identifier as a global name.
  3. A reference to a name at some point in a program definition is resolved to the specific binding of the name that exists in the scope that is available at that point.
  4. The binding of a name within a scope may be specified as an error binding. Any reference to a name which resolves to an error binding is erroneous.
  5. A name scope may be defined as a composition of other, already defined, name scopes. ... If a binding for the same name appears in both the inner scope and the outer scope, the inner scope binding is said to shadow the outer scope binding. It is the inner scope binding that is available as part of the composite scope.
  6. A <<Smalltalk program>> introduces a name scope, called the global scope, that is available to all parts of the program. The <<program element>> clause of a <<Smalltalk program>> logically includes the definitions of any standard or implementation program elements used by the program.

Requirements 1 and 6 are a particular headache for us. I read X3J20 the following way:

X3J20 says:

Immediately prior to the execution of a Smalltalk program all statically created objects are in their initial state as defined by the Smalltalk program and the values of all discrete variables are undefined. Execution proceeds by sequentially executing each initializer in the order specified by the program definition. If a program accesses any variable that has not been explicitly initialized either by an initializer or by an assignment statement its value will be the object named nil.

I read this as:

X3J20 says:

A complete program is treated as a concatenation of the interchange files from which it is composed. Any names or objects that are predefined by an implementation are treated as if their definitions preceded the first file in this concatenation.

I interpret this as:

Defining and Referencing a Global Binding

Obviously, this is the part where we process <<program element>> definitions, ignoring for now the initializers they may contain. Two scenarios exist:

  1. A definition references a global that's already defined. This is the classical C style, forward-only way of doing things.
  2. A definition references a global name that follows the definition, i.e. it will be defined later. This is the case we will concentrate on.

One might think that scenario 1 is a simple case, but in fact X3J20 says that variables are resolved from the following composite scope:
((global scope + pool variable scope) + class variable scope) + class instance / instance variable scope

That puts us in a situation where a method may reference a variable X that is defined in the global scope, e.g. a class, but later in the code a pool variable with the same name may be defined, which appears in the pool variable scope and thus shadows the global variable. Therefore, even if we can resolve the binding at the time we encounter it in the source code, we can't be sure it resolves correctly. The only option we have is to defer this process until all definitions are filed in. Every binding (except arguments and temporaries) is handled as scenario 2.
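
The shadowing problem can be illustrated with a minimal sketch. It is written in Python purely for illustration; the Scope class and its define and lookup methods are hypothetical names and not part of X3J20 or of this implementation. Each scope is composed on top of an outer scope and lookup searches from the innermost scope outwards, so a pool variable defined later changes what an earlier reference resolves to.

```python
class Scope:
    """A name scope, optionally composed on top of an outer scope."""

    def __init__(self, name, outer=None):
        self.name = name
        self.outer = outer
        self.bindings = {}

    def define(self, key, value):
        self.bindings[key] = value

    def lookup(self, key):
        # Inner bindings shadow outer ones, so search from the inside out.
        scope = self
        while scope is not None:
            if key in scope.bindings:
                return scope.name, scope.bindings[key]
            scope = scope.outer
        return None


# ((global + pool) + class) + instance, as in the composite scope above.
global_scope = Scope("global")
pool_scope = Scope("pool", outer=global_scope)
class_scope = Scope("class", outer=pool_scope)
instance_scope = Scope("instance", outer=class_scope)

global_scope.define("X", "a class")
early = instance_scope.lookup("X")          # resolved while reading the method

pool_scope.define("X", "a pool variable")   # defined later in the source
late = instance_scope.lookup("X")           # resolved after all definitions
```

Here early refers to the global X while late refers to the pool variable X, which is why resolution has to wait until every definition has been read.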

While reading the definitions, we don't yet know how the bindings will end up being resolved. What we do is:

  1. Create an unresolved binding that we will later resolve to a concrete binding.
  2. Set attributes on the unresolved binding depending on the constraints that may be placed on it.
    1. Source code position, so we can complain and display errors if needed.
    2. Possibly other attributes ... if needed.
    3. Side note: The outer scope may reject shadowing (redefining) of a global in the inner scope. The purpose is to prohibit shadowing of certain important globals, such as Object, True, False etc., to avoid having an exceptionally complex system.
  3. Create an unresolved binding instance each time a binding is needed. Have the binding know where it's used and how to replace itself with a concrete binding, i.e. have a delegate we can call.
  4. Keep a list of the unresolved bindings as long as we are reading the definitions.
  5. When the definition phase has finished, but before running the initializers, process the list of unresolved bindings.
    1. Try to resolve each unresolved binding to a concrete binding.
    2. If resolution succeeds, run the "replace myself" delegate on the binding, so it will replace itself with the concrete binding where it's been used.
    3. If no "replace myself" delegate exists, the unresolved binding will have to keep a reference to the concrete binding and forward to it. This adds an extra level of indirection and a slight performance hit, but the program will run as expected.
    4. If no concrete binding exists, keep the unresolved binding, maybe flagging it as unresolved. In this case it is an error binding and usage (reading/writing of the variable) will result in a runtime error.

Hopefully, when all source code files are read and the above is executed, there will be no unresolved bindings left.
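
The procedure above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the actual implementation: UnresolvedBinding, ConcreteBinding, resolve_all, the slots dictionary standing in for use sites, and the "a.st:3" style positions are all hypothetical.

```python
class ConcreteBinding:
    def __init__(self, name, value=None):
        self.name = name
        self.value = value


class UnresolvedBinding:
    def __init__(self, name, position, replace=None):
        self.name = name
        self.position = position   # source position, for error reporting
        self.replace = replace     # optional "replace myself" delegate
        self.target = None         # concrete binding when kept as a proxy

    @property
    def value(self):
        if self.target is None:
            # Still unresolved: behaves as an error binding.
            raise NameError(f"{self.name} is unresolved at {self.position}")
        return self.target.value   # extra hop through the proxy


def resolve_all(pending, scope):
    """Resolve pending bindings against scope (name -> ConcreteBinding).

    Returns the bindings that are still unresolved.
    """
    leftover = []
    for binding in pending:
        concrete = scope.get(binding.name)
        if concrete is None:
            leftover.append(binding)       # step 5.4: keep as error binding
        elif binding.replace is not None:
            binding.replace(concrete)      # step 5.2: swap at the use site
        else:
            binding.target = concrete      # step 5.3: forward through proxy
    return leftover


slots = {}   # hypothetical use sites that hold bindings
pending = [
    UnresolvedBinding("Foo", "a.st:3",
                      replace=lambda b: slots.__setitem__("Foo", b)),
    UnresolvedBinding("Bar", "a.st:7"),    # no delegate, stays a proxy
    UnresolvedBinding("Baz", "a.st:9"),    # never defined
]
scope = {"Foo": ConcreteBinding("Foo", 1), "Bar": ConcreteBinding("Bar", 2)}
leftover = resolve_all(pending, scope)
```

After the call, the use site for Foo holds the concrete binding directly, Bar is read through its proxy, and Baz remains in leftover and raises an error when its value is read.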

The same rules apply to all types of globals: classes, pools, global variables and global constants.

Initializing and Using a Global Binding

A global binding is similar to an Association, except that besides key and value, it may carry additional attributes needed for its scope and context.

A binding's value is accessed through accessor methods / properties. This is semantically equal to reading or setting the value of the variable. Constant bindings, of course, have only a getter method and no setter method.

A special private setter method exists on all bindings, so they can be initialized by the initializer - as part of the source code reading process.

The environment keeps a list of unresolved bindings (hopefully empty). If used, each of them behaves as an error binding.
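
A minimal sketch of such bindings, in Python for illustration only. The class names Binding, ConstantBinding and ErrorBinding and the _initialize method are assumptions made for this example; the real classes may look different.

```python
class Binding:
    """Key/value pair, like an Association, with room for extra attributes."""

    def __init__(self, key):
        self.key = key
        self._value = None     # nil until initialized or assigned

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        self._value = new_value

    def _initialize(self, new_value):
        # Private setter, intended to be used by initializers only.
        self._value = new_value


class ConstantBinding(Binding):
    # Redefining the property without a setter makes it read-only:
    # assigning to .value raises AttributeError.
    @property
    def value(self):
        return self._value


class ErrorBinding(Binding):
    # Any use of the variable, reading or writing, is erroneous.
    @property
    def value(self):
        raise NameError(f"{self.key} is an error binding")

    @value.setter
    def value(self, new_value):
        raise NameError(f"{self.key} is an error binding")


counter = Binding("Counter")
counter.value = 1              # ordinary assignment

pi = ConstantBinding("Pi")
pi._initialize(3.14159)        # only the initializer may set a constant
```

Assigning to pi.value afterwards fails, while the private setter still lets the initializer give the constant its value.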

Implementation Notes

Because of the requirements listed above, we will not modify the live environment directly. We will need to process the source code inside a transaction. For this purpose:

  1. Create a transaction object called InstallerContext. Preferably, for concurrency safety, this object should block other changes to the environment while the operation is running.
  2. File in the source code (interchange files) and process definitions creating unresolved bindings where needed.
  3. Resolve the unresolved bindings, and fail if there still are unresolved bindings left.
  4. Validate other rules, such as class consistency etc. as described in X3J20.
  5. Commit definitions to the environment, possibly by creating a local copy of it (for concurrency purposes). At this point, definitions should be consistent, but not initialized.
  6. Run initializers. Those are run outside of the transaction context, because we have no way to roll back the changes. For example, an initializer may overwrite an arbitrary global, even if it's not defined as part of the filed-in source code.
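
The steps above could be sketched along these lines. Only the name InstallerContext comes from these notes; the Environment class, InstallError, the method names and the use of a simple lock are assumptions made for this Python illustration. The sketch is simplified: it takes the lock only while resolving and committing, and it skips the validation in step 4.

```python
import threading


class InstallError(Exception):
    pass


class Environment:
    def __init__(self):
        self.globals = {}
        self.lock = threading.Lock()   # serializes commits


class InstallerContext:
    def __init__(self, environment):
        self.environment = environment
        self.definitions = {}     # new global names, value nil (None)
        self.references = []      # (name, position) pairs seen so far
        self.initializers = []    # callables taking the globals dict

    def define(self, name):
        if name in self.definitions:
            raise InstallError(f"duplicate global name: {name}")
        self.definitions[name] = None

    def reference(self, name, position):
        self.references.append((name, position))

    def add_initializer(self, initializer):
        self.initializers.append(initializer)

    def install(self):
        with self.environment.lock:
            # Step 3: resolve against existing and new definitions together.
            known = self.environment.globals.keys() | self.definitions.keys()
            missing = [(n, p) for n, p in self.references if n not in known]
            if missing:
                raise InstallError(f"unresolved bindings: {missing}")
            # Step 5: commit; definitions are consistent but uninitialized.
            self.environment.globals.update(self.definitions)
        # Step 6: initializers run outside the transaction, without rollback.
        for initializer in self.initializers:
            initializer(self.environment.globals)


env = Environment()
ctx = InstallerContext(env)
ctx.reference("Counter", "demo.st:1")    # forward reference
ctx.define("Counter")                    # definition follows the reference
ctx.add_initializer(lambda g: g.__setitem__("Counter", 0))
ctx.install()
```

If resolution fails, install raises before anything is committed, so the live environment stays untouched. Whatever the initializers do after the commit is not undone.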