DDL Internals

To have a full idea of how the DDL works it is a good idea to take a look at its internals. This section briefly describes some of the sections that might be of interest to the and application developer who might wish to use the DDL or extend it and to a computer science student who is here out of curiosity. These sections are expected to act as brief roadmap to source code.

 

Note:

The DDL project homepage is http://ddl.sscli.net .

You can mail the authors at spark@sscli.net and dolly@sscli.net

 

Lexical Analyzer and Parser

The lexical analyzer is a hand written extensible tokenizer that is implemented in file ddllex.h and ddllex.cpp. It has support for C style tokenizing and two character and single character operators. It can optionally be set to be CRLF invisible. It also maintains a row + column count on each token returned which can be used for error reporting. The tokenizer does not load the source buffer that is expected to be tokenized. The external user of the tokenizer is expected to do this and pass the buffer to the tokenizer.

 

Here are one or two things worth documenting in the tokenizer source.

 

·         The tokens belong to a class defined in ddltoken.h and ddltoken.cpp. All tokens returned are allocated instances of this type. To cleanup one must explicitly de-allocate their memory.

·        
All the ‘type’ of the tokens are non zero positive values. A zero values token indicates the end of the token stream. All token types are defines. The defines are either in ddltoken.h or in the lemon generated header file lang.h.

 

Often token of a similar types have been grouped together and exposed to the parser as a single token type. For example one token type represents all the single parameter functions. This is set by getparsetype().

 

Lemon generates certain defines itself. These defines are expected as the defines set for the tokens it parses also. So some of the defines have been forced to be in lang.h.

 

The parser generator used is called lemon. It is a table driven LALR single token look-ahead parser generator. It is identical to yacc or bison in functionality, though not compatible with these. Lemon has a few said advantages over these. The notable ones among these is that lemon is generates faster parsers than the other two. Also the fact that lemon parsers are completely self contained objects and do not use any globals. Thus multiple streams of parsing can occur simultaneously.

 

A more detailed discussion on lemon is given in the documentation about the Script Engine or can be found on the web.

 


Internal Data Structures

 

This defines some of the data structures used internally by the DDL.

 

The SDL

The SDL or the Structure Definition List is an internal link list of all the Structure Definitions that are found in the DDL source. A basic SDL link list is as shown :

When a particular structure is read it creates a data structure under the SD. The data structure for a sample structure specification is shown.

 

 

A link list of nodes is the backbone of the entire structure. Each element can be a data element, a expression element or a link list node itself. The last case is for a when block. Every node in the nested link list is a  when block start. Under each of those is a binary node. The left pointer of the binary node contains the body of the when block and the right pointer contains the condition to be satisfied for this body to be accepted.

 


The Iterative Resolution algorithm -

The heart of the DDL

 

The following is a brief representation of the iterative resolution algorithm. The iterative resolution algorithm is the heart of the DDL. Using the resolution process the DDL decides which members exist in a particular instance of a structure. This process involves analyzing the given script with the data source with internal objectives such as speed, minimal data reading and handling. As far as possible the DDL avoids reading data member values unless they are absolutely necessary to predict the existence of other members.

 

 

 

The first diagram shows a script of a structure and the corresponding data structure available in memory after the parse phase.

 

The DDL when asked to resolve this structure must me given one piece of information – the actual location of this structures instance in the data source.

Once the DDL is asked to resolve initially a single scan process kicks in.

 

The single scan process does the following

·         Keeps track of the current address and the current element it is trying to resolve.

·         Every single instance data member that it encounters is immediately added to the resolved list.

·         Every data member array is analyzed for array size.

·         If the size is a constant the element is added to the resolved list

·         If the size if till type array the read ahead is done and the array is found out and the element is added to the list.

·         If the size is an expression that depends on other values in the structure it is added only if all other elements it uses are already in the resolved list.

·         If none of these then it is added to the unresolved member list and address cumulating is discontinued.

·         If the element is a expression type it is added to the unresolved list but the address cumulating is continued.

·         If it is a when type address cumulating is continued is the when subtree has no data members. The location of the when is added to the unresolved list.

·         If address cumulating is discontinued then the address of an elements can no longer be calculated and all the following elements ignored.

·         If an element has an explicit address and address cumulating is reestablished from that point on and that address is taken as the current address.

 

The end of the first scan the data structures and the resolved and unresolved elements look like this.

 

 

 

 

 

The resolved list is a object of type StructureInstance. It internally maintains a linklist of all members that are resolved call the MemberInstances.

 

The unresolved list or is a list of nodes each containing a URML or unresolved member list. In the first pass each URML is created and has only a single node in it. Each node has a pointer to the actual location in the data structure as shown by the arrows.

 

Now the iterative resolution kicks in. Here the DDL iterates over the unres. In each pass expects at least one element to be resolved or the iteration is stopped and an error is reported.

 

Each resolution pass in adds element to the resolved list if all the elements that it depends on are available.

If any when body is resolved the initial single pass scan is down inside the when body.

 

The result of the first iterative sweep is shown

 

 

 

 

 

Here notice:

The ‘when’ has been resolved, by reading values of its variables, in this case a1. Let us assume it was true. Elements a2, d2 , c2 and d2 have been added.

c2 is accessible after the when because it has an explicit address.

The URML to the earlier ‘when’ has gone and has been replaced by 2 URMLs.

The first on is a URML to an unresolved nested ‘when’.

The second URML has an entry for the nested when as well as a pointer to the location to continue after the nested when has been resolved. Thus that particular URML has two nodes. Note that since content of the when body has already been resolved till its last node (ie d2 was resolved) first of these nodes is a null. If there was an trailing unresolved section in the when this would not be null.

 

After the next pass we have -

 

 

 

 

 

After the next pass of the IR algorithm, the internal when has been analyzed. The section again splits the URML into 2. One for its nested when and one for its trailing block. This has the same king of null address and secondary location address.

 

The element after the when can be read because the content of the when contains no data members therefore the address can be read.

 

In the next pass 

The nested when gives true and exposes x

Each of the staring null are skipped.

 

This is shown :

 

 

 

 

Now the locations 21 are addresses of location that have explicit addresses given. The iterator knows that explicitly addressed locations are handled by the first pass and so it drops this location.

 

The location 23 contains as yet unresolved members, x is available and so the two members j1 and k1 are added. Also c1 and d2 get added as x is available.

 

The current unresolved and resolved lists become :

 

 

Now since k1 is available g1 and h1 get resolved.

 

 

 

 

 

Once h1 is available c1 and d1 get added which empties the unresolved list and therefore no more iterations take place.

 

 

 

 

 

 

If the iteration procedure stops and there are still unresolved elements in the unresolved list then the structure is irresolvable. Such a case would have occurred in this test case if the condition for the variable x was false.

In true structures however this is rare.

 

This completes the brief description of the working of the Iterative resolution algorithm.