Designing ParaSail, a new programming language: Rev 3.8 release brings initial I/O library, import clauses

Thursday, January 10, 2013

Rev 3.8 release brings initial I/O library, import clauses

We are pleased to release alpha revision 3.8 of the ParaSail compiler and virtual machine, available at the same URL:

http://bit.ly/Mx9DRb

This release includes a number of new features, including support for the import clause, and a standard library now defining modules with names like PSL::Core::Boolean and PSL::Containers::Set (rather than simply Boolean or Set). There are some new I/O modules, including PSL::Core::IO, PSL::Core::File_Input_Stream, and PSL::Core::File_Output_Stream. These last two support simple textual file I/O. We will be providing additional libraries oriented toward a graphical user interface in an upcoming release. The reference manual has also been updated to cover some of the newer features. As usual, consult the release notes for more details on the 3.8 revision.

We will write a separate blog entry about the design of the import clause -- it brought out some interesting issues. Every ParaSail module has an implicit import of PSL::Core::* and PSL::Containers::* so the introduction of the import clause shouldn't cause any compatibility issues with existing code.

8 comments:

UnknownJanuary 18, 2013 at 6:03 PM
This comment has been removed by the author.
ReplyDelete
Replies
UnknownJanuary 18, 2013 at 6:05 PM
== Apologies: Some of the symbols in my previous post has been stripped away by the HTML sanitizer ===

Hi Tucker,

I came across your paper recently on Parasail and am very interested. I working on programming language design at Berkeley and have also recently become annoyed at the number of bugs that arise due to pointers.

I would like to know your opinion on the following design problem. Because of the move, copy, and swap semantics of Parasail, the user is limited to creating only tree-like data structures. How do you handle the use cases that are typically solved using parent pointers in other languages?

For example, it is reasonable for a "Company" object to contain a list of "Employee"s, but shouldn't it also be reasonable to query an "Employee" and ask him what company he works for?

As another example, we can think of "Directory"s as containing other "Directory"s and "File"s, but wouldn't it be reasonable to be able to ask a "File" for its path?

Both the above examples are typically solved with a parent pointer in other languages. I am not fond of this solution, but have not yet found a decent alternative and am interested in Parasail's solution.

Thank you very much
-Patrick
ReplyDelete
Replies
Tucker TaftJanuary 18, 2013 at 6:38 PM
The simplest solution is to adopt what is done in relational databases, and use "foreign" keys (unique id's) rather than pointers. In fact, it is often the case that with situations like "Companies" and "Employees," a given person might work for multiple companies, and clearly a company has multiple employees, so you end up representing these as separate tables, with sets of unique id's linking them together. That is, you don't store the employee directly in the company data structure, but rather you store the employee's social security #, or equivalent, and then you look that up in your person table.

With directories and files, you can often simply retain the full name in the in-memory directory/file "handle," so that you can always retrieve it for an open file. For systems like Unix that allow a file to be in multiple directories ("hard links") then you end up storing unique IDs again ("inode numbers" in Unix parlance) rather than files inside directories.

I thought about allowing some kind of "parent" reference, but it seemed like a relatively special case, and it added a lot of complexity. Variations on the above two approaches (using Unique IDs, or remembering the full "path" as part of the "handle" on an element of a tree) seemed more flexible, ultimately.
ReplyDelete
Replies
UnknownJanuary 18, 2013 at 7:37 PM
My experience with using the foreign key solution is that the logic does not seem so different from just using pointers. For example, I could just use a unique 64-bit integer as my foreign key for all my "Employee"s and "Company"s, and have one large table that associates keys with records. But the logic for manipulating this data structure will be as error-prone as the logic for manipulating a pointer-based structure. It will actually be more error prone, because now I have to be aware of the difference between a key and a record, I have to look up the key in the correct table, and I can no longer rely upon the garbage collector to clean up unreachable records anymore.

Do you have an example where refactoring a pointer-based implementation of a program into a foreign key implementation of a program results in substantially cleaner code?

Thank you very much for the quick reply.
-Patrick
ReplyDelete
Replies
UnknownJanuary 19, 2013 at 11:29 AM
Those explanations are very clear. Thank you. I particularly like the distinction between the objects that have independent existence and the objects that are "components". I had a similar but less well-formed thought when I was refactoring a while ago. I classified pointers as being one of two types, one type of pointer indicated "contains", and another type of pointer indicated "linked to". Child pointers would be an example of "contains" pointers, and parent pointers would be an example of "linked to". Using this classification, I made a strict rule for myself that functions may only mutate objects that are "contained" by its arguments, i.e. that only objects reachable by following "contains" pointers from the arguments are allowed to be mutated. When this guideline is not followed, I found that the scope of the effects of a function quickly become unintuitive.

Your explanation of the tables is very clear, and seems like the reasonable way to do things. Let me give some more details about my example to try and spot where I fell off the correct path.

I was programming a compiler, and designed a set of AST nodes for use by the compiler. The compilers would read in text, and convert it to AST nodes, and subsequent transformations will be done entirely with the AST nodes. Because of various quirks of the semantics of the language, the AST for a program is not actually a tree but a graph.

The first incarnation used pointers for everything. AST nodes have child pointers that point to child statements, and parent pointers that point to the containing statement. I did not follow the coding guideline that I specified in the first paragraph and the code became messy quickly.

After this, I noticed that I conceptually find tables much clearer than objects and pointers, so I refactored the code to represent AST nodes as records in a table, with foreign keys in place of pointers. Here are the problems that I ran into:

1) Originally, I had different tables for different types of AST nodes. But I noticed that there were quite a few statements in the language where the type would not be known. (i.e. The child could be a FunctionNode, or a VarNode, or a ClassNode!) And I was storing each of them in separate tables! In these circumstances using a foreign key was not enough, I also had to remember which table to look up the key in.

That was not very elegant, so I eventually decided to put all of the nodes into one table, and have a field that indicates what type of record it is.

2) Whereas in the pointer-based implementation I was manipulating objects directly, in the table based implementation I was manually responsible for keeping track of keys and records. Thus to get the name of a child statement, I would manually retrieve the child's record by looking up its key. This extra distinction between key and record led to an increase in LOC because of having to look up keys manually. I would also often forget to look up a key before accessing a field, leading to tedious bug hunting.

3) During program transformations in the original implementation, new nodes were created and existing nodes were modified. Thus, if by following the root node (e.g. the top level FunctionNode) some AST nodes were no longer reachable, they were automatically cleaned up by the garbage collector.
In the table based implementation, I was manually creating and assigning keys to records. Thus at the end of each transformation step, I was forced to manually do a traversal through the records in the table and clean up the ones that were no longer reachable.

That was my experience with attempting to refactor my pointer-based code. It did not go well, but I was using a dynamically-typed language. I can understand your point about using a statically-typed language to help many of the above errors.

I am keen to hear about which parts you would have done differently.

Thanks very much
-Patrick
ReplyDelete
Replies
UnknownJanuary 19, 2013 at 5:57 PM
You are completely correct. I had been using the VarNode, and FunctionNode objects to store information that would traditionally have been stored in a SymbolTable. Because some ASTNodes would operate on VarNodes and FunctionNodes, the resulting AST is a graph and not a tree.

The parent pointers did, as you predicted, add a lot of complexity to the implementation. I was using them for determining the enclosing scope of variables and functions which were used for some of the analysis stages. During program transformations, if I wasn't careful to update the parent pointers properly, the pointers would often become inconsistent.

One thing that does bother me is that I haven't determined exactly why parent pointers add so much complexity to the code. Initially, I thought that if I simply use proper OO methodology and hide the handling of parent pointers below a simple API then the complexity would be nicely contained, but this turned out not to be the case. Do you have any thoughts/experience with this?

I will follow your advice and try to refactor as per your suggestions. I am quite excited about how it might simplify my compiler. So thank you very much. To me it seems that these pointer issues, and data-structure design, affect software complexity much more than functional vs. imperative styles. I will be watching the development of Parasail with great interest.

Thank you very much again
-Patrick
ReplyDelete
Replies

Add comment

Designing ParaSail, a new programming language

Thursday, January 10, 2013

Rev 3.8 release brings initial I/O library, import clauses

8 comments:

Contributors

Blog Archive