Thursday, August 30, 2012

Example of doubly linked list in ParaSail

A question was posted asking how you would create in a pointer-free language like ParaSail a doubly-linked list, with constant time addition on either end of the list, and constant time removal anywhere within the list.  One straightforward way to do this is to use a Vector to store the nodes, and then use indices into the vector to create the doubly linked list. Below is such an implementation.


// Illustration of a doubly-linked list implemented in ParaSail

// Copyright (C) 2012, AdaCore, New York, NY
// To be used only for Personal, Academic, or Evaluation Purposes;
// Not for Commercial Production Use.
// Report errors at http://groups.google.com/group/parasail-programming-language

// There are two modules here, plus two test programs.
// The first module, Doubly_Linked_Index, provides a doubly-linked
// list of indices, the indices being usable with another vector
// where the actual data can be stored.

// The second module, Doubly_Linked_List, uses the Doubly_Linked_Index
// to provide the "Elem_Id" type, which is used to identify nodes in the
// linked list.  It also has a formal type "Element_Type" which is the
// type of the actual data to be stored in the linked list.

interface Doubly_Linked_Index<Max_Len : Univ_Integer := 100_000> is
  // Doubly-linked list acting as index into a separate vector of data
    type Elem_Id is new Integer<1..Max_Len>;
    op "[]"() -> Doubly_Linked_Index;
      // Create an empty linked list

    func Append(var DLI : Doubly_Linked_Index) -> Elem_Id;
      // Add a new node onto end of linked list

    func Prepend(var DLI : Doubly_Linked_Index) -> Elem_Id;
      // Add a new node at beginning of linked list

    func Insert_Before(var DLI : Doubly_Linked_Index; Elem_Id) 
      -> optional Elem_Id;
      // Insert new node before given element Id

    func Insert_After(var DLI : Doubly_Linked_Index; Elem_Id) 
      -> optional Elem_Id;
      // Insert new node after given element Id

    func Remove(var DLI : Doubly_Linked_Index; Elem_Id);
      // Remove specified element from linked list (if present)

    op "in"(Elem_Id; DLI : Doubly_Linked_Index) -> Boolean;
      // Return True if given element is in linked list

    func Remove_First(var DLI : Doubly_Linked_Index) -> optional Elem_Id;
      // Remove first element from linked list

    func Remove_Last(var DLI : Doubly_Linked_Index) -> optional Elem_Id;
      // Remove last element from linked list

    func Remove_Any(var DLI : Doubly_Linked_Index) -> optional Elem_Id;
      // Remove arbitrary element from linked list

    func Count(DLI : Doubly_Linked_Index) -> Univ_Integer;
      // Return count of elements in linked list

    func Max_Id_In_Use(DLI : Doubly_Linked_Index) -> optional Elem_Id;
      // Return max Elem_Id in use
end interface Doubly_Linked_Index;


interface Doubly_Linked_List <Element_Type is Assignable<>; Max_Len : Univ_Integer := 100_000> is // Doubly-linked list type DLI is new Doubly_Linked_Index<Max_Len>; type Elem_Id is DLI::Elem_Id; op "[]"() -> Doubly_Linked_List; // Create an empty linked list func Append(var DLL : Doubly_Linked_List; Elem : Element_Type) -> Elem_Id; // Add a new element onto end of linked list func Prepend(var DLL : Doubly_Linked_List; Elem : Element_Type) -> Elem_Id; // Add a new element at beginning of linked list func Insert_Before (var DLL : Doubly_Linked_List; Elem : Element_Type; Elem_Id) -> optional Elem_Id; // Insert new element before given element Id func Insert_After (var DLL : Doubly_Linked_List; Elem : Element_Type; Elem_Id) -> optional Elem_Id; // Insert new element after given element Id func Remove(var DLL : Doubly_Linked_List; Elem_Id); // Remove specified element from linked list (if present) op "in"(Elem_Id; DLL : Doubly_Linked_List) -> Boolean; // Return True if given element is in linked list func Remove_First(var DLL : Doubly_Linked_List) -> optional Element_Type; // Remove first element from linked list func Remove_Last(var DLL : Doubly_Linked_List) -> optional Element_Type; // Remove last element from linked list func Remove_Any(var DLL : Doubly_Linked_List) -> optional Element_Type; // Remove arbitrary element from linked list func Count(DLL : Doubly_Linked_List) -> Univ_Integer; // Return count of elements in linked list func Max_Id_In_Use(DLL : Doubly_Linked_List) -> optional Elem_Id; // Return max Elem_Id in use op "indexing"(ref DLL : Doubly_Linked_List; Elem_Id) -> ref optional Element_Type; // Return a reference to specified element end interface Doubly_Linked_List;
// Here are the implementations of the above two modules class Doubly_Linked_Index is interface Node<> is // Each node on list has next and prev Id var Next : Elem_Id; var Prev : Elem_Id; end interface Node; var Links : ZVector<Node>; // zero-th element is header var First_Free : Elem_Id; var Num_Free : Univ_Integer; func Add_Node(var DLI : Doubly_Linked_Index; Next, Prev : Elem_Id) -> New_Id : Elem_Id is // Add a node with given Next/Prev fields to Links vector var New_Node for DLI := Node::(Next => Next, Prev => Prev); if DLI.First_Free != 0 then // Reuse a free node` New_Id := DLI.First_Free; DLI.First_Free := DLI.Links[New_Id].Next; DLI.Num_Free -= 1; DLI.Links[New_Id] <== New_Node; else // Add a new node at end of Links vector New_Id := Length(DLI.Links); DLI.Links <|= New_Node; end if; DLI.Links[Prev].Next := New_Id; DLI.Links[Next].Prev := New_Id; return New_Id; end func Add_Node; // {Num_Free < Length(Links); First_Free in 0..<Length(Links)} // {(for all N of Links => N.Next in 0..<Length(Links) and then // N.Prev in 0..<Length(Links)} exports op "[]"() -> Doubly_Linked_Index is return (Links => [(Next => 0, Prev => 0)], First_Free => 0, Num_Free => 0); end op "[]"; func Append(var DLI : Doubly_Linked_Index) -> New_Id : Elem_Id is // Add a new node onto end of linked list return Add_Node(DLI, Next => 0, Prev => DLI.Links[0].Prev); end func Append; func Prepend(var DLI : Doubly_Linked_Index) -> Elem_Id is // Add a new node at beginning of linked list return Add_Node(DLI, Next => DLI.Links[0].Next, Prev => 0); end func Prepend; func Insert_Before(var DLI : Doubly_Linked_Index; Follower : Elem_Id) -> New_Id : optional Elem_Id is // Insert new node before given element Id if Follower in DLI then return Add_Node(DLI, Next => Follower, Prev => DLI.Links[Follower].Prev); else // Follower not in linked list return null; end if; end func Insert_Before; func Insert_After(var DLI : Doubly_Linked_Index; Elem_Id) -> optional Elem_Id is // Insert new node after given element Id if Elem_Id in DLI then return Add_Node(DLI, Next => DLI.Links[Elem_Id].Next, Prev => Elem_Id); else // Elem_Id not in linked list return null; end if; end func Insert_After; op "in"(Elem_Id; DLI : Doubly_Linked_Index) -> Boolean is // Return #true if given element is in linked list // NOTE: All elements on free list have Prev pointing to themselves return Elem_Id in 1..<Length(DLI.Links) and then DLI.Links[Elem_Id].Prev != Elem_Id; end op "in"; func Remove(var DLI : Doubly_Linked_Index; Elem_Id) is // Remove specified element from linked list (if present) if Elem_Id in DLI then // Not on the free list, so OK to remove ref Elem => DLI.Links[Elem_Id]; DLI.Links[Elem.Prev].Next := Elem.Next; DLI.Links[Elem.Next].Prev := Elem.Prev; // Mark as being on the free list Elem.Prev := Elem_Id; // Add to the free list Elem.Next := DLI.First_Free; DLI.First_Free := Elem_Id; DLI.Num_Free += 1; end if; end func Remove; func Remove_First(var DLI : Doubly_Linked_Index) -> optional Elem_Id is // Remove first element from linked list const First := DLI.Links[0].Next; if First == 0 then // List is empty return null; else Remove(DLI, First); return First; end if; end func Remove_First; func Remove_Last(var DLI : Doubly_Linked_Index) -> optional Elem_Id is // Remove last element from linked list const Last := DLI.Links[0].Prev; if Last == 0 then // List is empty return null; else Remove(DLI, Last); return Last; end if; end func Remove_Last; func Remove_Any(var DLI : Doubly_Linked_Index) -> optional Elem_Id is // Remove arbitrary element from linked list if Count(DLI) mod 2 == 0 then // Remove First if Count is odd, Remove Last if Count is even return Remove_Last(DLI); else return Remove_First(DLI); end if; end func Remove_Any; func Count(DLI : Doubly_Linked_Index) -> Univ_Integer is // Return count of elements in linked list return Length(DLI.Links) - DLI.Num_Free - 1; end func Count; func Max_Id_In_Use(DLI : Doubly_Linked_Index) -> optional Elem_Id is // Return max Elem_Id in use return Length(DLI.Links) - 1; end func Max_Id_In_Use; end class Doubly_Linked_Index;
class Doubly_Linked_List is var Index : DLI; var Data : ZVector<optional Element_Type>; // zeroth element is used to represent a missing element exports op "[]"() -> Doubly_Linked_List is // Create an empty linked list return (Index => [], Data => [null]); end op "[]"; func Append(var DLL : Doubly_Linked_List; Elem : Element_Type) -> Result : Elem_Id is // Add a new element onto end of linked list Result := Append(DLL.Index); DLL.Data[Result] := Elem; end func Append; func Prepend(var DLL : Doubly_Linked_List; Elem : Element_Type) -> Result : Elem_Id is // Add a new element at beginning of linked list Result := Prepend(DLL.Index); DLL.Data[Result] := Elem; end func Prepend; func Insert_Before (var DLL : Doubly_Linked_List; Elem : Element_Type; Elem_Id) -> Result : optional Elem_Id is // Insert new element before given element Id Result := Insert_Before(DLL.Index, Elem_Id); if Result not null then DLL.Data[Result] := Elem; end if; end func Insert_Before; func Insert_After (var DLL : Doubly_Linked_List; Elem : Element_Type; Elem_Id) -> Result : optional Elem_Id is // Insert new element after given element Id Result := Insert_After(DLL.Index, Elem_Id); if Result not null then DLL.Data[Result] := Elem; end if; end func Insert_After; func Remove(var DLL : Doubly_Linked_List; Elem_Id) is // Remove specified element from linked list (if present) Remove(DLL.Index, Elem_Id); end func Remove; op "in"(Elem_Id; DLL : Doubly_Linked_List) -> Boolean is // Return True if given element is in linked list return Elem_Id in DLL.Index; end op "in"; func Remove_First(var DLL : Doubly_Linked_List) -> Result : optional Element_Type is // Remove first element from linked list const Id := Remove_First(DLL.Index); if Id not null then // Remove element from Data vector Result <== DLL.Data[Id]; else // List is empty Result := null; end if; end func Remove_First; func Remove_Last(var DLL : Doubly_Linked_List) -> Result : optional Element_Type is // Remove last element from linked list const Id := Remove_Last(DLL.Index); if Id not null then // Remove element from Data vector Result <== DLL.Data[Id]; else // List is empty Result := null; end if; end func Remove_Last; func Remove_Any(var DLL : Doubly_Linked_List) -> Result : optional Element_Type is // Remove arbitrary element from linked list const Id := Remove_Any(DLL.Index); if Id not null then // Remove element from Data vector Result <== DLL.Data[Id]; else // List is empty Result := null; end if; end func Remove_Any; func Count(DLL : Doubly_Linked_List) -> Univ_Integer is // Return count of elements in linked list return Count(DLL.Index); end func Count; func Max_Id_In_Use(DLL : Doubly_Linked_List) -> optional Elem_Id is // Return max Elem_Id in use return Max_Id_In_Use(DLL.Index); end func Max_Id_In_Use; op "indexing"(ref DLL : Doubly_Linked_List; Elem_Id) -> ref optional Element_Type is // Return a reference to specified element if Elem_Id in DLL.Index and then Elem_Id in 1..<Length(DLL.Data) then return DLL.Data[Elem_Id]; else // return reference to null value stored in zeroth element return DLL.Data[0]; end if; end op "indexing"; end class Doubly_Linked_List;
// Two test programs, one for the Doubly_Linked_Index module // and one for the Doubly_Linked_List module. func Test_DLI() is var DLI : Doubly_Linked_Index := []; Println("Append three times"); const Id1 := Append(DLI); const Id2 := Append(DLI); const Id3 := Append(DLI); for Id in DLI forward loop Println("Found Elem_Id = " | Id); end loop; Println("Prepend twice"); const Id4 := Prepend(DLI); const Id5 := Prepend(DLI); Println("Count = " | Count(DLI)); for Id in DLI forward loop Println("Found Elem_Id = " | Id); end loop; Println("Insert before original second append"); const Id6 := Insert_Before(DLI, Id2); Println("Count = " | Count(DLI)); for Id in DLI forward loop Println("Found Elem_Id = " | Id); end loop; Println("Remove " | Id1 | " and " | Id4); Remove(DLI, Id1); Remove(DLI, Id4); Println("Count = " | Count(DLI)); for Id in DLI forward loop Println("Found Elem_Id = " | Id); end loop; Println("Reverse loop"); for Id in DLI reverse loop Println("Found Elem_Id = " | Id); end loop; Println("Append one and prepend two"); const Id7 := Append(DLI); const Id8 := Prepend(DLI); const Id9 := Prepend(DLI); Println("Count = " | Count(DLI)); for Id in DLI forward loop Println("Found Elem_Id = " | Id); end loop; Println("Reverse loop"); for Id in DLI reverse loop Println("Found Elem_Id = " | Id); end loop; Println("Unordered loop"); for Id in DLI loop Println("Found Elem_Id = " | Id); end loop; end func Test_DLI;
func Test_DLL() is var DLL : Doubly_Linked_List<Univ_Enumeration> := []; Println("Append three times"); const Id1 := Append(DLL, #one); const Id2 := Append(DLL, #two); const Id3 := Append(DLL, #three); Println("Count = " | Count(DLL)); Println("Item 2 = " | DLL[Id2]); Println("Item 3 = " | DLL[Id3]); var I := 0; for Elem in DLL forward loop Println("Found Elem = " | Elem); I += 1; if I > 5 then exit loop; end if; end loop; Println("Prepend twice"); const Id4 := Prepend(DLL, #four); const Id5 := Prepend(DLL, #five); Println("Count = " | Count(DLL)); for Elem in DLL forward loop Println("Found Elem = " | Elem); end loop; Println("Insert before original second append"); const Id6 := Insert_Before(DLL, #six, Id2); Println("Count = " | Count(DLL)); for Elem in DLL forward loop Println("Found Elem = " | Elem); end loop; Println("Remove " | Id1 | " and " | Id4); Remove(DLL, Id1); Remove(DLL, Id4); Println("Count = " | Count(DLL)); for Elem in DLL forward loop Println("Found Elem = " | Elem); end loop; Println("Reverse loop"); for Elem in DLL reverse loop Println("Found Elem = " | Elem); end loop; Println("Append one and prepend two"); const Id7 := Append(DLL, #seven); const Id8 := Prepend(DLL, #eight); const Id9 := Prepend(DLL, #nine); Println("Count = " | Count(DLL)); for Elem in DLL forward loop Println("Found Elem = " | Elem); end loop; Println("Reverse loop"); for Elem in DLL reverse loop Println("Found Elem = " | Elem); end loop; Println("Unordered loop"); for Elem in DLL loop Println("Found Elem = " | Elem); end loop; end func Test_DLL;

9 comments:

  1. why can't you use only a single vector that contains Next, Prev and Data?

    ReplyDelete
    Replies
    1. You absolutely could. I wrote it as two separate modules to provide a bit of separation of concerns, but I'm not convinced it was worth it!

      Delete
  2. on a more general note. obviously you can use vectors to implement any "circular" data structure. however this does not seem to be a good idea.

    what would happen if you had a bug in the implementation of the linked list inside a vector? the program, of course, would not crash. but it might read invalid data from the vector or overwrite valid data in the vector with invalid data.

    at this point one has to remember that "crashes" (run-time checks) were invented in order to prevent exactly this kind of thing from happening. debugging a "crash" is usually much easier than wondering why a program returns invalid data or/and enters an infinite loop (which is what usually happens when there is a bug in a system without run-time checks).

    so, this doesn't fit very well in a massively parallel language does it? would you have any better idea on circular structures? maybe come up with annotations that make sure there aren't any bugs in the implementation of a circular structure? or something all new, such as reference counting?

    ReplyDelete
    Replies
    1. You seem to be suggesting that using pointers provides an easier-to-debug situation than using indices. I am surprised by that. Run-time checks are not much use with detecting "stray" pointers, or dangling references. They can detect null pointers, but certainly array indexing can detect use of zero or out-of-range indices more easily.

      In any case, I would be interested in seeing a case where a solution using pointers would be easier to debug. I can accept that in some cases pointers are more efficient than array indices, though compilers these days are getting pretty efficient at array indexing. Also, as we move into a 64-bit world, array indices can remain 32 bits, or even 16 bits, while pointers will generally have to move to being 64 bits. Finally, using arrays rather than pointers into a heap tends to provide better locality of reference, which can be more important than any potential savings which might be gained using pointer dereference rather than array indexing.

      Delete
  3. well i don't think you got what i meant, so here's an example. all indices are always in range of inside the vector, of course, so no crash would ever happen. now, suppose you have a bug such as you free a type member but forget to delete the index that "points" to it. the index is still in range for this vector, but now points to a vector member that contains invalid data (the same vector member can later be allocated again for another member of your type). but the old index still points to it and through the old index, a function could overwrite the new member with some old data. this kind of stuff would then continue undetected for a long (running) time. and with no memory debugger, you have no chance to catch the bug at the moment when it happened, such as you could if you used pointers; it would only show itself much later, when the program does something really unpredictable because of the invalid data it was fed with a while back.

    so with this solution, instead of "stray" pointers, you'd have "stray" indices, which result in "index allocation" bugs instead of "memory allocation" bugs. these would be in essence the same, but limited to inside a single vector. all the known kinds of such bugs can happen (double free, etc.)

    so yes, i think using pointers would be a little bit easier to debug than indices, since there are tools for this (malloc debuggers). i'm pretty sure that if you used an entirely vector-based allocation for all objects, those bugs would appear as often as memory allocation bugs would appear in a similar program that used pointers instead of indices. luckily, you only do this for circular objects, which are supposed to be rare. but i'd still say that for circular objects, it's not a good solution (just as it wouldn't be a good solution for all objects). so the question then is, is it possible to come up with something better, somehow?

    ReplyDelete
    Replies
    1. If you look at how this is done above, when you Remove an element from the list, it is actually set to null. The indices are recycled, but the actual storage is not. The "indexing" operation is set up so that it returns a "null" if the index refers to a deleted element. So I believe you get the general safety of arrays, while also being able to detect misuse of indices.

      Delete
  4. yes you're right, i missed this part! indeed it is a lot safer this way. the price you pay for this, of course, is an endlessly expanding element vector. i'd call this a memory leak though.

    the idea that i had - but i don't know if it's possible to do - was to write a class invariant annotation that makes sure that the free list is consistent (does not point to anything that's pointed to by something else, and it points to everything that nothing else points to). but i don't know if the compiler could check this? i'm not really familiar with Hoare logic. if this kind of thing could work for array indices, i'd think it would work for pointers, too.

    ReplyDelete
    Replies
    1. Actually, the elements are available for reuse, but while they are not in use, any reference to them is detected.

      You are right that writing an invariant is the best way to go. This can presumably be done with pointers or with indices.

      A key advantage of indices is the ability to support divide-and-conquer without any danger of collision, because of the easy way to use the indices themselves as part of the division criteria (e.g. only process indices < or > than some dividing line). The others I have also mentioned, such as the better locality of reference of an array vs. data spread over a heap, and the size of indices vs. the size of pointers.

      Delete
    2. Another thing you can do with indices is devote some number of bits of the index to "check" bits. For example, you could use the low order 8 bits as check bits, and divide by 2^8 before using the index as a vector index. Before using the index, you could verify that the check bits match some check bits stored in each element. When an element is freed, you would always increment (mod 2^8) the value of the check bits.

      You can do this with pointers as well, but of course then your "pointers" become a pair [pointer, check-bits], which can be more painful to use. Since this only works if you never deallocate the memory, you are effectively back to the vector-like solution, except that your elements are scattered around in memory rather than being collected into a single region.

      Delete