Fast Dense-Sequence Algorithm
| |
|
This is an algorith I
cooked up for a Sequence object with no O(n) or worse functions.
Home
| |
Foreward
I worked out a data-structure that I've never heard of, below. It's similar in
some way to an AVL tree, but has a very different purpose. If anybody's heard
of it before, please write to me (address on my front page) and tell me about
it.
Update: Apparently (although I have not confirmed this) Knuth described an algorithm
very similar, but not quite identical, to this one.
Update May 2009: It has been pointed out somewhere that this is equivalent to
Algorithm C in Knuth's "Art of Computer Programming", section 6.2.3
"Balanced Trees" in volume 3 (Sorting and Searching). Also, this algorithm can
be straightforwardly translated into a B-tree form, which is very useful for
certain types of tasks.
Introduction
A dense sequence, or dense array, is a fundamental data structure. It acts
as a variable-size array (which it may or may not be in practice), and is
indexed by an integral key, or index. Each entry has a key contiguous to the
entries on either side of it.
This document describes a particularly efficient method for dealing with such objects.
The structure generally looks like this:

This structure has the following functions to access it, in general terms:
- insert(index, obj)
Insert an object obj in the array at the location given by
index.
- remove(index)
Removes the object at index and returns the value.
- at(index)
Returns the object at index.
There are more, but most can be implemented using the above three (and maybe a
size() routine, which is always trivial).
Array Representation
The standard representation of an array is simply as a system array. That is,
the index is used as an offset (multiplied by some constant) into a contiguous
block of memory. This has advantages and disadvantages. Let's consider the
standard complexities of this representation.
A variable-size array (assuming no reallocation penalty, which is also not true)
has the following complexities:
- insert(index, obj) - O(n)
- remove(index) - O(n)
- at(index) - O(1)
Simply put, this means that inserting or removing an object takes longer
proportional to the objects inserted, although picking an object out takes
only constant time. This imposes a noticable penalty on the system.
An astute reader will notice that I gloss over the linked-list implementation.
Linked lists have O(n) complexity on all three of these functions, but
generally have another set of functions for manipulating objects based on
how they relate to other objects. This set of functions operate in O(1) time,
but since these are not related to indexes, I'm going to gloss over them for
the time being.
The Problem
Trees are commonly used for the so-called "sparse array", where each object is
identified by a unique integer key, but the keys are "sparse", meaning they do
not occupy a contiguous range of integer values. Because of this, many
implementations using balanced trees, such as AVL and Red-Black trees, exist
to tackle this problem already.
The issue with a dense array is the reordering problem. Specifically, if you
insert a value at one index, all indexes after it shift up by one. This is,
of course, O(n) in the case of the simple system array. This does not
generally exist in sparse arrays, so it is not an issue in other tree
implementations.
Since all operations on a system array must move considerable amounts of the
array through memory, the performance is limited to O(n).
However, a structure exists which performs all operations in the following
complexities:
- insert(index, obj) - O(log n)
- remove(index) - O(log n)
- at(index) - O(log n)
Note that this is actually slightly slower in the at() function, although
the limits of physical hardware will make this difference matter very little.
Generally speaking, O(log n) complexity is a very efficient algorithm, and
an array with an O(log n) insertion routine will be able to do the above
reordering on a 4Gb array in a mere 32 steps. Contrast to the 4Gb data
move that would be required in the worst case (2Gb in the best case) by
the system array implementation.
The Fast Array
The solution is what I call simply "The Fast Array". It is obviously a tree
structure, but the reordering problem is eliminated by making the structure
itself perform the ordering.
The fast array is defined as:
- A leaf node, with an associated data object.
- An internal node, which must have both left and right children, which must
themselves be a fast array, and must also contain a count of the number of
leaf nodes of the left subtree.
This looks like:

The numbers across the bottom are the index values associated with that node,
and the numbers in the circles are the number of left leaves below that node.
Lookups
Lookups are done by comparing the lookup value to the node, branching left
if the lookup value is less, or subtracting the node value and branching right
if it is greater than or equal to. For instance, looking up index 2
in the example results in first visiting the parent node. Since the 2
in the node is greater than or equal to 2, the right branch is taken and the
search index becomes 0 (2-2=0). On the next branch, 0 is less than 1, so the
left branch is taken. This arrives at box "2".

Note that the numbers assigned to the boxes are not stored in the boxes. There
is no need for the leaf nodes to keep track of their index, in fact that would
defeat the purpose since then they would need reordering, leading back to the
old O(n) problem above. Instead, their position in the tree designates their
index number.
Insertions
Insertions are performed by searching down the tree as in the search,
incrementing a count whenever a left branch is taken, but not changing it
on a right branch. When a leaf-node is reached, it is replaced with a new
internal node with a count field of 1, with the original leaf and the new
leaf in whichever order was required (new before original for "insert before"
calls, original before new for "insert after" calls).

Notice that, in the example, the last node was after the insertion. It can
easily be shown that it was automatically renumbered to "4", at the same time
the original node "2" became "3". Notice though that the choice of which of
the two nodes on the new internal node is the new one is arbitrary. The
original node "2" could still be "2" with a new node "3" after it, as this
is totally implementation-defined.
The final step of insertion is rebalancing, which will be discussed below.
Deletions
Deletions are essentially the mirror-image of insertions. The tree is
searched for the node, with all left branches decrementing the count, and
all right branches not changing it. When the associated leaf is found, the
leaf is removed, the parent is removed, and the sibling is attached to the
parent's parent. The counts have already been fixed, so all that remains is
the general-form rebalance.
Rebalance
Rebalancing is started at the root node. When the numeric value of the root
is greater than half the number of leaves, then roll the tree left. When the
numeric value of the root is less than one quarter the number of leaves, roll
the tree right (I think this function will keep the tree from oscillating,
although I really ought to prove it, or at least test it).
Almost forgot to mention, if you happen to roll the tree, you will have to
descend the children and do the same thing. The number of leaves in a subtree
can easily be found from the number of leaves in the tree itself minus the
numeric value of the tree for the right node, the numeric value of the tree
for the left node. Simplest to do is descend both children in this way, since
only one will actually need to be descended. This could probably be optimized
but really should not need to.
When rolling a tree to the left, add the number in the old parent to the number
in the new parent. When rolling a tree to the right, subtract the number
in the new parent from the number in the old parent. These simple rules will
keep the integrity of the structures supporting the tree.
This page was written on July 18, 2001 and the sole property and
Copyright (c) 2001 of Jennifer E. Elaan
|
All material on these pages is Copyright (c) Jennifer E. Elaan.
|
|