Algorithms and Data Structures

From GPWiki
Jump to: navigation, search

What is an Algorithm?

An Algorithm itself can be described as a solution to a problem by following a set of steps. Other algorithm-based solutions have such names as:

  • Recipies, e.g. how to bake a cake.
  • Directions for traveling from point A to point B.
  • Tutorial, how to do something.

A computer system is itself a giant algorithm that follows a straight line following executions in such a fast rate that its not experienced as an algorithm.

Pseudocode can (and should) be used to describe that particular solution. Pseudocode is simply human readable executions for the algorithm.

Growth of Functions

Asymptotic Notation

When we have an asymptotic upper bound of a function we use O-notation. The definition is

O(g(n)) = \{f(n) | \exist c>0,  \exist n_0>0,  \forall n \geq n_0: 0 \leq f(n) \leq c \cdot g(n) \}


Thus, to prove that  f(n) \in O(g(n)) one must find constants  c and  n_0 such that

f(n) \leq c \cdot g(n) \qquad \forall n \geq n_0


Examples:

 \log 8n \in O(n^3) \mbox{ because } \log 8n \leq c \cdot n^3 \mbox{ for }c=3 \mbox{ and }n \geq n_0=1

 n^7 \in O(2^n) \mbox{ because } n^7 \leq c \cdot 2^n \mbox{ for }c=1 \mbox{ and }n \geq n_0=37

 n^5 \cdot \log n \in O \left ( \frac{1}{n} \cdot n! \right ) \mbox{ because } n^5 \cdot \log n \leq c \cdot \frac{1}{n} \cdot n! \mbox{ for }c=42 \mbox{ and }n \geq n_0=42


The relation between the sizes of asymptotic functions is

c < \log(n) < \log(n)^c < n^k < n < n^c < c^n < n!

where c is a constant greater than 1, k is constant smaller than 1 and n is the input size.


Examples:

 \log(8n) < n^3

 n^7 < 2^n

 \log(n) \cdot n^5 < (\frac{1}{n}) \cdot n!


In any asymtotic function, only the most important part of the function is examined. The constants and less important parts are completely disregarded.

For instance, the functions

O(2n! \cdot \log(3n^3) + \frac{6^n}{4})

belongs to

O(n! \cdot \log n)

It is important to note that the bounding functions are just that - bounding. Unless the function is a specifically tight bound, it will generally not matter how tight the bound is.

For instance, this means that n is O(n!), as well as O(n). It is generally prefferable to use bounds that are as tight as possible.


Broadly speaking, the asymptotic notations correspond to the following comparison:

Notation Corresponding notation
f(n) = O(g(n)) f(n) \leq g(n)
f(n) = \Omega(g(n)) f(n) \geq g(n)
f(n) = \Theta(g(n)) f(n) = g(n)
f(n) = o(g(n)) f(n) < g(n)
f(n) = \omega(g(n)) f(n) > g(n)


External information:


Asymptotic Comparisons

(...)


Asymptotic Notations For Loops

(...)


Heapsort

Heapsort is a sorting algorithm that relies on the heap structure. The idea is that it is possible to easily store a binary tree structure in a one-dimensional array.

Binary tree

A binary tree is a tree where each node has 0, 1 or 2 children. The node with no parent is called the root of the tree, and the nodes with no children are called the leaves of the tree.


Navigating the tree:

If the first element of the array is referred to as element 0, then, as a general rule, the children c1 and c2 of a parent node p contained within the array can be found by the formulae:


element#(of-Child-1)=(element#(of-parent))*2 + 1

element#(of-Child-2)=(element#(of-parent))*2 + 2


Where element#(x) is the element number of the item x contained in the array.

It is crucial to understand how a computer navigates a binary tree if one is to write tree-based algorithms. Heapsort is a tree based algorithm.

Heaps

The concept at work behind heaps is that a clear relationship exists between a parent node and the children nodes that belong to it. Heaps are either maxheaps or minheaps - that is, either the value the children is to be sorted by is smaller than that of their parent(maxheap), or it is larger than that of their parent(minheap). A minheap has the lowest number as it's root, maxheap has the largest number as it's root.

The heap data structure must be build and maintained when you are given a data set, if you are to run heap-structure-based algorithms on it. The most interesting algorithm, to us, is the heapsort algorithm, as most of the other algorithms only serve to give us access to basic functions such as insertion and deletion, and to make sure the heap is maintained upon completing insertion or deletion.

Basic maintenence:

There are two maintenence operations: bubble up, and bubble down.

Bubble up is a reccursive method that is run on a child-node which could have an illegal property as compared to it's parent node. It will examine the child and the parent, and if the property is wrong (eg: if for a maxheap, the child is larger than the parent, or if for a minheap, the child is smaller than the parent), it will swap the child and parent, and then call the same procedure on the parent, and otherwise do nothing.

Bubble down is the opposite, but with one additional thing to take into account; bubble up only has to compare with the one parent, but bubble down has to compare with both children, to ensure that the child that becomes the new parent does not break the property between itself and it's former sibling. This is taken care of, in practice, by always swapping the parrent with the largest child, if the algorithm is implemented in a maxheap, and always with the smallest child if it's implemented in the minheap.

Heapsort

The principle advantage to heapsort are that it is not expensive to build storagevise, and that it always sorts in (n log(n)) time. It typically remains slower than quicksort and mergesort in the average case, though. While quicksort is often faster, some inputs can make quicksort O(n*n), so heapsort is often used in real-time systems where having it take nearly the exact same time every time for n inputs is more important than decreasing the average case sort time.

Quicksort

Worst-case: \Theta(n^2)

Best-case: O(n \cdot lg(n))

Expected time is the best-case time.


Merge Sort

Expected time: O(n \cdot lg(n))

Sorting in Linear Time

Algorithm Time Worst-case Best-case Expected In-place Stable Recursive
Counting sort \Theta(k + n)  ?  ? \Theta(k + n) (because k = O(n)) No Yes No
Radix sort \Theta(d(k + n))  ?  ?  ? No* Yes No*
Bucket sort  ?  ?  ? \Theta(n) No* Yes* No*

where

k is the maximum domain for n

b is the number of digits

* means that it depends on the internal sorting algorithm used

Hash Tables

See Hash Table for an expanded article.

Hash tables work by indexing the given value by finding a "key" or hash value, and then addressing it at the position. Most often, you use mod (modulo) of the height (which is defined "m") of the hash table to figure out where this key should be positioned as this will never allow the value to exceed the hash table. Example:

If the length of the hash table is 17, the positions vary from 0 to 16. The function to decern the position would, normally, be h(k) = k mod 17. This example only works for integers, of course.

Hash Functions

Functions for finding a hash adress based upon a specific key vary a lot. The basic idea is that you want a function which provides a good spread of hashes across the set of keys you want to adress.

A good spread is generally attained by using a modulus function, because the modulus function limits the maximum value of the output hash number, but several other things may factor into deciding which hash function to use. If you know that the set of keys you are operating with conform to certain general tendencies, you may want to design a hash function which suits your needs. If, for instance, all your keys are numerically even numbers, it is easy to accept that the hash function ((n/2) mod p)(where p is a prime number of a suitable size) gives a spread where the set is distributed tighter than the function (n mod p).

Similarly, the prime number you choose to mod against can be chosen differently depending upon how efficient you want the hash table to be in various areas, primarily the time of the insertions, deletes, and lookups, contra the size of the data structure.

Designing excellent hashing functions is hard to do for an inexperienced or even moderately seasoned programmer without some degree of practical testing, but designing an acceptable hashing function is luckily far easier, and can, in practice, be undertaken if one is aware of some generally applicaple guidelines. Going into detail with such guidelines seems to be beyond the scope of this exam, however.

Open Addressing

Linear Probing

The idea of Linear probing is to simply move the hash table to the next value, so if value 5 was taken, we add i (which has increased to 1), and if that spots taken, we increase i to 2, and so on.

h(k,i) = (h'(k) + i) \mod m

where

i = 0, 1, ..., m - 1

k is the key

Quadratic Probing

Same principle as Linear Probing, but this time, we incorporate a quadratic function to avoid having a long line around one point. For instance; 6 is taken, 7 as well as 8, 9 and 10, Linear Probing would take quite long in the long run for more values, where as quadratic functions make a bigger "hole", so to speak.
h(k,i) = (h'(k) + c_1 \cdot i + c_2 \cdot i^2) \mod m

where

i = 0, 1, ..., m - 1

k is the key

c_1 and c_2 are auxiliary constants

Double Hashing

h(k,i) = (h_1(k) + i \cdot i + h_2(k)) \mod m

where

i = 0, 1, ..., m - 1

k is the key.

h_1 and h_2 are auxiliary hash functions, ie.

h_1(k) = k \mod m

h_2(k) = 1 + (k \mod m')

Binary Search Trees

Properties:

  • For each node, y, in the left subtree of each node, x, it holds that key[y] \leq key[x]
  • For each node, y, in the right subtree of each node, x, it holds that key[x] \leq key[y]

(...)


Transveral

Pre-order (prefix) traversal

pre-order(node)
    print node.value
    if node.left  != null then visit(node.left)
    if node.right != null then visit(node.right)

This recursive algorithm prints the values in the tree in pre-order. In pre-order, each node is visited before any of its children. Similarly, if the print statement were last, each node would be visited after all of its children, and the values would be printed in post-order. In both cases, values in the left subtree are printed before values in the right subtree.

Post-order (postfix) traversal

visit(node)
    if node.left  != null then visit(node.left)
    if node.right != null then visit(node.right)
    print node.value

In-order (infix) traversal

in-order(node)
    if node.left  != null then visit(node.left)
    print node.value
    if node.right != null then visit(node.right)


Example:

Binary_tree_%28letters%29.png

  • Preorder (NLR) traversal yields: A, H, G, I, F, E, B, C, D
  • Postorder (LRN) traversal yields: G, F, E, I, H, D, C, B, A
  • In-order (LNR) traversal yields: G, H, F, I, E, A, B, D, C
  • Level-order traversal yields: A, H, B, G, I, C, F, E, D


External information:


Red-Black Trees

Properties:

  1. A node is either red or black.
  2. The root is black.
  3. All leaves are black.
  4. Both children of every red node are black.
  5. All paths from any given node to its leaf nodes contain the same number of black nodes.


All operations on red-black trees takes O(lg(n)) time.

(...)


External information:


Disjoint Sets

The max height of an n-sized union made with union-by-size has height: O(lg(n))

(...)


Transition Systems

(...)


Invariants

Must hold true at

  • Initialization
  • Maintenance: Before and after each loop
  • Termination