CSI 801

Lecture #2

Algorithms and Data Structures

September 5, 1996

John Wallin

"I'm sorry, Dave. I'm afraid I can't do that."
HAL, 2001: A Space Odyssey



Preliminaries

group assignments

Is everyone assigned to a group?

Is everyone registered for the class?

Introductions

Who are you?


A Short Review

Computers work using digital circuits.

Electronic Switches

to

Digital Logic

to

CPU

----------------------

Computers can only do about five things:

simple math

byte operations

simple decisions

store and retrieve data


Internal Data Representations

Bytes

single group of 8 bits

Integers

groups of 2 or more bytes

1 sign bit, with the remaining bits holding the value

Which is one?

00000000 00000001

or

00000001 00000000


Answer:

It depends.

The byte order for integers is machine dependent.

Little Endian

vs

Big Endian
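A minimal C sketch of how a program can find out which byte order its own machine uses: store the value one in a 16-bit integer and look at the byte at the lowest address. The function name is_little_endian is an illustrative choice, not something from the lecture.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian machine (low-order byte stored first),
   0 on a big-endian machine (high-order byte stored first). */
int is_little_endian(void)
{
    uint16_t one = 1;
    unsigned char first_byte;

    memcpy(&first_byte, &one, 1);   /* copy the byte at the lowest address */
    return first_byte == 1;
}
```

The answer is a property of the machine, so the same source can return different results on different hardware.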


Floating Point Numbers

X = 2.343 x 10^12

start with byte representation

32 bits = 4 bytes

00000000 00000000 00000000 00000000


X = +/- (0.b1 b2 b3... ) x 2^m

1 bit = sign of mantissa

1 bit = sign of exponent

z bits = the integer exponent

30 - z bits = the mantissa

Bit order is usually:

sign

sign

exponent

mantissa


Example:

52.234375

52 = 110100

0.234375 = 0.001111

therefore

52.234375 = 110100.001111

or

0.110100001111 x 2^110 (the exponent 110 is binary for 6)

In this example, the number has an exact binary representation.
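The same split into mantissa and exponent can be checked with the standard C library. This is a small sketch, assuming the standard frexp function from math.h; the wrapper name split_float is hypothetical.

```c
#include <math.h>

/* Splits x into a mantissa in [0.5, 1) and an integer exponent so that
   x == mantissa * 2^exponent, the same 0.b1 b2 b3... form used above. */
double split_float(double x, int *exponent)
{
    return frexp(x, exponent);
}
```

For 52.234375 this gives an exponent of 6 (binary 110) and a mantissa of 0.816162109375, which is 0.110100001111 in binary.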


Roundoff Error

What happens when your number converts into a repeating binary number?

0.101010101010...

The number can only be given to the precision of the computer.

The computer truncates the number.

Especially important when you

subtract numbers which are close to the same value.

234.34565432245332 -

234.34565432245341
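A short C sketch of the subtraction above, done once in single and once in double precision. The function names are illustrative only.

```c
/* The same subtraction carried out in single and in double precision. */
float subtract_single(float a, float b)
{
    return a - b;
}

double subtract_double(double a, double b)
{
    return a - b;
}
```

In single precision both inputs round to the same stored value, so the difference comes out as exactly zero. In double precision a nonzero difference survives, but only its first digit or so can be trusted.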


Significant Digits

given a floating point number with two sign bits and a mantissa of size z

what is the smallest positive number that can be represented in the system?

0.000000001 x 2^0

2^(-z)


Overflow and Underflow Errors

A floating point number can only represent a range of numerical values.

If z=the number of bits in the mantissa

the largest number it can represent is:

2^(31 - z-1)


Informal Programming Assignment

What are the smallest positive numbers which can be represented with single precision floating point numbers on the SGI machines?

What are the largest positive numbers which can be represented with single precision floating point numbers on the SGI machines?
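One possible way to approach these questions on any machine is to keep halving (or doubling) a number until the next step would fall out of range. The sketch below assumes IEEE-style single precision and uses FLT_MIN and FLT_MAX from float.h only for comparison; the function names are hypothetical.

```c
#include <float.h>

/* Halve a float until the next halving would give zero. The result is the
   smallest positive value single precision can hold on this machine. */
float smallest_positive_float(void)
{
    volatile float x = 1.0f;       /* volatile forces a real float store */
    volatile float half = x / 2.0f;

    while (half > 0.0f) {
        x = half;
        half = x / 2.0f;
    }
    return x;
}

/* Double a float until the next doubling would overflow. The result is
   the largest power of two that fits; FLT_MAX is just under twice it. */
float largest_power_of_two_float(void)
{
    volatile float x = 1.0f;

    while (x <= FLT_MAX / 2.0f)
        x = x * 2.0f;
    return x;
}
```

On machines that support very small "denormalized" numbers, the first function returns something smaller than FLT_MIN, which is the smallest value that still carries full precision.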


Basic Data Structures

We store data with variables.

A = 3

B= 324.34653

C = 'f'

D = "uberdog"

We can also create arrays of data

a[0] = 232

a[1] = 343

...

or

b[0][0] = 345.33

b[0][1] = 2334.343

...


Arrays

An Array is one of the most basic types of data structures.

In memory:

a[0] is next to a[1]

Pointers are addresses of data.

Variables refer to the data themselves.

Imagine a two-dimensional array A.

Which elements are adjacent?

A(I,J) is next to A(I+1,J) ?

Or

A(I,J) is next to A(I,J+1) ?


Row Major

in C, the elements of each row are stored next to each other

tmp[row][column]

tmp[2][12] is

two rows of 12 columns

tmp[i][j] is next to tmp[i][j+1]

------------------------

Column Major

in Fortran, the elements of each column are stored next to each other

tmp(i,j)

tmp(i,j) is next to tmp(i+1,j)

---------

it is generally faster and more efficient to access data elements which are adjacent in memory
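The adjacency in C can be checked directly by measuring how far apart two elements sit in memory. A minimal sketch, using the tmp[2][12] array from above; the helper name element_offset is an assumption for illustration.

```c
#include <stddef.h>

static double tmp[2][12];   /* two rows of 12 columns */

/* Distance, counted in elements, from tmp[0][0] to tmp[i][j]. */
size_t element_offset(int i, int j)
{
    const char *base = (const char *)&tmp[0][0];
    const char *elem = (const char *)&tmp[i][j];

    return (size_t)(elem - base) / sizeof(double);
}
```

Stepping j by one moves one element in memory, while stepping i by one jumps a whole row of 12. In C, then, the inner loop should usually run over the last index.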


Pointers and Variables

variables

name associated with a value and a data type

pointers

a name associated with a memory location and a data type


Pointers in C and Fortran

Pointers in C

int *a;
int b;
int c;

c = 1242;
a = &b;      /* a now holds the address of b */
*a = 2353;   /* stores 2353 in b through the pointer a */

Pointers in Fortran

integer a(100), b(100), c
integer i
equivalence (a,b)
c = 1242
do i=1,100
   a(i) = i
enddo

ALL VARIABLES PASSED BETWEEN SUBROUTINES IN FORTRAN ARE POINTERS!


Why Use Pointers

Allocation and deallocation of memory

Rapid transfer of data

Pointer Arithmetic

Dynamic Data Structures


Virtual Memory

most modern computers use disk drives to expand program memory

Page Faults

RAM and disk are exchanged during the execution of a program

Page Faults are slow.

Chunks of memory are loaded from disk.

You want to minimize page faults by locating commonly used variables close together in RAM.


Derived Data Types

data types which are created from several basic or derived data types

allows you to create new data types by encapsulating data

can be made into arrays

can be very useful for science work


The Atom

Derived Data Types

position

x, y, z

velocity

vx, vy, vz

acceleration

ax, ay, az

species, mass, ionization


The Atom

Internal Representation

float x, y, z;

float vx, vy, vz;

float ax, ay, az;

byte species;

double mass;

short int ionization;

float = 4 bytes

byte = 1 byte

double = 8 bytes

short int = 2 bytes

9 * 4 + 1 + 8 + 2

=

47 bytes for each record
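Written as a C struct, the record might look like the sketch below. C has no built-in "byte" type, so unsigned char stands in for it here; that substitution and the names are assumptions. Compilers usually insert padding so that each field is properly aligned, so the space actually reserved per record can be larger than the 47 bytes counted above.

```c
#include <stddef.h>

struct atom {
    float x, y, z;            /* position */
    float vx, vy, vz;         /* velocity */
    float ax, ay, az;         /* acceleration */
    unsigned char species;    /* stands in for the "byte" type above */
    double mass;
    short ionization;
};

/* Sum of the field sizes alone, with no padding. */
size_t atom_field_bytes(void)
{
    return 9 * sizeof(float) + sizeof(unsigned char)
         + sizeof(double) + sizeof(short);
}

/* What the compiler actually reserves for one record. */
size_t atom_record_bytes(void)
{
    return sizeof(struct atom);
}
```

Comparing the two return values on a given machine shows how much padding the compiler added.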


The Diatomic Molecule

another derived data type

position

x, y, z

velocity

vx, vy, vz

acceleration

ax, ay, az

vibrational state, rotational state

components

atom1, atom2


Data Structures

dynamic - not predefined and can be updated with new data

organized - created to optimize some set of tasks, usually searching

data sets - groups of basic or derived data types

A way to organize your data


Typical Data Structure Operations

from Cormen et al.

Search(S, k)

Insert(S, x)

Delete(S, x)

Minimum(S)

Maximum(S)

Successor(S,x)

Predecessor(S,x)

where

S = a set of data, k = a key, and x = a data element


A Random Array of Data

elements are not in order

most operations require you to search the data set

max, min, successor, predecessor

deletes require memory to be moved

-------------------------------------

A Sorted Array of Data

elements are in order

most operations require no searching

max, min, successor, predecessor

deletes and inserts still require memory to be moved


Stacks

basic data structures

last in, first out

(LIFO)

input data is pushed on the stack

2 23 56 3

push 5

5 2 23 56 3

output data is popped from the stack

pop x

2 23 56 3

and x=5

similar to HP calculators
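A stack can be sketched in C with a fixed-size array and a counter. The capacity STACK_MAX and the function names are illustrative assumptions.

```c
#define STACK_MAX 100

struct stack {
    int data[STACK_MAX];
    int count;               /* number of items currently stored */
};

/* Push puts a value on top of the stack; returns 0 if the stack is full. */
int push(struct stack *s, int value)
{
    if (s->count == STACK_MAX)
        return 0;
    s->data[s->count++] = value;
    return 1;
}

/* Pop removes the most recently pushed value; returns 0 if the stack
   is empty. */
int pop(struct stack *s, int *value)
{
    if (s->count == 0)
        return 0;
    *value = s->data[--s->count];
    return 1;
}
```

Pushing 5 onto the stack above and then popping once gives back 5 and leaves the stack as it was.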


Queues

basic data structures

first in, first out

(FIFO)

input data is enqueued

2 23 56 3

enqueue 5

5 2 23 56 3

output data is dequeued

dequeue x

5 2 23 56

and x=3

similar to supermarkets
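A queue can be sketched the same way, using an array that wraps around so that no data has to be moved when an item leaves. QUEUE_MAX and the function names are illustrative assumptions.

```c
#define QUEUE_MAX 100

struct queue {
    int data[QUEUE_MAX];
    int head;                /* index of the oldest item */
    int count;               /* number of items currently stored */
};

/* Adds a value at the back of the queue; returns 0 if the queue is full. */
int enqueue(struct queue *q, int value)
{
    if (q->count == QUEUE_MAX)
        return 0;
    q->data[(q->head + q->count) % QUEUE_MAX] = value;
    q->count++;
    return 1;
}

/* Removes the oldest value from the front; returns 0 if the queue
   is empty. */
int dequeue(struct queue *q, int *value)
{
    if (q->count == 0)
        return 0;
    *value = q->data[q->head];
    q->head = (q->head + 1) % QUEUE_MAX;
    q->count--;
    return 1;
}
```

With 3 enqueued first and 5 enqueued last, as in the example above, the first dequeue returns 3.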


Linked Lists

basic data structures

useful for tracking binned data

---------

1) each bin points to the first element in the bin

2) each element points to the next element in the same bin

3) each element may point to the previous element in the same bin

4) the final element in a bin points to nothing


Construction of Linked Lists

derived data types

element list

data

pointer to the next element

(pointer to the previous element)

-----------

bins list

# elements in the bin

Head of Chains (HOC)

pointer to first data element


Linked Lists

science applications

tracking hurricanes

cities: Miami, Charlottesville, etc.

Hurricanes: A50, B50, C50,...., F96

each Hurricane hit a major city

The list may be something like:

Miami: B53, E59, A67, G75, A91

Charlottesville: H92, F96

The HOC(Miami) = B53

Next(B53) = E59

Next(E59) = A67

Next(A67) = G75

Next(G75) = A91

Next(A91) = null
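The hurricane chains might be built in C roughly as follows. The struct layouts follow the element list and bins list described earlier; the tail pointer is an extra assumption added here so that appending does not require walking the chain, and all names are hypothetical.

```c
#include <stddef.h>

struct hurricane {
    const char *name;            /* data, e.g. "B53" */
    struct hurricane *next;      /* next hurricane in the same city's bin */
};

struct city {
    const char *name;
    int count;                   /* number of elements in the bin */
    struct hurricane *head;      /* head of chain (HOC) */
    struct hurricane *tail;      /* last element, kept for quick appends */
};

/* Appends a hurricane to the end of a city's chain. */
void add_hurricane(struct city *c, struct hurricane *h)
{
    h->next = NULL;              /* the final element points to nothing */
    if (c->head == NULL)
        c->head = h;
    else
        c->tail->next = h;
    c->tail = h;
    c->count++;
}
```

Adding B53, E59, and A67 to Miami in that order reproduces the first links of the chain above: the head is B53, and each element's next pointer leads to the one added after it.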


Why not use simple arrays for each City?

Waste of memory

----------

Why not just sort every hurricane by city?

Waste of cpu time


Binary Trees

basic data structures

1) data is divided into nodes

2) each node has left and right children (pointers)

3) each node has a key used for comparison

4) each node has a parent (pointer)


Creating Binary Trees

elements must be inserted as nodes

nodes must be inserted at empty children

the position of the data is determined by the nodes' keys
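A minimal C sketch of these rules, assuming integer keys and the convention that smaller keys go to the left child. The names tree_insert and tree_search are illustrative, and nothing here keeps the tree balanced.

```c
#include <stdlib.h>

struct node {
    int key;
    struct node *left, *right, *parent;
};

/* Inserts key below root and returns the (possibly new) root.
   Smaller keys go left, all others go right. If memory runs out,
   the tree is returned unchanged. */
struct node *tree_insert(struct node *root, int key)
{
    struct node *n = malloc(sizeof *n);
    struct node *cur = root, *parent = NULL;

    if (n == NULL)
        return root;
    n->key = key;
    n->left = n->right = NULL;

    while (cur != NULL) {        /* walk down to an empty child */
        parent = cur;
        cur = (key < cur->key) ? cur->left : cur->right;
    }
    n->parent = parent;
    if (parent == NULL)
        return n;                /* the tree was empty */
    if (key < parent->key)
        parent->left = n;
    else
        parent->right = n;
    return root;
}

/* Returns the node holding key, or NULL if the key is absent. */
struct node *tree_search(struct node *root, int key)
{
    while (root != NULL && root->key != key)
        root = (key < root->key) ? root->left : root->right;
    return root;
}
```

Each step of a search discards one whole subtree, which is where the speed of a well-balanced tree comes from.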


Scientific Binary Trees

imagine an expert system used to identify animals

Does the animal have four legs?

Yes-

Does the animal have fur?

No-

Does the animal have two legs?

Etc...

Trees can be used to maintain and update searchable data lists.


Scientific Tree Structures

searchable databases

calculation of clustering

multipole methods

image storage and compression (quad trees)


Unbalanced Trees

What happens if your tree is unbalanced?


Strategies for Balancing Trees

hashing

random generation of keys from data

recursive bisection

dividing data into exactly equal groups


Algorithms

A set of instructions used to solve a problem.

"The information superhighway will save the environment and revolutionize education in the 21st century."

- an Al Gore-ism


Order of Calculations

How many operations will it take to complete this algorithm?

Binary Tree Searches

log N operations

Finding the minimum of a data set

N operations

Fast Fourier Transforms

N log N

Finding the minimum difference between data values

N^2 operations

Simple inversion of an N x N matrix

N^3 operations


Long Multiplication

How many operations does it take to multiply an N-digit number by another N-digit number?

1 2 6 3 7 3 7 3

x 3 4 5 6 7 8 7 4

---------------

First digit is multiplied by N digits

Second digit is multiplied by N digits

....

Nth digit is multiplied by N digits

There are also additions involved.

This is an N^2 algorithm.

Can this be done more efficiently?


Long Multiplication

Can this be done more efficiently?

Yes

FFT(product) = FFT(number1) x FFT(number2), multiplied element by element

FFT's take O(N log N) operations.

Inverse FFT's take O(N log N)

The whole multiplication will take O(N log N) operations.

This is much better for large N.


N^2 vs N log N

Does this really make a difference?

Assume each operation takes 1 microsecond and we have a million digits.

N^2 = 1,000,000 seconds

= 11.6 days

N log N = 14 seconds (using the natural log)

70,000 times faster!

This makes some naive assumptions, but the difference is very dramatic.

This is MUCH better than getting a faster machine.


Algorithms

3 key ideas

iteration

recursion

bisection


Iteration

cyclically refining an answer

Example:

solving well behaved

transcendental equations

X = sin (X + e)

trial solutions

reform equation as

X(I+1) = sin(X(I) + e)


X(I+1) = sin(X(I) + e)

Let e= 0.3

x(0) = 0.5

then :

x(1) = 0.7174

x(2) = 0.8507

x(3) = 0.9131

x(4) = 0.9367

x(5) = 0.9447

x(6) = 0.9473

x(7) = 0.9481

x(8) = 0.9484

x(9) = 0.9485

...

x(100) = 0.9485

the iteration converges


Linear Congruential Generators

LCG's

random number generators based on iteration

x(i+1) = (x(i) * a + c ) mod m

Generally

x(i) = integer between 0 and m-1

a , c integers

Park and Miller's minimal standard generator

a = 16807

m = 2^31 - 1 = 2147483647

c= 0
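With these constants, one step of the generator can be sketched in C as below. The product a * x(i) does not fit in 32 bits, so this sketch assumes a 64-bit intermediate; the function and macro names are illustrative.

```c
#include <stdint.h>

#define LCG_A 16807u
#define LCG_M 2147483647u    /* 2^31 - 1 */

/* One step of x(i+1) = (a * x(i)) mod m with c = 0. The product is formed
   in 64 bits so it cannot overflow. The seed must lie between 1 and m-1. */
uint32_t lcg_next(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * LCG_A) % LCG_M);
}
```

Starting from a seed of 1, the first two values are 16807 and 282475249. A seed of 0 must be avoided, since with c = 0 the sequence would stay at 0 forever.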


Random Numbers?

Are LCG's truly random?

No.

Are LCG's truly reliable?

No. It depends on the values of a, c, and m. Bad values can lead to disastrous results.


Bad Random Numbers

the infamous IBM LCG

a = 65539

m = 2^31

11 planes are visible when you plot

x(i+1) vs x(i)

"We guarantee that each number is random individually, but we don't guarantee that more than one of them is random."

anonymous computer consultant

from Press et al.


Recursion

executing a subroutine from within the same subroutine

Example:

Factorial

n!

Some C code:

int fact(int x) {
    if (x > 1)
        return fact(x-1)*x;
    else
        return 1;
}


Recursion

Fibonacci Series

1 1 2 3 5 8 13 ....

N(i) = N(i-1) + N(i-2)

Is this something that recursion can be applied to?
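Yes: the recurrence translates almost word for word into a recursive function. A minimal sketch, assuming the series is counted from N(1) = N(2) = 1; the name fib is illustrative.

```c
/* Returns the i-th term of the series, counting from N(1) = N(2) = 1. */
long fib(int i)
{
    if (i <= 2)
        return 1;
    return fib(i - 1) + fib(i - 2);
}
```

This direct form recomputes the same terms many times, so the number of calls grows very quickly with i. A simple loop that keeps the last two terms does the same job in i steps.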


Bisection

divide and conquer techniques

assume your problem scales as N^2

if you can break the problem into two groups of N/2, the calculation will take 2 (N/2)^2 = N^2/2 calculations

You can recursively repeat this technique

Can change N^2 into

N log N


Bisection

Science Examples

Fast Fourier Transform

used in signal processing and field equations

changes N^2 to N log N

----------

Multipole Expansion

used primarily to solve field equations

changes N^2 to N log N

often uses tree data structures


The Bubble Sort

a bad algorithm

N(N-1)/2 comparisons

O(N^2)

compares every number with every other number


Bubble Sort Pseudocode

Bubble(A, r)
  for i=1, r-1
    x = A(i)
    for j=i+1, r
      if x > A(j) then
        exchange A(i) and A(j)
        x = A(i)


Bubble Sort in Action

34 35 43 57 12 23 45 23

x = 34

12 35 43 57 34 23 45 23

x=12

12 35 43 57 34 23 45 23

x = 35

12 35 43 57 34 23 45 23

x = 34

12 34 43 57 35 23 45 23

x = 23

12 23 43 57 35 34 45 23

etc...
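A C sketch of the same procedure, sorting into ascending order as in the trace above. C arrays start at index 0, so the loops are shifted by one relative to the pseudocode; the function name bubble mirrors it.

```c
/* Sorts A[0..n-1] into ascending order by comparing each element with
   every later one and exchanging the pair when it is out of order. */
void bubble(int A[], int n)
{
    int i, j, tmp;

    for (i = 0; i < n - 1; i++) {
        for (j = i + 1; j < n; j++) {
            if (A[j] < A[i]) {
                tmp = A[i];
                A[i] = A[j];
                A[j] = tmp;
            }
        }
    }
}
```

The two nested loops make the N^2 cost visible: every pair of positions is compared exactly once, whether or not the data is already in order.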


Quick Sorts

a good algorithm

two parts

a partitioning routine

a quicksort routine

Typical performance

N log N

Worst Case

N^2

example taken from

Cormen, Leiserson, and Rivest

Introduction to Algorithms


Quicksort Pseudocode

Quicksort(A, p, r)
if p < r
  then q = Partition(A,p,r)
       Quicksort(A,p,q)
       Quicksort(A,q+1,r)

Notes:

initially call with

A = unsorted array

p = 1

r = size of A


Partition Pseudocode

Partition(A, p, r)
x = A[p]
i = p - 1
j = r + 1
while TRUE
  do repeat j = j - 1
       until A[j] <= x
     repeat i = i + 1
       until A[i] >= x
     if i < j
       then exchange A[i] with A[j]
       else return j

From Cormen et al.

Divides and partitions array into above and below x

Returns the index where this partition occurs


Quick Sort in Action

34 35 43 57 12 23 45 23

x = 34

23 35 43 57 12 23 45 34

23 23 43 57 12 35 45 34

23 23 12 57 43 35 45 34

j=3 is returned

thus

23 23 12 < 34

57 43 35 45 34 >= 34

repeat with smaller arrays
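The two routines might be written in C as follows. This is a sketch of the partitioning scheme described above, shifted to C's 0-based arrays, so the initial call is quicksort(A, 0, n-1) and the returned index is one less than in the 1-based trace.

```c
/* Splits A[p..r] around the value x = A[p]. On return, every element of
   A[p..j] is <= x and every element of A[j+1..r] is >= x. */
int partition(int A[], int p, int r)
{
    int x = A[p];
    int i = p - 1;
    int j = r + 1;
    int tmp;

    for (;;) {
        do { j--; } while (A[j] > x);   /* scan down for a value <= x */
        do { i++; } while (A[i] < x);   /* scan up for a value >= x */
        if (i >= j)
            return j;
        tmp = A[i];
        A[i] = A[j];
        A[j] = tmp;
    }
}

/* Sorts A[p..r] by partitioning and then sorting each part in turn. */
void quicksort(int A[], int p, int r)
{
    if (p < r) {
        int q = partition(A, p, r);
        quicksort(A, p, q);
        quicksort(A, q + 1, r);
    }
}
```

On the eight numbers above, the first call to partition returns index 2, which is j=3 in the 1-based numbering of the trace.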


Unix Literacy

Compilers

cc gcc

f77

CC g++

basic options/flags

-g enable debugging

-o output file name

-O0 -O1 -O2 -O3

set optimization level

-c compile only, do not link

-l link to this library

-L path of other libraries

-I path of included files

cc -g -o dog dog.c -lm

to compile a simple one-file program

cc -o dog.o -g -c dog.c

cc -o cat.o -g -c cat.c

cc dog.o cat.o -lm -o bigdog

Other Unix Commands

alias mroe 'more'

path

source

.cshrc

This file is executed whenever you start a new C-shell.

redirecting io

cmd > newfile

cmd >> existingfile

piping

cmd1 | cmd2

echo

env

emacs


Homework Assignment

1) edit your .cshrc file and modify your path to include

~jwallin/bin/

2) source the .cshrc file to update your path

3) execute the command "ddog" and pipe the results into a

file

4) create an alias in your .cshrc file for ddog called "mdog"

5) pipe the set of environment variables into a file

6) cp the files ~jwallin/bin/test.c, ~jwallin/bin/test.f,

~jwallin/bin/test.cc into your directory

7) compile them with either the system or gnu compilers

8) execute the fortran and c++ codes, and pipe the results into a file

9) execute the compiled test.c code

10) using emacs, copy your .cshrc file, your ddog results,

and the results of your compilations into an html file

11) call the html file "hw2.html" and place it in your public_html

directory

12) add a link on your main html page for this assignment

13) make sure the permissions are set so I can read them through

the web!
