CSI 801

Lecture #7

Software Engineering II

October 17, 1996

John Wallin

I programmed for three days
And heard no human voices.
But the hard drive sang.
Zen and the art of programming, Geoffrey James


Test Review

1. A particular computer has the following basic data types for representing floating point numbers.

            Single     Double

Mantissa    23 bits    47 bits

Exponent    7 bits     15 bits

Sign bits   2          2

These numbers have the form

+/- 0.b1 b2 b3 b4 ... bn x 2 ^(m1 m2 m3 ... mk)

(a) Define the term round-off error: (5 pts)

Round-off error is a type of error associated with the finite precision of numeric representations in computers. Since numbers are represented with a finite number of bits, the number of significant digits is fixed. The result of some calculations may be in error because of this loss of significant digits.


Test Review

(b) What types of calculations are likely to cause round-off error? Why are these calculations likely to cause this problem? (5 pts)

Both addition and subtraction are likely to cause round-off error. When two numbers of greatly different magnitudes are added, the result will be approximately the larger number. If round-off error occurs, it may be EXACTLY the larger number.

X1 = X + E

where E << X, so X1 may be exactly equal to X.

Subtraction of two numbers of greatly different magnitudes can lead to similar problems.
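The absorption effect described above can be checked directly. The following is a minimal C sketch, assuming `float` is IEEE single precision (24 significant bits) rather than the hypothetical format from the question; the function name `is_absorbed` is made up for illustration.

```c
/* Returns 1 if adding e to x leaves x unchanged in single precision,
   i.e. e is completely lost to round-off, and 0 otherwise. */
int is_absorbed(float x, float e)
{
    /* volatile forces the sum to be rounded and stored as a float,
       even if the compiler evaluates the addition in higher precision */
    volatile float sum = x + e;
    return sum == x;
}
```

Under that assumption, `is_absorbed(1.0e8f, 1.0f)` returns 1, because neighboring single precision values near 10^8 are 8 apart, while `is_absorbed(1.0f, 1.0f)` returns 0.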


Test Review

(c) What is the smallest number greater than ONE which can be represented by the single data type? (5pts)

+/- 0.b1 b2 b3 b4 ... bn x 2 ^(m1 m2 m3 ... mk)

n = 23, k = 7

the number 1 is represented by

+0.1000000000 x 2^(+0000001)

[ 2^-1 * 2^1]

So the smallest number greater than one is:

+0.1000 ... 001 x 2^(+0000001)

this is equal to

1 + 2^(-23)*2^1 or

1 + 2^(-22)


Test Review

(d) What is the largest number which can be represented by the double data type? (5pts)

+/- 0.b1 b2 b3 b4 ... bn x 2 ^(m1 m2 m3 ... mk)

n = 47, k = 15

The largest number will be

+0.111 ... 111 x 2^(1111....111)

The largest exponent is 2^15 - 1 (fifteen bits, all set to one)

so this will be ABOUT

2^(2^15 - 1)

The mantissa is very close to one. In fact, it is just 1 part in 2^47 from one, so the largest number is:

(1 - 2^(-47)) * 2^(2^15 - 1)

Note: IF the system uses +0 as equal to 1.0000000000 for the exponent and for the mantissa, the largest number will be

1.0 x 2^(2^15) = 2^(32768)


Test Review

2. You have just been hired to create a new software system to simulate fish populations living in a two dimensional version of the Chesapeake bay. Each fish has a slightly different state defined by its age, size, speed, location, and species. Each fish can only interact with fish within 100 meters. The only interactions possible are to eat another fish or be eaten by another fish.

(a) Define a derived data type which represents a fish. (5pts)

struct fish {
    double x, y;      /* location */
    double vx, vy;    /* velocity components (speed) */
    float size;
    float age;
    int species;      /* species code */
};

The elements must be specified within the data type, and their basic data types must be specified.


Test Review

(b) The essential performance issue in this code is to be able to find which fish are close enough to interact with each other. Given the choice of arrays, linked lists, or trees, which type of data structure or structures would best be suited for simulating fish-fish interactions? Why would the other data structures be less suitable? (Hint : there may not be a unique answer to this question. Describe what choices YOU would make and WHY.) (5pts)

An array of fish would require you to determine the distance between every fish and every other fish. Although only a tiny fraction of the fish will be within 100 meters of each other, N-1 distance calculations must be performed for each fish, leading to an N^2 routine. This is clearly not desirable.

If the fish are placed into 100 meter cells inside the bay, each fish can only interact with fish inside its own cell or its neighboring cells. The cell would point to the first fish in the cell, and that fish would point to the next fish, and so on. This data structure would be a linked list. This type of data structure would significantly reduce the number of computations needed without creating much additional overhead.

If the fish are placed in a quad tree, it would be possible to search the tree for neighbors within a 100 meter area. This type of search will not be as efficient as a linked list, but will still be much more efficient than a simple array.
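The cell approach with one linked list per cell might look like the C sketch below. It is only an illustration under stated assumptions: the grid dimensions `NX` and `NY`, the function names, and the stripped-down `struct fish` (position plus a `next` pointer) are all hypothetical, and the bay is treated as a simple rectangle.

```c
#include <assert.h>
#include <stddef.h>

#define CELL_SIZE 100.0   /* interaction range in meters, from the problem */
#define NX 4              /* hypothetical number of cells in x */
#define NY 4              /* hypothetical number of cells in y */

struct fish {
    double x, y;          /* position in meters */
    struct fish *next;    /* next fish in the same cell, or NULL */
};

struct grid {
    struct fish *head[NX][NY];   /* first fish in each cell, or NULL */
};

/* Empty every cell. */
void grid_clear(struct grid *g)
{
    for (int i = 0; i < NX; i++)
        for (int j = 0; j < NY; j++)
            g->head[i][j] = NULL;
}

/* Push a fish onto the front of the list for the cell that contains it. */
void grid_insert(struct grid *g, struct fish *f)
{
    int i = (int)(f->x / CELL_SIZE);
    int j = (int)(f->y / CELL_SIZE);
    assert(i >= 0 && i < NX && j >= 0 && j < NY);   /* fish must be in the bay */
    f->next = g->head[i][j];
    g->head[i][j] = f;
}

/* Count the fish within CELL_SIZE of f by scanning only its own cell
   and the (up to) eight neighboring cells. */
int count_neighbors(const struct grid *g, const struct fish *f)
{
    int ci = (int)(f->x / CELL_SIZE);
    int cj = (int)(f->y / CELL_SIZE);
    int count = 0;

    for (int i = ci - 1; i <= ci + 1; i++) {
        for (int j = cj - 1; j <= cj + 1; j++) {
            if (i < 0 || i >= NX || j < 0 || j >= NY)
                continue;                      /* cell lies outside the grid */
            for (const struct fish *p = g->head[i][j]; p != NULL; p = p->next) {
                double dx = p->x - f->x;
                double dy = p->y - f->y;
                if (p != f && dx * dx + dy * dy <= CELL_SIZE * CELL_SIZE)
                    count++;
            }
        }
    }
    return count;
}
```

Because the cells are as wide as the interaction range, any fish within 100 meters must sit in one of the nine cells scanned, so the work per fish depends on the local density rather than on N.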


Test Review

(c) Speculate what types of validation might be used to test this code. What test cases should you use or what parameters should you monitor to see if the code is working? (Again, this is NOT a test about your knowledge of fish, but rather your ability to find suitable test cases in a complex problem.) (10pts)

A one fish test would be a good place to start for testing. Does the fish move through the bay correctly? Does it find its cell correctly?

A two fish test might be the next logical step. Does one eat the other ONLY when they are close together? Do the interactions work? Is the motion of two fish tracked correctly?

One predator fish might be the next test.

Two predators would be next.

Other tests are possible, depending on the interactions involved.


Test Review

3. A complex simulation you are writing requires you to bisect a list of numbers into two equal sized groups.

(a) Briefly discuss how you might use the quicksort code from your first assignment to bisect your data set. (5pts)

1) Load the numbers into an array A with elements 1 to N

2) Sort the list.

3) Split the sorted list into two groups: elements 1 to N/2, and elements N/2+1 to N


Test Review

(b) It is possible to bisect a random list into equal parts in less than n log n time. As part of the quicksort program, you wrote a routine called bisect which would divide a list into values above and below a key element.

The routine bisect had the inputs :

A - the array

i - the first element

j - the last element

When called, the routine re-orders the array into two groups

group 1 : elements i to q-1, which are smaller than A(q)

group 2: elements q to j, which are greater than or equal to A(q)

Explain how a fast bisection routine might work using the bisect routine as the basis. It may be helpful to write some pseudo code to explain the process. (Hint: the divide and conquer approach is most applicable here.) (10pts)


Test Review

Bisection can be done in a very efficient way using an approach similar to the quick sort. The basic idea is to use the bisect routine only on the subgroup which is near the median.

23 54 75 12 56 90 80 54 65 89 13 67

using 23 as the key, we get

12 13 54 75 56 90 80 54 65 89 67 23

the median is in the right half, so use 54 as the new key

12 13 23 75 56 90 80 54 65 89 67 54

again, the median is in the right half, so use 75 as the new key

12 13 23 56 54 65 67 54 90 80 89 75

the median is in the left half, so use 56 as the key

12 13 23 54 54 65 67 56 90 80 89 75

the median is in the right half, so use 65 as the key

12 13 23 54 54 56 67 65 90 80 89 75

now you can bisect the array


Test Review

Fast Bisection Algorithm

A is the input array

q = 0

i = 1

j = N

while (q != N/2) {

q = bisect(A, i, j)

if (q < N/2) i = q

else j = q

}
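A runnable version of this idea is sketched below in C. It is a variant rather than a transcription of the pseudo code above: it assumes 0-based arrays, uses a Lomuto-style partition (last element as the key) in place of the course's `bisect` routine, and excludes the key from the next subrange so that every pass makes progress. The names `partition` and `fast_bisect` are illustrative.

```c
#include <assert.h>
#include <stddef.h>

static void swap(double *a, double *b)
{
    double t = *a;
    *a = *b;
    *b = t;
}

/* Partition a[lo..hi] around the key a[hi].
   Returns q such that a[lo..q-1] < a[q] <= a[q+1..hi]. */
static size_t partition(double a[], size_t lo, size_t hi)
{
    double key = a[hi];
    size_t q = lo;
    for (size_t k = lo; k < hi; k++) {
        if (a[k] < key) {
            swap(&a[k], &a[q]);
            q++;
        }
    }
    swap(&a[q], &a[hi]);    /* move the key to its final position */
    return q;
}

/* Reorder a[0..n-1] so that the n/2 smallest values occupy a[0..n/2-1]
   and the remaining values occupy a[n/2..n-1]. */
void fast_bisect(double a[], size_t n)
{
    assert(n >= 2);
    size_t target = n / 2;      /* first index of the upper group */
    size_t lo = 0, hi = n - 1;

    while (lo < hi) {
        size_t q = partition(a, lo, hi);
        if (q == target)
            break;              /* the split point is in place */
        else if (q < target)
            lo = q + 1;         /* split point lies in the right part */
        else
            hi = q - 1;         /* split point lies in the left part */
    }
}
```

Only the part containing the split point is partitioned again, so the expected work on a random list is proportional to n + n/2 + n/4 + ..., which is about 2n. As with quicksort, an unlucky sequence of keys can still lead to n^2 behavior.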


Test Review

Least Squares Fitting

4. Given a large set of data of the form

{x_i, y_i} where i = 1 ... N

You wish to find the parameters a, b, c which provide the best fit of the data to the following equation:

y = a x^2 + b x + c

(a) Write the equation which defines chi^2 for the above problem. (5pts)

IF we assume that each data point is weighted the same:

chi^2 = Sum_{i=1..N} ( y_i - (a x_i^2 + b x_i + c) )^2

A possibly useful reference:

"Data Reduction and Error Analysis for the Physical Sciences" by Philip Bevington

Excellent bedtime reading.


Test Review

(b) Find the analytic expressions which define the minimum of the chi^2 equation. (5pts)

We need to find the minimum of chi^2 with respect to each of the unknowns a, b, and c.

D(chi^2)/Da = Sum_{i=1..N} 2 ( y_i - (a x_i^2 + b x_i + c) ) * ( -x_i^2 ) = 0

D(chi^2)/Db = Sum_{i=1..N} 2 ( y_i - (a x_i^2 + b x_i + c) ) * ( -x_i ) = 0

D(chi^2)/Dc = Sum_{i=1..N} 2 ( y_i - (a x_i^2 + b x_i + c) ) * ( -1 ) = 0

Where D represents the partial derivative.


Test Review

(c) Write a matrix equation which represents the above set of equations. (5Pts)

Rewrite the equations as :

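Sum_{i=1..N} ( a x_i^4 + b x_i^3 + c x_i^2 ) = Sum_{i=1..N} x_i^2 y_i

Sum_{i=1..N} ( a x_i^3 + b x_i^2 + c x_i ) = Sum_{i=1..N} x_i y_i

Sum_{i=1..N} ( a x_i^2 + b x_i + c ) = Sum_{i=1..N} y_i

Which gives (every Sum runs over i = 1 ... N, and the sum of 1 is N):

| Sum x_i^4   Sum x_i^3   Sum x_i^2 |   | a |   | Sum x_i^2 y_i |
| Sum x_i^3   Sum x_i^2   Sum x_i   |   | b | = | Sum x_i y_i   |
| Sum x_i^2   Sum x_i     N         |   | c |   | Sum y_i       |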


Test Review

(d) Write a short (< 1 page) pseudo code which will calculate the coefficients of the above matrix. (5 pts)

set sx, sxx, sxxx, sxxxx, sy, sxy, sxxy to zero

for i = 1, N

    sx = sx + x(i)

    sxx = sxx + x(i) * x(i)

    sxxx = sxxx + x(i) * x(i) * x(i)

    sxxxx = sxxxx + x(i) * x(i) * x(i) * x(i)

    sy = sy + y(i)

    sxy = sxy + x(i) * y(i)

    sxxy = sxxy + x(i) * x(i) * y(i)

This could be made more efficient by using a temporary variable:

for i = 1, N

    txx = x(i) * x(i)

    sx = sx + x(i)

    sxx = sxx + txx

    sxxx = sxxx + txx * x(i)

    sxxxx = sxxxx + txx * txx

    sy = sy + y(i)

    sxy = sxy + x(i) * y(i)

    sxxy = sxxy + txx * y(i)

Giving

| sxxxx   sxxx   sxx |   | a |   | sxxy |
| sxxx    sxx    sx  |   | b | = | sxy  |
| sxx     sx     N   |   | c |   | sy   |
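For completeness, here is a C sketch that accumulates the same sums and then solves the 3x3 system. The solution step uses Cramer's rule, which is a choice made here for brevity and is not part of the exam answer; the names `det3` and `fit_quadratic` are illustrative. For poorly conditioned data a proper linear solver would be the safer choice.

```c
#include <stddef.h>

/* Determinant of a 3x3 matrix by cofactor expansion along the first row. */
static double det3(double m[3][3])
{
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

/* Least squares fit of y = a x^2 + b x + c with equal weights.
   Returns 0 on success, -1 if the normal equations are singular. */
int fit_quadratic(const double x[], const double y[], size_t n,
                  double *a, double *b, double *c)
{
    double sx = 0.0, sxx = 0.0, sxxx = 0.0, sxxxx = 0.0;
    double sy = 0.0, sxy = 0.0, sxxy = 0.0;

    /* accumulate the matrix coefficients, as in the pseudo code */
    for (size_t i = 0; i < n; i++) {
        double txx = x[i] * x[i];
        sx    += x[i];
        sxx   += txx;
        sxxx  += txx * x[i];
        sxxxx += txx * txx;
        sy    += y[i];
        sxy   += x[i] * y[i];
        sxxy  += txx * y[i];
    }

    double m[3][3] = {
        { sxxxx, sxxx, sxx       },
        { sxxx,  sxx,  sx        },
        { sxx,   sx,   (double)n }
    };
    double rhs[3] = { sxxy, sxy, sy };

    double d = det3(m);
    if (d == 0.0)
        return -1;

    /* Cramer's rule: replace one column at a time with the right-hand side */
    double coef[3];
    for (int col = 0; col < 3; col++) {
        double t[3][3];
        for (int r = 0; r < 3; r++)
            for (int k = 0; k < 3; k++)
                t[r][k] = (k == col) ? rhs[r] : m[r][k];
        coef[col] = det3(t) / d;
    }

    *a = coef[0];
    *b = coef[1];
    *c = coef[2];
    return 0;
}
```

A quick check is to feed it points generated from a known quadratic, such as y = 2 x^2 + 3 x + 1, and confirm that the returned coefficients are close to 2, 3, and 1.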


Test Review

Computer Literacy

5. For the following questions, write the Unix command string or strings you would use to solve the following problems. Exact syntax is not as important as understanding what commands can be used to solve specific problems.

(a) Find all the occurrences of the string "spam" in the *.c files within a given directory. (5pts)

fgrep spam *.c


Test Review

(b) Determine how many times the string "spam" occurs in all the *.c files within a given directory. (5pts)

fgrep spam *.c | wc -l

or

fgrep -c spam *.c

Both forms count the lines that contain "spam" (the second reports one count per file). A line that contains the string twice is counted only once.


Test Review

(c) Change all the occurrences of "spam" to "treet" in a given file. (5pts)

sed 's/spam/treet/g' filename > newfile


Test Review

(d) Add the directory "/usr/local/bin/gnu/duck" to your path for this and all future login sessions. (A procedure is probably more applicable than a command for this problem.) (5 pts)

1) edit your .cshrc file (or your .login file)

2) change your path variable

set path = ($path /usr/local/bin/gnu/duck)

3) save and exit the file

4) use the "source" command to re-initialize the .cshrc file


Test Review

(e) Create a directory which contains files that the world can read and write. (5pts)

mkdir newdir

chmod 777 newdir


Software Management

How do you manage a complex code project?

How do you track code modifications?

How do you profile codes?

How do you optimize codes?

How do you debug codes?

How do you re-engineer existing projects?


Complex Code Elements

programs (subroutines and functions)

headers

object files

libraries

executables

shell scripts

input files and configuration files

output files


Dependence

output files depend on input files and executables

executable files depend directly on object files and libraries

libraries depend on object files

object files depend on headers and program files

input files depend on shell scripts

When one section is changed, the project must be updated correctly.


Dependence Trees

output files: inputfiles, executable

    executable inputfile

executable: objectfiles, libs

    cc -o executable objectfiles -llibs -lm

objectfiles: programfiles, headers

    cc -c programfiles

A change in the input file would NOT require the software to be recompiled.


Makefiles

should be named "makefile" or "Makefile"

executed by typing

make

make option

make -f filename

Internal Syntax:

when a depends on b, and is produced by some rule, you use

a:b

rule

as the general syntax, where there is a TAB in front of rule. Multiple rules are possible.


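# Sample Makefile

# a comment inside a Makefile

exe = dog
obj = dog.o
src = dog.c
CC = cc
LIB = -lm
CFLAG = -p

all: dog

# prerequisites are separated by spaces; each rule line starts with a TAB
dog.o: dog.c dog.h
	$(CC) -c $(CFLAG) $(src)

# libraries are listed after the object files that use them
dog: dog.o
	$(CC) -o $(exe) $(obj) $(LIB)

clean:
	rm *.o
	rm *~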


Makefiles

Other standard targets include:

install and veryclean

However, any target can be defined within a Makefile. You are NOT limited to C or Fortran compilers.

The use of Makefiles is essential in any complex software system on Unix systems.


Tracking Code Modifications

one of the biggest problems in complex programs is tracking software modifications

ALWAYS make a copy of your project before you start modifying it!!!!

For complex projects, keep notes about what changes were made.

Make sure you can "back out" of changes, if things go bad.


SCCS

source code control system

a standard Unix toolset for code control

keeps track of changes in files

commands:

admin

get

delta

prs

use man and see Landau and Fink for details


Profiling Codes

We have already discussed timing abstractly.

Determining WHERE time is spent inside a program is essential to making it more efficient.

Unix has some standard ways of profiling code performance.


Using Prof

1) Compile your code with the "-p" option. This will add the profiling routines to your executable.

2) Run your code on the sample problem.

3) Use prof to determine the time spent in each routine.

prof executable

is the general syntax. Use "man prof" for more information.


Optimizing Code

there are lots of labor intensive ways to optimize code

-use loops correctly to minimize page faults

-use temporary variables appropriately

-re-write equations, when appropriate

However, the first step may be to use the compiler optimization.

-O0 to -O3 are semi-standard options in most Fortran and C compilers.
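As an illustration of using loops correctly, the C sketch below sums the same two dimensional array in two loop orders. C stores arrays row by row, so the first version walks through memory contiguously while the second jumps a full row ahead on every step, which tends to cause more cache misses and, for arrays larger than physical memory, more page faults. The array size and function names are hypothetical. In Fortran the storage order is reversed, so there the first index should vary fastest.

```c
#include <stddef.h>

#define ROWS 256   /* hypothetical array dimensions */
#define COLS 256

/* Inner loop runs over the last index: consecutive memory locations in C. */
double sum_row_order(double a[ROWS][COLS])
{
    double s = 0.0;
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            s += a[i][j];
    return s;
}

/* Inner loop runs over the first index: each step skips COLS elements. */
double sum_column_order(double a[ROWS][COLS])
{
    double s = 0.0;
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            s += a[i][j];
    return s;
}
```

Both functions do the same arithmetic; only the memory access pattern differs. How large the timing difference is depends on the machine and the array size, so it is worth measuring with prof before and after the change.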


Debugging Codes

Using debuggers can save substantial amounts of time in code development.

A semi-standard set of debuggers is available on most Unix systems- dbx.

To use this system,

1) Compile your code with the "-g" flag and -O0 (no optimization).

2) Use the command

dbx executable

to start dbx


dbx Basic Commands

run

quit

print variable

print expression

trace function

trace variable

trace line-number

trace

list (10 lines)

list all

list first, last

step

next

cont


More dbx commands

stop variable

stop at line-number

stop in function

status

delete number

help


Object Oriented Programming - OOP

the leading programming paradigm in industry and computer science

languages include C++, Java, Smalltalk, Perl (5.0), Javascript

OOP applications include S+, IDL

Rapidly making inroads into science


Scientific Procedural Programming

C, Fortran, Basic etc.

1) Define your problem

2) Develop data structures which represent variables in the problem

3) Develop functions which act on the problem variables


Procedural Programming Problems

Data types may evolve with future usage.

All the functions will have to be rewritten to incorporate the new data types.

Program tasks are distributed into many routines and many data structures.


Consider the Fish

a fish has:

a position

a velocity

a size

a type

a fish can:

swim

eat

Why fish? Fish make good examples of classes, since they are always found in schools.


OOP

a fish could be represented as an Object

Objects have both data structures and behavior.

Attributes - things the fish has

Methods - things the fish can do


OOP

four main ideas

identity

classification

polymorphism

inheritance


Identity

each fish is a unique fish

There can be many instances of a class.

Each instance is one object.

Similar attributes can be held by several objects.


Classification

each fish is not a river

One abstract idea is completely encapsulated by a class.

Objects can interact with each other.


Polymorphism

fish move differently than cars

The same operation may behave differently on different classes.

Methods with the same name may cause similar actions in different objects or completely different actions.


Inheritance

a flying fish has most of the properties of a fish

flying fish have all the methods and attributes of a fish, plus the extra method:

fly

a new class flying_fish can inherit all the attributes and methods of the class fish
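C has no classes, but the idea can be imitated to show what inheritance buys you. In the sketch below, `struct flying_fish` embeds a `struct fish` as its first member, so it carries every fish attribute and can be passed to any function written for a fish. All names apart from `fish` and `flying_fish` are hypothetical, and a language such as C++ or Java would express the same thing directly with a derived class.

```c
/* Base "class": attributes that every fish has. */
struct fish {
    double x, y;      /* position */
    double vx, vy;    /* velocity */
};

/* "Method" of fish: advance the position by one time step dt. */
void fish_swim(struct fish *f, double dt)
{
    f->x += f->vx * dt;
    f->y += f->vy * dt;
}

/* Derived "class": everything a fish has, plus what flying adds. */
struct flying_fish {
    struct fish base;   /* inherited attributes, kept as the first member */
    double altitude;    /* extra attribute (hypothetical) */
};

/* Extra method that only flying fish have. */
void flying_fish_fly(struct flying_fish *ff, double height)
{
    ff->altitude = height;
}
```

A flying fish still swims through the inherited routine, for example `fish_swim(&ff.base, dt)`, and only the new behavior had to be written.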


OOP Themes

abstraction

encapsulation

combining data and methods

reusability and generalizability

message passing

completeness (instead of usage)


C++ Implementation

multiple inheritance

public, protected, and private methods and attributes

pointers and pointer arithmetic

no ANSI standard, dozens of variations

hybrid procedural and OOP language


Java

Sun's C++ like language

single inheritance

no pointers

only objects, no procedures

one standard

mostly interpreted


Javascript

hybrid functional/OOP

no strong typing

interpreted inside Netscape

primarily a scripting language

very very immature

very domain specific


Object Modelling Technique

object model

encapsulation and relationship

dynamic model

control of objects

state diagrams

functional model

data flow diagrams


Some Pitfalls of OOP

it is not an "easy" paradigm shift

it will not greatly speed development times

good initial design is ABSOLUTELY essential

using technologies just because they are "COOL" or "TRENDY" is not good software engineering


Pitfalls - from Webster

not enough education

substituting form for substance

not understanding OOP implications

abandoning good design


Some References

"Simple C++", Cogswell

"Programming in C++", Dewhurst and Stark

"Teach Yourself Java in 21 Days", Lemay and Perkins

"Object Oriented Modeling and Design", Rumbaugh, Blaha, Premerlani, Eddy, and Lorensen

"Pitfalls of Object Oriented Development", Webster



Copyright John Wallin 1996. All rights reserved.
Last Modified : Thu Oct 17 18:01:00 EST 1996 <jwallin@gmu.edu>