I programmed for three days
And heard no human voices.
But the hard drive sang.
Zen and the art of programming, Geoffrey James
             Single     Double
Mantissa     23 bits    47 bits
Exponent      7 bits    15 bits
Sign bits     2          2
These numbers have the form
+/- 0.b1 b2 b3 b4 ... bn x 2 ^(m1 m2 m3 ... mk)
(a) Define the term round-off error: (5 pts)
Round-off is a type of error associated with the finite precision of numeric representations in computers. Since numbers are represented with a finite number of bits, the number of significant digits is fixed. The result of some calculations may be in error because of this loss of significant digits.
Both addition and subtraction are likely to cause round-off error. When two numbers of greatly different magnitudes are added, the result will be approximately the larger number. If round-off error occurs, it may be EXACTLY the larger number.
X1 = X + E
where E << X, so that X1 exactly equals X
Subtraction of two numbers of different magnitudes can lead to similar problems.
+/- 0.b1 b2 b3 b4 ... bn x 2 ^(m1 m2 m3 ... mk)
n = 23, k = 7
the number 1 is represented by
+0.1000000000 x 2^(+0000001)
[ 2^-1 * 2^1]
So the smallest number greater than one is:
+0.1000 ... 001 x 2^(+0000001)
this is equal to
1 + 2^(-23)*2^1 or
1 + 2^(-22)
+/- 0.b1 b2 b3 b4 ... bn x 2 ^(m1 m2 m3 ... mk)
n = 47, k = 15
The largest number will be
+0.111 ... 111 x 2^(+111....111)
The largest exponent is 2^15 - 1 (all 15 exponent bits set)
so this will be ABOUT
2^(2^15 - 1)
The mantissa is very close to one. In fact, it is
just 1 part in 2^47 from one, so the largest number is:
(1 - 2^(-47) ) * 2^( 2^(15) - 1)
Note: IF the system uses +0 as equal to
1.0000000000 for the exponent and for the mantissa, the largest number will be
1.0 x 2^(2^15) = 2^(32768)
(a) Define a derived data type which represents a fish. (5pts)
struct fish {
    double x, y;      /* position */
    double vx, vy;    /* velocity */
    float size;
    float age;
    int species;      /* type of fish */
};
The elements must be specified within the data type, and their basic data types must be specified.
An array of fish would require you to determine the distance between every fish and every other fish. Although only a tiny fraction of the fish will be within 100 meters of each other, (N-1) distance calculations must be computed for each of the N fish, leading to an N^2 routine. This is clearly not desirable.
If the fish are placed into 100 meter cells inside the bay, each fish could only interact with fish inside its own cell or its neighboring cells. The cell would point to the first fish in the cell, and that fish would point to the next fish, and so on. This data structure would be a linked list. This type of data structure would significantly reduce the number of computations needed without creating much additional overhead.
If the fish are placed in a quad tree, it would be possible to search the tree for neighbors within a 100 meter area. This type of search will not be as efficient as a linked list, but will still be much more efficient than a simple array.
A one fish test would be a good place to start for testing. Does the fish move through the bay correctly? Does it find its cell correctly?
A two fish test might be the next logical step. Does one eat the other ONLY when they are close together? Do the interactions work? Is the motion of two fish tracked correctly?
One predator fish might be the next test.
Two predators would be next.
Other tests are possible, depending on the interactions involved.
(a) Briefly discuss how you might use the quicksort code from your first assignment to bisect your data set. (5pts)
1) Load the array A with elements 1 to N
2) Sort the list.
3) Split the list into two groups: elements 1 to N/2 and elements N/2+1 to N
The routine bisect had the inputs :
A - the array
i - the first element
j - the last element
When called, the routine re-orders the array into two groups
group 1 : elements i to q-1, which are smaller than A(q)
group 2: elements q to j, which are greater than or equal to A(q)
Explain how a fast bisection routine might work using the bisect routine as the basis. It may be helpful to write some pseudo code to explain the process. (Hint: the divide and conquer approach is most applicable here.) (10pts)
23 54 75 12 56 90 80 54 65 89 13 67
using 23 as the key, we get
12 13 54 75 56 90 80 54 65 89 67 23
the median is in the right half, so use 54 as the new key
12 13 23 75 56 90 80 54 65 89 67 54
again, the median is in the right half, so use 75 as the new key
12 13 23 56 54 65 67 54 90 80 89 75
the median is in the left half, so use 56 as the key
12 13 23 54 54 65 67 56 90 80 89 75
the median is in the right half, so use 65 as the key
12 13 23 54 54 56 67 65 90 80 89 75
now you can bisect the array
Fast Bisection Algorithm
A is the input array
q = 0
i = 1
j = N
while (q != N/2) {
  q = bisect(A, i, j)
  (A(q) is now in its final position, so exclude it from the next range)
  if (q < N/2) i = q + 1
  else if (q > N/2) j = q - 1
}
4. Given a large set of data of the form
{x_i, y_i} where i = 1 ... N
You wish to find the parameters a, b, c which provide the best fit of the data to the following equation:
y = a x^2 + b x + c
(a) Write the equation which defines chi^2 for the above problem. (5pts)
IF we assume that each data point is weighted the same:
chi^2 = Sum_{i=1..N} ( y_i - (a x_i^2 + b x_i + c) )^2
A possibly useful reference:
"Data Reduction and Error Analysis for the Physical Sciences" by Philip Bevington
Excellent bedtime reading.
We need to find the minimum of chi^2 for each of the unknowns, a, b, and c.
D(chi^2)/Da = Sum_{i=1..N} 2 ( y_i - (a x_i^2 + b x_i + c) ) * ( - x_i^2 ) = 0
D(chi^2)/Db = Sum_{i=1..N} 2 ( y_i - (a x_i^2 + b x_i + c) ) * ( - x_i ) = 0
D(chi^2)/Dc = Sum_{i=1..N} 2 ( y_i - (a x_i^2 + b x_i + c) ) * ( - 1 ) = 0
Where D represents the partial derivative.
Rewrite the equations as:
Sum_{i=1..N} ( a x_i^4 + b x_i^3 + c x_i^2 ) = Sum_{i=1..N} x_i^2 y_i
Sum_{i=1..N} ( a x_i^3 + b x_i^2 + c x_i ) = Sum_{i=1..N} x_i y_i
Sum_{i=1..N} ( a x_i^2 + b x_i + c ) = Sum_{i=1..N} y_i
Which gives:
| Sum x_i^4  Sum x_i^3  Sum x_i^2 | (a)   ( Sum x_i^2 y_i )
| Sum x_i^3  Sum x_i^2  Sum x_i   | (b) = ( Sum x_i y_i   )
| Sum x_i^2  Sum x_i    N         | (c)   ( Sum y_i       )
for i = 1, n
  sx = sx + x(i)
  sxx = sxx + x(i) * x(i)
  sxxx = sxxx + x(i) * x(i) * x(i)
  sxxxx = sxxxx + x(i) * x(i) * x(i) * x(i)
  sy = sy + y(i)
  sxy = sxy + x(i) * y(i)
  sxxy = sxxy + x(i) * x(i) * y(i)
This could be made more efficient by using a temporary variable:
for i = 1, n
  txx = x(i) * x(i)
  sx = sx + x(i)
  sxx = sxx + txx
  sxxx = sxxx + txx * x(i)
  sxxxx = sxxxx + txx * txx
  sy = sy + y(i)
  sxy = sxy + x(i) * y(i)
  sxxy = sxxy + txx * y(i)
Giving
| sxxxx  sxxx  sxx | (a)   ( sxxy )
| sxxx   sxx   sx  | (b) = ( sxy  )
| sxx    sx    N   | (c)   ( sy   )
5. For the following questions, write the Unix command string or strings you would use to solve the following problems. Exact syntax is not as important as understanding what commands can be used to solve specific problems.
(a) Find all the occurrences of the string "spam" in the *.c files within a given directory. (5pts)
fgrep spam *.c
fgrep spam *.c | wc -l
or
fgrep -c spam *.c
Either command displays the number of lines that contain "spam"; fgrep -c reports a separate count for each file.
(c) Change all the occurrences of "spam" to "treet" in a given file. (5pts)
sed 's/spam/treet/g' filename > newfile
1) edit your .cshrc file (or your .login file)
2) change your path variable
set path = ($path /usr/local/bin/gnu/duck)
3) exit the file
4) use the "source" command to re-initialize the .cshrc file
(e) Create a directory which contains files that the world can read and write. (5pts)
mkdir newdir
chmod 777 newdir
How do you manage a complex code project?
How do you track code modifications?
How do you profile codes?
How do you optimize codes?
How do you debug codes?
How do you re-engineer existing projects?
programs (subroutines and functions)
headers
object files
libraries
executables
shell scripts
input files and configuration files
output files
output files depend on input files and executables
executable files depend directly on object files and libraries
libraries depend on object files
object files depend on headers and program files
input files depend on shell scripts
When one section is changed, the project must be updated correctly.
output files: inputfiles, executable
    executable inputfile
executable: objectfiles, libs
    cc -o executable objectfiles -llibs -lm
objectfiles: programfiles, headers
    cc -c programfiles
A change in the input file would NOT require the software to be recompiled.
should be named "makefile" or
Makefile
executed by typing
make
make option
make -f filename
Internal Syntax:
when a depends on b, and is produced by some rule, you use
a:b
rule
as the general syntax, where there is a TAB in front of rule. Multiple rules are possible.
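# a comment inside a Makefile
exe = dog
obj = dog.o
src = dog.c
CC = cc
LIB = -lm
CFLAG = -p

all: dog

# prerequisites are separated by spaces, not commas
dog.o: dog.c dog.h
	$(CC) -c $(CFLAG) $(src)

# libraries go after the object files on the link line
dog: dog.o
	$(CC) -o $(exe) $(obj) $(LIB)

clean:
	rm -f *.o *~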
Other standard targets include:
install and veryclean
However, any target can be defined within a Makefile. You are NOT limited to C or Fortran compilers.
The use of Makefiles is essential in any complex software system on Unix systems.
one of the biggest problems in complex programs is tracking software modifications
ALWAYS make a copy of your project before you start modifying it!!!!
For complex projects, keep notes about what changes were made.
Make sure you can "back out" of changes, if things go bad.
source code control system
a standard Unix toolset for code control
keeps track of changes in files
commands:
admin
get
delta
prs
use man and see Landau and Fink for details
We have already discussed timing abstractly.
Determining WHERE time is spent inside a program is essential to making it more efficient.
Unix has some standard ways of profiling code performance.
1) Compile your code with the "-p" option. This will add the profiling routines to your executable.
2) Run your code on the sample problem.
3) Use prof to determine the time spent in each routine.
prof executable
is the general syntax. Use
"man prof" for more information.
there are lots of labor intensive ways to optimize code
-use loops correctly to minimize page faults
-use temporary variables appropriately
-re-write equations, when appropriate
However, the first step may be to use the compiler optimization.
-O0 to -O3 are semi-standard options in most Fortran and C compilers.
Using debuggers can save substantial amounts of time in code development.
A semi-standard debugger, dbx, is available on most Unix systems.
To use this system,
1) Compile your code with the "-g" flag and -O0 (no optimization).
2) Use the command
dbx executable
to start dbx
run
quit
print variable
print expression
trace function=
trace variable=
trace line-number=
trace=
list (10 lines)
list all
list first, last
step
next
cont
stop variable
stop at line-number
stop in function
status
delete number
help
the leading programming paradigm in industry and computer science
languages include C++, Java, Smalltalk, Perl (5.0), Javascript
OOP applications include S+, IDL
Rapidly making inroads into science
C, Fortran, Basic etc.
1) Define your problem
2) Develop data structures which represent variables in the problem
3) Develop functions which act on the problem variables
Data types may evolve with future usage.
All the functions will have to be rewritten to incorporate the new data types.
Program tasks are distributed into many routines and many data structures.
a fish has:
a position
a velocity
a size
a type
a fish can:
swim
eat
Why fish? Fish make good examples of classes, since they are always found in schools.
a fish could be represented as an Object
Objects have both data structures and behavior.
Attributes - things the fish has
Methods - things the fish can do
four main ideas
identity
classification
polymorphism
inheritance
each fish is a unique fish
There can be many instances of a class.
Each instance is one object.
Similar attributes can be held by several objects.
each fish is not a river
One abstract idea is completely encapsulated by a class.
Objects can interact with each other.
fish move differently than cars
The same operation may behave differently on different classes.
Methods with the same name may cause similar actions in different objects or completely different actions.
a flying fish has most of the properties of a fish
flying fish have all the methods and attributes of a fish, plus the extra method:
fly
a new class flying_fish can inherit all the attributes and methods of the class fish
abstraction
encapsulation
combining data and methods
reusability and generalizability
message passing
completeness (instead of usage)
multiple inheritance
public, protected, and private methods and attributes
pointers and pointer arithmetic
no ANSI standard, dozens of variations
hybrid procedural and OOP language
Sun's C++ like language
simple inheritance
no pointers
only objects, no procedures
one standard
mostly interpreted
hybrid functional/OOP
no strong typing
interpreted inside Netscape
primarily a scripting language
very very immature
very domain specific
object model
encapsulation and relationship
dynamic model
control of objects
state diagrams
functional model
data flow diagrams
it is not an "easy" paradigm shift
it will not greatly speed development times
good initial design is ABSOLUTELY essential
using technologies just because they are "COOL" or "TRENDY" is not good software engineering
not enough education
substituting form for substance
not understanding OOP implications
abandoning good design
"Simple C++", Cogswell
"Programming in C++", Dewhurst and Stark
"Teach Yourself Java in 21 Days", Lemay and Perkins
"Object Oriented Modeling and Design", Rumbaugh, Blaha, Premerlani, Eddy, and Lorensen
"Pitfalls of Object Oriented Development", Webster