Thursday 27 March 2014

Assignment Topic Three: Sorting and Efficiency (Last post for the class)

        The run-times of algorithms in computer science are categorized using different function types (logarithmic, exponential, polynomial, etc.) and the relationships known as big-Oh, big-Omega, and big-Theta. In this particular course only big-Oh was considered with any depth. Given an input of size n and an average run-time of g(n) for some algorithm, g(n) is said to be in big-Oh of another function f(n) - denoted O(f(n)) - if, from a certain input size onward, f(n) multiplied by some constant is an upper bound for g(n). Mathematically speaking, there exists a positive real constant c and a natural number B such that for every input size n, if n is greater than or equal to B then g(n) is less than or equal to c multiplied by f(n). Big-Omega mirrors this relationship with a lower bound, and big-Theta is the case when both the lower and upper bound requirements are met for one f(n).
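
To make the definition concrete, here is a small made-up example (the constants c = 4 and B = 5 are just one choice that happens to work): take g(n) = 3n^2 + 5n. Whenever n >= 5 we have 5n <= n^2, so

g(n) = 3n^2 + 5n <= 3n^2 + n^2 = 4n^2    for all n >= 5,

which is exactly the requirement for g(n) to be in O(n^2).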

        These notations are useful for approximating nasty representations of run-times and for comparing algorithms. With regard to comparison, the notation used with a particular function can be viewed as a set of functions that satisfy the "big" definition with respect to that particular function. The sets built around faster run-times are subsets of those built around slower run-times.

For example: O(1) is a subset of O(log(n)) which is a subset of O(n) and so on.

        Efficiency is often taught in conjunction with all kinds of sorting algorithms. These sorting algorithms all have well-understood big-Oh classes. The weakest of them are the bubble, selection, and insertion sorting algorithms, whose run-times are all O(n^2). I'm going to give a whirlwind tour of how these algorithms work. Each of these algorithms is traditionally demonstrated on a list L containing integers. To be sorted means that the integers are arranged "smallest-to-largest" from index 0 to len(L) - 1.

Bubble Sort:
     
        At a given step a starting value is compared with the next element. If the value is greater than that element their positions are swapped; otherwise the next element becomes the value for comparison as you iterate over the list. When the comparisons reach the end of the current pass a new pass begins, with the alteration that it spans one index less than the previous one. This is because everything from that separating index onward is already sorted. Ultimately the large elements bubble up to their final positions at the back of the list.
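
As a rough sketch of how this might look in Python (my own simplified version, not the course's code):

def bubble_sort(L: list) -> None:
    """ Sort the items of L from smallest to largest, in place. """
    # After each pass the largest remaining item has bubbled to the end,
    # so every new pass can stop one index earlier than the last.
    for end in range(len(L) - 1, 0, -1):
        for i in range(end):
            if L[i] > L[i + 1]:
                L[i], L[i + 1] = L[i + 1], L[i]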

Selection Sort and Insertion Sort:

        Like bubbling, these algorithms use a barrier to distinguish between what has and has not been sorted. The main difference is that the sorted part grows from the front of the list, so the barrier increases the starting point of each new pass. Selection sort works analogously to bubbling except that it finds the smallest element in the unsorted part, places it in the last position of the sorted part, and then moves the barrier. The insertion technique sorts on the fly by comparing a given value against the sorted part and placing it where it belongs, moving the barrier as it goes.
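
Sketched in Python (again my own simplified versions, with names I made up):

def selection_sort(L: list) -> None:
    """ Sort the items of L from smallest to largest, in place. """
    for barrier in range(len(L)):
        # Find the index of the smallest item in the unsorted part...
        smallest = barrier
        for i in range(barrier + 1, len(L)):
            if L[i] < L[smallest]:
                smallest = i
        # ...and place it at the end of the sorted part.
        L[barrier], L[smallest] = L[smallest], L[barrier]

def insertion_sort(L: list) -> None:
    """ Sort the items of L from smallest to largest, in place. """
    for i in range(1, len(L)):
        value = L[i]
        j = i
        # Shift larger sorted items one position right until value fits.
        while j > 0 and L[j - 1] > value:
            L[j] = L[j - 1]
            j -= 1
        L[j] = value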

        The next class of sorting algorithms is generally much more efficient, but the preceding three (ok, perhaps not bubble sort) do have merit when applied to small lists. The faster sorts I'm referring to are called quick sort and merge sort. The latter is O(nlog(n)), with the logarithm of base 2, and the former at its best also meets this run-time classification.

Quick Sort:

        The idea here is to choose a pivot element around which a left and a right list are created. The left list is a sorted list of all the elements less than the pivot and the right is a sorted list of all the elements greater than the pivot. Bringing them together then generates the desired list. Now how are the left and right lists sorted, you may ask? Recursively. Eventually the calls will reduce the lists to single values and then the recursion can make its way back up to the top. I should mention that quick sort's run-time class can change depending on the choice of the pivot with respect to the conditions. By conditions I mean the length of the list and to what degree it is unsorted. A good choice of pivot results in a run-time of O(nlog(n)) but a poor choice can result in quadratic run-time.
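
A minimal sketch of the idea in Python, naively taking the first item as the pivot (my own version, not the course's):

def quick_sort(L: list) -> list:
    """ Return a new list containing the items of L from smallest to largest. """
    if len(L) <= 1:     # base case: nothing left to split
        return L[:]
    pivot = L[0]
    left = [x for x in L[1:] if x < pivot]      # everything less than the pivot
    right = [x for x in L[1:] if x >= pivot]    # everything greater (or equal)
    return quick_sort(left) + [pivot] + quick_sort(right)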

Merge Sort:

        Somewhat similar to quick sort, except the list is divided in half (sometimes approximated via integer division) and the sorted halves are merged by moving through them in parallel, comparing the front items and appending the smaller of the two to a third, new list. Once one half is exhausted, the new list is extended with whatever remains of the other half to complete the merge. Recursive calls are made on the halves.
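
Again a rough Python sketch under the same caveats as before:

def merge_sort(L: list) -> list:
    """ Return a new list containing the items of L from smallest to largest. """
    if len(L) <= 1:
        return L[:]
    mid = len(L) // 2                  # integer division splits the list roughly in half
    left = merge_sort(L[:mid])
    right = merge_sort(L[mid:])
    # Merge: walk both sorted halves in parallel, always taking the smaller front item.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One half is exhausted; extend with whatever remains of the other.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged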

        These are the main sorting techniques studied in CSC148. The other obvious one to mention is Python's built-in sort, which is far faster than any of these. Firstly, it's highly optimized such that depending on the conditions of the list provided it will employ the most efficient algorithm, and it can make those distinctions part way through its sorting. For example, if a merge sort generates an acceptably small half, insertion sort will be used instead of the recursive call. Secondly, in the Python environment it has the advantage of being written in C, a lower-level language that is more akin to what the processor directly understands.

Sunday 16 March 2014

A little on algorithm run-time and egregious errors....

        I don't really know where to start. I think the most embarrassing errors this past week were the mistakes in the unittest files related to exercise 3. Now perhaps I'm just spoiled from the prerequisite course, where errors by the teaching staff were nearly nonexistent or handled so well I didn't take notice, but given that this is a skill that relies on precision and accuracy in the scientific sense, it's bothersome that there have been so many misprints, typos, and other flaws. There's also great confusion regarding assignment two part two, to the extent that a good number of people are initially unsure what the parameter and return types of some of the functions are.

        I don't want to dwell on it too long, for there are many other evaluations this coming week. What was nice about class, however (aside from the hilarious moment where I witnessed an instructor reluctantly ask for an extra pen), was the overlap between CSC148 and CSC165 on run-time analysis. In both classes we have been studying big-Oh notation's use to approximate (with an upper bound) the run-time of a function. In this class the analysis is greatly focused on application and many examples concerning sorting algorithms were shown. The other course goes deeper into the mathematical theory.

        I'll end on a plus: I was able, after hitting my head against the wall enough times, to put together a decent algorithm that handles permutations. I for one am surprised more people aren't flailing about for help on this, but maybe, like the Tower of Hanoi problem, the algorithm or a related one was covered in the other class section. Then again I might not be very good at this ... oh well.

Saturday 8 March 2014

Trees, Linked Lists, more Trees...

        Over the two past weeks, amid the tests and assignments, we've largely been studying recursive data structures. After two labs and an exercise they are not so abstract anymore. A linked list, whose head holds a value and whose rest is itself another linked list, provides an alternative implementation of the Stack and Queue. In practice the class is more restrictive in this form since, unlike a list, it cannot be indexed, reversed, popped, etc. This is usually desirable since it limits unwanted/unexpected behaviour and the extent to which users can augment the object. The implementation is not as clean cut as the list implementation, although at the end of the day code is more often read and utilized than it is written.
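
To give a sense of the shape of such an implementation, here's a very rough sketch of a Stack built on linked nodes (the names LinkedNode and Stack are mine, not the course's starter code):

class LinkedNode:
    """ One value plus a reference to the rest of the list. """
    def __init__(self, head, rest=None):
        self.head = head
        self.rest = rest

class Stack:
    """ A last-in, first-out collection built from linked nodes. """
    def __init__(self):
        self._top = None

    def push(self, value):
        # The new node's rest is the old top of the stack.
        self._top = LinkedNode(value, self._top)

    def pop(self):
        value = self._top.head
        self._top = self._top.rest
        return value

    def is_empty(self):
        return self._top is None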

        Trees come in many more forms than I had imagined. I spoke of the basic tree design before, but it can be broken down into nodes. In a binary tree the node class has a primary data attribute and a left and right node. The binary search tree then uses the nodes as its parts but keeps the data ordered as values are inserted.
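
A rough sketch of what node-based insertion into a binary search tree might look like (the names are my own, not the course code):

class BTNode:
    """ A node in a binary tree: one data value plus left and right subtrees. """
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def bst_insert(node, data):
    """ Insert data into the binary search tree rooted at node and return the root. """
    if node is None:
        return BTNode(data)
    if data < node.data:
        node.left = bst_insert(node.left, data)     # smaller values go to the left
    else:
        node.right = bst_insert(node.right, data)   # larger (or equal) values go right
    return node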

        In other news, the assignment 2 part 1 document was still fairly vague, but at least there didn't seem to be any mistakes. I personally believe assignments that require little or no starter code, such as the exercises, provide a more interesting, understandable, and effective learning experience. As a matter of fact the instructions for the current exercise (3) maintain the high quality of question design, so there's an odd disconnect between the design of the assignments and that of the exercises.

        Finally, the first test went smoothly and the questions were fair. I hope it reflects the approach of the evaluations to come.

        Hold on ... for a moment I had forgotten my username. I have a critical comment, or tip, for people in the class. Instructor Danny Heap seems partial to giving out hints in lab and exercise documents. Ignore these completely if you have even the slightest vision for your program. Often I find there are myriad ways to provide a proper implementation at our relatively low level of understanding and rigor. I also specifically want to make a distinction between mathematical and computer science hints. If there's a trick (i.e. recursion) this can be a life saver, but once a direction for implementation has been imposed upon you, your creativity is diminished. I more often than not find myself led astray by many of his hints, and approaching a CS problem freshly (with an understanding of the subject material, obviously) is the only way we're ever going to gain any solid programming skills.

Wednesday 26 February 2014

Assignment Topic Two: Warming up to Recursion.

        Recursion is a technique in which a large problem is broken down into smaller ones. To define something recursively means to define that something by a relation to itself. For example, a value in the Fibonacci sequence is defined as the sum of the previous two values in the sequence. This seems circular because an origin has not yet been given. For any recursive algorithm to work it must have one or more base cases and, at each step of the algorithm, get nearer to a base case; otherwise the program will spiral out of control or simply crash.

        In the case of the Fibonacci sequence the base case or origin value is the predetermined value of 1 for the first and second positions in the sequence. In code a function utilizes itself to solve easier and easier problems until it can return a concrete value.

def fib(p: int) -> int:
    """ Return the Fibonacci value at a given position; positions start from index 0. """
    if p == 0 or p == 1:
        return 1
    else:
        return fib(p - 1) + fib(p - 2)    # This is the recurrence relation that defines the sequence.


This is the basic structure of a simple recursive function. At a more complex scale there may be iterations and many base cases that have to be considered.

        Often recursion is a viable solution when either the pattern or objects you are working with are intrinsically recursive. These objects can come in the form of recursive data structures such as trees. A tree is an object with a root and branches. The branches are defined to be other trees. In this case you're forced to think recursively from the get-go.

        Recursive definitions need not only be algebraically understood. Some recursive algorithms can be geometrically based. The traversal algorithms of binary trees (trees with at most 2 branches) demonstrate this process. The in-order traversal algorithm has you start at the root, move into the left branch first, print the root's value only after the left branch is finished, and then move into the right branch. Except once you move to the left you're now at what may be a new tree, which furthermore means that your position is the root of this sub-tree. This restarts the algorithm (as the function would recursively be called on itself in the body at this point) and continues to do so until a branch is no longer a tree itself.
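
Written out as code the pattern is short; this sketch assumes a node object with left, data, and right attributes (my own naming):

def in_order(node) -> None:
    """ Print the values of the binary tree rooted at node, in in-order. """
    if node is not None:          # a missing branch ends the recursion
        in_order(node.left)       # handle the entire left branch first
        print(node.data)          # only then print this root's value
        in_order(node.right)      # finally handle the right branch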

        Unlike its algebraic counterparts, geometric recursion is often easier to trace than it is to represent as a clean recurrence relation. Handling it largely takes practice and, for beginners, some faith that the behaviour at the base cases will translate to greater depths.
     
        Recursion depth is simply how many times the larger problem has been reduced to a smaller one. From the fib function above it can be seen that it continues digging until it reaches a concrete value of 1, and then, as it climbs back out, it evaluates the problem one level above using that concrete value to generate a new concrete value for the next level up. So there needs to be a clear path back up to the surface.
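
For instance, tracing fib(3) from the function above (my own walk-through):

fib(3)
= fib(2) + fib(1)
= (fib(1) + fib(0)) + fib(1)
= (1 + 1) + 1
= 3

The deepest calls, fib(1) and fib(0), sit two levels below the original call; each concrete 1 they return climbs back up to be summed at the level above.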

        Recursion is a very powerful tool when used appropriately. It can however be overly complicated to introduce recursion to problems that don't necessarily require it. As with any computer science problem you need to begin with a simple example you know how to compute and then dissect the steps you take (often implicitly) to find the solution. Those details define the type of function or program that is written.

Thursday 13 February 2014

More Recursion and Trees

        The week before reading week has got to be one of the toughest weeks of the academic year (except for maybe the final week), so there's been a delay on this post.

        I'll start positive. The lab over a week ago did supply good practice with recursive functions. I still feel, while comprehensions are nice, that Python code should be indented and done in blocks for clarity, particularly with regard to "if statements", which become an absolute mess otherwise. Speaking of this, this past week's lab oddly had us interpret code written using comprehensions and functional programming techniques and re-implement the functions using the style we've always known. This just means there's some confusion as to how I'm expected to code on the tests/assignments. Dustin did mention he'd send an email about this and the other material we're responsible for on the test.

        We also looked at trees this week. They seem simple, but the pre-, in-, and post-order traversal algorithms can give your brain a workout with some very large cases. This is precisely what I intend to do in preparation for the test. In-order traversal is probably the weirdest as it moves left to root, then to right, and not merely root to children, or children to root, as in pre- and post-order traversals respectively.
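
As a small concrete check (my own example): for a binary tree with root 2, a left child 1, and a right child 3, pre-order visits 2, 1, 3; in-order visits 1, 2, 3; and post-order visits 1, 3, 2.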

        A1 marking should start from tonight onward. I do hope that on the next assignment they make the instructions far clearer and, more importantly, ensure the starter code is not fraught with errors. Now I must return to studying.

Sunday 2 February 2014

Comments: Recursion and Testing.

        This last week largely consisted of exposing us to lots and lots of recursion examples. That being said, we don't get much of a chance to interact with them.

        Yes the code is available to us on the course webpage.

        What I'm getting at is that seeing recursion being used to solve problems is different from having recursion in your toolbox, attempting a problem, and ultimately realizing recursion is a viable path to a solution. I think it would be more effective if we were given some very easy practice problems to warm up with and get a sense of the technique of recursion, instead of merely seeing solutions to problems that use recursion and then being expected to implement a rather difficult one on our assignment this week.

        Getting a sense of the technique is precisely what I've set out to do recently, and after scouring the internet you find some useful tips. You want to break the function down into two parts (for a simple problem): either the argument falls into the category of the base case(s), or it must be reduced toward a base case by simply following the pattern. In a later week for the class I will have to publish a post dedicated specifically to recursion, so I'll end things here.

        I'd rather be coding (this is actually with respect to the lectures), but if I'm going to sit in a lecture hall for two hours watching a PowerPoint presentation I'd rather the lecture systematically describe how I should go about implementing these techniques (especially if they're new) than just show us lots of examples with no preface.

        OK, I've reached my complaint-quota for the week.

        Aside from recursion, unittest became much more interesting. In lab we utilized the fact that since test cases are classes they can be the subject of inheritance. The only caveat is that the ability to reuse a particular test usually means that it is very general in nature, although anything that reduces duplicate code is almost always warranted. Also the setUp and tearDown methods are great. No really - why they weren't covered in the prerequisite course, where we often did create unittest files, is mind-boggling.
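
A tiny self-contained sketch of the pattern (the Stack class and all names here are made up for illustration, not the actual lab code):

import unittest

class Stack:
    """ A tiny list-based stack, just so the tests below have something to run on. """
    def __init__(self):
        self._items = []
    def push(self, item):
        self._items.append(item)
    def pop(self):
        return self._items.pop()

class StackTests(unittest.TestCase):
    def setUp(self):
        # Runs automatically before every test method: build a fresh stack.
        self.stack = Stack()

    def tearDown(self):
        # Runs automatically after every test method: clean up.
        self.stack = None

    def test_push_then_pop(self):
        self.stack.push(5)
        self.assertEqual(self.stack.pop(), 5)

class FancyStack(Stack):
    """ Stands in for 'some other implementation' worth testing the same way. """
    pass

class FancyStackTests(StackTests):
    # Inherits setUp, tearDown, and every test above; only the class under
    # test changes, so no test code is duplicated.
    def setUp(self):
        self.stack = FancyStack()

if __name__ == '__main__':
    unittest.main()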

Thursday 23 January 2014

Additions about O.O.P

        After this week's latest class I should probably comment on some of the more technical parts of O.O.P.

        Often when creating an overriding subclass method it is useful to call the corresponding method of the master class, so that the overriding method derives all the information from the original method. Then you can build around it to specialize the new overriding method. In this instance, calling the master class's method requires the following syntax:

 MasterClass.method(self, arguments...)

Similarly, if you simply want to use the more general method and not the specialized version, the syntax required is MasterClass.method(subclass_object, arguments...).
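
A tiny made-up example of the pattern (the class and method names are mine, purely for illustration):

class Account:
    def __init__(self, balance: float) -> None:
        self.balance = balance

    def describe(self) -> str:
        return 'Account with balance {}'.format(self.balance)

class SavingsAccount(Account):
    def __init__(self, balance: float, rate: float) -> None:
        # Reuse the master class's initializer, then specialize.
        Account.__init__(self, balance)
        self.rate = rate

    def describe(self) -> str:
        # Derive the general description, then build around it.
        return Account.describe(self) + ' at rate {}'.format(self.rate)

From outside the class, Account.describe(savings_object) gives back the general description, while savings_object.describe() gives the specialized one.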

        Another less technical point made was that when designing an __eq__ method, it must encompass three qualities.

The simplest is symmetry.

 If a == b, then b == a

The next is that it is reflexive.

a == a

The last quality, which is also the easiest to overlook, is transitivity.

If a == b and b == c, then a == c.
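
A small sketch of an __eq__ that respects all three (my own example class):

class Point:
    """ A point in the plane. """
    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

    def __eq__(self, other) -> bool:
        # Comparing coordinate by coordinate keeps equality reflexive,
        # symmetric, and transitive, since == on numbers already is.
        return (isinstance(other, Point) and
                self.x == other.x and
                self.y == other.y)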

That's all for this week.