What does a computer do?
- performs calculations (billions per second)
- remembers results
What kinds of calculations?
Computers come with sets of built-in operations: usually true/false evaluations and arithmetic operations from the Arithmetic Logic Unit (ALU).
Types of Knowledge
Declarative Knowledge – statement of fact or truth, e.g. “the sky is blue”
Imperative Knowledge – Recipe or directions for computation. This is a mechanical sequence of how-to steps: sequences of simple steps with a flow of control that directs how to move through the sequence, and a point at which the algorithm terminates.
Limits to Computers
Some problems are too complex to develop predictable solutions, which is what allows encryption techniques to flourish. At the same time, some problems are fundamentally impossible to compute; Turing's halting problem is a classic example of a question that cannot be computed.
Computers know what you tell them. There are two types of directions:
- Declarative – statement of fact
- Imperative – instructions
- Recipes are sequences of steps with a flow of control that directs you on how to proceed through the steps. We also need to know when to stop. Recipes are also algorithms!
Machines
Computers are machines that compute operations. There are two types of these machines:
Fixed Program – Performs one set of operations
Stored Program – Performs any set of operations that you load into the computer. The advantage is that the computer can emulate a fixed program computer for any number of different operations.
Whats Inside the Machine?
Memory – Data stored and sets of instructions
Input / Output
ALU – Arithmetic Logic Unit. Takes info from memory, reads it in, performs a primitive operation, then stores the result back into memory.
Control Unit – Keeps track of what operation to perform in ALU at each step
Program Counter – When a program is loaded into the machine, the counter points to the location of the first instruction and performs that operation. The program counter is then incremented, firing off the next instruction. Eventually we reach a test to determine whether the program has completed running, and then there is an output.
Basic Primitives
Turing showed that you can compute anything with 6 primitives:
- Move left
- move right
- scan
- read
- write
- do nothing
Modern languages have a more convenient set of primitives. We also want to create new primitives using the primitives available.
Turing Complete – Anything computable in one language is also computable in any other Turing-complete language.
In programming, we are trying to combine primitives in legal ways to create instructions (recipes) for solving problems. Programs are sequences of definitions and commands.
Definitions are names applied to variables or, more importantly, to discrete procedures that can be evaluated like primitives. Commands are simple expressions that are executed by Python in the shell. The shell is the window in which we can write Python code to be interpreted.
Commands instruct the interpreter to perform an action. We can type commands into shell, but we can also use files to be evaluated by the interpreter.
The fundamental primitives that represent data are called data objects. Our program will manipulate these data objects to solve problems. Every object has a type that tells us information about the object. The type also tells programs whether they can act upon the data object or not. Objects can be scalar or non-scalar:
- scalar – data that cannot be subdivided
- non-scalar – data that has internal structure that can be accessed and evaluated
Scalar Objects
- int – integers
- float – real numbers with decimals
- bool – Boolean values of true or false
- NoneType – Special type with only one value (None)
Type Conversion (Casting)
We can convert scalar objects from one type to another. The command:
float(3)
Takes the int 3 and casts it to a floating point number, or 3.0. Similarly:
int(3.9)
Takes the float 3.9 and truncates it to an integer of 3
Expressions
We combine objects and operators to form expressions. Every expression has a value (that is computed), which also has a type.
When writing expressions, we use the following syntax (simple example):
<object> <operator> <object>
When adding, subtracting, or multiplying integers, the value will be an integer. If one or more objects are floats, then the value will be a float.
When dividing with /, the value is always a float. There are two types of division, standard and integer division. Integer division, denoted by //, returns the int quotient without the remainder, so 5//2 results in 2, not 2.5. The modulus operator % returns the remainder as an int, so 5%2 results in 1, the remainder when 5 is divided by 2.
Raising a number to a power is done with **: 5**5 results in 3125. This is different from 5*5, which is 5 times 5, or 25.
Simple Operations
When writing expressions, parentheses are used to control the order of operations. Without parentheses, the order of precedence is:
- **
- *
- /
- + and –
- executed left to right as they appear in the expressions
Binding Variables and Values
Assignment – In Python and many programming languages, the equals sign is an assignment of a value: it binds (associates) a name to a value. The value is on the right, and the name on the left of the equals sign is assigned the value on the right.
We abstract expressions to make the program run faster, to make it easier to understand, or to make the code easier to change in future iterations.
When working with variables, we use names because they can be descriptive and meaningful. They also make it easier to read over our code. The only caveat is that our names CANNOT be keywords reserved by the programming language.
Values can be any legal expression that can be stored in memory. Values can be updated in memory.
Variable Binding
We always evaluate assignments from right to left. When binding variables, we first compute the right-hand side (value) of the assignment, then store the value in the left-hand side (variable) of the statement. The variable can be re-bound to new values in memory. The = is called the assignment operator.
If we have the statement x = 2 followed by x = x*x, the statements can be evaluated because the value of 2 is assigned to x in the first statement. In the second, the right side is evaluated first, where x is already set to 2. This means x = 2*2, and then x is reset with a new value of 4.
Changing Bindings
We can rebind variables with new assignments. Old bindings are then thrown away, but values previously computed from the old bindings are still stored in memory; we would need to recompute them to update those results.
Comparison Operators
We can compare values to decide how a program should proceed:
<
<=
>
>=
== – tests whether two values are equal
!= – tests whether two values are unequal
Logical Operators
We can use the and, or, and not operators (Python keywords) when testing.
Branching Programs
The simplest branching statement is a conditional: test an expression and evaluate it to true or false. If true, run one block of code; if false, run another. In Python the false (else) branch is optional, but we ALWAYS need a block for the true case.
We can also have nested conditionals and compound booleans. elif clauses give instructions if a different condition evaluates to true.
Branching programs allow us to make choices and do different things. At most, each step runs once. This is also known as a linear program, which takes constant time.
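A minimal sketch of a conditional with if, elif, and else (the variable and messages are just illustrative):

x = int(input("Enter a number: "))
if x > 0:
    print("positive")
elif x == 0:
    print("zero")
else:
    print("negative")

Only one of the three branches runs, and each branch runs at most once.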
Strings
This is a new data object type. Strings are sequences of characters (letters, spaces, digits). Strings are enclosed by either double or single quotes. Double quotes are especially handy if we want to use an apostrophe in the string. Ex: foo = "ain't I great" parses correctly.
Strings can be concatenated with the addition operator. Ex: hi = "hello", name = "James", greeting = hi + name; accessing greeting gives "helloJames", the two strings concatenated (added together). You can concatenate multiple things in a row.
By allowing the addition operator to perform both arithmetic addition and concatenation, we are said to be overloading the operator. The type of the object will determine how the addition operator will be used in your statement.
String Operations
- <string> + <string> – Concatenation
- <int>*<string> – Successive Concatenation or the string spelled out int times
- len(<string>) – Return the number of characters in the string, including spaces
- <string>[<int>] – Indexes into the string at the int position, 0-indexed. So [1] returns the second character in the string
- <string>[<int>:<int>] – Slices a portion of the string from the first index up to, but not including, the second. So [1:3] returns the characters at indices 1 and 2 (the 2nd and 3rd characters), excluding index 3
- We can also use [:3], which reads from the beginning up to, but not including, index 3. Alternatively, [1:] returns everything from the 2nd character on
- Finally, [:] returns a copy of the string. Not the original, this is a copy.
Strings are non-scalar in that we can extract pieces of the string.
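A short sketch pulling the string operations above together (the string "hello" is just an example):

s = "hello"
print(s + " world")   # concatenation -> hello world
print(3 * s)          # successive concatenation -> hellohellohello
print(len(s))         # -> 5
print(s[1])           # -> e   (0-indexed)
print(s[1:3])         # -> el  (indices 1 and 2, excluding 3)
print(s[:3])          # -> hel
print(s[1:])          # -> ello
print(s[:])           # -> hello (a copy)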
Input / Output
Oftentimes we want data to be output before the program completes its execution, either to monitor how the data is flowing through the application or simply to make sure we are getting what we expect at each step. In Python, we use print to output data to the console. It prints the values passed to it, separated by spaces.
We also have an input keyword to control text inputs. We can bind these inputs to variables, offering us many ways to display the input text.
text = input("<string>")
print(text)
This code results in a text prompt being offered to the user, and when the user enters text and hits enter, that text is then printed to the console.
If you want to read a number from the input, you'll need to cast the data: turn the input into an int before it is used:
text2 = int(input("<int>"))
print(text2)
Whatever data we enter at the input prompt is treated as a string, so we then cast that string to a number, allowing us to input the integer and then print it to the console.
IDE
For simple expressions, it is easy to type commands into the shell. For more complex commands and to use iteration, we need a text file full of commands. We create this file with an IDE, or integrated development environment.
IDEs come with a text editor (used to write the program), a shell (environment to evaluate and execute code), and an integrated debugger.
Control Flow
The simplest branching program is a conditional, in that the true response runs one code while false runs another. These branching programs make choices, but they move through the code linearly. An alternative that allows us to iterate over data would be a while loop.
While loops keep running code until a condition is met. Now we can control the flow of our loops.
While Loops
while <condition>:
    <expression>
    ...
For Loops
for <variable> in <expression>:
    <expression>
    ...
Range
range(start, stop, step)
range(2, 10, 3) – This returns the numbers between 2 and 10 (not including 10) with increments of 3, so it returns 2, 5, 8
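A small sketch of for loops driven by range (the numbers are just illustrative):

for i in range(2, 10, 3):
    print(i)            # prints 2, 5, 8

total = 0
for i in range(5):      # range(5) yields 0, 1, 2, 3, 4
    total += i
print(total)            # -> 10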
Break Statement
When running through a loop, you can use the break statement to stop execution of the code and jump out of the loop to the level above.
For vs While Loops
In for loops, we know the number of iterations because the counter is managed inside the for loop itself. You can end them early with break statements. You can rewrite for loops as while loops by pulling the counter variable out of the loop.
In while loops, the number of iterations can be unbounded. We can still end the loop early with a break statement. We can also use counters, but they must be initialized outside the loop and incremented inside it. While loops can't always be rewritten as for loops.
If we know what we are iterating over, we use for loops. If we don't know how much data we will loop over, then we use while loops.
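A sketch of the same loop written both ways, with a break thrown in (the cutoff value 3 is arbitrary):

# for loop: the counter is managed by the loop itself
for i in range(5):
    if i == 3:
        break           # jump out of the loop early
    print(i)            # prints 0, 1, 2

# equivalent while loop: the counter is initialized outside
# the loop and incremented inside it
i = 0
while i < 5:
    if i == 3:
        break
    print(i)
    i += 1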
Iteration
Iteration allows us to extend beyond simple branching algorithms and write more complex programs. We start with a test that evaluates a condition. If the condition is true, we run a block of code and then re-evaluate the test. When the test finally returns false, we run a different block of code. We can go through the loop multiple times before breaking out!
Iterative Code
Branching structures only let us run through code based on a test once; they run in constant time. Looping code lets us repeat pieces of code until a condition is satisfied. The program now runs in time that depends on the variable values and the program length.
Classes of Algorithms
Iterative algorithms offer us different classes of algorithms. We can now perform more complex operations than simple arithmetic, repeating a sequence of steps based on decision logic. One type of algorithm we will examine is the "guess and check" method.
Guess and Check Methods
When looping, we need a loop variable that is initialized outside the loop. This variable is mutated within the loop, and the loop stops when a condition on it is met.
To help with this condition, we can use decrementing functions. These functions map program variables to integers: when the loop begins, the value is positive, and the value decreases each time through the loop (each iteration). When the value reaches 0 or becomes negative, the loop terminates.
Guess-and-check algorithms try every possible value, which is why they are called exhaustive enumeration. They can be slow and inefficient, but as computers improve, they can handle more problems of this type.
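A minimal guess-and-check (exhaustive enumeration) sketch for finding the cube root of a perfect cube; the value 27 is just an example:

cube = 27
guess = 0
while guess**3 < cube:
    guess += 1                      # the loop variable is mutated each iteration
if guess**3 == cube:
    print("Cube root of", cube, "is", guess)
else:
    print(cube, "is not a perfect cube")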
Approximate Solutions
Instead of exhaustive enumeration, where you try every possible value and test whether it meets your condition, you can try an approximate-solution algorithm. These work well when there are many possible values and you don't want to work through each one. The difference from exhaustive enumeration is the use of a step variable, which is used to move through the range of possible values. Instead of trying every value, you move through the population of possible values in discrete steps.
This speeds you up because you reduce the number of values that you test. You have to be careful, however, that your steps are small enough to prevent you from stepping over the value you are looking for. For instance, if the actual value is 2.5 but you are moving through the range 1–5 in steps of 1, you will move from 2 to 3 without ever testing 2.5. Your code will never find the exact value, and without an additional stopping condition the loop may never terminate.
There is a trade-off, of course: as steps get smaller, the time to solve the problem increases, because smaller steps mean more values to test. So you want to choose steps that are large enough to move efficiently through the population of possible values while not allowing your expected value to be stepped over.
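A sketch of the approximate-solution pattern for the cube root of 27, assuming an epsilon of 0.01 and a small step; the extra guess <= cube check keeps the loop from running forever if the answer is stepped over:

cube = 27
epsilon = 0.01                       # how close counts as "close enough"
step = epsilon**2                    # step must be small enough not to skip the answer
guess = 0.0
while abs(guess**3 - cube) >= epsilon and guess <= cube:
    guess += step
if abs(guess**3 - cube) >= epsilon:
    print("Failed to find the cube root of", cube)
else:
    print(guess, "is close to the cube root of", cube)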
Bisection Search
Another way to handle finding values within large populations is bisection search. Imagine a phone book in which you want to find a particular person. You could search every page (exhaustive enumeration), but that would take forever: with X pages, it would take at most X page turns to find your desired person.
We could instead look at every tenth page (approximate solution), which would make our search ten times faster: with X pages, we would turn at most X/10 times.
Or we could open the phone book in half and determine whether the name we are looking for is before or after the page we are currently examining (Bisection Search). If we open the phone book to the letter “L” but our person’s last name begins with “F”, we know we need to look in the pages earlier than our current page (this, of course, requires that the data be in some sort of order).
We can then remove all of the following pages from our population of possible values, because we know for sure that our person is located in the earlier pages. In effect, we throw out all possible values after the letter "L", dramatically reducing the population of possible values.
If we keep performing this operation, we divide the first half of the phone book and choose a place in the middle. Say we land on the letter "E". Because we know that our person follows the letter "E", we can throw out all possible values before "E", and we are left with the section of the phone book from "E" to "L". In only two steps, we have dramatically reduced the number of possible solutions to a much more manageable set, and we can keep going with the bisection search! With X pages, it takes at most log2(X) steps, a substantial improvement!
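A sketch of bisection search applied to the same cube-root problem (works for cube >= 1):

cube = 27
epsilon = 0.01
low = 0
high = cube
guess = (high + low) / 2.0
while abs(guess**3 - cube) >= epsilon:
    if guess**3 < cube:
        low = guess                  # the answer is in the upper half
    else:
        high = guess                 # the answer is in the lower half
    guess = (high + low) / 2.0
print(guess, "is close to the cube root of", cube)

Each pass throws away half of the remaining interval, which is what gives the log2(X) behavior.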
Floating Point Numbers
Floats approximate real numbers, but it is useful to understand how they are stored. Machines can't store an infinite number of decimal digits. The decimal numbers we use rely on place value to determine the magnitude of each digit. Take 302, for example: a 3 in the hundreds place, 0 in the tens, and 2 in the ones, added up, is 302. Binary uses placeholders with different magnitudes.
Binary, rather than using powers of 10, uses powers of 2. Counting from the rightmost bit, the first bit is the ones place (a 0 or 1). The next bit is the twos place, so a 1 there counts as 2. The third bit from the right is the fours place, so a 1 there counts as 4. The fourth bit is the eights place, so a 1 there counts as 8. This can be continued indefinitely, but numbers are often cut off at 8 or 16 bits. That means, in an 8-bit binary number, the maximum value is 255, or 11111111. Any further increment and we can no longer hold the value, resulting in an error.
Computers represent all numbers in binary. There are techniques to convert a decimal number to binary, and vice versa. To go from decimal to binary, suppose we want the binary form of 19 (10011). x % 2 (in Python, x modulo 2) returns the remainder after dividing by 2, which is the final bit (19 % 2 = 1). Floor division / integer division (x // 2) then divides by 2 and tosses out the remainder, giving 9, which is 1001 in binary. You'll notice that this is the original binary number minus its final bit; repeating the process until the number reaches 0 produces all of the bits.
It is possible that there is no exact binary representation of a floating-point (fraction) number, so the computer stores the closest approximation. Be careful when testing the equality of floats: two expressions that are mathematically equal may have binary representations that are not, resulting in a failure when you expect success. It is better to take the absolute value of the difference between two floats; if the difference is a very small number, then the floats are close enough.
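A tiny sketch of the "close enough" comparison; 0.1 and 0.2 have no exact binary representation, so this is a case where == can surprise you:

x = 0.1 + 0.2
print(x == 0.3)                 # -> False, because of binary approximation
epsilon = 1e-9
print(abs(x - 0.3) < epsilon)   # -> True: the values are close enough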
Decomposition and Abstraction
Good programming is more than adding code. It’s about the amount of functionality the programmer was able to introduce in the form of functions. Functions allow you to encapsulate pieces of computation. They offer you decomposition (modularity) and abstraction (don’t have to necessarily understand how code works in order to feed it inputs and extract an output).
Decomposition allows you to break the problem into different, self-contained pieces. Abstraction allows you to suppress the implementation details of methods while still using the methods to perform your computation.
To Create Structure with Decomposition
When programming, divide code into modules, which have the following properties:
- self-contained
- reusable
- organized
- coherent
You Suppress Details with Abstraction
Think of 3rd party code (or even yours) as a black box: you can't see the details, and you don't necessarily want to see them.
Functions
Reusable pieces of code are called functions. They aren’t run in the program until they are called (invoked) by the program.
Some characteristics of functions, include:
- Name
- Parameters (0 or more) – values to be used inside the function for computation
- Docstring – Documentation that tells what function does
- Body – Sequence of commands (recipes) that we want to execute when the function runs
In Python, functions look like:
def function_name(i): – def is the Python keyword used to define a function (or a class), and i is the formal parameter (argument) for the function.
    """This is the docstring (a triple-quoted string right under the def; a # starts an ordinary comment). We use it to document the input datatype and describe what is returned from the function. The docstring represents a contract: if you give me this, I'll give you this, every time."""
    print("something") – this is the start of the body of the function: the steps of the recipe.
    return something – the final portion of the body. return is the keyword Python uses to know what data to hand back from the function to the invoking code.
To use functions, you’ll use the function_name followed by the data you want to pass into the function. This is known as invoking the function.
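A minimal sketch of defining and invoking a function (is_even is just an illustrative name):

def is_even(i):
    """
    Input: i, a positive int
    Returns True if i is even, otherwise False
    """
    return i % 2 == 0

print(is_even(3))    # -> False
print(is_even(8))    # -> True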
When called, each formal parameter gets bound to the value of the actual parameter (the value of the expression passed into the function when invoked). The called function creates a new environment (also called a frame or a scope), which is a separate place in which to bind the variables used in the body of the function. In essence, a function's scope is a mapping of names to objects.
This is important. The function creates its own scope, which can bind and return variables completely separately from the more global scope that encapsulates the function. When we return values, those values are passed from the function scope to the global scope and bound to variables in that scope.
If you forget to place a return statement in your function, Python automatically returns the value None. This is an actual value that represents the absence of a value. Unless a function is performed only for its side effects, you always want at least one return that gives a value back.
Return Vs Print
return only has meaning inside functions, while print can be used anywhere. You can have more than one return statement (perhaps different conditions return different responses), but only one will ever be executed: once a return executes, it jumps you out of the function and hands back a value. Code inside a function that comes after a return statement will never be executed, because we jump out of the function as soon as we reach the return. The returned value is handed to the piece of code that invoked the function.
Functions as Arguments
The arguments accepted by a function can be of any data type, including functions themselves. This gives us more complex scope arrangements. Inside a function, we can access a variable defined either outside or inside the function. However, we cannot modify a variable defined outside the function (without declaring it global); we can only modify variables defined inside.
Default Arguments
When creating functions, we can set default values for our arguments. In Python, parameters with defaults must come after those without, so the function would be written def printName(lastName, firstName='James'):. Notice that the firstName argument defaults to 'James'. This means we can call the function and pass in a single value, which is bound to lastName, and firstName keeps its default. We can also call the function and pass in two values, with the second overriding the firstName default.
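A sketch of that reordered version (the names are just illustrative; remember that parameters with defaults must follow those without):

def print_name(last_name, first_name='James'):
    print(first_name, last_name)

print_name('Smith')            # uses the default -> James Smith
print_name('Smith', 'Iesha')   # overrides the default -> Iesha Smith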
Recursion
Recursion occurs when a function calls itself. We don’t want infinite recursion, so we must have 1 or more base cases that can be solved. The idea is to break the problem into simpler pieces, solve them, then use the information to solve the larger problem.
Recursion is separate from iteration, which we have been studying. Looping constructs are used in iterative algorithms, with state variables being set and updated on each iteration through the loop.
In recursion, we want to think about how we can reduce the problem to a simpler or smaller version of the same problem. We need a base case that is a direct solution to the simplest problem. In the recursive step, you use simple operations and call the function again in the return, jumping you back to the top. Break the problem into a smaller version, perform calculations that you are comfortable with, and keep track of when you can break out of the recursion (the base case).
For example, if creating a function to find the factorial of a number, the first step is to find a base case. What factorial do we know off hand? We know that the factorial of 1 is 1, so n == 1: return 1 is our base case.
It is important to note that each recursive call to the function creates its own scope. This can be difficult to grasp without seeing an example. Each time our function hits the return where the function itself is invoked, a new scope is created in memory that holds the values of that call's variables. Between scopes, the bindings of variables are completely separate, so a variable is not reset each time the function is called. Instead, a new version of the variable is set in the new, nested scope. The variables use the same names, but they are different objects in separate scopes.
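A minimal recursive factorial, following the base case described above:

def factorial(n):
    """
    Input: n, a positive int
    Returns n!
    """
    if n == 1:                        # base case
        return 1
    return n * factorial(n - 1)       # recursive step: a smaller version of the same problem

print(factorial(4))                   # -> 24, computed as 4 * 3 * 2 * 1

Each call to factorial gets its own scope with its own n, which is why the values don't interfere with one another.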
Many programmers find recursion easier to reason about and more intuitive, though it is not necessarily more efficient than iteration.
Inductive Reasoning
When we write recursive code, we want to have an idea of when the code will break from its execution and return a value. With iteration, this is easy because we have a looping variable that keeps track of our times through the loop and returns when our variable reaches a certain count. I'm not great with math, so I'll just wave my hand at this and hopefully come back to it at some later date.
Modules and Files
We don't want to store all of our code in one place; it is better to place functionality in separate modules (files). We then use the import keyword to bring the module into the currently running shell. We access data in modules using dot syntax, so a pi variable stored in a circle.py file can be accessed with circle.pi. We can call functions with dot syntax as well. For example, circle.area(3) runs the area function in the circle module and passes it 3 as the radius.
We don't have to use the dot syntax. If there are no conflicts between variable or function names between modules, we can simply write from circle import *, which imports everything from the module into the shell. We can then use the module's variables as if they were set in the current shell: instead of circle.pi, you just use pi. As there are no collisions, our bindings are set directly from the module; this creates bindings in the current scope, even though the code is written in the module. Note that statements at the top level of a module are executed only the first time the module is imported; re-importing does not rerun them. Functions defined in the module can be called as many times as you like without importing again.
Files are the objects we save for later use. Since every operating system has a different way of handling files, Python uses its own operating-system independent method, called a file handle. It looks like this:
nameHandle = open('kids', 'w')
We use the built-in function open and pass it the name of the file. open returns a file handle (an access point into a file) for a file named 'kids', opened so that we may write into it, as denoted by the 'w'. This is all bound to a name, nameHandle, which allows you to refer to the file in your code.
To write into files, we use the nameHandle.write() method, which takes a string to write into the file. When done, call the .close() method on your file handle. Note that following the method name with parentheses () actually invokes the method at that point.
We can also read from files with similar notation: nameHandle = open('kids', 'r'). This allows us to read data from the file into the shell.
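A short sketch of writing to a file and reading it back (the file name 'kids' and the names are just illustrative):

name_handle = open('kids', 'w')
for name in ['David', 'Andrea']:
    name_handle.write(name + '\n')    # write takes a string
name_handle.close()

name_handle = open('kids', 'r')
for line in name_handle:              # iterate over the lines in the file
    print(line)
name_handle.close()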
Tuples
Tuples are ordered sequences of elements, which can include a mix of different element types. They are not ordered as in largest to smallest; the sequence is ordered so that you can access data by indexing into the tuple. Tuples are immutable: you CANNOT change the values inside them.
In Python, an empty tuple is created with open and closed parentheses, i.e. t = (). Parentheses designate that it is a tuple. You can also put elements inside a tuple: t = (2, "one", 3) is stored as a tuple with the different data types inside it.
When accessing data in a tuple, you use square brackets [] with the index number, e.g. [0], to reference the data. So, for our last example, t[1] returns "one".
You can concatenate tuples, so t + (5, 6) creates the tuple (2, "one", 3, 5, 6). You can slice tuples, but you can't change their contents. t[1:2] returns ("one",): the element at index 1, stopping before index 2. Notice the trailing comma; it tells you the data is in a tuple that has only one element, which distinguishes it from a plain string.
Tuples are conveniently used to swap variables: (x, y) = (y, x) is all it takes! Super elegant! They are also great for returns because they allow you to return multiple pieces of data. Remember that only one return statement ever executes per call; a tuple lets that single return carry more data than we would otherwise be able to return.
We are able to iterate over tuples.
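A short sketch of the tuple operations above, including the swap and a multi-value return:

t = (2, "one", 3)
print(t[1])                     # -> one
t = t + (5, 6)                  # concatenation makes a new tuple (2, "one", 3, 5, 6)
print(t[1:2])                   # -> ("one",)  note the trailing comma

(x, y) = (1, 2)
(x, y) = (y, x)                 # swap
print(x, y)                     # -> 2 1

def quotient_and_remainder(a, b):
    return (a // b, a % b)      # return two values at once

(q, r) = quotient_and_remainder(5, 2)
print(q, r)                     # -> 2 1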
Lists
Lists are the next extension of tuples. They are:
- Ordered Sequences of information
- Data in the list can be accessed through indexing, ie list[0]
- lists are denoted by square brackets []
- lists contain elements, e.g. L = [1, 'two', False]
- elements are usually homogeneous (all of the same type, e.g., all integers)
- elements can be of mixed type, although this is less common
- Elements in the list can be changed, making the list mutable
- this is in contrast to tuples and strings, which cannot have their contents changed
The mutability of lists is really powerful, but it can cause a lot of headaches. You can create an empty list with L = [] (avoid naming a variable list, since that shadows the built-in list type). Lists are 0-indexed, so the first item in the list is at index 0.
You can get the length of a list by passing it into the len() function, so len(L) returns the count of items in the list. You can perform computations with list elements, so L[0] + 2 equals 3, the addition of 2 and the number in the list at index 0, which is 1 from above.
Interestingly, list indices can be variables or expressions, but the expression must evaluate to an integer. So, if we set i = 2, then L[i-2] evaluates to 1. What happens is i - 2 evaluates to 0, so we have L[0], which, as stated above, evaluates to 1.
Lists are mutable. We have to be sure to know the implications of any change. When we set L[1] = 2, we are physically changing the binding of the element at index 1 from 'two' to 2.
We can also easily iterate over the elements in a list. When writing for loops, we can simply write for i in L, and Python knows to access the list and run the code on each element contained within.
List Operations
Lists come with standard operations. We can add items to the end of the list with the append method: List.append(5) appends the number 5 to the end of the list. This mutates the list, changing the number of items in it. The . in .append is used to access the attributes and methods of the list object. In Python, everything is an object, and objects have properties in the form of data and methods (functions). We access this information with dot syntax.
- some_Object.do_Something()
We can concatenate lists: List3 = List1 + List2 combines the two lists sequentially, so the first item in List3 is the first item in List1. List1 and List2 do not change; List3 is a new list containing the elements of both.
We can extend lists to add multiple pieces of data. The difference between .extend and .append is that .append adds its argument as a single object at the end of the list, while .extend adds each element of its argument individually. This is best understood as:
- .append – List.append([e, f]) appends one object (a list containing the two pieces of data), so List now has one additional item inside it
- .extend – List.extend([e, f]) extends the list by two pieces of data, so the list now has two additional items in it, not one object. Extension mutates the list.
We can delete elements from lists: del(List[index]) removes the item at that index. Remember that lists are 0-indexed.
We can remove the element at the end of the list with the .pop() method. This mutates the list and returns the removed element.
We can remove specific elements with .remove(element). This looks through the list and removes the first instance of the element that you reference. If the .remove() method can't find the element, Python raises an error. This method mutates the list.
We can convert strings to lists, and vice versa. If we have s = 'string', then List = list(s) returns a list with all the characters as separate entries: List = ['s', 't', 'r', 'i', 'n', 'g'].
We can split strings to create a list with each part of the string at a different index. For the string above, s.split('i') (splitting on the letter i) gives the list ['str', 'ng']. Note that if we call .split() with nothing in the parentheses, the default is to split on spaces.
We can also take lists and turn them into strings. With the list above, ''.join(List) gives "string". Note the empty single quotes before the .join; if we place anything between those quotes, like '_'.join(List), we get "s_t_r_i_n_g".
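A sketch pulling these list operations together (the values are just illustrative):

L = [1, 'two', False]
L.append(5)              # -> [1, 'two', False, 5]
L.extend([6, 7])         # adds two elements -> [1, 'two', False, 5, 6, 7]
L.append([8, 9])         # adds ONE element (a nested list) -> [1, 'two', False, 5, 6, 7, [8, 9]]
del(L[1])                # removes 'two'
last = L.pop()           # removes and returns the last element, [8, 9]
L.remove(False)          # removes the first matching element

s = 'string'
chars = list(s)          # -> ['s', 't', 'r', 'i', 'n', 'g']
parts = s.split('i')     # -> ['str', 'ng']
print('_'.join(chars))   # -> s_t_r_i_n_g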
Sorting Lists
We can sort lists. There are two versions of list sorting, one destructive and one that preserves the original list. Take List = [5, 4, 3, 2, 1]: running sorted(List) returns [1, 2, 3, 4, 5]. Note that we see a sorted copy, but the original List still resides in memory, unchanged.
We also have the .sort() method. List.sort() sorts the list in place (and returns None), so in memory the List variable is now mapped to [1, 2, 3, 4, 5], not the original data. Along with .sort(), we can use .reverse(), which reverses the order of the elements. Note that both of these mutate the original list.
Mutation, Aliasing, and Cloning
The mutability of lists has the potential to cause problems. Lists are simply objects in memory, and we use variables to point to these objects. You can have multiple variables pointing to the same list: they are the same instance with different names, so they are also known as aliases. Because these variables point to the same list, any mutation of the list is reflected in every variable pointing to it. This has the potential for interesting side effects.
Cloning
You are able to clone lists with list2 = list1[:] . This creates another separate list with the exact same contents as the original. You can then mutate the original (or copied) list without affecting the other.
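A minimal sketch of aliasing versus cloning (the color lists are just examples):

warm = ['red', 'yellow', 'orange']
hot = warm               # alias: both names point to the same list
hot.append('pink')
print(warm)              # -> ['red', 'yellow', 'orange', 'pink']  the mutation shows up in both

cool = ['blue', 'green']
chill = cool[:]          # clone: a separate copy
chill.append('black')
print(cool)              # -> ['blue', 'green']  the original is unchanged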
Nesting
You can have lists inside of lists, inside of lists. These are especially prone to side effects; watch out when changing values and variables.
Things to Watch For
If iterating over a list, you must avoid mutating the list as you iterate over it. If you need to change a list as you iterate, clone the list first using the [:] notation, then iterate over the copy while mutating the original.
Functions as Objects
Functions are first class objects:
- they have types
- they can be used as elements of data structures
- they can appear in expressions
- They can be used as part of assignment statements
- They can be used as an argument into another function
Of particular use, we can pass functions as arguments along with lists, a paradigm known as higher-order programming. We use it to apply a function to each item in a list, producing new values according to the function.
Alternatively, we can apply a list of functions to an argument, e.g. passing a list of functions such as [abs, int] and applying each one to the same argument.
To help with this, Python provides us with generalized higher-order procedures (HOPs). One such HOP is map(). In its simplest form, map() takes a unary function (unary means that the function expects only one argument) and a collection of suitable arguments, and applies the function to each one.
- map(abs, [1, 2, -3]) returns an iterable, which you then need to "walk down". It is a data structure that acts like a list but must be iterated over to be used. If we want to see the values, we can wrap the result in list() and print it. If we want to use it as a list, we can iterate over it and place the items into another list.
Map() is a great way to think of applying an operation to a collection of data.
We can also use map() to apply n-ary functions (functions with multiple arguments). If we have 2 lists, we could compare the lists and return a data structure that represents those comparisons.
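A small sketch of both uses of map (the lists are just examples):

print(list(map(abs, [1, -2, 3, -4])))   # unary function -> [1, 2, 3, 4]

L1 = [1, 28, 36]
L2 = [2, 57, 9]
for smallest in map(min, L1, L2):       # n-ary: min applied pairwise across the lists
    print(smallest)                     # prints 1, 28, 9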
Common Operations for Strings, Tuples, Ranges, and Lists
- You can select an element of the sequence = sequence[i]
- You can find the length of the sequence = len(sequence)
- You can concatenate sequences (not ranges) = sequence1 + sequence2
- You can slice into sequences = sequence[start:end]
- You can check if an item is a member of a sequence = e in sequence -> evals to True if e is in the sequence
- You can check if an item is NOT in a sequence = e not in sequence -> evals to True if e is NOT in sequence
- You can iterate over the members of strings, tuples, ranges, and lists = for e in sequence
How to think about strings, tuples, ranges and lists
- Strings – Concatenations of characters
- Tuples – Concatenations of Objects
- Ranges – Sequences of integers
- List – Sequences of objects that are MUTABLE
Dictionaries
Dictionaries are a data structure that stores a collection of key-value pairs. Dictionaries allow us to index into items directly, rather than relying on their position as we would with a collection of parallel lists. Unlike a list, which is a collection of data separated into indices that you iterate through, a dictionary is a collection of data where the index itself is a key that we label, and the item at that index is the value.
Dictionaries are created with empty braces {}. You then create key-value pairs and assign them to the dictionary:
grades = { 'James':'A', 'Iesha':'A+', 'Vegas':'F' }
You can then lookup values based on the key:
grades['James'] -> evaluates to 'A'
Dictionary Operations
- Add an entry -> grades['Tiana'] = 'B' – Dictionaries are MUTABLE
- You can ask if an item exists -> 'James' in grades -> evals to True if the string 'James' is a key in the grades dictionary
- You can delete entries -> del(grades['Tiana']) -> deletes the Tiana entry from the dictionary
- You can return the set of keys -> grades.keys() -> returns all key labels from the dictionary as an iterable object that can be walked down
- You can return the set of values -> grades.values() -> returns all values from the dictionary as an iterable object that can be walked down
Dictionary Keys and Values
- Dictionaries can hold values of any types, both mutable and immutable
- Values can be duplicates
- Values can be lists, even other dictionaries! ANYTHING
Keys have more constraints
- Must be unique
- Must be an immutable type (int, float, string, tuple, or bool)
- Be careful with float types as keys because the float can have varying levels of precision
There is no order to keys or values
Dictionaries are great for storing values as you iterate. Then you don't have to recompute each value on every iteration; you can save values to a dictionary and look them up should you need them in subsequent iterations. This process is called memoization: you are basically sending a memo to yourself with the previous values. It is much more efficient.
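A sketch of memoization using a dictionary, with Fibonacci as the usual example (the base cases follow the convention fib(1) = 1, fib(2) = 2):

def fib_memo(n, memo):
    """nth Fibonacci number, storing already-computed answers in the memo dict."""
    if n in memo:
        return memo[n]                          # look the value up instead of recomputing it
    result = fib_memo(n - 1, memo) + fib_memo(n - 2, memo)
    memo[n] = result                            # save the new value for later lookups
    return result

print(fib_memo(30, {1: 1, 2: 2}))               # each value is computed only once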
Global Variables
Global variables, also called fields, are dangerous and most computer scientists try not to use them. They are dangerous because global variables break the scope created by functions. Globals can be accessed outside of scope, which allows for side effects. They can be deployed to handle changing variables inside of a function.
Testing and Debugging
Ideally, our code will do exactly what we want it to do. In reality, it doesn’t usually work this way.
Defensive Programming
To program defensively:
- Write specifications and comments for functions and code statements
- Modularize programs so a piece of code only performs one action
- Check the conditions on inputs and outputs
In short, define what you expect to receive, perform your action, then define what you expect to output and make sure that your code does exactly that.
Testing / Validation
In this step, compare your expected inputs and outputs to what you are actually receiving and outputting. Is it working? If not, you need to debug.
Another great question for testing and validation is "How can I break my program?" Test different use cases to ensure that the program performs under different conditions.
Debugging
When debugging, we study all of the events that lead to an error and try to figure out why we aren’t outputting what we expect. It’s important to know what the state of the application was before the error occurred. Once we know why our program isn’t working, then we must create a way to fix the program so that we receive what we expect.
Classes of Tests
You want to set yourself up so that testing and debugging is a seamless part of your workflow. Use these general guidelines:
- Design code that supports testing and debugging by using modules that can be tested and debugged individually
- Make sure to use docstrings and document constraints
- Comment the assumptions you made within each module so you can validate your assumptions
So, when are you ready to test? The first thing you do is make sure you have no syntax errors or static semantic errors. Once the code runs, generate possible inputs and the outputs you expect. Are you correct? With a running program, we can use these 3 types of tests:
Unit Testing
Take each module and test it individually. Test each function separately.
Regression Testing
Once you’ve tested your module and fixed it, your fix may have inadvertently created new bugs. Regression testing is used to catch this situation. After each fix, test the same unit again to make sure that it is actually performing the way you expect.
Integration Testing
With each module working properly, now we test the overall program. Does it work? Do the modules hand off information properly to each other? This is the step that trips up many people. Oftentimes they skip straight to this step, discover a bug, and then don't know where in the code the bug is. Now they have to sift through all the modules, when they could have done it correctly the first time and saved themselves the trouble. Make sure to do unit and regression testing first.
How to Test
You want to use your intuition about the natural boundaries of your problem. If we don't see natural boundaries, we can simply perform random testing; the probability of our code being correct increases with each passed random test. We can also do Black Box Testing or Glass Box Testing.
Black Box testing explores all of the possible paths through the documented specification. In fact, we never look at the code, only the spec. Oftentimes it is better for a third party to perform this testing to avoid any biases you might introduce. It is important to think of boundary conditions that you should test.
For example, if your module deals with lists, you'll want to test it with an empty list, a list with 1 item, and a list with many items. Also try extremes: what happens if you use a really large number in the function? What about a really small number? These are all examples of boundaries.
Glass Box testing looks inside the black box, at the code itself. We want to try different paths through our code to test out the various functions. Our tests are path-complete if every potential path is tested at least once. Drawbacks include the possibility of missing paths.
Guidelines for Glass Box Testing:
- Exercise all the parts of a conditional and move through all different possible branches.
- If you are testing a for loop, create a condition where you don’t actually go into the loop. Try executing the loop body only once. What happens? Execute more than once. What happens?
- If testing while loop, look for cases that catch all the different ways to exit the loop.
Bugs
Once you’ve identified a bug in your program, you want to:
- Isolate the bug
- eradicate the bug(s)
- retest your code to ensure that it runs properly
All too often we skip the retesting, so be sure to actually perform this step.
Overt vs. Covert Bugs
Some bugs are obvious: the code errors out or falls into an infinite loop. These are overt bugs. Some are less obvious: your code returns an incorrect value, but it might be difficult to tell and even harder to tell why. The worst bugs return values that appear valid but are in fact inaccurate.
Bugs can be persistent or intermittent. Persistent ones are easier because you can reproduce them reliably. Intermittent bugs are more difficult because they can depend on conditions that are hard to reproduce.
Overt and persistent bugs are obvious and easy to detect. Good programmers use defensive programming to ensure that if an error is made, it falls into this category. Overt and intermittent bugs are more difficult, but if you can reproduce the error, you can probably find a solution. Covert bugs are the real threat: they can be difficult to detect and can often cause bad side effects before they are ever treated.
Exceptions and Assertions
Exceptions occur when something unexpected happens while your code is running. Common exceptions include:
- IndexError – Trying to access data in a structure beyond that structure's length.
- TypeError – Trying to convert data to an inappropriate type, or an invalid calculation between incompatible types
- NameError – Referencing non-existent variable
- SyntaxError – Python can’t parse program
- AttributeError – Referencing a non-existent attribute
- ValueError – Expression is correct, but value doesn’t make sense
- IOError – Typically happens when attempting to access non-existent file
When we run into errors, we can do one of the following:
- Fail Silently – Bad idea because user doesn’t know something went wrong.
- Return error value – This makes your code more complicated
- Stop Execution – Python raises an exception and describes what occurred.
Dealing With Exceptions
Python can handle exceptions as they occur. Using the try keyword, we tell Python to try a piece of code. If the try block throws an exception, we move to the except block and execute those statements. Execution then continues with the code that follows the try/except block.
We can extend this functionality. Say we know to expect certain types of errors, like a ValueError or a ZeroDivisionError. If so, we can write except clauses that target these specific errors, making the program more robust in its exception handling.
The else: clause is used to specify code that you want running if the try body completes with no exceptions. I don’t really understand how this is useful.
The finally: clause is always executed, after all the except clauses, even if they cause the code to break. This is useful for clean-up code that you need to finish an operation, like the closing of a file that was opened for writing.
Important Concept – You can use your exceptions as a control flow for your code. For example, if you receive a ValueError on a type conversion, you could just let the user know of the error, or you can use the except ValueError clause and code it to ask for another value, or substitute another value, or whatever you want. Be sure to use the try clause as a sort of if-else to handle exceptions that you can expect.
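A sketch of the full try / except / else / finally structure (the prompts are just illustrative):

try:
    a = int(input("Tell me one number: "))
    b = int(input("Tell me another number: "))
    print(a / b)
except ValueError:
    print("Could not convert that to a number.")
except ZeroDivisionError:
    print("Can't divide by zero.")
else:
    print("Division succeeded with no exception.")
finally:
    print("Done trying.")    # always runs, even if an exception occurred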
Exceptions as Control Flow
We have been handling exceptions when the compiler raises them, but we have the ability to raise our own exceptions when we are unable to produce the consistent result that we expect. This uses the raise keyword.
raise ErrorType("Message that you want to print out")
Assertions
Assertions are assumptions the programmer has made about the state of the code at various moments in the computation. In a function's docstring, we describe our guarantee as relying on an assumption about the data passed into the function. However, the docstring is simply documentation and isn't enforced in any way.
To enforce our assumptions, we can use assertions. This isn’t just for inputs, even though that is how they are usually used. We use assert statements to describe what we expect, and AssertionError exceptions are raised when our assumptions are not met. The code exits immediately if the assertion is not met.
assert <expression>, 'if the expression evaluates to False, this message in the quotes is displayed'
When to use assertions:
- Check types of arguments or values
- Check data structures
- Check constraints on return values
In a nutshell, use exceptions to control how you process through your code and use assertions to validate data passed into your program.
Object Oriented Programming
Python supports a number of different types of data:
- int
- string
- float
- tuple
- list
- dict
Every object has three pieces:
- type
- internal data representation – objects are collections of properties and a way for the computer to keep them all together.
- set of procedures for interacting with instance of object
Every data value we use is an instance of an object. For example, a = 'hello' instantiates a string object, sets its value to 'hello', then binds it to the variable a.
Everything in Python is an object and has a type, but what are objects? Think of objects as data abstractions. They capture an internal representation through data attributes, and they offer a way of interacting with the object and its data attributes through methods. These methods define the behaviors of the object while hiding the actual implementation details.
We have the ability to create new instances of objects, and we can destroy objects by using the del keyword or simply letting the garbage collection script destroy them when they are no longer used.
Creating a Class
Creating a class is different from using an instance of a class. Creating the class means making your own data abstractions and the methods that can interact with that data. Using a class is simply asking the computer to allocate memory for those data abstractions, then using the predefined methods to perform operations on the instance of the object.
Creating a class involves:
- defining class name
- defining class attributes
Methods
Methods are functions that work only with this class. Within methods, in Python, we always use self as the first argument. You can then use the . operator to access any attribute within the class (both data attributes (variables) and procedural attributes (methods)).
A class call with a method looks like this:
class ClassName(object):
    def __init__(self, arg1, arg2):
        self.x = arg1
        self.y = arg2
    def MethodName(self, arg3):
        return self.x + self.y + arg3
Notice that we begin the class with the class keyword followed by the class name and the object argument. Every class we build in Python will inherit from the object class, so we set it as the superclass. Should we create child classes, those classes would reference this current class as the superclass.
Also notice the __init__ method. It is very special: there are two underscores before and after init in this constructor, and the very first argument is self. This will always be the pattern. For any class initialization, the first argument is the class instance itself, denoted by the self keyword.
We can then assign the argument to variables using self.VariableName to access the variables inside the class.
Towards the end, we define a method and the very first argument it takes is the self keyword. This refers to the particular instance of the class that you are working with. Once again, this is a pattern you will always use in Python. We then return a calculation from the method, with the value going to the spot in the code where the method is called.
When calling methods, you first designate the object to operate on, then the method to call, then pass in any data that the method might require:
VarName.MethodName(Parameters, not including self)
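Using the ClassName sketch above (the values are just illustrative):

c = ClassName(2, 3)          # __init__ binds self.x = 2 and self.y = 3
print(c.MethodName(4))       # -> 9, i.e. self.x + self.y + arg3
print(c.x)                   # data attributes can also be read with dot syntax

Note that self is filled in automatically; we only pass the remaining parameters.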
String Method for a Class
This is similar to overriding ToString() in C#. Whenever you ask to print an instance of a data object, Python uses the __str__ method defined in the class. You can define this method however you like, but the return value must be a string (use casting to convert variables to strings).
We can use the isinstance() function to check if an object is an instance of a class:
print(isinstance(VarName, ClassName)) -> True / False
We have the ability to overload operators for our objects with additional code to handle adding, subtracting, checking equality, checking length, etc., between instances of classes. For example, if you want to override the default behavior of adding two instances of a class together, you define the __add__(self, other) method. To subtract, define __sub__(self, other). There are a bunch more!
When writing classes, don't forget about your accessors: the get and set methods used to manipulate the attributes of your objects. Make sure you have get methods that return every data attribute in your class, like def get_attribute1(self): return self.attribute1.
Objects are bundled together and share common attributes and share procedures to operate on those data attributes. This is an example of abstraction, because you don’t need to know how the object works in order to use the object in your programs.
Piggybacking on this idea of abstraction is how each instance of a class is abstracted from another. Once properly assembled, you are able to use a class as you would any other primitive in the language.
Creating vs Using a Class
When implementing a new object type with a class, you want to define the class (header) and define the data attributes (variables) and methods used to interact with the object. Your class is the object type. Make sure to define your classes generically by using the self keyword to refer to a particular instance. Your class defines the data attributes and the methods that are common to ALL instances of the class.
When using a class, you are only creating unique instances of that class and using the supplied methods to operate on the data inside of your instances. Your instance is one particular usage of the class to create a binding inside memory. You can have many different instances with different data, but they are all related by their structure.
Pro Tip – When asking for data from outside the class, always use the get and set methods rather than accessing the data attribute directly with dot syntax. It's better for abstraction: we want to separate the internal representation of data from access to that data. Python doesn't restrict direct access to the data inside classes, so it is a best practice to watch yourself and not allow it to happen.
Hierarchies
Inheritance is a critical component of classes that makes them so powerful. We can create subclasses of classes that add different behaviors. Flows in tree structure ->
- Object
- Parent Class (superclass)
- Child Class (subclass)
- … (can have more subclasses)
The children inherit all data and behaviors from their parents, but also add their own data attributes and behaviors that can override the superclass.
One thing to note is that child classes don't require their own constructor; they can simply inherit the constructor (__init__) from the parent class.
Class Variables
Instance variables are the data attributes stored inside each instance. Class variables instead belong to the class. They are defined inside the class definition but outside __init__; a class variable belongs to the class and is shared among ALL instances of the class.
For example, if we wanted to create a unique ID for each instance of the class, so we could differentiate them, we could write the follow:
class ClassName(ParentClass):
    UniqueIDVariableName = 1
    def __init__(self):
        self.UniqueID = ClassName.UniqueIDVariableName
        ClassName.UniqueIDVariableName += 1
Notice that we set the unique ID variable outside the constructor. Then, inside the constructor, we bind its current value to an instance attribute (self.UniqueID) and increment the class variable. Each time an instance of the class is created, it gets an incrementing self.UniqueID.
Generators
Any procedure or method that includes a yield statement is called a generator. Generators are useful for breaking up long computations. This means you can generate new objects, as needed, to be used in another computation. Rather than generating an entire list and stepping through its values, you can create a generator that produces values one at a time, then step through them sequentially, stopping when you find what you are looking for.
This idea is used in range. When you say for something in range(n), range gives you the first value and then promises the rest. Once you need the next value, it is produced on demand. This is more efficient without actually changing your computational model, letting you write complex code that can be used easily.
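A minimal generator sketch (the even_numbers name is just illustrative):
def even_numbers(n):
    # yields the even numbers from 0 up to (but not including) n,
    # one at a time, instead of building the whole list up front
    i = 0
    while i < n:
        yield i
        i += 2

for value in even_numbers(10):
    print(value)    # 0, 2, 4, 6, 8, each produced on demand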
Program Efficiency
Some algorithms are better than others. Inefficient programs will not scale well when processing large data sets. But how do you know which algorithm is better?
There is often a trade-off between how fast a program can run and how much memory it uses. By pre-computing results and storing them in variables, you can dramatically increase the speed of your program. However, the amount of space used will also grow.
The most appropriate way to assess the efficiency of your algorithm is to use order of growth. With order of growth, we evaluate a program’s efficiency on very large inputs and look at the worst-case scenario when running the program. We also want to express how the program’s run-time grows as the size of the input grows. Because we might not be able to find the exact equation, we want to know the upper bound: if the input grows this much, what is the maximum change in the algorithm’s running time?
What we will see is that there are different types of orders of growth:
- Constant – Run-time is the same regardless of input size
- Linear – Run-time grows linearly as the input grows
- Quadratic – Run-time grows with the square of the input size (n^2), so efficiency drops quickly as inputs increase
- Logarithmic – Run-time grows as the log of the input size, so at some point more input doesn’t affect run-time very much
- n log n – Between linear and quadratic
- Exponential – Worst of all; as the input grows, the run-time blows up drastically, far worse than quadratic
We want to be logarithmic, but that’s difficult, so best to be linear or n log n.
Big Oh Notation
Named after the Greek letter Omicron, Big Oh measures an upper bound on asymptotic growth (behavior as the data / sample size becomes very large). Rather than just saying “as the data grows, so does the time to calculate,” we want to be able to differentiate between algorithms and determine their type of growth.
Big Oh Notation describes the worst case scenario, which interestingly enough occurs quite often and represents a bottleneck in your system. Another goal of ours is to express the rate of growth of the program relative to the growth of the input size. So, as inputs grow, this is how the efficiency of the program will change.
When evaluating order of growth, you count the steps in your algorithm. Every discrete step adds a constant of 1. Every loop adds n (the number of iterations) times the number of steps inside the loop. So, if the loop has 5 steps, that loop contributes 5n.
Once you’ve counted the steps, you look at the worst-case scenario. In the worst case, you can ignore the constant steps, so remove every discrete step from the equation. You can then ignore the loop constants, because their share of the computational cost is low at large data sizes. So, if the count is 5n, the worst case is O(n), which means “on the order of n”. In other words, this is a linear algorithm.
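As a sketch of this counting, assuming a simple, illustrative summing function:
def sum_to(n):
    total = 0            # 1 step
    for i in range(n):   # loop runs n times...
        total += i       # ...with roughly 2 steps per iteration (add, assign)
    return total         # 1 step

# The count is roughly 1 + 2n + 1. Dropping the constants and the
# multiplier leaves n, so sum_to is O(n): a linear algorithm.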
With Orders of Growth, we want to focus on the dominant term. For example:
- n^2 + 3n + 4 – We drop the lower-order terms 3n and 4, leaving n^2, so O(n^2) is our order of growth
- n^2 + 10000n + 400000000 – Even with very large constants and multipliers, this still evaluates to O(n^2)
- log(n) + n + 4 – log(n) grows slowly, so n is the dominant term, giving O(n)
- 0.001 * n * log(n) + 30000n – n * log(n) is the dominant term, so O(n log n)
- 2n + 3^n – 3^n is the dominant term, so O(3^n)
Given an algorithm, we count the steps as a function of input size, then focus on the dominant term.
Let’s look at the types again, along with their Big Oh notation:
Constant    | O(1)
Logarithmic | O(log n)
Linear      | O(n)
Loglinear   | O(n log n)
Polynomial  | O(n^c)
Exponential | O(c^n)
Ideally, we want our algorithms to be as close to the top of the table as possible. Constant time is our goal, but likely not possible. The higher in the table, the more efficient the algorithm.
Law of Addition (Sequential Statements)
In a complex algorithm with multiple sequential blocks of statements, we can figure out the complexity of one piece, then the complexity of the next piece, and add the complexities together. So, if the first for loop is O(n) and the second for loop is O(n^2), the complexity of the whole program is O(n^2), because that is the dominant term.
Law of Multiplication (Nested Statements)
If we have nested statements or loops in our algorithm, we use multiplication. If you have a loop inside a loop, it might be O(n) for the outer loop and O(n) for the inner loop. You multiply the orders together and get O(n^2).
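A quick sketch of both laws in one illustrative function:
def both_laws(a_list):
    # first block: a single loop over the list, O(n)
    for item in a_list:
        print(item)
    # second block: nested loops, O(n) * O(n) = O(n^2)
    for x in a_list:
        for y in a_list:
            print(x, y)
    # law of addition across the two blocks: O(n) + O(n^2) = O(n^2),
    # keeping only the dominant term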
Complexity Classes
As you get more familiar with more algorithms, you’ll notice that certain design patterns will lend themselves to certain complexity classes.
Constant Complexity
This is the case when complexity does not depend on the size of the input. There aren’t many interesting complete algorithms with constant complexity, but constant-time pieces (such as indexing into a list) are used inside other algorithms to great effect.
Logarithmic Complexity
These are efficient algorithms whose complexity grows as the log of the size of the input. Examples include binary search, or really any algorithm that divides a large problem in half and then works on one half. The characteristic is that you reduce the size of the problem by a constant factor on each recursion.
Linear Complexity
Fairly straightforward: an example is stepping through each item in a list to see if an element is present; you have to walk down the whole iterable. Another example is anything with a loop that has a constant number of steps inside it. An interesting note is that iterative and recursive forms of linear algorithms have the same order of growth, even though they are programmed differently.
Log-Linear Complexity
Another nice, efficient class. Merge sort is the classic example.
Polynomial Complexity (Quadratic Complexity)
Complexity grows with the square of the size of the input. This typically occurs when we have nested loops or nested recursive calls.
Exponential Complexity
This is the most expensive class, and sometimes we are stuck with it. Some problems are inherently exponential and there is no better way to solve them. A common characteristic is a recursive function that calls itself more than once per call.
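The usual illustration of that last characteristic is a naive recursive Fibonacci, sketched here:
def fib(n):
    # calls itself twice per call, so the number of calls roughly
    # doubles at each level: exponential growth, bounded above by O(2^n)
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(10))   # 55, but the call count explodes as n grows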
Search Algorithms
We use search algorithms to find a specific item or group of items within a collection of items. There are many different types of search algorithms:
- Linear search – this is a brute force method that looks at each item and compares it to the search parameters. The upside is that the collection doesn’t have to be sorted. This is an O(n) algorithm with linear complexity.
- Bisection Search – Cuts the problem in half, then half again. Much, much more efficient, but the collection has to be sorted for it to work. This is an O(log n) algorithm with logarithmic complexity.
- To perform this, choose the element in the middle of the collection. If the element you are looking for is larger than the one you have chosen, throw out the first half of the collection and repeat the process on the second half. If it is smaller, throw out the later half and repeat the process on the first half (see the sketch after this list).
- This is a great example of a divide and conquer algorithm: you are breaking a larger problem into smaller pieces, with the same computations occurring on both the large collection and its subsets.
- One important caveat: because the list must be sorted for bisection, is it more efficient to sort and then bisection search, or to simply linear search? This depends on how you use the list. If you only want to search the list once, it is more efficient to simply use linear search. However, if you are going to perform multiple searches on the list, you amortize the cost of the sort across those searches and the sorting time becomes less and less relevant. In that case it is usually better to sort and then search.
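A minimal iterative sketch of bisection search on a sorted list (the function name and details are illustrative):
def bisection_search(sorted_list, target):
    # repeatedly halves the region that could contain target: O(log n)
    low, high = 0, len(sorted_list) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_list[mid] == target:
            return True
        elif sorted_list[mid] < target:
            low = mid + 1      # throw out the first half
        else:
            high = mid - 1     # throw out the later half
    return False

print(bisection_search([1, 3, 5, 7, 9, 11], 7))   # True
print(bisection_search([1, 3, 5, 7, 9, 11], 4))   # False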
Sort Algorithms
- BOGO sort (monkey sort) – You randomly assign elements to indices in a collection, then walk through the collection to check whether the elements are sorted. If not, you randomly shuffle the indices again and check again. Do this until they are sorted.
- This only works for very small collections of data
- This algorithm is called unbounded because there is no guarantee that we will ever finish – O(?)
- Bubble Sort – This sort compares elements in pairs from the beginning of the list. If the first element is larger than the second, the two elements are swapped so the smaller one comes first. You then perform the same operation on the 2nd and 3rd elements, then the 3rd and 4th, and so on until you reach the end of the collection (sketches of these sorts appear after this list).
- Each time you compare the elements and move the largest one forward in the index, you are bubbling the largest value to the end.
- If the very first number in your collection is the highest number within the set, then that number gets “moved” to the end of the collection as you work through the comparisons. This process is known as “bubbling”.
- This passes the largest value to the end of the list, but then you have to start the process again to bubble the next largest value, and so on.
- Because you have to do multiple passes of comparisons and swaps, the complexity is the length of the list times the length of the list, or O(n^2)
- Selection Sort – This sorting algorithm finds the smallest element, then swaps that element with the first index. It then searches the sublist (the list without the first element) for the smallest element, and swaps that with the second element. This keeps going until the whole collection is sorted (see the sketch after this list).
- Selection sort keeps the left portion of the list sorted.
- Selection sort kind of works like bubble sort in reverse
- Just like bubble sort, because you have to do multiple passes of comparisons and swaps, the complexity is the length of the list times the length of the list, or O(n^2)
- Merge Sort – This is our divide and conquer sorting method. We break the problem into smaller sets, then perform operations on them. Merge sort splits the collection in half, then sorts the halves separately. We then merge those results together.
- You keep subdividing the list until you have individual collections of only one element
- Compare the individual collections, sort them, then merge the individual collections into lists of size 2 (with sorted data)
- Take your size 2 lists and compare them by looking at the first index in each sublist. The smaller of the two becomes the first index in the size 4 list we are now building.
- Now compare the remaining item in the first collection with the first element in the second collection. Place the smaller of the two in the second spot of the size 4 list.
- Now compare the remaining items in the respective lists and sort, then merge them into the size 4 list.
- Do this until you’ve built back up to the size of the original list.
- Merge step – Because you are comparing the front elements of each sublist, merging two sorted sublists takes roughly len(left sublist) + len(right sublist) comparisons. Work that depends linearly on the length of the collection is O(n)
- Divide step – Because we use recursion to break the problem into halves and then build them back up, there are O(log n) levels of splitting and merging
- Merge Sort – Together, doing O(n) merge work at each of the O(log n) levels gives merge sort a total algorithmic complexity of log-linear, or O(n log(n)) (see the sketch after this list)
- In fact, O(n log(n)) is the fastest a comparison-based sort can ever be
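Illustrative sketches of the three comparison sorts described above (all names and details are minimal versions of my own, not the course’s code):

Bubble sort – repeatedly bubbles the largest remaining value to the end, O(n^2):
def bubble_sort(a_list):
    n = len(a_list)
    for i in range(n):
        for j in range(n - 1 - i):
            if a_list[j] > a_list[j + 1]:
                # swap so the smaller element comes first
                a_list[j], a_list[j + 1] = a_list[j + 1], a_list[j]
    return a_list

print(bubble_sort([5, 1, 4, 2, 8]))   # [1, 2, 4, 5, 8]

Selection sort – keeps the left portion sorted by swapping the smallest remaining element into place, O(n^2):
def selection_sort(a_list):
    n = len(a_list)
    for i in range(n):
        smallest = i
        for j in range(i + 1, n):
            if a_list[j] < a_list[smallest]:
                smallest = j
        # swap the smallest remaining element into position i
        a_list[i], a_list[smallest] = a_list[smallest], a_list[i]
    return a_list

print(selection_sort([5, 1, 4, 2, 8]))   # [1, 2, 4, 5, 8]

Merge sort – divide and conquer, O(n log n):
def merge(left, right):
    # merge two already-sorted lists: O(len(left) + len(right))
    result = []
    i, j = 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    return result

def merge_sort(a_list):
    # divide: split in half until sublists have one element (O(log n) levels)
    if len(a_list) <= 1:
        return a_list
    mid = len(a_list) // 2
    left = merge_sort(a_list[:mid])
    right = merge_sort(a_list[mid:])
    # conquer: merge the sorted halves back together (O(n) work per level)
    return merge(left, right)

print(merge_sort([5, 1, 4, 2, 8, 3]))   # [1, 2, 3, 4, 5, 8]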