Big-O notation explained by a self-taught programmer

dizzystar · on Oct 29, 2016

As a self-taught, I always get nervous when I see articles explaining Big O because they are almost always wrong. While this one is mostly correct, it is a bit simplistic and misleading, and it could use considerable annotations. The "scary" Wikipedia article has examples that are more nuanced than what is written here; the first example shows a constant time algorithm with nested for loops.

I take issue with the math fearing throughout the article. Time complexity is math, period. There is no getting around this fact, and the sooner you accept it, the sooner you learn that the math isn't that difficult, and the sooner you realize that, without some intuition on what the math is saying, you'll never really have a firm grasp of Big O or any of the other variations.

And also, why no mention of log time?

Waterluvian · on Oct 29, 2016

As a self taught programmer who did no math in university, the most valuable thing to me was the discovery that the math really isn't scary. Once you understand the symbols and notation, most concepts are rather easy, given you commit time to learning it.

I strayed away for so long until I watched the MIT lecture series on algorithms and it all just clicked. The best feeling was was seeing the math on the chalk board for how you determine the complexity of operations on various graph layouts and just getting it. I struggled in high school math and rarely got that feeling in class.

It was just one of those, "holy crap... It's all just an array and how we organize the data in that array that gives various benefits and drawbacks!" Incredibly empowering.

Edit: This is the series I watched: https://www.youtube.com/watch?v=HtSuA80QTyo&index=1&list=PLU...

code777777 · on Oct 29, 2016

Would you please share the link to the MIT lecture series? It may help the rest of us too. Thanks!

Waterluvian · on Oct 29, 2016

https://www.youtube.com/watch?v=HtSuA80QTyo&list=PLUl4u3cNGP...

I used YouTube to slow down and speed up the videos as necessary. Watched some parts a few times. Had a notepad out.

sn9 · on Oct 31, 2016

Coursera also has two great sets of Algorithms courses: one on analysis and one on implementation (the latter by Sedgewick).

Self-taught programmers really have an embarrassment of riches these days.

c0achmcguirk · on Oct 29, 2016

My company always works a "Big-O" question into interview questions. It's funny how we ask about the complexity of the algorithm and maybe 50% of applicants even know what Big-O notation is.

It doesn't seem to impact hiring decisions though. We haven't turned anyone down because they miss that question.

I think it's because anyone who has programmed for a year or so professionally already understands the concept of inefficient algorithms. They don't need to measure it mathematically, just learn how to optimize.

user5994461 · on Oct 29, 2016

That makes sense. Big-O notation is so overrated.

A little bit of history: C++ programmers were always able to choose the right containers simply by using this image, since long before the invention of Big-O.

http://homepages.e3.net.nz/~djm/containerchoice.png

bogomipz · on Oct 29, 2016

> "That makes sense. Big-O notation is so overrated."

Can you explain how having a classification that allows one to determine whether something is logarithmic vs vs quadratic is so overrated?

Isn't that a bit like saying "Algebra is so overrated"?

Also Donald Knuth introduced Big O somewhere around the mid 1970s and C++ didn't come along until 1979.

https://en.wikipedia.org/wiki/Big_O_notation#cite_note-knuth...

Also your link is a visualization of ADTs not run times. And while it true that ADTs are chosen for certain guarantees it still depends on how they are used in an algorithm.

mikebenfield · on Oct 29, 2016

I think (hope?) that user5994461's post was tongue in cheek.

BTW, you are way off saying Don Knuth introduced Big O. It was invented by physicists and mathematicians like Paul Bachmann and Landau, many decades before the 1970s.

bogomipz · on Oct 29, 2016

I didn't know the OPs post was tongue and cheek. Sometimes it hard to tell : )

Sure "O" goes back to mathematicians in the late 19th century. What I meant was that Knuth introduced Big O(mnicron) in the context of "Computer Science" literature.

This is the source I am referring to from 1976 SIGACT News:

http://www.phil.uu.nl/datastructuren/09-10/knuth_big_omicron...

On page 21 or page 4 of the PDF:

"I would like to close this letter by discussing a competing way to denote the order of function growth. My library research turned up the surprising fact that this alternative approach actually antedates the O-notation itself. "

On page 22 or page 5 of the PDF:

"The main reason why 0 is so handy is that we can use it right in the middle of formulas (and in the middle of English sentences and in tables which show the running times for a family of related algorithms etc.)."

BeetleB · on Oct 29, 2016

Mathematicians used it for asymptotic expansions of mathematical expressions.

Its use in computer science for algorithmic complexity came later. In fact, I believe in the early TAOCP, Donald did not use big O, but instead would derive the full expression where possible.

jb1991 · on Oct 29, 2016

The thing is that an O(n) algorithm can perform faster than an O(1) algorithm simply because Big-O is only comparing different scenarios on a single routine and not how it compares to other algorithms. A slow algorithm might technically be O(1) but that doesn't necessarily make it fast.

gk101 · on Oct 29, 2016

Yes, however big O notation isn't about which programs run faster, it's about how the runtime of a program changes in response to the input size

pluma · on Oct 29, 2016

...which is irrelevant depending on the bounds of the input size.

It boils down to optimization. There's no point in spending time optimizing for Big-O if the input size will be so small the difference between O(n²) and O(n log n) doesn't matter.

As with all optimizations, there are situations where it matters a lot. But in the real world there are plenty of situations where the time is better spent elsewhere or where there are other things that will necessitate replacing the code before the performance becomes a problem.

tscs37 · on Oct 30, 2016

>O(n²) and O(n log n)

What about O(n!) and O(logn)?

I think that will probably explode even at microscopic input sizes.

Even in the real world, understanding what O is helps a lot. That doesn't mean you should use it everywhere or that you have to.

But it means you should understand how your code scales. Efficient code scales because it has a low order, inefficient code doesn't scale because it has a high order.

And if you replace the code, what have you done but change the order of the algorithm?

Computers are just a collection of functions with some order, some are O(n!), some O(1), some O(infinity) and some others are O(nlogn). Big-O is everywhere no matter if you admit it or not, so understanding it can be a big help to understand why Code A is faster or slower than Code B

pluma · on Oct 30, 2016

To address your first point: There is an amazing amount of software where even O(n!) vs O(log n) doesn't matter. There is also of course a lot of software where it does. There is even more software where it does matter in some parts of the code but doesn't in others.

Every optimization has a cost. Time is a finite resource. Time spent optimizing one thing could be spent optimizing something else or implementing a new feature or fixing a bug.

That said, the original argument isn't that that which Big-O notation represents is generally irrelevant. The argument is that the formalism of Big-O notation itself is generally irrelevant (except where it isn't, obviously).

You don't need CS101 to understand that a nested for-loop probably scales worse than a single if-statement. Or that looping over the input twice in succession is probably faster than nesting those loops. Or that only looking at half the input is faster than looking at all the inputs, and so on.

To put it in other words: you can figure out that something that scales from an ant to an elephant to a skyscraper scales worse than something that scales in multiples of ducks without having to use a ruler or charts. Big-O is the more accurate tool but there is a tremendous amount of situations where you don't need an accurate tool and a vague understanding of the same thing the tool would measure is entirely sufficient.

tscs37 · on Oct 30, 2016

>The argument is that the formalism of Big-O notation itself is generally irrelevant (except where it isn't, obviously).

That sentence kinda defeats the argument itself. If it's irrelevant except where it is not, then your argument is obviously true.

We could also say that some number X is always not 1 except when it is.

>You don't need CS101 to understand that a nested for-loop probably scales worse than a single if-statement.

That is true, but it's just Big-O applied via intuition rather than learning.

Big-O is mostly the expression of intuition for some people.

bogomipz · on Oct 29, 2016

But this fact is accounted for in the definition of Big O in the "for all n > k" no?

"f(n) = O(g(n)) means there are positive constants c and k, such that 0 ≤ f(n) ≤ cg(n) for all n ≥ k. "

Buttons840 · on Oct 29, 2016

Tip: If you're asked about Big-O in an interview, and your rusty, at least determine whether the algorithm is sub-linear or super-linear. Just knowing this is 80% of what you need to be aware of in industry IMHO.

MOARDONGZPLZ · on Oct 29, 2016

Maybe you're being hyperbolic (and maybe I'm being pedantic), but from what I recall Knuth was the first to describe complexity notation in the late-60s or early-70s in The Art of Computer Programming. This definitely predates C++.

allan_s · on Oct 29, 2016

I can't agree more, when interviewing I always ask this kind of questions of "theoretical knowledge" not to have the actual answer, but to see how candidates reacts when they don't know. Because if the guy not only admits he does not know directly and is eager to know what it is briefly, then you know that giving your team and environment is favorable to learning, the guy will soon be able to catch up, and it's certainly the same guy that one day will bring in the team's "tips and tricks" channel an article or an insight that will make the team's knowledge grow too.

erikbye · on Oct 29, 2016

How can anyone applying for a programming job not know Big-O notation? That's CS101. I don't care if you're self-taught, it's audacious to even call yourself a programmer if you don't know the very basics of algorithms and data structures.

If I were interviewing a candidate, and it was revealed he didn’t know Big-O notation, my next questions would be about data structures, because now I’m suspecting he wouldn’t even know how to implement the simplest of structures. What next, simple pointer arithmetic, is he unable to even walk an array?

We are no longer talking about a programmer then.

Not knowing these things, I usually chalk it up to either laziness, or lack of interest in understanding how things work.

Either way, that’s not someone I’d want on my team. Programmers should be enthusiastic about fundamentals. Good programmers have a “hacker” mentality, a need to know and understand inner workings, a craving to dig deep. I’d say this mentality is what you want from anyone in STEM.

binarycrusader · on Oct 29, 2016

How can anyone applying for a programming job not know Big-O notation? That's CS101.

You pointed out exactly how someone would not know -- they may have never taken computer science classes and taught themselves programming.

Or they may have taken some CS classes, not enough to have covered that particular concept.

I'd guess that, especially for older programmers, anecdotally, it's not uncommon to not really understand or use it.

I don't care if you're self-taught, it's audacious to even call yourself a programmer if you don't know the very basics of algorithms and data structures.

Just because someone doesn't use the same terminology to describe a set of concepts doesn't mean they don't understand those concepts.

Also, I wasn't aware there were were formal qualifications for someone adopting the title of "Programmer", pretty sure big-O notation isn't part of the dictionary definition.

ivanhoe · on Oct 29, 2016

But still you had to hear about it sometimes, somewhere, it's mentioned in pretty much any serious text on algorithms. Thing is that many people see it mentioned, but never bother to stop & learn what it's all about... and that's telling something about their approach to programming (and life) in general.

binarycrusader · on Oct 29, 2016

Sorry, but, it doesn't tell you anything.

The world of computers is full of things to learn about, more than almost any one person could learn in a lifetime.

People usually learn about what interests them, or what they need to know. It's entirely possible to be an accomplished programmer and never really learn big-O notation just as some programmers never learn assembly or C.

Think about it differently, big-O wasn't widely discussed or well known until the 1970s, thanks to Donald Knuth who popularized it. Then consider that many programmers who learned programming may have done so before it was widely known and some may have learned programming from those same people.

Are those programmers from roughly 1980 and earlier any less a programmer because they weren't aware of or didn't use big-O notation? Of course not.

Is big-O notation something most programmers should consider learning? Yes. But we should not seek to exclude others from a profession simply because they don't meet arbitrary criteria we think is necessary when they are clearly already actively engaged in that profession.

mikebenfield · on Oct 29, 2016

Thank you. As someone who is tangentially connected to software (I'm a mathematician who sometimes intersects with CS/software), the anti-intellectual attitude of some software people is really baffling.

You're working in a field that requires you to use your brain, learn new stuff regularly, and solve problems. Isn't being ignorant of the fundamental concepts of your field a massive red flag that something is wrong?

digital_ins · on Oct 29, 2016

I started off disagreeing with your statement but as I read further, I became a believer. In fact, "gotcha" and other trick questions are the worst, but understanding CS fundamentals shows that you were paying attention in class and are probably interested (if not passionate) about CS.

If not, there are hundreds of other openings that the candidate can apply to.

MarchHare · on Oct 29, 2016

Depends on the role, for junior devs my company doesn't care if you have ever programmed before. If the person is clever, we can teach them to program. For example, the most recent hire has a masters in maths from oxford, he may have never seen Big-O, but he could certainly understand it if need be.

pluma · on Oct 29, 2016

> simple pointer arithmetic

Because pointer arithmetic is so vital to being a programmer in Python, Java, Ruby, JavaScript, ...?

Personally I know most of these things. I'm mostly self-taught. I looked into C++ at some point. I took some university courses. I've also been programming for a decade and a half (I'm 31, made my first website around age 10 and later picked up PHP before moving on to other languages).

But in my entire career there was not one point where I needed to implement a sorting algorithm myself. Or where I even had to know the difference between the various sorting algorithms. There have been a handful of situations where I needed to implement some kind of tree structure but very few moments where I benefited from knowing the difference between a list, an array, a stack or a queue (other than the different APIs in the standard libraries).

You are right that someone who's sufficiently interested in how things work will likely teach themselves these things even if they're not relevant to their work. But there's only so many hours in the day and not everyone shares the same interests. I have a basic understanding of several programming languages I don't use, including C/C++, C#, Prolog, Erlang and Haskell. I know many but not all of the concepts that make them different. I understand the general idea behind how assembler code works, formal logic, TCP vs UDP, endianness, binary representations of numbers, even some basic electrical engineering (I've played with Arduinos and shift registers). I also have a basic understanding of state machines, neural networks, genetic algorithms and all kinds of other stuff that will probably never be relevant for most of my work.

I'm a full-stack JavaScript developer these days so I also have some rudimentary knowledge of design and layout (color theory, spatial relations, typefaces, accessibility, etc) as well as database operations and Linux system administration. I know my limits -- I'm not a designer and I'm not a sysadmin and I feel more comfortable working together with them knowing they have my back.

I'm fully aware I have a broader knowledge when it comes to software development than most people I've had the fortune of working with. But I'm also fully aware that they often not only have deeper knowledge on any given subject than I might have but that no matter how broad my knowledge is it's incredibly likely that I have knowledge gaps in topics they might be taking for granted.

We like to stroke our ego and brag about the "hacker mentality" and how hackers are superior because we want to know everything (implying we can know everything or at least more than others). But knowledge isn't simply defined as a quantity. There are very few things that are essential knowledge to be a programmer and even they are debatable.

Calling people "not real programmers" is bullshit posturing. Software development is an incredibly broad field and no matter how good you are, you can never have expert knowledge of all of it (and even claiming a working knowledge in all of it strongly suggests your overall knowledge is so shallow as to be useless).

I've seen this kind of elitist bikeshedding over the definition of what is a "real programmer" since I wrote my first line of code. JavaScript is not a real programming language because you're "just scripting the browser". Shell programming is not real programming because it's just the interactive command line. Dynamically typed programming languages are not real programming languages because you can't write serious software in them. PHP programmers aren't real programmers because PHP is only used by amateurs. And on and on and on.

You know what? If I see someone applying for a programming job who doesn't know pointer arithmetic, I don't care. It's just pointer arithmetic. They can still learn that if they have to. Instead I focus on what they DO know. Maybe they come from a strong math and logic background and can optimize the crap out of our code. Maybe they come from a design background and can spot deep flaws in our fundamental assumptions. Maybe they have intricate domain knowledge and can us help make a product our users actually want.

Yes, if you apply to a senior C++ developer position and don't know pointer arithmetic or can't implement a linked list, you're probably not a good fit for the position. But flat out dismissals based on bullet points are bullshit.

RandomOpinion · on Oct 29, 2016

Yep, have to agree with this. People who don't know big-O notation (and other CS fundamentals) are the ones gluing together libraries and code built by people who do. Now there's nothing wrong with that; there's enough work to be done for all of us. However, it's not something to crow about either.

eweise · on Oct 29, 2016

I've been programming for 25 years and never had a class covering big O notation.

achr2 · on Oct 29, 2016

The thing I always run into when discussing big-O are people (good programmers even) who think all O(x) algorithms have the same efficiency. I find it very frustrating when someone says my streamlined O(N) algo with 5 operations has the same efficiency as their O(N) algo with 20 extra function calls and operations. Big-O is not the only determining factor...

BjoernKW · on Oct 29, 2016

In a way Big-O notation intentionally glosses over these differences. At a large scale 5 vs 20 per instance of n doesn't matter. It might matter for practical purposes but Big-O really is about making broad distinctions.

_greim_ · on Oct 29, 2016

Yup. Achieving a better big-O could for example be the difference between something being possible or not, whereas fine-tuning an algo without altering its big-O might be the difference between needing five servers or ten. Still relevant, but a different class of relevance.

FreeFull · on Oct 29, 2016

It can also be the case that an algorithm with a worse big-O complexity might actually be better because all your inputs are small and the constant multiplier is lower than the more complicated, better big-O algorithm

stouset · on Oct 30, 2016

This suggestion gets trotted out in every discussion on Big-O. In 16 years of doing this professionally, I have not once encountered a situation where that would have been relevant. Any case where it even remotely might have been, the algorithm wouldn't have been hot enough to warrant considering it at all.

FreeFull · on Oct 31, 2016

One example would be multiplying large matrices together. A common algorithm used is https://en.wikipedia.org/wiki/Strassen_algorithm with a big-O complexity of around O(n^2.8074). https://en.wikipedia.org/wiki/Coppersmith%E2%80%93Winograd_a... has a complexity of around O(n^2.375477), but the constant factors are so large it is currently never better to use it over the Strassen algorithm.

achr2 · on Oct 31, 2016

I definitely agree with you, my comment was simply to show that there are a lot of people who use Big-O as a metric at a level where it shouldn't be applied.

mrcactu5 · on Oct 29, 2016

Don't you want to know if your recipe takes 10 minutes or 10 hours to cook?

jb1991 · on Oct 29, 2016

That would be nice to know, but Big-O won't tell you that. It makes no time guarantees, it only provides an idea of an implementation. An O(1) algorithm sounds great, until you realize that 1 means 1 hour for that particular algorithm, while a competing algorithm's O(n) might mean n seconds in reality.

What's in the parenthesis is not a measure of a unit of time, which is why Big-O fanatics often miss the bigger picture.

tscs37 · on Oct 30, 2016

>An O(1) algorithm sounds great, until you realize that 1 means 1 hour for that particular algorithm, while a competing algorithm's O(n) might mean n seconds in reality.

Well, I'm not trying to miss the bigger picture, but if you're going to scale that your O(n) algorithm will quickly take longer than a O(1) algorithm.

I mean, let's just take a hashtable as example. This hash takes 1 hour to complete while you can search the list in milliseconds.

But what if the list has billions and billions of entries and it takes longer than 1 hour?

As some others said, Big-O is not something you use to choose one algorithm over the other, it's something you use to choose an algorithm that scales well and has acceptable performance.

O(n) vs O(1) is a rather tame comparison, what if your competing algorithm is a O(n!) or worse?

guessmyname · on Oct 29, 2016

> Don't you want to know if your recipe takes 10 minutes or 10 hours to cook?

Yes, the first time, then it becomes a skill learned from experience.

I started learning Big-O a few months back and I have been working as a programmer for several years, not knowing about Big-O didn't stop me from becoming a lead of a development team. Knowing about Big-O is important but humans learn more from experience (aka. trail and error) than anything else. I started learning about Big-O just because I am interviewing right now and many tech companies love to ask those questions, but if I were not interviewing I would probably leave that topic for another time.

R_haterade · on Oct 29, 2016

Sometimes. Sometimes you just need to throw things in the oven until the first one browns up.

I recently had a problem where I needed an adjacency matrix of shortest paths. It was a choice between dijkstra and floyd-warshall. My dijkstra implementation kicked the pants off of floyd warshall for this application by an order of magnitude, which you wouldn't really expect. And the big-O complexity was the same for both algorithms, it's just that for the graph structure, the operation count for dijkstra was much lower. It was the first one to brown up.

bogomipz · on Oct 29, 2016

Yes and its good to know the limits of Big O as you mentioned Big O tells you what an upper bound on something is but it will provide no answer as to which to which of two algorithms with the same upper bound will be faster. There is a field that deals with this and I watched some lectures by Robert Sedgewick called "Analytic Combinatorics" where he talks about a different analysis to do just that. He talks about being able to tell large institutional clients exactly how long an algorithm on a particular input will take and how important this was in the 1970s when computing power was much slower and far more expensive.

R_haterade · on Oct 29, 2016

I will have to check this out as I'm in a lot of places where speed matters these days.

Does he delve into the nuts and bolts of data structure for this, or does it stay more theoretical?

bogomipz · on Oct 29, 2016

Its both. Here are some links:

Video Lectures are here: https://www.coursera.org/learn/analysis-of-algorithms

and the book:

http://aofa.cs.princeton.edu/home/

Also a youtube - short history of algorithm analysis by Sedgewick is worth a watch:

https://www.youtube.com/watch?v=qap2MyBTSZk

gk101 · on Oct 29, 2016

A few comments just in case this a critical part of your program, and if running it faster would help:

1. Using a heap with Dijkstra's algorithm would speed up your program

From your comment I am guessing you are using Dijkstra's algorithm without a heap, which gives O(n * m) for one source, and O(n^2 * m) for all sources (all pairs shortest path), where n is the number of nodes/vertices, and m is the number of edges

You would be able to improve that to O(m * n * lgn) by using a heap, which improves the time to run it by quite a lot in practice

2. The time it takes to run the program also depends on the density of your matrix:

In the case where the graph is sparse: m ~ n, and will result in O(n^2 * lgn) (or O(n^3) without a heap)

In the case where the graph is dense: m ~ n^2, and will result in O(n^3 * lgn) (or O(n^4) without a heap)

Compare the numbers above to O(n^3), which is the time complexity of Floyd-Warshall

So just in terms of time complexity, Floyd-Warshall would be faster in a dense graph

R_haterade · on Oct 29, 2016

Agreed, but this was for a fast script that will probably run 100 times for analysis purposes--the code itself wasn't the end goal. I just needed something that could crunch my numbers in 10 minutes instead of 2 hours. :-)

R_haterade · on Oct 29, 2016

I do want to express that I appreciate the effort you put into this post. Algorithms and data structures are my weakest point right now, and being able to read through your explanation is a helpful thing.

e12e · on Oct 29, 2016

Rather, don't you want to know if that great dessert you ate, that took half an hour to prepare for 3 people, will be possible to make in an hour for a seating of 250 people? Because if it can't be done in an hour, you might have to pick something else to serve as dessert...

(And we could probably turn this into a question about caching, by allowing you to prepare some of it a day early...)

Anasufovic · on Oct 29, 2016

Yes, testing is critical.

castratikron · on Oct 29, 2016

Didn't know that the "O" in Big-O actually means "order". I guess it doesn't really matter too much.

If you really wanted to get theoretical, you could also talk about the whole family, including little-O, big-omega, little-omega, etc. :)

bogomipz · on Oct 29, 2016

I don't believe this is correct, I believe O if for Greek letter Omnicron, just as the other Greek letters are used Big Theta and Big Omega are used in discussing bounds in time complexity.

I think O meaning order would be a retronym.

p333347 · on Oct 29, 2016

The usage "big omega" and "little omega" always makes me chuckle, as they translate to "big big oh" and "little big oh".

allan_s · on Oct 29, 2016

After seeing a lot of peers being mislead by big O I think most of big O articles on the web, for the sake of clarity or simplicity omits to express one thing

if we admits functions * square_big_o O(n²) * linear_big_o O(n)

then the statement

time(squarre_big_o(x)) > time(linear_big_0(x))

is not necessary true for every value of x

because the actual complexity could be

  * 1000+1000*N
  * 2+N²

in which case most of the time you will chose the square one, because practically it will be faster.

without this in mind, you got peer who tell you that this thing is faster because the algorithm is O(n) versus O(n²), though we're talking about a function that will always have n < 10, but yesterday night they read an article about big O and now they have to throw a "let's think about the big O" remarks for every single problem we meet

Bahamut · on Oct 29, 2016

Except if one knows what the definition of big O is, there is nothing to be misled by. What big O tells you is what is the function bounded by after a constant lower bound on the input.

One always want to do performance testing to see whether it comes into play even when it comes to more practical applications such as in code.

adsofhoads · on Oct 29, 2016

Extensively incorrect.