
This is not "linear time" in any decent sense--we shouldn't conflate the number of elements with the size of the elements (in this case, O(max element) is actually exponential in the input size!).


This isn't really right. If the numbers can be arbitrary in size, then you can't compare them in constant time, which means that comparison-based sorts are quadratic.

This is directly related to my third point. Traditional analysis of sorting algorithms assumes the Von Neumann architecture, and in particular, it assumes constant-time comparisons.


The computational model will define what you can and cannot compare in constant time. It is perfectly fine to say that comparing elements of arbitrary size is a constant time operation - that's precisely what a computational model is: an enumeration of the operations that are considered basic.


Yeah, I know.

Go read the parent again. The author claims that sorting in this case takes time proportional to the size of the largest element, and I'm saying, if it takes time proportional to the space consumption of the largest element (presumably where the cost of the sort comes from), you can't define a computational model where comparison takes constant time -- it still has to read the digits, which we know takes linear time.

If your point is that you can define a model of computation with contradictions in it, then you should rethink whether what you are saying here is even relevant to the thread at all.


You can short-circuit comparisons, but that's not always going to happen. Consider: it's completely reasonable to assume a long int is sorted as fast as an int on a 128-bit CPU. If nothing else, it's useful to have a fairly long cycle. Even if you could in theory use a 20GHz CPU where some comparisons finish in 4 cycles and others in just 1, sure, it might be slightly faster than a 5GHz CPU with constant-time comparisons, but the added complexity is not free.


I am not sure I understand your point here.

Short-circuiting comparisons is still linear for integers of unbounded size. But besides that, it's incredibly common to assume that operations on 64-bit primitives are done in constant time, because to the asymptotic analysis, what matters is that the integers are _bounded_ in size. As long as the size is bounded, the asymptotic analysis here doesn't care -- you can pick any size you want, as long as it's bounded.

One thing to keep in mind here (which it seems you are maybe confused about) is that these bounds are _asymptotics_ in the input size generally, and have very little to say about any particular choice of input size. 64-bit, 32-bit, whatever, it doesn't matter: if the size of the operands is bounded, they're the same thing to the asymptotic analysis. As long as they're bounded, you can say that the operations on those operands take time proportional to a constant times the (bounded) size of the operands.

The consequence of this is that when you deal with operands of _unbounded_ size, all of this asymptotic analysis goes right out the window. You can't possibly have a meaningful comparison operator, for example, that operates in constant time on unbounded integers.
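
To make that concrete, here's a minimal sketch (my own, not any particular bignum library's API) of comparing two magnitudes stored as 64-bit limbs:

    #include <stdint.h>
    #include <stddef.h>

    /* Compare two equal-length magnitudes, most significant limb first.
       When the numbers agree in their leading limbs, the loop has to keep
       reading, so the worst case is proportional to the number of limbs,
       i.e. to the size of the operands. */
    int cmp_magnitude(const uint64_t *a, const uint64_t *b, size_t limbs) {
        for (size_t i = 0; i < limbs; i++) {
            if (a[i] != b[i])
                return a[i] < b[i] ? -1 : 1;   /* first differing limb decides */
        }
        return 0;                              /* equal: every limb was read */
    }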


With parallel analysis, comparisons are log N operations: P1 compares N bits, P2 compares N bits; use P1's result unless it's equal, then use P2's.

Short-circuiting cuts that to log(log(N)) on average and log(N) worst case.

It's actually generally faster to compare X bytes of very large numbers than X bytes of long ints. The degenerate case being 2 numbers of X/2 bytes each.

Anyway, the point is treating log(log(N)) as constant is generally reasonable especially when N is small.


The discussion is about big O, yes? This isn't really relevant to our discussion. Parallel analysis and average case, for the purposes of the discussion here, are fancy games you can try to play to get nice bounds, but they do not really have any impact on the big O of these algorithms.


"You can't possibly have a meaningful comparison operator, for example, that operates in constant time on unbounded integers."

We don't really deal with unbounded integers; the worst case is something like 2^(2^1,000,000,000,000,000,000) before we can't actually store them.

Even then, log of log of N makes random numbers of that size practically constant time to compare. Put another way, there is less than a 1 in 2^64 chance you need to do more than 3 comparisons, and less than a 1 in 2^128 chance you need more than 4: one for the length, one for the first chunk (which might be just one bit), and one more for the next 64 bits. Sure, that might happen, but realistically random input is unlikely to need many comparisons.

Strings on the other hand are more of an issue.


Retric, look.

Do you realize that the whole point of asymptotic analysis is that _we do not care_ what integers we "really" deal with? These bounds specifically deal with sorting arbitrary data of unbounded size, and _no other assumptions_.

Do you understand? No assumption that we can use parallelism. No games with integers you see in "real life". None of that is relevant. You can't play games with practicalities to get a better bound. This is the general bound, and if you fiddle with it to get something else, you are changing the scope of the question to be something else entirely.

If you want to have a discussion about those other models, fine, but that's not the discussion we're having.


What you're talking about is a type of overly simplified model that's irrelevant for actual computing. My point is there are other models out there. If you actually want the fastest algorithm you need to design it for actual hardware: finite data, cache, limited registers, limited RAM, sometimes approximating a real instruction set, etc.

However, by changing your assumptions you can still use O notation with more complex models.


Yes, everyone knows this. This is not helpful.


My point is indeed that "you can't compare [arbitrary sized numbers] in constant time" is a statement about some model of computation, and not all of them.


Again, yes, I know. I understand what the model is. I'm saying it's not useful to point out that you can choose contradicting axioms to base your model on. And make no mistake, the two axioms that you are proposing do contradict each other.

So again, the issue is not that you can't compare numbers of arbitrary magnitude in constant time -- I never said that. The issue is that it is a contradiction to say that it takes linear time to inspect all the digits of a number AND that it only takes constant time to compare O(n) digits of two numbers.

You're essentially saying that you can axiomatize your mathematical universe with nonsense axioms, which yes, you could do, but that's not useful to point out.


Of course the models I mentioned are not self-contradictory, I've no idea what you mean by that.

Your statement was "If the numbers can be arbitrary in size, then you can't compare them in constant time". I was merely pointing out that this is not a true statement.


flebron, you do not seem to be reading my posts and I am getting a bit annoyed.

Again, for the third time: you cannot "assume" that reading the digits of a number takes O(n), but reading the digits of two numbers to compare them takes O(1). That is absolutely, obviously true. Your response, that you can "assume" the latter is true as part of the model of computation is just wrong, plain and simple. There's nothing else to say about it.

The original claim directly implies this, and if you can't address that point, then just save us both the time and don't respond.


Just worth noting here: we can do comparisons in linear time with respect to the number of bits; however, OP's algorithm is exponential in the number of bits.


I thought that too, but couldn't you make element size irrelevant by renormalizing the list? One pass to find the max, one pass to divide everything by it.
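
A sketch of that two-pass idea (function name and types are my own; whether it actually helps the sort's running time is another matter):

    #include <stddef.h>

    /* Pass 1: find the maximum. Pass 2: divide everything by it, so all
       values land in [0, 1]. */
    void normalize(const unsigned *in, double *out, size_t n) {
        unsigned max = 1;                       /* avoid division by zero */
        for (size_t i = 0; i < n; i++)
            if (in[i] > max) max = in[i];
        for (size_t i = 0; i < n; i++)
            out[i] = (double)in[i] / max;
    }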


Yeah, it's linear time for n where n is the largest number—pretty sure it's unbounded in terms of the number of elements.


True, but writing a linear sort with bounded number of distinct elements is not really a challenge.

E.g.

Input: T, an array of n elements with values in 1..m; m, the number of distinct elements.

    for (int i = 1; i <= m; i++)
        for (int j = 0; j < n; j++)
            if (T[j] == i) printf("%d ", T[j]);
This is O(n * m) which is O(n) since m is a constant.
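
For completeness, a compilable wrapper around the same two loops (the sample values below are mine):

    #include <stdio.h>

    int main(void) {
        int T[] = {3, 1, 2, 1, 3};              /* n = 5 values drawn from 1..m */
        int n = 5, m = 3;
        for (int i = 1; i <= m; i++)            /* same O(n * m) double loop */
            for (int j = 0; j < n; j++)
                if (T[j] == i) printf("%d ", T[j]);
        printf("\n");                           /* prints: 1 1 2 3 3 */
        return 0;
    }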


The max size of the elements is usually constant when comparing sorting algorithms. The complexity of this depends on the scheduler's algorithm. With the right scheduler this could be O(n log n).



