Hacker News | wareya's comments

It's worrying, but it's consistent with how copyright law is currently written. Laws haven't caught up with what technology is currently capable of yet. The discussion should be whether, and if so how, our laws should be tweaked to stop this from getting out of hand, IMO.


That's the "but their case would still fail if the second author could show that their work was independent, no matter how improbable" part of the post you're responding to.


One out of ten to the power of "forget about it" is not improbable, it's impossible.

I know it's a popular misconception that "impossible" = a strict, statistical, mathematical 0, but if you try to use that in real life it turns out to be pretty useless. It also tends to bother people that there isn't a bright shining line between "possible" and "impossible" like there is between "0 and strictly not 0", but all you can really do is deal with it. Wherever the line is, this is literally millions of orders of magnitude on the wrong side of it. Not a factor of millions, a factor of ten to the millions. It's not possible to "accidentally" duplicate a work of that size.


It sounds to me like you're responding to a different argument than they're actually making and reading intent into it that isn't written into it.


This actually isn't what legal precedent currently says. The precedent is currently looking at actual output, not models being tainted. If you think this is morally wrong, look into getting the laws changed (serious).


> If you had a hermetically sealed code base that just happened to coincide line for line with the codebase for GCC, it would still be a copy.

If you somehow actually randomly produce the same code without a reference, it's not a copy and doesn't violate copyright. You're going to get sued and lose, but platonically, you're in the clear. If it's merely somewhat similar, then you're probably in the clear in practice too: it gets very easy very fast to argue that the similarities are structural consequences of the uncopyrightable parts of the functionality.

> The actual meaning of a "clean room implementation" is that it is derived from an API and not from an implementation (I am simplifying slightly).

This is almost the opposite of correct. A clean room implementation's dirty phase produces a specification that is allowed to include uncopyrightable implementation details. It is NOT defined as producing an API, and if you produce an API spec that matches the original too closely, you might have just dirtied your process by including copyrightable parts of the shape of the API in the spec. Google vs Oracle made this more annoying than it used to be.

> Whether the reimplementation is actually a "new implementation" is a subjective but empirical question that basically hinges on how similar the new codebase is to the old one. If it's too similar, it's a copy.

If you follow clean room reverse engineering (CRRE), it's not a copy, full stop, even if it's somehow 1:1 identical. It's going to be JUDGED as a copy, because substantial similarity for nontrivial amounts of code means that you almost certainly stepped outside of the clean room process and it no longer functions as a defense, but if you did follow CRRE, then it's platonically not a copy.

> What the chardet maintainers have done here is legally very irresponsible.

I agree with this, but it's probably not as dramatic as you think it is. There was an issue with a free Japanese font/typeface a decade or two ago that was accused of mechanically (rather than manually) copying the outlines of a commercial Japanese font. Typeface outlines aren't copyrightable in the US or Japan, but they are in some parts of Europe, and the exact structure of a given font is copyrightable everywhere (e.g. the vector data or bitmap field for a digital typeface, as opposed to the idea of its shape). What was the outcome of this problem? Distros stopped shipping the font and replaced it with something vaguely compatible. Was the font actually infringing? Probably not, but better safe than sorry.


> If you somehow actually randomly produce the same code without a reference, it's not a copy and doesn't violate copyright.

I don't believe this, and I doubt that the sense of copying in copyright law is so literal. For instance, if I generated the exact text of a novel by looking for hash collisions, or by producing random strings of letters, or by hammering the middle button on my phone's autosuggestion keyboard, I would still have produced a copy and I would not be safe to distribute it. There need not have been any copy anywhere near me for this to happen. Whether it is likely or not depends on the technique used - naive techniques make this very unlikely, but techniques can improve.

It is also true that similarity does not imply copying - if you and I take an identical photograph of the same skyline, I have not copied you and you have not copied me; we have just fixed the same intangible scene into a medium. The true subjective test for copying is probably quite nuanced. I am not sure whether it is triggered in this case, but I don't think "clean room LLMs" are a panacea either.

> dirty phase produces a specification ... it is NOT defined as producing an API

This does not really sound like "the opposite of correct". APIs are usually not copyrightable; the truth is of course more complicated. If you are happy to replace "API" with "uncopyrightable specification", then we can probably agree and move on.

> it's probably not as dramatic as you think it is

In reality I am very cynical and think nothing will come of this, even if there are verbatim snippets in the produced code. People don't really care very much, and copyright cases that aren't predicated on millions of dollars do not survive the court system very long.


> I don't believe this, and I doubt that the sense of copying in copyright law is so literal.

It is actually that literal, really.

> For instance, if I generated the exact text of a novel by looking for hash collisions,

This is a copyright violation because you're using the original to construct the copy. It's not a pure RNG.

> or by producing random strings of letters,

This wouldn't be a copyright violation, but nobody would believe you.

> or by hammering the middle button on my phone's autosuggestion keyboard, I would still have produced a copy and I would not be safe to distribute it.

This would probably be a copyright violation.

You probably think that this is hypothetical, but problems like this do actually go to court all the time, especially in the music industry, where people try to enforce copyright on melodies that have the informational uniqueness of an eight-word sentence.

> APIs are usually not copyrightable,

This was commonly believed among developers for a long time, but it turned out to not be true.

> This does not really sound like "the opposite of correct".

The important part is that information about the implementation can absolutely be in the spec without necessarily being copyrightable (and in real world clean room RE, you end up with a LOT of implementation details). You were saying the opposite, that it was a spec of the API as opposed to a spec of the implementation.


> I don't believe this, and I doubt that the sense of copying in copyright law is so literal.

What color are your bits? That's all the law cares about.

The first sentence is the title of an essay.


Creole is a scientific term, not a casual one. Creoles evolve from pidgins. English was never a pidgin, and it has a very clear history. No useful interpretation of the word "creole", formally defined or not, is broad enough to actually consider English to be a creole, and no good linguist will call it one. Whoever or whatever taught you that it can be considered one, it's wrong. The people who take the Middle English creole hypothesis seriously are crackpots.


I don't think you're precisely correct in all of what you're saying, above. But that's precisely the point I was trying to get across: for the purpose of this discussion, such distinctions are basically persnickety. And in terms of relevance to the original article under discussion, basically neither here nor there.


First, even at a constant framerate, different players will have different experiences with the game due to different amounts of latency in their peripherals, weird CPU bugs, very minor floating point behavior deviations, etc. The important part of making something framerate independent is to make the effects of framerate dependence small enough relative to the other factors that shape each player's experience.

It's entirely possible to not have gameplay change at different tickrates. There are three usual types of problem that cause gameplay to change at different tickrates:

1) Events that happen at specific times

If events can only happen "in sync" with ticks, then running at different framerates will make them happen at different times. Say you have a machine gun that fires once every 1/10th of a second. If you're running at 24fps, then not only do you need to make sure that you correctly output a bullet every two-or-three frames, you also have to make it so that they /act like/ they came out at a time between frames, presumably by simulating them a little extra bit or a little less depending on what's necessary to make each bullet a fixed distance apart (for example). You could also use a continuous interaction system here but that's really hardcore and doesn't matter in 99.9%+ of cases.
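To make the machine-gun example concrete, here's a minimal sketch (the fire interval, bullet speed, and function names are all illustrative, not from any particular engine): each frame, every shot whose firing time falls inside the frame is spawned and then simulated for the leftover part of the frame, so its end-of-frame position matches its true firing time.

```python
def simulate(total_frames, dt, fire_interval=0.1, speed=500.0):
    """Fire a bullet every `fire_interval` seconds across frames of
    length `dt`. Returns (spawn_times, bullet_positions_at_sim_end)."""
    spawn_times, positions = [], []
    last_shot = 0.0  # pretend the gun fired at t = 0
    frame_end = 0.0
    for _ in range(total_frames):
        frame_end += dt
        # advance bullets that already exist by a full frame
        positions = [p + speed * dt for p in positions]
        # spawn every shot that falls inside this frame, then simulate
        # it for the remainder of the frame so it /acts like/ it came
        # out at a time between frames
        t = last_shot + fire_interval
        while t <= frame_end:
            spawn_times.append(t)
            positions.append(speed * (frame_end - t))
            last_shot = t
            t += fire_interval
    return spawn_times, positions
```

Run a second or so of this at 24fps and at 60fps and the shot schedule comes out identical, with bullets spaced `speed * fire_interval` apart regardless of framerate.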

2) Treating curved motions as linear motions

This isn't a problem in 99% of cases, but it ~can~ be a problem. If you can't represent your physics curves in closed form, it's effectively impossible to trace them in a framerate-independent way. The only thing you can do for these things is to "fix your timestep", but "fix your timestep" should be used with extreme caution and not be applied to action games because it adds input latency and hides frame-specific-timing information from players. Recommendation: use simpler game physics. Constant acceleration everywhere you can, special-purpose closed form functions with the right curves where you can't, and avoid tight curves. This makes it less of an issue.

You might also want a collision system that allows curved paths, but if you have enough control over your game's environments, and you don't expect people to run at SUPER low framerates, this isn't an issue.

3) Running different "integral" parts of the simulation out of sync with each other.

This isn't literally about code execution order, it's about the behavior of different pieces of code that "add up" "over time" (if you're into calculus, think antiderivatives)

If you add n to your position every frame, that's all fine and good, as long as it always starts at exactly the right time. If you add "five pixels per second" to your position, then you need to make sure that always causes the same amount of distance travelled. Quake 3 handles this by giving subframe timings to things like pressing and releasing movement keys.

Another example is gravity. Normally, game code adds a specific amount of gravity every frame, possibly adjusted for time, either before or after motion. This isn't good enough, because if you add gravity after moving, very low framerates will have a higher initial frame of gravitational movement; essentially, low framerates will skip "outside" the ideal jump arc rather than tracing it. If you add gravity before moving, the opposite happens, low framerates trace "inside" the ideal jump arc. The correct thing to do is to calculate motion with around half the added velocity from all accelerations for that frame, which basically means the average of accelerating before or after. You can also trace a hermite spline, which is more flexible (you will have perfect framerate independence for any acceleration values that only change along line segments, i.e. "constant jerk", as long as you handle start/stop conditions for changes in acceleration with a correct continuous interaction simulation), but harder to implement.

The above is only 100% ideally possible for things you can represent closed form, and in practice, people will only implement something correctly if it's not just closed form but /simple/. Many games stop short of correct jump arcs, but it's entirely possible. Gang Garrison 2 was just changed to have a correct framerate-independent jump arc.
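A minimal sketch of the "half the added velocity" idea for constant acceleration; the function names and constants are illustrative, not from any particular game:

```python
def step_naive(x, v, a, dt):
    # move, then accelerate: low framerates skip outside the ideal arc
    x += v * dt
    v += a * dt
    return x, v

def step_midpoint(x, v, a, dt):
    # apply half of this frame's acceleration before moving and half
    # after: exact for constant acceleration, at any framerate
    v += a * dt * 0.5
    x += v * dt
    v += a * dt * 0.5
    return x, v

def advance(step, x, v, a, dt, steps):
    for _ in range(steps):
        x, v = step(x, v, a, dt)
    return x, v
```

Integrating one second of an a = -9.8 jump at 4 ticks versus 8 ticks, the midpoint version lands on the analytic v0*T + a*T²/2 either way, while the naive version gives a different arc at each tickrate.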


I don't know what kind of CPU bugs or floating point behaviour deviations you think modern CPUs have that would affect a game. Such things would cause most online games to immediately go out of sync, and in practice they don't; and while latency might affect players' enjoyment, it won't affect how fast they run or how high they jump.

The hardest part is the curved motions -- and also collision detection. Without ticks it's extremely hard to make collision detection repeatable.

I'm unaware of any recent notable multiplayer game which doesn't run on a tick-based physics engine (I'd be interested if you could point me to some) -- that suggests to me that tick-based is necessary.

Sure, you can get things "closer" in tickless physics, but I've never seen anything non-trivial which can produce exactly the same results, for reasons of floating point: when (A+B)+C isn't the same as A+(B+C), it's very hard to make your game produce exactly the same results.
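For what it's worth, the non-associativity is easy to demonstrate on a single machine; cross-CPU deviations stack on top of that:

```python
# IEEE 754 doubles: addition is not associative
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)  # 0.6000000000000001
print(a + (b + c))  # 0.6
```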

As you say, with a lot of work, you can get a 95% repeatable result with a tickless engine, but you can easily get 100% repeatable with a fixed time-step engine, and in practice it seems to make things work well enough, and is VASTLY easier.


During development of the game [Prototype] I experimented with a number of game logic update scenarios, including variable timesteps, fixed (potentially multiple per frame) and limited fixed (potentially leading to slow motion), in combination with factors like enabling and disabling vsync, taking into account camera movement (speed, mostly horizontal or mostly vertical?), amount of explosions and debris going off, etc. To try and find the best experience for the player at each moment.

But we always had physics running at a fixed update rate (always receiving a constant delta time) regardless of any of the above. Trying variable rates there will ruin the stability of your movement, collisions, platforming, etc in any system that is complex or requires precision and predictability.


yeah, 5 minutes playing with Kerbal Space Program really highlights the fuckery that variable delta updates cause in physics-based games


Online games avoid that problem by having an authoritative networking architecture, not by avoiding minor deviations in CPU behavior between players. Games that use synchronized lockstep with floating point math always desync, because floating point math differs slightly between different processors.

Curved motions are easy as long as you have simple polygons and only one of them is accelerating and is accelerating in a simple way. You remove the acceleration by skewing the polygon you're going to collide with and then you have a linear path of motion again. (if the acceleration is not constant for the duration of that frame, then skewing doesn't give the right result; as long as you have a closed form representation, though, it's entirely possible to get it right)
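For the constant-acceleration case, the skewed query is equivalent to solving for the time of impact in closed form. A 1D sketch with illustrative names (the real 2D version skews the polygon instead of solving directly):

```python
import math

def time_of_impact(x0, v, a, wall):
    """Earliest t >= 0 where x0 + v*t + 0.5*a*t*t reaches `wall`,
    or None if the path never gets there."""
    if a == 0.0:
        if v == 0.0:
            return 0.0 if x0 == wall else None
        t = (wall - x0) / v
        return t if t >= 0.0 else None
    disc = v * v + 2.0 * a * (wall - x0)
    if disc < 0.0:
        return None  # the curve never crosses the wall
    root = math.sqrt(disc)
    candidates = [(-v - root) / a, (-v + root) / a]
    hits = [t for t in candidates if t >= 0.0]
    return min(hits) if hits else None
```

The same idea extends to piecewise closed-form curves: split the frame at each sub-interval boundary and solve each piece analytically.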

"Don't run on a tick-based physics engine" is a misconception. Interactions are the borders between ticks. It's just dynamic. As long as your tick doesn't contain interaction changes inside it, you can simulate that tick with 100% tickrate independence with 100% certainty.

Floating point differences between CPUs are Much Larger, especially if your code uses hardware-accelerated trig at all. Game networking is authoritative for this reason.

You can get a 100% repeatable result with a tickless engine.

Fixed timestep doesn't solve the problem. If the user can't run the simulation at 60 ticks per second, they're still going to slow down, period. All you're doing is separating simulation from rendering, which basically every modern FPS under the sun already tries to do in a different way than fixed timestep does. If you run the simulation slow enough that nobody will have performance problems with it, you just added tons of input latency. Thanks a lot, sincerely, someone with dysgraphia.


> because floating point math differs slightly between different processors.

I don't know why you keep saying this. If I compile a program which uses floating point for (say) x64, it will produce the same results on every machine. Can you give any example where the same executable will produce different results on different machines?

Now, you may get different answers on ARM, or 32-bit, but almost no games (I'm having trouble thinking of any) try to do cross-CPU networking, so the (extremely) minor differences don't make any difference. Most games don't sync between users; they trust each user to run the game engine -- you can't afford to send the total state of the world to users, it would take far too much network traffic.

Can you point me to a physics engine which gives 100% repeatable results with tickless? I'm genuinely interested, I didn't know of any that claim they achieve that, the common ones (box2d, unity and havok for example) certainly don't.


>I don't know why you keep saying this.

Because it's true. The same operation on the same data may give different results on different CPUs, even if you're operating at the machine code level (no implementation-specific optimizations).

https://randomascii.wordpress.com/2013/07/16/floating-point-...

https://randomascii.wordpress.com/2014/10/09/intel-underesti...

>Can you point me to a physics engine which gives 100% repeatable results with tickless? I'm genuinely interested, I didn't know of any that claim they achieve that, the common ones (box2d, unity and havok for example) certainly don't.

box2d is sufficiently low level that it gives programmers the tools necessary to do this.

Most notably, box2d provides the following:

>Continuous physics with time of impact solver

This directly allows developers to create a "tickless" physics simulation by breaking up the simulation into the timespans between interactions, as long as pathological situations aren't introduced (like the quake 3 ledge climbing bug). That doesn't mean that the developer will actually do so, and if the gameplay logic interacts with physics in framerate-dependent ways the result will still be wrong. It just means that the possibility is there.

Of course, this doesn't solve game logic issues with things happening only at the moment that a frame happens. That's entirely on the developer, even if they use an ideal physics engine.

At the end of this, I'll repeat that fixed timestep doesn't actually solve the problem, all it does is allow you to output higher graphical framerates than the physics simulation is running at. If the physics simulation itself can't run at full speed, you still need a way to adapt to longer frametimes, and you have to do so correctly. And if you run the physics simulation so slowly that it won't slow down on any reasonable PC, you either have a very simple game or you just added tons of input latency.


The reason why you don't see an issue while using Box2D is because its integration methods are low-error. A simple platform game using Euler style integration of the type

   each step, add forces from input and gravity. add the resulting acceleration multiplied by time to my position.
will trivially produce jump heights of 50% variation when subjected to a variable timestep. On a quick search, Box2D uses semi-implicit Euler [0], which much more accurately fits the curve. It does not guarantee zero error: to do that you have to have an analytic method, which isn't applicable to a general-purpose physics simulation.

[0] https://en.wikipedia.org/wiki/Semi-implicit_Euler_method

edit: i did forward euler wrong
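The variation is easy to measure. A sketch comparing forward Euler against semi-implicit Euler for a jump's peak height (the jump velocity and gravity values are made up):

```python
def peak_height(dt, explicit, v0=10.0, g=-20.0):
    """Simulate one jump with the given timestep and integrator,
    returning the highest point reached."""
    x, v, peak, t = 0.0, v0, 0.0, 0.0
    while t < 1.0:
        if explicit:
            x += v * dt  # forward Euler: move with the old velocity
            v += g * dt
        else:
            v += g * dt  # semi-implicit: accelerate first, then move
            x += v * dt
        peak = max(peak, x)
        t += dt
    return peak
```

The true apex is v0²/(2|g|) = 2.5 here; forward Euler overshoots it and semi-implicit undershoots it, with both errors shrinking linearly in the timestep, so the same jump reaches a measurably different height at 30fps than at 144fps.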


I covered integration error in my first post here, point 3. The thing I was calling Box2D out for is the fact that it doesn't use a hacky way of rectifying collisions. It seeks out the point in time that they occur.

>It does not guarantee zero error: to do that you have to have an analytic method, which isn't applicable to a general-purpose physics simulation.

If you have constant acceleration, the analytic way to get the point you want to be at at the end of the frame is simple: just pretend your current frame is using half the added speed from the acceleration that you're going to undergo this frame. Or you could use a hermite curve or something.


Those are the integration methods I was talking about. An analytic method without error is one which can describe the curve at any moment in time, not "constant acceleration per frame".


Box2d's continuous physics does not allow for a tickless simulation as I understand the term, because continuous interactions (stacking, sliding, rolling, etc.) are not reduced to discrete time-of-impact calculations. Joints are also in that category. If you run the testbed and crank the timestep way up, you'll definitely see problems with things like ragdolls.

Continuous collision detection definitely makes it easier to handle high velocities and bigger timesteps, but it's not a panacea.


By "continuous physics", it doesn't mean the "objects don't go through walls" thing (that's easy), it means that it actually finds the times that interactions happen. Box2d isn't able to use that in all cases, but it does allow it to act the same way as a tickless simulation if your game's physics is simple enough, like 2d platformers.


I think theoretically you're correct that the methods underlying Box2d could be used in such a way, but practically I don't think it's there, as that was never one of the engine's goals. Read up on the methods used (for instance, http://twvideo01.ubm-us.net/o1/vault/gdc2013/slides/824737Ca...) for some of the limitations that Erin chose to accept. In particular, the fact that it misses multiple roots during the solve means that you can't use it for perfect tickless simulation; there are (were? I'm not sure to what extent they've been addressed since I was deep in that code) also issues with conservation of time that would be a problem, which are discussed elsewhere (for instance http://box2d.org/forum/viewtopic.php?t=154).


I didn't know about the inaccuracies in sin/cos, good to know, I'll remember that, but I usually wouldn't use those in a physics engine (and now I certainly won't).

The other stuff -- different compiler optimization levels, changing the floating point rounding mode -- does indeed change results, but that wouldn't affect a single compile of a program running on different machines.


I maintain a javascript PRNG which relies on basic floating point math operations, addition and multiplication. Its test page here (https://strainer.github.io/Fdrandom.js/) displays a warning if it doesn't hit an expected value after 1 million rnds from the default seeding. I've not found any computer or phone that fails that basic compliance test yet. Differences in trig functions strike me as more a matter of Math library compliance than of the floating point unit; I'm not sure, but I would be hopeful these are fairly well standardised across javascript engines.


To add to this statement: Input latency is a much broader issue than collision latency - a large share of collision latency is produced simply by being an online game, regardless of how you're trying to represent the events. So the collision being subject to a tickrate is hardly a dealbreaker. Variable timesteps can be a little better for performance or responsiveness, but they come at the expense of flooding the main loop with additional consistency concerns. The modern fixed timestep partitions delta times into whole ticks, but also passes the time each tick represents into each update, so that timers and time-centric physics can exhibit perceptually similar behaviors if the tickrate, and thus the duration of each tick, changes.
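That partitioning is the classic accumulator loop, sketched here with illustrative names; the leftover fraction of a tick is what gets used to interpolate rendering:

```python
def run_fixed_timestep(frame_times, tick_dt):
    """Accumulator-style main loop: variable render-frame durations are
    partitioned into fixed simulation ticks; the remainder carries over
    and doubles as a render interpolation factor."""
    acc = 0.0
    ticks = 0
    alpha = 0.0
    for frame_dt in frame_times:
        acc += frame_dt
        while acc >= tick_dt:
            ticks += 1         # advance the simulation one fixed step here
            acc -= tick_dt
        alpha = acc / tick_dt  # fraction of a tick left over: blend the
                               # last two simulation states for rendering
    return ticks, alpha
```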

Input latency, OTOH, is something that has challenged developers from the earliest days since it has to do with how fast you propagate new input through the main loop to the output device, and the timestep's interaction with input is not straightforward. Arcade games could drastically improve their feel just by capturing and debouncing input multiple times a frame, even with the simple screen-refresh timesteps that were the common practice.


Robots.txt is different. Without it, bots have no way of knowing whether to get any other data from the site. You would need "bots allowed" information in the HTTP handshake itself to prevent bots from accidentally hitting pages they shouldn't. This can already be Very Bad.
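For illustration, a minimal hypothetical robots.txt; the paths here are invented, and the point is that nothing in the HTTP exchange itself marks them as off-limits until the bot has fetched this one file:

```text
# hypothetical example
User-agent: *
Disallow: /admin/
Disallow: /delete-account/
```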


> You would need "bots allowed" information in the HTTP handshake itself to prevent bots from accidentally hitting pages they shouldn't.

Humans could also hit such pages. If your GET requests change state, there's no helping you.


The whole point of robots.txt is that there are pages which people may hit that bots can't. What are you on?

