Possibly all the ways to get loop-finding in graphs wrong

xelxebar · 2024-09-11T08:56:09 1726044969

First time I've heard of the disjoint-set data structure. For a DSF implementation, Wikipedia [0] gives the wildest asymptotic complexity I've seen, using the inverse Ackermann function! "Almost constant time" indeed.

[0]:https://en.wikipedia.org/wiki/Disjoint-set_data_structure

kragen · 2024-09-11T09:04:42 1726045482

more commonly it's called union-find; it came up on here last thursday: https://news.ycombinator.com/item?id=41450948

shepherdjerred · 2024-09-11T15:37:25 1726069045

Disjoint sets are awesome! You can use them with a _very_ simple algorithm to form a minimum spanning tree.

https://en.wikipedia.org/wiki/Kruskal%27s_algorithm

kevinventullo · 2024-09-11T16:13:05 1726071185

What’s especially surprising to me is that inverse Ackerman is provably optimal for this problem, regardless of implementation.

LudwigNagasena · 2024-09-12T01:31:08 1726104668

It’s a standard part of Data Structures & Algorithms curricula. You can find it in CLRS.

agumonkey · 2024-09-12T03:35:37 1726112137

Ha, I've been looking for an inverse ackermann instance for quite a while. Thanks

karmakaze · 2024-09-11T15:49:09 1726069749

I've always thought Floyd's tortoise and hare[0] algorithm was handy as it can find a cycle in a DAG with low memory usage.

[0] https://en.wikipedia.org/wiki/Cycle_detection#Floyd's_tortoi...

water-data-dude · 2024-09-11T16:31:01 1726072261

Here is where a smartass might point out that by definition you won’t find any cycles in your DAG

pavon · 2024-09-11T18:52:17 1726080737

It's short for Directed Accidentally-cyclic Graph.

alfiedotwtf · 2024-09-11T23:11:55 1726096315

Thanks for the morning laugh :)

konstantinua00 · 2024-09-11T16:29:41 1726072181

how does it work on DAG?

I always thought it's a linked list algo

tejtm · 2024-09-11T17:38:14 1726076294

By never finding a cycle, it proves the graph is a DAG.

jerf · 2024-09-11T18:38:06 1726079886

But you have to run it on a sequence, or perhaps even more accurately, something that is putatively a sequence but may have an infinite loop in it. Tortoise & hare requires an unambiguous "next" function, which a graph does not have. Modifying it to not require that so it can run on a graph may be possible but that would constitute a different algorithm for sure, with different complexity analysis, etc.

pavon · 2024-09-11T19:02:06 1726081326

It is useful for determining whether a given traversal of a graph has encountered a cycle, rather than determining whether the entire graph has any cycles. So different from what the article was about, but still a nice tool to have in the box.

zeroonetwothree · 2024-09-11T14:23:54 1726064634

This problem is called “online cycle detection” in the literature. You can search for various better algorithms that have been developed.

layer8 · 2024-09-11T14:49:51 1726066191

The article isn’t interested in the “online” aspect, but in finding all (what turns out to be called) non-bridge edges, in order to visualize the loops.

konstantinua00 · 2024-09-11T19:09:25 1726081765

it is interested in both

tromp · 2024-09-11T09:02:34 1726045354

The problem of finding cycles in (pseudo randomly generated) graphs is actually one of the first examples of an asymmetric Proof-of-Work puzzle [1]. Asymmetry means that while finding a cycle takes a lot of memory and effort, a solution can be verified instantly.

Several of the suggested algorithms were in fact implemented in the various Cuckoo Cycle solvers [2]. But by maintaining directed paths, my union-find based algorithm did identify the entire cycle and not just the cycle completing edge [1] (implemented in [3]).

This algorithm suffers from three problems though. One, it only finds one among a set of interconnected cycles. Second, it's rather slow, being latency bound by the random memory accesses. Third, it uses much more memory than needed.

So repeatedly trimming edges with only one endpoint is a much faster approach. And for the random graphs that the PoW generates, it can be done using only 1 bit per edge to record if it's been trimmed.

Once sufficiently trimmed, a loop tracing algorithm (like [4] for the Cuckatoo Cycle variant) can find all remaining (simple) cycles.

[1] https://www.semanticscholar.org/paper/Cuckoo-Cycle%3A-A-Memo...

[2] https://github.com/tromp/cuckoo

[3] https://github.com/tromp/cuckoo/blob/master/src/cuckoo/cycle...

[4] https://github.com/tromp/cuckoo/blob/master/src/cuckatoo/gra...

pxeger1 · 2024-09-11T10:09:44 1726049384

I initially took issue with "first asymmetric proof of work", thinking that in practice any NP-complete problem would surely be fine, even if we can’t prove that it can’t be solved quickly. But is it difficult to find NP-complete problems which can't also be quickly approximated? And I suppose it might also be hard to find NP-complete problems in a "Goldilocks zone" of difficulty to make for practical proofs of work, since their complexity increases so sharply.

tromp · 2024-09-11T10:46:16 1726051576

For Proof-of-Work applications, at least in blockchains, you need puzzles that

* can be pseudo randomly generated from a seed (usually the hash of a partial blockchain header)

* have a known optimal solution method with a (nearly) constant running time that's at most about one second (for progress freeness this should be a small fraction of the target block time)

NP-hard problems rarely satisfy both these constraints. Finding fixed length cycles in random (bipartite) graphs with billions of edges does.

JohnKemeny · 2024-09-11T10:36:52 1726051012

Approximation doesn't help, since you can require optimal solutions, but yes, it's very difficult to come up with hard instances. After all, you need to know the correct answer, so you can't just randomly generate instances. In addition, SAT solvers are ridiculously good these days.

JohnKemeny · 2024-09-11T10:38:33 1726051113

After trimming, you can also branch on edges: either it is in a cycle, in which you find one, otherwise you can remove it and further reduce the instance by trimming more edges.

JohnVideogames · 2024-09-11T09:08:19 1726045699

I had to tackle a problem much like this one when analysing the polygons in planar graphs (analysis of planar chemical networks). A fun approach I found was to use a Delaunay triangulation (the dual of a Voronoi diagram; draw the Voronoi polygons and join their centres if they share an edge). Then join the triangles into larger polygons by removing edges that are in the Delaunay triangulation but not in your original planar graph. Once you've got no more edges to remove you've got all the polygons in the graph and convenient triangles for rendering / analysis.

For planar graphs on the surface of a torus, like they faced here, I just ended up adding extra images of the central unit cell, and then removing the excess. Not elegant (lots of weird corner cases) but convenient for visualisation.

gilleain · 2024-09-11T11:53:17 1726055597

Chemistry does ring finding in various ways. I've always liked the SSSR (smallest set of smallest rings) just for the name :)

https://depth-first.com/articles/2020/08/31/a-smallest-set-o...

slashdave · 2024-09-11T22:59:12 1726095552

Unfortunately, as noted in your link, the SSSR solution is not unique. Also unfortunate, Daylight (the inventor of SMILES) added SSSR as an optional atom identifier in their SMARTS query specification without realizing this. Because OpenEye doesn't like SSSR, they actually leave out this feature in their implement of SMARTS, breaking from standard.

https://daylight.com/dayhtml/doc/theory/theory.smarts.html

BTW, there is a very clever algorithm for finding SSSR

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2765087/

JohnKemeny · 2024-09-11T10:42:42 1726051362

Can you expand on what you wanted to analyze? Why didn't you just output the polygons immediately?

yohannesk · 2024-09-11T09:54:12 1726048452

If this article triggers a desire to work on an actual problem with graphs try https://leetcode.com/problems/freedom-trail

thaumasiotes · 2024-09-12T06:07:08 1726121228

This looks like a dynamic programming problem taking space proportional to |ring length|*|key length|. By that analysis, graphs don't really come into it.

What's the graph-theoretical approach? Does it have nicer characteristics?

vouwfietsman · 2024-09-11T16:36:43 1726072603

Funny I've had to do this occasionally and I always do the following: for each edge E, attempt to construct a clockwise loop from that edge by traversing each vertex and picking the edge that is "most clockwise" compared to the incoming edge. Converse for counter clockwise. This gives you, for each edge, the two loops the edge is a part of (if they exist).

It sounds horribly slow, but isn't because the loop finding process for one edge can mark all edges it traverses as either being on the same loop, or not having a loop, for the orientation its checking. These edges can be skipped henceforth.

This does not find all loops, only the smallest disjoint loops (as every edge is at most part of two disjoint loops). But basically results in a similar planar "face" partition the author describes.

Its not clear to me whether my approach does not work (perhaps the angle compares fail on a torus? I've never had to deal with torii), or simply isn't suited because it misses some loops, but I thought I'd share it anyway :-)

cvoss · 2024-09-11T16:58:00 1726073880

Notwithstanding the limitation of assuming the graph is embedded in a coordinate space where you can try to measure angles, I think this algorithm still doesn't work.

I may be missing a detail, but it seems to fail on a graph consisting of a loop with some extra edges branching off the loop toward its interior and some edges branching off toward the exterior. The left hand walk hits a dead end in one branch, and the right hand walk dead ends in the other, no matter where on the loop you start (except for some special cases).

Edit: strengthening the proposed counterexample

vouwfietsman · 2024-09-11T17:07:43 1726074463

Hmm I'm having a hard time visualizing your counter example, I'll try to think of it some more. An important detail I left out though is that in my problem sets edges do not cross, which I suppose simplifies the entire problem a lot :-)

mrgoldenbrown · 2024-09-11T18:42:39 1726080159

I think you are trying to say your method only works if the graph is "planar", ie it can be drawn on a flat piece of paper with no lines crossing.

vouwfietsman · 2024-09-11T19:03:52 1726081432

Indeed, thanks for the clarification!

eigenket · 2024-09-11T21:28:38 1726090118

If I understand your algorithm correctly I think it fails on this graph (drawn on my phone, using an online implementation of Paint).

https://imgur.com/a/ihPrwrX

I think this is roughly the counterexample the person above was suggesting.

vouwfietsman · 2024-09-12T07:05:34 1726124734

Aha! Yes to fix these cases a bit of bookkeeping is needed: a protruding edge like that would force the "walking" to back up, because its a dead end indeed. However, it also immediately means such an edge is not part of any loops, can be marked as such, and ignored upon retracing the steps.

It's an edge case that should be handled, but does not prevent the general algo from working.

eigenket · 2024-09-12T11:18:35 1726139915

Ah yeah, if you're not trying to identify all loops but just check if some loops exist then that works. I think you can dispense with the "walking around taking clockwise/anticlockwise edges" part though. If you repeatedly do the "remove all nodes with only 1 edge going to them" step then you've already identified whether or not at least one loop exists.

vouwfietsman · 2024-09-13T05:57:12 1726207032

Repeatedly removing those edges would help indeed! I am interested in finding the loops, but in the example provided the "dead end" edges do not contribute to the loop. Officially a loop is not allowed to contain the same vertex twice (besides start/end), otherwise any sequence of vertices could be considered a loop just by repeating them in reverse order.

However, indeed it remains true that it does not find all loops, only the smallest set of loops that cover all edges.

konstantinua00 · 2024-09-11T19:10:54 1726081854

how is it different from the "sidewalk" algo in the article?

you'd get the same pitfall of torus topology with 2 perpendicular loops

Hendrikto · 2024-09-11T08:46:54 1726044414

I think it is strange that there is no mention of either time or space complexity, in an article about graph algorithms.

If you don‘t need to be efficient, any problem is relatively easy to solve.

082349872349872 · 2024-09-11T09:36:08 1726047368

If you're not yet correct, complexity is moot.

> When in doubt, use brute force. —KLT

emmanueloga_ · 2024-09-12T07:43:50 1726127030

I thought I recognized the name; this article was written by the putty guy [1] :-)

---

1: https://www.chiark.greenend.org.uk/~sgtatham/putty/

sim7c00 · 2024-09-11T20:00:44 1726084844

one of the more interesting reads ive had on here so firstly thanks for that! love all the explanations of the intermediate solutions. ive been working lately with graphs to deduce network topologies and finding loops is one of my problems. excited to learn about the final answer, and happy to see it involves spanning tree :') kind of saves me reading through how that protocol works (or should work) so i can get on with implementing it! saving me likely all sorts of wrong solutions <3 thanks!

orbisvicis · 2024-09-12T22:05:19 1726178719

Same. It might also be pertinent to the radicle protocol, which was posted a few days ago and which I'm still slogging through. This has led to one of the most satisfying wikipedia dives in a long time, and I haven't even finished the original article. The wikipedia entry for DFS is very well written. Intuitively I would have structured find via path splitting, so I'm glad that the article covered all three find variations. Then that led to MSTs, and Kruskal's algorithm, then radix search (still reading). This thread is the first time hearing about CLR, so I'm definitely going to add that to my reading list, alongside Purely Functional Data Structures.

penguin_booze · 2024-09-11T17:23:47 1726075427

I'm curious: If I were to share the same link again now, I'll be forwarded to the latest submission instead. Interestingly, this link was shared twice in the past few days (click the "past" link above - and, yes, one of the submitters was I). How, then, was it that neither of the recent submitters, including myself, were forwarded to the latest at the time, but instead new submissions were created?

wonnage · 2024-09-11T07:35:44 1726040144

Any time you’re looking for a graph algorithm there’s a pretty good chance Tarjan’s name is on it

082349872349872 · 2024-09-11T09:33:23 1726047203

Tarjan's SCC is elegant with a heavily imperative flavour; has anyone come up with an elegantly pure (yet still linear) functional variation?

EDIT: looks like Okasaki has a linear-time functional; I'm pragmatic enough that 2 passes are (even small constant passes would be) fine for me — my graphs are all tiny enough that I'm not about to store them on tape.

EDIT2: my bad, not Okasaki himself, King & Launchbury: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&d...

    scc' :: Graph -> Forest Vertex
    scc' g = dfs g (reverse (postOrd (transposeG g)))

(but their sol'n requires laziness to avoid generating an infinite intermediate; has anyone done it linearly, functionally, and eagerly?)

n4r9 · 2024-09-11T15:42:20 1726069340

I'm not too deep into CS theory so I'm not sure if this is an answer to your question but... SCC algorithms are useful when building routeable road networks from source data (like OSM) in order to filter out isolated sub-networks (like an airfield). We use dotnet but have something that looks possibly similar to what you wrote(?):

  StronglyConnectedVertices(Vertex source) => ReachableVertices(source).Intersect(ReverseReachableVertices(source))

The methods ReachableVertices and ReverseReachableVertices are effectively depth-first searches on the graph.

The challenge - in terms of performance - is actually not finding a single SCC but returning the full set of potentially thousands SCCs in large networks. I originally implemented an iterative (imperative) version of Gabow's "Path-Based" algorithm but it was too slow on continent-sized datasets. We did some experimentation with a parallel variant of Tarjan which didn't improve things much.

Eventually I settled for something "good enough" - randomly pick a vertex, calculate the SCC with the above code, return it if it's large enough (>1k vertices) otherwise filter it out, keep going until I've accounted for >90% of the graph. The remaining <10% gets filtered out. That's basically imperative (using a while loop) but could probably be functionalised.

-----------------------------------------------------

EDIT I've taken a better look at the linked paper and what I wrote is definitely not an answer to your question. But I'll leave it in case anyone's interested.

devit · 2024-09-11T14:07:42 1726063662

Seems like a trivial problem to me.

If the graph is directed, do an SCC decomposition in linear time using a graph library and then any SCC with size more than one has at least a loop, trivially extracted by following any edges in the SCC.

If the graph is undirected, compute a spanning tree in linear time using a graph library and then any edge outside the tree forms a loop, again trivially extractable from the edge and the paths from its endpoints to their least common ancestor in the tree.

Since those algorithms are already asymptotically optimal, it's just an engineering problem of finding the solution with the best constant factor given the precise data structures and goal in question.

zeroonetwothree · 2024-09-11T14:22:01 1726064521

The most obvious way is to just run a DFS, if you ever come back to a node you’ve seen then you have a cycle (and by storing parent pointers you can get the cycle back). That’s like CS 101 level.

However in the article it’s actually a graph being modified. If you don’t want to run a full dfs every time, there are better ways. This problem is called “online cycle detection” and you can google some other algorithms.

layer8 · 2024-09-11T14:38:34 1726065514

You obviously didn’t read the article. It’s about undirected graphs, and about visualizing the loops, not just testing for their existence. The final solution is indeed based on spanning trees.

ludston · 2024-09-12T06:23:30 1726122210

> Says problem is trivial

> Spews out computer science jargon that 99% of the world's population wouldn't even recognise, let alone understand.

Hmm.