I haven't found a universal clustering algorithm yet. There is frequently more than one sensible way to group the data, so whichever clustering we finally choose will not be perfect.
Hm... unless maybe we do some sort of quantum clustering, which could be a fun project to explore!
It's a bit hazy now, but I remember trying the HDBSCAN algorithm (hierarchical clustering), and on a graph the size of GitHub I just couldn't fit it in memory.
I did end up using something similar to hierarchical clustering (a mix of Louvain, Leiden, and my own code), and that's what we see in the final map.
Repositories are clustered based on Jaccard similarity.
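
For anyone unfamiliar with it, Jaccard similarity of two sets is the size of their intersection divided by the size of their union. Here is a minimal sketch, not the actual code behind the map. It assumes each repository is described by a set of users; the `stars` data and the `similar_repos` helper are hypothetical and only illustrate the idea.

```python
def jaccard(a: set, b: set) -> float:
    """Return |A & B| / |A | B|, or 0.0 when both sets are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Hypothetical data (an assumption for this sketch):
# repository name -> set of users associated with it.
stars = {
    "repo/a": {"u1", "u2", "u3"},
    "repo/b": {"u2", "u3", "u4"},
    "repo/c": {"u9"},
}


def similar_repos(name, data, threshold=0.0):
    """List (other_repo, score) pairs scoring above threshold, best first."""
    scores = [
        (other, jaccard(data[name], users))
        for other, users in data.items()
        if other != name
    ]
    return sorted(
        [(other, s) for other, s in scores if s > threshold],
        key=lambda pair: pair[1],
        reverse=True,
    )


print(similar_repos("repo/a", stars))
```

Pairs with a high score can then serve as weighted edges in a graph that a community detection algorithm groups into clusters.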
The map helps you find related projects: if you already know one good project, you can easily see what else is out there.