
anvaka · 2025-05-19T15:40:31

I have updated the map with the latest data. This version is based on 500M stars given to repositories from GitHub's launch through the end of April 2025.

Repositories are clustered by Jaccard similarity, i.e. by how much their sets of stargazers overlap.

The map helps you discover related projects: if you already know one good project, you can easily find what else is out there.
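
A minimal sketch of that idea, assuming each repository is represented by the set of users who starred it. The function names and toy data are hypothetical, not taken from the map's actual code: similarity is |A ∩ B| / |A ∪ B|, and "related projects" are simply the highest-scoring other repositories.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def related(repo: str, stars: Dict[str, Set[str]], k: int = 3) -> List[Tuple[str, float]]:
    """Return up to k other repositories ranked by Jaccard similarity to `repo`."""
    target = stars[repo]
    scores = [
        (other, jaccard(target, users))
        for other, users in stars.items()
        if other != repo
    ]
    # Highest similarity first; ties broken by name for a stable order.
    scores.sort(key=lambda item: (-item[1], item[0]))
    # Drop repositories that share no stargazers at all.
    return [(name, score) for name, score in scores[:k] if score > 0]


# Hypothetical toy data: repository -> set of users who starred it.
stars = {
    "repo/a": {"u1", "u2", "u3", "u4"},
    "repo/b": {"u2", "u3", "u4", "u5"},
    "repo/c": {"u1", "u2"},
    "repo/d": {"u9"},
}

print(related("repo/a", stars))
```

Here `repo/b` shares 3 of 5 distinct stargazers with `repo/a` and ranks first, while `repo/d` shares none and is left out.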

anvaka · 2025-05-08T16:29:26 1746721766

Thank you!

anvaka · on Dec 22, 2024

oh wow. Glad you liked it!

anvaka · on Dec 22, 2024

appreciate the feedback - I'll take a look

anvaka · on Dec 16, 2024

haha thank you!

anvaka · on Dec 16, 2024

I haven't found a universal clustering algorithm yet. Frequently there is more than one way to group the data that still makes sense, so whichever clustering we choose in the end, it will not be perfect.

Hm... unless maybe we do some sort of quantum clustering, which could be a fun project to explore!

It's a bit hazy now, but I remember trying the HDBSCAN algorithm (hierarchical density-based clustering), and on a graph the size of GitHub's I just couldn't fit it in memory.
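
A rough back-of-the-envelope sketch of why all-pairs methods get expensive at this scale. It assumes the clustering works from a precomputed n x n distance matrix of 64-bit floats (HDBSCAN can run on a precomputed metric, but that this was the actual bottleneck here is an assumption), and the item counts are hypothetical round numbers, not the real size of the GitHub graph:

```python
def dense_matrix_bytes(n_items: int, bytes_per_value: int = 8) -> int:
    """Bytes for a full n x n matrix (float64 by default)."""
    return n_items * n_items * bytes_per_value


def condensed_matrix_bytes(n_items: int, bytes_per_value: int = 8) -> int:
    """Bytes for the upper triangle only: n * (n - 1) / 2 distinct pairs."""
    return n_items * (n_items - 1) // 2 * bytes_per_value


# Hypothetical item counts, chosen only to show the quadratic growth.
for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9,} items: dense {dense_matrix_bytes(n) / 1e9:,.1f} GB, "
          f"condensed {condensed_matrix_bytes(n) / 1e9:,.1f} GB")
```

Storing only the upper triangle halves the cost, but growth stays quadratic: 10x more items means roughly 100x more memory.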

I did end up using something similar to hierarchical clustering (a mix of Louvain, Leiden, and my own approach), and that's what we see in the final map.
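
For a flavour of the Louvain family, here is a minimal sketch of only its first phase (local moving): each node greedily joins the neighbouring community that most increases modularity. It assumes a small undirected weighted graph (the edge weights could be Jaccard similarities, which is an assumption). The full Louvain method then collapses communities into super-nodes and repeats, and Leiden adds a refinement step, so this is not the mix used for the map. The function names are hypothetical.

```python
from typing import Dict, Hashable

# Undirected weighted graph: node -> {neighbour: weight}, stored symmetrically.
Graph = Dict[Hashable, Dict[Hashable, float]]


def modularity(graph: Graph, community: Dict[Hashable, int]) -> float:
    """Newman modularity: sum over communities c of L_c/m - (D_c/2m)^2."""
    two_m = sum(w for nbrs in graph.values() for w in nbrs.values())
    internal: Dict[int, float] = {}  # A_ij summed over ordered pairs inside c (= 2 * L_c)
    degree: Dict[int, float] = {}    # D_c, total weighted degree of c
    for u, nbrs in graph.items():
        c = community[u]
        degree[c] = degree.get(c, 0.0) + sum(nbrs.values())
        for v, w in nbrs.items():
            if community[v] == c:
                internal[c] = internal.get(c, 0.0) + w
    return sum(internal.get(c, 0.0) / two_m - (degree[c] / two_m) ** 2 for c in degree)


def louvain_local_moving(graph: Graph, max_sweeps: int = 20) -> Dict[Hashable, int]:
    """Phase 1 of Louvain only: greedily move nodes between neighbouring communities."""
    k = {u: sum(nbrs.values()) for u, nbrs in graph.items()}  # weighted degrees
    m = sum(k.values()) / 2                                    # total edge weight
    community = {u: i for i, u in enumerate(graph)}           # every node starts alone
    if m == 0:
        return community
    sigma_tot = {community[u]: k[u] for u in graph}           # total degree per community

    for _ in range(max_sweeps):
        moved = False
        for u in graph:
            old = community[u]
            sigma_tot[old] -= k[u]  # take u out of its current community
            links: Dict[int, float] = {old: 0.0}  # edge weight from u into each community
            for v, w in graph[u].items():
                if v != u:
                    links[community[v]] = links.get(community[v], 0.0) + w

            def gain(c: int) -> float:
                # Modularity gain of inserting the isolated node u into community c.
                return links[c] / m - sigma_tot[c] * k[u] / (2 * m * m)

            best, best_gain = old, gain(old)
            for c in links:
                if gain(c) > best_gain:  # move only on a strict improvement
                    best, best_gain = c, gain(c)
            community[u] = best
            sigma_tot[best] += k[u]
            moved = moved or best != old
        if not moved:
            break
    return community


# Usage: two triangles (a-b-c and d-e-f) joined by a single edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
graph: Graph = {}
for u, v in edges:
    graph.setdefault(u, {})[v] = 1.0
    graph.setdefault(v, {})[u] = 1.0

parts = louvain_local_moving(graph)
print(parts, round(modularity(graph, parts), 3))
```

On this toy graph the local-moving phase recovers the two triangles. Note that modularity is only one possible objective; a different objective can legitimately prefer a different grouping of the same data.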

anvaka · on Dec 16, 2024

haha! I love vim.

We shall not quit.

anvaka · on Dec 16, 2024

love it =)!

anvaka · on Dec 16, 2024

Yes...

Aiming to redo it some time in early 2025!

anvaka · on Dec 16, 2024

Jaccard similarity is not particularly good for "celebrity" projects.

They are similar because they are popular, not because there is a semantic relationship.

It's the same problem I faced with the map of Reddit (https://anvaka.github.io/map-of-reddit/): all popular subreddits are just "similar" to each other.

Still works great for smaller, non-celebrity projects :D
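
A small sketch of the "celebrity" effect, assuming users star repositories completely at random and independently, so no pair of repositories is related in any way. With N users and star rates p and q, the expected overlap is pqN and the expected union is (p + q - pq)N, so Jaccard is roughly pq / (p + q - pq): popularity alone pushes it up. The rates below are hypothetical.

```python
import random


def independent_jaccard(p: float, q: float) -> float:
    """Approximate Jaccard of two repos starred independently at rates p and q.

    With N users, E|A & B| = p*q*N and E|A | B| = (p + q - p*q)*N; this returns
    the ratio of those expectations, in which N cancels out.
    """
    return p * q / (p + q - p * q)


def simulated_jaccard(p: float, q: float, n_users: int, seed: int = 0) -> float:
    """Monte Carlo check: each user stars each repo independently."""
    rng = random.Random(seed)
    a = {u for u in range(n_users) if rng.random() < p}
    b = {u for u in range(n_users) if rng.random() < q}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical star rates: the pairs share no topic, only differ in popularity.
celebrity = independent_jaccard(0.3, 0.3)   # two very popular, unrelated repos
niche = independent_jaccard(0.001, 0.001)   # two small, unrelated repos
print(f"celebrity pair: {celebrity:.3f}, niche pair: {niche:.5f}")
print(f"simulated celebrity pair: {simulated_jaccard(0.3, 0.3, 100_000):.3f}")
```

With these rates, two unrelated popular repositories score several hundred times higher than two unrelated niche ones, even though neither pair has anything in common.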