Hacker News: anvaka's comments

I have updated the map with the latest data. This version is based on 500M stars given to repositories from GitHub's launch through the end of April 2025.

Repositories are clustered based on Jaccard similarity.
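A minimal sketch of the idea, assuming each repository is represented by the set of users who starred it (the exact preprocessing behind the map isn't described here). The repository names and stargazer sets are hypothetical, and the all-pairs loop is only for illustration; it would not scale to hundreds of millions of stars.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two stargazer sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def similarity_edges(stars: dict[str, set[str]], threshold: float = 0.1):
    """Return (repo_u, repo_v, similarity) for every pair at or above threshold.

    Brute-force all-pairs comparison: fine for a toy example, far too slow
    for GitHub-scale data.
    """
    names = sorted(stars)
    edges = []
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            s = jaccard(stars[u], stars[v])
            if s >= threshold:
                edges.append((u, v, s))
    return edges


# Hypothetical repositories and the users who starred them.
stars = {
    "repo-a": {"alice", "bob", "carol", "dave"},
    "repo-b": {"bob", "carol", "dave", "erin"},
    "repo-c": {"frank", "grace"},
}

print(similarity_edges(stars))  # [('repo-a', 'repo-b', 0.6)]
```

The weighted pairs produced this way form the graph that a community-detection step can then cluster.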

The map helps you find other related projects. If you know one good project, you can easily see what else exists around it.


Thank you!


oh wow. Glad you liked it!


appreciate the feedback - I'll take a look


haha thank you!


I haven't found a universal clustering algorithm yet. There is often more than one way to group the data that still makes sense, so whichever clustering we choose, it will not be perfect.

Hm... unless maybe we do some sort of quantum clustering, which could be a fun project to explore!

It's a bit hazy now, but I remember trying HDBSCAN (a hierarchical, density-based clustering algorithm), and on a graph the size of GitHub's I just couldn't fit it in memory.

I ended up using something similar to hierarchical clustering (a mix of Louvain, Leiden, and my own approach), and that's what you see in the final map.
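The exact mix used for the map isn't spelled out, so this is only a rough sketch of the core idea behind Louvain: the first "local moving" phase, where each node greedily joins the neighboring community that most increases modularity. It assumes a weighted, undirected edge list (for example, Jaccard-weighted repository pairs) and omits both the aggregation step of full Louvain and the refinement step that distinguishes Leiden. The toy graph is hypothetical.

```python
from collections import defaultdict


def local_moving(edges, max_passes=10):
    """One level of Louvain-style greedy modularity optimization.

    edges: iterable of (u, v, weight) for an undirected graph.
    Returns a dict mapping node -> community label.
    """
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w

    nodes = sorted(adj)
    degree = {n: sum(adj[n].values()) for n in nodes}
    two_m = sum(degree.values())          # 2m: twice the total edge weight
    community = {n: n for n in nodes}     # start with every node alone
    tot = dict(degree)                    # sum of degrees inside each community

    for _ in range(max_passes):
        moved = False
        for n in nodes:
            old = community[n]
            tot[old] -= degree[n]         # take n out of its community

            # Total edge weight from n into each neighboring community.
            links = defaultdict(float)
            for nb, w in adj[n].items():
                if nb != n:
                    links[community[nb]] += w

            # Modularity gain (up to a constant factor) of placing n in c:
            # k_{n,in}(c) - tot(c) * k_n / 2m
            best = old
            best_gain = links.get(old, 0.0) - tot[old] * degree[n] / two_m
            for c, k_in in links.items():
                gain = k_in - tot[c] * degree[n] / two_m
                if gain > best_gain:
                    best, best_gain = c, gain

            community[n] = best
            tot[best] += degree[n]
            moved = moved or best != old
        if not moved:
            break
    return community


# Toy graph: two triangles joined by one weak edge.
edges = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
    ("c", "d", 0.1),
]
print(local_moving(edges))  # a, b, c share one label; d, e, f share another
```

Full Louvain would then collapse each community into a single node and repeat, which is what produces the hierarchy of clusters.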


haha! I love vim.

We shall not quit.


love it =)!


Yes...

Aiming to redo it sometime in early 2025!


Jaccard similarity is not particularly good for "celebrity" projects.

They are similar because they are popular, not because there is a semantic relationship.

It's the same problem I faced with the map of reddit (https://anvaka.github.io/map-of-reddit/): all popular subreddits end up "similar" to each other.

Still works great for smaller, non-celebrity projects :D
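To make the "celebrity" effect concrete, here is a small sketch under a deliberately simplified assumption: every user stars each repository independently, with probability equal to that repository's popularity, so there is no semantic relationship at all. Under that null model, with N users the expected overlap is N·p·q and the expected union is N·(p + q − p·q), so Jaccard similarity is roughly p·q / (p + q − p·q), which grows with popularity alone. The probabilities and user count are made up for illustration.

```python
import random


def expected_jaccard(p: float, q: float) -> float:
    """Approximate Jaccard similarity of two independent star sets.

    Each of N users stars repo A with probability p and repo B with
    probability q, independently. For large N the ratio concentrates at
    E|A ∩ B| / E|A ∪ B| = pq / (p + q - pq).
    """
    return p * q / (p + q - p * q)


def simulated_jaccard(p: float, q: float, n_users: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo check of expected_jaccard under the same null model."""
    rng = random.Random(seed)
    a = {u for u in range(n_users) if rng.random() < p}
    b = {u for u in range(n_users) if rng.random() < q}
    return len(a & b) / len(a | b)


# Two "celebrity" repos vs. two niche repos, none of them related by topic.
print(expected_jaccard(0.5, 0.5))    # ~0.333
print(expected_jaccard(0.01, 0.01))  # ~0.005
print(simulated_jaccard(0.5, 0.5))
```

With small p and q, chance overlap is tiny, so any noticeable Jaccard similarity between niche projects is more likely to reflect shared interest.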

