
Great insightful comment. I came to the same conclusion a number of years ago. We did something about it: we built a new Hadoop platform around a not very well known distributed, in-memory, open-source database, MySQL Cluster (NDB). It is not the MySQL Server you think you know. It is an in-memory OLTP engine used by most network operators as a call subscriber DB. It can handle millions of reads or writes/sec on commodity hardware (it has been benchmarked at 200m reads/sec and about 80m writes/sec). It has transactions (read committed isolation level) and row-level locks, and it supports efficient cross-partition transactions using one transaction coordinator per database node (up to 48 of them).

You can build scalable apps with strong consistency if you can write them using primary-key ops and partition-pruned index scans. We managed to scale out HDFS by 16x with this technique. Since then, we have been doing as you suggested: we built a microservices architecture for Hadoop, called Hopsworks, around the transactional distributed database. All the evils of eventual consistency go away; systems like Apache Ranger/Sentry become just tables in the DB. More reading is available here: http://www.hops.io/?q=content/news-events
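To make the "primary-key ops and partition-pruned index scans" point concrete, here is a minimal sketch (not NDB or Hopsworks code; all names are made up for illustration) of why pruning matters: rows are hash-partitioned on the primary key, so a lookup that supplies the key can be routed to exactly one partition, while a scan on a non-key column must be broadcast to every partition.

```python
# Toy model of hash partitioning by primary key. NDB distributes rows
# across data nodes in a conceptually similar way; this is only a sketch.
NUM_PARTITIONS = 4

partitions = [dict() for _ in range(NUM_PARTITIONS)]

def partition_for(key):
    # Route a primary key to exactly one partition.
    return hash(key) % NUM_PARTITIONS

def put(key, value):
    partitions[partition_for(key)][key] = value

def get_by_pk(key):
    # Partition-pruned: exactly one partition is touched.
    return partitions[partition_for(key)].get(key), 1

def scan_all(predicate):
    # Non-pruned scan: every partition must be consulted.
    hits = [v for p in partitions for v in p.values() if predicate(v)]
    return hits, NUM_PARTITIONS

put("inode:/user/alice", {"owner": "alice"})
row, touched = get_by_pk("inode:/user/alice")   # touched == 1
rows, touched = scan_all(lambda v: v["owner"] == "alice")  # touched == 4
```

The scalability argument is simply that pruned operations cost O(1) nodes regardless of cluster size, while broadcast scans cost O(number of partitions), so an app written almost entirely in the first style keeps strong consistency without paying a fan-out penalty as the cluster grows.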



Hopsworks looks like it might be exactly what I need, I do typical data science work for small to small-medium data and wanted to start properly playing with spark on a HDFS store.

Currently most work is just done in R/Python in VM's on a small proxmox cluster (where only 1 node is always on) but I'd like start gently moving to spark, run the stack on a single node and scale on demand.

Is Hopsworks for me, does this approach even make sense for such small data or am I crazy? Thanks for your response!


Yes, Hopsworks can run on anything from 1 server to 1000s. We are finalizing the first proper release now: Jupyter support, TensorFlow, PySpark, SparkR, and a Python kernel for Jupyter too.


Awesome, that sounds perfect, I'll give it a shot. Do you have a mailing list or any way to follow along? Cheers



