Show HN: DataStation – App to easily query, script, and visualize data (github.com/multiprocessio)
131 points by eatonphil on May 31, 2022 | 31 comments


I really don't like the "hidden" enterprise license. On the surface you might assume it's AGPL 2.0 but upon careful inspection, there lives an enterprise commercial license in the ee/ directory.

Wish you'd been more up front about this. For many organizations the license is make-or-break, as they often customize the software for their own use and distribution.

I'm walking away confused. Not sure if using this or creating derivative work from it results in being sent an invoice for an enterprise license or what the deal is.


I took inspiration from the structure of GitLab, which commits ee code alongside open source (in the same style of folder structuring, where everything outside of ee is open source and everything within ee is proprietary). I'd like to keep it that way because it simplifies operations.

But all the screenshots and website and docs only show open source features. And the core is Apache 2.0, not AGPL, as the license indicates.

ee is only extensions on top. But there's not much to say about the ee folder because not much functionality exists yet in the ee version. In fact you can't even install the ee version in any form yet. Only the community edition is available for download.

Happy for feedback in being clearer!


appreciate the response


I love the idea of a data notebook. Have you seen Franchise? It's a lovely design for something similar, though its implementation (which stopped in 2019) covered only querying and visualization, not manipulation. But the styling and component design are lovely, and it may be possible to pull that from their code. https://github.com/HVF/franchise


Nope I haven't seen that! Looks interesting, thanks.


Looks very useful! In terms of feedback, I think if you brought in a designer you'd have a much bigger "wow" factor. There's a lot of low hanging fruit like consistent button styles, fonts, whitespace, larger text inputs, that'd go a long way. And I'm sure you've thought of this already, but seems like a node-based paradigm could be an improvement over the panel-based paradigm e.g. more akin to something like Blender nodes, or Tableau.


> In terms of feedback, I think if you brought in a designer you'd have a much bigger "wow" factor. There's a lot of low hanging fruit like consistent button styles, fonts, whitespace, larger text inputs, that'd go a long way.

Yes this would be nice to have! If there's a version of this that gets funded or bootstrapped then I'd definitely like to bring someone on to help.

> And I'm sure you've thought of this already, but seems like a node-based paradigm could be an improvement over the panel-based paradigm e.g. more akin to something like Blender nodes, or Tableau.

Actually no I'm not familiar with this concept. But I have seen what natto.dev does and I'm concerned that that is too free form compared to how DataStation works. A little structure is useful IMO. I'm not sure how similar Blender nodes or Tableau are to natto.dev.

That said, DataStation panels show up in an order but the order of evaluation is not set. You can import the results of a panel defined below the current panel; it just matters that the panel you refer to has been run. So it may be closer to a node-based design in that case. But again I'm not sure if that's what you mean.
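
To illustrate the idea that display order and evaluation order can differ, here is a minimal sketch in Python. It is not DataStation's actual logic; the panel names and the dependency map are hypothetical, and it simply uses the standard library's `graphlib` (Python 3.9+) to find an order in which every panel runs after the panels it reads from.

```python
from graphlib import TopologicalSorter

# Hypothetical panels listed in display order, each mapped to the
# panels it reads from. The first panel depends on two panels that
# are displayed below it.
PANELS = {
    "joined_report": {"csv_file", "pg_orders"},
    "csv_file": set(),
    "pg_orders": set(),
}


def evaluation_order(panels):
    """Return an order in which each panel comes after its dependencies."""
    return list(TopologicalSorter(panels).static_order())


order = evaluation_order(PANELS)
print(order)
```

Whatever order the two source panels come out in, `joined_report` is always last, even though it is displayed first.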


Hadn't seen natto before, but I agree that's pretty far out there! If you search images of Tableau Prep, that's more along the lines of what I had in mind. Although Tableau supports Python and R, it's not nearly as well integrated as what you've done with DataStation. In general, it's more geared towards Excel power user types, rather than programmers.


> If you search images of Tableau Prep, that's more along the lines of what I had in mind.

Ah! I think this is a visualization of what does happen with DataStation panels too. Eventually I'd like to have better support for understanding the dependency graph like this but for now that's just been a nice idea to have sometime in the future.

> Although Tableau supports Python and R, it's not nearly as well integrated as what you've done with DataStation. In general, it's more geared towards Excel power user types, rather than programmers.

Yeah it was definitely my impression it was not geared toward programmers as much (though I know many programmers or data scientists use it).


Hey folks! I quit my job at Oracle almost a year ago now to build DataStation. It's an app I've wanted as an engineering manager for years. It's entirely open-source and while I've had a few awesome contributors I'm mostly the only person on it. It has been funded out of contract development and savings.

DataStation helps you query a variety of data sources (conventional SQL like PostgreSQL and MySQL, non-SQL like Prometheus or Elasticsearch), files and HTTP APIs. It is not a SQL layer on top of these various APIs like FDW in Postgres or Apache Calcite.

DataStation just tries to abstract away glue code. So in DataStation for Prometheus you query with PromQL. For Elasticsearch you query with Lucene. And for SQL databases you query with their SQL dialect. But you don't need to remember how to use the appropriate library for your language. You just need your own credentials.

DataStation is made of panels (other apps might call them cells) that each produce a result. Panels can refer to other panels. These allow you to build workflows that cross the boundary of a particular datasource. For example you might have some data in a CSV a product manager gave you and the bulk of your data is in PostgreSQL. In DataStation you could pull in the CSV with a File panel and pull in the Postgres data with a Database panel. Then you can join both panel results in a Code panel using your favorite language like Python, Ruby, R, Node, Julia, etc. You can even script Code panels in a SQLite dialect with a bunch of rich addons (url parsing, best-effort date parsing, statistics aggregation, etc.): https://github.com/multiprocessio/go-sqlite3-stdlib.
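
For a sense of the glue code this kind of workflow replaces, here is a rough sketch of the same CSV-plus-database join written by hand in Python. Everything in it is an assumption made for illustration: the CSV contents, the `orders` table, and the column names are hypothetical, and an in-memory SQLite database stands in for PostgreSQL so the example is self-contained. It does not use any DataStation API.

```python
import csv
import io
import sqlite3

# Hypothetical CSV from a product manager: customer ids with a plan tier.
CSV_TEXT = """customer_id,tier
1,gold
2,silver
3,gold
"""


def load_csv_rows(text):
    """Parse CSV text into a list of dicts (roughly what a file load yields)."""
    return list(csv.DictReader(io.StringIO(text)))


def load_db_rows(conn):
    """Run a query and return the rows as dicts."""
    conn.row_factory = sqlite3.Row
    cur = conn.execute(
        "SELECT customer_id, total FROM orders ORDER BY customer_id"
    )
    return [dict(row) for row in cur]


def join_rows(csv_rows, db_rows):
    """Inner-join the two results on customer_id."""
    tier_by_id = {int(r["customer_id"]): r["tier"] for r in csv_rows}
    return [
        {**row, "tier": tier_by_id[row["customer_id"]]}
        for row in db_rows
        if row["customer_id"] in tier_by_id
    ]


def demo():
    # SQLite in memory stands in for the real database (an assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 20.0), (2, 5.5), (4, 9.0)],
    )
    return join_rows(load_csv_rows(CSV_TEXT), load_db_rows(conn))


print(demo())
```

With a real PostgreSQL source you would also need a driver, connection handling, and credentials management, which is the part a Database panel is meant to take off your hands.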

You can watch a simple introductory video: https://www.youtube.com/watch?v=q_jRBvbwIzU. Or if you want to see that cross-datasource interaction taken to an extreme, check out this video using Postgres metadata to filter log data in Elasticsearch to do historic request analysis on a subset of customers: https://www.youtube.com/watch?v=tIh99YVHoRE.

DataStation is mainly a desktop app today where the end result is that you export graph SVGs or HTML tables or markdown tables or just a CSV file. All this data stays on your laptop so it's as easy to use in a corporate environment as any existing SQL IDE or Jupyter Notebook.

In the last year it's reached 1.5k stars on GitHub and over 1,000 unique users, and it currently averages about 40 fairly active users per month (defined as having opened the app more than a few times).

Since it's only just now 12 months old it's been going through a lot of maturing during this time. If you've tried it before and it was buggy or too slow it's probably worth another try now if you're still interested.

DataStation is primarily an Electron app but the code that evaluates panels is written in Go. The Go evaluation code forms the backbone of another app you may have seen around HN, dsq: https://github.com/multiprocessio/dsq, which is a limited version of DataStation as a CLI for querying files with SQL.

In the future I'd like to see more people using it as a server app where my goal is to support read-only dashboards and recurring exports. That part is still work-in-progress.

You can find a ton of tutorials on how to interact with supported databases on the DataStation website: https://datastation.multiprocess.io/docs/.

Looking forward to your feedback!


This is really cool. Maybe in the future you can make a paid version with a bunch of BI features.

In your opinion, how does it compare to PyCharm (Enterprise version) when it's all blinged out with big data tools and integrations? I recently realized that PyCharm is my Data IDE and not just my Python editor. I only use limited features though, so hard for me to compare the extent of functionalities between the two.

Edit: Well, PyCharm won't let you join two different data sources, so that's one big difference!


> Edit: Well, PyCharm won't let you join two different data sources, so that's one big difference!

Right!

On the other hand, any real code IDE will have high-quality autocomplete, jump-to-definition, all that code IDE stuff. In the future DataStation may be able to hook into tree-sitter or LSP but for now it's more like a textarea with syntax highlighting (although the SQL code panel autocomplete is relatively complete).

Similarly, SQL IDEs have better exploration of your database. DataStation can't tell you about which tables or schemas exist yet (although I want it to in the future).

DataStation competes more directly with Python scripts than with SQL IDEs and code IDEs (although there is of course overlap).


It does look a bit like parts of Tableau's desktop product.


I haven't used Tableau but I have had some people show up in Discord to ask about using DataStation as an alternative. So maybe it is similar, but I don't know.


Overall, this looks great. My only concern is the project file being a SQLite db. I'd really like to have something to (usefully) put in version control.


I did the original version in a JSON backed file but I don't really want to go back to that.

It is not unreasonable to store the sqlite database in a git repo: https://stackoverflow.com/a/5435079/1507139.
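
One common way to make a SQLite file friendlier to version control is to commit (or diff) a plain-text SQL dump of it rather than the binary. The sketch below is only an illustration of that general idea using Python's standard `sqlite3` module; it is not necessarily what the linked answer recommends, and the `panels` table is a hypothetical stand-in, not DataStation's real project schema.

```python
import sqlite3


def dump_sql(conn):
    """Return the whole database as SQL text, one statement per line."""
    return "\n".join(conn.iterdump()) + "\n"


def demo():
    # Hypothetical schema used only for this example.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE panels (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO panels (name) VALUES ('orders')")
    conn.commit()
    return dump_sql(conn)


print(demo())
```

The text dump diffs line by line, and it can be loaded back into a fresh database with `executescript`.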

I'm not yet sure what the right long-term solution is.


hi ya,

Interesting idea. I like the ability to pull multiple datasets together. The one thing I am curious about is visualisation ... What graphing abilities does this app have?


The visualization is not advanced. It supports basic bar charts, line charts, pie charts and tables. I'd like to make this better over time. If you need more advanced visualization you can export any panel as CSV or JSON and bring it into whatever better visualization tool you have.
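
As a rough sketch of that hand-off, the Python below takes exported JSON and reshapes it into labels and values that most charting libraries accept. It assumes the export is an array of objects, and the `region` and `sales` fields are made up for the example.

```python
import json
from collections import defaultdict

# Hypothetical exported panel result: an array of row objects.
EXPORTED_JSON = """
[
  {"region": "EU", "sales": 10},
  {"region": "US", "sales": 7},
  {"region": "EU", "sales": 5}
]
"""


def to_series(rows, label_key, value_key):
    """Sum value_key per label_key and return (labels, values) sorted by label."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[label_key]] += row[value_key]
    labels = sorted(totals)
    return labels, [totals[label] for label in labels]


labels, values = to_series(json.loads(EXPORTED_JSON), "region", "sales")
print(labels, values)
```

The two lists can then be passed to whichever plotting tool you prefer, for example as the x and y arguments of a bar chart.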

The biggest reason to use DataStation right now is that it makes it easy to query data and script the results.


Would be nice to have geographical charting abilities or maybe integrate some python charting libraries so the output of the library would just be the chart maybe? Just an idea.


Any reason for not having a web client?


You can run it as a web server! It's just not as commonly done right now since I haven't put much time into integration with cloud providers (stuff like CloudFormation templates I mean) and I don't yet have a public Docker image that is up to date.

https://datastation.multiprocess.io/docs/0.11.0/DataStation_...


Looks amazing.

Will try tomorrow. Athena alone is a superior offer in my mind. Even TablePlus, my favourite SQL client, doesn’t do that :)

If you can add dbt integration it will be a killer product!

Thank you!


Thanks for the kind words!

The only caveat I'll say is that it's definitely not as mature in general as SQL clients (stuff like table, column discovery and autocomplete does not exist yet). But it is pretty convenient to use DataStation if you like being able to easily switch into Python/JavaScript/whatever without needing to look up the docs for how to connect to and run a query against every database.

> If you can add dbt integration it will be a killer product!

I haven't used dbt and my impression was that it was a glue system for copying data from one place to another. But maybe that's not correct. Is it possible to query dbt data directly? Or how would you imagine it fitting into a DataStation flow? Thank you!


Looks very good! How good is the keyboard navigation support? It's one of the selling points for DB explorers like this for me.

Also, I'm a beginner to Go and I see the `GOOD_FIRST_PROJECTS.md`. Eager to contribute!

Thanks again.


> Looks very good! How well is the keyboard navigation support? It's one of the selling points for DB explorers like this for me.

DataStation isn't really a DB explorer. It helps you write scripts that connect to databases. In the future I'd love to make the exploration part better but right now it won't even tell you what tables exist or what columns exist.

It's easier to think of it as a replacement or enhancement of a Python script you might have to write to build a report.

> Also, I'm a beginner to Go and I see the `GOOD_FIRST_PROJECTS.md`. Eager to contribute!

Join the Discord #dev channel and say hi! Though you do need to understand the basics of Go. There are lots of good first issues, as you'll see once you're comfortable with Go.


How does it compare to Apache Superset?


I think my answer about Redash basically applies to Superset as well (and any other primary-dashboard tool): https://hackernews.hn/item?id=31575860.


The UX reminded me of [PipeDream](https://pipedream.com/)

The industry around abstraction tools/UI on top of DBs is growing. We use Retool very heavily and it does get pricey.

This is a very neat execution and has potential for SAAS or Cloud offering. Like "Bring your own DB" and build your own abstractions.


> This is a very neat execution and has potential for SAAS or Cloud offering. Like "Bring your own DB" and build your own abstractions.

Definitely my goal for the future is SaaS/Cloud where you can work on projects as a team and configure hosted dashboards, recurring exports and alerts out of panels you set up in a DataStation project.


How does it compare to Redash (now Databricks SQL): https://github.com/getredash/redash?


I haven't used it, but just from looking at the GitHub page, it looks like Redash has more advanced dashboarding features today (I'd like to catch up here). In contrast, Redash doesn't really allow you to manipulate data very much if it doesn't come in the form you want or if you can't get it into the right form with SQL alone.

DataStation allows you to script results of database queries (or loaded Parquet, Excel, CSV, etc. files or HTTP API responses) in Python, Node, R, Julia, etc.

Also, DataStation is first-off a desktop app today so it's very easy to install and use -- especially in a corporate environment. Data never leaves your laptop. In the future I think more people will use the server version of DataStation so you can get server features like recurring exports and hosted dashboards but desktop will always be supported too.



