A Jupyter Kernel for SQLite (jupyter.org)
266 points by Tomte on June 16, 2020 | 45 comments



If you're interested in this, you might be interested in Datasette: https://datasette.readthedocs.io/en/stable/

Which seems to me to be farther along in providing advanced querying/faceting/visualization/sharing capabilities on top of sqlite.

(I love jupyter, and this kernel seems neat; not trying to throw stones at anybody, just to link a project in a similar domain)


None taken! This is a much more developed project indeed.

That said, I think the SQLite kernel for Jupyter has a promising future, with graph visualizations for query results and integration with conda, mamba and pip environments.


Amazing share; my data productivity just went up significantly.

I watched a video on that site from the creator, and after trivial installs of datasette and csvs-to-sqlite, I was exploring and faceting my data like never before.

I used to waste so much time generating static report summaries of my data, and now it's just 2 steps away: bundle the CSVs into a db, datasette it, win.


Having to rerun the query and reload the page just to change the sort order on a small dataset is very 1997


You gotta re-run the query or you're limited to sorting just the visible results.

I guess I could let it spot when there are less than the page-size of results, but then I'd need to be confident that the JavaScript sorting algorithm exactly matches the underlying SQLite sorting algorithm - taking into account character sets and custom collations and suchlike.

I've been deliberately avoiding adding any JavaScript to core Datasette almost to make a point: you don't need it. HTML pages load faster than so-called "modern" SPA monstrosities. They work really well. They don't require a bunch of extra steps to avoid breaking the back button etc.

But... Datasette provides a JSON API for everything, and supports plugins. There's nothing to stop someone who really wants no-page-reload query execution from implementing it as a plugin.

I myself have built JavaScript plugins for things like bar charts and line charts ( https://github.com/simonw/datasette-vega ) and map visualizations ( https://github.com/simonw/datasette-cluster-map )

So yeah, my personal bias here is that building websites like we did in 1997 is deeply undervalued.


The UI needs to make it clear that clicking a column header sorts the results of the previous query rather than re-running it with a different sort column, and that the two are not generally interchangeable; likewise for instant in-results search.

At least with the server on the internet and the example setup, I disagree that the current way is faster than, or even close to, a modern web UI.


Can you think of a way I could make that clear in the UI ("this will sort all of the data including the rows you can't see" vs. "this will sort the rows that are visible to you right now")?


I'd like to add: you're doing a great job of listening to this feedback, despite its tone. I'm not sure I'd do the same, so well done. Of course, I love Datasette, so thank you for your work on it.


I think this is the natural expectation of anyone familiar with spreadsheets, reporting tools or data grids with sortable column headers, but a message along the lines of

"Sorting results of the previous query, click 'Run SQL' button again to query with ${column_name} as the ORDER BY column"

plus a "got it" hyperlink (stored in your session state store of choice) shown on the first click of a column header, would probably make it clear.


I tend to disagree.

I find data/UI tools that only sort the visible results, rather than re-populating the paged view of the data, extremely counter-intuitive.

Even old libraries like ExtJS made it easy to re-query the server on a column-sort event, though it has to be said that many lazy implementers didn't use this feature.


This is very cool. SQL is a great interactive language; we added it to repl.it too: https://repl.it/languages/sqlite


Oh my god! That's so cool! Thanks for sharing it!


Cool hack, but I probably wouldn't install a separate kernel or run an entire notebook just for SQLite work.

What would be useful is for the kernel of your language of choice to provide a magic for sqlite and return results of queries in language-native data structures.

Something like this: https://pypi.org/project/ipython-sql/
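
With ipython-sql, for instance, a cell looks roughly like this (a sketch; example.db and the events table are just placeholders):

    %load_ext sql
    %sql sqlite:///example.db

and then, in a separate cell:

    %%sql
    SELECT name, count(*) AS n FROM events GROUP BY name ORDER BY n DESC;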


org-babel in Emacs has some level of support for passing data between different languages in different cells etc.

https://orgmode.org/worg/org-contrib/babel/intro.html#meta-p...


I wonder if this is closer to what you're suggesting, as it already allows sharing of simple data structures between cells of distinct languages: https://polynote.org/. One cell: SQLite kernel to get data. Next cell: operates on that data.


BeakerX [1] provides a kernel that does that.

[1] https://nbviewer.jupyter.org/github/twosigma/beakerx/blob/ma...


Alternatively: a new type of cell in Jupyter itself, alongside 'code' and 'markdown', for SQL work, available regardless of kernel choice.


Yeah, I had this idea at first, but it's not trivial to run more than one kernel at the same time. I know some examples of people who did it, and some projects that I could plug in and try to make it work with my kernel, but this was a first version to see how the community responds! Let's see, might do something like it in the future. I think it'd be really cool and powerful to make it interact with Python for example.


Yes this is a great first step. Would absolutely love a way to use this alongside my Python code.


What's the distinction between "SQL" and "code"?


'code' would be the code of whatever kernel is in use, and 'sql' would be a notebook-level feature like 'markdown' is today.


There’s already the %%bash cell magic, which does that for bash.
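
E.g. in a Python-kernel notebook, assuming the sqlite3 CLI is on your PATH (example.db is a placeholder):

    %%bash
    sqlite3 example.db "SELECT count(*) FROM events;"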


I like this bit a lot. Might have to extend it for QuestDB. I did a Notebook for QuestDB earlier this week, but having this part would be great!


Wow. It never even occurred to me that this was missing from Jupyter -- but in hindsight seems so obvious.

Congrats to the Jupyter team on this!


Thanks! <3


Incredible, but wasn't this kind of available through psycopg2 (the Postgres connector), SQLAlchemy or any other kind of database connection library?

I realise the kernels themselves are entirely different, but how is one better than the other?


Weird as it might sound, I can imagine a scenario where I wouldn't need anything past SQL in some form. I've got plenty of python scripts sitting around my hard drive that are just fronts for data manipulation. Especially with the CSV import mode, I could see myself using just sqlite or whatever.


PSA: SQLite support is built into the Python standard library.

No pip install needed; just import sqlite3.

https://docs.python.org/3.7/library/sqlite3.html
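
A minimal sketch, just to show the shape of it (table and data made up):

    import sqlite3

    conn = sqlite3.connect(":memory:")  # or a path to a .db file
    conn.execute("CREATE TABLE events (name TEXT)")
    conn.executemany("INSERT INTO events VALUES (?)", [("a",), ("b",), ("a",)])
    for row in conn.execute("SELECT name, count(*) FROM events GROUP BY name"):
        print(row)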

But I will be trying the Jupyter kernel in the article


Not really. You could use the python kernel and embed your SQL code within your python code but you couldn't run cells with just SQL. It made for a very poor analyst experience.

Microsoft got it right with Azure Data Studio.


Does ADS support SQLite? I already use it for SQL Server, and I'd tried the PostgreSQL add-on but couldn't get it to work for some reason (but that was when it was first released so it might've stabilized since then).


No. I've seen success with Microsoft SQL and PostgreSQL.

I've also experienced instability issues; it needs that VS Code level of polish.


Yes, and when paired with Pandas `read_sql` [0] it's easy to quickly query, view and plot.

There's also pandasql[1] which is based on SQLite. We use this behind the scenes at SeekWell[2] to power cross database / cross source joins.

0 - https://pandas.pydata.org/pandas-docs/stable/reference/api/p...

1 - https://pypi.org/project/pandasql/

2 - https://seekwell.io/
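
Roughly like this, as a sketch (example.db and the query are placeholders):

    import sqlite3
    import pandas as pd

    conn = sqlite3.connect("example.db")
    df = pd.read_sql("SELECT name, count(*) AS n FROM events GROUP BY name", conn)
    df.plot.bar(x="name", y="n")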


Well, as someone who strongly dislikes Python and strongly likes SQLite, this might give me a reason to willingly use Jupyter (granted, I guess I could do it through Julia, too, which I also strongly like, but... eh).

A lot of the time when I need to work with a bunch of data I'll fire up the SQLite CLI, snarf the data with the built-in CSV import commands, and do whatever queries I wanna do. This looks like it can provide a nicer UI around that workflow, and I'm all for it.

Datasette looks interesting for this, too; I'll have to give that a look.


I keep a SQLite buffer open in Emacs; I might switch to this given its nice visualizations.


I like this. Previously I’ve used notebooks to explain a data pipeline from different perspectives. One notebook showing the ETL process going from raw data to RDF triples in a store; one SPARQL notebook showing the raw queries; and then a final decision support notebook using a Python binding to the underlying SPARQL query library. It seemed to work well.


This is awesome! Makes SQL as accessible in the classroom as Python; we waste so much time installing SQLite and troubleshooting for students (b-schools). As soon as this is accessible in Colaboratory (Google), or similar, a super light web interface and notebooks become trivial... THANK YOU!!


Interesting. Wonder if it'd be possible to embed it for visualisation in a Go web app?

Was adding basic online chart capabilities to our SQLite publishing website a few weeks ago (eg:

https://dbhub.io/vis/justinclift/Marine%20Litter%20Survey%20... )

But if people could do Jupyter notebooks and visualise them like this too, that could be useful.

Hmmm, should probably set up some kind of survey on our website to ask... :)


Hit me up on https://gitter.im/QuantStack/Lobby if you think I can be useful. I'm @marimeireles there and on Github too! :)


Thanks, will do. :)


Excellent! I usually just type up my SQL in triple quotes and execute it from Python, which gives me a little bit of templating too. Excited to give this a try (and start using more views).
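
The pattern I mean is roughly this (table and filter made up for the example):

    import sqlite3

    QUERY = """
        SELECT name, count(*) AS n
        FROM events
        WHERE created_at >= ?
        GROUP BY name
    """

    conn = sqlite3.connect("example.db")
    rows = conn.execute(QUERY, ("2020-01-01",)).fetchall()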


This looks nice. Now if only somebody wrote a Jupyter kernel for (power)shell.


Have you seen Azure Data Studio? It uses the same code base as Visual Studio Code and has first-class support for notebooks. It includes a PowerShell kernel.


On the .NET team, we're working on a polyglot Jupyter kernel that includes C#, F#, and PowerShell support. We're collaborating with the Azure Data and PowerShell teams. It's in preview now. https://github.com/dotnet/interactive


How's this different from let's say BeakerX or Zeppelin?


Zeppelin is not Jupyter. And I'm not sure how BeakerX works, tbh, but it uses a whole Docker image to run, right? So it might be very resource-consuming, whereas this kernel was implemented in C++. I honestly have no idea how good the tooling around SQL is in BeakerX, but it might be a better alternative if you don't mind the Docker image and don't actually need SQLite, as opposed to SQL in general, which is what they're offering.



