Hacker News .hnnew | past | comments | ask | show | jobs | submitlogin

Lots of confusion on what JSONB is.

To your application, using JSONB looks very similar to the JSON datatype. You still read and write JSON strings—Your application will never see the raw JSONB content. The same SQL functions are available, with a different prefix (jsonb_). Very little changes from the application's view.

The difference is that the JSON datatype is stored to disk as JSON, whereas the JSONB is stored in a special binary format. With the JSON datatype, the JSON must be parsed in full to perform any operation against the column. With the JSONB datatype, operations can be performed directly against the on-disk format, skipping the parsing step entirely.

If you're just using SQLite to write and read full JSON blobs, the JSON datatype will be the best pick. If you're querying or manipulating the data using SQL, JSONB will be the best pick.



TFA doesn't say that this is _the_ JSONB of PostgreSQL fame, but I assume it must be. JSONB is a brilliant binary JSON format. What makes it brilliant is that arrays and objects (which when serialized are a sort of array) are encoded by having N-1 lengths of values then 1 offset to the Nth value, then N-1 lengths of values then... The idea is that lengths are going to be similar, while offsets never are, so using lengths makes JSONB more compressible, but offsets are needed to enable random access, so if you store mostly lengths and sometimes offsets you can get O(1)ish random access while still retaining compressibility.

Originally JSONB used only offsets, but that made it incompressible. That held up a PostgreSQL release for some time while they figured out what to do about it.


Ah, https://sqlite.org/draft/jsonb.html says:

> The "JSONB" name is inspired by PostgreSQL, but the on-disk format for SQLite's JSONB is not the same as PostgreSQL's. The two formats have the same name, but they have wildly different internal representations and are not in any way binary compatible.


> TFA doesn't say that this is _the_ JSONB of PostgreSQL fame, but I assume it must be.

Definitely is not: Hipp states

> JSONB is also slightly smaller than text JSON in most cases (about 5% or 10% smaller)

Whereas in Postgres jsonb commonly takes 10~20% more space than json.


Link? My reading last time I looked at this was that the sqlite and postgres “jsonb” were different.


This is the first time that SQLite3 is getting JSONB. Idk if it's the same as PG's. TFA doesn't say. I assume they are the same or similar because I seriously doubt that D.R. Hipp is unaware of PG's JSONB, but then for that reason I am surprised that he didn't say in TFA.


https://sqlite.org/draft/jsonb.html references postgres:

> The "JSONB" name is inspired by PostgreSQL, but the on-disk format for SQLite's JSONB is not the same as PostgreSQL's. The two formats have the same name, but they have wildly different internal representations and are not in any way binary compatible.


https://sqlite.org/draft/jsonb.html says they're NOT the same.


> If you're just using SQLite to write and read full JSON blobs, the JSON datatype will be the best pick.

There is no json data type ! If you are just storing json blobs, then the BLOB or TEXT data types will be the best picks.


In context, what is clearly meant is, "If you're just reading and writing full JSON blobs, use `insert ... ($id,$json)`; if you're primarily querying the data, use `insert ... ($id, jsonb($json))`, which will convert the text into JSONB before storing it.


> There is no json data type

Why not? I feel like a database should allow for a JSON data type that stores only valid JSON or throws an exception. It would also be nice to be able to access subfields of the JSON.

SELECT userid, username, some_json_field["some_subfield"][0] FROM users where ...

Not sure where to give feature suggestions so I'm just leaving this here for future devs to find


SQLite has a surprisingly small number of types: null, integer, real, text, and blob. That’s it.

https://www.sqlite.org/datatype3.html


> It would also be nice to be able to access subfields of the JSON.

Not that it's not _useful_ sometimes, but it amuses me that this is a huge violation of 1NF and people are often ok with it. It really depends on whether you're treating the JSON object as an atomic unit on its own, regardless of contents, or using the JSON to store more fine-grained information.

I guess the same argument can be made for XML data types, and everyone's OK with it too.


The principles of relational algebra aren't always compatible with real applications.

They are successful beyond anything that I can imagine people expecting when creating them. But they are not that complete silver bullet that solves every problem humanity will ever need solved.


Database design is, unfortunately, a lost art. Up through the early 2010's I remember having design reviews for database schemas, etc. That isn't "agile"... so you just fix it in the next sprint, as you explain to someone what a unique constraint is and why a table is now full of duplicates.


1NF is a theory construct that makes no accommodation for real-world performance. I'm shuddering even thinking how many tables and joins I would need to store some of these 3rd party JSON things I need to import & refine.


The H2 database uses almost that exact syntax:

  create table foo(my_json json);
  insert into foo values ('{"a":{"b":{"c":"d"}}}' FORMAT JSON);
  select (my_json)."a"."b"."c" from foo; -- where the () around the field is mandatory
https://h2database.com/html/grammar.html#field_reference


SQLite has a very few actual data types. There is a function that validates json and you can use it as a check on your `TEXT` or` `BLOB` column.

You can access subfields using `json_` and `jsonb_` functions.


Indeed. Those who use Postgres are already familiar with the difference.

Rule of thumb: if you are not sure, or do not have time for the nuances, just use JSONB.


There’s a huge nuance worth mentioning: with JSONB you lose key ordering in objects.

This may be desired - it makes two effectively-equal objects have the same representation at rest!

But if you are storing human-written JSON - say, configs from an internal interface, where one might collocate a “__foo_comments” key above “foo” - their layout will be lost, and this may lead to someone visually seeing their changes scrambled on save.


> There’s a huge nuance worth mentioning: with JSONB you lose key ordering in objects.

JSON never guarantees any ordering. From json.org:

> An object is an unordered set of name/value pairs.

If you depend on a given implementation doing so, you're depending on a quirk of that implementation, not on a JSON-defined behavior.


Using a function to query a fact about a string does not imply that the string should change, regardless of what guarantees are present within the format that the string is in. If you expect to get out the same string you put in, then there's nothing implementation-quirk about that.


> Using a function to query a fact about a string does not imply that the string should change, ... then there's nothing implementation-quirk about that.

Fair enough when considering only the string value. My ill-expressed point was more about JSON as a data exchange format - it may pass through any number of JSON parser/storage implementations, any of which may use an arbitrary order for the keys.


You're not storing a string, you're storing a JSON object. If you want to store a string use TEXT or VARCHAR.


That is what this post chain is about. The nuance between choosing to store your JSON as a string in a TEXT column or as JSONB in a BLOB column.


What about SQLite JSONB? Does it maintain order?

UPDATE: I think SQLite JSONB does maintain order. For example:

    select json(jsonb('{"a": 1, "b": 2, "c": 3, "d": 4}'));"
    -- {"a":1,"b":2,"c":3,"d":4}
And it does maintain order when adding new keys:

    select json(jsonb_insert(jsonb('{"a": 1, "b": 2, "c": 3, "d": 4}'), '$.e', 99));
    -- {"a":1,"b":2,"c":3,"d":4,"e":99}
(tested on https://codapi.org/sqlite/)


JSON processors are not required nor expected to retain object key ordering from either input or object construction order.


The question is whether people think of their database as a processor of data or a storage location for data, exactly as it was provided. Given that a lot of people use SQLite as the latter, it's a worthwhile caveat to make people aware of.

Additionally, Javascript environments (formalized in ES2015), your text editor, and your filesystem all guarantee that they'll preserve object key order when JSON is evaluated. It's not unreasonable that someone would expect this of their database!


If you're storing it exactly as provided then surely until you know it's actually valid json you want a string, not a json/jsonb column.


> The question is whether people think of their database as a processor of data or a storage location for data

If you use the database's JSON type(s) and/or functions, then yes, your database is a JSON processor.

And, yes, your database is not a dumb store, not if it's an RDBMS. The whole point of it is that it's not a dumb store.


> Javascript environments ... guarantee that they'll preserve object key order when JSON is evaluated

Yes, but only for non-integer keys.


Is there such a thing as a non-string JSON key to be evaluated? https://www.json.org/ suggests not.


Yes, in the context of JSON handling by JavaScript. I meant string keys that can be interpreted as non-negative numbers. Those are traversed first, the rest of the keys are coming in insertion order. For example:

    > JSON.stringify({'one': 1, '3': 3, 'two': 2, '2': 2, 'three': 3, '1': 1})
    '{"1":1,"2":2,"3":3,"one":1,"two":2,"three":3}'


Thanks for explaining. That behaviour is... interesting!


That compounds really badly with another lack of specificity in JSON: parsers can treat duplicate keys in arbitrary ways, e.g. keep only the first or last value, combine values somehow, error out, etc.

When keys are not unique, their order can matter again, and can matter differently to different parsers.


Yes, but it's not impossible to achieve it in practice.

For example, JavaScript JSON.stringify and JSON.parse have well-defined behaviour:

> Properties are visited using the same algorithm as Object.keys(), which has a well-defined order and is stable across implementations

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...

> The traversal order, as of modern ECMAScript specification, is well-defined and consistent across implementations. Within each component of the prototype chain, all non-negative integer keys (those that can be array indices) will be traversed first in ascending order by value, then other string keys in ascending chronological order of property creation.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...

Similarly, Python json module guarantees that "encoders and decoders preserve input and output order by default. Order is only lost if the underlying containers are unordered." Since 3.7 dict maintains insertion order.

https://docs.python.org/3/library/json.html

https://docs.python.org/3/library/stdtypes.html#dict

Yes, JS and Python behaviour are not the same for all cases, however, non-integer keys do maintain order across Python and JavaScript.


> Yes, but it's not impossible to achieve it in practice.

For any one implementation. But there's a very large number of implementations. You just can't count on object key order being preserved, so don't.


There are situations where keeping order is useful. For example, human-editable json config files.

I also gave an example of two implementations that are compatible for a useful subset of keys. By the way, SQLite JSONB keeps object keys in insertion order, similar to Python: https://hackernews.hn/item?id=38547254


> There are situations where keeping order is useful. For example, human-editable json config files.

You might have to just normalize every time you want to edit that.


You can count on object key order being preserved when you control the implementation. Which is why it's useful to know whether a given implementation preserves key order.


Slight caveat. It seems that you should default to JSONB for all cases except for values directly returned in the query result. As IIUC you will get a JSONB blob back and will be responsible for parsing it on the client.

So use it for writes:

    UPDATE t SET col = jsonb_*(?)
Also use it for filters if applicable (although this seems like a niche use case, I can't actually think of an example).

But if returning values you probably want to use `json_*` functions

    SELECT json_extract(col1, '$.myProp') FROM t
Otherwise you will end up receiving the binary format.


Wait is it really true that if you use SQLite's JSONB format, `select my_json_column from foo` becomes unreadable to humans? That seems.. unacceptable. One would expect it to convert its internal format back to JSON for consumption.


Is there any downside to storing JSON-B even if you’re not planning to query it? For example, size on disk, read/write performance?


There’s processing to be done with JSONB on every read/write, which is wasted if you’re always reading/writing the full blob.


Which occurs with JSON as well (SQLite doesn't have a dedicated JSON nor JSONB type). The only actual cost would be the conversion between JSONB and JSON.


No, SQLite's "JSON" is just TEXT, there's no overhead with reading/writing a string.


That's what I said I think? "JSON" is a TEXT that is handled as a JSON string by `json_*` functions, while "JSONB" is a BLOB that is handled as an internal format by `jsonb_*` functions. You generally don't want JSONB in the application side though, so you do need a conversion for that.


Yes, but when you use the BLOB with jsonb functions, your application demands go from:

Read JSON from TEXT column.

Parse JSON into Internal Binary Format.

Run json_*() function on this format, which will Serialize Internal Binary Format to JSON as output.

To:

Read JSONB from BLOB column.

Run json_*() function on Internal Binary Format, which will serialize the Internal Binary Format to JSON as output.

Because:

The json_* and jsonb_* all accept _either_ JSON or JSONB as their input. The difference is jsonb_* functions also produces it as output. So even in the above case, if your function output is just being used to feed back into another table as a BLOB, then you can use the jsonb_* version of the function and skip the serialization step entirely.


Oh, you are right! I later looked at the draft documentation and realized that `json(b)_` only determines the output type. That said, it is still true that you need one more function call `json(...)` to retrieve a textual form of JSON which you do often need instead of JSONB.


> Run json_*() function on Internal Binary Format, which will serialize the Internal Binary Format to JSON as output

I don’t think that’s always correct (it is within this thread, which stated (way up) “if you’re always reading/writing the full blob”, but I think this discussion is more general by now). Firstly, it need not convert the full jsonb blob (example: jsonb_extract(field,'$.a.b.c.d.e'))

Secondly, it need not convert to text at all, for example when calling a jsonb_* function that returns jsonb.

(Technically, whether any implementation is that smart is an implementation detail, but the article claims this is

“a rewrite of the SQLite JSON functions that, depending on usage patterns, could be several times faster than the original JSON functions”

so it can’t be completely dumb.


There’s no point using json_* functions if you’re always reading/writing the full blob.


Of course (and I never said that), but if you need to store a JSON value in the DB and use a JSONB-encoded BLOB as an optimization, you eventually read it back to a textual JSON. It's just like having a UNIX timestamp in your DB and converting back to a parsed date and time for application uses, except that applications may handle a UNIX timestamp directly and can't handle JSONB at all.


It doesnt validate that its valid json?


No, it doesn't. By default in sqlite3, you can store whatever you like in whatever field you want.

The "built-in validation mechanism" is invoking json_valid(X) [1] call within the CHECK [2] condition on a column.

[1] https://www.sqlite.org/json1.html#jvalid

[2] also assumes you didn't disable CHECKs with PRAGMA ignore_check_constraints https://www.sqlite.org/pragma.html#pragma_ignore_check_const...


If the order of items in the JSON blob matters then JSONB probably wouldn't preserve the order.


JSON is unordered. Nothing in your code should assume otherwise.

"An object is an unordered collection of zero or more name/value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array."


That’s exactly the kind of difference between json and jsonb that you gotta keep in mind. Object properties are unordered, but a json string is very much ordered. It’s the same sequence of characters and lines each time, unless you parse it and dump it again. So if you want to preserve an unmodified original json string for some (e.g. cosmetic) reasons, you probably want json.


I would expect you are threading on dangerous grounds to assume a type called JSON is going to preserve the data byte for byte. It might currently but I doubt that is in the API contract. You really want to use TEXT if that is your requirement


I don't know what SQLite does, but in JS the order is actually defined. JS is not Java, object is not HashMap, if anything, it's closer to LinkedHashMap, but even that is not correct because there are numeric slots which always go first. https://tc39.es/ecma262/#sec-ordinaryownpropertykeys


> but in JS the order is actually defined

Defined yes, but still arbitrary and may not be consistent between JSON values of the same schema, as per the document you linked to:

> in ascending chronological order of property creation

Also, while JSON came from JS sort-of, it is, for better or worse (better than XML!) a standard apart from JS with its own definitions and used in many other contexts. JSON as specified does not have a prescribed order for properties, so it is not safe to assume one. JS may generally impose a particular order, but other things may not when [re]creating a JSON string from their internal format (JSONB in this case), so by assuming a particular order will be preserved you would be relying on a behaviour that is undefined in the context of JSON.


Yes, I know, the point was not to say that it's safe to depend on this universally, but rather why it's safe in JS and why it's not elsewhere -> other languages use hash maps simply because authors were either lazy or unaware of the original behaviour. (which sucks, in my opinion, but nobody can fix it now)


> other languages use hash maps simply because authors were either lazy or unaware

Because it is idiomatic in that language, and you often don't need the higher overhead of tracking insertion order.

Before Python 3.7, json.loads used dict (undefined order) and you needed to explicitly need to override the load calls with the kwarg `object_pairs_hook=collections.OrderedDict` to accept ordered dictionaries.

Since Python 3.7 all `dict`s are effectively `collections.OrderedDict` because people now expect this kind of inefficient default behavior everywhere. ¯\_(ツ)_/¯


The same goes for maps in Go, which now explicitly randomizes map iteration with the range keyword to prevent developers from relying on a particular ordering. Neat trick.


That might get them to rely on the randomization, though :)


You make it sound like it's one of the laws of physics.

ON part of JSON doesn't know about objects. When serialized, object entries are just an array of key-value pairs with a weird syntax and a well-defined order. That's true for any serialization format actually.

It's the JS part of JSON that imposes non-duplicate keys with undefined order constraint.

You are the engineer, you can decide how you use your tools depending on your use case. Unless eg. you need interop with the rest of the world, it's your JSON, (mis)treat it to your heart's content.


> It's the JS part of JSON that imposes non-duplicate keys with undefined order constraint.

Actually since ES2015 the iteration order of object properties is fully defined: first integer keys, then string keys in insertion order, finally symbols in insertion order. (Of course, symbols cannot be represented in JSON.)

And duplicate properties in JSON are guaranteed to be treated the same way they are in object literals: a duplicate overwrites the previous value but doesn't change the order of the key.

Concretely that means if you write:

    Object.entries(JSON.parse('{"a":10, "1":20, "b":30, "a":40}'))
This is guaranteed to evaluate to:

    [['1', 20], ['a', 40], ['b', 30]]
(Note that '1' was moved to front, and 'a' comes before 'b' even though the associated value comes from the final entry in the JSON code.)

Python made a similar change in version 3.6 (officially since 3.7), both with regards to insertion order and later values overwriting earlier ones while preserving order. I think the only difference at this point is that Python doesn't move integer-like keys to the front, because unlike JavaScript, Python properly distinguishes between different key types.


All the keys in JSON are strings. I do not understand why string literal containing number would be moved first. Maybe in JS-object in runtimes.


> You make it sound like it's one of the laws of physics.

The text I quoted is from the RFC. json.org and ECMA-404 both agree. You are welcome to do whatever you want, but then it isn't JSON anymore.


That does not follow.

JSON formatted data remains JSON no matter if you use it incorrectly.

This happens all the time in the real world - applications unknowingly rely on undefined (but generally true) behavior. If you e.g. need to integrate with a legacy application where you're not completely sure how it handles JSON, then it's likely better to use plain text JSON.

You may take the risk as well, but then good luck explaining that those devs 10 years out of the company are responsible for the breakage happening after you've converted JSON to JSONB.

In some cases, insisting on ignoring the key order is too expensive luxury, since it basically forces you to parse the whole document first and only then process it. In case you have huge documents, you have to stream-read and this often implies relying on a particular key order (which isn't a problem since those same huge documents will likely be stream-written with a particular order too).


It is very much still JSON, and your code can very much assume keys are ordered if your JSON tools respect it. ECMA-404 agrees:

> The JSON syntax does not impose any restrictions on the strings used as names, does not require that name strings be unique, and does not assign any significance to the ordering of name/value pairs. These are all semantic considerations that may be defined by JSON processors or in specifications defining specific uses of JSON for data interchange.

If you work in environments that respect JSON key order (like browser and I think also Python) then unordered behavior of JSONB would be the exception not the rule.


Assume you're writing an editor for JSON files. Don't think many users of that editor would be very happy if you change the order of the attributes in their json files, even though technically it's the same...


According to the JSON spec it's also legal to round all numbers to zero.


If you’re writing an editor, you will need to separate in progress text that user is editing from the parsed and validated data, and keep both.


I don't disagree, but people might still assume it. If you serialise a Map in Java, some Map implementations will maintain insertion order for example.


I think this is why JS and Python chose to make key ordering defined - the most popular implementations maintained insertion order anyway, so it was inevitable people would end up writing code relying on it


SQLite JSONB does maintain order: https://hackernews.hn/item?id=38547254


PG's JSONB is compressible yet also indexed (objects' key/value pairs are sorted on key and serialized array-like; arrays are indexed by integers; therefore binary searching large objects works). The history of PG's JSONB type is very interesting.


According to the linked announcement the data size is 5-10% smaller. So you are exchanging a small processing cost for smaller storage size. It will depend on your application but in many cases smaller disk reads and writes as well as more efficient cache utilization will make storing as JSONB a better choice even if you never manipulate it in SQL. Although the difference is likely small.


I haven’t delved into it, but you likely lose the ability to round-trip your json strings, just as in postgreSQL’s jsonb type.

  Jsonb_extract(jsonb(foo), '€')
likely

- will remove/change white space in ‘foo’,

- does not guarantee to keep field order

- will fail if a dictionary in the ‘foo’ string has duplicate fields

- will fail if ‘foo’ doesn’t contain a valid json string


Not specifically JSONB, but I do recall that with MongoDB's equivalent, BSON, the sizes of the binary equivalent tend to be larger in practice, I would expect JSONB to have a similar trade off.

There'll also be a conversion cost if you ultimately want it back in JSON form.


From the post:

> JSONB is also slightly smaller than text JSON in most cases (about 5% or 10% smaller) so you might also see a modest reduction in your database size if you use a lot of JSON.


> Your application will never see the raw JSONB content.

That's not exactly right, as the jsonb_* functions return JSONB if you choose to use them.


And you can just read the BLOB column values out of the DB.


> If you're just using SQLite to write and read full JSON blobs, the JSON datatype will be the best pick.

I don't know how the driver is written, but this is misleading if sqlite provides an api to read the jsonb data with a in-memory copy, an app can surely benefit from skipping the json string parsing.


If my application never sees a difference to normal JSON, everything is compatible and all there is are perf improvements, why is there a new set of functions to interact with it (jsonb_*)?

It seems that the JSON type is even able to contain JSONB. So why even use these functions, if the normal ones don't care?


As someone mentioned below, the order of keys is undefined in JSON spec, but applications may rely on it anyway, and thus conversion to JSONB may lead to breakage.

There are some other minor advantages of having exact representation of the original - e.g. hashing, signatures, equality comparison is much simpler on the JSON string (you need a strict key order, which is again undefined by the spec, but happens in the real world anyway).


The difference is the storage format, which is important for a SQL database since you define it in the schema. Also the performance characteristics are different.


`json_` functions return json, `jsonb_` functions return jsonb. Both take either as input.

If you're modifying something "in-place" then `jsonb_` functions would be better since they avoid conversion.


I don't know if it's true of SQLite, but you missed the most important point at least with PostgreSQL : you can build indexes directly against attributes inside JSONB which really turns it into a true NoSQL / relational hybrid that can let you have your cake and eat it too for some design problems that don't fit neatly into either pure relational or pure NoSQL approaches.


What would that SQL look like in practice ?


Here's an example i have running on a pi with a temperature sensor. All data is received as json messages in mqtt, then stored as is in a postgres table. The view turns it into relational data.

If i just need the messages between certain temperatures i can speed this up by adding an index on the 'temperature' field in json.

    create or replace view bme680_v(ts, message, temperature, humidity, pressure, gas, iaq) as
    SELECT mqtt_raw.created_at                                          AS ts
     , mqtt_raw.message
     , (mqtt_raw.message::jsonb ->> 'temperature'::text)::numeric    AS temperature
     , (mqtt_raw.message::jsonb ->> 'humidity'::text)::numeric       AS humidity
     , (mqtt_raw.message::jsonb ->> 'pressure'::text)::numeric       AS pressure
     , (mqtt_raw.message::jsonb ->> 'gas_resistance'::text)::numeric AS gas
     , (mqtt_raw.message::jsonb ->> 'IAQ'::text)::numeric            AS iaq
    FROM mqtt_raw
    WHERE mqtt_raw.topic::text = 'pi/bme680'::text
    ORDER BY mqtt_raw.created_at DESC;


You can do this with text JSON in SQLite as well, but JSONB could speed it up.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: