The “Build Your Own Database” book is finished

brap · on April 22, 2023

While I’m not in the target audience for this, I skimmed through the first chapter and this seems really cool - great job! I will definitely keep this one in mind if it ever becomes more relevant to me.

One small comment: the /about page only has the books listed and nothing else. Some people will probably be interested in knowing a thing or two about the author before getting the books (and there are probably a million people with the same name, so difficult to Google)

vira28 · on April 22, 2023

+1 reg the author point. I would like to know whose book I am getting.

I understand that sometimes people want to keep their identity private but for an author it's tough.

pcthrowaway · on April 22, 2023

This is like wanting someone authoring a PR on github to have their about section filled out. Maybe it's nice for the reviewer to have, but some authors prefer to let the content to speak for itself. I think that's fair.

hotstickyballs · on April 22, 2023

For someone trying the learn new information (and thus lacking the ability to discern), its not possible to let the content speak for itself

Mike_12345 · on April 22, 2023

[flagged]

zamadatix · on April 23, 2023

Appeal to authority is about claiming nothing in the book can be wrong on the basis it came from a respected author. It's not about treating any author as equally good to learn from as expert's just because the content could happen to also be good if you're lucky.

sameerds · on April 23, 2023

I wonder how that works in other fields. Is it an appeal to authority if one buys things by reading reviews? Does everyone always afford the time or the money to try each thing themselves to decide this is the one they want? Should people never look at the manufacturer brand before making a purchase?

hansworst · on April 23, 2023

If you know the author is an expert in a certain field, you can use that as a heuristic for deciding whether reading the entire book is worth your time.

webmobdev · on April 23, 2023

How can the content "speak to me" when I am not familiar with a subject? If you want to learn and practice Islam, would you pick a book written by an expert Christian who prefers to denounce it?

cloudripper · on April 22, 2023

Past HN posts for author's other book, "Build Your Own Redis":

https://news.ycombinator.com/item?id=34557389 (Show HN - 5 comments)

https://news.ycombinator.com/item?id=34572263 (129 comments)

https://news.ycombinator.com/item?id=35212660 (65 comments)

1equalsequals1 · on April 22, 2023

https://github.com/codecrafters-io/build-your-own-x

intelVISA · on April 22, 2023

build-your-own is actually good unlike that link though..!

jrvarela56 · on April 22, 2023

The book recommends that website: https://build-your-own.org/database/

Why do you say the codecrafters stuff is lower quality?

cloudripper · on April 22, 2023

Can you provide some context? Curious what informs your thoughts on codecrafters' quality...

intelVISA · on April 22, 2023

Well, one's a decently comprehensive book and the other's SaaS aimed at embezzling your employer's learning budget - wider lang support but lower quality.

Maybe the problem is the format...

Vosporos · on April 22, 2023

haha yeah nicely put

aidos · on April 22, 2023

For a supplementary angle I’d highly recommend having a nose in both the Postgres docs and source. They’re both very readable.

I think the section on internals is a good jumping off point that leads to a lot of deeper content.

https://www.postgresql.org/docs/current/internals.html

jmartrican · on April 22, 2023

We have so many database products out there. But if this book leads to more of them, I'm all for it. I think databases, while old as IT itself, still has room to evolve, especially in the distributed realm, and especially in the multi-master configuration where I do not see many products being offered, as compared to one-writer configurations.

didgetmaster · on April 23, 2023

The book suggests building a database on top of a KV store. This is precisely what I did with my project. I was initially building a product that I hoped would be able to replace existing file systems and I needed an architecture that made it easy to create meta-data tags for every file and then find every file that had certain tags very quickly. I implemented the tags using a set of novel Key-Value stores that I invented.

Once I had it working, I realized the KV stores I used for tags were just like columns in a relational table built within a columnar database. Querying for files based off their tags was very much just like SQL queries for table rows. So I tried using them to create relational tables.

They turned out to be incredibly fast at a variety of queries (the bread and butter of databases) without needing to create separate indexes in order to get optimal performance. I thought database experts would be intrigued when I showed how much faster my system was than other conventional RDBMS setups on the same hardware. I guess I was surprised when almost no one was even curious how it did it.

Here is a simple, short video comparing it to SQLite: https://www.youtube.com/watch?v=Va5ZqfwQXWI

hobs · on April 23, 2023

I think the problem is that every relational query can be implemented a KV store, but usually the trade offs and bugs in your nascent query engine IS boring, and we can already use a bunch of hella fast KV stores out there if you dont care about ACID or you are willing to give up your decades long implementation details on SQL engine of choice.

layer8 · on April 22, 2023

I believe the aim of the book is more to promote a basic understanding of how relational databases work internally, by way of implementing a simple one oneself, an understanding which is generally helpful when using databases, and not so much to cause new database products to be created.

submerge · on April 22, 2023

As someone who lacks a formal CS education and wants to know more about how databases work, I have been eagerly awaiting this book. I also want some practical golang projects to work on so this is perfect! I'm so excited!

ikiris · on April 22, 2023

its kind of frustrating how this description of the book doesn't even mention what language is used in the examples.

layer8 · on April 22, 2023

I agree it could have been mentioned. Section 0.2, however, part of the short introduction page [0], provides the information:

The book uses Golang for sample code, but the topics are language agnostic. Readers are advised to code their own version of a database rather than just read the text.

[0] https://build-your-own.org/database/00a_overview

treeman79 · on April 22, 2023

Built college newspaper website back in ‘99 got tired of maintaining it by hand. Discovered PHP when it was new. Wanted to build a content management system. SQL sounded hard so I wrote my own database. Worked great for the years I was at the college.

vira28 · on April 22, 2023

Is the final solution available somewhere that I can build and run?

I see the code in sparse but wanted to get the feel end to end.

Kinrany · on April 23, 2023

It's a very short book, takes half an hour at most to read through if you skip implementation.

polskibus · on April 22, 2023

Woah great! I’d love an Andy Pavlo review of it ;)

sfc32 · on April 22, 2023

What language is the tutorial in?

efficax · on April 22, 2023

the book uses golang

freedomben · on April 22, 2023

Indeed. The Redis book was C/C++ so I was hopeful that this one would be as well. Given that database essentials are so closely tied to system calls, I would have hoped for a language that doesn't abstract them away. At least in the first fsync() call, the author does explicitly mention that `fp.Sync()` ends up invoking `fsync` system call, but as someone who has no intention of returning to Golang I'd rather not have to add complexity by requiring me to build and maintain a mental map of Golang calls to syscalls (the worst kind of abstraction layer IMHO: leaky and unnecesary).

macintux · on April 22, 2023

It’s interesting that it doesn’t seem to actually say that anywhere.

vira28 · on April 22, 2023

write on the introduction it says https://build-your-own.org/database/00a_overview

Or if you see any chapter you can literally see Go code.

macintux · on April 22, 2023

You're right, I missed the one place in the introduction where it mentions Golang.

Other than the "if err != nil" I wouldn't have recognized the code samples. Go's error handling is a big reason I've never taken a closer look.

bsaul · on April 22, 2023

never understood the issue with go error handling. As soon as you start dumping exceptions as a valid error forwarding mechanism ( which seems like a totally acceptable design decision), you end up manually having to check every call you make that can raise an error, on each line.

I don't really see any alternative. It also makes you carefully think about how you plan on managing errors in your codebase, which also seems like a very sane thing to enforce.

jeremyjh · on April 22, 2023

You don't have to trap exceptions separately on every line. You trap a whole block of code and match on the exception type to determine how to handle the problem. It isn't perfect, nothing is, but I prefer systems languages where typical failures cannot easily be ignored and yet you are not burdened with constantly thinking about it.

Mawr · on April 24, 2023

> I prefer systems languages where typical failures cannot easily be ignored and yet you are not burdened with constantly thinking about it

This is self-contradictory. In particular, the only way you can have reliable error handling is if you are forced to think about each possible failure.

I assume by "cannot easily be ignored" you mean the way exceptions blow up at runtime? I don't find that an acceptable default for any non-scripting language.

ikiris · on April 22, 2023

people who aren't used to actually handling their errors get annoyed at typing return err all the time.

They'd prefer an easier way to not bother dealing with them with them without outright ignoring them via _

alpaca128 · on April 23, 2023

I'd get annoyed too, because I know there are much simpler alternatives. Language design should encourage doing things right by making it more convenient than the shortcuts.

In Rust that entire check can be a single "?" symbol. How much syntactic sugar is too much is a matter of preference, but I personally think that properly handling all errors without syntactic sugar turns into an unreadable mess because there's just a lot of things which could go wrong.

bsaul · on April 23, 2023

almost 50% of the time i want to add some debugging info to the error before forwarding it one layer above, to provide more context.

I think just forwarding all low-level errors is a really bad habit, and go forces you to at least think about this.

alpaca128 · on April 23, 2023

> I think just forwarding all low-level errors is a really bad habit

Why exactly is that a bad habit? In almost all situations where I return an error I already have enough context, I'm just wondering what else I'd add to that.

> go forces you to at least think about this

Boilerplate code definitely doesn't incentivize thinking.

bsaul · on April 23, 2023

most errors you encounters are with i/o and are stupid "can't read, can't write or can't serialize".

In a network environment (which is originally what go was made for) you often need to add tracing information, business-level identifiers or processing information related to your state etc.

I'm currently writing a fairly complex api in go, and to be honest this really hasn't bothered me once.

Not to say it doesn't exists, but with time i've come very suspicious of people complaints over go. Most of the time those complaints come from people that didn't realize they missed an opportunity to have written a much much more elegant solution to their problem.

ikiris · on April 23, 2023

or usually just try again with backoff.

ikiris · on April 23, 2023

just returning the error isn't handling it.

alpaca128 · on April 24, 2023

I never said it is, but to handle an error you have to pass it to the caller that can actually do something with it, whether it's trying alternatives, repeating the step, just logging it, or whatever else - a lot of functions just need to pass it on a couple of times and making this verbose in every single case doesn't make sense to me.

macintux · on April 22, 2023

My favorite language is Erlang, which is about as far from Go as it gets from an error handling perspective.

throw10920 · on April 22, 2023

Interesting choice. I'd think that as the context is "As many of today’s (2023+) coders do not have a formal CS/SE education" and the goal is education, a more popular language like Javascript or Python (or, heck, even PHP[1] /s) would be used, instead.

I've taken a look at Go, and while it does seem pretty approachable, it's definitely not nearly as common as Python/JS, and it's always significantly harder for me to learn a new concept when the examples are also in a language I'm unfamiliar with. Maybe that's just me, though.

[1] https://survey.stackoverflow.co/2022/#most-popular-technolog...

jandrewrogers · on April 22, 2023

I think Go is a pretty good choice for the purpose. It balances high-level ease of use and learning curve with decent access to the system-y parts of coding that are so important to databases. What you learn to do in Go will translate reasonably well to a true systems language if the user wanted to take database engine design to the next level.

Languages like Python or Javascript are so far removed from the system-y side of programming that the way you would implement the concepts in those languages would not translate to the way you would actually build a "real" database which is I think the purpose of the book. I think the objective isn't to teach the abstract concepts but how those concepts are expressed in real systems.

iknownothow · on April 22, 2023

I'm a data engineer and I only know Python. It appears Golang hits a sweetspot on many metrics such as performance, parallelism, ease of use etc and since 2016 there's been a lot of new data products and tools written in Golang. So it makes sense to me that the book would use a popular language for the domain.

switchbak · on April 22, 2023

Databases really do push runtimes in such a way that I think it makes sense to urge folks to use a system level language, or something close to it. In particular it'd be hard to cover concurrency (and parallelism) properly using vanilla CPython or JS, and I think that would impinge on the lessons learned.

That said, it'd be an interesting read on how to make a DB in pure Python.

eatonphil · on April 22, 2023

I don't normally think of Node and CPython as in the same performance bracket.

Regardless, you can do some crazy things in Node. See these notes about Node and MySQL:

https://github.com/tigerbeetledb/tigerbeetle/blob/main/docs/...

switchbak · on April 23, 2023

In this context I meant that the parallelism primitives are different than what you'll get in a systems level language, which might make teaching those parts unnecessarily awkward and probably the concepts less translatable to other runtimes.

For sure they've got different performance profiles.

Very impressive what your group was able to do with Node, and continuing on with Zig. It's got me interested in learning more.

vira28 · on April 22, 2023

you may be interested in this https://github.com/avinassh/py-caskdb

wjholden · on April 22, 2023

I could be wrong, but my perception is that Go is so opinionated that you'll either write idiomatic Go or use another language. So, from that perspective there's some goodness in using Go as a learner's systems language.

sk0g · on April 22, 2023

It's opinionated, but it's not _that_ opinionated IMO. You can write awkward Java, shiny C, or whatever idiomatic Go is. Most people I worked with were from .net/Java worlds, and learned just about enough Go to be able to subject others to their coffee bean ideologies.

There's somethings the compiler will fail on like unused variable and the likes, but for the most part you need added static analysis and style checking -- some of which ships with the Go compiler.

zacksiri · on April 22, 2023

This is something I’m going to be buying for sure. I saw the site also has a “Build your own Redis”.

I’ve always wanted to understand how databases work so I can build my own.

Excited about this!

zinclozenge · on April 22, 2023

I'd be curious, if somebody bought the book, to know what kind of concurrency they write about, as in just 2 phase locking, or if it talks about MVCC.

samsquire · on April 22, 2023

If you want some sample code to implement MVCC, I implemented MVCC in multithreaded Java as a toy example

https://github.com/samsquire/multiversion-concurrency-contro...

First read TransactionC.java then read MVCC.java ( or follow the methods that TransactionC calls)

temp12192021 · on April 22, 2023

Here's the relevant headers in the ToC.

11. Atomic Transactions

11.1 KV Transaction Interfaces

11.2 DB Transaction Interfaces

11.3 Implementing the KV Transaction

12. Concurrent Readers and Writers

12.1 The Readers-Writer Problem

12.2 Analysing the Implementation

12.3 Concurrent Transactions

  Part 1: Modify the KV type

  Part 2: Add the Read-Only Transaction Type

  Part 3: Add the Read-Write Transaction Type

 12.4 The Free List

 12.5 Closing Remarks

_a_a_a_ · on April 22, 2023

Good question. I've no idea but if I was the author I wouldn't bother with concurrency, I'd assume single user only and always; a database without concurrency is still perfectly good database so it's hardly making the book title a lie.

anonymousDan · on April 22, 2023

Have to disagree, concurrent transaction processing is super interesting.

_a_a_a_ · on April 23, 2023

Sure it's interesting! Maybe a bit much to load on beginners though? Maybe the author can say.

airstrike · on April 22, 2023

Can we get a "build your own spreadsheet" next?

emmanueloga_ · on April 22, 2023

"A spreadsheet in fewer than 30 lines of JavaScript, no library used"

https://news.ycombinator.com/item?id=6725387

Not a book, but. :-)

layer8 · on April 22, 2023

Turbo Pascal came with a public-domain sample spreadsheet implementation (CALC.PAS aka MicroCalc) since version 1.0 (from 1983, 40 years ago!). Here is the version from Turbo Pascal 3 on GitHub: https://github.com/hindermath/MicroCalc

superlopuh · on April 22, 2023

Spreadsheet Implementation Technology exists, although I cannot give a very strong recommendation

airstrike · on April 22, 2023

Thank you!

kagevf · on April 22, 2023

Here’s one in scheme: http://people.eecs.berkeley.edu/~bh/ssch24/spread.html

It’s ... part of a book! :)

bhawks · on April 22, 2023

Spreadsheets have been teaching the fundamentals of functional programming to the unwashed masses for decades.

Couldn't be a better story.

kagevf · on April 24, 2023

And now it’s come full circle with scheme in excel https://www.codemag.com/Article/2207071/The-Excellent-Scheme... ... it’s a commercial plugin, though

codr7 · on April 22, 2023

There are many kinds of databases, I've used similar designs to [0] (Lisp warning, ymmv) successfully in several projects.

[0] https://github.com/codr7/whirlog

pg314 · on April 22, 2023

Maybe I'm missing something, but after a cursory glance, I can't find any place where that library does an fsync() call. How does it handle durability?

bsaul · on April 22, 2023

your comment made me laugh. I never thought about entering a codebase by searching for fsync, but in the case of a DB that's probably the best place to start :))

codr7 · on April 22, 2023

True, it's missing a call to (finish-output), I'll take a closer look later.

camel_Snake · on April 22, 2023

It costs $10 extra for the code alongside the ebook? Is that a standard practice?

mkl95 · on April 22, 2023

Not sure if I would call it "standard", but it's a widespread practice among project-based books.

zvmaz · on April 22, 2023

Interesting. I wonder if the knowledge one could acquire in the book would easily translate to mainstream databases.

layer8 · on April 22, 2023

For relational databases it certainly does.

mkl95 · on April 22, 2023

How long is the ebook version?

didiao · on April 23, 2023

oh, I just remember mit 6.830, use java to build a mini database ,free.

amelius · on April 22, 2023

No coverage of distributed databases?

samstave · on April 22, 2023

SELECT * FROM BOOK WHERE CHAPTERS=READ and KNOWLEDGE = ATTAINED

mgaunard · on April 22, 2023

1. Memory map a file

2. Manage access to it through a server

There you go, you know have a database.

orf · on April 22, 2023

1. Draw some circles

2. Draw the rest of the owl

https://www.pillarsofimpact.com/post/draw-the-owl

jandrewrogers · on April 22, 2023

Memory mapping is a poor way to make a database[0], not all storage can be memory mapped, and this doesn't provide any of the functionality you expect of a database like indexing and concurrency control.

[0] https://db.cs.cmu.edu/mmap-cidr2022/

jfrisby · on April 23, 2023

I saw a headline about the book on Reddit and made a joke to a friend of mine: "How much you wanna bet it uses mmap?" Well...

It uses mmap.

mgaunard · on April 23, 2023

The concurrency control is managed by the server that manages access.

All databases that are any different than these two points are just bad.

remram · on April 22, 2023

> Manage access to it through a server

This has real "draw the rest of the owl" energy.

How to make a DBMS? 1. Get a file 2. Make a DBMS

mamcx · on April 22, 2023

> 1. Memory map a file

Complex when you want to

> 2. Manage access to it through a server

A database is only "easy" IF:

- Append only

- No real "delete" or "updates" just to reiterate the above.

- Only Sequential scan

- Only need simple iterator-per-row

- No maintain secondary stuff like indexes, so not need to coordinate changes

- No concurrency

- Fit in RAM, and I mean in few MB

- No need to deal with SQL, use his own DSL (sql is so bad! so much weird stuff!, but is ok to have something sql-ish like LINQ)

- No need to deal with recursive data types, only scalars

- Is only embebed

- No need auth or security validations

Ok, after making this list, I sure forgot some other tips to make this easy!

mgaunard · on April 23, 2023

There is no requirement to have the file fit in RAM to mmap it, that's the point.

Obviously you'd have a file per column (or index), and use directories to represent tables.

This is the correct way of doing it and yet so few databases do it.

preseinger · on May 3, 2023

"correct" is not an objective measure, it's a function of use case

mmap is not a panacea, it improves specific access patterns by incurring specific costs, it's definitely not true that mmap is the right choice for all databases

mamcx · on April 25, 2023

Yes, I understood, but "mmap a file" that should be a database is not "easy", because now you can deal with segfaults and other weird things that are out the normal playbook.

Of course, depending on the other things of the list this is or not a major issue. Is more about how combining several ideas leads to a easy or complex implementation.

icedchai · on April 22, 2023

I worked on a commercial product that did just this, except it also had a "transaction log" component. All access through the server was logged, and the log could be used for replay/recovery. While it worked for them for thousands of simultaneous clients, it was not a general purpose solution. There were no real indexes: the "indexes" (hash tables, really) were built in memory at startup.

akamaozu · on April 23, 2023

i don't understand what you mean by no real indexes

why does generating indexes at start-up not count as having indexes?

(asking because i do this all the time)

icedchai · on April 23, 2023

I suppose I should've said "persistent" indexes. There was also only a single type supported: hash table. And it was hard coded to 2 fields. You had to change the code (which was C) to change the database.

This system was rarely restarted, so in practice it didn't matter what it did at startup, as long as it didn't take more than a few minutes, but it did place some limitations on data size. (This was a 32-bit system and everything was memory mapped.)

mkl95 · on April 22, 2023

1. Build your own database

There you go!

LAC-Tech · on April 22, 2023

You seem like a real go-getter who can cut through red tape and find simple solutions to complex problems :D

mkl95 · on April 22, 2023

The power of abstraction!

akamaozu · on April 23, 2023

why is this so heavily down-voted?

it feels correct, even though it's not a complete guide

(reliably persisting changes to disk is a big part of what dbs do, but is missing here)

mmphosis · on April 22, 2023

Good to now.