You're pretty free in how you want to represent the database, because you're unlikely to ever want to query (quickly) based on the contents of the tile, so representation doesn't much matter. My first cut would be a 64x64 tile, indexed by x & y from an arbitrary 0, each of which contains a 64x64 = 4096 (4K) string representation of the contents of the tile. There will be lots of spaces, so my first cut at efficiency would be to simply do inline RLE and embed ASCII numbers in the stream that represent the number of spaces, ie, "4096" = empty tile, "2047a2048" represents a tile with one a in the middle, etc. (This preserves some human readability, where gzip does not, but you could use that too.) But frankly, at 4K+noise per tile, and no compelling need to ever refer to the entire table at once, even the stupid string representation will get you a long way before it bothers any serious DB server; that's enough to store about 250-ish million tiles in a single terabyte (a bit of fudge in there for overhead). (And the live set would be small enough to fit in very little ram so I'm not too worried about that.)
By the time that representation is your Biggest Problem, you'll be able to afford someone to do something more clever.
By the time that representation is your Biggest Problem, you'll be able to afford someone to do something more clever.