Hey everybody! I'm David, the creator of YouTubeDrive, and I never expected to see this old project pop up on HN. YouTubeDrive was created when I was a freshman in college with questionable programming abilities, absolutely no knowledge of coding theory, and way too much free time.
The encoding scheme that YouTubeDrive uses is brain-dead simple: pack three bits into each pixel of a sequence of 64x36 images (I only use RGB values 0 and 255, nothing in between), and then blow up these images by a factor of 20 to make a 1280x720 video. These 20x20 colored squares are big enough to reliably survive YouTube's compression algorithm (or at least they were in 2016 -- the algorithms have probably changed since). You really do need something around that size, because I discovered that YouTube's video compression would sometimes flip the average color of a 10x10 square from 0 to 255, or vice versa.
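For anyone who'd rather see it as code: a rough NumPy sketch of that encoding step (not the original Mathematica, just an illustration of the scheme described above).

    import numpy as np

    def encode_frame(data: bytes, width=64, height=36, scale=20):
        """Pack 3 bits per pixel (one bit per RGB channel, each 0 or 255) into a
        64x36 image, then blow each pixel up to a scale x scale square -> 1280x720."""
        capacity = width * height * 3                        # 6912 bits = 864 bytes per frame
        bits = np.unpackbits(np.frombuffer(data, dtype=np.uint8))[:capacity]
        bits = np.pad(bits, (0, capacity - len(bits)))       # one frame holds at most 864 bytes
        small = (bits.reshape(height, width, 3) * 255).astype(np.uint8)
        # Each data "pixel" becomes a solid 20x20 block of the 1280x720 frame.
        return np.kron(small, np.ones((scale, scale, 1), dtype=np.uint8)).astype(np.uint8)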
Looking back now as a grad student, I realize that there are much cleverer approaches to this problem: a better encoding scheme (discrete Fourier/cosine/wavelet transforms) would let me pack bits in the frequency domain instead of the spatial domain, reducing the probability of bit-flip errors, and a good error-correcting code (Hamming, Reed-Solomon, etc.) would let me tolerate a few bit-flips here and there. In classic academic fashion, I'll leave it as an exercise to the reader to implement these extensions :)
One more thing: the choice of Wolfram Mathematica as an implementation language was a deliberate decision on my part. Not for any technical reason -- YouTubeDrive doesn't use any of Mathematica's symbolic math capabilities -- but because I didn't want YouTubeDrive to be too easy for anybody on the internet to download and use, lest I attract unwanted attention from Google. In the eyes of my paranoid freshman self, the fact that YouTubeDrive is somewhat obtuse to install was a feature, not a bug.
So, feel free to have a look and have a laugh, but don't try to use YouTubeDrive for any serious purpose! This encoding scheme is so horrendously inefficient (on the order of 99% overhead) that the effective bandwidth to and from YouTube is something like one megabyte per minute.
As far back as the late 1970s a surprisingly similar scheme was used to record digital audio to analog video tape. It mostly looks like kind of stripey static, but there was a clear correlation between what happened musically and what happened visually, so in college (late 1980s) one of my friends came into one of these and we'd keep it on the TV while listening to whole albums. We had a simultaneous epiphany about the encoding scheme during a Jethro Tull flute solo, when the static suddenly became just a few large squares.
I'd estimate that there's an easy order-of-magnitude improvement (~10x) just from implementing a simple error-correction mechanism -- a Reed-Solomon code ought to be good enough that we can take the squares down to 10x10, maybe even 8x8 or 5x5. Then, if we really work at it, we might be able to find another order-of-magnitude win (~100x) by packing more bits into a frequency-domain encoding scheme. This would likely require us to do some statistical analysis on the types of compression artifacts that YouTube introduces, in order to find a particularly robust set of basis images.
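For the error-correction half, here's a minimal sketch of what that could look like with the third-party reedsolo package (pip install reedsolo; recent versions return a tuple from decode()):

    from reedsolo import RSCodec

    rsc = RSCodec(32)              # 32 parity bytes per chunk -> corrects up to 16 byte errors
    payload = bytes(range(200))
    protected = rsc.encode(payload)

    # Flip a few whole bytes, the way an over-eager compressor might flip blocks.
    noisy = bytearray(protected)
    for i in (3, 57, 118):
        noisy[i] ^= 0xFF

    recovered, _, _ = rsc.decode(noisy)
    assert bytes(recovered) == payload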
> that we can take the squares down to 10x10, maybe even 8x8 or 5x5
16x16, 8x8, or 4x4 would be the way to go. You'd want each RGB block to map to a single H.264 macroblock.
Using sizes that aren't powers of two means individual blocks don't line up with macroblocks. Having a single macroblock represent 1, 4, or 16 RGB pixels would be ideal.
In fact, I bet modifying the original code to use a scaling factor of 16 instead of 20 would produce some significant improvements.
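Quick back-of-envelope, assuming the frame stays 1280x720: a scale of 16 not only aligns with macroblocks, it also packs more blocks into each frame.

    width, height, bits_per_block = 1280, 720, 3

    for scale in (20, 16):
        blocks = (width // scale) * (height // scale)        # data "pixels" per frame
        print(f"scale {scale:2d}: {blocks} blocks, "
              f"{blocks * bits_per_block // 8} bytes/frame, "
              f"macroblock-aligned: {scale % 16 == 0}")

    # scale 20: 2304 blocks, 864 bytes/frame, macroblock-aligned: False
    # scale 16: 3600 blocks, 1350 bytes/frame, macroblock-aligned: True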
There's also the chroma subsampling issue. With the standard 4:2:0 ratios, you'll get half the resolution for the two chroma channels, and if I'm not mistaken, they are more aggressively quantized.
It would be better to use YUV/YCbCr directly instead of RGB.
I'm not sure if your examples stick to 0 or 255 RGB. If they do, you might get a win by using HSL to pick your colors. If you change the lightness dramatically every frame, maybe colors won't bleed across frames. Then perhaps you can encode 2+ bits in hue and another 2+ in saturation, plus a minor win from 1+ bit of brightness (i.e. the first frame can be 0 or 25%, the next frame 75% or 100%). I'm not too familiar with encodings, though, or how much this would interfere with the other transforms.
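A toy sketch of that bit-to-colour mapping with the standard library's colorsys (the levels here are arbitrary; whether they survive chroma subsampling is exactly the open question):

    import colorsys

    def bits_to_rgb(nibble: int, frame_index: int):
        """Map 4 data bits to a colour: 2 bits -> hue, 1 bit -> saturation,
        1 bit -> lightness, with the lightness band shifted on odd frames so
        consecutive frames differ strongly."""
        hue = ((nibble >> 2) & 0b11) / 4.0                    # 0.0, 0.25, 0.5, 0.75
        sat = 0.6 if (nibble >> 1) & 1 == 0 else 1.0
        light = 0.25 if nibble & 1 == 0 else 0.45
        if frame_index % 2:                                   # jump lightness every other frame
            light += 0.35
        r, g, b = colorsys.hls_to_rgb(hue, light, sat)        # note: HLS argument order
        return round(r * 255), round(g * 255), round(b * 255)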
YouTube's 1080p60 is already at a decimation ratio of about 200:1, and then you have to consider how efficient P and B frames are with motion/differences. If your data looks like noise, you're going to be completely screwed, since the P and B frames will absolutely destroy the quality.
There's a bunch of other things too, like YUV420p and the TV colour range of 16-235, so you only get about 7.8 bits per pixel.
If anything you would want to encode your data in some way that abuses the P and B frames, and the macro block size of 16x16.
Coding theory for the data output at your end is only one side of the coin; the VP9 codec's stupidly good compression is a completely different game to wrangle.
And I kinda doubt you'll get much better than your estimate of 1% from the original scheme.
Back in the day when file sharing was new, I won two rounds of beer from my friends at university. The first was after I tried what I dubbed hardcore backups: I tarred, gzipped, and pgp'd an archive, slapped an AVI header on it, renamed it britney_uncensored_sex_tape[XXX].avi or something similar, then shared it on WinMX, figuring that since hard drive space was free and teenage boys were teenage boys, at least some of those who downloaded it would leave it shared even if the file claimed to be corrupt.
It worked a charm.
Second round? A year later, when the archive was still available from umpteen hosts.
For all I know, it still languishes on who knows how many old hard drives...
No I'm pretty sure you're right on the timing. I was a teenager in the mid to late 90s and britney tapes were extremely common then, to the point that it was often used as a joke (much like in your story!)
WinMX was released 21 years ago, and Britney Spears definitely didn't break out until around 1999, around the same time as Napster. The difference between 1994 and 1999 is quite a bit in show biz/pop culture and an absolutely huge difference in public uptake of the internet.
That's right, 1999 was the release of her first album (Baby One More Time). I think the reason some feel like it must have been the mid-90s is that she went from that to the first major scandals (marriage, divorce and a shift in her musical style) as well as her first Greatest Hits album by 2004.
Her next album after that (along with the head shaving incident and being placed under the custody of her manager-dad) was 2007, so most people's memories of her as a "sexy teen idol" are likely from her 1999-2003 period, which in retrospect probably felt a lot longer, especially with the overlap of other young women pop stars in the same period (e.g. Christina Aguilera started in 1998).
For the record, neither I nor the person I replied to said 1994 (unless they edited their post). I was thinking 97 or 98, so was still off, but at this point in my life being off by a year or two feels pretty damn close ;-)
You devil! I'm pretty sure I remember running into a file that looked like that and a quick poke around showed it wasn't anything valid.
Funny how these things work, since I'm pretty sure I remember running into it around 2008 (I'm a few years younger).
I think I just deleted it, though, since I was suspicious of most strange files back then; I was the nerd who didn't have friends, so I used to troll forums for anything I could get my hands on.
Not sure how WinMX worked, but back in the old DC++ days, you not only searched for things but could navigate directory structures directly on users connected to the same hub. It wasn't uncommon for me to browse through some of the biggest/most interesting users and see what they were sharing.
By doing that, I'm sure I stumbled upon my fair share of sketchy stuff unintentionally, so it's not hard to imagine the same for others :)
Before broadband was widely available, TiVo used to purchase overnight paid programming slots across the US and broadcast modified PDF417 video streams that provided weekly program guide data for TiVo users. There's a sample of it on YouTube https://www.youtube.com/watch?v=VfUgT2YoPzI but they usually wrapped a 60-second commercial before and after the 28-minute broadcast of data. There was enough error correction in the data streams to allow proper processing even with less-than-perfect analog television reception.
A little less crazy and more straightforward (software on audio tape was super common after all): Radio stations and vinyl discs that transmitted programs to the microcomputers of the time (C64, TRS-80 etc.) have quite a long tradition. Some examples:
Not paid-for programming, but essentially the same tech: VHS games used to encode data in exotic ways so that the tape was both viewable on a regular TV with a regular VHS player and also carried some kind of playable content.
Not quite paid programming, but Scientific Atlanta had a Broadcast File System that would send data to set top boxes over coax QAM channels used for digital TV. It would loop through all the content on the "carousel" repeatedly so all the boxes connected to that head end would eventually see the updates.
If I were to gamble, I'd say analog TV can store more data. Compression algorithms usually work at something like a 200:1 ratio and are extremely destructive: raw 1080p60 in yuv420p is about 187 MB/s, while a decent equivalent video on YouTube is about 1 MB/s.
I remember this first being discussed on 4chan's /g/ board as a joke about whether they could abuse YouTube's unlimited upload size; it then escalated into the proof of concept shown in the repo :)
This is a tangent. I must have been maybe 15-16 at the time, so somewhere around 20 years ago: one of the first pieces of software I remember building was a POP3 server that served files, which you could download using an email client, where they would show up as attachments.
Incredibly bizarre idea. I'm not sure who I thought would benefit from this. I guess I got swept up in RFC1939 and needed to build... something.
At my first job (at the beginning of the millennium) there was a limit on the size of files you could download, somewhere around 5 MB. If you wanted something bigger, you had to ask the sysadmins to fetch it and wait... That was really annoying. So a colleague and I ended up writing a service that would download a file to local storage, chop it into multiple 5 MB attachments, and email them to the requestor.
After some time the per-file limit was removed, but a daily limit of 100 MB was added. The trick was that POP3 traffic wasn't counted, so we kept using our "service".
That sounds suspiciously similar to how I used to download large files on a shared 2GB/month data plan. My carrier didn't count incoming MMS messages towards the quota, and conveniently didn't re-encode images sent to their subscribers via their email-to-MMS gateway. So naturally, I'd SSH into my server, download what I wanted to download, and run the bash script I wrote, which split the downloaded file into MMS-sized chunks, and prepended a 1x1 PNG image to them, and then sent them sequentially through my carrier's gateway. This worked surprisingly well, and I had a script on my phone which would extract the original file from the sequence of "photos". It may still work, but I've since gotten a less restrictive data plan.
I couldn't download .exe files at some $CORPORATION. They had to be whitelisted or something, and the download just wouldn't work otherwise. But once you had the .exe you could run it just fine. You just had to ping some IT person to be able to retrieve your .exe.
Of course it was still possible to browse the internet and visualize arbitrary text, so splitting the .exe into base64-encoded chunks and uploading them on GitHub from another computer was working perfectly fine... I briefly argued against these measures, given how unlikely they are to prevent any kind of threat, but they're probably still in place.
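The workaround is only a few lines to script; a sketch with made-up helper names:

    import base64
    from pathlib import Path

    def binary_to_chunks(path: str, chunk_chars: int = 60_000) -> list[str]:
        """Turn an arbitrary binary into pasteable base64 text chunks."""
        b64 = base64.b64encode(Path(path).read_bytes()).decode()
        return [b64[i:i + chunk_chars] for i in range(0, len(b64), chunk_chars)]

    def chunks_to_binary(chunks: list[str], out_path: str) -> None:
        """Reassemble the chunks (in order) back into the original binary."""
        Path(out_path).write_bytes(base64.b64decode("".join(chunks)))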
Apparently e-mail is not very reliable for storing/keeping files. There have been cases where an old email with an attachment would not load correctly because the servers had simply erased the attachment.
This was a custom email server, though; there never were any actual emails, it just presented files as though they were emails so that a client would download them.
It actually caused some problems for email clients, as they usually assumed emails were small. I got a few of them to crash with 200 MB "attachments" (although this was in the early 00s, when 200 MB was bigger than it is today).
Since GP says it was a POP3 server, I suppose you would set up an email account in your client with its inbox server pointing to that POP3 server. When the client requests the content of the inbox, the server responds with a list of "emails" that are really just files with some email header slapped on; so your email client's inbox window essentially becomes a file browser.
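The interesting part is how little it takes to make a file look like mail. A sketch of the message-building half (the actual POP3 command loop with USER/STAT/LIST/RETR is left out):

    from email.message import EmailMessage
    from pathlib import Path

    def file_as_fake_email(path: str, n: int) -> bytes:
        """Wrap an arbitrary file in just enough headers that a POP3 client
        will happily display it as message #n with a single attachment."""
        msg = EmailMessage()
        msg["From"] = "files@localhost"
        msg["To"] = "you@localhost"
        msg["Subject"] = f"File {n}: {Path(path).name}"
        msg.set_content("This 'message' is really just a file served by the fake server.")
        msg.add_attachment(Path(path).read_bytes(),
                           maintype="application", subtype="octet-stream",
                           filename=Path(path).name)
        return msg.as_bytes()   # what the server would stream back for RETR n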
They also experimented with encoding videos and arbitrary files into different kinds of single (still) image formats, some of them able to be uploaded to the same 4chan thread itself, with instructions on how to decode/play it back. Examples:
I only looked at the example video, but is the concept just "big enough pixels"?
Would be neater (and much more efficient) to encode the data such that it's exactly untouched by the compression algorithm, e.g. by encoding the data in wavelets and possibly motion vectors that the algorithm is known to keep[1].
Of course that would also be a lot of work, and likely fall apart once the video is re-encoded.
[1] If that's what video encoding still does, I really have no idea, but you get the point.
Agree it would be cool to be "untouched" by the compression algorithm, but that's nearly impossible with YouTube. YouTube encodes down to several different versions of a video and on top of that, several different codecs to support different devices with different built-in video hardware decoders.
For example, when I upload a 4K vid and then watch the 4K stream on my Mac vs my PC, I get different video files solely based on the browser settings that can tell what OS I'm running.
Handling this compression protection for so many different codecs is likely not feasible.
Yes, but nothing is saying this has to work for every codec. Since you want to retrieve the files using a special client, you could pick the codec you like.
But (almost) nothing prevents YouTube from not serving that particular codec anymore. This still pretty much falls under the "re-encoding" case I mentioned which would make the whole thing brittle anyway.
How about a Fourier transform (or cosine, whichever works best), keeping the data as frequency-component coefficients? That's the rough idea behind digital watermarking. It survives image transforms quite well.
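Something like the classic DCT-coefficient watermarking trick; sketched with SciPy as a toy illustration of the idea (not tuned to survive YouTube's encoder):

    import numpy as np
    from scipy.fft import dctn, idctn

    def embed_bit(block: np.ndarray, bit: int, strength: float = 12.0) -> np.ndarray:
        """Force the sign of one mid-frequency DCT coefficient of an 8x8
        luma block to carry a single bit."""
        coeffs = dctn(block.astype(float), norm="ortho")
        coeffs[3, 4] = strength if bit else -strength
        return np.clip(idctn(coeffs, norm="ortho"), 0, 255).astype(np.uint8)

    def read_bit(block: np.ndarray) -> int:
        return int(dctn(block.astype(float), norm="ortho")[3, 4] > 0)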
Just as an aside, it's absolutely astounding how much hardware Google must throw at YouTube to achieve this for any video anybody in the world wants to upload. The processing power to re-encode to so many versions, then to store all of those versions, and then to make all of them accessible anywhere in the world at a moment's notice. It really is such an incredible waste for most YouTube content.
I didn't know that, and am glad that they do transcode on the fly when appropriate. Very impressive that it's seamless to the end user, I've never sat waiting for a YouTube clip to play even when it's definitely something from the 'back catalog' like a decade old video with a dozen views.
What if you had an ML model that produces a vector from a given image? You have a set of vectors that correspond to bytes -- for a simple example, 256 "anchor vectors", one for each possible byte value.
To compress an arbitrary sequence of bytes, for each byte you produce an image that your ML model would convert to the corresponding anchor vector, and add that image as a frame in a video. Once all the bytes have been converted to frames, you upload the video to YouTube.
To decompress the video you simply go frame by frame over the video and send it to your model. Your model produces a vector and you find which of your anchor vectors is the nearest match. Even though YouTube will have compressed the video in who knows what way, and even if YouTube's compression changes, the resultant images in the video should look similar, and if your anchors are well chosen and your model works well, you should be able to tell which anchor a given image is intended to correspond to.
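The decode side would just be a nearest-neighbour lookup; a sketch where embed_fn stands in for whatever image-to-vector model you pick:

    import numpy as np

    def nearest_anchor(embedding: np.ndarray, anchors: np.ndarray) -> int:
        """anchors: a (256, d) matrix, one reference vector per byte value.
        Return the byte whose anchor is most similar (cosine) to the embedding."""
        a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
        e = embedding / np.linalg.norm(embedding)
        return int(np.argmax(a @ e))

    def decode_video(frames, embed_fn, anchors) -> bytes:
        """One byte per frame; embed_fn is the (hypothetical) image->vector model."""
        return bytes(nearest_anchor(embed_fn(frame), anchors) for frame in frames)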
Why go that way? I'm no digital signal processing expert, but images (and series thereof, i.e. videos) are 2D signals. What we see is the spatial domain, and analyzing pixel by pixel is naive and won't get you very far.
What you need is to go to the frequency domain. From my own experiments back at university, the most significant image information lies in the lowest frequencies. Cutting off everything above the lowest 10% of frequencies leaves a very comprehensible image with only wavy artifacts around objects. So there's plenty of bandwidth to use even if you want to embed info in existing media.
And here you have the full bandwidth to use. Start in the frequency domain, decide on the lowest bandwidth you'll allow, and set the coefficients of the harmonic components. Convert to the spatial domain, upscale, and you've got your video to upload. This should leave the data encoded in a way that survives compression and resizing; you just need to allow some room for that.
You could slap error correction codes on top.
If you think about it, you should consider video as - say - copper wire or radio. We’ve come quite far transmitting over these media without ML.
We started with that approach, by assuming that the compression is wavelet based, and then purposefully generating wavelets that we know survive the compression process.
For the sake of this discussion, wavelets are pretty much exactly that: A bunch of frequencies where the "least important" (according to the algorithm) are cut out.
But that's pretty cool, seems like you've re-invented JPEG without knowing it, so your understanding is solid!
That's essentially a variant of "bigger pixels". Just like them, your algorithm cannot guarantee that an unknown codec will still make the whole thing perform adequately.
Even if you train your model to work best for all existing codecs (I assume that's the "ML" part of the ML model), the no free lunch theorem pretty much tells us that it can't always perform well for codecs it does not know about.
(And so does entropy. Reducing to absurd levels, if your codec results in only one pixel and the only color that pixel can have is blue, then you'll only be able to encode any information in the length of the video itself.)
It's not guaranteed to perform well with unknown or new codecs - true. But, the implicit assumption is that YouTube will use codecs that preserve what videos look like - not just random codecs. If that assumption holds then the image recognition model will keep working even with new codecs.
That's the thing though, "looks like" breaks down pretty quickly with things that aren't real images. It even breaks down pretty quickly with things that are real, but maybe not so common, images: https://www.youtube.com/watch?v=r6Rp-uo6HmI
So one question would be: Does your image generation approach preserve a higher information density than big enough pixels?
Why would you assume that the images in my algorithm aren't real images? For example, you could use 256 categories from imagenet as your keys. Image of a dog is 00000000, tree is 00000001, car 00000010, and so on.
I'm not assuming they are not real images, I'm questioning whether to get to any information density that even out-performs "big enough colorful pixels", you might get into territories where the lossy compression compresses away what you need to unambiguously discriminate between two symbols.
And to get to that level of density I do wonder what kind of detailed "real image" it would still be.
If your algorithm were for example literally the example you just noted, then the "big enough colorful pixels" from the example video already massively outperform your algorithm. Of course it won't be exactly that, but you have the assumption that the video compression algorithm somehow applies the same meaning of "looks like" in its preservation efforts that your machine learning algorithm does, down to levels where differences become so minute that they exceed what you can do with colorful pixels that are just big enough to pass the compression unscathed, maybe with some additional error correction (i.e. exactly what a very high density QR code would do).
YouTube lets you download your uploaded videos. I've never tested it, but supposedly it's the exact same file you uploaded.[a] It probably wouldn't work with this "tool" as it uses the video ID (so I assume it's downloading what clients see, not the source), but it's an idea for some other variation on this concept.
[a] That way, in the future, if there are any improvements to the transcode process that make smaller files (a different codec or whatever), they still have the HQ source.
They may retain the original files, but they don't give that back to you in the download screen. I just tested it by going to the Studio screen to download a video I uploaded as a ~50GB ProRes MOV file and getting back an ~84MB H264 MP4.
This brings up an interesting question: what is the upper-bound of hidden data density using video steganography? E.g. how much extra data can you add before noticeable degradation? It's interesting because it requires both a detailed understanding of video encoding and also understanding of human perception of video.
I'd expect you could store more data steganographically than the raw video data.
You can probably do things like add frames that can't be decoded and so are skipped by a decoder; that effectively allows arbitrary added hidden data. That's maybe cheating.
If you stipulate that you can't already have a copy of the unaltered file, and the data has to be extractable from a pixel copy of the rendered frames ... that becomes more interesting, I think.
YouTube doesn't give you the raw video back; it transcodes to its standard set of bitrates and resolutions.
You'll notice this if someone has just uploaded a video to Youtube and the only version available for playback is some 360p/480p version for a few hours until Youtube gets around to processing higher bitrates.
So whatever you're encoding has to survive that transcode process.
A pretty massive amount, I imagine. I attended a lecture on single-image steganography, and they were able to store almost 25% of the image's size while it was barely visible. Even 50% didn't look too bad.
Extend that to video files and the capacity would likely be pretty massive, although you'd have an interesting time with YouTube's compression algorithms.
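That ~25% figure is roughly what you get by claiming the two least-significant bits of every 8-bit sample; a minimal sketch of that kind of LSB embedding:

    import numpy as np

    def embed_lsb(cover: np.ndarray, payload: bytes, nbits: int = 2) -> np.ndarray:
        """Hide payload in the nbits least-significant bits of each 8-bit
        sample of the cover image (nbits=2 is the ~25%-capacity regime)."""
        bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
        bits = bits[: (len(bits) // nbits) * nbits].reshape(-1, nbits)
        symbols = (bits @ (1 << np.arange(nbits)[::-1])).astype(np.uint8)
        flat = cover.reshape(-1).copy()
        assert len(symbols) <= len(flat), "payload too large for this cover image"
        keep = np.uint8((0xFF >> nbits) << nbits)     # e.g. 0b11111100 for nbits=2
        flat[:len(symbols)] = (flat[:len(symbols)] & keep) | symbols
        return flat.reshape(cover.shape)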
Good luck preserving it through YouTube's video compression. It's super lossy with small details, in bad cases the quality can visibly degrade to a point it looks more like a corrupted low-res video file for a few seconds (saw that once in a Tetris Effect gameplay video).
I mentioned it in another comment, but while that does lower the bandwidth of a single frame, it's not actually an issue. There are several DRM techniques that can survive a crappy camera recording in a theater.
"Compression resistant watermark" turns up some good resources on it. QR codes are another good example of noise-tolerant data transmission (fun fact: having a logo in a QR code isn't part of the spec -- you're literally covering up part of the code, but the error correction can handle it).
The best way I can describe it is that humans can still read text in compressed videos. The worse the compression/noise the larger the text needs to be, but we can still read it.
> Is it really abuse if the videos are viewable / playable?
Presumably the ToS either already forbids covert channel encoding or soon will.
If creators start encoding their source material into their content, Google would probably be fine with that, because it gives them data but also gives them context for that data.
Edit: I meant like "director's commentary" and "notes about production" type stuff like you used to see added to DVDs back in the day. Not "using youtube as my personal file storage". Why is this such an unpopular opinion?
> If creators start encoding their source material into their files Google would probably be fine with that
Not true at all, lol. Google has a paid file storage solution. YouTube is for streaming video and that's the activity they expect on that platform. I couldn't imagine any service designed for one format would "probably be fine" with users encoding other files inside of that format.
I think the parent comment is limiting themselves to the embedding of metadata specific to the containing file. It would be like adding a single frame, but would potentially give useful information to Google. In those limited circumstances I think the parent is correct.
Early games and software would be delivered on audio cassettes that then had to be 'played' in order to load the software temporarily into the device, which could take minutes.
Yeah. Video, even old grainy VHS, had pretty high bandwidth. Even more so with S-VHS, which never became super popular though. (I'm actually wondering whether the 2GB figure was for S-VHS, not VHS. Didn't do the math and wouldn't be surprised either way, though.)
A normal VHS encodes about 100 million scan lines over 2 hours. 20 bytes per scan line sounds feasible, since there's somewhere around 200-300 'pixels' of luma available in each scan line.
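For anyone who wants to redo the arithmetic (assuming NTSC timing and the 20-byte estimate above):

    lines_per_second = 525 * 30          # NTSC: ~15,750 scan lines per second
    tape_seconds = 2 * 60 * 60           # a standard 2-hour tape
    bytes_per_line = 20                  # the estimate above

    total_lines = lines_per_second * tape_seconds
    print(f"{total_lines / 1e6:.0f} million scan lines, "
          f"{total_lines * bytes_per_line / 1e9:.1f} GB per tape")
    # -> 113 million scan lines, 2.3 GB per tape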
Thanks, that's a very reasonable back of the envelope calculation.
There are many fun details about VHS, its chroma resolution, and especially some weirdness around the PAL delay line, but they all don't really matter for this.
Wikipedia says there is about 3MHz of bandwidth, so ~200-300 "pixels" seems like a very good ballpark (just going by the fact that a normal PAL signal has about 6 MHz and is commonly digitized as 720x576, 3MHz about halves the horizontal resolution and taking some pixels off for various reasons makes sense).
My family had an Atari 400 with a tape drive. I remember buying a tape with a game. We also used it for the BASIC programming language and the Asteroids game using a cartridge.
Yep, I had the BASIC cartridge and used the tape drive almost exclusively for that. Coded up all sorts of little projects on that machine. I hated the membrane keyboard, but it worked!
The Alesis ADAT 8 track digital audio recorders used SVHS tapes as the medium - at the end of the day, it's just a spooled magnetic medium, not hugely different conceptually than a hard drive.
Yes! There were many such systems, LGR made a video for one of them, also showing the interface (as in: hardware and GUI) for the backup: https://youtu.be/TUS0Zv2APjU
I remember a similar solution that was marketed in a German mail order catalogue in late 1990s. It could have been Conrad, but I'm not 100% sure. I recall it being a USB peripheral, though. (Maybe I could find more about it in time...)
Back in the day, when protocols were more trusting, we would play games by storing data archives in other people's SMTP queues. Open the connection and send a message to yourself by bouncing it through a remote server, but wait to accept the returning email message until you wanted the data back. As long as you pulled it back in before it timed out on that queue and looped it back out to the remote SMTP queue, you could store several hundred MB (which was a lot of data at the time) in uuencoded chunks spread out across the NSFNet.
I watch these things and I begin to realize I'll never be as intelligent as someone like this. It's good to know that no matter how much you've grown, there is always a bigger fish.
I agree that there will always be smarter fish, but you can definitely be this smart; it just takes the proper motivation (or weird idea) to wiggle its way into your brain.
This reminds me of SnapchatFS[1], a side project I made about 8 years ago (see also HN thread[2] at that time).
From the README.md:
> Since Snapchat imposes few restrictions on what data can be uploaded (i.e., not just images), I've taken to using it as a system to send files to myself and others.
> Snapchat FS is the tool that allows this. It provides a simple command line interface for uploading arbitrary files into Snapchat, managing them, and downloading them to any other computer with access to this package.
How much data can you store if you embedded a picture-in-picture file over a 10 minute video? I could totally see content creators who do tutorials embedding project files in this way.
Would storing data as a 15 or 30 FPS QR code "video" be any more useful? At a minimum one would gain a configurable amount of error correction, and you could display it in the corner.
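If anyone wants to try it, the third-party qrcode package (pip install qrcode[pil]) makes the per-frame generation a few lines; error-correction level H tolerates roughly 30% damage per symbol:

    import qrcode

    def chunk_to_qr_frame(chunk: bytes, index: int) -> None:
        """Render one payload chunk as a QR code image, to be stitched into a video."""
        qr = qrcode.QRCode(error_correction=qrcode.constants.ERROR_CORRECT_H)
        qr.add_data(chunk)
        qr.make(fit=True)                       # pick the smallest symbol that fits
        qr.make_image().save(f"frame_{index:05d}.png")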
This is a classic case of overengineering a solution to a nonexistent problem.
On YouTube, the video and the description are also linked. They exist on the same page always.
And even if the concern this solution is covering is what if the video is somehow shared without the description, away from YouTube, then the video could just as easily contain the description or URL or QR code pointing to the file.
The description is not big enough to hold practically any data. You would need to link to it from there... at which point the two are no longer tied together. Links go down insanely often.
This works really well until it doesn't. I have seen so, so, so many videos whose descriptions link to content on sites that don't work anymore.
Wait so your expectation is that instead of Youtubers using URLs to link to websites, you would prefer and expect that they download and embed those websites into their videos for you? Like, a zip file of the whole site? For.... convenience?
Have you considered using archive.org or mirrors instead?
There are an infinite number of better solutions for this...
I wrote one of these as a POC when at AWS to store data sharded across all the free namespaces (think Lambda names), with pointers to the next chunk of data.
I like to think you could unify all of these into a FUSE filesystem and just mount your transparent multi-cloud remote FS as usual.
It's inefficient, but free! So you can have as much space as you want. And it's potentially brittle, but free! So you can replicate/stripe the data across as many providers as you want.
Yeah, you'd need to find some sort of auto-balancing to detect this kind of bitrot from over-aggressive engineering managers & their ilk and rebalance the data across other sources. I think the multiple-shuffle-shard approach has been done before, maybe we could steal some algo from a RAID driver, or DynamoDB.
Back in the day when @gmail was famous for its massive free email storage, people wrote scripts to chunk large files and store them as email attachments.
I used this as a backup target for the longest time. Simply split the backup file into 10 MB chunks and send as mails to a gmail account. Encrypted so no privacy problems. Rock solid for years.
And as it was just storing emails, it was even using Gmail for its intended purpose, so no ToS problems.
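That workflow is only a few lines of Python; a sketch assuming Gmail's published SMTP endpoint and an app password (the chunk size and naming are just illustrative):

    import math
    import smtplib
    from email.message import EmailMessage
    from pathlib import Path

    CHUNK = 10 * 1024 * 1024   # 10 MB per mail, comfortably under the 25 MB attachment cap

    def mail_backup(path: str, account: str, app_password: str) -> None:
        """Slice an (already encrypted) backup into attachments and mail them to yourself."""
        data = Path(path).read_bytes()
        total = math.ceil(len(data) / CHUNK)
        with smtplib.SMTP_SSL("smtp.gmail.com", 465) as smtp:
            smtp.login(account, app_password)
            for i in range(total):
                msg = EmailMessage()
                msg["From"] = msg["To"] = account
                msg["Subject"] = f"backup {Path(path).name} part {i + 1}/{total}"
                msg.add_attachment(data[i * CHUNK:(i + 1) * CHUNK],
                                   maintype="application", subtype="octet-stream",
                                   filename=f"{Path(path).name}.{i:04d}")
                smtp.send_message(msg)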
With AOL, in the early 90’s you didn’t even need to do that. You could just reformat and reuse the floppy disks they were always sending you for free storage.
Sorry, I edited the post concurrently with your comment - it now points to Base2048, the link I meant to post, which actually should work - rather than https://github.com/qntm/base65536 (which I think you're commenting on).
My friends and I had a joke called NSABox. It would send data around using words that would attract the attention of the NSA, and you could submit a FOIA request to recover the data. I always found it amusing.
For people who don't read Chinese: it encodes data into ~10 MB PNG blocks and then uploads them (together with a metadata/index file as an entry point) to various Chinese social media sites that don't re-compress your images. I know people have already used it to store* TBs upon TBs of data this way.
*Of course, it would be foolish to think your data is even remotely safe "storing" them this way. But it's a very good solution for sharing large files.
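The core trick is just treating a lossless image as a byte container; a sketch with Pillow (the width and the index-file convention are made up for illustration):

    import math
    import numpy as np
    from PIL import Image

    def bytes_to_png(data: bytes, out_path: str, width: int = 2048) -> int:
        """Pack raw bytes into a grayscale PNG (lossless), zero-padding the last row.
        Only works on hosts that serve the image back byte-for-byte, as noted above."""
        height = math.ceil(len(data) / width)
        buf = np.zeros(width * height, dtype=np.uint8)
        buf[:len(data)] = np.frombuffer(data, dtype=np.uint8)
        Image.fromarray(buf.reshape(height, width)).save(out_path)
        return len(data)   # record this in the index file so decoding can strip the padding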
Click them; it's really for things that fit into one or two URLs, like small text files. I've used it for config files that were getting formatted incorrectly over corporate email, which ate them as attachments.
I wonder if we could use this technique in places where the government censors sensitive uploads to streaming sites, like mainland China or North Korea (they do have streaming sites, right?).
Although for propaganda use, shortwave / sat TV is a much, much simpler way to distribute information to places like that, though I believe it's now hard for anyone there to get hold of a shortwave radio.
Reminds me of when I tried to Gmail myself a zip archive, and it was denied because of security reasons iirc. I then tried to base64 it, and it still didn't work, same with base32, until finally base16 did work.
At one point there was a piece of software called deezcloud which exploited Deezer's user uploaded MP3 storage, allowing it to be used as free CDN cloud storage for up to 400GB of files. I don't think it works anymore, and I'm not sure if it ever worked well (I never tried it).
I remember my friend did something like this on an old unix system.
Users were given quotas of 5 MB for their home directory. He discovered that filenames could be quite long, and the number of files was not limited by the quota, so he created a pseudo-filesystem using that knowledge, with a command-line tool for listing, storing, and retrieving files from it. This was in the early 90s.
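The same idea still fits in a few lines today; a sketch that keeps content in the names of zero-byte files (chunk size chosen to stay under typical 255-byte filename limits):

    import base64
    import os
    from pathlib import Path

    CHARS_PER_NAME = 200

    def store_in_filenames(data: bytes, directory: str) -> None:
        """Keep a file's content entirely in the *names* of empty files,
        so a size-based quota sees nothing."""
        encoded = base64.b32encode(data).decode()     # base32 is filename-safe
        Path(directory).mkdir(exist_ok=True)
        for i in range(0, len(encoded), CHARS_PER_NAME):
            Path(directory, f"{i // CHARS_PER_NAME:06d}_{encoded[i:i + CHARS_PER_NAME]}").touch()

    def load_from_filenames(directory: str) -> bytes:
        names = sorted(os.listdir(directory))
        return base64.b32decode("".join(name.split("_", 1)[1] for name in names))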
Years ago when Amazon had unlimited photo storage, you could "hide" gigabytes of data behind a 1px GIF (literally concatenated together) so that it wouldn't count against your quota.
They still do if you pay for Prime. I was surprised to see that even RAW files (which are uncompressed and quite large) were uploaded and stored with no issues. Not the same as "hiding" data but might still be possible.
In the interest of technical correctness, RAW files are frequently compressed and even lossily compressed. For example, Sony's RAW compression was only lossy until very recent cameras.
Given that there are options for uncompressed, lossy compressed, and lossless compressed, I'd say RAW refers to the stage of the data processing pipeline at which capture happens, and it doesn't imply anything about the type of compression.
What is relevant is that the formats vary widely between manufacturers, camera lines and individual cameras, so unlike JPEG, it's really hard to create a storage service that compresses RAW files further after uploading in a meaningful way. So anything they do needs to losslessly compress the file.
Interesting, so are you saying that the RAW signal coming from the hardware is already often compressed even before hitting the main software compression?
Oh, no. What I'm saying is that cameras often take the raw signal from the hardware, but then the camera software frequently compresses that signal before writing it to a raw file (.cr2, .arw, .dng, whatever). This compression can be lossy or lossless. It's important not to confuse the raw signal with the RAW file (an actual format, often specific to the camera manufacturer). Just by saying RAW file, assuming it's lossless or uncompressed is false. So it should be specified - uncompressed RAW (lossless almost by definition), lossy compressed, lossless compressed.
This is great. I did something very similar with a laser printer and a scanner many years ago. I wrote a script that generated pages of colored blocks and spent some time figuring out how much redundancy I needed on each page to account for the scanner's resolution. I think I saw something similar here or on github a few years ago.
It's just recordings of myself when I'm doing deep work. I use OBS to stream my computer screen and a video recording of myself (mostly me muttering to myself).
It helps me avoid getting distracted (I feel like I'm being watched lol) and it's also interesting to check back if I want to see what I was working on 3 months ago.
Are you screensharing while recording? What tooling do you use to do this if so?
Also, any potential issues with Google having access to proprietary code? I know the chance of any human at Google interpreting your videos is near-zero but still
Compression will limit the bandwidth of a given frame but you can work around it.
Some forms of DRM are already essentially this: compression-resistant -- and even crappy-camera-recording-from-a-theater-resistant -- watermarks that are essentially steganography (you can't visually tell they're there).
EDIT: "compression resistant watermark" is a good search phrase if anyone is curious
Unless you tuned the NN on the files you get back from YouTube, so that it learns to encode the data in a way that is always recoverable despite the artifacts.
Others in these comments have also suggested steganography in both the video and audio streams. The problem with that is that when you retrieve a video from YouTube, you never get the original version back. You only get a lossy re-encoded version, and the very definition of lossy encoding is to toss out details that humans can't (or wouldn't easily) perceive, including ultra-sonic audio.
That is what redundancy and error correcting codes are for. It will reduce your data density, but I am sure you can find parameters that preserve the data.
Isn't the point here that the sub-pixels being produced are so large that it would take a tremendous amount of artifacts to reduce them to an unreadable state?
In other words: if YT's compression were affecting it so badly that the data could no longer be read back, wouldn't that compression scheme render normal video-watching impossible?
We can see that data is encoded as "pixels" that are quite large, being made up of many actual pixels in the video file. I see quite bad compression artifacts, yet I can clearly make out the pixels that would need to be clear to read the data. It looks like the video was uploaded at 720p (1280x720), but the data is encoded as a 64x36 "pixel" image of 8 distinct colors. So lots of room for lossy compression before it's unreadable.
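Which also makes the decode side simple: average each 20x20 block, threshold each channel, and repack the bits (a sketch mirroring the scheme described in the top comment):

    import numpy as np

    def decode_frame(frame: np.ndarray, width=64, height=36, scale=20) -> bytes:
        """Average each scale x scale block, threshold each channel at mid-grey,
        and repack the 3 recovered bits per block into bytes."""
        blocks = frame[:height * scale, :width * scale].reshape(
            height, scale, width, scale, 3).mean(axis=(1, 3))
        bits = (blocks > 127).astype(np.uint8).reshape(-1)
        return np.packbits(bits).tobytes()     # 864 bytes per 1280x720 frame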
The code doesn't look too big (a single file), but it requires a paid symbolic language (Mathematica) to use. Could anyone with better Mathematica knowledge explain whether it can be ported to another symbolic language (Sage, Maxima) or a non-symbolic one (R, Julia, Python)?
Yep! I'm the creator of YouTubeDrive, and there's absolutely nothing in the code that depends on the symbolic manipulation capabilities of Wolfram Mathematica -- you could easily port it to Python, C++, whatever. However, there are two non-technical reasons YouTubeDrive is written in Mathematica:
(1) I was a freshman in college at the time, and Mathematica is one of the first languages I learned. (My physics classes allowed us to use Mathematica to spare us from doing integrals by hand.)
(2) I intentionally chose a language that's a bit obtuse to use. I was afraid that I might attract unwanted attention from Google if YouTubeDrive were too easy for anybody to download and run.
I remember seeing, years ago, a Python library called BitGlitter which did the same thing. It would convert any file to an image or video, which you could then upload yourself. https://pypi.org/project/BitGlitter/
Probably not many. The advantage of plain old-fashioned radio is that the station doesn't keep track of the receivers. Whoever watches a YouTube numbers station is tracked six ways to Sunday.
Here I am trying my best to get my favorite videos OFF YouTube given that they could disappear at any second because of an account block, or just "reasons", and this link suggesting storing stuff with YouTube? By god, why? Sure, it's free, practically "limitless" slow file storage, but what a bad idea nonetheless....
Back in the 90's I considered storing my backups as encrypted, steganographically hidden, or binary Usenet postings, as a kind of decentralized backup -- postings which would stick around long enough for the next weekly backup. (Usenet providers had at least a couple of weeks of retention time back then.)
This gave me a flashback to VBS on the Amiga… the Video Backup System: record composite video to a VCR, and a simple op-amp circuit would decode black-and-white blobs of video pixels. It could back up floppies at reading speed. Was really impressive until, well, VHS… ;)
Just did a Google search and saw it has evolved over the years; I only used the 1.0 implementation back in the day. For those on another nostalgic trip: http://hugolyppens.com/VBS.html
I wonder if something similar could be useful for transmitting data optically, like an animated QR code. Maybe a good way to transmit data over an air gap for the paranoid?
The popularity of such projects is the reason more and more constraints get imposed on systems that are somewhat open (at least open to use). Maybe instead of figuring out how to abuse an easy-to-use system, people should figure out how to abuse hard-to-use systems, e.g. by creating open protocols for closed systems. That would be an actual achievement.
Yes we have all done or used something similar when we were younger, but really, should this be on the front page of HN?
This is abuse of a popular service and if it becomes popular it will only make YouTube worse and YouTube is getting worse without any additional help.
This one's not the best, but it works. I would recommend zipping everything and then using that as a single file. (The file size limit is ~2GB, FYI.)
https://github.com/Quadmium/PEncode
I think my favorite part of this is that the example video linked to this has ads on it. It's a backup system that pays you. Well, until someone at Youtube sees it and decides to delete your whole account.
This reminds me of Blame!, where humans are living like rats in the belly of the machine. Lol, also reminds me of the GeoCities days, when we created 50 accounts to upload Dragon Ball Z videos.
I love that this is like tape in that it's a sequential access medium. It's storing a tape-like data stream in a digital version of what used to be tape itself (VHS).
I believe YouTube supports random access, or otherwise you wouldn’t be able to jump around in a video. Youtube-dl also supports resuming downloads in the middle, I believe.
Each frame gets the same amount of the file, about a kilobyte. So each frame is basically a sector. You need to read in a few extra frames to undo the compression, but otherwise it's just like a normal filesystem. And reading in a batch of sectors at once is normal for real drives too.
Even if you did need the frames to be self-describing, you could just toss a counter/offset in the top left corner for less than 1% overhead.
I like this. The last wave of Twitter users into the fediverse caused my AWS bill to go up 10 USD a month. Might have to start storing media files on youtube instead ;)
Reminds me of the other post that used Facebook Messenger as a transport layer to get free internet in places where internet access is free if you use Facebook apps.
Very cool. I wonder how difficult it would be to present a real, watchable video to the viewer -- albeit low quality -- while embedding the file steganographically. I think a risk of this tech is that if it takes off, YT might easily adjust its algorithms to remove unwatchable videos. Perhaps leaving a watchable video could grant it more persistence than an obvious data stream.
Sure, but the more structure your video has to have, the harder it becomes to hide information steganographically within it. Your information density will become very low, I think.
I also “invented this idea” from scratch in a series that exists solely in my mind where I abuse a variety of free services for unintended purposes.
I could seemingly never explain the concept to other developers in a meaningful way, nor did I ever care enough to code these out myself.
Anyway my quick summary in this is just think of a dialup modem. You connect to a phone line and you get like a 56k connection. That sucks today, sure, but actually it’s kind of mind blowing for how data transfer speeds worked at the time.
You know how else you can send data via a phone line without a modem? Just literally call someone and speak the data over the phone. You could even speak in binary or base64 to transfer data. It’s slow, but it still “works,” assuming the receiving party can accurately record the information and hear you.
That seems to be what this main topic is. Using a fast medium (video player) to slowly send data over the connection, like physically speaking the contents of other data. But there could be some problems with this approach.
Mainly, YouTube will always recompress your video. For this method, that means your colors or other literal video data could be off. This limits the range of values you can use in an already limited “speaking” medium.
If this weren't the case, we could just use it like a modem connection: literally send the data and pretend it's a video. However, where I left off on this idea, we appear to be hard-blocked by that YouTube compression.
We can write data to whatever we want and label it as any other file type. (As a side note, videos are also containers, like zip files, that could be abused to just hold other files.)
But YouTube is an unknown wildcard that changes our compression and thus our data which seems to invalidate all of this.
If we somehow convert an exe to an avi, the YouTube compression seems to just hard-block this from working like we want. If we didn't have that barrier, I think we could otherwise just use essentially corrupted videos as carriers for other file types, provided we could download the raw file directly.
(steganography is a potential work around I haven’t explored yet)
Without these, we’re left to just speak the data over a phone which compresses our voice quality and in theory could make some sounds hard to tell apart. This leaves us in the battle of what language is best to speak to avoid compression limiting our communication. Is English best? Or is Japanese? What about German? Which language is least likely to cause confusion when speaking but also is fast and expressive?
This translates into what’s the best compression method for text or otherwise pixels in a video where data doesn’t get lost due to compression? Is literal English characters best? What about base64? Or binary? What if we zip it first and then base64? What if we convert binary code into hex colors? Does that use less frames in a video? Will the video be able to clearly save all the hex values after YouTube compression?
This works on the same principle as the video backup system (VBS) which we used in the 1980's and the early 1990's on our Commodore Amigas: if I remember correctly, one three hour PAL/SECAM VHS tape had a capacity of 130 MB. The entire hardware fit into a DB 25 parallel port connector and was easily made by oneself with a soldering iron and a few cheap parts.
SGI IRIX also had something conceptually similar to this "YouTubeDrive" called HFS, the hierarchical filesystem, whose storage was backed by tape rather than disk, but to the OS it was just a regular filesystem like any other: applications like ls(1), cp(1), rm(1) or any other saw no difference, but the latency was high of course.
That's how digital audio was originally recorded to tape back in the 1970s and 80s: encode the data into a broadcast video signal and record it using a VCR.
In the age of $5000 10 MB hard drives, this was the only sensible way to work with the 600+ MB of data needed to master a compact disc.
That's also where the ubiquitous 44.1 kHz sample rate comes from. It was the fastest data rate that could be reliably encoded into both NTSC and PAL broadcast signals. (For NTSC: 3 samples per scan line, 245 usable scan lines per field, 60 fields per second = 44,100 samples per second.)
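The PAL side of that arithmetic lands on the same number (3 samples per line, 294 usable lines per field, 50 fields per second), which is part of why 44.1 kHz stuck:

    # samples/line * usable lines/field * fields/second
    ntsc = 3 * 245 * 60
    pal  = 3 * 294 * 50
    assert ntsc == pal == 44_100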
130 MB for the whole tape is not a lot. It roughly matches floppy-disk throughput, which is probably not a coincidence. However, basic soldering implies that the rest of the system acts like a big software-defined DAC/ADC.
Dedicated controllers were absolutely out of the question because nobody could afford them, which is why Amigas were so popular: a fully multitasking, multimedia computer for 450 DM. That's 225 EUR! Somebody that cost-sensitive won't even consider a dedicated controller; back then it wasn't like it is today.
This was at a time when 3.5" floppy disks were expensive (and hard to come by), and hard drives were between 40 - 60 MB, so 130 MB was quite practical. The floppy drive in the Amiga read and wrote at 11 KB / s.
And yes, this was a DAC and an ADC in software, with added Reed-Solomon error correction encoding and CRC32. The goal was to be economical. The end price was everything; it had to be as cheap as possible.
Not immediately obvious from the README, but does this rely on YT always saving and providing a download of the original, unaltered video file? If not, then it must be saving the data in a manner that is retrievable even after compression and re-encoding, which is very interesting.