Hey everybody! I'm David, the creator of YouTubeDrive, and I never expected to see this old project pop up on HN. YouTubeDrive was created when I was a freshman in college with questionable programming abilities, absolutely no knowledge of coding theory, and way too much free time.
The encoding scheme that YouTubeDrive uses is brain-dead simple: pack three bits into each pixel of a sequence of 64x36 images (I only use RGB values 0 and 255, nothing in between), and then blow up these images by a factor of 20 to make a 1280x720 video. These 20x20 colored squares are big enough to reliably survive YouTube's compression algorithm (or at least they were in 2016 -- the algorithms have probably changed since). You really do need something around that size, because I discovered that YouTube's video compression would sometimes flip the average color of a 10x10 square from 0 to 255, or vice versa.
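For anyone who'd rather see it as code: a rough NumPy sketch of that encoding step (not the original Mathematica, just an illustration of the scheme described above).

    import numpy as np

    def encode_frame(data: bytes, width=64, height=36, scale=20):
        """Pack 3 bits per pixel (one bit per RGB channel, each 0 or 255) into a
        64x36 image, then blow each pixel up to a scale x scale square -> 1280x720."""
        capacity = width * height * 3                        # 6912 bits = 864 bytes per frame
        bits = np.unpackbits(np.frombuffer(data, dtype=np.uint8))[:capacity]
        bits = np.pad(bits, (0, capacity - len(bits)))       # one frame holds at most 864 bytes
        small = (bits.reshape(height, width, 3) * 255).astype(np.uint8)
        # Each data "pixel" becomes a solid 20x20 block of the 1280x720 frame.
        return np.kron(small, np.ones((scale, scale, 1), dtype=np.uint8)).astype(np.uint8)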
Looking back now as a grad student, I realize that there are much cleverer approaches to this problem: a better encoding scheme (discrete Fourier/cosine/wavelet transforms) would let me pack bits in the frequency domain instead of the spatial domain, reducing the probability of bit-flip errors, and a good error-correcting code (Hamming, Reed-Solomon, etc.) would let me tolerate a few bit-flips here and there. In classic academic fashion, I'll leave it as an exercise to the reader to implement these extensions :)
One more thing: the choice of Wolfram Mathematica as an implementation language was a deliberate decision on my part. Not for any technical reason -- YouTubeDrive doesn't use any of Mathematica's symbolic math capabilities -- but because I didn't want YouTubeDrive to be too easy for anybody on the internet to download and use, lest I attract unwanted attention from Google. In the eyes of my paranoid freshman self, the fact that YouTubeDrive is somewhat obtuse to install was a feature, not a bug.
So, feel free to have a look and have a laugh, but don't try to use YouTubeDrive for any serious purpose! This encoding scheme is so horrendously inefficient (on the order of 99% overhead) that the effective bandwidth to and from YouTube is something like one megabyte per minute.
As far back as the late 1970s a surprisingly similar scheme was used to record digital audio to analog video tape. It mostly looks like kind of stripey static, but there was a clear correlation between what happened musically and what happened visually, so in college (late 1980s) one of my friends came into one of these and we'd keep it on the TV while listening to whole albums. We had a simultaneous epiphany about the encoding scheme during a Jethro Tull flute solo, when the static suddenly became just a few large squares.
I'd estimate that there's an easy order-of-magnitude improvement (~10x) just from implementing a simple error-correction mechanism -- a Reed-Solomon code ought to be good enough that we can take the squares down to 10x10, maybe even 8x8 or 5x5. Then, if we really work at it, we might be able to find another order-of-magnitude win (~100x) by packing more bits into a frequency-domain encoding scheme. This would likely require us to do some statistical analysis on the types of compression artifacts that YouTube introduces, in order to find a particularly robust set of basis images.
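For the error-correction half, here's a minimal sketch of what that could look like with the third-party reedsolo package (pip install reedsolo; recent versions return a tuple from decode()):

    from reedsolo import RSCodec

    rsc = RSCodec(32)              # 32 parity bytes per chunk -> corrects up to 16 byte errors
    payload = bytes(range(200))
    protected = rsc.encode(payload)

    # Flip a few whole bytes, the way an over-eager compressor might flip blocks.
    noisy = bytearray(protected)
    for i in (3, 57, 118):
        noisy[i] ^= 0xFF

    recovered, _, _ = rsc.decode(noisy)
    assert bytes(recovered) == payload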
> that we can take the squares down to 10x10, maybe even 8x8 or 5x5
16x16, 8x8, or 4x4 would be the way to go. You'd want each RGB block to map to a single H.264 macroblock.
Using sizes that aren't powers of two means individual blocks don't line up with macroblocks. Having a single macroblock represent 1, 4, or 16 RGB pixels would be ideal.
In fact, I bet modifying the original code to use a scaling factor of 16 instead of 20 would produce some significant improvements.
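Quick back-of-envelope, assuming the frame stays 1280x720: a scale of 16 not only aligns with macroblocks, it also packs more blocks into each frame.

    width, height, bits_per_block = 1280, 720, 3

    for scale in (20, 16):
        blocks = (width // scale) * (height // scale)        # data "pixels" per frame
        print(f"scale {scale:2d}: {blocks} blocks, "
              f"{blocks * bits_per_block // 8} bytes/frame, "
              f"macroblock-aligned: {scale % 16 == 0}")

    # scale 20: 2304 blocks, 864 bytes/frame, macroblock-aligned: False
    # scale 16: 3600 blocks, 1350 bytes/frame, macroblock-aligned: True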
There's also the chroma subsampling issue. With the standard 4:2:0 ratios, you'll get half the resolution for the two chroma channels, and if I'm not mistaken, they are more aggressively quantized.
It would be better to use YUV/YCbCr directly instead of RGB.
I'm not sure if your examples stick to 0 or 255 RGB. If they do, you might get a win by using HSL to pick your colors. If you change the lightness dramatically every frame, maybe colors won't bleed across frames. Then perhaps you can encode 2+ bits in hue and another 2+ in saturation, plus a minor win from 1+ bit of brightness (i.e. the first frame can be 0 or 25%, the next frame 75% or 100%). I'm not too familiar with encodings, though, or how much this would interfere with the other transforms.
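A toy sketch of that bit-to-colour mapping with the standard library's colorsys (the levels here are arbitrary; whether they survive chroma subsampling is exactly the open question):

    import colorsys

    def bits_to_rgb(nibble: int, frame_index: int):
        """Map 4 data bits to a colour: 2 bits -> hue, 1 bit -> saturation,
        1 bit -> lightness, with the lightness band shifted on odd frames so
        consecutive frames differ strongly."""
        hue = ((nibble >> 2) & 0b11) / 4.0                    # 0.0, 0.25, 0.5, 0.75
        sat = 0.6 if (nibble >> 1) & 1 == 0 else 1.0
        light = 0.25 if nibble & 1 == 0 else 0.45
        if frame_index % 2:                                   # jump lightness every other frame
            light += 0.35
        r, g, b = colorsys.hls_to_rgb(hue, light, sat)        # note: HLS argument order
        return round(r * 255), round(g * 255), round(b * 255)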
YouTube's 1080p60 is already at a decimation ratio of about 200:1, and then you have to consider how efficient P and B frames are with motion/differences. If your data looks like noise, you're going to be completely screwed, since the P and B frames will absolutely destroy the quality.
There's a bunch of other things too, like YUV420p and the TV colour range of 16-235, so you only get about 7.8 bits per pixel.
If anything you would want to encode your data in some way that abuses the P and B frames, and the macro block size of 16x16.
Coding theory for the data output at your end is only one side of the coin; the VP9 codec's stupidly good compression is a completely different game to wrangle.
And I kinda doubt you'll get much better than your estimate of 1% from the original scheme.
Back in the day when file sharing was new, I won two rounds of beer from my friends at university. The first was after I tried what I dubbed hardcore backups: I tarred, gzipped, and pgp'd an archive, slapped an AVI header on it, renamed it britney_uncensored_sex_tape[XXX].avi or something similar, then shared it on WinMX, figuring that since hard drive space was free and teenage boys were teenage boys, at least some of those who downloaded it would leave it shared even if the file claimed to be corrupt.
It worked a charm.
Second round? A year later, when the archive was still available from umpteen hosts.
For all I know, it still languishes on who knows how many old hard drives...
No I'm pretty sure you're right on the timing. I was a teenager in the mid to late 90s and britney tapes were extremely common then, to the point that it was often used as a joke (much like in your story!)
WinMX was released 21 years ago, and Britney Spears definitely didn't break out until around 1999, around the same time as Napster. The difference between 1994 and 1999 is quite a bit in show biz/pop culture and an absolutely huge difference in public uptake of the internet.
That's right, 1999 was the release of her first album (Baby One More Time). I think the reason some feel like it must have been the mid-90s is that she went from that to the first major scandals (marriage, divorce and a shift in her musical style) as well as her first Greatest Hits album by 2004.
Her next album after that (along with the head shaving incident and being placed under the custody of her manager-dad) was 2007, so most people's memories of her as a "sexy teen idol" are likely from her 1999-2003 period, which in retrospect probably felt a lot longer, especially with the overlap of other young women pop stars in the same period (e.g. Christina Aguilera started in 1998).
For the record, neither I nor the person I replied to said 1994 (unless they edited their post). I was thinking 97 or 98, so was still off, but at this point in my life being off by a year or two feels pretty damn close ;-)
You devil! I'm pretty sure I remember running into a file that looked like that and a quick poke around showed it wasn't anything valid.
Funny how these things work, since I'm pretty sure I remember running into it around 2008 (I'm a few years younger).
I think I just deleted it, though, since I was suspicious of most strange files back then; I was the nerd who didn't have friends, so I used to troll forums for anything I could get my hands on.
Not sure how WinMX worked, but back in the old DC++ days, you not only searched for things but could navigate directory structures directly on users connected to the same hub. It wasn't uncommon for me to browse through some of the biggest/most interesting users and see what they were sharing.
By doing that, I'm sure I stumbled upon my fair share of sketchy stuff unintentionally, so it's not hard to imagine the same for others :)
Before broadband was widely available, TiVo used to purchase overnight paid programming slots across the US and broadcast modified PDF417 video streams that provided weekly program guide data for TiVo users. There's a sample of it on YouTube https://www.youtube.com/watch?v=VfUgT2YoPzI but they usually wrapped a 60-second commercial before and after the 28-minute broadcast of data. There was enough error correction in the data streams to allow proper processing even with less-than-perfect analog television reception.
A little less crazy and more straightforward (software on audio tape was super common after all): Radio stations and vinyl discs that transmitted programs to the microcomputers of the time (C64, TRS-80 etc.) have quite a long tradition. Some examples:
Not paid-for programming, but essentially the same tech: VHS games used to encode data in exotic ways so that the tape was both viewable on a regular TV with a regular VHS player and also carried some kind of playable content.
Not quite paid programming, but Scientific Atlanta had a Broadcast File System that would send data to set top boxes over coax QAM channels used for digital TV. It would loop through all the content on the "carousel" repeatedly so all the boxes connected to that head end would eventually see the updates.
If I were to gamble, I'd say analog TV can store more data. Compression algorithms usually work at something like a 200:1 ratio and are extremely destructive: raw 1080p60 in yuv420p is about 187 MB/s, while a decent equivalent video on YouTube is about 1 MB/s.
I remember this first being discussed on 4chan's /g/ board as a joke about whether they could abuse YouTube's unlimited upload size; it then escalated into the proof of concept shown in the repo :)
This is a tangent. I must have been maybe 15-16 at the time, so somewhere around 20 years ago: one of the first pieces of software I remember building was a POP3 server that served files, which you could download using an email client, where they would show up as attachments.
Incredibly bizarre idea. I'm not sure who I thought would benefit from this. I guess I got swept up in RFC1939 and needed to build... something.
At my first job (at the beginning of the millennium) there was a limit on the size of files you could download, somewhere around 5 MB. If you wanted something bigger, you had to ask the sysadmins to fetch it and wait... That was really annoying. So a colleague and I ended up writing a service that would download a file to local storage, chop it into multiple 5 MB attachments, and email them to the requestor.
After some time the per-file limit was removed, but a daily limit of 100 MB was added. The trick was that POP3 traffic wasn't counted, so we kept using our "service".
That sounds suspiciously similar to how I used to download large files on a shared 2GB/month data plan. My carrier didn't count incoming MMS messages towards the quota, and conveniently didn't re-encode images sent to their subscribers via their email-to-MMS gateway. So naturally, I'd SSH into my server, download what I wanted to download, and run the bash script I wrote, which split the downloaded file into MMS-sized chunks, and prepended a 1x1 PNG image to them, and then sent them sequentially through my carrier's gateway. This worked surprisingly well, and I had a script on my phone which would extract the original file from the sequence of "photos". It may still work, but I've since gotten a less restrictive data plan.
I couldn't download .exe files at some $CORPORATION. They had to be whitelisted or something, and the download just wouldn't work otherwise. But once you had the .exe you could run it just fine. You just had to ping some IT person to be able to retrieve your .exe.
Of course it was still possible to browse the internet and visualize arbitrary text, so splitting the .exe into base64-encoded chunks and uploading them on GitHub from another computer was working perfectly fine... I briefly argued against these measures, given how unlikely they are to prevent any kind of threat, but they're probably still in place.
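The workaround is only a few lines to script; a sketch with made-up helper names:

    import base64
    from pathlib import Path

    def binary_to_chunks(path: str, chunk_chars: int = 60_000) -> list[str]:
        """Turn an arbitrary binary into pasteable base64 text chunks."""
        b64 = base64.b64encode(Path(path).read_bytes()).decode()
        return [b64[i:i + chunk_chars] for i in range(0, len(b64), chunk_chars)]

    def chunks_to_binary(chunks: list[str], out_path: str) -> None:
        """Reassemble the chunks (in order) back into the original binary."""
        Path(out_path).write_bytes(base64.b64decode("".join(chunks)))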
Apparently e-mail is not very reliable for storing/keeping files. There have been cases where an old email with an attachment would not load correctly because the servers had simply erased the attachment.
This was a custom email server, though; there never were any actual emails, it just presented files as though they were emails so that a client would download them.
It actually caused some problems for email clients, as they usually assumed emails were small. I got a few of them to crash with 200 MB "attachments" (although this was in the early 00s, when 200 MB was bigger than it is today).
Since GP says it was a POP3 server, I suppose you would set up an email account in your client with its inbox server pointing to that POP3 server. When the client requests the content of the inbox, the server responds with a list of "emails" that are really just files with some email header slapped on; so your email client's inbox window essentially becomes a file browser.
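The interesting part is how little it takes to make a file look like mail. A sketch of the message-building half (the actual POP3 command loop with USER/STAT/LIST/RETR is left out):

    from email.message import EmailMessage
    from pathlib import Path

    def file_as_fake_email(path: str, n: int) -> bytes:
        """Wrap an arbitrary file in just enough headers that a POP3 client
        will happily display it as message #n with a single attachment."""
        msg = EmailMessage()
        msg["From"] = "files@localhost"
        msg["To"] = "you@localhost"
        msg["Subject"] = f"File {n}: {Path(path).name}"
        msg.set_content("This 'message' is really just a file served by the fake server.")
        msg.add_attachment(Path(path).read_bytes(),
                           maintype="application", subtype="octet-stream",
                           filename=Path(path).name)
        return msg.as_bytes()   # what the server would stream back for RETR n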
They also experimented with encoding videos and arbitrary files into different kinds of single (still) image formats, some of them able to be uploaded to the same 4chan thread itself, with instructions on how to decode/play it back. Examples:
I only looked at the example video, but is the concept just "big enough pixels"?
Would be neater (and much more efficient) to encode the data such that it's exactly untouched by the compression algorithm, e.g. by encoding the data in wavelets and possibly motion vectors that the algorithm is known to keep[1].
Of course that would also be a lot of work, and likely fall apart once the video is re-encoded.
[1] If that's what video encoding still does, I really have no idea, but you get the point.
Agree it would be cool to be "untouched" by the compression algorithm, but that's nearly impossible with YouTube. YouTube encodes down to several different versions of a video and on top of that, several different codecs to support different devices with different built-in video hardware decoders.
For example, when I upload a 4K vid and then watch the 4K stream on my Mac vs my PC, I get different video files solely based on the browser settings that can tell what OS I'm running.
Handling this compression protection for so many different codecs is likely not feasible.
Yes, but nothing is saying this has to work for every codec. Since you want to retrieve the files using a special client, you could pick the codec you like.
But (almost) nothing prevents YouTube from not serving that particular codec anymore. This still pretty much falls under the "re-encoding" case I mentioned which would make the whole thing brittle anyway.
How about a Fourier transform (or cosine, whichever works best), keeping the data as frequency-component coefficients? That's the rough idea behind digital watermarking. It survives image transforms quite well.
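Something like the classic DCT-coefficient watermarking trick; sketched with SciPy as a toy illustration of the idea (not tuned to survive YouTube's encoder):

    import numpy as np
    from scipy.fft import dctn, idctn

    def embed_bit(block: np.ndarray, bit: int, strength: float = 12.0) -> np.ndarray:
        """Force the sign of one mid-frequency DCT coefficient of an 8x8
        luma block to carry a single bit."""
        coeffs = dctn(block.astype(float), norm="ortho")
        coeffs[3, 4] = strength if bit else -strength
        return np.clip(idctn(coeffs, norm="ortho"), 0, 255).astype(np.uint8)

    def read_bit(block: np.ndarray) -> int:
        return int(dctn(block.astype(float), norm="ortho")[3, 4] > 0)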
Just as an aside, it's absolutely astounding how much hardware Google must throw at YouTube to achieve this for any video anybody in the world wants to upload. The processing power to re-encode to so many versions, then to store all of those versions, and then to make all of them accessible anywhere in the world at a moment's notice. It really is such an incredible waste for most YouTube content.
I didn't know that, and am glad that they do transcode on the fly when appropriate. Very impressive that it's seamless to the end user, I've never sat waiting for a YouTube clip to play even when it's definitely something from the 'back catalog' like a decade old video with a dozen views.
What if you had an ML model that produces a vector from a given image? You have a set of vectors that correspond to bytes -- for a simple example, 256 "anchor vectors", one for each possible byte value.
To compress an arbitrary sequence of bytes, for each byte you produce an image that your ML model would convert to the corresponding anchor vector, and add that image as a frame in a video. Once all the bytes have been converted to frames, you upload the video to YouTube.
To decompress the video you simply go frame by frame over the video and send it to your model. Your model produces a vector and you find which of your anchor vectors is the nearest match. Even though YouTube will have compressed the video in who knows what way, and even if YouTube's compression changes, the resultant images in the video should look similar, and if your anchors are well chosen and your model works well, you should be able to tell which anchor a given image is intended to correspond to.
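The decode side would just be a nearest-neighbour lookup; a sketch where embed_fn stands in for whatever image-to-vector model you pick:

    import numpy as np

    def nearest_anchor(embedding: np.ndarray, anchors: np.ndarray) -> int:
        """anchors: a (256, d) matrix, one reference vector per byte value.
        Return the byte whose anchor is most similar (cosine) to the embedding."""
        a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
        e = embedding / np.linalg.norm(embedding)
        return int(np.argmax(a @ e))

    def decode_video(frames, embed_fn, anchors) -> bytes:
        """One byte per frame; embed_fn is the (hypothetical) image->vector model."""
        return bytes(nearest_anchor(embed_fn(frame), anchors) for frame in frames)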
Why go that way? I'm no digital signal processing expert, but images (and series thereof, i.e. videos) are 2D signals. What we see is the spatial domain, and analyzing pixel by pixel is naive and won't get you very far.
What you need is to go to the frequency domain. From my own experiments back at university, the most significant image information lies in the lowest frequencies. Cutting off everything above the lowest 10% of frequencies leaves a very comprehensible image with only wavy artifacts around objects. So there's plenty of bandwidth to use even if you want to embed info in existing media.
And here you have the full bandwidth to use. Start in the frequency domain, decide on the lowest bandwidth you'll allow, and set the coefficients of the harmonic components. Convert to the spatial domain, upscale, and you've got your video to upload. This should leave the data encoded in a way that survives compression and resizing; you just need to allow some room for that.
You could slap error correction codes on top.
If you think about it, you should consider video as - say - copper wire or radio. We’ve come quite far transmitting over these media without ML.
We started with that approach, by assuming that the compression is wavelet based, and then purposefully generating wavelets that we know survive the compression process.
For the sake of this discussion, wavelets are pretty much exactly that: A bunch of frequencies where the "least important" (according to the algorithm) are cut out.
But that's pretty cool, seems like you've re-invented JPEG without knowing it, so your understanding is solid!
That's essentially a variant of "bigger pixels". Just like them, your algorithm cannot guarantee that an unknown codec will still make the whole thing perform adequately.
Even if you train your model to work best for all existing codecs (I assume that's the "ML" part of the ML model), the no free lunch theorem pretty much tells us that it can't always perform well for codecs it does not know about.
(And so does entropy. Reducing to absurd levels, if your codec results in only one pixel and the only color that pixel can have is blue, then you'll only be able to encode any information in the length of the video itself.)
It's not guaranteed to perform well with unknown or new codecs - true. But, the implicit assumption is that YouTube will use codecs that preserve what videos look like - not just random codecs. If that assumption holds then the image recognition model will keep working even with new codecs.
That's the thing though, "looks like" breaks down pretty quickly with things that aren't real images. It even breaks down pretty quickly with things that are real, but maybe not so common, images: https://www.youtube.com/watch?v=r6Rp-uo6HmI
So one question would be: Does your image generation approach preserve a higher information density than big enough pixels?
Why would you assume that the images in my algorithm aren't real images? For example, you could use 256 categories from imagenet as your keys. Image of a dog is 00000000, tree is 00000001, car 00000010, and so on.
I'm not assuming they are not real images, I'm questioning whether to get to any information density that even out-performs "big enough colorful pixels", you might get into territories where the lossy compression compresses away what you need to unambiguously discriminate between two symbols.
And to get to that level of density I do wonder what kind of detailed "real image" it would still be.
If your algorithm were for example literally the example you just noted, then the "big enough colorful pixels" from the example video already massively outperform your algorithm. Of course it won't be exactly that, but you have the assumption that the video compression algorithm somehow applies the same meaning of "looks like" in its preservation efforts that your machine learning algorithm does, down to levels where differences become so minute that they exceed what you can do with colorful pixels that are just big enough to pass the compression unscathed, maybe with some additional error correction (i.e. exactly what a very high density QR code would do).
YouTube lets you download your uploaded videos. I've never tested it, but supposedly it's the exact same file you uploaded.[a] It probably wouldn't work with this "tool" as it uses the video ID (so I assume it's downloading what clients see, not the source), but it's an idea for some other variation on this concept.
[a] That way, in the future, if there are any improvements to the transcode process that make smaller files (a different codec or whatever), they still have the HQ source.
They may retain the original files, but they don't give that back to you in the download screen. I just tested it by going to the Studio screen to download a video I uploaded as a ~50GB ProRes MOV file and getting back an ~84MB H264 MP4.
This brings up an interesting question: what is the upper-bound of hidden data density using video steganography? E.g. how much extra data can you add before noticeable degradation? It's interesting because it requires both a detailed understanding of video encoding and also understanding of human perception of video.
I'd expect you could store more data steganographically than the raw video data.
You can probably do things like add frames that can't be decoded and so are skipped by a decoder; that effectively allows arbitrary added hidden data. That's maybe cheating.
If you stipulate that you can't already have a copy of the unaltered file, and the data has to be extractable from a pixel copy of the rendered frames ... that becomes more interesting, I think.
YouTube doesn't give you the raw video back; it transcodes to its standard set of bitrates and resolutions.
You'll notice this if someone has just uploaded a video to Youtube and the only version available for playback is some 360p/480p version for a few hours until Youtube gets around to processing higher bitrates.
So whatever you're encoding has to survive that transcode process.
A pretty massive amount, I imagine. I attended a lecture on single-image steganography, and they were able to store almost 25% of the image's size while it was barely visible. Even 50% didn't look too bad.
Extend that to video files and the capacity would likely be pretty massive, although you'd have an interesting time with YouTube's compression algorithms.
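That ~25% figure is roughly what you get by claiming the two least-significant bits of every 8-bit sample; a minimal sketch of that kind of LSB embedding:

    import numpy as np

    def embed_lsb(cover: np.ndarray, payload: bytes, nbits: int = 2) -> np.ndarray:
        """Hide payload in the nbits least-significant bits of each 8-bit
        sample of the cover image (nbits=2 is the ~25%-capacity regime)."""
        bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
        bits = bits[: (len(bits) // nbits) * nbits].reshape(-1, nbits)
        symbols = (bits @ (1 << np.arange(nbits)[::-1])).astype(np.uint8)
        flat = cover.reshape(-1).copy()
        assert len(symbols) <= len(flat), "payload too large for this cover image"
        keep = np.uint8((0xFF >> nbits) << nbits)     # e.g. 0b11111100 for nbits=2
        flat[:len(symbols)] = (flat[:len(symbols)] & keep) | symbols
        return flat.reshape(cover.shape)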
Good luck preserving it through YouTube's video compression. It's super lossy with small details, in bad cases the quality can visibly degrade to a point it looks more like a corrupted low-res video file for a few seconds (saw that once in a Tetris Effect gameplay video).
I mentioned it in another comment, but while that does lower the bandwidth of a single frame, it's not actually an issue. There are several DRM techniques that can survive a crappy camera recording in a theater.
"Compression resistant watermark" turns up some good resources on it. QR codes are another good example of noise-tolerant data transmission (fun fact: having a logo in a QR code isn't part of the spec -- you're literally covering up part of the code, but the error correction can handle it).
The best way I can describe it is that humans can still read text in compressed videos. The worse the compression/noise the larger the text needs to be, but we can still read it.
> Is it really abuse if the videos are viewable / playable?
Presumably the ToS either already forbids covert channel encoding or soon will.
If creators start encoding their source material into their content, Google would probably be fine with that, because it gives them data but also gives them context for that data.
Edit: I meant like "director's commentary" and "notes about production" type stuff like you used to see added to DVDs back in the day. Not "using youtube as my personal file storage". Why is this such an unpopular opinion?
> If creators start encoding their source material into their files Google would probably be fine with that
Not true at all, lol. Google has a paid file storage solution. YouTube is for streaming video and that's the activity they expect on that platform. I couldn't imagine any service designed for one format would "probably be fine" with users encoding other files inside of that format.
I think the parent comment is limiting themselves to the embedding of metadata specific to the containing file. It would be like adding a single frame, but would potentially give useful information to Google. In those limited circumstances I think the parent is correct.
Early games and software would be delivered on audio cassettes that then had to be 'played' in order to load the software temporarily into the device, which could take minutes.
Yeah. Video, even old grainy VHS, had pretty high bandwidth. Even more so with S-VHS, which never became super popular though. (I'm actually wondering whether the 2GB figure was for S-VHS, not VHS. Didn't do the math and wouldn't be surprised either way, though.)
A normal VHS encodes about 100 million scan lines over 2 hours. 20 bytes per scan line sounds feasible, since there's somewhere around 200-300 'pixels' of luma available in each scan line.
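For anyone who wants to redo the arithmetic (assuming NTSC timing and the 20-byte estimate above):

    lines_per_second = 525 * 30          # NTSC: ~15,750 scan lines per second
    tape_seconds = 2 * 60 * 60           # a standard 2-hour tape
    bytes_per_line = 20                  # the estimate above

    total_lines = lines_per_second * tape_seconds
    print(f"{total_lines / 1e6:.0f} million scan lines, "
          f"{total_lines * bytes_per_line / 1e9:.1f} GB per tape")
    # -> 113 million scan lines, 2.3 GB per tape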
Thanks, that's a very reasonable back of the envelope calculation.
There are many fun details about VHS, its chroma resolution, and especially some weirdness around the PAL delay line, but they all don't really matter for this.
Wikipedia says there is about 3MHz of bandwidth, so ~200-300 "pixels" seems like a very good ballpark (just going by the fact that a normal PAL signal has about 6 MHz and is commonly digitized as 720x576, 3MHz about halves the horizontal resolution and taking some pixels off for various reasons makes sense).
My family had an Atari 400 with a tape drive. I remember buying a tape with a game. We also used it for the BASIC programming language and the Asteroids game using a cartridge.
Yep, I had the BASIC cartridge and used the tape drive almost exclusively for that. Coded up all sorts of little projects on that machine. I hated the membrane keyboard, but it worked!
The Alesis ADAT 8 track digital audio recorders used SVHS tapes as the medium - at the end of the day, it's just a spooled magnetic medium, not hugely different conceptually than a hard drive.
Yes! There were many such systems, LGR made a video for one of them, also showing the interface (as in: hardware and GUI) for the backup: https://youtu.be/TUS0Zv2APjU
I remember a similar solution that was marketed in a German mail order catalogue in late 1990s. It could have been Conrad, but I'm not 100% sure. I recall it being a USB peripheral, though. (Maybe I could find more about it in time...)
Back in the day, when protocols were more trusting, we would play games by storing data archives in other people's SMTP queues. Open the connection and send a message to yourself by bouncing it through a remote server, but wait to accept the returning email message until you wanted the data back. As long as you pulled it back in before it timed out on that queue and looped it back out to the remote SMTP queue, you could store several hundred MB (which was a lot of data at the time) in uuencoded chunks spread out across the NSFNet.
I watch these things and I begin to realize I'll never be as intelligent as someone like this. It's good to know that no matter how much you've grown, there is always a bigger fish.
I agree that there will always be smarter fish, but you can definitely be this smart; it just takes the proper motivation (or weird idea) to wiggle its way into your brain.
This reminds me of SnapchatFS[1], a side project I made about 8 years ago (see also HN thread[2] at that time).
From the README.md:
> Since Snapchat imposes few restrictions on what data can be uploaded (i.e., not just images), I've taken to using it as a system to send files to myself and others.
> Snapchat FS is the tool that allows this. It provides a simple command line interface for uploading arbitrary files into Snapchat, managing them, and downloading them to any other computer with access to this package.
How much data can you store if you embedded a picture-in-picture file over a 10 minute video? I could totally see content creators who do tutorials embedding project files in this way.
Would storing data as a 15 or 30 FPS QR code "video" be any more useful? At a minimum one would gain a configurable amount of error correction, and you could display it in the corner.
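If anyone wants to try it, the third-party qrcode package (pip install qrcode[pil]) makes the per-frame generation a few lines; error-correction level H tolerates roughly 30% damage per symbol:

    import qrcode

    def chunk_to_qr_frame(chunk: bytes, index: int) -> None:
        """Render one payload chunk as a QR code image, to be stitched into a video."""
        qr = qrcode.QRCode(error_correction=qrcode.constants.ERROR_CORRECT_H)
        qr.add_data(chunk)
        qr.make(fit=True)                       # pick the smallest symbol that fits
        qr.make_image().save(f"frame_{index:05d}.png")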
This is a classic case of overengineering a solution to a nonexistent problem.
On YouTube, the video and the description are also linked. They exist on the same page always.
And even if the concern this solution is covering is what if the video is somehow shared without the description, away from YouTube, then the video could just as easily contain the description or URL or QR code pointing to the file.
The description is not big enough to hold practically any data. You would need to link to it from there... at which point the two are no longer tied together. Links go down insanely often.
This works really well until it doesn't. I have seen so, so, so many videos whose descriptions link to content on sites that don't work anymore.
Wait so your expectation is that instead of Youtubers using URLs to link to websites, you would prefer and expect that they download and embed those websites into their videos for you? Like, a zip file of the whole site? For.... convenience?
Have you considered using archive.org or mirrors instead?
There are an infinite number of better solutions for this...
I wrote one of these as a POC when at AWS to store data sharded across all the free namespaces (think Lambda names), with pointers to the next chunk of data.
I like to think you could unify all of these into a FUSE filesystem and just mount your transparent multi-cloud remote FS as usual.
It's inefficient, but free! So you can have as much space as you want. And it's potentially brittle, but free! So you can replicate/stripe the data across as many providers as you want.
Yeah, you'd need to find some sort of auto-balancing to detect this kind of bitrot from over-aggressive engineering managers & their ilk and rebalance the data across other sources. I think the multiple-shuffle-shard approach has been done before, maybe we could steal some algo from a RAID driver, or DynamoDB.
Back in the day when @gmail was famous for its massive free email storage, people wrote scripts to chunk large files and store them as email attachments.
I used this as a backup target for the longest time. Simply split the backup file into 10 MB chunks and send as mails to a gmail account. Encrypted so no privacy problems. Rock solid for years.
And as it was just storing emails, it was even using Gmail for its intended purpose, so no ToS problems.
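That workflow is only a few lines of Python; a sketch assuming Gmail's published SMTP endpoint and an app password (the chunk size and naming are just illustrative):

    import math
    import smtplib
    from email.message import EmailMessage
    from pathlib import Path

    CHUNK = 10 * 1024 * 1024   # 10 MB per mail, comfortably under the 25 MB attachment cap

    def mail_backup(path: str, account: str, app_password: str) -> None:
        """Slice an (already encrypted) backup into attachments and mail them to yourself."""
        data = Path(path).read_bytes()
        total = math.ceil(len(data) / CHUNK)
        with smtplib.SMTP_SSL("smtp.gmail.com", 465) as smtp:
            smtp.login(account, app_password)
            for i in range(total):
                msg = EmailMessage()
                msg["From"] = msg["To"] = account
                msg["Subject"] = f"backup {Path(path).name} part {i + 1}/{total}"
                msg.add_attachment(data[i * CHUNK:(i + 1) * CHUNK],
                                   maintype="application", subtype="octet-stream",
                                   filename=f"{Path(path).name}.{i:04d}")
                smtp.send_message(msg)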
With AOL, in the early 90’s you didn’t even need to do that. You could just reformat and reuse the floppy disks they were always sending you for free storage.
Sorry, I edited the post concurrently with your comment - it now points to Base2048, the link I meant to post, which actually should work - rather than https://github.com/qntm/base65536 (which I think you're commenting on).
My friends and I had a joke called NSABox. It would send data around using words that would attract the attention of the NSA, and you could submit a FOIA request to recover the data. I always found it amusing.
For people who don't read Chinese: it encodes data into ~10 MB PNG blocks and then uploads them (together with a metadata/index file as an entry point) to various Chinese social media sites that don't re-compress your images. I know people have already used it to store* TBs upon TBs of data this way.
*Of course, it would be foolish to think your data is even remotely safe "storing" them this way. But it's a very good solution for sharing large files.
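The core trick is just treating a lossless image as a byte container; a sketch with Pillow (the width and the index-file convention are made up for illustration):

    import math
    import numpy as np
    from PIL import Image

    def bytes_to_png(data: bytes, out_path: str, width: int = 2048) -> int:
        """Pack raw bytes into a grayscale PNG (lossless), zero-padding the last row.
        Only works on hosts that serve the image back byte-for-byte, as noted above."""
        height = math.ceil(len(data) / width)
        buf = np.zeros(width * height, dtype=np.uint8)
        buf[:len(data)] = np.frombuffer(data, dtype=np.uint8)
        Image.fromarray(buf.reshape(height, width)).save(out_path)
        return len(data)   # record this in the index file so decoding can strip the padding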
Click them; it's really for things that fit into one or two URLs, like small text files. I've used it for config files that were getting formatted incorrectly over corporate email, which ate them as attachments.
I wonder if we could use this technique in places where the government censors sensitive uploads to streaming sites, like mainland China or North Korea (they do have streaming sites, right?).
Although for propaganda use, shortwave / sat TV is a much, much simpler way to distribute information to places like that, though I believe it's now hard for anyone there to get hold of a shortwave radio.
Reminds me of when I tried to Gmail myself a zip archive, and it was denied because of security reasons iirc. I then tried to base64 it, and it still didn't work, same with base32, until finally base16 did work.
At one point there was a piece of software called deezcloud which exploited Deezer's user uploaded MP3 storage, allowing it to be used as free CDN cloud storage for up to 400GB of files. I don't think it works anymore, and I'm not sure if it ever worked well (I never tried it).
I remember my friend did something like this on an old unix system.
Users were given quotas of 5 MB for their home directory. He discovered that filenames could be quite long, and the number of files was not limited by the quota, so he created a pseudo-filesystem using that knowledge, with a command-line tool for listing, storing, and retrieving files from it. This was in the early 90s.
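The same idea still fits in a few lines today; a sketch that keeps content in the names of zero-byte files (chunk size chosen to stay under typical 255-byte filename limits):

    import base64
    import os
    from pathlib import Path

    CHARS_PER_NAME = 200

    def store_in_filenames(data: bytes, directory: str) -> None:
        """Keep a file's content entirely in the *names* of empty files,
        so a size-based quota sees nothing."""
        encoded = base64.b32encode(data).decode()     # base32 is filename-safe
        Path(directory).mkdir(exist_ok=True)
        for i in range(0, len(encoded), CHARS_PER_NAME):
            Path(directory, f"{i // CHARS_PER_NAME:06d}_{encoded[i:i + CHARS_PER_NAME]}").touch()

    def load_from_filenames(directory: str) -> bytes:
        names = sorted(os.listdir(directory))
        return base64.b32decode("".join(name.split("_", 1)[1] for name in names))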
Years ago when Amazon had unlimited photo storage, you could "hide" gigabytes of data behind a 1px GIF (literally concatenated together) so that it wouldn't count against your quota.
They still do if you pay for Prime. I was surprised to see that even RAW files (which are uncompressed and quite large) were uploaded and stored with no issues. Not the same as "hiding" data but might still be possible.
In the interest of technical correctness, RAW files are frequently compressed and even lossily compressed. For example, Sony's RAW compression was only lossy until very recent cameras.
Given that there are options for uncompressed, lossy compressed, and lossless compressed, I'd say RAW refers to the stage of the data processing pipeline at which capture happens, and it doesn't imply anything about the type of compression.
What is relevant is that the formats vary widely between manufacturers, camera lines and individual cameras, so unlike JPEG, it's really hard to create a storage service that compresses RAW files further after uploading in a meaningful way. So anything they do needs to losslessly compress the file.
Interesting, so are you saying that the RAW signal coming from the hardware is already often compressed even before hitting the main software compression?
Oh, no. What I'm saying is that cameras often take the raw signal from the hardware, but then the camera software frequently compresses that signal before writing it to a raw file (.cr2, .arw, .dng, whatever). This compression can be lossy or lossless. It's important not to confuse the raw signal with the RAW file (an actual format, often specific to the camera manufacturer). Just by saying RAW file, assuming it's lossless or uncompressed is false. So it should be specified - uncompressed RAW (lossless almost by definition), lossy compressed, lossless compressed.
This is great. I did something very similar with a laser printer and a scanner many years ago. I wrote a script that generated pages of colored blocks and spent some time figuring out how much redundancy I needed on each page to account for the scanner's resolution. I think I saw something similar here or on github a few years ago.
It's just recordings of myself when I'm doing deep work. I use OBS to stream my computer screen and a video recording of myself (mostly me muttering to myself).
It helps me avoid getting distracted (I feel like I'm being watched lol) and it's also interesting to check back if I want to see what I was working on 3 months ago.
Are you screensharing while recording? What tooling do you use to do this if so?
Also, any potential issues with Google having access to proprietary code? I know the chance of any human at Google interpreting your videos is near-zero but still
Compression will limit the bandwidth of a given frame but you can work around it.
Some forms of DRM are already essentially this: compression-resistant -- and even crappy-camera-recording-from-a-theater-resistant -- watermarks that are essentially steganography (you can't visually tell they're there).
EDIT: "compression resistant watermark" is a good search phrase if anyone is curious
Unless you tuned the NN on the files you get back from YouTube, so that it learns to encode the data in a way that is always recoverable despite the artifacts.
Others in these comments have also suggested steganography in both the video and audio streams. The problem with that is that when you retrieve a video from YouTube, you never get the original version back. You only get a lossy re-encoded version, and the very definition of lossy encoding is to toss out details that humans can't (or wouldn't easily) perceive, including ultra-sonic audio.
That is what redundancy and error correcting codes are for. It will reduce your data density, but I am sure you can find parameters that preserve the data.
Isn't the point here that the sub-pixels being produced are so large that it would take a tremendous amount of artifacts to reduce them to an unreadable state?
In other words: if YT's compression were affecting it so badly that the data could no longer be read back, wouldn't that compression scheme render normal video-watching impossible?
We can see that data is encoded as "pixels" that are quite large, being made up of many actual pixels in the video file. I see quite bad compression artifacts, yet I can clearly make out the pixels that would need to be clear to read the data. It looks like the video was uploaded at 720p (1280x720), but the data is encoded as a 64x36 "pixel" image of 8 distinct colors. So lots of room for lossy compression before it's unreadable.
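Which also makes the decode side simple: average each 20x20 block, threshold each channel, and repack the bits (a sketch mirroring the scheme described in the top comment):

    import numpy as np

    def decode_frame(frame: np.ndarray, width=64, height=36, scale=20) -> bytes:
        """Average each scale x scale block, threshold each channel at mid-grey,
        and repack the 3 recovered bits per block into bytes."""
        blocks = frame[:height * scale, :width * scale].reshape(
            height, scale, width, scale, 3).mean(axis=(1, 3))
        bits = (blocks > 127).astype(np.uint8).reshape(-1)
        return np.packbits(bits).tobytes()     # 864 bytes per 1280x720 frame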
The code doesn't look too big (a single file), but it requires a paid symbolic language (Mathematica) to use. Could anyone with better Mathematica knowledge explain whether it can be ported to another symbolic language (Sage, Maxima) or a non-symbolic one (R, Julia, Python)?
Yep! I'm the creator of YouTubeDrive, and there's absolutely nothing in the code that depends on the symbolic manipulation capabilities of Wolfram Mathematica -- you could easily port it to Python, C++, whatever. However, there are two non-technical reasons YouTubeDrive is written in Mathematica:
(1) I was a freshman in college at the time, and Mathematica is one of the first languages I learned. (My physics classes allowed us to use Mathematica to spare us from doing integrals by hand.)
(2) I intentionally chose a language that's a bit obtuse to use. I was afraid that I might attract unwanted attention from Google if YouTubeDrive were too easy for anybody to download and run.
I remember seeing, years ago, a Python library called BitGlitter which did the same thing. It would convert any file to an image or video, which you could then upload yourself. https://pypi.org/project/BitGlitter/
Probably not many. The advantage of plain old-fashioned radio is that the station doesn't keep track of the receivers. Whoever watches a YouTube numbers station is tracked six ways to Sunday.
Here I am trying my best to get my favorite videos OFF YouTube given that they could disappear at any second because of an account block, or just "reasons", and this link suggesting storing stuff with YouTube? By god, why? Sure, it's free, practically "limitless" slow file storage, but what a bad idea nonetheless....
Back in the 90's I considered storing my backups as encrypted, steganographically hidden, or binary Usenet postings, as a kind of decentralized backup -- postings which would stick around long enough for the next weekly backup. (Usenet providers had at least a couple of weeks of retention time back then.)
This gave me a flashback to VBS on the Amiga… the Video Backup System: record composite video to a VCR, and a simple op-amp circuit would decode black-and-white blobs of video pixels. It could back up floppies at reading speed. Was really impressive until, well, VHS… ;)
Just did a Google search and saw it has evolved over the years; I only used the 1.0 implementation back in the day. For those on another nostalgic trip: http://hugolyppens.com/VBS.html
I wonder if something similar could be useful for transmitting data optically, like an animated QR code. Maybe a good way to transmit data over an air gap for the paranoid?
The popularity of such projects is the reason more and more constraints get imposed on systems that are somewhat open (at least open to use). Maybe instead of figuring out how to abuse an easy-to-use system, people should figure out how to abuse hard-to-use systems, e.g. by creating open protocols for closed systems. That would be an actual achievement.
Yes we have all done or used something similar when we were younger, but really, should this be on the front page of HN?
This is abuse of a popular service and if it becomes popular it will only make YouTube worse and YouTube is getting worse without any additional help.
This one's not the best, but it works. I would recommend zipping everything and then using that as a single file. (The file size limit is ~2GB, FYI.)
https://github.com/Quadmium/PEncode
I think my favorite part of this is that the example video linked to this has ads on it. It's a backup system that pays you. Well, until someone at Youtube sees it and decides to delete your whole account.
This reminds me of Blame!, where humans are living like rats in the belly of the machine. Lol, also reminds me of the GeoCities days, when we created 50 accounts to upload Dragon Ball Z videos.
I love that this is like tape in that it's a sequential access medium. It's storing a tape-like data stream in a digital version of what used to be tape itself (VHS).
I believe YouTube supports random access, or otherwise you wouldn’t be able to jump around in a video. Youtube-dl also supports resuming downloads in the middle, I believe.
Each frame gets the same amount of the file, about a kilobyte. So each frame is basically a sector. You need to read in a few extra frames to undo the compression, but otherwise it's just like a normal filesystem. And reading in a batch of sectors at once is normal for real drives too.
Even if you did need the frames to be self-describing, you could just toss a counter/offset in the top left corner for less than 1% overhead.
I like this. The last wave of Twitter users into the fediverse caused my AWS bill to go up 10 USD a month. Might have to start storing media files on youtube instead ;)
Reminds me of the other post that used Facebook Messenger as a transport layer to get free internet in places where internet access is free if you use Facebook apps.
Very cool. I wonder how difficult it would be to present a real, watchable video to the viewer -- albeit low quality -- while embedding the file steganographically. I think a risk of this tech is that if it takes off, YT might easily adjust its algorithms to remove unwatchable videos. Perhaps leaving a watchable video could grant it more persistence than an obvious data stream.
Sure, but the more structure your video has to have, the harder it becomes to hide information steganographically within it. Your information density will become very low, I think.
I also “invented this idea” from scratch in a series that exists solely in my mind where I abuse a variety of free services for unintended purposes.
I could seemingly never explain the concept to other developers in a meaningful way, nor did I ever care enough to code these out myself.
Anyway my quick summary in this is just think of a dialup modem. You connect to a phone line and you get like a 56k connection. That sucks today, sure, but actually it’s kind of mind blowing for how data transfer speeds worked at the time.
You know how else you can send data via a phone line without a modem? Just literally call someone and speak the data over the phone. You could even speak in binary or base64 to transfer data. It’s slow, but it still “works,” assuming the receiving party can accurately record the information and hear you.
That seems to be what this main topic is. Using a fast medium (video player) to slowly send data over the connection, like physically speaking the contents of other data. But there could be some problems with this approach.
Mainly, YouTube will always recompress your video. For this method, that means your colors or other literal video data could be off. This limits the range of values you can use in an already limited “speaking” medium.
If this weren't the case, we could just use it like a modem connection: literally send the data and pretend it's a video. However, where I left off on this idea, we appear to be hard-blocked by that YouTube compression.
We can write data to whatever we want and label it as any other file type. (As a side note, videos are also containers, like zip files, that could be abused to just hold other files.)
But YouTube is an unknown wildcard that changes our compression and thus our data which seems to invalidate all of this.
If we somehow convert an exe to an avi, the YouTube compression seems to just hard-block this from working like we want. If we didn't have that barrier, I think we could otherwise just use essentially corrupted videos as carriers for other file types, provided we could download the raw file directly.
(steganography is a potential work around I haven’t explored yet)
Without these, we’re left to just speak the data over a phone which compresses our voice quality and in theory could make some sounds hard to tell apart. This leaves us in the battle of what language is best to speak to avoid compression limiting our communication. Is English best? Or is Japanese? What about German? Which language is least likely to cause confusion when speaking but also is fast and expressive?
This translates into what’s the best compression method for text or otherwise pixels in a video where data doesn’t get lost due to compression? Is literal English characters best? What about base64? Or binary? What if we zip it first and then base64? What if we convert binary code into hex colors? Does that use less frames in a video? Will the video be able to clearly save all the hex values after YouTube compression?
This works on the same principle as the video backup system (VBS) which we used in the 1980's and the early 1990's on our Commodore Amigas: if I remember correctly, one three hour PAL/SECAM VHS tape had a capacity of 130 MB. The entire hardware fit into a DB 25 parallel port connector and was easily made by oneself with a soldering iron and a few cheap parts.
SGI IRIX also had something conceptually similar to this "YouTubeDrive" called HFS, the hierarchical filesystem, whose storage was backed by tape rather than disk, but to the OS it was just a regular filesystem like any other: applications like ls(1), cp(1), rm(1) or any other saw no difference, but the latency was high of course.
That's how digital audio was originally recorded to tape back in the 1970s and 80s: encode the data into a broadcast video signal and record it using a VCR.
In the age of $5000 10 MB hard drives, this was the only sensible way to work with the 600+ MB of data needed to master a compact disc.
That's also where the ubiquitous 44.1 kHz sample rate comes from. It was the fastest data rate that could be reliably encoded into both NTSC and PAL broadcast signals. (For NTSC: 3 samples per scan line, 245 usable scan lines per field, 60 fields per second = 44,100 samples per second.)
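The PAL side of that arithmetic lands on the same number (3 samples per line, 294 usable lines per field, 50 fields per second), which is part of why 44.1 kHz stuck:

    # samples/line * usable lines/field * fields/second
    ntsc = 3 * 245 * 60
    pal  = 3 * 294 * 50
    assert ntsc == pal == 44_100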
130 MB for the whole tape is not a lot. It roughly matches floppy-disk throughput, which is probably not a coincidence. However, basic soldering implies that the rest of the system acts like a big software-defined DAC/ADC.
Dedicated controllers were absolutely out of the question because nobody could afford them, which is why Amigas were so popular: a fully multitasking, multimedia computer for 450 DM. That's 225 EUR! Somebody that cost-sensitive won't even consider a dedicated controller; back then it wasn't like it is today.
This was at a time when 3.5" floppy disks were expensive (and hard to come by), and hard drives were between 40 - 60 MB, so 130 MB was quite practical. The floppy drive in the Amiga read and wrote at 11 KB / s.
And yes, this was a DAC and an ADC in software, with added Reed-Solomon error correction encoding and CRC32. The goal was to be economical. The end price was everything; it had to be as cheap as possible.
Not immediately obvious from the README, but does this rely on YT always saving and providing a download of the original, unaltered video file? If not, then it must be saving the data in a manner that is retrievable even after compression and re-encoding, which is very interesting.