Hacker News
How do game companies share massive files? (bbc.co.uk)
34 points by Libertatea on Nov 22, 2013 | 44 comments


As someone said on twitter the other day:

  If "tech experts" writing news articles aren't really experts, what does that say about all the other experts..?


So basically rsync?

Honestly, synchronizing 50GB of content across the globe doesn't sound terribly novel or difficult to me, or else the article does a pretty bad job of explaining the difficulty.


> That's because some of the complete game files were as large as 50GB

50GB is not the total size of the files needing to be transferred - some of the files were 50GB. Knowing the total size would have been interesting for the sake of discussion.


Hmm, that's not how I understood it. I've never worked in game development, but 50GB seems a reasonable amount of space for a full game. Also, TFA mentions that the issue is synchronizing with testing centres; I doubt they send the whole source files there, only the current debug build, I assume (but really, even if it's all the material, in 2013 it shouldn't be an issue unless they're using dial-up).

But at any rate the size of the individual files is not what matters. The important part is the total size of the game files and the average change rate. I would be surprised if they weren't able to synchronize everything with a good internet connection and a bunch of rsync scripts.
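For anyone who hasn't looked inside rsync: the receiver sends weak and strong checksums of fixed-size blocks of its old copy, and the sender slides a cheap rolling checksum over the new file, so only bytes that match no existing block travel as literal data. Below is a minimal Python sketch of that idea, not rsync's actual code; the 4-byte block size, the Adler-style weak sum and SHA-256 as the strong check are assumptions chosen for readability.

```python
import hashlib

BLOCK = 4  # toy block size so the example stays readable; real tools use far larger blocks
MOD = 1 << 16

def weak(block: bytes):
    # Adler-style pair: a = plain byte sum, b = position-weighted sum
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return a, b

def roll(a, b, out_byte, in_byte, n):
    # Slide the window one byte to the right in O(1)
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b

def signature(old: bytes) -> dict:
    # Receiver side: weak checksum -> [(strong hash, block index)] for each full block
    sig = {}
    for idx in range(len(old) // BLOCK):
        blk = old[idx * BLOCK:(idx + 1) * BLOCK]
        sig.setdefault(weak(blk), []).append((hashlib.sha256(blk).digest(), idx))
    return sig

def delta(new: bytes, sig: dict) -> list:
    # Sender side: emit ("copy", block index) for matches, ("data", bytes) otherwise
    ops, literal, i, sums = [], bytearray(), 0, None
    while i + BLOCK <= len(new):
        window = new[i:i + BLOCK]
        if sums is None:
            sums = weak(window)
        match = None
        candidates = sig.get(sums, [])
        if candidates:  # only pay for the strong hash when the weak sum hits
            strong = hashlib.sha256(window).digest()
            match = next((idx for s, idx in candidates if s == strong), None)
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            i += BLOCK
            sums = None  # next window starts fresh
        else:
            literal.append(new[i])
            if i + BLOCK < len(new):
                sums = roll(*sums, new[i], new[i + BLOCK], BLOCK)
            i += 1
    literal += new[i:]  # tail shorter than one block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops

def patch(old: bytes, ops: list) -> bytes:
    # Receiver rebuilds the new file from its own blocks plus the literal data
    out = bytearray()
    for kind, value in ops:
        out += old[value * BLOCK:(value + 1) * BLOCK] if kind == "copy" else value
    return bytes(out)

old = b"AAAABBBBCCCCDDDD"    # receiver's current copy
new = b"AAAABBBBxxCCCCDDDD"  # sender's copy: two bytes inserted
ops = delta(new, signature(old))
print(ops)  # [('copy', 0), ('copy', 1), ('data', b'xx'), ('copy', 2), ('copy', 3)]
```

Only the two inserted bytes go over the wire as data; everything else is a reference to a block the receiver already holds, which is why the change rate matters far more than the headline 50GB.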


Source assets take much more space than their shipped version. E.g. you'd want to keep Photoshop layer files for textures that are compressed to 4bpp in the shipped game because it would be hard to edit the final version. If the final game takes 18GB (judging by the torrent files a quick search will reveal) I'd imagine its source assets could not be less than 100 GB.
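To put rough numbers on the 4bpp point: a format such as BC1/DXT1 stores each 4x4 pixel block in 8 bytes (4 bits per pixel), while an editable RGBA layer at 8 bits per channel is 32 bits per pixel. This is back-of-the-envelope arithmetic only; the 2048x2048 resolution and the ten-layer source file are assumptions, real PSDs compress their layers, and shipped textures usually add mipmaps (about a third extra), so actual sizes will vary.

```python
def texture_bytes(width, height, bits_per_pixel):
    # Size of a single mip level, in bytes
    return width * height * bits_per_pixel // 8

MiB = 1024 * 1024
shipped = texture_bytes(2048, 2048, 4)     # BC1/DXT1-style 4bpp texture
one_layer = texture_bytes(2048, 2048, 32)  # one uncompressed RGBA8 layer
source = 10 * one_layer                    # hypothetical 10-layer source file, uncompressed

print(shipped / MiB, one_layer / MiB, source / MiB)  # 2.0 16.0 160.0
```

Under those assumptions the source file is 80 times the size of what ships, before counting reference material or alternate versions.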


As a game developer, our shipping assets are about 4.4GB, but our Tarsnap account (which we use for backup) has passed 1.2 TB.

We use a lot of rsync.


Yeah, that's bollocks. Sorry, but no game has 50GB files at launch, and what kind of idiotic company would send round a file that wasn't used in the launch code base?


TLDR; Internal workflows, not customer facing. Enterprise dropbox.

The article title is misleading. They are actually talking about the internal workflow.

Raw assets, art assets, and such, can be pretty large.

When you have a studio in country A doing PC and a studio in country B doing PS3, all wanting the base raw data but (presumably) transforming it in different ways, whether in how the build actually happens or in artists tweaking something for a platform, it would become a nightmare pretty quickly to send it all across a private company network (often just VPN pipes across the internet itself).

The article is, at its heart, an ad for a company called 'Panzura', which sounds like an enterprise-specific Dropbox. Boxes 'on-premise', security guarantees, and probably pretty expensive.


The raw assets are generally a lot larger than what becomes present on your machine. An example would be normal mapping[1].

[1] http://en.wikipedia.org/wiki/Normal_mapping
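As a rough illustration of what a normal map stores: each texel holds a unit surface normal, and a common convention packs each component from [-1, 1] into an 8-bit channel. The Python sketch below derives normals from a small height grid by central differences. That is a simplification for brevity; bakers working from a high-polygon mesh typically sample that mesh by casting rays instead, and the `strength` parameter is a hypothetical knob.

```python
import math

def encode_normal(nx, ny, nz):
    # Map each component from [-1, 1] to an 8-bit channel value in [0, 255]
    return tuple(round((c * 0.5 + 0.5) * 255) for c in (nx, ny, nz))

def normal_from_height(h, x, y, strength=1.0):
    # Central differences on a 2D height grid, clamping at the edges
    rows, cols = len(h), len(h[0])
    dx = h[y][min(x + 1, cols - 1)] - h[y][max(x - 1, 0)]
    dy = h[min(y + 1, rows - 1)][x] - h[max(y - 1, 0)][x]
    # Surface normal of z = h(x, y) is (-dh/dx, -dh/dy, 1); each difference spans 2 texels
    nx, ny, nz = -dx * strength, -dy * strength, 2.0
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return nx / length, ny / length, nz / length

flat = [[0, 0], [0, 0]]
ramp = [[0, 1, 2]]
print(encode_normal(*normal_from_height(flat, 0, 0)))  # (128, 128, 255)
print(encode_normal(*normal_from_height(ramp, 1, 0)))  # (37, 128, 218)
```

The familiar light-blue colour of tangent-space normal maps is that (128, 128, 255) "pointing straight out" value.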


I'm aware of at least one PS3 game port that had around 40GB of data at launch. I'd be surprised if that were the only one.

Most games' source data is much larger than the shipped data size for a number of reasons, mostly because files designed for human editing encode a lot of information not needed by the game itself, and game source data is aggressively compressed making it unsuitable for further editing. A full snapshot of the source data for a game like Battlefield 4 is likely to be on the order of hundreds of GB, maybe getting up to the TB range. And if multiple teams are working on the same source data, then the source data needs to be sent between them.

Also, before the final few months of development, games' runtime data will often significantly exceed the final shipped size because compression and quality tradeoffs have not been decided and applied.


Most software or games companies..

Much like how software doesn't always ship with source code, games don't ship with source assets.

Normal maps are generated from high-polygon meshes that do not ship with the finished product.

Final textures are the result of psd files with layers, reference materials, alternative versions. These do not ship with the finished product.

Maps are compiled from a source format. E.g. vmf -> bsp. The vmf does not ship with the finished product.
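That source-to-shipped split usually lives in an asset build step. Here is a hypothetical Python sketch of just the "what needs recompiling?" part, make-style, comparing modification times. The extension table (.vmf to .bsp from the example above, plus an assumed .psd to .dds) is illustrative only, and the actual compilers are out of scope.

```python
from pathlib import Path

# Hypothetical table: source-format extension -> compiled, shipped extension
COMPILED_EXT = {".vmf": ".bsp", ".psd": ".dds"}

def needs_rebuild(src: Path, out: Path) -> bool:
    # Rebuild if the output is missing or older than its source file
    return not out.exists() or out.stat().st_mtime < src.stat().st_mtime

def plan_build(src_root: Path, out_root: Path) -> list:
    """Return (source, output) pairs that need compiling; sources never ship."""
    jobs = []
    for src in sorted(src_root.rglob("*")):
        ext = COMPILED_EXT.get(src.suffix.lower())
        if ext is None or not src.is_file():
            continue  # not a compilable source asset
        out = out_root / src.relative_to(src_root).with_suffix(ext)
        if needs_rebuild(src, out):
            jobs.append((src, out))
    return jobs

# Example (directory names are made up):
# for src, out in plan_build(Path("content"), Path("build")):
#     print(f"compile {src} -> {out}")
```

Only the outputs under the build directory would go into the shipped package; the source tree stays in-house.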



I thought the same. Sending only the delta of the content has been known for a long while...


Not gonna lie, when I saw "delta" I immediately assumed Git or some other source control solution. What boggles the mind is how these professional game developers recently discovered this technology. I didn't think Git was so exclusively beholden to the web development community.


Git solves a different problem; sending large binary blobs around is not it. As another comment said, this is essentially RSync, not git.

I've been out of game development for a year, but unfortunately git is not being taken up much by large game developers. The main reason cited is actually git's handling of binary files, which leads to every checkout containing many duplicates of huge files. However, this can be worked around, and the benefits of git still make it worth it.
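A small illustration of why binaries hurt here: Git names each file version by a SHA-1 over a short header plus the full contents, so a one-byte edit to a large asset yields an entirely new object, and a full clone carries every version in history. The sketch mimics only that object naming (the "blob <size>\0" header is Git's real format); the dict standing in for the object store is a simplification that ignores zlib compression and packfile deltas.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    # Git's blob ID: SHA-1 of b"blob <size>\0" followed by the raw bytes
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

texture_v1 = bytes(8 * 1024 * 1024)   # stand-in for an 8 MiB binary asset
texture_v2 = b"\x01" + texture_v1[1:]  # same asset with a single byte changed

store = {}  # object ID -> contents, like loose (unpacked) objects
for version in (texture_v1, texture_v2):
    store[git_blob_id(version)] = version

print(len(store), sum(len(v) for v in store.values()) // (1024 * 1024))  # 2 16
```

As a sanity check, `git_blob_id(b"")` gives e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the well-known ID of Git's empty blob.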


The binary file reason is big enough, but there's also the ability to check out a subtree without checking out the entire repo. The repo is 1TB and there really is no reason for your texture artist to be checking out the raw uncompressed audio files.

The other reason not to use git in game development is because it's incredibly hard to use, and more than half the game development team are not programmers and/or not technical.

Getting them to use good source control discipline is hard enough with TortoiseSVN. I shudder to think the insane mess we would have using git.


So one thing is to use separate repositories - either separate git repos, or use git for code + design data, and something like Perforce for assets. We did this at my first studio (Free Radical Design) and it worked very well.

As for git being 'hard', yes, it's not trivial to learn, but people using just Perforce/SVN etc. end up misusing them so much that they're almost worthless; as Linus said, even just sending around tarballs is better.

I think if you're running a team, you need to give your team the benefit of the doubt that they can learn new tools, learn to use them effectively, and be more productive. If you spend some time teaching them why the tools are important and how to use them, then using git isn't really any more difficult than anything else. It's just different.


This is how I've structured our use at the place I'm at. To me, TortoiseGit is better than SVN. Submodules are the key, though in our situation they work best when they can live in the same directory structure.

We create a master repo with all the design assets, statement of work, and any other useful files for the end product. Then we have a client repo as it were of the files we serve in the wamp www directory. Because this is physically different, submodules make little sense. For the WPF builds that all live together in the same directory structure, submodules are perfect. My only hiccup is remembering to commit the subs via the master but that's a habit I need to change. The files in the wamp repo are deployed to client computers using git pull with a read only ssh key so people aren't committing from them.

Our wamp files are primarily text either in straight html or usually PHP. We take care not to commit user editable sections so we don't overwrite what a customer ultimately changes. This is kind of a hand rolled version of the now many git deployment tools out there and it works rather well.

I know this isn't gaming, nor are our source files that large, but PSDs can get astronomical very quickly even for simple web pages. I can only imagine what goes into "next gen games". Git is still useful for binary deltas, but you definitely have to plan carefully how you partition things into repositories. If I can do this, really almost anyone can. Kiln brought excellent binary support to Hg, so there's really no need to stick with SVN. There may be areas where it's still useful, but the distributed nature of a DVCS is just too compelling when your team is, well... distributed. I do hear great things about Perforce too, and I hope companies pick the better VCS on its merits rather than avoiding anything new because they think users can't cope; the benefits are substantial. Again, if I could help our team move to Git with very little hand-holding, you would probably be surprised how quickly people adapt, especially since TortoiseGit isn't that different from TortoiseSVN, or TortoiseHg for that matter.


If you've got assets in multiple repositories, you then have issues keeping branches in sync across all of them. This is absolutely no fun.


That said, git does have a delta compression mechanism for reducing disk space and saving network bandwidth that is quite complex:

http://stackoverflow.com/questions/9478023/is-the-git-binary...
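Git's pack deltas describe a new object as a sequence of "copy this range from the base object" and "insert these literal bytes" instructions. The Python sketch below produces instructions of that kind with `difflib.SequenceMatcher`; it is not Git's delta algorithm or binary encoding, just the same copy/insert idea, and `SequenceMatcher` would be far too slow for multi-gigabyte assets.

```python
from difflib import SequenceMatcher

def make_delta(base: bytes, target: bytes) -> list:
    """Describe target as copy-from-base and insert-literal instructions."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))      # offset and length in base
        elif j2 > j1:                              # "replace"/"insert" bring new bytes
            ops.append(("insert", target[j1:j2]))
        # "delete": nothing to emit, those base bytes simply aren't copied
    return ops

def apply_delta(base: bytes, ops: list) -> bytes:
    # Rebuild the target from the base plus the instructions
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, size = op
            out += base[offset:offset + size]
        else:
            out += op[1]
    return bytes(out)

base = b"level=1;textures=stone,grass;music=theme"
target = b"level=2;textures=stone,grass,sand;music=theme"
ops = make_delta(base, target)
print(ops)
```

The catch for game assets is that compressed binaries (textures, audio) often change almost everywhere after a small edit, leaving few long copy ranges to exploit.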


Git's compromises break down with large enough files. Some git operations still need to have the whole blob in memory at once.

Support for chunking of big files would possibly solve this, but the git project is not interested(?).
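One common form of chunking is content-defined chunking: a rolling hash over the bytes decides where chunks end, so an insertion only disturbs the chunks around it, and identical chunks are stored once. A minimal Python sketch with a Gear-style rolling hash follows; the table seed, mask and size limits are assumed parameters, not values from any particular tool.

```python
import hashlib
import random

_rng = random.Random(42)                           # fixed seed: same table every run
GEAR = [_rng.getrandbits(64) for _ in range(256)]  # one pseudo-random value per byte value
MASK = (1 << 13) - 1                               # 1-in-8192 boundary odds past the minimum
MIN_CHUNK, MAX_CHUNK = 2 * 1024, 64 * 1024
U64 = (1 << 64) - 1

def chunks(data: bytes):
    """Yield content-defined chunks of data."""
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & U64  # Gear rolling hash
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & MASK) == 0) or size >= MAX_CHUNK:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]

def store_file(data: bytes, store: dict) -> list:
    # Keep each unique chunk once; a file becomes a list of chunk hashes
    recipe = []
    for c in chunks(data):
        key = hashlib.sha256(c).hexdigest()
        store.setdefault(key, c)
        recipe.append(key)
    return recipe

rng = random.Random(1)
v1 = bytes(rng.getrandbits(8) for _ in range(200_000))
v2 = v1[:100_000] + b"new bytes" + v1[100_000:]  # small edit mid-file
store = {}
r1, r2 = store_file(v1, store), store_file(v2, store)
print(len(r1) + len(r2), "chunk references,", len(store), "unique chunks stored")
```

Because no single chunk exceeds `MAX_CHUNK`, no operation ever needs more than one chunk in memory, which is the property the comment above is asking for.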


You wouldn't use git for massive binary files that change on a regular basis. Particularly when you only need your testers to have the latest version of them, not the complete history (which is what git would give them).


You are fantastically mistaken if you think game devs don't use source control. Heh.

Of course they do, and I'm ... let's say pretty sure that much of EA runs on http://www.perforce.com/.

Also weird that the article starts with a BF4 image, but then claims that BF4 was developed in California. I guess that hurts my national pride (go DICE!).


I think a lot of game developers actually use Perforce because it handles large files better. At least, that's what I've gathered from eavesdropping on some of them (e.g., Jon Blow) on Twitter.


Perforce is very popular in games; I think 4/5 studios I've worked for used it.

I still vastly prefer git. Perforce does have some pluses, but it's mostly people just going with what they know.


When you look at how non-programming projects are managed, you often see a lot of the same issues that source control addresses. Files (and information in people's brains) become out of sync, or need to be locked.

Programmers always focus on the technology of source control, but the difference it makes is also social. It forces a particular workflow that allows tasks to be carried out concurrently. This workflow could be adopted without necessarily needing git.
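The locking part of that workflow is easy to sketch. Unmergeable binaries (a PSD, a compiled map) are commonly handled with exclusive checkout: whoever checks a file out holds the lock until they submit. The class below is a hypothetical in-memory illustration of that rule, not any particular tool's API.

```python
class LockTable:
    """Hypothetical in-memory exclusive-checkout registry for binary files."""

    def __init__(self):
        self._owner = {}  # path -> user currently holding the lock

    def checkout(self, path: str, user: str) -> bool:
        # Grant the lock if the file is free (or already ours); refuse otherwise
        return self._owner.setdefault(path, user) == user

    def submit(self, path: str, user: str) -> None:
        # Only the lock holder may submit; submitting releases the lock
        if self._owner.get(path) != user:
            raise PermissionError(f"{user} does not hold the lock on {path}")
        del self._owner[path]

locks = LockTable()
print(locks.checkout("art/harbor.psd", "ann"))  # True
print(locks.checkout("art/harbor.psd", "bob"))  # False: ann holds the lock
locks.submit("art/harbor.psd", "ann")
print(locks.checkout("art/harbor.psd", "bob"))  # True
```

The social part is the same with or without software: everyone agrees that a file is edited by one person at a time, and the tool just makes that visible.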


At some point Microsoft used a third-party solution called Aspera FASP to accept uploads of final game disc images. It was crazy fast: when you launched it, it reached our office's net connection speed within 10-15 seconds, and other Internet connections at the office dropped.


I remember that! I worked at MS in one of the gaming capacities (no need to be specific) and it was unreal how close to maximum saturation FASP would get. Such an awesome tool.
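FASP's internals aren't described here, so this doesn't claim to show how it works. It only illustrates a well-known limit that UDP-based tools like it are usually described as avoiding: a single TCP connection can have at most one window of unacknowledged data in flight per round trip, so its throughput is capped at window / RTT however fast the pipe is. The 64 KiB window and 150 ms RTT are assumed example values.

```python
def max_tcp_throughput(window_bytes: int, rtt_seconds: float) -> float:
    # At most one window of unacknowledged data per round trip, in bytes/s
    return window_bytes / rtt_seconds

def window_needed(link_bits_per_s: float, rtt_seconds: float) -> float:
    # Bandwidth-delay product: bytes that must be in flight to fill the link
    return link_bits_per_s / 8 * rtt_seconds

# 64 KiB window (no window scaling) over an assumed 150 ms long-haul RTT
print(max_tcp_throughput(64 * 1024, 0.150) * 8 / 1e6)  # ≈ 3.5 Mbit/s
print(window_needed(1e9, 0.150) / 1e6)                 # ≈ 18.75 MB to fill 1 Gbit/s
```

With those numbers, one stream uses well under 1% of a gigabit link, which is why a tool that keeps the pipe full can look dramatically faster on the same connection.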


Sohonet 10Gbit/s. Used primarily by post production companies in Soho, London.

http://www.sohonet.com/sohonet-media-network/
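For scale, ideal transfer times for a single 50GB file at a few link speeds, including the 10Gbit/s mentioned above. Decimal gigabytes, a fully utilized link and no protocol overhead are all assumptions.

```python
def transfer_seconds(size_bytes, link_bits_per_s, efficiency=1.0):
    # Ideal time to move size_bytes; efficiency < 1.0 can model protocol overhead
    return size_bytes * 8 / (link_bits_per_s * efficiency)

GB = 1e9  # decimal gigabyte
for name, bps in [("10 Mbit/s", 10e6), ("100 Mbit/s", 100e6),
                  ("1 Gbit/s", 1e9), ("10 Gbit/s", 10e9)]:
    minutes = transfer_seconds(50 * GB, bps) / 60
    print(f"{name:>10}: {minutes:7.1f} min")
```

That works out to roughly 667, 67, 6.7 and 0.7 minutes respectively; real transfers will be slower once latency and overhead come in.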


Sneakernets are still the fastest way to transfer data. http://en.wikipedia.org/wiki/Sneakernet


Not necessarily the cheapest, though - express shipping a bunch of disks is an uncertain venture ("oops, we accidentally dropped it a few times, is that bad?") and "I need a return cross-Atlantic flight ticket NOW" is sort of expensive.

That said, I have heard of companies actually couriering semi-large volumes of sensitive data this way.


Also, in some areas there is a legal angle: because of some law or contract, sending somebody with a briefcase of data can be an easier way forward that doesn't trigger a technicality in the contract or the law.

Indeed, some countries and their data protection laws make transferring data legally quite interesting, even today. There is no worldwide standard for data protection, which is a minefield in itself, let alone in other industries such as medical research, or for companies dealing with countrywide firewalls.


I think the article should mention that the developer is a Swedish-based game company and that the game's publisher is California-based.

Sounds more fair that way.


"That's because some of the complete game files were as large as 50GB, and future games with more advanced graphics for the new Xbox One console and Sony's PlayStation 4 are likely to be even bigger."

Ahm no, Battlefield 4 is already a PC game and is scaled down even on the next-gen consoles, so console processing power isn't the limiting factor here.


I have always wanted to know more about Valve's steam architecture.

Why aren't they presenting at more conferences, dammit!


Above a certain point, it becomes far quicker to send a hard drive by courier.


That used to be true, but considering that hard drive transfer rates cap out around 200 MByte/s, you'd have to send a large bunch of drives that must be read and written in parallel to be faster than a network connection.
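A rough model of that trade-off: the courier's effective time is write time plus transit plus read time, with the data split evenly across drives copied in parallel. The 10 TB data set and 24-hour courier are assumptions; 200 MB/s per drive is the figure quoted above.

```python
def sneakernet_seconds(total_bytes, drives, drive_bytes_per_s, transit_s):
    # Data split evenly across drives written in parallel, shipped, then read in parallel
    copy_s = (total_bytes / drives) / drive_bytes_per_s
    return copy_s + transit_s + copy_s

def network_seconds(total_bytes, link_bits_per_s):
    # Ideal time on a fully utilized link
    return total_bytes * 8 / link_bits_per_s

TB = 1e12
DATA = 10 * TB          # assumed data set size
DRIVE_RATE = 200e6      # bytes/s per drive, the figure quoted above
COURIER = 24 * 3600     # assumed door-to-door courier time, in seconds

for drives in (1, 4, 16):
    hours = sneakernet_seconds(DATA, drives, DRIVE_RATE, COURIER) / 3600
    print(f"courier, {drives:2d} drives: {hours:5.1f} h")
for name, bps in [("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9)]:
    print(f"network, {name}: {network_seconds(DATA, bps) / 3600:5.1f} h")
```

Under these assumptions the courier takes roughly 52, 31 and 26 hours, versus about 222 hours at 100 Mbit/s and 22 hours on a saturated 1 Gbit/s link: the drives still win easily against the slower line but lose to the gigabit one, even with 16 in parallel.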


If you are constantly mirroring your media drives with multiple redundancy, you can just unplug a set and post them.


Of all the technical challenges facing us in the 21st century, sending a 50GB file quickly across some wire doesn't seem like the biggest!

Also, the feedback I see about Battlefield's stability and quality has been very poor (on PS4), so I had to laugh when I saw the "to locate defects and improve quality" quote :)


Sounds a lot like git or some other typical solution.


git would be pretty bad in this situation. It sounds more like rsync.


Thanks, I'll look into that.


Sponsored article?


Innit. Someone at EA/Panzura's PR firm just got a raise.



