While I can't say much about how it behaves in a distributed setup, I have had some negative experiences with MinIO/Ceph when handling files larger than 10 GB.
One example: missing error handling for interrupted uploads, which led to files that looked as if they had been uploaded but hadn't been.
Both Ceph's and MinIO's implementations differ from AWS's original S3 implementation in subtle ways. Ceph worked more reliably, but IIRC, with both MinIO and Ceph there is no guarantee that a file you upload is readable directly after the upload. You have to poll until it is there, which might take a long time for bigger files (I guess because of the hash generation). AWS's original behavior is to keep the socket open until you can actually retrieve the file, which isn't necessarily better, as it can lead to other errors like network timeouts.
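For illustration, the polling step could look roughly like the minimal Python sketch below. It assumes a hypothetical `exists` callable that you supply yourself (for example, one wrapping a HEAD request from whatever S3 client you use); the function name `poll_until_readable` and the default timings are illustrative assumptions, not anything MinIO or Ceph provide.

```python
import time
from typing import Callable


def poll_until_readable(
    exists: Callable[[], bool],  # hypothetical check, e.g. a HEAD request wrapper
    timeout: float = 300.0,
    initial_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `exists` with exponentially growing pauses until it returns True
    or `timeout` seconds have elapsed. Returns whether the object became readable."""
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        if exists():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline; double the pause each round, capped at max_delay.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)
```

The `sleep` and `clock` parameters are only there so the timing can be faked in tests; in real use the defaults apply.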
I got it working halfway reliably by splitting uploads into multiple smaller files and adding retries with exponential backoff. Then I figured out that using local node storage and handling distribution manually was much more efficient for my use case.
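A rough sketch of that workaround, splitting the data into smaller pieces and retrying each piece with exponential backoff, might look like this in Python. `upload_part` is a hypothetical callable standing in for whatever actually stores one piece (e.g. a PUT of a separate object); the helper names and defaults are illustrative assumptions, not the exact code I used.

```python
import functools
import time
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def retry_with_backoff(fn: Callable[[], T], attempts: int = 5, base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on any exception with delays base_delay * 2**attempt."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error instead of hiding it
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


def split_ranges(total_size: int, part_size: int) -> List[Tuple[int, int]]:
    """Return (start, end) byte ranges covering total_size in part_size pieces."""
    return [(start, min(start + part_size, total_size))
            for start in range(0, total_size, part_size)]


def upload_in_parts(data: bytes, part_size: int,
                    upload_part: Callable[[int, bytes], None],  # hypothetical uploader
                    **retry_kwargs) -> int:
    """Upload data as numbered pieces, retrying each one; returns the piece count."""
    ranges = split_ranges(len(data), part_size)
    for index, (start, end) in enumerate(ranges):
        retry_with_backoff(functools.partial(upload_part, index, data[start:end]),
                           **retry_kwargs)
    return len(ranges)
```

Retrying per piece means an interrupted transfer only costs one small piece rather than the whole 10 GB file, and re-raising after the last attempt avoids the "looks uploaded but isn't" failure mode.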
So for larger use cases, I'd take the 'drop-in' claim with a grain of salt. YMMV :)