Rather than guaranteeing cores and RAM as with N1/N2, resources on the underlying host can be dynamically balanced through live migration, which GCP has been using for years. Cool solution, and it should save money for most workloads.
It’s a lot easier and safer to scale hosts horizontally than vertically. You can predict the limits and behavior of each host, the VMs/processes on each host don’t need to deal with fundamental resources changing, etc. For services I own that are high availability, require GC tuning, etc., these hosts with dynamic resource adjustments (also T2/T3 in AWS) are a nightmare because the behavior can change at runtime under load, exactly when I want it to behave predictably.
Oh definitely there are valid use cases for these, was just sharing my experience with them for my use cases.
We moved off of T2’s and back to C’s because of the unpredictable behavior under load. IIUC, T3s by default just bill you more instead of CPU throttling, which is a bit better for our use cases, but we haven’t tried them yet.
T3s look cheaper and better than E2 then; my only problem is region placement, where GCP's Iowa and Taiwan regions are more central than anything AWS offers (still no central US region!?).
I'm in the MMO business, so very specific requirements.
T3 is pretty different (even in unlimited mode) from E2. As an example, t3.xlarge (4 vCPU, 16 GB, $.167/hr, so roughly $.042/hr/vCPU) only has a baseline performance of 40% (so 1.6 vCPU). If you cross that threshold in unlimited mode you pay an additional $.05/vCPU/hr (so more than doubling your cost). By comparison an e2-standard-4 is $.134/hour even if you run it flat out.
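To make that concrete, here's the arithmetic as a quick sketch, using the prices quoted above for a fully CPU-bound workload (not authoritative pricing; check the current AWS/GCP pricing pages):

```python
# Rough hourly cost comparison for a 100%-utilized workload,
# using the prices quoted in this thread. Illustrative only.

T3_XLARGE_BASE = 0.167   # $/hr for t3.xlarge (4 vCPU, 16 GB)
T3_BASELINE = 0.40       # 40% baseline -> 1.6 vCPUs included
T3_SURPLUS = 0.05        # $/vCPU-hr above baseline in unlimited mode
T3_VCPUS = 4

E2_STANDARD_4 = 0.134    # $/hr for e2-standard-4, flat rate

# Running all 4 vCPUs flat out accrues (4 - 1.6) surplus vCPU-hours
# for every wall-clock hour.
surplus_vcpu_hours = T3_VCPUS * (1 - T3_BASELINE)    # 2.4
t3_flat_out = T3_XLARGE_BASE + surplus_vcpu_hours * T3_SURPLUS

print(f"t3.xlarge flat out: ${t3_flat_out:.3f}/hr")  # $0.287/hr
print(f"e2-standard-4:      ${E2_STANDARD_4:.3f}/hr")
```

So a sustained-load t3.xlarge in unlimited mode comes out to more than twice the e2-standard-4 rate, at these prices.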
We take on the statistical multiplexing over the datacenter and move VMs around, instead of pushing it to you as an economic or performance-throttling risk when you need it most. If you want a burstable type, we do have an e2-{micro, small, medium} that only guarantees you 12.5%, 25% and 50% of your 2 guest-visible vCPUs. But that's more fit for dev workstations and so on.
Sorry if I was unclear. In unlimited mode, if you sustain greater than your baseline percentage, you pay for it (the key point of the sentence you're quoting is that we take on the risk). One reason for this is that AWS doesn't do live migration (yet?), but instead does an awesome job of doing in-place upgrades (see their talks on Nitro, for example).
We have many tools in our toolbox at our disposal: non-disruptive in-service updates move live migration from a "must have to operate a compute cloud service at all" to "helpful in some scenarios when the workload and/or situation warrants the impact to performance during the precopy / potential post-copy phases."
But I would not assume that EC2 does not have that particular tool in the "fully production, and used" toolbox.
I have my doubts, in the past I've received decom-notifications that EC2 was going to be shutting down my instances in the near future due to underlying hardware failure (very helpful, since I was in the middle of triaging why the instance was behaving strangely). Seems like a poor customer experience to reap running instances if live migration is on the table.
T3 instances provide hyperthreaded vCPUs to EC2 instances, and the Nitro Hypervisor uses a core-based scheduler (coscheduler) to ensure that cores are never shared between two EC2 instances.
Upstream Linux kernel changes that are based on some of the changes in the Nitro Hypervisor were posted to lkml in 2018: https://lwn.net/Articles/764482/
I hope to see the GCE team contributing more to the ongoing discussion on core based scheduling!
That doesn't really answer my question: if I have a t3.micro (whose vCPUs do not fill an entire physical core, so they are shared with others), am I guaranteed that the instance's two vCPUs run on separate physical cores, so that my two vCPUs don't share one physical core?
This would let my server continue operating if the steal rate on one core goes through the roof because other instances running on my shared physical core unexpectedly take too many resources.
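For anyone who wants to watch for exactly that on Linux, steal time is the 8th value on the `cpu` lines in `/proc/stat` (in clock ticks). A small sketch; the sample line is made up:

```python
def parse_steal(stat_line: str) -> int:
    # /proc/stat cpu line fields, in order:
    # user nice system idle iowait irq softirq steal guest guest_nice
    fields = stat_line.split()
    return int(fields[8])  # fields[0] is the "cpuN" label, steal is field 8

# Made-up sample line for illustration:
sample = "cpu0 4705 150 1120 16250000 520 0 17 3042 0 0"
print(parse_steal(sample))  # -> 3042

# On a live Linux box you'd read it per CPU, e.g.:
# with open("/proc/stat") as f:
#     for line in f:
#         if line.startswith("cpu") and not line.startswith("cpu "):
#             print(line.split()[0], parse_steal(line))
```

A steal counter that climbs steadily under load is the noisy-neighbor signal being discussed here.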
And how does Amazon explain still not having a central region in the US? I mean the multiplayer share of your revenues must be at least 10% by now?
I just managed to get an IONOS instance running in Kansas City (same distance from east/west coasts) for, lo and behold, 1€/month with unlimited data (18 GB SSD and 512 MB RAM). How is AWS going to compete with that?
A t3.micro has two vCPUs, where each vCPU is backed by a hyperthread of a physical core. Because the scheduler used by the Nitro Hypervisor does core-based scheduling (see [1]), the two vCPUs will always map to the two threads of a physical core. You will not run on two separate physical cores at the same time if you have only 2 vCPUs allocated to your T3 instance.
The scheduler can move where your vCPUs run based on available resources.
I can try to explain virtual machine CPU scheduling, but I can't explain when or where AWS will build new regions that have not been announced. :-)
Every search result I can see says that EC2 doesn't do live migration. You can try to balance things but you can only do so much if you can only move a VM when it happens to reboot by itself. (And there's no evidence I can find that they even do that.)
CPU hotplug has been supported for a long time. I once managed some Sun boxes that allowed replacing/upgrading CPUs without shutting down... They don't build em like that anymore.
Yes, but most workloads are fairly unprepared for this sadly. And they're really not ready for memory unplug. (I also miss the days of my multi socket boxes and plugging in CPUs and memory).
What do VM-guest memory-balloon drivers do right now when the host suddenly attempts to reserve more memory than the guest has free? I'd presume the kernel would just consider itself to be in an OOM condition, and start killing processes to free up the memory until it can return OK to the balloon driver, no?
Because, from what I understand, that's closer to the scenario we're talking about here: you're not abruptly yanking DIMMs (like physical memory hotplug); rather, you (the hypervisor) are gracefully letting the guest know that some memory is about to go away, and since you (the hypervisor) have your own virtual TLB, you can let the guest OS decide which "physical" memory (from its perspective) is going away, before it happens.
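That's roughly my understanding too. As a toy model of the guest-side logic (purely illustrative; the real virtio-balloon driver works by allocating guest pages and handing them to the host, and it's that allocation pressure that triggers reclaim / the OOM killer):

```python
# Toy model of balloon inflation inside a guest. Page counts and the
# kill order are made up for illustration.

def inflate_balloon(request_pages, free_pages, procs):
    """procs: {name: resident_pages}, listed in rough OOM-kill order."""
    killed = []
    procs = dict(procs)
    while free_pages < request_pages and procs:
        # The kernel can't satisfy the balloon's allocation, so it frees
        # memory the way the OOM killer would: kill the "worst" process.
        victim, pages = next(iter(procs.items()))
        free_pages += pages
        del procs[victim]
        killed.append(victim)
    if free_pages < request_pages:
        raise MemoryError("guest cannot satisfy balloon request")
    return free_pages - request_pages, killed

left, killed = inflate_balloon(request_pages=1000, free_pages=300,
                               procs={"big_job": 800, "daemon": 50})
print(left, killed)  # 100 pages still free; big_job got killed
```

In practice the kernel tries page-cache reclaim and swap long before OOM-killing anything, which is why graceful ballooning usually just shrinks caches rather than killing processes.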
Linux and Windows have both supported it, but use tends to be at the fringes on mainframe/datacenter machines that are validated for it and so those paths aren't tested on a very wide variety of hardware and running applications. And adding CPUs and memory is one thing but removing is another.
CPU cores being hotplugged on & off was actually super common for a few years, and still is in a lot more devices than you'd expect.
It used to be a cornerstone of power management on mobile devices. The Nexus 5, for example, would regularly run with just a single core online, hotplugging the other 3 off until hit with a load, then bringing cores back online one by one as needed.
That behavior still is in some corners of the mobile world, but increasingly less so.
So the CPU hotplug path is, as a result, actually a lot more battle-hardened than you'd expect, and a lot more consumer software than you'd think ran just fine in that setup without noticing.
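For reference, on Linux this is just a sysfs interface: each core (usually except cpu0) has an `online` file you can flip with root privileges, and the kernel reports online CPUs in its cpu-list format. A small sketch (the sysfs writes are shown commented out since they need root):

```python
from pathlib import Path

CPU_SYSFS = Path("/sys/devices/system/cpu")

def parse_cpu_list(s: str) -> list[int]:
    """Parse the kernel's cpu-list format, e.g. '0-2,4' -> [0, 1, 2, 4]."""
    cpus = []
    for part in s.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus

print(parse_cpu_list("0-2,4"))  # [0, 1, 2, 4]

# On a real Linux box (writes need root):
#   online = parse_cpu_list((CPU_SYSFS / "online").read_text())
#   (CPU_SYSFS / "cpu3" / "online").write_text("0")  # offline cpu3
#   (CPU_SYSFS / "cpu3" / "online").write_text("1")  # bring it back
```

That's the same path the mobile hotplug governors mentioned above were exercising constantly.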
I presume that this means that E2 instances won't have access to local scratch NVMe, since making use of local scratch NVMe disks currently prevents any feature that requires a live migration, like auto-migration on host maintenance, or modifying the VM's specs while stopped (as you can't stop VMs with local storage, only terminate them permanently.)
"Compute Engine can also live migrate instances with local SSDs attached, moving the VMs along with their local SSD to a new machine in advance of any planned maintenance." [1]
The pricing level for the ARM instances should be interesting. If Amazon prices them well below their Intel and AMD instance types, they could really drive adoption and lock-in.
> If Amazon prices them well below their Intel and AMD instance types, they could really drive adoption and lock-in.
Adoption? Sure.
Lock-in? Umm...
Most servers run Linux, and most software on Linux is distributed as source. The reason people can easily move to ARM - they can just recompile/download software for the correct architecture and everything mostly works - is the same reason they can leave easily.
All the other AWS proprietary APIs: sure. It'd take a stupendous amount of work for Netflix to migrate away from AWS. But running on ARM isn't really part of that.
> Most servers run Linux, and most software on Linux is distributed as source.
Being distributed as source code does not mean the source code is not architecture-specific. A lot of software has SIMD-optimized code paths, and it's common for these SIMD code paths to target SSE2 as the least common denominator (since AMD included SSE2 as part of the baseline of its 64-bit evolution of Intel's 32-bit architecture, every 64-bit AMD or Intel CPU on the market will have at least SSE2), so they have to be ported to NEON. And that's before considering things like JIT or FFI code.
> and most software on Linux is distributed as source
Clearly you aren't purchasing much software. If you're one of the (majority of) large software companies that use precompiled binaries from vendors for any of the components in your systems, you're at the mercy of which architectures your vendor(s) support.
Yeah exactly, you're not going to skimp on the CPU by choosing ARM to save $500 a year when you have software that costs you thousands every single year. You're going to stick with x86_64 instead of the crappy budget option that doesn't even run your software.
Twist: enterprise software goes the way of IBM mainframes, with a small number of large customers paying crazy prices for compliance and customization, and everything in the hardware/software stack being 10+x more expensive than its commodity equivalent.
Not every corporation has proprietary x86 binaries they need to run on a server. Every company I have worked at has had entirely open source server software and a few proprietary javascript libraries.
How many of those "precompiled binaries" are truly performance critical? If it's just a single component, you can run it in qemu-user mode and still come out ahead overall.
How many of those precompiled binaries are performance critical? How about all?
From the software that runs for hours to compile a model, the software that calculates results, to the interactive analysis software that needs to load GBs of data.
The whole reason to run these things on a server farm is that you need large and fast machines that are better shared to make sure they get optimal use of the machine and of the license pool.
Is this the archetypal AWS customer that drives the majority of intel’s sales to AMZN? That not only has lock-in with a closed source vendor, but that vendor is not agile enough to give ARM binaries for AWS workloads? Doubtful IMO. Maybe for EDA and CAD/CAM setups, but somehow I doubt those are enough to keep Intel as we know them afloat. Intel has huge reason to fear ARM on the server.
I was thinking about server farms in general, not AWS specifically. You’re probably right that AWS-type server are more likely to skew towards software that is available in source form.
Does anyone run proprietary software in a cloud context?
The whole point is that the hardware becomes as flexible as the software, spin up, spin down, blow it away and deploy a fresh instance if something goes wrong.
I can't see many people trying to do that with proprietary licensing?
Absolutely. You can even get the license and support contract costs built into the instance’s hourly rate. RHEL, Oracle DB, Windows Server, and SQL Server are all offered this way.
Amazon doesn’t have to offer ARM-based CPU instances to customers to benefit. If it just runs its own offerings on ARM, like the servers that run load balancers, SNS, SQS, etc., where it controls the software, it can save a lot of money.
It does say at the top "originally written in 1998" and it was basically amended over time, but regardless I still love learning about "historical" views.
Example: A textbook from '95 had a pretty large section on "the future" of ECC and "current" trends in research. It was interesting seeing what came true, or what was still being worked on.
[Edit] Sorry for the snark. In a market economy, the only way companies like Apple might act in the same way that compassionate human beings would is if it were in the interest of their bottom line. Perhaps articles like this will hit critical mass, and Apple will put out a statement, carefully composed to ensure that China doesn't lose face. Perhaps they'll do something tangible.
Come back 12 months later and nothing will have changed.
This is a false dilemma. There are plenty of publicly listed companies that also engage in moral behaviors. Pretending it's an either/or situation only serves to excuse the behavior and derail the discussion.
We should support companies that act morally, and punish others by not buying their products and criticizing them.
But expecting them to act morally out of their own volition seems foolish to me. Note also that I explicitly said publicly traded companies. Large ones always have a diverse ownership structure with institutional investors. A CEO that hurts profits by acting in the public interest will probably be replaced pretty soon, forced to fall in line, or have to otherwise do enough to offset.
The only way to motivate them is external pressure or benefits.
Eg Apple has probably decided to focus on user privacy to provide a contrast with Google and the Android ecosystem and maintain a good relationship with customers, not because privacy is "good".
The situation is somewhat different with private companies, where there are few owners that can actually have an impact.
Care to share some hard evidence to support that? Privately held companies certainly have some more freedom in their actions, but public corpos are a slave to the shareholder.
Public corpos' only incentive is to create value for their shareholders.
Your elected government is illegally recording my calls, unless you believe the post-Snowden claim that they stopped (and there isn't any reason to). Why should I expect ethical behavior from this entity? I'd rather choose which software I route my data through than have the government "regulate" all of my options to include Clipper chips. I don't want the government to have any say about what I do with my data.
I do this personally and it's great. I've even set this up for extended family. But any real solutions to the problem will not be technological, they will have to be legislative...
Given that no reasonable legislation will likely pass against adtech, I'll be stuck buying rpis as gifts again for the next several years.
FreeBSD has always been so far behind in basic security mitigations. It is 2020, and FreeBSD still doesn't have ASLR.