
Time to rethink the infrastructure? Why AWS is bad news.


Owl_Superb


Dear NQ,

I had to make a forum account to make this post, but it's been brewing for a long time:

It is painfully, nakedly, embarrassingly obvious that the latest changes to Dual Universe were dictated by factors outside of the current gameplay or the community's needs and requests. You are not hiding that you're dealing with server costs, whether computational or monetary, and removing tunnels, taxing tiles, limiting voxel complexity, and now limiting cores are all remedies that seem aimed at your daily AWS bill.

AWS is NOT the right infra for DU to use. Any working IT professional knows that AWS is only good for:
a) Prototyping
b) Established businesses with healthy margins and relatively static costs, with occasional high-demand bursts of money-bringing activity.

Dual Universe is out of prototyping, but the business is not established and your AWS costs seem to be ballooning relative to your revenue. How else can the recent changes be explained?

I don't think anybody asked you to remove hand-mining completely; I was expecting auto-miners to simply pick up the ore from the tile so I wouldn't have to. And once it's gone, we pack up and move the MUs to a new tile. There is no justification for taxes on tiles at all: not from the game's lore perspective, nor from any problem players were facing. If anything, the only request from players was to deal with the towers over 1 km tall, and then you auto-HQ'd them at the last moment.

And do you think anybody wants their constructs to be plainer? I get it: you compress the constructs for storage, and more complex structures don't compress as well. But that's not my problem as a paying customer. I certainly don't want to hear about it, and I don't want the game I'm paying for to change for the worse because of your failure to build a real-life business model or to charge me enough money.

And now with the core limit changes you're completely over the top. I'm sick and tired of you dumping your problems on us. Maybe you're all eating instant noodles and nobody got their Christmas bonus; I'm sorry about that, but you don't need to punish players for it. There are plenty of people who would pay more money if you could just show that you'll use it for the benefit of the players:

1. Hire a creative director who will oversee the direction of the game from start to finish. We need lore, we need story, we need immersion. Most of all we need the changes to make sense in the context of the game, and not make us worry about your RL business model.

2. Hire a technical director who will oversee the transition of your infrastructure to a self-hosted or dedicated-hosted model with CDN integration, like CloudFlare, which can deliver unlimited game assets over high-speed HTTP for a laughable cost of $200/mo. Run from AWS; it's killing you, like it has killed many, many start-ups.

3. Hire an economic director, or two, to oversee your real-life business model, the in-game business models available to players, and the overall economy and its direction.

I understand that you'll need money for all of this, but you have a very intelligent game that attracts mature and established players who can save you financially, if only you can renew our hope that you truly intend to make the best space game possible. You have an excellent start so far, but you've hit a rut that will require creative financial models to overcome.

We as paying customers don't want to lose any game features or complexity, ever. Just as we wouldn't want our own house basements randomly filled with dirt: we'd pay for them to stay dirt-free, but you just won't give us the option.

We want you to build the best space game ever, we want it to become popular, and we want to be able to sell and trade currency and accounts down the line for real money, like all successful MMOs allow. There are so many aspects of monetization that you're not capitalizing on, yet you're obviously suffering an urgent financial crisis. Worse, you're wringing yourselves and the player base with endless stress, which could seemingly be resolved if only you could stand back and reassess what it is you're planning to create.

Please take notice of this pivotal moment and open up the honey pot, even if it comes with an increased price tag: multiple-tier VIP memberships, paid core limits, paid talent-boost implants, and so on. Use whatever remaining hardcore player base you have left to soar to the top, not sink to the bottom.
 

Good luck!


Completely disagree with your point about AWS. Having actually done this sort of thing in a small company and saved a massive amount of money with AWS over the hosted solution (the AWS bill after moving 200 servers was lower than the *power bill* for the hosted solution), I can tell you that AWS is not as expensive as it looks.

 

Of course, as with anything, you can do it wrong and it can cost too much. But there are a lot of hidden costs with hosted solutions which you just don't have in AWS (including not needing to hire as many people). Done properly (which means not trying to recreate a hosted solution inside AWS), a lot of money could be saved here.

 

One good example for an MMO is that the server load is unlikely to be constant.  There will be times in the day/week when a lot of compute is needed and times when it isn't.  With AWS you can scale up and down to accommodate that, but with a hosted solution you usually need to buy what you need for peak capacity and have that run idle when not needed.
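The scale-up/scale-down argument above can be put in rough numbers. The sketch below compares a fleet bought for peak capacity against an elastic fleet that tracks demand hour by hour; the demand curve, per-server rate, and the assumption that both models pay the same hourly rate are all invented for illustration.

```python
# Rough cost sketch (all numbers hypothetical) comparing a fixed fleet
# sized for peak demand against an elastic fleet that tracks demand.

HOURLY_RATE = 0.50  # assumed cost per server-hour, same for both models

# Hypothetical daily demand curve: servers needed each hour.
demand = [20] * 8 + [60] * 4 + [140] * 4 + [60] * 8  # quiet, busy, peak, busy

def fixed_fleet_cost(demand, rate):
    """Self-hosted model: you provision for peak and the rest idles."""
    return max(demand) * len(demand) * rate

def elastic_cost(demand, rate):
    """Cloud model: pay only for the servers actually running each hour."""
    return sum(demand) * rate

fixed = fixed_fleet_cost(demand, HOURLY_RATE)
elastic = elastic_cost(demand, HOURLY_RATE)
print(f"fixed-peak: ${fixed:.2f}/day, elastic: ${elastic:.2f}/day")
```

With this made-up curve the elastic fleet costs less than half as much per day, which is the whole point: the gap grows with how spiky the load is.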


3 hours ago, Zeddrick said:

One good example for an MMO is that the server load is unlikely to be constant.  There will be times in the day/week when a lot of compute is needed and times when it isn't.  With AWS you can scale up and down to accommodate that, but with a hosted solution you usually need to buy what you need for peak capacity and have that run idle when not needed.


This is true, but maybe not the entire story.

 

An MMO needs a huge fleet of idle servers to properly handle scale with AWS -- it can't completely rely on AWS auto-scaling techniques because they aren't fast enough for gaming. Not with traditional EC2.

 

Especially in the context of a single-shard system, auto-scaling isn't that effective because of the bursty, spiky nature of game traffic.

 

A group of players meeting for a battle will demand extra capacity almost immediately (especially with multi-crew ships) -- by the time AWS has scaled up those instances, the battle will likely be over. So it isn't just users logging in, it's the dynamics of how they play that dramatically impacts scale...especially when all your players share one physical game space. 

 

The only way to smooth out these spikes is to maintain a fleet of idle servers that can buy time as the rest of the system scales, since scaling isn't fast for gaming. 

 

Which....starts to lean toward a more traditional datacenter approach, because even with AWS you'll have to pay for idle servers to have good performance. 
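The warm-fleet argument comes down to timing, and a toy timeline makes it concrete. The boot time, battle length, and server counts below are assumptions for illustration only: the point is that capacity started cold at the moment a spike hits only comes online after the spike is over, so only pre-warmed servers can absorb it.

```python
# Toy timeline (hypothetical numbers) for the idle-fleet argument:
# cold instances take minutes to boot, so only warm standby capacity
# can absorb demand that arrives "now", e.g. a surprise fleet battle.

COLD_BOOT_SECONDS = 180      # assumed instance launch + app start time
BATTLE_LENGTH_SECONDS = 120  # the fight may end before a cold boot finishes

def capacity_at(t, warm_servers, cold_started_at):
    """Servers actually able to serve at time t (seconds after the spike)."""
    cold_ready = [s for s in cold_started_at if t - s >= COLD_BOOT_SECONDS]
    return warm_servers + len(cold_ready)

# Spike hits at t=0; the autoscaler reacts by starting 8 cold instances
# immediately, and we also keep 4 warm servers on standby.
warm = 4
cold_starts = [0] * 8

print(capacity_at(60, warm, cold_starts))   # mid-battle: only warm capacity
print(capacity_at(200, warm, cold_starts))  # cold nodes ready, battle over
```

At t=60 only the 4 warm servers are serving; by the time the 8 cold ones are ready at t=180, the 120-second battle has already ended.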

 

Quote

2. Hire a technical director who will oversee the transition of your infrastructure to a self-hosted or dedicated-hosted model with CDN integration, like CloudFlare, which can deliver unlimited game assets over high-speed HTTP for a laughable cost of $200/mo. Run from AWS; it's killing you, like it has killed many, many start-ups.
 


Yet on the other side....you can't just throw "game assets" on a CDN and call it a day...that's not how it works. Regardless, AWS has a CDN that's very competitively priced. It's likely NQ already uses it.

 

A CDN is not a server. It isn't a database. Suggesting a CDN as a solution is like suggesting that Google use FTP to run their search...

 

I would wager AWS has helped empower more startups at lower cost than it has killed -- but to be fair even Amazon had embarrassing issues with scale when launching their MMO on AWS, so it doesn't have the best track record in the context of MMOs. 

 

Regardless, this is an 8-year-old project...the time to switch hosts was long, long ago. It's far too late to pick everything up and migrate to a different hosting model. 


There are some excellent private cloud solutions that would give NQ far greater flexibility and the ability to scale quickly. At the time NQ started their journey, these were really not yet in place.

 

NQ has so far, however, not shown an understanding of this, nor have they actively engaged with opportunities presented to them that would allow them to take advantage of such options.


2 hours ago, blundertwink said:


This is true, but maybe not the entire story.

 

An MMO needs a huge fleet of idle servers to properly handle scale with AWS -- it can't completely rely on AWS auto-scaling techniques because they aren't fast enough for gaming. Not with traditional EC2.

 

Especially in the context of a single-shard system, auto-scaling isn't that effective because of the bursty, spiky nature of game traffic.

 

A group of players meeting for a battle will demand extra capacity almost immediately (especially with multi-crew ships) -- by the time AWS has scaled up those instances, the battle will likely be over. So it isn't just users logging in, it's the dynamics of how they play that dramatically impacts scale...especially when all your players share one physical game space. 

 

The only way to smooth out these spikes is to maintain a fleet of idle servers that can buy time as the rest of the system scales, since scaling isn't fast for gaming. 

 

Which....starts to lean toward a more traditional datacenter approach, because even with AWS you'll have to pay for idle servers to have good performance. 

 

 

I disagree again here.  If there is a large quantity of idle servers then that's doing it wrong.  You can spin up nodes surprisingly fast in AWS (well under a minute), and you should be able to predict when that's necessary.  Also, in games like this, particular areas tend to be serviced by individual servers, so a large battle would typically all run on one server.  That's how EVE works, for example.  So really the number of servers in use at any given time depends on how many people are logged in, and that is definitely predictable, with the time taken to launch new servers not much different from the time taken to log into the game in the first place.

And even if the game does require idle servers, it will probably only require them for a few hours of the day, and you *will* have enough notice to create them when logins spike or when people start moving near each other.
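The "logins are predictable" point amounts to pre-provisioning from a forecast rather than reacting after the spike. The sketch below does this with a historical daily login curve; the curve, the 50-players-per-server ratio, and the 20% headroom factor are all assumed numbers, not anything NQ has published.

```python
# Sketch of schedule-based pre-warming: size the fleet for the coming hour
# from a historical login curve plus headroom, instead of reacting late.
import math

PLAYERS_PER_SERVER = 50  # assumed capacity per game server
HEADROOM = 1.2           # assumed 20% buffer over the forecast

def servers_to_prewarm(hourly_logins, hour):
    """Forecast next hour's players as the historical value for that hour."""
    forecast = hourly_logins[hour % 24]
    return max(1, math.ceil(forecast * HEADROOM / PLAYERS_PER_SERVER))

# Hypothetical daily curve: low overnight, evening prime-time peak.
curve = [100] * 7 + [300] * 5 + [500] * 5 + [1200] * 4 + [400] * 3

print(servers_to_prewarm(curve, 19))  # prime time: scale up ahead of the rush
print(servers_to_prewarm(curve, 3))   # overnight: shrink to a small baseline
```

The same calculation could feed a scheduled scaling action, so capacity is already online before the evening rush rather than booting during it.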

Also, there are some aspects of the game which would work with a CDN.  Game updates, for example.  Also, I imagine there's a large amount of static data which gets updated with modifications.  It certainly seems that when I get far away from ground I've modified, or when I go in with a surrogate, the game starts with the original ground shape and then applies changes to that.
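That "original shape plus applied changes" observation maps to a simple pattern: the base terrain is immutable and cacheable on a CDN, while player edits travel as a small diff layered on top client-side. The data shapes below (heightmaps as coordinate-to-height dicts) are invented purely to illustrate the idea.

```python
# Sketch of the static-base-plus-diff pattern: immutable CDN-cacheable
# terrain, with small per-tile edit patches applied on the client.

def apply_patches(base_heightmap, patches):
    """Return terrain with player edits layered over the cached base."""
    terrain = dict(base_heightmap)  # copy, so the cached base stays pristine
    for (x, y), new_height in patches:
        terrain[(x, y)] = new_height
    return terrain

base = {(0, 0): 10, (0, 1): 10, (1, 0): 12}  # immutable, CDN-deliverable
edits = [((0, 1), 4), ((1, 0), 0)]           # small dynamic diff (a dug hole)

print(apply_patches(base, edits))
```

Only the diff has to come from a live server; the bulk of the bytes (the base) can sit behind edge caches, which is exactly why this split is CDN-friendly.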

This is all speculation, of course; I'm just challenging the assumption that AWS is bad for a small company to build on.  Having actually built one on AWS, I found it to be perfect: it gave us access to the sort of technology we could never have had otherwise, because we couldn't afford the large upfront investments and long-term financial commitments required to have those things.  IMO AWS is perfect for small businesses, and it's only when you get to medium scale (thousands of people and a high six-figure annual AWS bill) that you can start to think about doing better with hosting.


2 hours ago, blazemonger said:

There are some excellent private cloud solutions that would give NQ far greater flexibility and the ability to scale quickly. At the time NQ started their journey, these were really not yet in place.

 

NQ has so far, however, not shown an understanding of this, nor have they actively engaged with opportunities presented to them that would allow them to take advantage of such options.

The problem with private clouds, though, is capacity and availability.  Say you need 10 TB of RAM and more than 10,000 cores at peak, but less than a tenth of that for 70% of the time.  Private clouds are unlikely to have that sort of kit sitting around waiting to go, so it's not certain you'll be able to have it.  With AWS they can just kill some spot instances and give you what you need right away.  I once started a system with 40 TB of RAM and got all the nodes I needed in under 2 minutes.  It was definitely expensive, but I had a system which would cost millions to buy and was paying hundreds per hour for it.
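The rent-versus-buy trade-off in the anecdote above is just a break-even calculation: how many rented hours before buying would have been cheaper? The purchase price and hourly rate below are hypothetical placeholders in the spirit of the "millions to buy, hundreds per hour" figures, not real quotes.

```python
# Back-of-envelope break-even for burst capacity: renting a large cluster
# by the hour vs buying it outright. All figures are hypothetical.

PURCHASE_COST = 2_000_000  # assumed price to own comparable hardware
HOURLY_RENT = 500          # assumed cloud cost per hour for the same capacity

def breakeven_hours(purchase, hourly):
    """Hours of rental after which buying would have been cheaper."""
    return purchase / hourly

hours = breakeven_hours(PURCHASE_COST, HOURLY_RENT)
print(f"{hours:.0f} rented hours (~{hours / 24:.0f} days of 24/7 use)")
```

For capacity you only need a few hours at a time, the break-even point is years away, which is why renting wins for rare bursts and owning wins for flat 24/7 load.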

It depends how much flexibility you need really ...


You use both..

For a game like this I would self-host the servers needed for regular player numbers, and then use cloud to acquire short-term compute when needed.

 

But if the game is built around AWS then it is not a trivial thing to move away, since AWS services are carefully designed so that you can't easily switch to something else afterwards.
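The hybrid model described here is a simple split rule: everything up to a fixed self-hosted baseline runs on owned hardware, and only the overflow is rented. The baseline size below is an assumption chosen for illustration.

```python
# Sketch of the hybrid model: owned hardware covers the regular player
# baseline; demand above it overflows to rented cloud capacity.

BASELINE_SERVERS = 30  # assumed owned fleet sized for typical load

def split_load(servers_needed, baseline=BASELINE_SERVERS):
    """Return (self_hosted, cloud_burst) server counts for a demand level."""
    self_hosted = min(servers_needed, baseline)
    cloud_burst = max(0, servers_needed - baseline)
    return self_hosted, cloud_burst

print(split_load(22))  # quiet evening: everything on owned hardware
print(split_load(75))  # event spike: the overflow is rented by the hour
```

The design choice is that the expensive-to-change part (owned hardware) only ever covers demand you are confident will recur, while anything uncertain stays on pay-as-you-go terms.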


Just now, CptLoRes said:

You use both..

For a game like this I would self-host the servers needed for regular player numbers, and then use cloud to acquire short-term compute when needed.

 

But if the game is built around AWS then it is not a trivial thing to move away, since AWS services are carefully designed so that you can't just easily switch to something else afterwards.

Again, it depends on how much flexibility you need here.  Generally speaking you want all the servers in your cluster 'close' to each other, meaning in the same datacenter; otherwise the round-trip times on the interactions mean you can't really get as much power as you think you're going to get.  In practice, this means that if your usage is flat you could self-host, but if it's going to spike or scale (in both directions) you can't do that.

Say you're DU, for example, and you do a big beta launch event.  The servers start to creak and grind, and people are queueing longer than normal.  You scale up and all is well.  Then you make a mistake with your 0.23 update (for example) and 80% of the players leave.  If you were self-hosted you are now screwed, because when you scaled up you probably bought servers on a 5-year loan, extended your switching capacity way beyond what you now need, signed a new 1-year deal with a hosting provider, or bought new SAN shelves for storage that your ops people still haven't finished getting online.

With AWS you just turn them off again.

What that means is that you really want to start this stuff off in a public cloud, with lots of cheaply available power and lots of pre-made solutions to some of your problems (like backup), and then just grow and focus on making a good game.  Later, when it's stable and big enough that a sudden influx of players (or exodus when New World launches, or whatever) is unlikely, you look at more static solutions and compare the price.

 


3 hours ago, Zeddrick said:

The problem with private clouds, though, is capacity and availability.  Say you need 10 TB of RAM and more than 10,000 cores at peak, but less than a tenth of that for 70% of the time.  Private clouds are unlikely to have that sort of kit sitting around waiting to go

It depends how much flexibility you need really ...

 

While I take it you are exaggerating with 10,000 cores for demonstration purposes and not in the context of DU, a solution like HPE GreenLake would offer just that; it's literally pay-as-you-go, where you can have your max expected capacity on premise, ready to be scaled up when needed. The hardware sits on premise, on the edge between the local DC and the (private) cloud.

 


1 hour ago, blazemonger said:

 

While I take it you are exaggerating with 10,000 cores for demonstration purposes and not in the context of DU, a solution like HPE GreenLake would offer just that; it's literally pay-as-you-go, where you can have your max expected capacity on premise, ready to be scaled up when needed. The hardware sits on premise, on the edge between the local DC and the (private) cloud.

 

I'm not a huge fan of HPE.  I haven't been involved with it for a few years, but it was always the sort of organisation where you had a problem, tried to use HPE to solve it, and then you had two problems.  Or three or four.

 

Again, it depends on how available and flexible the hardware really is and what you need.  Can they give you twice as many nodes for one hour a day and only charge 5% extra, for example?

