

Building a Supercomputer in AWS: Is it even worth it?

April 13, 2015

Columbia Supercomputer. Photo credit: Scott Beale.

The fact that Cray is still around is mind-boggling. You'd think that commodity hardware and network technologies would have long since made supercomputing affordable for anyone interested. And yet, Cray Sells One of the World's Fastest Systems:
“This, to IDC’s knowledge, is the largest supercomputer sold into the O&G sector and will be one of the biggest in any commercial market,” the report stated. “The system would have ranked in the top dozen on the November 2014 list of the world’s Top500 supercomputers.”

Building one of the dozen fastest supercomputers isn’t new for Cray – they’ve got three in the current top 12 now. But what is unique is that most of those 12 belong to government research labs or universities, not private companies. This may be starting to change, however. For example, IDC notes that overall supercomputing spending in the oil and gas sector alone is expected to reach $2 billion in the period from 2013-2018.

Supercomputers come with astronomical costs:
So, you’re in the market for a top-of-the-line supercomputer. Aside from the $6 to $7 million in annual energy costs, you can expect to pay anywhere from $100 million to $250 million for design and assembly, not to mention the maintenance costs

In the 1990s I was involved in a student project to build a Linux Beowulf cluster out of commodity components. It involved half a dozen quad-processor servers with something like a gigabyte of RAM each. It cost a fortune, and it required us to obtain NSF funding; I don't recall the exact details.

I know that a similarly configured modern cluster in AWS would cost a few hundred bucks a month if it ran continuously. But even the cluster we built at Clarkson was not used 24/7, so done right, the same cluster would have cost a fraction of that in AWS.
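To make that concrete, here is a rough back-of-the-envelope sketch in Python. The instance type and hourly rate are my assumptions (ballpark 2015-era on-demand pricing), not figures from AWS:

```python
# Back-of-the-envelope: a six-node cluster in EC2, on-demand.
# Instance choice and rate are assumptions for illustration:
# t2.medium (2 vCPUs, 4 GB RAM) at roughly $0.052/hour circa 2015.
NODES = 6
HOURLY_RATE = 0.052      # USD per instance-hour (assumed)
HOURS_PER_MONTH = 732    # ~30.5 days

always_on = NODES * HOURLY_RATE * HOURS_PER_MONTH
# A student cluster sees nowhere near 24/7 use; call it 25% utilization.
pay_per_use = always_on * 0.25

print(f"Running 24/7:    ${always_on:,.2f}/month")    # ~$228
print(f"25% utilization: ${pay_per_use:,.2f}/month")  # ~$57
```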

It turns out I am not the only one who has had the idea of building a Beowulf cluster in AWS:
After running through Amazon’s EC2 Getting Started Guide, and Peter’s posts I was up and running with a new beowulf cluster in well under an hour. I pushed up and distributed some tests and it seems to work. Now, it’s not fast compared to even a low-end contemporary HPC, but it is cheap and able to scale up to 20 nodes with only a few simple calls. That’s nothing to sneeze at and I don’t have to convince the wife or the office to allocate more space to house 20 nodes.

That last statement is important. Setting aside the costs, imagine the red tape required to put something like that together with the help of your on-premise IT department.
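For a sense of how little ceremony the cloud version takes, here is a minimal sketch using boto3. The AMI ID and key pair are placeholders, and the image is assumed to come preloaded with the Beowulf/MPI stack:

```python
# Minimal sketch: launch a 20-node cluster in EC2 with boto3.
# ami-xxxxxxxx and my-keypair are placeholders; the AMI is assumed
# to have the cluster stack (MPI, shared keys, etc.) baked in.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-xxxxxxxx",    # placeholder cluster-node image
    InstanceType="c4.xlarge",  # pick to taste
    KeyName="my-keypair",      # placeholder key pair
    MinCount=20,               # all-or-nothing: demand the full cluster
    MaxCount=20,
)

node_ids = [i["InstanceId"] for i in response["Instances"]]
print(f"Launched {len(node_ids)} nodes: {node_ids}")
```

Twenty nodes in one API call, and tearing them down again is just as quick. No purchase orders, no rack space, no one to convince.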

At an AWS Summit a couple of years ago, Bristol-Myers Squibb gave a talk on running drug trial simulations in AWS:
Bristol-Myers Squibb (BMS) is a global biopharmaceutical company committed to discovering, developing and delivering innovative medicines that help patients prevail over serious diseases. BMS used AWS to build a secure, self-provisioning portal for hosting research so scientists can run clinical trial simulations on-demand while BMS is able to establish rules that keep compute costs low. Compute-intensive clinical trial simulations that previously took 60 hours are finished in only 1.2 hours on the AWS Cloud. Running simulations 98% faster has led to more efficient and less costly clinical trials and better conditions for patients.

If I interpret that case study correctly, BMS didn't even bother with an on-premise supercomputer for this.

AWS, of course, is happy to oblige:
AWS allows you to increase the speed of research by running high performance computing in the cloud and to reduce costs by providing Cluster Compute or Cluster GPU servers on-demand without large capital investments. You have access to a full-bisection, high bandwidth network for tightly-coupled, IO-intensive workloads, which enables you to scale out across thousands of cores for throughput-oriented applications.
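That "full-bisection, high bandwidth network" comes from launching instances into a placement group, which packs them close together on the network. A sketch of what that looks like with boto3 (the group name and AMI are placeholders):

```python
# Sketch: launch tightly coupled HPC nodes into a placement group,
# which provides the low-latency, full-bisection networking AWS describes.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# The "cluster" strategy co-locates instances for network performance.
ec2.create_placement_group(GroupName="hpc-demo", Strategy="cluster")

ec2.run_instances(
    ImageId="ami-xxxxxxxx",               # placeholder HPC node image
    InstanceType="c4.8xlarge",            # compute-optimized instance type
    MinCount=8,
    MaxCount=8,
    Placement={"GroupName": "hpc-demo"},  # join the placement group
)
```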

So, what would it cost to set up one of the world's most powerful supercomputers in AWS and run it for one month? I fully realize that this may not be a very accurate exercise, but let's humor ourselves and try to imagine the biggest of the Top 500 supercomputers built in AWS.

As of June 2013, the biggest supercomputer was Tianhe-2 at the National University of Defense Technology in China, with 3,120,000 CPU cores. Let's eyeball this in AWS using Amazon's cost calculator. I put together a couple of different HPC configurations.

Amazon's g2.2xlarge instances have 8 cores and 15 gigabytes of RAM each. To get to 3,120,000 cores one would need 390,000 instances, which would cost $185,562,000.00 for a month, not including business support.

If you use No-Upfront Reserved Instances for one year, the cost becomes $134,947,800.00 per month for that year. Three-Year All-Upfront Reserved costs $2,889,900,000.00 up front and $100 a month.
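The on-demand figure is easy to sanity-check. Working backwards from the totals above, the calculator appears to assume about 732 hours in a month and a $0.65/hour on-demand rate for g2.2xlarge (both inferred from the arithmetic, not quoted from a price sheet):

```python
import math

HOURS_PER_MONTH = 732  # implied by the totals in this post

def monthly_on_demand(total_cores, cores_per_instance, hourly_rate):
    """Instances needed to reach total_cores, and their monthly on-demand cost."""
    instances = math.ceil(total_cores / cores_per_instance)
    return instances, instances * hourly_rate * HOURS_PER_MONTH

# g2.2xlarge: 8 cores at ~$0.65/hour
instances, cost = monthly_on_demand(3_120_000, 8, 0.65)
print(f"{instances:,} instances, ${cost:,.2f}/month")
# -> 390,000 instances, $185,562,000.00/month
```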

Now, here is an important factor. On premises you have to build out the maximum capacity you will ever use, but in the cloud you can dynamically scale up and down as your workload requires. Whereas supercomputing was once the domain of governments and wealthy corporations, it is now within reach of anyone building out in AWS.

Let's try this with c4.8xlarge. On-demand, this costs $119,901,600.00 a month. Three-Year All-Upfront is $1,609,335,000.00.
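Running the same sanity check against the c4.8xlarge total, the numbers work out exactly if each instance is counted as 32 cores at about $1.68/hour; again, both values are inferred from the arithmetic rather than quoted from AWS:

```python
import math

HOURS_PER_MONTH = 732
# c4.8xlarge, counted here as 32 cores at ~$1.68/hour (both values
# inferred from the post's monthly total, not from a price sheet).
instances = math.ceil(3_120_000 / 32)
cost = instances * 1.68 * HOURS_PER_MONTH
print(f"{instances:,} instances, ${cost:,.2f}/month")
# -> 97,500 instances, $119,901,600.00/month
```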

Of course, we don't even know if such a thing is possible on AWS -- quickly spinning up a few hundred thousand servers. How long would that take? This would probably require a conversation with AWS sales, and no doubt a volume discount. But either way, something tells me that for such large, specialized computational workloads it would be naive to assume that building a supercomputer in the cloud would be cheaper.

This is why renting supercomputer time is still more efficient than either owning a machine or trying to spin one up in the cloud.