The Dulin Report

Building a Supercomputer in AWS: Is it even worth it?

April 13, 2015

[Image: Columbia Supercomputer. Photo credit: Scott Beale]

The fact that Cray is still around is mind-boggling. You'd think that commodity hardware and network technologies would long since have made supercomputing affordable for anyone interested. And yet, Cray Sells One of the World's Fastest Systems:
“This, to IDC’s knowledge, is the largest supercomputer sold into the O&G sector and will be one of the biggest in any commercial market,” the report stated. “The system would have ranked in the top dozen on the November 2014 list of the world’s Top500 supercomputers.”

Building one of the dozen fastest supercomputers isn’t new for Cray – they’ve got three in the current top 12 now. But what is unique is that most of those 12 belong to government research labs or universities, not private companies. This may be starting to change, however. For example, IDC notes that overall supercomputing spending in the oil and gas sector alone is expected to reach $2 billion in the period from 2013-2018.

Supercomputers come with astronomical costs:
So, you’re in the market for a top-of-the-line supercomputer. Aside from the $6 to $7 million in annual energy costs, you can expect to pay anywhere from $100 million to $250 million for design and assembly, not to mention the maintenance costs

In the 1990s I was involved in a student project to build a Linux Beowulf cluster out of commodity components. It involved half a dozen quad-core servers with something like a gigabyte of RAM each. It cost a fortune, and we had to obtain NSF funding for the project. I don't recall the exact details.

I know that a similarly configured modern cluster in AWS would cost a few hundred bucks a month if it were used continuously. But even the cluster we built at Clarkson was not used 24/7, so if done right, the same cluster would cost a fraction of that in AWS.

Turns out I am not the only one who had an idea to build a Beowulf cluster in AWS:
After running through Amazon’s EC2 Getting Started Guide, and Peter’s posts I was up and running with a new beowulf cluster in well under an hour. I pushed up and distributed some tests and it seems to work. Now, it’s not fast compared to even a low-end contemporary HPC, but it is cheap and able to scale up to 20 nodes with only a few simple calls. That’s nothing to sneeze at and I don’t have to convince the wife or the office to allocate more space to house 20 nodes.

That last statement is important. Setting aside the costs, imagine the red tape required to put something like that together with the help of your on-premises IT department?

At an AWS Summit a couple of years ago, Bristol-Myers Squibb gave a talk on running drug trial simulations in AWS:
Bristol-Myers Squibb (BMS) is a global biopharmaceutical company committed to discovering, developing and delivering innovative medicines that help patients prevail over serious diseases. BMS used AWS to build a secure, self-provisioning portal for hosting research so scientists can run clinical trial simulations on-demand while BMS is able to establish rules that keep compute costs low. Compute-intensive clinical trial simulations that previously took 60 hours are finished in only 1.2 hours on the AWS Cloud. Running simulations 98% faster has led to more efficient and less costly clinical trials and better conditions for patients.

If I interpret that case study correctly, BMS didn't even bother with an on-premises supercomputer for this.

AWS of course is happy to oblige:
AWS allows you to increase the speed of research by running high performance computing in the cloud and to reduce costs by providing Cluster Compute or Cluster GPU servers on-demand without large capital investments. You have access to a full-bisection, high bandwidth network for tightly-coupled, IO-intensive workloads, which enables you to scale out across thousands of cores for throughput-oriented applications.

So, what would it cost to set up one of the world's most powerful supercomputers in AWS and run it for one month? I fully realize that this may not be a very accurate discussion, but let's humor ourselves and try to imagine the biggest of the Top 500 supercomputers built in AWS.

As of June 2013, the biggest supercomputer was at the National University of Defense Technology in China, and it had 3,120,000 CPU cores. Let's eyeball this in AWS using Amazon's cost calculator. I put together a couple of different HPC configurations.

Amazon's g2.2xlarge instances have 8 cores and 15 gigabytes of RAM each. To get to 3,120,000 cores, one would need 390,000 instances, which would cost $185,562,000.00 for a month, not including business support.
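Here is a quick sketch of the arithmetic behind that figure. It uses only the numbers quoted above; the per-instance and per-core costs are derived from the calculator total, not taken from an AWS price list, and the variable names are mine.

```python
import math

TOTAL_CORES = 3_120_000              # core count quoted above
CORES_PER_INSTANCE = 8               # g2.2xlarge, as stated above
ON_DEMAND_MONTHLY_USD = 185_562_000  # calculator total quoted above

# Round up, because a partial instance cannot be rented.
instances = math.ceil(TOTAL_CORES / CORES_PER_INSTANCE)

# Implied unit costs, derived only from the numbers above.
per_instance_monthly = ON_DEMAND_MONTHLY_USD / instances
per_core_monthly = ON_DEMAND_MONTHLY_USD / TOTAL_CORES

print(f"instances needed:       {instances:,}")
print(f"per instance per month: ${per_instance_monthly:,.2f}")
print(f"per core per month:     ${per_core_monthly:,.2f}")
```

Swapping in a different instance type only requires changing the cores per instance and the monthly total.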

If you use No-Upfront Reserved for 1 year, the cost becomes $134,947,800.00 per month for a year. Three Year All-Upfront Reserved costs $2,889,900,000.00 up front and $100 a month.
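To see how these three pricing options compare over time, here is a rough sketch using only the g2.2xlarge figures above. It assumes the cluster runs flat out for the whole term and that prices never change, neither of which is realistic. The variable names are mine.

```python
import math

ON_DEMAND_MONTHLY = 185_562_000       # on-demand, per month
NO_UPFRONT_1YR_MONTHLY = 134_947_800  # 1-year No-Upfront Reserved, per month
ALL_UPFRONT_3YR = 2_889_900_000       # 3-year All-Upfront, paid once
ALL_UPFRONT_3YR_MONTHLY = 100         # residual monthly charge quoted above

# One year: on-demand versus No-Upfront Reserved.
on_demand_12 = 12 * ON_DEMAND_MONTHLY
no_upfront_12 = 12 * NO_UPFRONT_1YR_MONTHLY
savings_1yr = 1 - no_upfront_12 / on_demand_12

# Three years: on-demand versus All-Upfront Reserved.
on_demand_36 = 36 * ON_DEMAND_MONTHLY
all_upfront_36 = ALL_UPFRONT_3YR + 36 * ALL_UPFRONT_3YR_MONTHLY

# First month in which cumulative on-demand spend catches up
# with the all-upfront spend.
break_even_month = math.ceil(
    ALL_UPFRONT_3YR / (ON_DEMAND_MONTHLY - ALL_UPFRONT_3YR_MONTHLY)
)

print(f"1 year on-demand:      ${on_demand_12:,}")
print(f"1 year no-upfront:     ${no_upfront_12:,} ({savings_1yr:.1%} less)")
print(f"3 years on-demand:     ${on_demand_36:,}")
print(f"3 years all-upfront:   ${all_upfront_36:,}")
print(f"break-even vs. on-demand: month {break_even_month}")
```

With these figures, the three-year commitment only beats on-demand if you would otherwise run the full cluster for about 16 months or more.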

Now, here is an important factor. On premises you have to build out the maximum capacity you will ever use, but in the cloud you can dynamically scale up and down as required by your workload. Whereas supercomputing was once the domain of governments and wealthy corporations, it is now within reach of anyone building out in AWS.
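A minimal sketch of what that scaling could mean for the bill, assuming cost falls linearly with the fraction of the month the cluster is actually running. That ignores spin-up time, storage, and data transfer. The duty cycles are hypothetical, and the full-month figure is the g2.2xlarge on-demand number from earlier.

```python
FULL_MONTH_ON_DEMAND_USD = 185_562_000  # g2.2xlarge on-demand figure quoted earlier


def scaled_monthly_cost(utilization: float) -> float:
    """On-demand cost for a month when the cluster runs only part of the time.

    utilization is the fraction of the month the full cluster is up (0 to 1).
    Assumes linear billing, which is a simplification.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return FULL_MONTH_ON_DEMAND_USD * utilization


for utilization in (0.05, 0.25, 0.50, 1.00):  # hypothetical duty cycles
    cost = scaled_monthly_cost(utilization)
    print(f"{utilization:>4.0%} busy -> ${cost:,.0f} per month")
```

An on-premises machine costs the same whether it is busy or idle, which is the whole point of the comparison.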

Let's try this with c4.8xlarge. On-demand, this costs $119,901,600.00 a month. Three Year All-Upfront is $1,609,335,000.00.

Of course, we don't know whether it is even possible to quickly spin up a few hundred thousand servers on AWS. How long would that take? This would probably require a conversation with AWS sales, and probably a volume discount. But either way, something tells me that for such large specialized computational workloads it would be naive to assume that building a supercomputer in the cloud would be cheaper.
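To put rough numbers on that hunch, here is a sketch comparing three years of ownership against the Three Year All-Upfront totals above. The on-premises range comes from the cost quote earlier in this post. That quote leaves out maintenance and may not describe a machine of this exact size, so treat this as an order-of-magnitude comparison only.

```python
YEARS = 3

# On-premises figures from the cost quote earlier in this post (maintenance excluded).
BUILD_COST_USD = (100_000_000, 250_000_000)   # low, high
ENERGY_PER_YEAR_USD = (6_000_000, 7_000_000)  # low, high

# Three Year All-Upfront totals from the AWS calculator runs above.
CLOUD_3YR_USD = {
    "g2.2xlarge": 2_889_900_000,
    "c4.8xlarge": 1_609_335_000,
}

on_prem_low = BUILD_COST_USD[0] + YEARS * ENERGY_PER_YEAR_USD[0]
on_prem_high = BUILD_COST_USD[1] + YEARS * ENERGY_PER_YEAR_USD[1]
print(f"on-premises, {YEARS} years: ${on_prem_low:,} to ${on_prem_high:,}")

for instance_type, cloud_total in CLOUD_3YR_USD.items():
    # Compare against the expensive end of the on-premises range.
    ratio = cloud_total / on_prem_high
    print(f"{instance_type}: ${cloud_total:,} ({ratio:.1f}x the high on-premises estimate)")
```

Even against the high end of the on-premises range, the cheaper of the two AWS configurations comes out several times more expensive under these assumptions.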

This is why renting supercomputing time is still more efficient than either owning a supercomputer or trying to spin one up in the cloud.