Archive

The Dulin Report

Browsable archive from the WordPress export.

Results (46)

On the role of Distinguished Engineer and CTO Mindset (Apr 27, 2025)
The future is bright (Mar 30, 2025)
2024 Reflections (Dec 31, 2024)
Working from home works as well as any distributed team (Nov 25, 2022)
Good developers can pick up new programming languages (Jun 3, 2022)
In most cases, there is no need for NoSQL (Apr 18, 2022)
Kitchen table conversations (Nov 7, 2021)
Returning security back to the user (Feb 2, 2019)
Let’s talk cloud neutrality (Sep 17, 2018)
What does a Chief Software Architect do? (Jun 23, 2018)
Leaving Facebook and Twitter: here are the alternatives (Mar 25, 2018)
When politics and technology intersect (Mar 24, 2018)
Nobody wants your app (Aug 2, 2017)
The technology publishing industry needs to transform in order to survive (Jun 30, 2017)
Rather than innovating Walmart bullies their tech vendors to leave AWS (Jun 27, 2017)
I tried an Apple Watch for two days and I hated it (Mar 30, 2017)
Copyright in the 21st century or how "IT Gurus of Atlanta" plagiarized my and other's articles (Mar 21, 2017)
Emails, politics, and common sense (Jan 14, 2017)
Here is to a great 2017! (Dec 26, 2016)
What I learned from using Amazon Alexa for a month (Sep 7, 2016)
Amazon Alexa is eating the retailers alive (Jun 22, 2016)
In Support Of Gary Johnson (Jun 13, 2016)
Why it makes perfect sense for Dropbox to leave AWS (May 7, 2016)
JEE in the cloud era: building application servers (Apr 22, 2016)
In memory of Ed Yourdon (Jan 23, 2016)
Operations costs are the Achille's heel of NoSQL (Nov 23, 2015)
Banking Technology is in Dire Need of Standartization and Openness (Sep 28, 2015)
I Stand With Ahmed (Sep 19, 2015)
Top Ten Differences Between ActiveMQ and Amazon SQS (Sep 5, 2015)
What Every College Computer Science Freshman Should Know (Aug 14, 2015)
On Maintaining Personal Brand as a Software Engineer (Aug 2, 2015)
Social Media Detox (Jul 11, 2015)
The Three Myths About JavaScript Simplicity (Jul 10, 2015)
Your IT Department's Kodak Moment (Jun 17, 2015)
Big Data is not all about Hadoop (May 30, 2015)
Smart IT Departments Own Their Business API and Take Ownership of Data Governance (May 13, 2015)
Building a Supercomputer in AWS: Is it even worth it? (Apr 13, 2015)
Microsoft and Apple Have Everything to Lose if Chromebooks Succeed (Mar 31, 2015)
Why I am Tempted to Replace Cassandra With DynamoDB (Nov 13, 2014)
Software Engineering and Domain Area Expertise (Nov 7, 2014)
Docker can fundamentally change how you think of server deployments (Aug 26, 2014)
Wall St. wakes up to underinvestment in OMS (Aug 21, 2014)
"Hello, World!" Using Apache Thrift (Feb 24, 2013)
Thoughts on Wall Street Technology (Aug 11, 2012)
Happy New Year! (Jan 1, 2012)
Eminence Grise: A trusted advisor (May 13, 2009)

Building a Supercomputer in AWS: Is it even worth it?

April 13, 2015

Columbia Supercomputer. Photo credit: Scott Beale.

The fact that Cray is still around is mind-boggling. You'd think that commodity hardware and network technologies would have long since made supercomputing affordable for anyone interested. And yet, Cray Sells One of the World's Fastest Systems:
“This, to IDC’s knowledge, is the largest supercomputer sold into the O&G sector and will be one of the biggest in any commercial market,” the report stated. “The system would have ranked in the top dozen on the November 2014 list of the world’s Top500 supercomputers.”

Building one of the dozen fastest supercomputers isn’t new for Cray – they’ve got three in the current top 12 now. But what is unique is that most of those 12 belong to government research labs or universities, not private companies. This may be starting to change, however. For example, IDC notes that overall supercomputing spending in the oil and gas sector alone is expected to reach $2 billion in the period from 2013-2018.

Supercomputers come with astronomical costs:
So, you’re in the market for a top-of-the-line supercomputer. Aside from the $6 to $7 million in annual energy costs, you can expect to pay anywhere from $100 million to $250 million for design and assembly, not to mention the maintenance costs.

In the 1990s I was involved in a student project to build a Linux Beowulf cluster out of commodity components. It involved half a dozen quad-processor servers with something like a gigabyte of RAM each. It cost a fortune, and it required us to obtain NSF funding for the project. I don't recall the exact details.

I know that a similarly configured cluster in AWS today would cost a few hundred bucks a month if it were used continuously. But even the cluster we built at Clarkson was not used 24/7, so done right, the same cluster would cost a fraction of that in AWS.
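As a back-of-the-envelope check, here is a minimal sketch of that claim. The instance type and hourly rate are my assumptions (six m3.large instances as a rough stand-in for those old nodes, at the 2015-era us-east-1 on-demand price), not figures from the original project:

```python
# Back-of-the-envelope cost for a six-node commodity cluster in EC2.
# Assumptions, not figures from the original project: m3.large
# instances (2 vCPUs, 7.5 GB RAM) at the 2015-era on-demand rate.
NODES = 6
HOURLY_RATE = 0.133      # USD/hour, approximate 2015 m3.large price
HOURS_PER_MONTH = 732    # the hours-per-month figure Amazon's calculator used

always_on = NODES * HOURLY_RATE * HOURS_PER_MONTH
part_time = NODES * HOURLY_RATE * 8 * 30   # e.g. used eight hours a day

print(f"24/7:      ${always_on:,.2f}/month")   # ~ $584
print(f"8 hrs/day: ${part_time:,.2f}/month")   # ~ $192
```

The second line is the point: pay only for the hours the cluster actually runs, and the bill shrinks accordingly.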

It turns out I am not the only one who has had the idea of building a Beowulf cluster in AWS:
After running through Amazon’s EC2 Getting Started Guide, and Peter’s posts I was up and running with a new beowulf cluster in well under an hour. I pushed up and distributed some tests and it seems to work. Now, it’s not fast compared to even a low-end contemporary HPC, but it is cheap and able to scale up to 20 nodes with only a few simple calls. That’s nothing to sneeze at and I don’t have to convince the wife or the office to allocate more space to house 20 nodes.

That last statement is important. Setting aside the costs, imagine the red tape involved in putting something like that together with the help of your on-premises IT department.
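To show just how few calls it takes, here is a minimal boto3 sketch that launches a 20-node cluster. The AMI ID, key pair, and instance type are placeholders; in practice you would bake your MPI stack into the image first:

```python
# Minimal sketch: spin up a 20-node cluster with one API call.
# The AMI ID, key name, and instance type are placeholders.
import boto3

ec2 = boto3.resource("ec2", region_name="us-east-1")

nodes = ec2.create_instances(
    ImageId="ami-xxxxxxxx",    # placeholder: image with your MPI stack baked in
    InstanceType="c4.xlarge",
    KeyName="my-keypair",      # placeholder
    MinCount=20,
    MaxCount=20,
)

# Tear the whole cluster down just as easily when the job is done.
for node in nodes:
    node.terminate()
```

No purchase orders, no rack space, no sign-off from facilities.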

At an AWS Summit a couple of years ago, Bristol-Myers Squibb gave a talk on running drug trial simulations in AWS:
Bristol-Myers Squibb (BMS) is a global biopharmaceutical company committed to discovering, developing and delivering innovative medicines that help patients prevail over serious diseases. BMS used AWS to build a secure, self-provisioning portal for hosting research so scientists can run clinical trial simulations on-demand while BMS is able to establish rules that keep compute costs low. Compute-intensive clinical trial simulations that previously took 60 hours are finished in only 1.2 hours on the AWS Cloud. Running simulations 98% faster has led to more efficient and less costly clinical trials and better conditions for patients.

If I interpret that case study correctly, BMS didn't even bother with an on-premises supercomputer for this.

AWS, of course, is happy to oblige:
AWS allows you to increase the speed of research by running high performance computing in the cloud and to reduce costs by providing Cluster Compute or Cluster GPU servers on-demand without large capital investments. You have access to a full-bisection, high bandwidth network for tightly-coupled, IO-intensive workloads, which enables you to scale out across thousands of cores for throughput-oriented applications.
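The tightly-coupled, high-bandwidth networking mentioned there maps to EC2 cluster placement groups, which co-locate instances on the same low-latency network segment. A sketch, again with placeholder identifiers:

```python
# Sketch: launch tightly-coupled HPC nodes into a cluster placement
# group, which packs them onto a low-latency, high-bandwidth segment.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.create_placement_group(GroupName="hpc-cluster", Strategy="cluster")

ec2.run_instances(
    ImageId="ami-xxxxxxxx",        # placeholder
    InstanceType="c4.8xlarge",     # placement groups want the larger instance types
    MinCount=16,
    MaxCount=16,
    Placement={"GroupName": "hpc-cluster"},
)
```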

So, what would it cost to set up one of the world's most powerful supercomputers in AWS and run it for one month? I fully realize that this may not be a very accurate discussion, but let's humor ourselves and try to imagine the biggest of the Top 500 supercomputers built in AWS.

As of June 2013, the biggest supercomputer was Tianhe-2 at the National University of Defense Technology in China, with 3,120,000 CPU cores. Let's eyeball this in AWS using Amazon's cost calculator. I put together a couple of different HPC configurations.

Amazon's g2.2xlarge instances have 8 cores and 15 gigabytes of RAM each. To reach 3,120,000 cores you would need 390,000 instances, which would cost $185,562,000.00 for a month, not including business support.

If you use one-year No-Upfront Reserved Instances, the cost becomes $134,947,800.00 per month for the year. Three-Year All-Upfront Reserved costs $2,889,900,000.00 up front and $100 a month.

Now, here is an important factor: on premises you have to build out the maximum capacity you will ever use, but in the cloud you can dynamically scale up and down as your workload requires. Whereas supercomputing was once the domain of governments and wealthy corporations, it is now within reach of anyone building in AWS.

Let's try this with c4.8xlarge. At 36 vCPUs per instance, covering 3,120,000 cores takes roughly 86,667 instances. On-demand this costs $119,901,600.00 a month; Three-Year All-Upfront is $1,609,335,000.00.
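To make the arithmetic reproducible, here is a minimal sketch of the instance-count and on-demand math for both configurations. The hourly rates are my approximations of 2015-era us-east-1 on-demand pricing rather than the exact inputs to Amazon's calculator, so the c4.8xlarge result lands a little under the figure above:

```python
# Ballpark reproduction of the on-demand estimates in the text.
# Hourly rates are approximate 2015-era us-east-1 prices (assumptions;
# the exact figures above came from Amazon's cost calculator).
import math

TARGET_CORES = 3_120_000   # Tianhe-2, June 2013 Top500
HOURS_PER_MONTH = 732

configs = {
    # instance type: (cores per instance, USD per hour)
    "g2.2xlarge": (8, 0.650),
    "c4.8xlarge": (36, 1.856),
}

for itype, (cores, rate) in configs.items():
    instances = math.ceil(TARGET_CORES / cores)
    monthly = instances * rate * HOURS_PER_MONTH
    print(f"{itype}: {instances:,} instances, ~${monthly:,.0f}/month on-demand")

# g2.2xlarge: 390,000 instances, ~$185.6M/month (matches the text)
# c4.8xlarge:  86,667 instances, ~$117.7M/month (the calculator said $119.9M)
```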

Of course, we don't even know whether such a thing is possible on AWS -- quickly spinning up a few hundred thousand servers. How long would that take? It would probably require a conversation with AWS sales, and no doubt a volume discount. Either way, something tells me that for such large, specialized computational workloads it would be naive to assume that building a supercomputer in the cloud would be cheaper.

This is why renting supercomputing time is still more efficient than either owning a machine or trying to spin one up in the cloud.