The Dulin Report

Building a Supercomputer in AWS: Is it even worth it?

April 13, 2015

Columbia Supercomputer. Photo credit: Scott Beale

The fact that Cray is still around is mind-boggling. You'd think that commodity hardware and network technologies would have long since made supercomputing affordable for anyone interested. And yet, Cray Sells One of the World's Fastest Systems:
“This, to IDC’s knowledge, is the largest supercomputer sold into the O&G sector and will be one of the biggest in any commercial market,” the report stated. “The system would have ranked in the top dozen on the November 2014 list of the world’s Top500 supercomputers.”

Building one of the dozen fastest supercomputers isn’t new for Cray – they’ve got three in the current top 12 now. But what is unique is that most of those 12 belong to government research labs or universities, not private companies. This may be starting to change, however. For example, IDC notes that overall supercomputing spending in the oil and gas sector alone is expected to reach $2 billion in the period from 2013-2018.

Supercomputers come with astronomical costs:
So, you’re in the market for a top-of-the-line supercomputer. Aside from the $6 to $7 million in annual energy costs, you can expect to pay anywhere from $100 million to $250 million for design and assembly, not to mention the maintenance costs

In the 1990s I was involved in a student project to build a Linux Beowulf cluster out of commodity components. It involved half a dozen quad-core servers with something like a gigabyte of RAM each. It cost a fortune, and it required us to obtain NSF funding for the project. I don't recall the exact details.

I know that a similarly configured modern cluster in AWS would cost a few hundred bucks a month if it were used continuously. But even the cluster we built at Clarkson was not used 24/7, so if done right, the same cluster would cost only a fraction of that in AWS.
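As a rough sketch of that reasoning: the cost of a pay-per-hour cluster scales with the hours it actually runs. The $300 monthly figure and the 10% utilization below are hypothetical placeholders, not numbers from the Clarkson project or from AWS pricing.

```python
def prorated_monthly_cost(continuous_cost_usd: float, utilization: float) -> float:
    """Cost of a pay-per-hour cluster that runs only part of the month.

    continuous_cost_usd: what the cluster would cost if it ran 24/7.
    utilization: fraction of the month it is actually running (0.0 to 1.0).
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return continuous_cost_usd * utilization


# Hypothetical example: $300/month if run 24/7, but used only 10% of the time.
cost = prorated_monthly_cost(300.0, 0.10)
print(f"${cost:.2f} per month")
```

This ignores storage and data transfer charges, which keep accruing while the compute nodes are shut down.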

Turns out I am not the only one who had an idea to build a Beowulf cluster in AWS:
After running through Amazon’s EC2 Getting Started Guide, and Peter’s posts I was up and running with a new beowulf cluster in well under an hour. I pushed up and distributed some tests and it seems to work. Now, it’s not fast compared to even a low-end contemporary HPC, but it is cheap and able to scale up to 20 nodes with only a few simple calls. That’s nothing to sneeze at and I don’t have to convince the wife or the office to allocate more space to house 20 nodes.

That last statement is important. Setting aside the costs, imagine the red tape required to put something like that together with the help of your on-premises IT department.

At an AWS Summit a couple of years ago Bristol-Myers Squibb gave a talk on running drug trial simulations in AWS:
Bristol-Myers Squibb (BMS) is a global biopharmaceutical company committed to discovering, developing and delivering innovative medicines that help patients prevail over serious diseases. BMS used AWS to build a secure, self-provisioning portal for hosting research so scientists can run clinical trial simulations on-demand while BMS is able to establish rules that keep compute costs low. Compute-intensive clinical trial simulations that previously took 60 hours are finished in only 1.2 hours on the AWS Cloud. Running simulations 98% faster has led to more efficient and less costly clinical trials and better conditions for patients.
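A quick sanity check on the numbers in that case study, as a minimal sketch: going from 60 hours to 1.2 hours is roughly a 50x speedup, which is the same thing as a 98% reduction in run time. The function names are mine, for illustration only.

```python
def speedup(before_hours: float, after_hours: float) -> float:
    """How many times faster the new run is."""
    return before_hours / after_hours


def time_reduction(before_hours: float, after_hours: float) -> float:
    """Fraction of the original run time that was eliminated."""
    return 1 - after_hours / before_hours


# Figures quoted in the BMS case study: 60 hours before, 1.2 hours after.
s = speedup(60, 1.2)
r = time_reduction(60, 1.2)
print(f"speedup: {s:.0f}x, run time reduced by {r:.0%}")
```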

If I interpret that case study correctly, BMS didn't even bother with an on-premises supercomputer for this.

AWS, of course, is happy to oblige:
AWS allows you to increase the speed of research by running high performance computing in the cloud and to reduce costs by providing Cluster Compute or Cluster GPU servers on-demand without large capital investments. You have access to a full-bisection, high bandwidth network for tightly-coupled, IO-intensive workloads, which enables you to scale out across thousands of cores for throughput-oriented applications.

So, what would it cost to set up one of the world's most powerful supercomputers in AWS and run it for one month? I fully realize that this may not be a very accurate discussion, but let's humor ourselves and try to imagine the biggest of the Top 500 supercomputers built in AWS.

As of June 2013, the biggest supercomputer was at the National University of Defense Technology in China, and it had 3,120,000 CPU cores. Let's eyeball this in AWS using Amazon's cost calculator. I put together a couple of different HPC configurations.

Amazon's g2.2xlarge instances have 8 cores and 15 gigabytes of RAM each. To get to 3,120,000 cores, one would need 390,000 instances, which would cost $185,562,000.00 for a month, not including business support.
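The back-of-the-envelope arithmetic can be sketched as follows. The core count, cores per instance, and monthly total are the figures quoted in this post. The per-instance monthly price is not taken from a price list; it is simply what those figures imply when divided out.

```python
import math

TARGET_CORES = 3_120_000          # core count quoted in this post
CORES_PER_INSTANCE = 8            # g2.2xlarge, as quoted in this post
MONTHLY_TOTAL_USD = 185_562_000   # on-demand monthly total quoted in this post


def instances_needed(target_cores: int, cores_per_instance: int) -> int:
    """Smallest number of instances whose combined cores reach the target."""
    return math.ceil(target_cores / cores_per_instance)


instances = instances_needed(TARGET_CORES, CORES_PER_INSTANCE)

# Implied per-instance monthly cost, derived from the quoted total.
per_instance_usd = MONTHLY_TOTAL_USD / instances

print(f"{instances:,} instances at about ${per_instance_usd:,.2f} each per month")
```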

If you use No-Upfront Reserved for 1 year, the cost becomes $134,947,800.00 per month for a year. Three Year All-Upfront Reserved costs $2,889,900,000.00 up front and $100 a month.
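To put the three options on the same footing, here is a rough sketch that totals each one over 36 months. It assumes on-demand prices stay flat for three years and that the 1-year No-Upfront reservation is renewed twice at the same rate. Both are my simplifying assumptions; the dollar amounts are the ones quoted above.

```python
MONTHS = 36

on_demand_monthly = 185_562_000        # quoted on-demand monthly total
no_upfront_1yr_monthly = 134_947_800   # quoted 1-year No-Upfront monthly total
all_upfront_3yr = 2_889_900_000        # quoted 3-year All-Upfront payment
all_upfront_3yr_monthly = 100          # quoted residual monthly charge

totals = {
    "on-demand": on_demand_monthly * MONTHS,
    # Assumption: renewed twice at an unchanged rate.
    "1-year no-upfront (renewed)": no_upfront_1yr_monthly * MONTHS,
    "3-year all-upfront": all_upfront_3yr + all_upfront_3yr_monthly * MONTHS,
}

for name, total in totals.items():
    saving = 1 - total / totals["on-demand"]
    print(f"{name:30s} ${total:>16,} ({saving:.0%} below on-demand)")
```

Under those assumptions, the three-year commitment comes out at well under half the on-demand total, but only if the machine is actually needed for all 36 months.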

Now, here is an important factor. On premises you have to build out the maximum capacity you will ever use, but in the cloud you can dynamically scale up and down as required by your workload. Whereas supercomputing was once the domain of governments and wealthy corporations, it is now within reach of anyone building out in AWS.

Let's try this with c4.8xlarge. On-demand this costs $119,901,600.00 a month. Three Year All-Upfront is $1,609,335,000.00.
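One way to compare the two instance types is cost per core per month. This is a minimal sketch that divides the on-demand totals quoted in this post by the same 3,120,000-core target. It treats a core on one instance type as equivalent to a core on the other, which is a simplification on my part.

```python
TARGET_CORES = 3_120_000  # core count quoted in this post

# On-demand monthly totals quoted in this post, in USD.
monthly_on_demand = {
    "g2.2xlarge": 185_562_000,
    "c4.8xlarge": 119_901_600,
}

# Cost per core per month for each configuration.
per_core = {
    name: total / TARGET_CORES for name, total in monthly_on_demand.items()
}

for name, cost in per_core.items():
    print(f"{name}: ${cost:.2f} per core per month")
```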

Of course, we don't know whether it is even possible to quickly spin up a few hundred thousand servers on AWS. How long would that take? This would probably require a conversation with AWS sales, and probably a volume discount. Either way, something tells me that for such large, specialized computational workloads it would be naive to assume that building a supercomputer in the cloud would be cheaper.

This is why renting supercomputing time is still more efficient than either owning a supercomputer or trying to spin one up in the cloud.