I’ve become a fan of the show Mad Men. What I like most about the series is its faithful depiction of life in the 1960s, in many ways simpler times, especially with respect to technology. The fashions, the styles, the furniture all evoke distant latent memories from my childhood. Thank you, Don Draper, for teaching this physicist the basics of how men should dress for business, but who thought up avocado as a color for kitchen appliances? While these human-oriented and superficial differences are nuanced and amusing, the technology differences are truly startling. In the 1960s armies of typewriters clicked loudly in every professional office in a way that would put the worst keyboard slammer to shame. Typewriters, whiteout, copiers, intercoms…this was the high point of office technology in the 1960s. It’s remarkable to think that if you needed to send information to a business colleague in 1962, it had to go by postal mail. It could take a week to get a response, and who knows how many iterations it might take to conclude your interaction. Today we can send documents back and forth around the world in seconds. I was recently at a conference in Copenhagen where I communicated and worked from the hotel, the conference center and numerous cafes using my office machines in the USA. How did anything get done before the internet?
In the field of high performance computing (HPC) the world of just 15 years ago has a similar charm. The datacenters of the mid-1990s were ruled by serial computers. We reliably counted on chip vendors to deliver 2x performance gains every two years. Computers were less capable, so the physics they could model was not as complex as it is today. Many models were simplified using approximations, e.g. 2D vs. 3D, or empirical vs. analytic or ab-initio models. The language of choice was Fortran, C was just beginning to get a foothold, and C++ was still practiced in secret by grad students. Contrast that bucolic scene with today. First, computers are much more capable, over 100 times faster in fact, so the physics they can model is more sophisticated and the codes are more complex. Since the codes are more complex, we make more use of object-oriented languages like C++, which are extensible, more easily maintained and better at describing code architectures and object relationships than monolithic languages like Fortran or C. The biggest difference, though, is the complex multi-level parallelism that we program for today. The Intel Sandy Bridge and Ivy Bridge chips are extremely powerful, state-of-the-art compute platforms, but to realize their potential the developer must elicit program parallelism at no fewer than three levels: at the register level using SSE/AVX, at the core level using pthreads, OpenMP or MPI, and at the node level using MPI. Compilers help, but not nearly enough. Heterogeneous platforms like Nvidia GPUs, which continue to make advances in HPC, have their own fine-grained parallel programming model that requires a significant level of hardware understanding to optimize performance. I’ve written elsewhere about heterogeneous computing and GPUs.
The result is that writing performant code for a modern computing architecture is a difficult task and has become a highly specialized skill in itself. Whereas 15 years ago it was not uncommon or unreasonable for an expert in a scientific domain such as geoscience or quantum chemistry to be asked to write simulation and modeling code for their field, that prospect is much less tenable today. It’s difficult and rare to be a domain expert and also to know how to map the governing equations that define your field onto a modern computing architecture. It’s an exceptional person who can do both well. These are the people, by the way, that I look to hire. Part of the problem is that in the academic world of physics, chemistry and engineering there is little incentive to spend large amounts of time developing and porting codes. Graduate students get scant academic benefit from heroic efforts on code when publications in their field are the currency of advancement. Computer scientists and computer engineers, whose purview does include performance and optimization, often don’t have the science background or the interest to understand the application domain in sufficient detail.
The skills gap has implications for both sides of the employment equation: for computational scientists and for the industry groups that employ them. Addressing computational scientists first, I have a strong word of encouragement for students in science and engineering with a compelling interest in software development, numerical analysis and high performance coding. There are jobs for you, and I personally believe the opportunities are growing. I’ve bet my whole career and founded a company on this belief, so if I’m wrong I’ll see you on the unemployment line! The evolution of the field since 2005, when I founded SRT, has only reinforced my view on this subject. Several trends inform my outlook that opportunities in computational science will continue to grow. The increasing power of the stand-alone workstation is one. Today’s deskside workstations are the equivalent of yesterday’s rack-based supercomputers. A typical engineering workstation can support 512 GB of DRAM and offer 100 GB/s of memory bandwidth to two 3.1 GHz eight-core processors. It can support four or more NVIDIA GPU compute cards, such as the new K10s, which each offer over 4.5 teraflops of single precision floating point performance. The point is that the power of the deskside workstation is having a democratizing effect on supercomputing, pushing the tools for powerful simulation into the hands of more people for the study of more problems in more diverse application domains. The result is a growing demand for the people who can understand, write and run these codes.
A second trend spurring demand for HPC skills is the growing need for the movement and analysis of huge datasets generated by all aspects of the digital economy. Scientists have been dealing with this problem for years, whether it’s particle physicists looking for specific patterns (e.g. the Higgs boson) in enormous datasets generated by their instruments or bioengineers searching through huge genome databases for similarities and patterns that mark a particular disease or trait. Big data is the new big problem, and its solution will drive demand for people with HPC skills.
On the other side of the employment equation are the industry groups that employ computational scientists. In addition to the large independent vendors of scientific HPC software, there are many companies in oil and gas, aerospace, finance and pharma, among others, that need these skills. One consequence of the skills gap for employers is that most of these, by necessity, will consider candidates from outside their specific application domain. Finance is a good example. Since the early 1990s financial firms have been hiring physicists and mathematicians to work out, analytically and numerically, the complex valuation problems that arise in modern financial engineering. From personal experience I know this is true in oil and gas as well, where I was part of a team of PhDs with backgrounds in physics, applied math and engineering. The principle behind this is that, lacking candidates with strong experience in both the science and the computing together, companies find it more practical to take those with a strong background in the basic sciences and teach them the computational essentials rather than the other way around. In an article about scientists’ jobs in the July 7th issue of the Washington Post, it’s noted that physicists have fared better than others, with unemployment between 1 and 2 percent. This can be attributed, in my opinion, to the widely held view of physicists as generalists who are good at learning new fields and solving problems in general.
A second interesting consequence of the skills gap for employers is the emergence of partnerships between HPC specialists and domain scientists in physics, chemistry and engineering. Many companies have recognized that their core competency is not in scientific computing, yet they have critical business needs for extremely complex codes that run as fast as possible on modern architectures. To meet this need they have taken up partnerships with companies that specialize in this area. The energy industry has been a leader in these types of engagements, with several prominent partnerships between major oil companies and service companies or independent software vendors. Reservoir simulation, for example, is a core technology for most oil companies. Companies use these simulators to model the flow of oil, gas and water in the subsurface of the earth in the presence of wells, in order to predict resource extraction and optimize the development of fields. Reservoir simulators are large and complex codes, and only the biggest of the majors and some of the national oil companies still develop their own. Others have formed partnerships with companies that specialize in the development, optimization and maintenance of technical codes. Some examples of public partnerships of this sort for reservoir simulation are Shell/CMG, BP/Halliburton and Chevron/Schlumberger. These partnerships can take many forms, but in general the energy company will provide direction on features and formulations while the partner company provides professional software development services, implementation, optimization and maintenance. In most cases the service company can market the product freely, perhaps with a specialized code branch or a 6- to 12-month pre-release for the sponsoring company.
It may seem counterintuitive to allow the code to be marketed to potential competitors, but there are ways to safeguard company IP, and the sponsoring company is mainly interested in having access to a professionally developed and maintained code. Enlarging the market ensures more thorough testing and decreases the sponsoring company’s burden of supporting all development on its own.
So how does one become a computational scientist? The label itself implies an interdisciplinary nature, and in fact great computational scientists can be found in almost any field that requires both a deep knowledge of science and of computing. There are a few dedicated computational science degree paths available from US institutions; however, in my experience it’s more common to find these in Europe or Asia. More often computational science is an elective path within a science or engineering major. Many students don’t consider themselves computational scientists until graduation nears, when the lack of positions in their field proper, coupled with a frank inventory of their marketable skills, suddenly turns them into computational scientists or scientific programmers. In my recruiting efforts I focus on candidates from physics, applied math and electrical or aerospace engineering, but I receive resumes from all scientific disciplines and review them impartially. As a side note, I consider recruiting the most important thing I do in my position at SRT. I struggle to understand CEOs of small companies who outsource this function or don’t take a strong personal interest in it. That, perhaps, is the subject of a future post.
Finally, what does the future look like for scientific computing? There is nothing on the horizon that indicates coding for modern architectures will get easier. In fact, power considerations argue for giving the programmer more flexibility to orchestrate operations on local data, leading to ever more complex memory hierarchies, all open to the developer to manage. If anything, coding will get more complicated, and that, coupled with continuous improvements in hardware capacity, will drive the demand for skilled computational scientists from diverse academic disciplines and encourage innovative industry partnerships.