Dr. Tamás Hegedűs is a senior research fellow at the Institute of Biophysics and Radiation Biology of Semmelweis University. He is a PhD supervisor in several doctoral schools (the Doctoral School of Theoretical and Translational Medicine at Semmelweis University, the Roska Tamás Doctoral School of Engineering and Natural Sciences at Pázmány Péter Catholic University, and the Doctoral School of Biology at Eötvös Loránd University), represents Hungary on the Steering Committee of Elixir 3D-Bioinfo, and serves as an HPC expert of the Governmental Agency for IT Development (KIFÜ). Cumulative impact factor of all his scientific publications and works: 2378. Field of research: structural and dynamic characterisation of ABC proteins that cause cystic fibrosis and multidrug resistance, with a view to identifying drug targets.
www.hegelab.org

"... we create a movie from our set of data, which shows the movements of the atoms in a protein and consists of more than 10 thousand frames.” – an interview with Dr. Tamás Hegedűs, bioinformatician, an HPC expert of KIFU.

When and how did you get to know supercomputing?

I used a supercomputer for the first time as a postdoctoral fellow in Chapel Hill, North Carolina. The neighbouring lab worked on theoretical questions, and on one occasion I struck up a chat with its head; within five minutes I had access to a Top100 machine. I didn’t know how to use it, but I was very interested, everyone was helpful, and I soon got the hang of it. It also helped that I had always been interested in IT. In 1992, when I took my school-leaving exam, I also considered the Technical University, but back then I felt that as a biologist I could still do something with IT, while an IT specialist has less access to biology, so I chose biology.

How does a researcher’s question turn into a mathematical calculation?

To take a concrete example: from our set of data we create a movie that shows the movements of the atoms in a protein and consists of more than 10 thousand frames. Various mathematical algorithms are then used to identify the frames in which the atomic motions of the protein change to a particular degree and in a particular direction. These mathematical methods are necessary because a typical protein we study consists of about 20,000 atoms.
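
As an illustration only (not the group’s actual pipeline), the following sketch shows the kind of analysis described above using the open-source MDAnalysis library: it projects a molecular dynamics trajectory onto its principal components and flags the frames with the largest changes in collective motion. The file names and the chosen number of components are hypothetical.

```python
# Minimal sketch (not the interviewee's actual workflow): project an MD trajectory
# onto its principal components to spot frames with large collective motions.
# File names ("protein.pdb", "trajectory.xtc") are placeholders.
import numpy as np
import MDAnalysis as mda
from MDAnalysis.analysis import pca

u = mda.Universe("protein.pdb", "trajectory.xtc")    # topology + >10,000 frames

# PCA on the backbone atoms; align=True removes overall rotation/translation
pc = pca.PCA(u, select="backbone", align=True).run()

backbone = u.select_atoms("backbone")
projection = pc.transform(backbone, n_components=2)  # shape: (n_frames, 2)

# Flag frames where the motion along the first principal component changes abruptly
jumps = np.abs(np.diff(projection[:, 0]))
interesting = np.argsort(jumps)[-10:]                 # 10 largest frame-to-frame changes
print("Frames with the largest PC1 changes:", sorted(interesting.tolist()))
```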

Isn’t science becoming dehumanised?

No, because the human factor plays an important role all along. The first step is recognising that you need a supercomputer. Once you have this tool, you need to learn how to use it and then decide what you want to compute. When the computation and the movie are complete, we can study the movements of the protein’s atoms not only with mathematical algorithms but also with our own eyes, because the human mind is uniquely capable of recognising particular, important patterns even in such complicated movements.

Does this statement hold for other fields too?

Curiosity is one of the most fundamental human traits, and it may be able to pull the heterogeneous HPC user community in a single direction. A biologist or a meteorologist is not an IT expert, but they are motivated to dive into information technology, because otherwise they cannot solve the problems that excite them. In many cases, the solution finally emerges from the cooperation of researchers from various fields.

To what extent do you notice the development of supercomputers?

I have always been interested in proteins, but when I started my research work, only the movements of proteins comprising some 20 amino acids could be computed, so I stayed with experimental work. Just to give a sense of proportion: the membrane proteins we study contain about 1,500 amino acids. That means about 20 thousand atoms, which we fit into a lipid bilayer, adding even more atoms, and then we place all of this into water, so a huge number of atoms end up in the simulation system. Between 2005 and 2009, we used 128 processors of that Top100 supercomputer for 3 to 4 weeks to calculate 100 nanoseconds of the movements of such a system. With Leo (the Hungarian supercomputer named in tribute to the work of Leó Szilárd, which reached position 308 on the TOP500 world ranking in 2014 – the editor’s note), where we could already use GPUs, we could complete such calculations within about a week using 16 CPUs and 3 GPUs, with fewer resources and less electricity.
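
A rough back-of-the-envelope comparison of the two throughputs mentioned above (the quoted run times are approximate, so these figures are only indicative):

```python
# Indicative throughput comparison based on the figures quoted above:
# ~100 ns of simulated time in 3-4 weeks on 128 CPUs (2005-2009),
# versus ~100 ns in about a week on 16 CPUs + 3 GPUs (Leo).
ns_simulated = 100.0

old_days = 3.5 * 7           # midpoint of "3 to 4 weeks", ~24.5 days
leo_days = 7.0               # "about a week"

old_throughput = ns_simulated / old_days   # ~4 ns/day on 128 CPUs
leo_throughput = ns_simulated / leo_days   # ~14 ns/day on 16 CPUs + 3 GPUs

print(f"Earlier system: {old_throughput:.1f} ns/day on 128 CPUs")
print(f"Leo:            {leo_throughput:.1f} ns/day on 16 CPUs + 3 GPUs")
print(f"Speed-up:       ~{leo_throughput / old_throughput:.1f}x with far less hardware")
```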

How did you join the community of Hungarian supercomputer users?

When I came back from the US in 2009, my research required national HPC resources. Around the same time, through a friend I happened to meet Dr. Tamás Máray, the person who built up HPC in Hungary and was responsible for its operation for decades. Since the program we use for molecular dynamics calculations is perfectly suited to load testing CPU and GPU capacities, even Leo’s test operation period brought exciting results for us.

Do you have a project or result that would have been unfeasible without a supercomputer?

Almost all of them. A little more precisely, I use supercomputers for at least 50% of my work. Molecular dynamics simulations must be run for a long time and repeated several times. Because of the huge capacities they require, these are currently done in cooperation with Helmut Grubmüller, a senior research fellow of the Max Planck Institute in Göttingen. Of course, when the newest Hungarian supercomputer (Komondor – the editor’s note) arrives, it will bring a fundamental change in domestic opportunities.

Which supercomputer-based work of yours are you the most proud of?

Around 2013–2014, within the framework of an international collaboration, we demonstrated how a single mutation disrupts the structure of the CFTR protein and identified the sites to which individual drug molecules bind.

The CFTR protein: The malfunctioning of the CFTR (cystic fibrosis transmembrane conductance regulator) protein causes a disease known as cystic fibrosis, which is associated with respiratory and digestive problems, an inflammation of the glands, infertility in males, and intestinal obstruction in infants.

While our partner (Gergely Lukács, McGill University, Montreal) obtained lower-resolution experimental data on this, our simulations allowed us to study these processes at the atomic level. We demonstrated that a drug used for one of the CFTR mutations is ineffective against another mutant, and may even be harmful to use. This clearly underlines the importance of basic research, for example in providing customised therapy.

Is there an alternative to supercomputers? Can such tasks be solved in other ways?

No, there isn’t. Cloud-based services are always limited in some way – in general, their data protection solutions significantly reduce computing performance.

How quickly can someone acquire the skills to use a supercomputer?

It depends on how they want to use the tool. If they have basic Linux skills and others have already written the programs needed to solve the problem, they only need to be shown how to submit the job to the workflow, and that’s it, it can run. If the program has to be installed from source code, difficulties may arise, but these can usually be resolved relatively easily with the help of the program’s authors, other researchers who use it, and the staff of the Competence Centre. When it comes to writing your own program, you have to learn how to parallelise – a minimal sketch of what that involves follows below. Depending on your programming background and the complexity of the problem, this may even take years.
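
Purely as an illustration of what parallelisation means in practice (a generic sketch using the mpi4py library, not a program from the interviewee’s research), work is split across processes and the partial results are then combined:

```python
# Minimal MPI sketch: each rank processes its own slice of the work,
# and rank 0 gathers the partial results. The script name is hypothetical;
# run it e.g. with:  mpirun -np 4 python parallel_sketch.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()        # this process's index
size = comm.Get_size()        # total number of processes

# Hypothetical workload: sum the squares of 1..1,000,000, split across ranks
n = 1_000_000
local_sum = sum(i * i for i in range(rank + 1, n + 1, size))

# Combine the partial sums on rank 0
total = comm.reduce(local_sum, op=MPI.SUM, root=0)
if rank == 0:
    print(f"Sum of squares up to {n}: {total}")
```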

Can someone learn how to use a supercomputer from descriptions and tutorial videos?

With proper Linux skills and the help of a tutorial video, the basics can be acquired even within a day, but supercomputing is inevitably a community genre. Resources are scarce and expensive, and cooperation is a must if the utilisation of these tools is to approach the maximum. And that also requires every user to be as efficient as possible.

Who can you turn to if you have questions?

Primarily, I get a lot of help from the staff of the Competence Centre – I am working for them as an expert until 2022 – and also from the physicists of the Wigner GPU Laboratory (Wigner Research Centre for Physics – the editor’s note), led by Gergely Barnaföldi.

Do these consultations have a community-building effect? Do you also keep in touch after the joint work?

There are loose micro-communities that have evolved spontaneously, without any formal organisation. But researchers using supercomputers can also be found in communities built around theoretical research. These include the Hungarian Society for Bioinformatics, the QSAR and Modelling Group of MKE (Hungarian Chemical Society), and the Chemometrics and Molecular Modelling Work Committee of the MTA SZAB (Academic Committee of Szeged of the Hungarian Academy of Sciences). The latter is a community focused on chemical calculations, and it normally holds the KeMoMo-QSAR Symposium in May; this year, however, because of the pandemic, we will meet in Szeged at the end of September. (Antal Lopata, the head of CheMicro, has earned lasting merit in ensuring that certain chemistry program packages are available in Hungary and will be installed on the new HPC – the editor’s note.) I would also find it very useful for the HPC community to have an online space organised from the bottom up by the users themselves. This is an ongoing process, and I hope we will soon see it go live.

What are your expectations with regard to the Competence Centre? How would you rank its tasks?

I trust that the Competence Centre will accelerate the learning, access, and problem-solving processes to ensure significantly smoother progress for research projects requiring HPC. Researchers cannot wait weeks or months for access and for results! International examples also confirm that quick and simple administration is indispensable for high-quality services. This would require many more experts. And what’s more, you don’t merely need good experts: you need outstanding experts to be able to leverage the capacities and maintain competitiveness. Some progress has already been made in this field – PhD grants adapted to market realities are an excellent basis for promising careers, and ensuring continuity here is of key importance. However, it is also important to recognise that the drop in income after obtaining a PhD can be demotivating. In other words, competitive salaries would play an important role in keeping our outstanding experts.

In which fields of science do you think it is important to integrate basic supercomputing skills into the curriculum?

Definitely in the natural sciences and engineering, because artificial intelligence and deep learning are about to explode.

Deep learning: Deep learning teaches computers to recognise patterns from the underlying features of data, be it speech, images, objects, or phenomena. These machine learning algorithms are already part of our everyday life – they recommend products, content, or potential new social media contacts to users.

Although many believe it has already exploded, I think we are only at the doorstep of changes that will rewrite the fundamentals of everything we think about scientific work today. A decades-long problem in my field has practically been solved by a company using deep learning techniques and HPC capacities of a size unimaginable in academic research. These and other types of runs are extremely resource-intensive even with serious optimisation, and beyond my own field, epidemiologists, researchers performing next-generation sequencing, and meteorologists cannot do without supercomputing skills either.

Can supercomputers be regarded as a sort of “philosopher’s stone” useful for all fields of science, or are they just a practical tool, like chalk?

Rather a Swiss Army knife: highly versatile in skilled hands. Moreover, it is becoming useful in an increasing number of fields, and the associated hardware is also evolving very fast. As a result, it also depreciates quickly, so one should start planning the purchase of the machine after the next one even before the latest one has been installed.

Have you ever been inadvertently inspired by a user representing a completely different field of science?

Mostly, I am in touch with representatives of the natural sciences – typically biologists, bioinformaticians, physicists, and chemists, very often from interdisciplinary fields. The above-mentioned Wigner GPU Laboratory regularly organises conferences, workshops, and courses, and the range of guests is exceptionally heterogeneous, also including people from larger companies; that is why I know more about the trending topics of deep learning. This year, the GPU Days will be coupled with an artificial intelligence basic research matchmaking event. It also once happened that I ended up in the wrong user group due to a misunderstanding, and it resulted in a journal article of mine co-authored with the leader of that group, Gábor Parragi (University of Szeged).

What makes an efficient supercomputer user?

Experience. You can learn the basics, and there are routine calculations, but you also need practice. When an alpha helix unwinds, I will notice it if I see it in our movie. You need experience to sense what could go wrong and how to troubleshoot it when a program developed by researchers outputs a completely meaningless error message. But you also need the experience of others – everyone needs it, and I need it too. Just as a physician with decades of routine often consults his or her colleagues, we too should always pay attention to each other’s insights.

What would we need for big breakthroughs and profitable outcomes?

Much more investment in basic research – at least five times the current budget. That would give exploratory research the fundamental momentum on which much more profitable innovation activities could be built. In other words, more innovation could be delivered with a lower total investment. Of course, this would also boost supercomputer use and thus further increase demand, in a self-reinforcing process.

What do you think of the future of supercomputers?

Ever-increasing performance from ever-smaller hardware with ever-lower power consumption, coupled with ever-growing academic and industrial demand. Supercomputer use is expected to rise and will affect all areas of life because we will model an increasing number of things. There is an increasing amount of data, which requires more and more calculations that yield results of increasing quality and accuracy. To put it simply: we will live better and longer while causing less environmental damage. Supercomputers also play an indispensable role in fighting the coronavirus pandemic – they are used both for modelling the spread of the pandemic and for understanding the structure and movements of the proteins of the virus. Mapping the structure of a protein via experiments may take years, but now it takes only a day using supercomputers and deep learning.