HPC Portrait: Bálint Gyires-Tóth
He has been working on theoretical and applied machine learning since 2007. In 2008, he was the first to create a hidden Markov model-based text-to-speech (machine text reading) procedure for Hungarian. Since 2014, deep learning has been his primary research area. In addition to theoretical deep learning, he emphasizes practical applications such as multidimensional time series modeling and prediction, image and audio classification and clustering, and natural language processing. In 2017, he was awarded the NVIDIA Deep Learning Institute (DLI) Certified Instructor and University Ambassador titles, and since 2022 he has been an external expert of the KIFÜ HPC Competence Centre and the National Tax and Customs Administration.
How did you learn about supercomputing?
During my doctoral studies, my topic was machine learning-based speech synthesis; at that time we were still working on CPUs. By 2013, when I defended my doctorate, deep neural networks, which were increasingly run on GPUs, were receiving significant attention within machine learning. In my doctorate I dealt with an "earlier" machine learning method, but in 2013 another explosion was very close. After my defense, I started researching deep neural networks, and that is when I came into contact with supercomputing.
What was your first work and experience with a supercomputer?
I started working on deep neural networks for text-to-speech conversion, that is, machine text reading. I adapted the algorithms I had developed with the previous method to deep neural networks. Around the same time, I also participated in an image recognition competition, where I gained a lot of experience.
What do you use the supercomputer for?
Since then, I have dealt with a relatively large number of topics, primarily in research and development. On the one hand, there are topics related to basic research; on the other, there are applied research and industrial development projects. High-performance computing is needed in two cases: when we work with very large databases, and when we do not have a lot of data but want to find the best model through a large number of experiments. I work full-time at BME, where I have the opportunity to participate in numerous domestic and international research projects. In many cases, an HPC environment is not strictly required, but the increased capacity allows for more and faster experiments. The end result is a better model that recognizes an image or speech more successfully. We can generate better quality sound, or, if we work with company data on a specific topic, we can give a more accurate forecast.

It is important to note that deep learning solutions are highly scalable. If we reach a solution on 10 GPUs in a given time, then on 20 GPUs the calculations can be performed much faster, even if not in half the time. This matters when we have to run a large number of experiments: it makes a real difference whether we wait thirty days for the results, three days, or maybe just a few hours. In these cases it becomes especially important to work on HPC. When we run the first experiments, we usually only wait for the beginning of training; whether it works or not is the most important question. After that, we observe how training converges under different settings, the so-called hyperparameters. In such cases it also matters whether we wait a minute or twenty for the first, very rudimentary results, which is why HPC is a great help in these projects. Finally, when we already have a well-functioning deep learning framework, the largest models have tens or even hundreds of billions of free parameters, and we train them for weeks or even months to achieve the best results.
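As a minimal illustration of this experiment-heavy workflow, the sketch below shows how many hyperparameter settings might each be trained only briefly so their early convergence can be compared before committing large GPU resources. It is not the interviewee's actual code; the model, the synthetic data and the hyperparameter grid are illustrative assumptions.

```python
# Hypothetical hyperparameter sweep: train each setting for a few epochs only,
# then rank the settings by their last observed loss.
import itertools
import torch
import torch.nn as nn

def make_model(hidden: int) -> nn.Module:
    # Tiny regression network standing in for a real time-series model.
    return nn.Sequential(nn.Linear(20, hidden), nn.ReLU(), nn.Linear(hidden, 1))

# Synthetic data in place of a real dataset.
X = torch.randn(2048, 20)
y = X[:, :3].sum(dim=1, keepdim=True) + 0.1 * torch.randn(2048, 1)

grid = {"lr": [1e-2, 1e-3], "hidden": [32, 128], "batch": [64, 256]}
device = "cuda" if torch.cuda.is_available() else "cpu"

results = []
for lr, hidden, batch in itertools.product(*grid.values()):
    model = make_model(hidden).to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    # Only a handful of epochs: just enough to see whether training converges at all.
    for epoch in range(5):
        perm = torch.randperm(X.size(0))
        for i in range(0, X.size(0), batch):
            idx = perm[i:i + batch]
            xb, yb = X[idx].to(device), y[idx].to(device)
            opt.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            opt.step()
    results.append(((lr, hidden, batch), loss.item()))

# The most promising settings would then be trained at full length on HPC.
for cfg, final_loss in sorted(results, key=lambda r: r[1]):
    print(cfg, round(final_loss, 4))
```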
How can your results be used?
In deep learning and machine learning, basic research is often carried out within an applied research topic. Currently, this field does not work like, for example, mathematics, where every claim can be established by formal proof. In deep learning we can mostly show empirically, through practical examples, that a newly developed algorithm performs consistently better than previous ones. So even basic research usually has practical results: if we develop a new procedure for time series modeling, for example, it is usually demonstrated on real time series. In the case of industrial use, we tailor the solution to the specific application area. A common topic at the moment is anomaly detection, whether in relation to a manufacturing process or a telecommunications or WiFi network; for example, we try to identify malfunctions and abnormal operation. Besides this, perhaps the three biggest topics of deep learning are machine vision, natural language processing (NLP) and speech technologies. Image recognition can be used in very versatile ways.
Do you have a project or achievement that could not have been implemented without a supercomputer?
It is debatable where the limit is for what we call a supercomputer. Due to the lack of resources, we are often forced to look for other solutions; however, in many cases it would be extremely difficult to reach international-level results without a supercomputer. The arriving Komondor is a huge opportunity for the Hungarian research and development community. If researchers get access to 20-50 or more GPUs, we can really talk about supercomputer usage. For comparison, even the combined use of 16 modern GPUs means an enormous computing capacity. Such a resource raises deep learning research to a completely different level. However, research does not automatically become better just because it gets a lot of GPUs.
Is there an alternative to the supercomputer in your field? Can your tasks be solved differently?
It also depends on how we define HPC. If, for example, the limit is at least 100 GPUs, then yes. If we include a modern 4-16 GPU system in this category, then there is no real alternative. In the first case, the primary alternative is that research groups increasingly publish not only a paper about their results but also the model trained over hundreds of GPU hours, which we can then fine-tune for our own goals. So we can specialize a general image recognition model for, say, medical image processing, in a way that it inherits the knowledge of the pre-trained model. Experience shows that we can achieve better results by not starting completely from scratch. We have to be able to solve our tasks without hundreds of GPUs. Pre-trained networks are available in many subject areas, and there are various methods for training them further, but we still have to run a lot of training to find the best model. If we have an environment with many GPUs, we can expect to achieve results faster and of higher quality. If we don't have enough capacity, we achieve some result and think that this is the limit, that we cannot do better. However, with enough capacity, there is a good chance that we will cross the threshold from proof-of-concept experiments to a deep learning-based solution that can be used in practice. A few years ago, translation programs were much weaker; then machine learning developed, larger and larger databases were placed under ever-improving algorithms, and adequate computing capacity was available, so it was possible to pass the level at which everyone who wants to translate now uses them.
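A minimal sketch of the fine-tuning approach described above, assuming PyTorch and a publicly released torchvision ResNet-18 backbone; the two-class medical-imaging task, the dummy data and all parameter values are illustrative assumptions, not details from the interview.

```python
# Hypothetical transfer-learning setup: reuse a model trained by someone else
# over many GPU hours and train only a new classification head.
import torch
import torch.nn as nn
from torchvision import models

# Load a pre-trained backbone (downloads the weights on first use).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pre-trained weights so the inherited knowledge is kept.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head for the new, specialized task.
num_classes = 2  # e.g. "healthy" vs. "pathological" -- illustrative only
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only the new head is optimized; training does not start from scratch.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Dummy batch in place of a real medical image dataset.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))

model.train()
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print("fine-tuning step done, loss:", loss.item())
```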
Did the supercomputer surprise you in any way?
Yes. When I first got access to a larger number of high-end GPUs, I was surprised by how difficult it was to use them effectively and well. Using them is not the problem; it is very difficult to drive many high-quality GPUs in a way that makes sense and does not turn into a wasteful, oversized solution.
How challenging do you find using a supercomputer?
Challenge is not the first word that comes to mind, but opportunity. It is a huge opportunity. The fact that Komondor comes with 216 GPUs is not a breakthrough in global terms, but at the level of Hungary, if we can work together with researchers and engineers who have the intention and knowledge to make effective use of it, then this is a huge opportunity. Among other things, it is a chance to make our place on the international map of machine learning more stable. Operating it is definitely a challenge, so we have to help users so that their goals are best supported by the system. What we had to overcome one by one must now be bridged in an organized way, with education. There is no well-tried procedure; it is impossible to develop a method for one topic that will then work everywhere. There are, of course, available solutions that can be set up relatively quickly for a particular subject area, but if we have a special task, and in most cases we do, then the solution is far from trivial. There are many tasks that do not necessarily serve the specific goal directly - data collection, preparation, database construction, cleaning, solution testing, model training - and these work differently in an HPC environment.
How quickly can you get the knowledge you need to use HPC?
It depends on the subject area. There are areas where everything is well prepared: for image recognition or natural language processing, for example, many ready-made solutions are available, and in some cases even the scripts and config files for configuring an entire GPU cluster. In principle, these can be used immediately with minimal modifications, but knowing what needs to be modified is the really valuable expert knowledge. This level, which is needed to solve a task close to a general topic, is not so difficult to reach. But if we want the user to be able to work beyond a specific scheme, longer training is required, in which practice on live projects also helps significantly. Actually, when you put your first supercomputer project together by yourself, that is when you really learn the pitfalls. For example, when we once switched from 4 GPUs to 32, training was at first not only not faster, but even slower.
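A minimal sketch of such a multi-GPU data-parallel setup, assuming PyTorch DistributedDataParallel launched with torchrun; the toy model, the data and the linear learning-rate scaling rule are illustrative assumptions. Overlooking adjustments of this kind, alongside communication overhead, is exactly the sort of pitfall that can make a 32-GPU run behave worse than a 4-GPU one at first.

```python
# Hypothetical data-parallel training script, launched for example with:
#   torchrun --nproc_per_node=4 train.py
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")              # one process per GPU
    rank = dist.get_rank()
    local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun
    torch.cuda.set_device(local_rank)

    model = nn.Linear(128, 10).cuda(local_rank)  # toy model
    model = DDP(model, device_ids=[local_rank])  # gradients synced across GPUs

    # The effective batch size grows with the number of GPUs, so the learning
    # rate is often scaled with world size; keeping hyperparameters tuned for a
    # small setup is one reason a larger run can initially converge worse.
    base_lr = 1e-3
    lr = base_lr * dist.get_world_size()
    opt = torch.optim.SGD(model.parameters(), lr=lr)

    # Dummy data in place of a real, sharded dataset.
    x = torch.randn(64, 128).cuda(local_rank)
    y = torch.randint(0, 10, (64,)).cuda(local_rank)
    for _ in range(10):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()                          # gradient all-reduce happens here
        opt.step()

    if rank == 0:
        print("final loss:", loss.item())
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```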
To whom would you recommend using a supercomputer?
Anyone who deals with larger databases should try it at least once to see how HPC works - if for no other reason than to have it in their toolbox. If someone works on machine learning and has already had to wait days during training, I would definitely recommend it to them. And what about those who haven't started anything with their data yet, or haven't even started recording it? To them I wouldn't primarily recommend HPC, but data-driven modeling in general, i.e. machine learning. They should identify the data that is already being recorded or could be recorded, and start modeling, even if not with many GPUs. This is the only way to gather enough experience to determine whether they need more computing capacity. But it is also a very big responsibility of this profession to identify when someone does not need machine learning and/or HPC to solve a specific task.
What does it take to be an effective supercomputer user?
Mostly project experience. Of course, you also need basic education, lexical knowledge and programming practice, but learning those is not so difficult.
Why did you apply to the Competence Centre's expert team?
Machine learning, deep learning and the data-driven world are very close to me, and I am happiest when I can work on them. It fills me up. It is also encouraging that a new supercomputer is being built in Hungary, and that I can help make its use efficient until the end of 2022. My personal inspiration is the example of my grandfather: he taught and did research all his life, and among other things he introduced the programming mathematics training. (A supercomputer was also named after Professor Béla Gyires in 2010. - ed.) It helps a lot that I have connections with many researchers and research groups, so I see what is needed, and I myself do research on several topics. I would like Komondor to be a supercomputer that is used, and used well. I would like to contribute to the creation of a system in which anyone who has the knowledge to use it effectively gets access and support.
What do you think is the most important task of the Competence Centre?
In addition to the computing capacity provided for research, education perhaps comes first. We need to provide the tools, of course on several levels, with which Komondor can be used effectively. This is supported by the creation of a suitable software infrastructure: it should be easy to set the parameters, and ready-made configurations should be available, so that as few engineering hours as possible are spent when someone wants to set up such a system. And the third task is that if someone gets stuck, we should help them. Here it is rather the connection between the user and the consultant that the HPC Competence Centre can provide; it is enough to tell them who is worth contacting.
How do you perceive the development of supercomputers?
Absolutely. In deep learning, computing capacity is very closely related to the quality of the results. The performance of GPU cards is constantly increasing, the hardware environment is developing, and the transfer speed within and between machines is growing. For now, the trend in deep learning is that larger models, larger databases and faster machines give better results, and this brings paradigm shifts. Our field of science is driven by computing power and data volume. Data is often very expensive and not available in all cases; we do not have the opportunity to collect tens of thousands of data points on a rare disease, for example, because that would already have to be done at a global level. Computing capacity, however, "only" has to be purchased.
How do you see the future of supercomputing?
If things continue roughly as in recent years, with no paradigm shift in the field of machine computation and we "just" carry on according to Moore's law, which means doubling every two years, that in itself is astonishing. With CPUs we have already started to lag behind this pace, but with GPUs development is a little faster. This means that if we look at the history of computing, where we started from, and keep up this doubling, computing capacity will grow as much by 2024 as it has in the last 70 years. Even if there is nothing new in machine learning and nothing new in technology - which is quite unlikely - we will still have larger databases and much greater computing capacity. It is also certain that HPC will bring astonishing results in various new application areas. Machine learning is reaching more and more areas of application, it will be used by a much wider circle, and capacities will be needed. Not to mention that you don't need such deep knowledge to use it at the application level, so it will also become an option in the toolbox of smaller software development companies. This opens up new opportunities in many areas, such as law, agriculture and museums. If, thanks to deep learning, these application areas develop at even a quarter of the pace of the tech sector, great innovations can still be expected.