Often, while writing an application or adding a module to an existing one, we wonder whether we should use multithreading. Multithreading can be a powerful tool, but only if used the right way. Let's examine when it helps.
There are two main situations in which multithreading is worth considering:
- We have a GUI application and want to improve the user experience.
- We have a batch job processing application and want to improve the performance.
Let's take the first situation. I have developed a desktop application that splits large files into smaller chunks. I choose a movie file to split. Depending on the size of the file, this operation may take anywhere from a few seconds to a few minutes. Let's say that in the middle of the operation I realize I chose the wrong file. If my application is single-threaded, I can't do anything to stop it, because the main thread is busy splitting the file and cannot respond to my input. An application that uses one thread can do only one thing at a time.
So it makes sense to make this application multithreaded. You can start the file splitting on one thread while the main thread keeps listening for user input. The thread doing the splitting can be interrupted if the user cancels the operation or needs to do something else.
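To make the idea concrete, here is a minimal sketch of a cancellable background task. It is not the actual file splitter: `splitChunks` is a hypothetical stand-in that only sleeps to simulate writing a chunk, and the class and method names are made up for illustration. What it shows is the pattern itself: the work runs on a worker thread, the calling thread stays free, and `Future.cancel(true)` interrupts the worker.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CancellableSplitDemo {

    // Hypothetical stand-in for the file splitter: it "writes" one chunk at a
    // time and checks the interrupt flag between chunks so it can stop early.
    static int splitChunks(int totalChunks) {
        int done = 0;
        while (done < totalChunks && !Thread.currentThread().isInterrupted()) {
            try {
                Thread.sleep(20); // simulates writing one chunk to disk
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag and stop
                break;
            }
            done++;
        }
        return done;
    }

    // Starts the split on a worker thread, waits cancelAfterMillis (standing in
    // for the user clicking "Cancel"), then cancels. Returns true if the task
    // was still running and could be cancelled.
    static boolean startThenCancel(int totalChunks, long cancelAfterMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> task = worker.submit(() -> splitChunks(totalChunks));
            Thread.sleep(cancelAfterMillis); // the calling thread is not blocked by the split
            return task.cancel(true);        // true = interrupt the worker thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            worker.shutdownNow(); // make sure the worker thread ends
        }
    }

    public static void main(String[] args) {
        System.out.println("cancelled: " + startThenCancel(1000, 100));
    }
}
```

In a real GUI toolkit the cancel call would come from a button handler rather than a timed wait, and the splitter would also need to clean up any partially written chunks.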
So writing GUI applications with multithreading makes a lot of sense, and almost all modern applications are multithreaded.
Now let's take the second scenario. We have a batch processing job to do: a bunch of XML files that we have to parse and persist in a database. If I do this in my main thread, I have to parse each XML file and put its data in the database one after another. If I want to speed this up, the logical thing to do is to process the files and update the database in parallel. But it's not that simple. There is a catch.
One CPU can run only one thread at any given moment. Thread scheduling also has an overhead, because the operating system uses time slicing to share CPU time among the threads. So if 50 threads are started to process 100 files on a single-CPU machine, the job is likely to get slower rather than faster. At the same time, doing everything in one thread does not make sense either, because the job involves I/O and database operations, which are time-consuming and leave the CPU idle. Most operating systems today take a thread that is waiting on I/O off the CPU and give another thread a chance to run. So on a single CPU a small number of threads, somewhere around 5, is a reasonable starting point; the right value depends on how much of its time each thread spends waiting.
If the environment has more than one CPU, the number of threads can be increased. This can even be decided at run time. The number of CPUs available to the VM can be obtained from the Java API call Runtime.getRuntime().availableProcessors(). This can be multiplied by a suitable factor to get the number of threads to use for the job.
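The sizing rule described above can be sketched as follows. The factor of 2.0, the class and method names, and `processFile` (a stand-in for "parse one XML file and persist it") are all assumptions made for illustration; only `Runtime.getRuntime().availableProcessors()` comes from the discussion above. A suitable factor has to be found by measuring the real job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BatchPoolSizing {

    // Threads = CPUs x factor, never less than one. The factor is a tuning
    // assumption: higher for I/O-heavy work, close to 1 for CPU-bound work.
    static int poolSize(int cpus, double factor) {
        return Math.max(1, (int) Math.round(cpus * factor));
    }

    // Hypothetical stand-in for "parse one XML file and persist it".
    static String processFile(String name) {
        return name + ":done";
    }

    // Processes all files on a fixed-size pool and returns the results
    // in the same order as the input list.
    static List<String> processAll(List<String> files, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String f : files) {
                pending.add(pool.submit(() -> processFile(f)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> p : pending) {
                results.add(p.get()); // blocks until that file is finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        int threads = poolSize(cpus, 2.0); // 2.0 is an assumed factor
        List<String> files = Arrays.asList("a.xml", "b.xml", "c.xml");
        System.out.println(threads + " threads -> " + processAll(files, threads));
    }
}
```

A fixed-size pool is used here so that 100 files never turn into 100 threads: the extra files simply wait in the pool's queue until a thread is free.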
How many threads a batch job uses should be decided based on the platform on which the process runs. Even better is to make the thread count configurable at run time.
It is good to keep the above in mind while developing multithreaded GUI applications or batch jobs. If we don't get the performance we expected, we now know where to look.
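One simple way to make the thread count configurable, sketched here under assumptions, is to read it from a JVM system property and fall back to the CPU count when nothing usable is set. The property name `batch.threads` and the class name are made up for this example.

```java
class ThreadCountConfig {

    // "batch.threads" is a hypothetical property name chosen for this sketch.
    static final String PROPERTY = "batch.threads";

    // Returns the configured value, or the fallback when the value is
    // missing, not a number, or less than 1.
    static int resolve(String configured, int fallback) {
        if (configured == null) {
            return fallback;
        }
        try {
            int n = Integer.parseInt(configured.trim());
            return n >= 1 ? n : fallback;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        int threads = resolve(System.getProperty(PROPERTY), cpus);
        System.out.println("Using " + threads + " thread(s)");
    }
}
```

With this in place the job could be started as `java -Dbatch.threads=8 ThreadCountConfig` on one machine and with a different value on another, without rebuilding anything.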