Java Bottleneck

I don’t know if it can be generalized to any programming languages. I usually use Java for analyzing document data, and most of my codes do very intensive computation taking up 100% CPU usage while iterating the documents a few thousand times. One execution or experiment takes almost the whole day or a couple, so the code optimization is very important to save several hours. I found out from my codes that surprisingly, memory allocation was taking more time than computation itself. Especially, it should be avoided to repeatedly allocate memory for large arrays. When Java allocates memory for an integer array or double array, the elements are initialized to be zero automatically. Considering this initialization can be done efficiently at the memory level, however, I would conclude that reserving consecutive memory space is the most time-consuming part.

Java Bottleneck

Efficient Algorithm

As I need process with a large number of data many times for research, it has become important to me to make both time and memory efficient algorithms. Using threads is so basic, but most time it works really well. Recently, I implemented a hierarchical kmeans algorithm, and I had to put much effort to process about 6 million 128-D vectors. I changed all double variables to chars and retained only those variables without which the algorithm doesn’t work. Still, it takes too much time. Maybe I should use threads and relax the condition of convergence.

By the way, I’m happy that Professor Jung teaches the algorithm class this semester on very practical topics such as approximate algorithms, randomized algorithms, and so on. I’m not sure I can actually use them, though.

Efficient Algorithm