We address the problem of efficient schemes for mapping arbitrary parallel algorithms onto distributed memory message passing multiprocessors with mesh topologies. We analyze a particular mapping scheme, find the reasons for its low efficiency, and show that mapped algorithms tend to be both wider and higher than necessary. As a result, they generally execute too slow while at the same time occupying an excessive number of processors. Two approaches to the improvement of the scheme are presented, one direct, and the other indirect. In the direct approach, we describe four nontrivial improvements of the scheme but also prove their NP-completeness. In contrast, in the indirect approach the original scheme is followed by a refinement procedure that incrementally improves the mapped algorithms. We describe four different heuristic refinement procedures. Experimental results show that the indirect approach offers a 51 % saving in processor resources and, at the same time, a 36 % saving in execution time, on average.