Here are two closely related papers (unfortunately, neither is open access):
An Efficient Program for Many-Body Simulation (1985):
http://epubs.siam.org/doi/abs/10.1137/0906008
A portable distributed implementation of the parallel multipole tree algorithm (1995):
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=518690
Here’s an open access paper on fast multipole methods: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2634295/pdf/nihms77405.pdf