How fast are your implementations? Mine should run at about 500,000 points/second/core. I'm not going to spend much time optimizing it, but I'm curious whether other people's implementations run much faster or at about the same speed.