1. 10

tl;dr 32 FPGA MD5 units running at 50 MHz do 1000x worse than a single 2.7 GHz Xeon core. Oh well, I tried.

    1. 3

      Doesn’t really matter if it’s slow if you met your objectives. It looks like you learned something and your tutorials are well done!

      Have you considered pursuing a fully-unrolled architecture like in this paper or this code? You end up with a 64-deep combinational logic pipeline, and instead of using a shifter you just wire the shifted bits across (since your pipeline is fully unrolled). Obviously, computing any individual hash takes 64 cycles plus load overhead, but since you’re cracking you can pump in a plaintext each cycle and get tremendous throughput.
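
      A rough sketch of the throughput argument above, in Python. The 50 MHz clock and the 32 units come from the post; the 64 cycles per hash for an iterative unit is an assumed best case, the function names are made up for illustration, and load overhead is ignored. Whether an unrolled pipeline would actually fit on this FPGA and meet timing at 50 MHz is not something the thread establishes.

```python
# Back-of-the-envelope throughput model (idealized; ignores load overhead and I/O).

CLOCK_HZ = 50_000_000   # 50 MHz, from the post
PIPELINE_DEPTH = 64     # one pipeline stage per MD5 step


def iterative_hashes_per_second(units, clock_hz, cycles_per_hash):
    """Each iterative unit finishes one hash every `cycles_per_hash` cycles."""
    return units * clock_hz / cycles_per_hash


def pipelined_total_cycles(n_hashes, depth=PIPELINE_DEPTH):
    """Cycles to push n_hashes through the pipeline: fill it once, then one result per cycle."""
    return depth + n_hashes - 1


def pipelined_hashes_per_second(n_hashes, clock_hz, depth=PIPELINE_DEPTH):
    """Average rate of a single unrolled pipeline over a batch of n_hashes candidates."""
    return n_hashes * clock_hz / pipelined_total_cycles(n_hashes, depth)


if __name__ == "__main__":
    # ASSUMPTION: 64 cycles per hash for each iterative unit (a best case).
    print(iterative_hashes_per_second(32, CLOCK_HZ, 64))
    # One unrolled pipeline fed a new candidate every cycle.
    print(pipelined_hashes_per_second(10**6, CLOCK_HZ))
```

      Under these assumptions, 32 iterative units top out at 25 million hashes per second, while a single unrolled pipeline approaches 50 million once the 64-cycle fill latency is amortized over a long batch.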

      1. 2

        I considered fully unrolling the pipeline when I was starting out, but decided against it thinking that I could fit more units on the FPGA by keeping the logic block count low. That’s why I did things like using a single 32-bit adder. But even then, 32 compute units take up >30% of the logic elements on the FPGA and an age to compile.
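
        To illustrate why a single shared 32-bit adder costs cycles, here is a sketch of one round-1 MD5 step as defined in RFC 1321, written in Python with an explicit count of the additions. The function names and the addition counter are illustrative only and are not taken from the design in the post.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, s):
    """Rotate a 32-bit value left by s bits (0 < s < 32)."""
    x &= MASK32
    return ((x << s) | (x >> (32 - s))) & MASK32


def f_round1(b, c, d):
    """Round-1 function from RFC 1321: F(b, c, d) = (b AND c) OR (NOT b AND d)."""
    return ((b & c) | (~b & d)) & MASK32


def md5_step_round1(a, b, c, d, m, k, s):
    """One round-1 MD5 step.

    m is the message word, k the step constant, s the rotate amount.
    Returns (new_value, additions), where additions counts 32-bit adds.
    """
    additions = 0
    t = (a + f_round1(b, c, d)) & MASK32
    additions += 1
    t = (t + m) & MASK32
    additions += 1
    t = (t + k) & MASK32
    additions += 1
    t = (b + rotl32(t, s)) & MASK32
    additions += 1
    return t, additions


if __name__ == "__main__":
    value, additions = md5_step_round1(0, 0, 0, 0, 1, 0, 7)
    print(hex(value), additions)
```

        Each step needs four 32-bit additions, so a unit that shares one adder and performs them sequentially would spend at least four cycles per step. That is an inference from the algorithm, not a figure reported in the post.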

        There are clearly a lot of optimizations I could make, but given that I’m so far behind even a CPU, I don’t think it’s really worth my time. I’ll be exploring real-time processing next, which I think the FPGA will be better suited for.

    2. 3

      I agree with bri3d; the fact that you learned something and wrote up good tutorials is awesome! It’s pretty difficult for most people to get started with FPGAs because the documentation sucks and EEs don’t spend nearly as much time blogging and writing tutorials as CS folks.

      If you care about the practical aspect, I’ve spent a significant chunk of my life designing CPUs and dedicated hardware accelerators[1], and I’d be surprised if the cost/benefit works out for FPGAs over GPUs for MD5. It’s possible that you can squeak ahead in terms of TCO if you take into account the cost of powering the devices, building boards and racks (I expect that you can cram more of these chips onto a board than you can with GPUs, which lets you use fewer racks), etc., and don’t include the value of your time, but it would be a lot of effort.

      [1] I don’t mean this as an appeal to authority. Just sayin' I’ve spent some time thinking about this sort of thing.

    3. 1

      Great series. Please continue to do more. I’m starting a module on FPGAs in a few weeks, so it’s nice to read up on what I am getting myself into :)

      1. 1

        What’s a module? Is it some sort of class? I should warn you that I am making things seem a lot easier than they actually are in these posts. It is hard to convey the frustration of debugging embedded systems in words (well, at least not in words acceptable in polite conversation). Plus, I haven’t even gotten to interfacing with peripherals yet and that’s the hardest part. I’ll be talking about the audio codec in my next post, so that will hopefully give you a sense of the complexities inherent in interfacing with external systems.

        1. 1

          Yes, it’s a class. I am attending university at the moment. Yes, people have told me FPGAs are difficult but sometimes worth the effort. Anyway, I hope you get the time to go into the details for us newbies. Think of it as teaching the rest of us the way you would have wanted your FPGA class to be taught :) I believe I will be doing a project with the FPGA too, so I will be looking around for inspiration on what to do.

          1. 1

            For some final project ideas, you may want to look at the projects from our embedded systems class. http://www.cs.columbia.edu/~sedwards/classes/2013/4840/index.html

            And yeah, it’s definitely worth it, even though it’s quite challenging.