Alright, I've had my LogiBone sitting idle for a while now, but I just had a crazy idea that seems entirely plausible. I'm limited, though, in my understanding of how to program the FPGA.
Let's say I trained a convnet (using TensorFlow, for instance) for a specific task, and once it was trained, extracted all the computed weights and somehow fed them into the conv3x3 primitive in hard-cv.
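To show what I mean by the extraction step, here's a rough sketch in plain numpy (the real weights would come from the trained TensorFlow model, e.g. via `layer.get_weights()`; the signed Q7.8 fixed-point format is just my guess at what the FPGA multipliers would want):

```python
import numpy as np

# Hypothetical stand-in for the kernels a trained TensorFlow conv
# layer would give back: 64 filters of shape 3x3 over 1 channel.
weights = np.random.default_rng(0).normal(scale=0.5, size=(3, 3, 1, 64))

def to_fixed_point(w, frac_bits=8):
    """Quantize float weights to signed 16-bit fixed point (Q7.8),
    the kind of representation a hardware multiplier could consume."""
    scaled = np.round(w * (1 << frac_bits))
    return np.clip(scaled, -32768, 32767).astype(np.int16)

fixed = to_fixed_point(weights)

# Sanity check: reconstruction error is bounded by half an LSB
# of the fractional part (assuming no clipping occurred).
err = np.max(np.abs(fixed.astype(np.float64) / 256.0 - weights))
assert err <= 0.5 / 256
```

The `frac_bits` choice trades precision against the dynamic range of the trained weights, so it would need to match whatever width the conv3x3 primitive actually accepts.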
So for this to work I would need many conv3x3 blocks. The image would be fed to, say, 64 conv3x3 blocks with different weights, creating 64 'feature maps'. To reduce the x,y dimensions I would probably also need to implement the 'max pool' operation, which seems pretty simple.
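To make sure I have the math straight, here's a software reference for the two operations I'd want in the fabric (plain numpy; note that `conv3x3` here is really cross-correlation, the same convention TensorFlow uses, and the image size is just for illustration):

```python
import numpy as np

def conv3x3(img, kernel):
    """'Valid' 3x3 convolution (cross-correlation, as in most
    deep-learning frameworks) over a 2-D single-channel image."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for y in range(h - 2):
        for x in range(w - 2):
            out[y, x] = np.sum(img[y:y + 3, x:x + 3] * kernel)
    return out

def max_pool_2x2(fmap):
    """Non-overlapping 2x2 max pooling; halves each dimension."""
    h, w = fmap.shape
    trimmed = fmap[:h - h % 2, :w - w % 2]
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# 64 kernels -> 64 pooled feature maps, mirroring the 64 conv3x3
# blocks I'm imagining in hardware.
rng = np.random.default_rng(0)
img = rng.normal(size=(28, 28))
kernels = rng.normal(size=(64, 3, 3))
feature_maps = [max_pool_2x2(conv3x3(img, k)) for k in kernels]
```

Each feature map comes out 13x13 here (28 -> 26 after the valid conv, -> 13 after pooling), which gives a feel for how fast the data volume shrinks per stage.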
I imagine I would eventually be limited by the number of gates available to run a full neural network, or by the memory to store weights. The idea at that point would be to run the end of the network (the fully connected layers) on the CPU.
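The CPU-side tail end could then be as simple as this (numpy sketch; the 64x13x13 input shape and the single dense layer with 10 outputs are made-up placeholders, and the weights would again come from the trained TensorFlow model):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical: 64 pooled 13x13 feature maps streamed back from
# the FPGA's conv/pool stages.
feature_maps = rng.normal(size=(64, 13, 13))

# Dense-layer parameters, assumed extracted from the trained model.
W = rng.normal(scale=0.1, size=(64 * 13 * 13, 10))
b = np.zeros(10)

def dense_head(fmaps, W, b):
    """Flatten the FPGA's feature maps and run the fully connected
    classifier head on the CPU: one dense layer plus softmax."""
    x = fmaps.reshape(-1)
    logits = x @ W + b
    e = np.exp(logits - logits.max())  # stable softmax
    return e / e.sum()

probs = dense_head(feature_maps, W, b)
```

A single matrix-vector product per frame is cheap, so the ARM core should have no trouble keeping up with whatever frame rate the fabric can sustain.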
I'd love to hear your thoughts on this. I think it's totally feasible, but I'm more qualified to train the net than to feed it to the hardware/software hybrid I'm imagining.