Recently found out about RenderScript, which apparently lets you implement highly parallel native computations that can run on either the CPU or the GPU of the phone.
Already thinking of implementing an H.264 encoder in RenderScript. Things like the 4x4 DCT and motion search are massively parallel tasks, and they take the majority of the time during encoding.
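To show why the 4x4 transform parallelizes so well, here is a minimal sketch in plain C (not RenderScript, and not my actual kernel). It assumes the H.264 forward integer core transform, which is what the standard uses in place of a true DCT. The function names butterfly4 and forward_transform_4x4 are made up for illustration. Each 4x4 block depends only on its own 16 inputs, so every block could be handed to a separate kernel invocation.

```c
#include <stdint.h>

/* One-dimensional butterfly for the H.264 forward core transform.
 * Computes C * [a b c d]^T with
 * C = [ 1  1  1  1 ;  2  1 -1 -2 ;  1 -1 -1  1 ;  1 -2  2 -1 ]. */
void butterfly4(int a, int b, int c, int d, int out[4])
{
    int s03 = a + d, s12 = b + c;   /* sums of outer and inner pairs */
    int d03 = a - d, d12 = b - c;   /* differences of the same pairs */
    out[0] = s03 + s12;
    out[1] = 2 * d03 + d12;
    out[2] = s03 - s12;
    out[3] = d03 - 2 * d12;
}

/* Y = C * X * C^T for one 4x4 residual block stored row-major.
 * Quantization and scaling are deliberately left out. */
void forward_transform_4x4(const int16_t in[16], int16_t out[16])
{
    int tmp[16], v[4];

    /* Horizontal pass: transform each row. */
    for (int i = 0; i < 4; i++) {
        butterfly4(in[4 * i], in[4 * i + 1], in[4 * i + 2], in[4 * i + 3], v);
        for (int k = 0; k < 4; k++)
            tmp[4 * i + k] = v[k];
    }

    /* Vertical pass: transform each column of the intermediate result. */
    for (int j = 0; j < 4; j++) {
        butterfly4(tmp[j], tmp[4 + j], tmp[8 + j], tmp[12 + j], v);
        for (int k = 0; k < 4; k++)
            out[4 * k + j] = (int16_t)v[k];
    }
}
```

The butterfly form needs only additions, subtractions and shifts by one, which is the reason this transform is cheap on both CPU and GPU.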
The only problem I am seeing at this point is I-macroblock encoding, since these exploit intra-frame prediction, meaning that the left and top macroblocks have to be fully encoded and reconstructed before the current I-macroblock can even start encoding. Though this will be a problem in I-frames, I don't see it as a big problem in P-frames. The plan is to first encode everything as P-macroblocks and then selectively replace those over a threshold with I-macroblocks. The hope is that the number of I-macroblocks will not be too large.
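The second pass of that plan could look roughly like the sketch below, in plain C. Everything here is an assumption for illustration: the function name select_intra_macroblocks, the idea of thresholding a per-macroblock cost such as the SAD of the P residual, and the threshold value itself. The parallel P pass would fill p_cost, and only the flagged macroblocks would then go through the serial intra path.

```c
#include <stddef.h>
#include <stdint.h>

/* Flags every macroblock whose inter (P) cost exceeds the threshold so it
 * can be re-encoded as an I-macroblock. Returns how many were flagged.
 * p_cost is a hypothetical per-macroblock cost from the parallel P pass. */
size_t select_intra_macroblocks(const uint32_t *p_cost, size_t mb_count,
                                uint32_t threshold, uint8_t *use_intra)
{
    size_t flagged = 0;
    for (size_t i = 0; i < mb_count; i++) {
        use_intra[i] = (uint8_t)(p_cost[i] > threshold);
        flagged += use_intra[i];
    }
    return flagged;
}
```

The returned count is worth watching: if it grows to a large share of the frame, the serial intra pass starts to dominate and the parallel speedup is lost.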
Did some initial testing on my first-gen Moto X, and it turns out this one doesn't have a RenderScript GPU driver at all. So all my tests are on the CPU for now. I have come up with initial kernels for the 4x4 DCT and diamond motion search. The 4x4 DCT gives me 20fps on 1920x1080 video and diamond motion search gives me 20fps on 640x480 video. Not so great so far, but much better than Java.
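For reference, a diamond motion search for a single block can be sketched in plain C as below. This is not the RenderScript kernel, just an illustration under some assumptions: it uses only the small diamond pattern (four neighbours of the current best position), SAD as the matching cost, and integer-pixel vectors. The names block_sad, MotionVector and diamond_search are invented for this sketch. Each block is searched independently of the others, which is what makes it a natural fit for one kernel invocation per block.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct { int dx, dy; unsigned sad; } MotionVector;

/* Sum of absolute differences between a bs x bs block of cur at (cx, cy)
 * and a bs x bs block of ref at (rx, ry). Both frames share one stride. */
unsigned block_sad(const uint8_t *cur, const uint8_t *ref, int stride,
                   int cx, int cy, int rx, int ry, int bs)
{
    unsigned sad = 0;
    for (int y = 0; y < bs; y++)
        for (int x = 0; x < bs; x++)
            sad += (unsigned)abs(cur[(cy + y) * stride + cx + x] -
                                 ref[(ry + y) * stride + rx + x]);
    return sad;
}

/* Greedy small-diamond search: start at zero motion and keep moving to the
 * best of the four neighbours until none of them lowers the SAD. */
MotionVector diamond_search(const uint8_t *cur, const uint8_t *ref,
                            int width, int height,
                            int bx, int by, int bs, int range)
{
    static const int step[4][2] = { {0, -1}, {-1, 0}, {1, 0}, {0, 1} };
    MotionVector best;
    best.dx = 0;
    best.dy = 0;
    best.sad = block_sad(cur, ref, width, bx, by, bx, by, bs);

    for (;;) {
        MotionVector next = best;
        for (int i = 0; i < 4; i++) {
            int dx = best.dx + step[i][0];
            int dy = best.dy + step[i][1];
            int rx = bx + dx, ry = by + dy;
            /* Skip candidates outside the search range or the frame. */
            if (abs(dx) > range || abs(dy) > range)
                continue;
            if (rx < 0 || ry < 0 || rx + bs > width || ry + bs > height)
                continue;
            unsigned sad = block_sad(cur, ref, width, bx, by, rx, ry, bs);
            if (sad < next.sad) {
                next.dx = dx;
                next.dy = dy;
                next.sad = sad;
            }
        }
        if (next.sad >= best.sad)
            break;              /* no neighbour improved: local minimum */
        best = next;
    }
    return best;
}
```

The loop always terminates because the SAD strictly decreases on every move. The catch is that the number of iterations varies per block, so on a GPU some threads sit idle while others are still searching.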
Ordered a Nexus 6, since apparently the Nexus 5/6/10 do have an RS GPU driver. Waiting to get my hands on it.
PS: RenderScript drivers are supplied by the device manufacturer and are supposed to be in /system/lib/. By default there's a vanilla CPU-based driver (libRSDriver.so). Since my Moto X has a Snapdragon chipset, the GPU is an Adreno (320 in my case), so the RS GPU driver would have been /system/lib/libRSDriver_adreno.so. Unfortunately it's not there.