Only a few more general posts before we get to the details. These posts provide background information and explain the reasoning behind the requirements.
OK, so we want a general purpose multicore software solution to use in our embedded systems. The system has multiple cores, so it should do something in parallel, right? That brings up the subject of granularity.
Intel made big news last week with the announcement of their new parallel tool set, Intel Parallel Building Blocks. They have a video presentation that explains how it is supposed to work. Once again, the tool is built on top of the OS.
Wikipedia has the best definition of granularity. In the design phase we have to decompose the system into parts: which parts can run in parallel, and which cannot? That is the key to unlocking Amdahl's Law. The more work that happens serially, the less advantage the multiple cores give us. So the question presents itself: what is the right level of granularity for us?
To paint with a broad brush, we can divide a system into three levels of granularity. In order from finest to coarsest, we'll call them data, process, and system.
The new Intel tools seem to concentrate on parallel data processing. The video gives examples of data-level granularity. When the application requires lots of calculation on a large data set, we are working at data-level granularity. An old school example is vector processors: a wide register set holds multiple pieces of data, and the same operations are then carried out on the registers. The ARM NEON co-processor does vector math using this method.
Process-level granularity is what we get on a PC running separate processes or threads. Interprocess communication (IPC) becomes the problem.
System level granularity is usually associated with multiple processor systems. It is possible to run two complete operating systems on the two cores on a chip. If the memory space is not shared, they will run happily side by side and never know about each other.
The best fit
The thing about data-level granularity is that it is not generic. Intel has made tools to make it easier, but you, the developer, must write the code and then add the special sauce. There are special library calls for the parallel for, and other C keywords. It is highly specific to the application, so not useful for a "generic" solution.
System-level granularity does not help us either. We are trying to build a super efficient embedded system here, and running two independent operating systems seems a bit wasteful. There are reasons to do this sort of thing: in the world of security and high availability, designers build redundant systems, where the two systems must execute the same instructions in lockstep or the system is in error.
So that leaves process level granularity. This is the one for us.
That leaves the issue of IPC. If we are using Linux, Stevens wrote a book for us: UNIX Network Programming, Volume 2: Interprocess Communications. The book is a full 592 pages. It is a good book, and I recommend all of his books if you write code for Linux or use TCP/IP. But, please, 592 pages? That is ridiculous. And he does not even really address multicore specifically.
So, what are we really asking for in our multicore embedded system? Some more requirements are falling out of this discussion.
- The system will make tasks run in parallel.
- A simplified interprocess communication is needed.
Those two sound minor, but they have a huge impact on how the system works.
Next time I recommend some light reading, then it is on to the multicore solution.