“Approaching Multicore Conference” Live Blog

I am tuned into the Virtual Conference from work. I hope no one from work actually reads this.

Virtual Conference

The technology is pretty cool. It is a fancy Flash-based web site. They have an Auditorium where they do presentations, chat rooms and an Exhibit Hall.
The Exhibit Hall has mini sites for various vendors. Each vendor has an intro video and an information kiosk where you can download data sheets and white papers, and some have a giveaway. For the giveaway, they want a survey.

I have no illusions; every click and mouse hover is tracked.

Multicore Presentations

Every single presentation starts with the same slides: clocks have topped out, power goes up with smaller geometries, hardware is doing multicore, multicore software is hard.

We get that. Everyone who is attending is there because they get the fact that multicore is harder. We want answers and solutions, not the same thing again and again.

The Multicore 101 was mostly the same slides, yet again.

I missed most of the keynote because I had a meeting. I will try to catch it later.

Mentor Graphics and Wind River keep pushing virtualization with multicore. They say the two are linked together, but I do not see it. It seems like if we pay for two cores, we should fully use both cores. The idea is that you put in a virtualizing RTOS that schedules each running OS on a given core. The higher levels are not multicore aware. It sounds like a lot of wasted cycles for code portability.

Chats

There are a number of “sponsored” chats. The first was pretty lively.

There is an assumption that seems like a given to everyone in the room. A multicore system will use Linux SMP. To me that is not a given. The question was asked, “what is the best Linux”. Maybe none is my answer.

I did ask about interrupts. I was the only person asking. People were discussing cache and low level details. Am I missing a bigger picture?

A big BOO goes to the Intel sponsored chat. No one from Intel was in the chat room. Sad.

I asked in one of the forums about keeping legacy code versus a complete rewrite. They sort of missed the point of my question. I wanted an opinion on what was more important to everyone. Is it being able to port old code, even if the multicore performance is not fully utilized? Or is a great clean multicore solution compelling enough to cause a rewrite?

No real answer. It is a very difficult question.

Update…

There was some interesting chat in the Tools chat room. I tried to pump up event driven programming. I hope someone checks out the link to this blog. Any comments would be nice, even if they are flames.

I thought the panel discussion was a bit flat.

The last chat was on avoiding the “Train Wreck”. No one there piped up and said, “Yes, I have used X to build system Y and we have shipped ZZZ devices with great success.” The room was full of semi company marketing people and folks like me, trying to learn something new.

Conclusion

Worth the time, other than the stale introductory stuff. At some point the presenters need to realize they are talking to engineers. We all know why we are there.

Were you there? What do you think? Please leave a comment.

Event Driven Multicore Processing

So, what gets us as close as possible to the ideal system?

A Simpler Software solution

The July 2006 issue of Embedded Systems Design provides a software solution close to our ideal. The article Build a Super Simple Tasker by Miro Samek and Robert Ward presents a simple software scheduler. A few major points from the article will be listed here, but it is recommended reading for anyone planning to actually build the system. Practical Statecharts in C/C++ by Miro Samek also has tons of useful information on just this sort of system.

  • The SST takes advantage of the fact that most embedded systems are event driven. The forever loop is a bad fit for modeling the event driven nature of embedded systems.
  • An SST task can not be an endless loop. An event causes a chain reaction of executing functions. At some point, if a single event occurs, that event will be processed, and the execution will stop. The SST will go back to an idle state.
  • All tasks are regular C function calls. The task runs to completion and returns. As stated above, there can be no infinite loops. The SST does not preempt tasks, except during interrupt handling.
  • A single stack keeps track of the execution contexts. The SST does not manage a stack for each task. The one stack supported by the processor hardware is used for all tasks running on that processor.
  • All tasks have a priority. The priorities are uniquely assigned. The lowest priority is the idle loop. It is the only infinite loop in the system.
  • All inter-process communication is handled by event queues. Events are queued until their task is the highest priority in the system. No globals, flags, pipes or other mechanisms are provided.
  • The SST must ensure that at all times the CPU executes the highest priority task that is ready to run. With more than one CPU, the CPUs together must be executing the highest priority tasks that are ready to run.
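The points above can be sketched in a few lines of C. This is only a minimal illustration under my own assumptions, not the code from the article: the names `Task`, `post`, `schedule_once` and `record`, the sizes, and the "priority equals array index" rule are all made up here. It runs on one processor and leaves out interrupts and the critical sections a real port would need around the queues.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_TASKS   3   /* hypothetical; a task's priority is its array index */
#define QUEUE_DEPTH 4   /* hypothetical per-task queue depth */

typedef struct { uint8_t signal; } Event;

/* A task is a plain C function: handle one event, then return. */
typedef void (*TaskHandler)(Event e);

typedef struct {
    TaskHandler handler;
    Event queue[QUEUE_DEPTH];   /* ring buffer of pending events */
    size_t head, count;
} Task;

Task tasks[NUM_TASKS];          /* index 0 is the lowest priority */

/* Queue an event for the task at priority prio.
   Returns 0 on success, -1 if that task's queue is full. */
int post(size_t prio, Event e) {
    assert(prio < NUM_TASKS && tasks[prio].handler != NULL);
    Task *t = &tasks[prio];
    if (t->count == QUEUE_DEPTH) return -1;
    t->queue[(t->head + t->count) % QUEUE_DEPTH] = e;
    t->count++;
    return 0;
}

/* Run the highest priority task that has an event waiting.
   Returns 1 if a task ran, 0 if there was nothing to do (idle). */
int schedule_once(void) {
    for (size_t p = NUM_TASKS; p-- > 0; ) {
        Task *t = &tasks[p];
        if (t->count > 0) {
            Event e = t->queue[t->head];
            t->head = (t->head + 1) % QUEUE_DEPTH;
            t->count--;
            t->handler(e);      /* runs to completion on the one stack */
            return 1;
        }
    }
    return 0;
}

/* Example handler for the demo: records the order events were handled in. */
uint8_t handled[8];
size_t handled_len;
void record(Event e) {
    if (handled_len < 8) handled[handled_len++] = e.signal;
}
```

If you install `record` as the handler for priorities 0 and 2, post an event to the low priority task first and the high priority task second, then call `schedule_once()` until it returns 0, the high priority event is handled first. The only way the tasks talk to each other is `post`, which is the property that matters when more processors show up.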

The SST is simple, and easy to port. The fact that a single way is provided for tasks to communicate, by passing event messages, means there is a single inter-process communication mechanism to handle when multiple processors are introduced. It also provides enough of an operating system that the changes for a multiprocessor system can be encapsulated. The ‘user’ tasks can be blissfully unaware if there are one, two or seven processors in the system. For these reasons and more that will become clear as we continue, it is ideally suited to a multiple processor system.

Does all this scale?

From both the software and hardware point of view the system should also be scalable. If the software knows how many processors there are and uses this information for scheduling or performance, the system does not scale very well. The best possible system would be able to add processors, and not change the software. The software can be tested on a single processor system, and then if greater performance is required, a second processor can be added.

If you have one processor per task, things are great. Each task can be tied to one processor. That is a very nice way to set up the system, but it does not scale. How do you add or remove processes? It does not scale because it is not flexible. Either that, or you have 100 processors, and there are fewer than 100 tasks. Again each task can be tied to a processor. If you happen to go over 100, well then you are just screwed.

And, once again, we are confronted with how to handle interrupts.

A flexible hardware

The problem with the hardware is not finding the right processor. The SST can be easily ported to any processor that uses a stack. The hardware needs to be modified to support multiple processors. The exact changes are not yet apparent.

What about the processor stack?

In the description of the SST above, it was stated that there is only one stack. Two microcontrollers can not share one stack! Each processor has its own stack, and nothing in the processor has changed. Compare this to most operating systems that have one stack per software task or process. The big advantage in the SST of not having to manage memory space for multiple stacks has not been lost.

Wait, what about interrupts?

Let’s suppose there is only one interrupt output from the interrupt controller, and there are two interrupt inputs, one on each processor. What do we do with the output of the interrupt controller?

The one output could be tied to both inputs. That is obviously a very bad idea. Both processors would take the interrupt, and then some other mechanism would be needed to figure out exactly who does the interrupt handling. That is a lot of overhead that is just not needed.

One processor could be dedicated to handling all interrupts. But, what if that processor is bogged down with interrupts and the other processor is idle? The system is not sharing the load equally.

We should insert some new hardware between the interrupt controller and the processors. That custom bit of hardware can make sure the interrupt load is shared correctly. What is correct?

It could send one interrupt to processor 1, then one to processor 2, and so on. The number of interrupts per processor would be equal, but is that ideal? Not really. If processor 1 is doing the ‘life support’ calculation, and processor 2 is doing the idle loop, it would be better to interrupt processor 2.

So, really what we want is an interrupt handler that always interrupts the processor running the lowest priority task.
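Here is a rough sketch of that decision in C, just to pin the idea down. In the design above this logic would live in hardware between the interrupt controller and the processors; the function name `pick_interrupt_target`, the array of current priorities, and the convention that a larger number means a more urgent task (with the idle loop at 0) are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* current_prio[i] is the priority of the task now running on processor i.
   Assumption: 0 is the idle loop, and a larger number is more urgent.
   Returns the index of the processor that should take the next interrupt:
   the one running the lowest priority task. Ties go to the lowest index. */
size_t pick_interrupt_target(const unsigned current_prio[], size_t num_cpus) {
    assert(num_cpus > 0);
    size_t target = 0;
    for (size_t i = 1; i < num_cpus; i++) {
        if (current_prio[i] < current_prio[target]) {
            target = i;
        }
    }
    return target;
}
```

With processor 0 running the ‘life support’ calculation at priority 9 and processor 1 sitting in the idle loop at priority 0, the function picks processor 1, which is what the round robin scheme could not promise.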

We can go through that next time.

Update…

I am updating so this shows up again after the virtual conference post.

Multicore is like a Pool Table

I am going to drive this metaphor into the ground. It is not a perfect metaphor. It is a very leaky abstraction.

Balls on the table

Many balls on the table

A good way to think of a system is to imagine a pool table. Imagine pool balls scattered randomly around the table. Now, a pool shark walks up and hits the cue ball. The cue ball hits some other balls, one goes into a pocket, and the other balls roll to a stop.

The position of each ball represents the state of the system. The pool shark causes an external event that changes the state of the system. The player uses the stick to interrupt the system, and changes the state of the table. It is useful to think of the cue ball as an interrupt handler, and the colored balls as tasks. Every time one ball hits another, a message is passed to that “task”. The message to a pool ball from another pool ball that hits it is force and direction.

The pool table is a massively parallel machine. Each ball acts on a message immediately. In a single processor system this could not happen. Think about a pool table where only one ball could be moving at once. That models a one processor system.

The way to make one processor look like many is to time slice each process. The pool table also models a time slice operating system making it appear as if more than one ball is moving at a time. Imagine each ball moving a little bit at a time. Still only one moves at any given time, but it looks like they all make progress. If the switching is fast enough, to us slow-moving humans, we can not tell the difference from the real world massively parallel system.

Multicore has more balls

Adding more processors lets more than one ball, or task, be moving at a time.

OK, this is nice, but systems rarely have one event and then settle to a complete stop before the next event happens.

Now, imagine three ten-year-old boys playing speed pool on the table in the basement. In speed pool, everyone has a cue ball. The only rule is that a player’s cue ball must come to a complete stop before a player hits it again. The other balls need not stop before any given player shoots. Now this is like a real system. One player can hit a ball into other balls already in motion. Chaos and interrupts are happening constantly, and one interrupt can cause a chain reaction that affects other tasks that are executing.

As the designer of this system your job is easy. Just make sure nothing goes wrong and all the balls hit each other in the right order.

It is a useful mental model to help you visualize a system. That is a good thing.

Full details next post.

The Requirements

Two Ways to Requirements

I have written three page rants about requirements in the past. I will not subject my gentle readers to that torture in this blog.

Basically there are two ways to do requirements, full out or back of the envelope. If you do full out you have to read the books, get a database, hire a consultant and implement a change request procedure. Anything that will fit on one page is back of the envelope. In my experience, any process in between is a setup for failure.

For this project everything below this line is “the back of my envelope”.


New Marvell three core chip

Marvell announces new three core chip

It was on Slashdot, so it must be news. The actual articles are here and here from Multicore Info. The actual press release from the Marvell site has more details than both articles combined.

Very cool idea. It is a dual core with a third processor optimized for low power. When things are idle, only the low power core is used.

What Operating Systems are supported?

Damn software guy asking ugly questions like that. There are no details in the articles. If it is for smart phones, a safe bet is that it will run some flavor of Linux.

Of course there is no information on how this chip handles interrupts.

Are there any ideas for our design? A good “should” requirement is “the software will use a low power mode for the processor when no work is needed”. The event-driven programming model is great for a low power application. No events to handle means no processing.
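A tiny sketch of what that requirement might look like in the idle loop. Everything here is a hypothetical stand-in: `pending_events` would really be the event queues, and `enter_low_power` would really be whatever the chip provides (a wait-for-interrupt instruction, for example). A real version also has to close the race between checking for events and going to sleep, which this sketch ignores.

```c
#include <assert.h>

/* Hypothetical stand-ins for the real event queues and the real sleep call. */
unsigned pending_events;   /* bumped by interrupt handlers in a real system */
unsigned sleep_count;      /* how many times we went to low power (demo only) */

void enter_low_power(void) {
    sleep_count++;         /* on hardware: sleep until the next interrupt */
}

void handle_one_event(void) {
    assert(pending_events > 0);
    pending_events--;      /* a real task would do its work here */
}

/* One pass of the idle loop: work if there is work, otherwise sleep. */
void idle_step(void) {
    if (pending_events > 0) {
        handle_one_event();
    } else {
        enter_low_power();
    }
}
```

With two events pending, three calls to `idle_step()` handle both events and then sleep once. No events means no processing.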

Book Review, Event-Driven Embedded Systems, Worth Reading

The book is Practical UML Statecharts in C/C++, Second Edition: Event-Driven Programming for Embedded Systems. Why do I think this is so important? It just is, so trust me, OK.

First, don’t let the title scare you. If you are writing embedded systems software, you need to read this book.

The book is all about building state machines for embedded systems. I am afraid the whole UML part is a bit of buzzword bingo. They thought a title with UML in it would sell better. If you have a prejudice against UML, set it aside and keep reading. It is the book about the Quantum Platform, an embedded RTOS sold by a company called Quantum Leaps.

The author, Miro Samek, came up with the operating system and concepts in the book. I must admit, I am a bit of a raving fan boy. I have recommended the book to anyone who will listen, and forced some people who worked for me to read it. (Sorry Frank.)

Stacking rings as states


The text builds the ideas of Statecharts. For those who do not know, a statechart is simply a way to build a hierarchical state machine. In my brain, I see the statechart like my kid’s stacking rings toy. Each ring is a state. To get to the top state or ring, all the lower rings must be in place. To get from the top to the bottom, all the rings in between must be removed.

A jump from bottom to top is not allowed. The software must move from the bottom state through the intermediate states, to the top state, which will handle the event. This means that entry code or exit code in each state must be executed. A higher level state is contained inside the parent state. The way it works is smooth and clean, but different enough that it is hard to grasp at first.
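To make the ring picture concrete, here is a small C sketch of my stacking rings model. It is not the book’s implementation and not the Quantum Platform API. It also simplifies a real statechart, which is a tree of nested states, down to a single stack of rings numbered from 0 at the bottom. The names `Machine`, `Step`, `note` and `transition` are mine.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOG 16   /* hypothetical size of the demo log */

typedef enum { ENTER, EXIT } Action;
typedef struct { Action action; int state; } Step;

typedef struct {
    int current;          /* the top ring currently in place */
    Step log[MAX_LOG];    /* entry and exit actions, in the order they ran */
    size_t log_len;
} Machine;

/* Stand-in for running a state's entry or exit code: just write it down. */
static void note(Machine *m, Action a, int state) {
    assert(m->log_len < MAX_LOG);
    m->log[m->log_len].action = a;
    m->log[m->log_len].state = state;
    m->log_len++;
}

/* Move to ring `target` one ring at a time. No jumps allowed. */
void transition(Machine *m, int target) {
    assert(target >= 0);
    while (m->current < target) {     /* going up: add rings, run entry code */
        m->current++;
        note(m, ENTER, m->current);
    }
    while (m->current > target) {     /* going down: remove rings, run exit code */
        note(m, EXIT, m->current);
        m->current--;
    }
}
```

Starting at ring 0, `transition(&m, 2)` logs “enter 1, enter 2”, and going back with `transition(&m, 0)` logs “exit 2, exit 1”. Every ring in between gets its entry or exit code run, which is the whole point.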

Events are very important to making all this work. An event is the atomic element to pass messages from one statechart to the others. Events are handled by a state. The statechart will keep moving from state to state until an event is handled. That is hugely oversimplified, but it gives you an idea.

The Quantum Framework has all the code needed to handle the events, and build the state charts. Like all good frameworks, you write your application code, and the framework handles all the plumbing. The book calls this the “Hollywood Pattern”, don’t call us, we’ll call you. Nice stuff.

Even if you do not buy or use the Quantum Platform, or even buy into the concept of state charts, you should buy the book. Why? Because Miro has a couple of chapters on “other” ways to implement state machines in C and C++. They are worth the price of the book.

As you travel through your career as a software and firmware engineer, you will see a whole lot of code. Some of it will be state machines implemented as switch statements, tables and other constructs. Some of the code you see will not be a state machine, but really should be. Knowing five ways to build a state machine and the “bad” things about each way is gold.
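For anyone who has not run into these yet, here is the same toy state machine written two of those ways, as a table and as a switch statement. The states and events (`ST_IDLE`, `ST_RUNNING`, `EV_START`, `EV_STOP`) are invented for this example and are not taken from the book.

```c
#include <assert.h>

typedef enum { ST_IDLE, ST_RUNNING, NUM_STATES } State;
typedef enum { EV_START, EV_STOP, NUM_EVENTS } EventId;

/* Table version: next_state[state][event] is the state to move to. */
const State next_state[NUM_STATES][NUM_EVENTS] = {
    [ST_IDLE]    = { [EV_START] = ST_RUNNING, [EV_STOP] = ST_IDLE },
    [ST_RUNNING] = { [EV_START] = ST_RUNNING, [EV_STOP] = ST_IDLE },
};

State dispatch_table(State s, EventId e) {
    assert(s < NUM_STATES && e < NUM_EVENTS);
    return next_state[s][e];
}

/* Switch version: same behavior, with the logic spread through the code. */
State dispatch_switch(State s, EventId e) {
    switch (s) {
    case ST_IDLE:
        return (e == EV_START) ? ST_RUNNING : ST_IDLE;
    case ST_RUNNING:
        return (e == EV_STOP) ? ST_IDLE : ST_RUNNING;
    default:
        return s;   /* unknown state: stay put */
    }
}
```

Both functions give the same answer for every state and event. The table is easy to review at a glance but grows as states times events. The switch is easy to start with and tends to sprawl as the machine grows.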

If you want a taste before you buy, read this article from EE Times, Build a Super Simple Tasker. It hints at many of the ideas from the book.

So, in case you haven’t guessed, my multicore ideas are also based on the book and event-driven programming. The event driven model solves the requirements nicely, and even helps answer the one big question, “what about interrupts?”

The legal disclaimer I am required to give as a blogger: I did not get the book for free. I bought the first edition, and thought it was good enough that I bought the second edition. Both were purchased with my own hard-earned cash. The link above is not an affiliate link.

Have you read it? What did you think? Please comment if you liked the book, or hated it.

The Best Way to Pimp the Chip Sales Guy

When the vendor shows up with a hot new multicore chip, there is one question you have to ask. It is a fair question. You can ask just to hassle the sales guy, but you really should ask. They may not know. Nobody seems to talk about it. I wonder why?

The question is, “how does it handle interrupts?”

In an embedded system with custom hardware, you will actually write an interrupt handler. It is not someone else’s problem. So how the processor handles interrupts is huge. In multicore, of course, it is even worse.

The Cell processor has one PPE and eight co-processors. The PPE handles all interrupts. In a system with lots of peripherals and lots of real time happening, that could be a problem.

So, one more requirement: the system should spread the interrupts among the processors.

Any other questions I should be asking the chip vendors?