July 3, 2006 -- There’s lots of chatter about the move to multi-cores on a chip, or multi-processor SoC. At times there appears to be more heat than light on this topic. One designer’s multicore may be what another designer calls multiprocessor, and vice versa. People confuse the underlying architecture of a multiprocessor system with the programming model for the applications that run on top of it. Some people’s imaginations run out of steam with two or four processor cores on a single chip. Others talk about thousands of processor cores in a single system, and thousands on a single chip. In fact, there are real devices in the marketplace involving hundreds of thousands of processors in the complete system, with 192 processors on a single die – the Cisco CRS-1 Router with its Silicon Packet Processor, for example.
When we talk about multicore and multiprocessor systems, it’s important to recognize the key parameters that divide them into a taxonomy. That taxonomy can be used to classify any particular architecture into the right type, and then to help determine the right design approach for building a multiprocessor system and using one.
One very common approach in multicore systems is to design an architecture in which each processor has a common and shared view of the rest of the system resources. This includes the memory system and system bus – although of course each core may have its own cache memory. However, in this kind of ‘symmetric multiprocessor’ or SMP system, it is necessary to support cache coherency protocols so that each processor has a correct, common view of system memory. This then allows tasks to be assigned to any of the symmetric processors, and tasks may also be moved around from processor to processor at will. A task suspended from one processor while waiting for memory accesses could be restarted on another, because they all share this memory view.
This SMP architecture seems reasonable for general-purpose processing, where there is little need for, or benefit from, tailoring processors to specific tasks. It is also reasonable for a general-purpose platform that must run new applications that may have been unknown when the platform was designed. If a multi-core SMP architecture is combined with a multi-threading programming model, then applications might be able to take advantage of the multiple cores and coherent memory model to achieve a modest speedup: some may gain 1.5-3X if divided into several concurrent threads and run on multiple cores in such a system. Another way to take advantage of multicores would be to run several tasks at once, so that when a thread of one stalls on memory access, another thread, either from the same or a different task, may be able to run in its place.
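One way to see why such speedups stay modest is Amdahl's law. The article does not invoke it by name, so treat the sketch below as an illustrative model only; the parallel fractions and core counts are assumed values, not measurements of any real application.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the work scales with cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part runs at the original speed; the parallel part is split across cores.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumed, illustrative parallel fractions; real applications must be profiled.
    for fraction in (0.5, 0.75, 0.9):
        for cores in (2, 4, 8):
            print(f"parallel={fraction:.2f} cores={cores}: "
                  f"{amdahl_speedup(fraction, cores):.2f}x")
```

Under this model, an application that is half to three-quarters parallelizable gains roughly 1.6x to 2.9x on four to eight cores, which is consistent with the range quoted above. The model ignores cache-coherency traffic and scheduling overhead, so real results may be lower.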
We thus see multicore offerings with 2 to 8 processors per die, each of which may be able to run 2 or 4 threads, with scheduling and cache coherency supported by a suitable combination of hardware and software. These are aimed at general applications on desktops or servers and might be organized into extremely large server-based clusters for general application support.
There are some who advocate an SMP multicore cluster as a general approach for embedded applications of all types. However, this may not be optimal when designers know a lot about the applications for a multiprocessor system in advance. Assuming one is designing a deeply embedded system for a multimedia application, for example, the design team may have some notion of how to partition the application into a set of concurrent and pipelined tasks, and may have some idea of the communication that is required between tasks – the amount and types of control and data information that must be passed from one task to the next. Armed with this kind of information, a different design strategy may be best. This approach, an asymmetric multiprocessing (AMP) solution, chooses the kinds and configuration of processors that best match the various tasks involved in the application.
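The matching step in an AMP design can be sketched as a small selection problem. The example below is a minimal illustration, not a real methodology: the task names, processor types, cycle counts and energy figures in `COSTS` are all invented, and `choose_processors` is a hypothetical helper that picks, for each task, the lowest-energy processor type that fits a per-frame cycle budget.

```python
# Hypothetical cost table: estimated (cycles, energy in microjoules) per frame
# for each task on each candidate processor type. All numbers are invented.
COSTS = {
    "audio_decode": {"risc": (900_000, 450.0), "dsp": (300_000, 120.0)},
    "video_decode": {
        "risc": (4_000_000, 2000.0),
        "dsp": (1_500_000, 700.0),
        "extended_core": (500_000, 200.0),
    },
    "user_interface": {"risc": (200_000, 100.0), "dsp": (600_000, 260.0)},
}


def choose_processors(costs, cycle_budget):
    """For each task, pick the lowest-energy processor type within the cycle budget."""
    mapping = {}
    for task, options in costs.items():
        # Keep only the processor types fast enough for this task.
        feasible = {
            proc: energy
            for proc, (cycles, energy) in options.items()
            if cycles <= cycle_budget
        }
        if not feasible:
            raise ValueError(f"no processor meets the budget for {task}")
        mapping[task] = min(feasible, key=feasible.get)
    return mapping


if __name__ == "__main__":
    for task, proc in choose_processors(COSTS, cycle_budget=1_000_000).items():
        print(f"{task} -> {proc}")
```

With these assumed numbers, each task lands on a different processor type, which is the essence of the asymmetric approach. A real flow would also account for inter-task communication cost, which this sketch leaves out.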
The AMP approach has classically been used for applications such as wireless handsets, where the combination of a RISC processor for protocol stack and user interface processing and a DSP for voice encoding/decoding became the cliche of the 2G wireless world. Now designers are looking to AMP approaches for many media and image processing applications. This may go beyond two or three processors. As many as five to ten or more might be useful for a single application oriented subsystem, all tailored to the particular task mapped to them. And by using configurable and extensible processors, it may be possible to achieve performance increases and reductions in energy consumption that go well beyond what is achievable with more conventional fixed instruction set cores (an order of magnitude or more). If multiple AMP-based subsystems are combined, a design team might end up defining an MPSoC (multiprocessor system-on-chip) with 20, 30, 40 processors or more.
Of course, a good architecture long term for multiprocessing may involve a combination of SMP and AMP – AMP for those applications that are well known ahead of time, and which would benefit from application specific customization of the processors; SMP on a few general-purpose cores for applications without heavy computing or communications requirements, or new applications not anticipated when the platform was designed.
Practical MPSoC devices of today seem to be limited to a few tens of processors at most, unless there is a strong application regularity and scalability that would allow large-scale replication of a few basic processor configurations. Networking applications are one area that lends itself to a high multiprocessor count on a single die.
Once we have a clear taxonomy of multicore and multiprocessor approaches, designing the right platform for our application space requires some kind of design space exploration and mapping of application tasks to various architectural alternatives.
Architectural decision making for MPSoC will include answering several key questions:
- How many processors? How should they be configured? Should they be homogeneous, heterogeneous or a combination of the two?
- How do blocks communicate? Standard hierarchical buses, shared memory, point to point, network-on-chip (NoC), or a combination?
- What is the memory hierarchy? How much local instruction and data memory for each processor, how much system memory and how is it organized?
- What is the concurrency, synchronization and control model for the applications, and what programming model should be used?
- How do you control energy consumption and manage the system for low power?
Answering these questions for a particular MPSoC application will give us a good approximation to the architecture, but answering them will usually require building architectural models, mapping applications to them, and simulating the system. A number of research projects have pointed the way to analytical models for estimating architectural configuration parameters, but most design teams would still want to back up any solutions with simulation. In the last few years, a number of tools, models and modelling abstraction levels (such as transaction-level modelling) have begun to emerge to help designers build system-level models to carry out design space exploration.
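To make the idea of design space exploration concrete, here is a toy analytic sweep over two of the questions above: processor count and interconnect style. Everything in it is an assumption made for illustration, including the candidate values, the power figures, and the crude throughput model in `estimate`. In practice a sweep like this would only shortlist candidates for proper simulation.

```python
from itertools import product

# Hypothetical candidate values; a real exploration would take them from the platform.
PROCESSOR_COUNTS = (2, 4, 8, 16)
INTERCONNECTS = {  # name: (relative bandwidth, power in mW) - invented figures
    "shared_bus": (1.0, 20.0),
    "crossbar": (3.0, 60.0),
    "noc": (6.0, 90.0),
}
POWER_PER_PROCESSOR_MW = 25.0  # assumed
WORK_PER_PROCESSOR = 1.0       # assumed throughput units per processor
WORK_PER_BANDWIDTH = 2.0       # assumed work sustained per unit of bandwidth


def estimate(count, interconnect):
    """Crude estimate: throughput is capped by compute or by communication."""
    bandwidth, fabric_power = INTERCONNECTS[interconnect]
    compute_limit = count * WORK_PER_PROCESSOR
    communication_limit = WORK_PER_BANDWIDTH * bandwidth
    throughput = min(compute_limit, communication_limit)
    power = count * POWER_PER_PROCESSOR_MW + fabric_power
    return throughput, power


def explore(min_throughput, max_power):
    """Return feasible (power, count, interconnect) tuples, lowest power first."""
    feasible = []
    for count, interconnect in product(PROCESSOR_COUNTS, INTERCONNECTS):
        throughput, power = estimate(count, interconnect)
        if throughput >= min_throughput and power <= max_power:
            feasible.append((power, count, interconnect))
    return sorted(feasible)


if __name__ == "__main__":
    for power, count, interconnect in explore(min_throughput=6.0, max_power=300.0):
        print(f"{count} processors on {interconnect}: {power:.0f} mW")
```

Even this toy model shows the interplay the questions imply: adding processors does not help once the interconnect becomes the bottleneck, and a richer interconnect costs power that may push a configuration over budget.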
Perhaps one of the most interesting challenges in developing an MPSoC architecture is to ensure it is sufficiently general for a variety of applications in the particular domain it is designed for. Incorporating many processors is one way of extending the life of an MPSoC platform, since they are programmable, and sizing system memory and communications resources to include some extra design margin will give the platform downstream flexibility. Another approach is to use reconfigurable logic on die to offer in-field programmability and flexibility that may offer greater performance than software-only approaches. Since blocks of reconfigurable logic on die are significantly more power-consuming and offer less performance than standard cell blocks, choosing this strategy comes with a cost. However, it may be appropriate for some application spaces. There are significant cost-performance-power-application domain tradeoffs in adopting any kind of MP approach, and although simulation, formal analysis and experience all help in making appropriate decisions, there is also a strong role for system architects’ intuition.
Good system architects are not born – they are made from a combination of experience, formal training, and an intuitive feel (or “gut” feel) that lets them take risks in suitable proportion. The prospect of committing a complex MPSoC architecture to fabrication is a daunting one, but it is required of all design teams working on complex integrated products these days. Therefore, there needs to be sufficient work on developing and validating the right architectural concepts prior to this commitment.
How does a senior designer become a good system architect? They of course need to work as part of a product team and develop both generic and application-specific knowledge. They need to be able to cross software and hardware boundaries and have an understanding of what system-level design is all about. Although experience is necessary, there are ways of more rapidly developing the knowledge base to make an architect. Part of that is attending the right industry technical forums and trade shows and using the opportunity of having the world’s leading ideas and technologists working in MPSoC design methods and tools all laid out at one’s feet. In one intensive week, gaps in knowledge can be filled in, one can be exposed to leading research and industrial practice, and the offerings of vendors of all kinds with relevant tools, IP and methods can be explored. There is even the chance to buy an interesting book or two in an exhibit. DAC 2006 is an excellent opportunity for all interested in MPSoC to fill in gaps in their knowledge.
There are several useful sessions at DAC 2006 for those interested in MPSoC design and verification. A session on processor and communication-centric SoC design on Tuesday July 25, covers design exploration with pipelined buses, improved instruction set extensions for application-specific processors, a variety of custom memory hierarchies, and modelling a fault-tolerant MPSoC. Later on Tuesday, a session on MPSoC design methodologies and applications features a number of fascinating talks, including design space exploration for real-time biomedical monitoring, automated RFID tagging, multimedia applications and exploring both circuit-switched and packet-switched networks-on-chip.
On Wednesday, the MPSoC technical focus shifts to include a special session of a more tutorial nature on MPSoC design challenges, that combines IP viewpoints, leading academic research, and a startup ESL company all contributing to education on the topic. Another session on advanced topics in processor and system verification contains much of relevance to MPSoC, including discussions on verifying recent processors from Intel and IBM (the Cell broadband engine processor). There is also a relevant talk on verifying multimedia SoCs. On Thursday July 27, a session on network-on-chip analysis and optimization will allow attendees to dive into detail on this very interesting area of development which may be more and more important to MPSoC in the future.
On the DAC exhibit floor there will be demonstrations and information from a wide variety of IP and EDA companies – startups and established companies, both small and large. Many of these offer models and tools that help in MPSoC design, integration and verification. There is also a DAC Pavilion panel on the exhibit floor on Thursday, on “troubleshooting the MPSoC design flow”. The Thursday keynote, “The Challenges of Convergence”, by Alessandro Cremonesi of STMicroelectronics, will be sure to have some ideas of relevance to MPSoC design teams, as no doubt will the Tuesday keynote, “Structuring Process and Design for Future Mobile Communication Devices”, by Hans Stork of TI.
There is also a relevant tutorial on Monday July 24, on ESL design methodology using SystemC. Many of the techniques, tools and modelling approaches discussed will be highly relevant to building MPSoC architectural models, and among those giving the tutorial will be key technologists from Philips and STMicroelectronics. A hands-on tutorial on Monday, dealing with virtual system prototypes for architectural optimization for low power, will also be an interesting one. Two other events of interest are a tutorial on hybrid modelling on Friday, and a workshop on Sunday July 23 on UML and SoC – the UML is becoming a set of notations of growing interest for specifying and modelling complex SoC devices.
Everyone interested in MPSoC will find much to interest them at the 43rd DAC, July 24 to 28, in San Francisco.
By Grant Martin, Tensilica, Inc.