May 8, 2006 -- Portable and distributed multimedia applications are driving an ever-increasing demand for computational performance at acceptable power consumption levels, on both the client and the infrastructure side. At the same time, the software content driven by functional requirements keeps growing and already runs to millions of lines of code in some devices. Parallel processing offers power consumption relief, but it brings new challenges as well. A number of multi-core chips are already on the market, and with shrinking silicon geometries we will see an increasing number of (homogeneous and heterogeneous) cores per chip.
Most of the current dual-core chips have shared memory and a bus. While the bus and shared memory architecture is quite simple from a programming perspective (the cores can easily exchange data), the bus will become a bottleneck as the number of cores increases. We can, therefore, expect to see more and more types of interconnects, such as multi-level buses, crossbars, point-to-point links, meshes and networks-on-chip, in different logical structures, as well as non-uniform memory architectures much like those that have long been used at the board level and beyond. Effective interconnect systems are, or will become, as important for multi-core chips as efficient caches are for a single processor. In fact, research at MIT (Agarwal, presented at the Multicore Expo, March 2006) shows that it can be more power efficient to move data from one core to another than to access memory. Chips are likely to combine a bus with other interconnect architectures, because the bus simplifies the transition of legacy code.
From a software application perspective, it will be necessary to provide transparent access to the different cores regardless of the type of interconnect and logical topology used. This is much like the Internet, where a variety of geographically distributed computers communicate across a "heterogeneous connect fabric," but on a micro scale and with a different set of requirements and constraints. Applications that are developed for a single processor need to be modified to take advantage of multiple processors, and those that are written for multiple processors may have to be modified to run on a larger number or different kinds of processors. A unified API that is designed specifically for communication in a closely distributed embedded system (multiple cores on a chip and/or multiple chips on a board), and that is agnostic to the type and number of cores, the type of interconnects, the logical topology and the operating systems, will substantially simplify application migration and application re-use across a product line and in next-generation products.
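To make the idea of an interconnect-agnostic API concrete, here is a minimal sketch in Python. It is not CAPI or any vendor's API: the names Transport, SharedMemoryTransport, PointToPointTransport, Endpoint and run_pipeline are all hypothetical, and the two back ends only simulate interconnects with in-process queues. The point it illustrates is that the application code in run_pipeline stays the same when the back end is swapped.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Optional


class Transport(ABC):
    """Hypothetical interconnect back end (bus, crossbar, mesh, ...)."""

    @abstractmethod
    def deliver(self, dst_node: int, payload: bytes) -> None:
        """Move a payload toward the destination node."""

    @abstractmethod
    def fetch(self, node: int) -> Optional[bytes]:
        """Return the next payload waiting for a node, or None."""


class SharedMemoryTransport(Transport):
    """Simulates a bus with shared memory: mailboxes are created on demand."""

    def __init__(self) -> None:
        self._mailboxes = {}

    def deliver(self, dst_node: int, payload: bytes) -> None:
        self._mailboxes.setdefault(dst_node, deque()).append(payload)

    def fetch(self, node: int) -> Optional[bytes]:
        box = self._mailboxes.get(node)
        return box.popleft() if box else None


class PointToPointTransport(Transport):
    """Simulates dedicated links: one FIFO per node, fixed at construction."""

    def __init__(self, nodes) -> None:
        self._links = {node: deque() for node in nodes}

    def deliver(self, dst_node: int, payload: bytes) -> None:
        if dst_node not in self._links:
            raise ValueError(f"no link to node {dst_node}")
        self._links[dst_node].append(payload)

    def fetch(self, node: int) -> Optional[bytes]:
        link = self._links.get(node)
        return link.popleft() if link else None


class Endpoint:
    """The only interface the application sees (the 'unified API' in this sketch)."""

    def __init__(self, node_id: int, transport: Transport) -> None:
        self._node_id = node_id
        self._transport = transport

    def send(self, dst_node: int, payload: bytes) -> None:
        self._transport.deliver(dst_node, payload)

    def receive(self) -> Optional[bytes]:
        return self._transport.fetch(self._node_id)


def run_pipeline(transport: Transport):
    """Application code: identical whichever transport is plugged in."""
    producer = Endpoint(0, transport)
    consumer = Endpoint(1, transport)
    producer.send(1, b"frame-0")
    producer.send(1, b"frame-1")
    # The third receive finds nothing waiting and returns None.
    return [consumer.receive(), consumer.receive(), consumer.receive()]
```

Calling run_pipeline(SharedMemoryTransport()) and run_pipeline(PointToPointTransport([0, 1])) exercises the same application code over both simulated interconnects. A real framework would also have to deal with node discovery, buffer ownership and blocking behavior, which this sketch leaves out.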
An Inter Process Communication Framework (IPCF) with a unified API that abstracts away the specifics of the underlying hardware, cores and interconnects, and is consistent across different operating systems, will facilitate application distribution across multiple cores, just as sockets and TCP/IP do on the Internet for widely distributed computing (Figure 1). A closely distributed embedded system generally has more stringent resource constraints than "big iron" computers with regard to power consumption, memory and latency. The communications system in a multi-core chip must provide low-latency, high-bandwidth data movement as well as synchronization between processes and cores. Preferably, it should offer a range of functionality, from extremely low-overhead streaming to low-overhead, but more flexible, messaging. Prioritized multi-channel transfers, which allow higher-level layers to implement quality of service (QoS), are also a desirable capability of the IPCF.
Figure 1. An IPCF with a unified API that abstracts away the specifics of the underlying hardware, cores and interconnects, and is consistent across different operating systems, will facilitate application distribution across multiple cores.
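As an illustration of prioritized multi-channel transfers, the following Python sketch multiplexes several logical channels over one simulated link and always hands the receiver the most urgent pending message. The class PriorityChannelSet, the function drain, and the convention that a lower number means higher priority are assumptions made for this example, not part of any existing IPCF.

```python
import heapq
import itertools


class PriorityChannelSet:
    """Hypothetical channels sharing one link; lower number = higher priority."""

    def __init__(self) -> None:
        self._heap = []
        # Monotonic counter used as a tie-breaker so that messages on the
        # same channel are delivered in the order they were sent.
        self._seq = itertools.count()

    def send(self, priority: int, payload: bytes) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), payload))

    def receive(self):
        """Return (priority, payload) for the most urgent message, or None."""
        if not self._heap:
            return None
        priority, _, payload = heapq.heappop(self._heap)
        return priority, payload


def drain(channels: PriorityChannelSet):
    """Receive until the link is empty and return the delivery order."""
    delivered = []
    while True:
        item = channels.receive()
        if item is None:
            return delivered
        delivered.append(item)


channels = PriorityChannelSet()
channels.send(2, b"bulk-0")   # low-priority streaming data
channels.send(0, b"sync")     # highest priority: synchronization
channels.send(2, b"bulk-1")
channels.send(1, b"control")
order = drain(channels)
```

Here the synchronization message overtakes the bulk data queued before it, while the two bulk messages keep their relative order. A higher layer could build a QoS policy on top of such a mechanism, for example by reserving channel 0 for synchronization traffic.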
Multi-processing and communications concepts have been around for a long time and are widely used today, so what's the big deal? They have been applied mostly in widely distributed parallel computing. The difference now is that they are being applied inside a chip or on a board, with tighter resource constraints and other requirements (bandwidth, latency, etc.). They are also being applied in target markets with a much broader developer base (one not used to parallel processing), different time-to-market requirements and shorter product life cycles.
To address communication and other issues related to the growing multi-core market, an industry effort, the Multicore Association, involving embedded software, semiconductor and hardware vendors, was started about a year ago to enable and strengthen the multi-core ecosystem. One of the efforts under way is to define a standard communication API, CAPI, specifically designed for closely distributed embedded systems. You can find out more by visiting the Multicore Association.
By Sven Brehmer. Brehmer is CEO and Founder, PolyCore Software.
Go to the PolyCore Software, Inc. website to learn more.