A High-Level Programming Model for Embedded Multicore Processors






Traditionally, embedded programmers have relied on low-level mechanisms to coordinate parallelism and manage memory. This is typically a herculean task, especially because such code is processor-specific and the work must be redone for each target deployment processor. As multicore technology becomes more prevalent in embedded systems, high-level approaches are being sought to reduce programmers' burden as they write code for increasingly complex multicore systems. This dissertation explores the implementation of a high-level shared-memory parallel programming model for embedded multicore processors. The representative processor used in this work is the TMS320C6678 (also referred to as C6678), a digital signal processor (DSP) manufactured by Texas Instruments.
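To make the contrast with low-level coordination concrete, the following is a minimal sketch in standard OpenMP C, assuming an OpenMP-style shared-memory model as the keywords suggest. The `saxpy` kernel is a hypothetical example, not code from this work. A single directive asks the runtime to divide loop iterations among cores, with no hand-written thread creation, partitioning, or synchronization.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example kernel: y[i] = a * x[i] + y[i] for 0 <= i < n.
 * The pragma asks an OpenMP runtime to split the iterations among the
 * available cores. A compiler without OpenMP support ignores the pragma
 * and runs the loop sequentially, producing the same result. */
void saxpy(int n, float a, const float *x, float *y)
{
    assert(n == 0 || (x != NULL && y != NULL));

    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Each iteration writes a distinct `y[i]`, so the iterations are independent and can safely run on different cores without extra synchronization.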

The C6678 is a high-performance fixed- and floating-point DSP that comprises eight DSP core subsystems. In addition to external memory, it has roughly 8 MB of on-chip memory, most of which can be configured as either cache or scratchpad. When a portion of its local on-chip memory is configured as cache, software-controlled mechanisms must be used to keep shared data that is cached in core-local memories coherent. When the same memory is configured as scratchpad, software-controlled mechanisms are likewise needed to move data between memory segments within the memory hierarchy. This memory organization poses additional challenges when developing applications for the C6678 and for other processors with similar memory organizations.
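The kind of explicit data movement a scratchpad configuration requires can be illustrated with a hardware-independent sketch. This is not C6678 code. The buffer `scratch`, its 256-word size, and the function `scale_via_scratchpad` are hypothetical. An ordinary static array stands in for on-chip memory, and `memcpy` stands in for the DMA transfers that real hardware would typically use. Each tile is copied in, processed locally, and copied back out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a core-local scratchpad. On real hardware
 * this buffer would live in on-chip memory; here it is a static array. */
#define SCRATCH_WORDS 256
static float scratch[SCRATCH_WORDS];

/* Scale n elements of src (assumed to reside in slower external memory)
 * by a, writing the results to dst. Data is staged explicitly through
 * the scratchpad one tile at a time: copy in, compute, copy out. */
void scale_via_scratchpad(const float *src, float *dst, size_t n, float a)
{
    for (size_t base = 0; base < n; base += SCRATCH_WORDS) {
        /* The last tile may be shorter than the scratchpad. */
        size_t len = (n - base < SCRATCH_WORDS) ? n - base : SCRATCH_WORDS;
        assert(len <= SCRATCH_WORDS);

        memcpy(scratch, src + base, len * sizeof *scratch);  /* copy in  */
        for (size_t i = 0; i < len; i++)                     /* compute  */
            scratch[i] *= a;
        memcpy(dst + base, scratch, len * sizeof *scratch);  /* copy out */
    }
}
```

The sketch models a single core. In a multicore setting, each core would stage data through its own local buffer. This tiling and copying bookkeeping is the sort of work that a higher-level approach aims to relieve the programmer of.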

In this dissertation, we present a compiler implementation of a high-level programming model for managing parallelism on the C6678. We leverage this implementation to use scratchpad memory automatically, without additional intervention from the programmer. We also introduce a high-level construct for controlling data placement and present an assessment of the performance impact of various C6678 memory configurations.



Programming languages, Compilers, OpenMP, Performance