Safety Processor Multi-core Messaging Research

Commentary

Major system operations should be deterministic whenever possible: we want a firm grasp of I/O rates and detailed knowledge of the data sources. This makes validation, integration, and debugging more straightforward and leads to a more robust system in which unexpected behavior is easier to identify. This type of design means we can load the cores statically, and we would ideally not require automatic load balancing at run time, as long as the hard real-time requirements of the system are met for expected operation and there is sufficient slack planned for exceptional conditions.

Chip hardware determines the available options for message passing between cores. A system may use I/O peripherals (for instance, Ethernet ports) assigned to each core, a shared-memory/interrupt-based scheme, or special hardware-assisted inter-core messaging (queue) acceleration engines.

The best IPC model is one that seamlessly supports a) inter-process/thread communication within the same core/OS domain, and b) inter-core communication within a multicore device. When this is done coherently, tasks do not need location knowledge of the other tasks/objects they communicate with.

Introduction to AMP

AMP – Asymmetric Multiprocessing
Asymmetric multiprocessing, or AMP, can be either homogeneous, where each core runs the same type and version of OS, or heterogeneous. In a homogeneous environment, developers can make best use of the multiple cores by choosing an OS that natively supports a distributed programming model (like Enea’s OSE).

In a heterogeneous OS environment the same principle applies: choose a solution in which a common distributed programming model applies to each of the selected OSes. LINX is an IPC handler for OSE and Linux; it is independent of the underlying processor, operating system, and interconnect, and supports control- and data-plane applications over both reliable and unreliable media.

In general, the AMP model works better for I/O-bound processing, because it scales better to more cores.

If I/O rates are fairly uniform and quantifiable, then the designer knows how to partition the cores to achieve maximum CPU utilization of each core.

If not, then some sort of load-balancing mechanism is needed, or at least desirable. Properly implemented, the AMP model allows applications running on one core to communicate transparently with applications and system services (device drivers, protocol stacks, etc.) on other cores, but without the high CPU utilization imposed by traditional forms of inter-processor communication.

In virtually all cases, OS support for a lean and easy-to-use IPC communications protocol will greatly enhance core-to-core operation. In particular, an OS built with the distributed programming paradigm in mind can take greater advantage of the parallelism provided by the multiple cores.

Since AMP uses multiple OS instantiations, it typically requires a complete networking infrastructure to support communication between applications running on different cores. To implement the lowest level of buffer transfer, an AMP system may use I/O peripherals (for instance, Ethernet ports) assigned to each processor/core or, better, a shared-memory/interrupt-based scheme or special hardware-assisted inter-core messaging (queue) acceleration engines, depending upon the system’s hardware capabilities.

The best IPC model, though, is one that seamlessly supports a) inter-process/thread communication within the same core/OS domain, b) inter-core communication within a multicore device, and c) inter-device communication with other CPUs in the networked system.

With the transparent message-passing approach, local and remote communications become one and the same: an application can use the same code to communicate with another application, regardless of whether the other application is on the local CPU/core, on another core, or even outside the device.
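As a sketch of how this can look to application code (every name below is illustrative, not from any particular product), a single send call can consult a routing table so the caller never encodes the peer's location:

    /* Hypothetical location-transparent send: the transport is chosen
     * per endpoint by the messaging layer, not by the caller. */
    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    typedef enum { ROUTE_LOCAL, ROUTE_INTERCORE, ROUTE_NETWORK } route_t;

    static route_t route_of(uint32_t endpoint)
    {
        /* Illustrative policy: endpoints 0-99 are local tasks, 100-199
         * live on the other core, 200+ are off-device.  A real system
         * would use a registry or name server instead. */
        if (endpoint < 100u) return ROUTE_LOCAL;
        if (endpoint < 200u) return ROUTE_INTERCORE;
        return ROUTE_NETWORK;
    }

    int ipc_send(uint32_t dst, const void *buf, size_t len)
    {
        (void)buf;
        switch (route_of(dst)) {
        case ROUTE_LOCAL:     printf("post to local queue (%zu bytes)\n", len);      break;
        case ROUTE_INTERCORE: printf("shared-memory ring + IPI (%zu bytes)\n", len); break;
        case ROUTE_NETWORK:   printf("network/LINX-style send (%zu bytes)\n", len);  break;
        }
        return 0;
    }

    int main(void)
    {
        uint8_t sample[16] = {0};
        /* Identical caller code no matter where endpoint 150 lives. */
        return ipc_send(150u, sample, sizeof(sample));
    }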

Allocation of resources in AMP

With AMP, the application designer has the power to decide how the shared hardware resources (devices and/or shared memory) used by applications are divided up between the cores. Normally resource allocation occurs statically at boot time and includes physical memory allocation, peripheral usage, and interrupt handling. While the system could allocate or access the resources dynamically, doing so entails complex coordination between the cores. This is a significant problem in a pure multicore AMP scenario, but it is one that can be addressed by the techniques described in the section on Multicore Implementations for Telecom/Networking Applications below.
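For illustration, such a static allocation might be captured in a const table that boot code applies and reviewers can audit. A minimal sketch for a hypothetical two-core device follows; every address and mask here is invented:

    #include <stdint.h>

    typedef struct {
        uint32_t mem_base;    /* start of the core's private RAM        */
        uint32_t mem_size;
        uint32_t periph_mask; /* bitmask of peripherals the core owns   */
        uint32_t irq_mask;    /* bitmask of interrupt lines routed here */
    } core_alloc_t;

    /* Fixed at build time: no run-time arbitration between cores. */
    static const core_alloc_t g_core_alloc[2] = {
        /* core 0: control plane - owns UART0 (bit 0), timer IRQs */
        { 0x20000000u, 0x00020000u, 0x00000001u, 0x0000000Fu },
        /* core 1: data plane - owns ETH0 (bit 1), DMA IRQs       */
        { 0x20020000u, 0x00020000u, 0x00000002u, 0x000000F0u },
    };

    /* Boot code on each core applies only its own row. */
    const core_alloc_t *core_allocation(unsigned core_id)
    {
        return &g_core_alloc[core_id];
    }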

ThreadX salesman on AMP
“The core ThreadX doesn't address inter-core communication in AMP mode. Most customers will write their own inter-core communication primitives using shared memory or other hardware resources. We do have a port to OpenAMP, which is a more standard inter-core communication mechanism.”

SafeRTOS salesman on AMP
“FreeRTOS, OpenRTOS and SafeRTOS do not support AMP or SMP. Typically, an instance of SafeRTOS is used on each core, individually, redundantly. The SafeRTOS license allows for any number of instances within the same product, so there are no cost disadvantages. Some background first - for the seven years I’ve been handling N&S American sales, I find all my customers secretive about their safety system design/plans…i.e. not sharing anything not necessary, above the RTOS. I guess - letting the outside world know what apps are used (even that SafeRTOS is used) in a safety product induces a certain amount of additional risk. That said, I still asked my safety integration engineering lead but to no avail….he indicated the same as above; this data is not shared with his team. Pretty tight lipped.”

OpenAMP
https://www.multicore-association.org/workgroup/oamp.php
https://www.xilinx.com/support/documentation/sw_manuals/xilinx2018_2/ug1186-zynq-openamp-gsg.pdf
OpenAMP provides an open-source framework that allows operating systems to interact within a broad range of complex homogeneous and heterogeneous architectures, and allows asymmetric multiprocessing applications to leverage the parallelism offered by the multicore configuration. This comprehensive framework enables developers to manage the challenges associated with inter-process communication (IPC), resource management and sharing, and process control.

Xilinx chairs the group. Pre-ported OS support is listed for FreeRTOS (what does this mean?). Compatibility with MCAPI supports high-performance use cases and zero-copy.

What is the difference between the open-source version and the commercial versions of OpenAMP? The open-source version will cover the use cases that contributors have put effort behind. Commercial implementations will cover other use cases, including situations where you need a safety certification or are using a commercial (or possibly an open-source) hypervisor as the separation technology. Commercial solutions also bring other value, such as tools and support.

What is MCAPI?
MCAPI provides a standardized API for communication and synchronization between closely distributed embedded systems (multiple cores on a chip and/or chips on a board). It is a language-independent, processor- and operating-system-agnostic communications protocol used to program multicore devices. MCAPI provides three modes of communication: messages, packets, and scalars. MCAPI uses a client/server model. A channel endpoint is used to either send or receive data; a message endpoint can be used to both send and receive data.
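A hedged sketch of the message mode, written against MCAPI 2.0-style calls (exact signatures and constants vary between spec versions and vendor implementations; status checks are omitted for brevity):

    #include <mcapi.h>

    #define DOMAIN     0
    #define MY_NODE    0
    #define MY_PORT    1
    #define PEER_NODE  1
    #define PEER_PORT  1

    void mcapi_msg_demo(void)
    {
        mcapi_info_t     info;
        mcapi_status_t   status;
        mcapi_endpoint_t local, remote;
        char             msg[] = "hello other core";

        mcapi_initialize(DOMAIN, MY_NODE, NULL, NULL, &info, &status);

        /* Message endpoints can both send and receive; packet/scalar
         * channel endpoints move data in one direction once connected. */
        local  = mcapi_endpoint_create(MY_PORT, &status);
        remote = mcapi_endpoint_get(DOMAIN, PEER_NODE, PEER_PORT,
                                    MCAPI_TIMEOUT_INFINITE, &status);

        mcapi_msg_send(local, remote, msg, sizeof(msg),
                       1 /* priority */, &status);

        mcapi_finalize(&status);
    }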

What about shared memory required for MCAPI?
An implementation can use a shared memory driver for sending and receiving data across nodes in a system, but shared memory is not required for using MCAPI. The MCAPI specification does not dictate the type of transport mechanism to be used to send and receive data.

Is OpenAMP applicable to a system of separate chips with different cores? There are two specific aspects that make the OpenAMP framework inefficient when used outside of a single SoC.

There is a porting guide available.

Multi-core Association Implementations
https://www.multicore-association.org/products/index.php

--Mentor’s offering is Mentor Embedded MCAPI, which provides “fast IPC messaging”. Mentor Services can assist with porting to other operating systems. They have a web seminar: https://www.mentor.com/embedded-software/events/mcapi-multicore-webinar

--ThreadX can be used in an AMP or SMP multicore system, with MCAPI-compliant inter-processor communication support from PolyCore Software. Poly-Messenger/MCAPI can communicate between multiple system nodes through any desired communications medium, including Express Logic’s NetX TCP/IP stack, where TCP or UDP transfers can be used to coordinate the activities of multiple tightly coupled or loosely coupled cores and processors.

On the TriCore
https://groups.google.com/forum/#!topic/open-amp/6SKlvrQs8K8

RPMsg
Designed for intra-chip multi-core messaging using shared memory. RPMsg is implemented within OpenAMP; RPMsg-Lite is a reduced-size, more modular rework that is recommended for lower-performance cores.

-lite version source and documentation: https://github.com/NXPmicro/rpmsg-lite
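For flavor, here is a minimal sketch of bringing up the remote side with the rpmsg-lite API from the repository above. The shared-memory base address, link ID, and endpoint address are placeholders that depend on the target SoC:

    #include "rpmsg_lite.h"

    #define SHMEM_BASE ((void *)0x20200000)  /* placeholder shared RAM  */
    #define LINK_ID    0u                    /* placeholder link number */
    #define LOCAL_EPT  30u                   /* placeholder endpoint ID */

    /* Receive callback: RL_RELEASE returns the buffer to the pool. */
    static int32_t rx_cb(void *payload, uint32_t payload_len,
                         uint32_t src, void *priv)
    {
        (void)payload; (void)payload_len; (void)src; (void)priv;
        return RL_RELEASE;
    }

    void rpmsg_remote_start(void)
    {
        struct rpmsg_lite_instance *rl =
            rpmsg_lite_remote_init(SHMEM_BASE, LINK_ID, RL_NO_FLAGS);

        while (!rpmsg_lite_is_link_up(rl)) {
            /* wait for the master core to finish its init */
        }

        (void)rpmsg_lite_create_ept(rl, LOCAL_EPT, rx_cb, NULL);
        /* messages sent to endpoint LOCAL_EPT now arrive via rx_cb() */
    }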

Self-build Suggestions

FreeRTOS with/without OpenAMP
“Hi, I'm completely new to FreeRTOS and I'm trying to study it to understand if it fits my needs for my application. My idea is to use it on a Zynq board, so with a dual core ARM, building an AMP system with two FreeRTOS instances running at the same time. However, I need also strict synchronization between their tasks. From Xilinx notes, I saw that for bare metal applications the standard way to synchronize two cores is using polling on some shared variables in the OCM, but of course I don't like busy waiting. Is there a standard/suggested way to synchronize two instances of FreeRTOS? Of course also suggestions on a different approach to use both cores is welcome.”

“The Xilinx OpenAMP code is definitely one approach you could take, although it is intended for Linux to FreeRTOS comms, and may be a bit heavy for FreeRTOS to FreeRTOS comms.

We have implemented some light weight event driven AMP systems using thread safe circular buffers and direct to task notifications. There are several approaches that can be taken - but the basis is the same for all approaches; one core buffers some data then generates an interrupt in the other core, the ISR uses a direct to task notification to unblock a task that reads the data from the buffer and processes as necessary. Simple and light weight.”
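A condensed sketch of the pattern described in that reply, assuming a hypothetical trigger_core_interrupt() that raises an interrupt in the peer core; real code would also need the cache and memory-barrier handling appropriate to the SoC:

    #include "FreeRTOS.h"
    #include "task.h"
    #include <stdint.h>

    #define BUF_SIZE 256u

    typedef struct {
        volatile uint32_t head, tail;   /* single producer, single consumer */
        uint8_t data[BUF_SIZE];
    } shared_ring_t;

    static shared_ring_t *const ring = (shared_ring_t *)0x20300000; /* shared RAM */
    static TaskHandle_t consumer_handle;  /* set when the task is created */

    extern void trigger_core_interrupt(void);  /* hypothetical IPI trigger */

    /* Core A: buffer a byte, then poke the other core. */
    void produce_byte(uint8_t b)
    {
        ring->data[ring->head % BUF_SIZE] = b;
        ring->head++;
        trigger_core_interrupt();
    }

    /* Core B: inter-core interrupt handler unblocks the consumer. */
    void InterCoreISR(void)
    {
        BaseType_t woken = pdFALSE;
        vTaskNotifyGiveFromISR(consumer_handle, &woken);
        portYIELD_FROM_ISR(woken);
    }

    /* Core B: consumer task drains whatever has arrived. */
    void consumer_task(void *params)
    {
        (void)params;
        for (;;) {
            ulTaskNotifyTake(pdTRUE, portMAX_DELAY);  /* sleep until notified */
            while (ring->tail != ring->head) {
                uint8_t b = ring->data[ring->tail % BUF_SIZE];
                ring->tail++;
                (void)b;  /* process the byte here */
            }
        }
    }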

FreeRTOS message buffer page, which mentions multi-core use:
https://www.freertos.org/RTOS-message-buffer-example.html
The FreeRTOS/Demo/Common/Minimal/MessageBufferAMP.c source file provides a heavily commented example of how to use a message buffer to pass variable-length data from one MCU core to another on a multi-core MCU.
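The scheme that demo builds on can be condensed as follows. This is a sketch: the buffer placement, the .shared section, and trigger_core_interrupt() are platform-specific placeholders, not part of the FreeRTOS API:

    #include "FreeRTOS.h"
    #include "message_buffer.h"

    /* Control block and storage must sit in memory both cores can see. */
    static StaticMessageBuffer_t xCtrl __attribute__((section(".shared")));
    static uint8_t ucStorage[256]      __attribute__((section(".shared")));
    static MessageBufferHandle_t xBuf;

    extern void trigger_core_interrupt(void);  /* hypothetical IPI trigger */

    void amp_buffer_init(void)  /* run once before either core uses it */
    {
        xBuf = xMessageBufferCreateStatic(sizeof(ucStorage), ucStorage, &xCtrl);
    }

    /* On the sending core, FreeRTOSConfig.h redirects the "send complete"
     * hook to the inter-core interrupt (illustrative):
     *   #define sbSEND_COMPLETED( pxBuffer )  trigger_core_interrupt()
     */

    /* On the receiving core, the inter-core ISR performs the wake-up
     * the sender's hook could not do locally. */
    void InterCoreISR(void)
    {
        BaseType_t woken = pdFALSE;
        xMessageBufferSendCompletedFromISR(xBuf, &woken);
        portYIELD_FROM_ISR(woken);
    }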

Multi-core FreeRTOS: http://www.rbccps.org/wp-content/uploads/2017/10/06961844.pdf In this paper we describe an effort to design and implement a multicore version of FreeRTOS for symmetric multiprocessors (SMP’s).

https://www.cs.york.ac.uk/fp/multicore-freertos/spe-mistry.pdf A working multicore version of FreeRTOS is presented that is able to schedule tasks on multiple processors as well as provide full mutual exclusion support for use in concurrent applications. Mutual exclusion is achieved in an almost completely platform–agnostic manner, preserving one of FreeRTOS’s most attractive features: portability.

Using MCAPI (basics: https://www.embedded.com/design/mcus-processors-and-socs/4433528/The-basics-of-using-MCAPI-in-multicore-based-designs)

Implementation paper on FPGA multi-processor:
https://www.researchgate.net/publication/220714343_Multicore_Communications_API_MCAPI_implementation_on_an_FPGA_multiprocessor

Another implementation paper for a thesis project:
https://pdfs.semanticscholar.org/3c6d/d8eb7a3b563489bfdcd03f2eb21d7fa63590.pdf

OpenMCAPI implementation, supported commercially by Mentor
https://bitbucket.org/hollisb/openmcapi/wiki/Home

Others
This paper on inter-core messaging for a TI DSP chip outlines mechanisms to use and shows use-case examples; a small hardware-semaphore sketch follows the list below.
http://www.ti.com/lit/an/sprab25/sprab25.pdf

  • hardware semaphores
  • core global/local memory map
  • memory protection
  • DMA
  • exceptions, interrupts, events
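As promised above, a sketch of the hardware-semaphore item: the register address and read/write semantics are simplified placeholders rather than the actual TI register map (the app note has the real details):

    #include <stdint.h>

    /* Placeholder register block; the real TI semaphore module has its
     * own documented map (see the app note above). */
    #define SEM_REG(n) (*(volatile uint32_t *)(0x02640100u + 4u * (n)))

    /* Convention assumed here: reading returns 1 if the semaphore was
     * free and is now granted to this core; writing 1 releases it. */
    static void hw_sem_lock(unsigned n)
    {
        while (SEM_REG(n) != 1u) {
            /* spin until the hardware grants the semaphore */
        }
    }

    static void hw_sem_unlock(unsigned n)
    {
        SEM_REG(n) = 1u;
    }

    void update_shared_counter(volatile uint32_t *counter)
    {
        hw_sem_lock(3);   /* semaphore 3 guards this shared variable */
        (*counter)++;     /* safe: no other core holds semaphore 3   */
        hw_sem_unlock(3);
    }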

RTOS Certification paper

http://atcproyectos.ugr.es/recomp/images/stories/deliverables/T3_2_Report_Final_v100.pdf
“To support general software communication in a multiprocessor configuration, atomic operations must be provided to retain consistency during access to shared memory. In addition, even with atomic operations providing consistency of data visibility, choices remain about the timing of the access to the shared memory. Polling by readers of data is a simple but lower-performance option (due to the cycles consumed when no update has occurred); the alternative is some mechanism to notify other cores when and if new data to be communicated has been provided.”
The report describes requirements for various core-to-core modules and includes a data-item format proposal from the SafeRTOS folks.
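A sketch of the two timing choices the paper contrasts, using C11 atomics for the consistency requirement; the polling variant burns exactly the cycles the paper warns about:

    #include <stdatomic.h>
    #include <stdint.h>

    typedef struct {
        _Atomic uint32_t seq;    /* bumped by the writer after each update */
        uint32_t         value;  /* the shared data item                   */
    } shared_item_t;

    /* Writer: release ordering makes the value visible before the bump. */
    void writer(shared_item_t *it, uint32_t v)
    {
        it->value = v;
        atomic_fetch_add_explicit(&it->seq, 1u, memory_order_release);
    }

    /* Polling reader: simple, but consumes cycles while nothing changes,
     * which is exactly the cost the paper points out. */
    uint32_t polling_reader(shared_item_t *it, uint32_t last_seen)
    {
        while (atomic_load_explicit(&it->seq, memory_order_acquire) == last_seen) {
            /* spin; a notification scheme would block here instead */
        }
        return it->value;
    }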

Memory Copy/Move vs Sharing Comparison

  • Sharing saves memory space.
  • Moving can put the data in faster-access memory.
  • Sharing avoids transfer delay.
  • Moving can free the memory at the source, though it uses memory at the destination.
  • Copying can give the data to multiple cores to work on in parallel.
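A minimal C illustration of the trade-off (the helper signatures are hypothetical):

    #include <string.h>
    #include <stdint.h>

    #define PAYLOAD 512u

    /* Sharing: pass a pointer; no transfer delay and no second copy,
     * but the producer must not touch the buffer until the consumer
     * signals completion. */
    void send_shared(const uint8_t *payload, void (*deliver)(const uint8_t *))
    {
        deliver(payload);
    }

    /* Copying/moving: the consumer owns dst (possibly faster local RAM),
     * the producer's buffer is reusable as soon as memcpy returns, and
     * several cores can each receive a private copy to work in parallel. */
    void send_copy(uint8_t dst[PAYLOAD], const uint8_t src[PAYLOAD])
    {
        memcpy(dst, src, PAYLOAD);
    }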

SAFEXchange by SAFERTOS

The SafeRTOS folks offer SafeXchange, which also targets multi-processor systems. They use DDS (Data Distribution Service) terminology to describe the concept, rather than implying compatibility with DDS.
https://www.highintegritysystems.com/safety-plugins/safexchange/

RTI Connext DDS Micro

https://www.rti.com/developers/case-code/automotive
Publish-subscribe framework, decentralized (peer-to-peer), and designed to be mostly independent of OS services; it can also run bare-metal. Currently able to run on Linux (x86), Windows, FreeRTOS (ARM), VxWorks (PowerPC), and devices without an OS (ARM). Ships with source, and advertises easy porting to a new OS/compiler/architecture. User-configurable modularity, constructed around a small kernel.

This company also makes the most popular DDS implementation, Connext DDS Pro.

Less Apt Options

LINX
LINX was released as free and open-source software, subject to the requirements of the GNU General Public License (GPL), version 2. Utilizing direct message passing, LINX scales from DSPs and microcontrollers to 64-bit CPUs. LINX is independent of the underlying processor, operating system, and interconnect, and supports control- and data-plane applications over both reliable and unreliable media. LINX enables application processes distributed across multiple operating systems, CPUs, and interconnects to communicate as if they were running on the same CPU under the same operating system. It seems focused more on distributed IPC and scaling up to large systems, so it is probably not ideal for inter-core messaging.

MPI is a communication protocol for programming parallel computers. It also appears unsuited to this purpose.

