Safety Processor Multi-core Messaging Research Commentary

Major system operations should be deterministic whenever possible – we want to have a grasp on I/O rates and detailed knowledge of the data sources. This makes validation, integration, and debugging more straightforward and leads to a more robust system for which it is easier to identify unexpected behavior. This type of design means we can load the cores statically, and we would ideally not require automatic load balancing at run time, as long as the hard real-time requirements for the system are met for expected operation and there is sufficient slack planned for exceptional conditions.

Chip hardware determines the available options for message passing between cores. A design may use I/O peripherals (for instance, Ethernet ports) assigned to each core, a shared memory/interrupt-based scheme, or special hardware-assisted inter-core messaging (queue) acceleration engines. The best IPC model is one that seamlessly supports a) inter-process/thread communication on the same core/OS domain, and b) inter-core communication within a multicore device. When this is done coherently, tasks do not need location knowledge of the other tasks/objects they communicate with.

Introduction to AMP

AMP – Asymmetric Multiprocessing. In a heterogeneous OS environment the same principle applies – choose a solution wherein a common distributed programming model applies to each of the selected OSs. LINX is an IPC handler for OSE and Linux; it is independent of the underlying processor, operating system, or interconnect, and supports control- and data-plane applications over reliable and unreliable media.

In general, the AMP model works better for I/O-bound processing because it scales better to more cores. If I/O rates are fairly uniform and quantifiable, then the designer knows how to partition the cores to achieve maximum CPU utilization of each core. If not, then some sort of load-balancing mechanism is needed, or at least desired.

Properly implemented, the AMP model allows applications running on one core to communicate transparently with applications and system services (device drivers, protocol stacks, etc.) on other cores, but without the high CPU utilization imposed by traditional forms of inter-processor communication. In virtually all cases, OS support for a lean and easy-to-use IPC protocol will greatly enhance core-to-core operation. In particular, an OS built with the distributed programming paradigm in mind can take greater advantage of the parallelism provided by the multiple cores. Since AMP uses multiple OS instantiations, it typically requires a complete networking infrastructure to support communication between applications running on different cores. To implement the lowest level of buffer transfer, an AMP system may use I/O peripherals (for instance, Ethernet ports) assigned to each processor/core, or, better, a shared memory/interrupt-based scheme or special hardware-assisted inter-core messaging (queue) acceleration engines, depending upon the system's hardware capabilities. The best IPC model, though, is one that seamlessly supports a) inter-process/thread communication on the same core/OS domain, b) inter-core communication within a multicore device, and c) inter-device communication with other CPUs in the networked system.
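As a rough illustration of that last point, the sketch below shows what a location-transparent send call could look like: the caller addresses an endpoint and the IPC layer decides whether the destination is a local queue, another core, or another device. All names (ipc_endpoint_t, local_node_id, enqueue_local, transport_send) are hypothetical placeholders, not from any of the products discussed on this page.

/* Sketch only (hypothetical names): a location-transparent send API.
 * The caller addresses an endpoint; the IPC layer decides whether the
 * destination is a local queue, another core, or another device. */
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t node;   /* which core/OS domain hosts the endpoint */
    uint16_t port;   /* endpoint id within that node            */
} ipc_endpoint_t;

/* Hypothetical lower layers supplied by the platform/IPC stack. */
extern uint16_t local_node_id(void);
extern int enqueue_local(uint16_t port, const void *buf, size_t len);
extern int transport_send(uint16_t node, uint16_t port,
                          const void *buf, size_t len);

/* Same call whether the peer is a thread on this core, a task on another
 * core, or an application on another device in the network. */
int ipc_send(ipc_endpoint_t dst, const void *buf, size_t len)
{
    if (dst.node == local_node_id()) {
        return enqueue_local(dst.port, buf, len);         /* case a) same core */
    }
    return transport_send(dst.node, dst.port, buf, len);  /* cases b) and c)   */
}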
With the transparent message passing approach, local and remote communications become one and the same: an application can use the same code to communicate with another application, regardless of whether the other application is on the local CPU core, on another core, or even outside the device.

Allocation of resources in AMP

With AMP, the application designer has the power to decide how the shared hardware resources (devices and/or shared memory) used by applications are divided up between the cores. Normally resource allocation occurs statically at boot time and includes physical memory allocation, peripheral usage, and interrupt handling. While the system could allocate or access the resources dynamically, doing so entails complex coordination between the cores. This is a significant problem in a pure multicore AMP scenario, but one that can be addressed by the techniques described in the section about Multicore Implementations for Telecom/Networking Applications below.

--ThreadX salesman on AMP
--SafeRTOS salesman on AMP

OpenAMP

Xilinx chairs the group. Pre-ported OS support by FreeRTOS (what does this mean?). Compatibility with MCAPI to support high-performance use cases and zero-copy.

What is the difference between the open source version and the commercial version of OpenAMP? The open source version will cover some use cases that contributors put effort behind. Commercial implementations will cover other use cases, including situations where you need a safety certification or when you are using a commercial (or possibly an open source) hypervisor as the separation technology. Commercial solutions also bring other value such as tools and support.

What is MCAPI? What about the shared memory required for MCAPI?

Is OpenAMP applicable for a system with separate chips with different cores? There are two specific aspects that make the OpenAMP framework inefficient when used outside a single SoC. There is a porting guide available.

Multi-core Association Implementations

--Mentor’s offering is Mentor Embedded MCAPI, which provides “fast IPC messaging”. Mentor Services can assist with porting to other operating systems. They have a web seminar: https://www.mentor.com/embedded-software/events/mcapi-multicore-webinar

--ThreadX can be used in an AMP or SMP multicore system, with MCAPI-compliant inter-processor communications support from PolyCore Software. Poly-Messenger/MCAPI can communicate between multiple system nodes through any communications medium desired, including Express Logic’s NetX TCP/IP stack, where TCP or UDP transfers can be used to coordinate the activities of multiple tightly coupled or loosely coupled cores and processors.

--On the TriCore, RPMsg-lite source and documentation: https://github.com/NXPmicro/rpmsg-lite

Self-build Suggestions

FreeRTOS with/without OpenAMP

“The Xilinx OpenAMP code is definitely one approach you could take, although it is intended for Linux to FreeRTOS comms, and may be a bit heavy for FreeRTOS to FreeRTOS comms. We have implemented some light weight event driven AMP systems using thread safe circular buffers and direct to task notifications. There are several approaches that can be taken - but the basis is the same for all approaches; one core buffers some data then generates an interrupt in the other core, the ISR uses a direct to task notification to unblock a task that reads the data from the buffer and processes as necessary. Simple and light weight.”
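A minimal sketch of that pattern follows, assuming a memory region visible to both cores and an SoC-specific way to raise an inter-core interrupt; trigger_ipi_to_other_core() and the .shared_ipc section are hypothetical placeholders, while vTaskNotifyGiveFromISR() and ulTaskNotifyTake() are the standard FreeRTOS direct-to-task notification calls. Real hardware may additionally require memory barriers or cache maintenance around the shared region.

/* Sketch only: lock-free single-producer/single-consumer ring buffer in
 * shared memory, drained by a FreeRTOS task that blocks on a direct-to-task
 * notification raised from the inter-core interrupt handler. */
#include <stdint.h>
#include "FreeRTOS.h"
#include "task.h"

#define RING_SIZE 256u                       /* power of two */

typedef struct {
    volatile uint32_t head;                  /* written only by producer core */
    volatile uint32_t tail;                  /* written only by consumer core */
    uint8_t data[RING_SIZE];
} shared_ring_t;

/* Placed in a region visible to both cores (linker-script detail, hypothetical). */
static shared_ring_t g_ring __attribute__((section(".shared_ipc")));

static TaskHandle_t g_rx_task;               /* consumer task on this core */

extern void trigger_ipi_to_other_core(void); /* hypothetical SoC hook */

/* Producer side (runs on the other core). */
void ipc_send_byte(uint8_t b)
{
    uint32_t head = g_ring.head;
    g_ring.data[head % RING_SIZE] = b;
    g_ring.head = head + 1;                  /* publish data before notifying */
    trigger_ipi_to_other_core();             /* raise inter-core interrupt    */
}

/* Inter-core interrupt handler on the consumer core. */
void IPC_IRQHandler(void)
{
    BaseType_t woken = pdFALSE;
    vTaskNotifyGiveFromISR(g_rx_task, &woken);
    portYIELD_FROM_ISR(woken);
}

/* Consumer task: blocks until notified, then drains the ring. */
static void ipc_rx_task(void *arg)
{
    (void)arg;
    for (;;) {
        ulTaskNotifyTake(pdTRUE, portMAX_DELAY);
        while (g_ring.tail != g_ring.head) {
            uint8_t b = g_ring.data[g_ring.tail % RING_SIZE];
            g_ring.tail++;
            (void)b;                         /* process b as necessary */
        }
    }
}

void ipc_rx_init(void)
{
    xTaskCreate(ipc_rx_task, "ipc_rx", configMINIMAL_STACK_SIZE,
                NULL, tskIDLE_PRIORITY + 2, &g_rx_task);
}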
FreeRTOS message buffer page, which mentions multi-core use.

Multi-core FreeRTOS:

--http://www.rbccps.org/wp-content/uploads/2017/10/06961844.pdf – In this paper we describe an effort to design and implement a multicore version of FreeRTOS for symmetric multiprocessors (SMPs).

--https://www.cs.york.ac.uk/fp/multicore-freertos/spe-mistry.pdf – A working multicore version of FreeRTOS is presented that is able to schedule tasks on multiple processors as well as provide full mutual exclusion support for use in concurrent applications. Mutual exclusion is achieved in an almost completely platform-agnostic manner, preserving one of FreeRTOS’s most attractive features: portability.

Using MCAPI

Basics: https://www.embedded.com/design/mcus-processors-and-socs/4433528/The-basics-of-using-MCAPI-in-multicore-based-designs

Implementation paper on FPGA multi-processor:

Another implementation paper for a thesis project:

OpenMCAPI implementation, supported commercially by Mentor.
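As a rough sketch of the connectionless MCAPI message flow, assuming MCAPI 2.0-style calls as recalled here (exact types and argument lists should be checked against the implementation's mcapi.h, and the domain/node/port numbers are arbitrary):

/* Sketch only: connectionless MCAPI messages between two endpoints.
 * Signatures approximate the MCAPI 2.0 specification; verify against the
 * headers of the actual MCAPI implementation in use. */
#include <mcapi.h>

#define MY_DOMAIN   0
#define MY_NODE     1        /* this core      */
#define MY_PORT     10
#define PEER_NODE   2        /* the other core */
#define PEER_PORT   20

void mcapi_example(void)
{
    mcapi_status_t status;
    mcapi_info_t   info;
    mcapi_param_t  parms;

    mcapi_initialize(MY_DOMAIN, MY_NODE, NULL /* node attributes */,
                     &parms, &info, &status);

    /* Local endpoint we receive on, and the peer endpoint we send to. */
    mcapi_endpoint_t local = mcapi_endpoint_create(MY_PORT, &status);
    mcapi_endpoint_t peer  = mcapi_endpoint_get(MY_DOMAIN, PEER_NODE, PEER_PORT,
                                                MCAPI_TIMEOUT_INFINITE, &status);

    const char msg[] = "hello from core 1";
    mcapi_msg_send(local, peer, msg, sizeof msg, 0 /* priority */, &status);

    char   buf[64];
    size_t received = 0;
    mcapi_msg_recv(local, buf, sizeof buf, &received, &status);

    mcapi_finalize(&status);
}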
Others

RTOS Certification paper

http://atcproyectos.ugr.es/recomp/images/stories/deliverables/T3_2_Report_Final_v100.pdf

“To support general software communication in a multiprocessor configuration, atomic operations must be provided to retain consistency during access to shared memory. In addition, even with atomic operations providing consistency of data visibility, choices remain about the timing of the access to the shared memory. Polling by readers of data is a simple but lower performance option (due to the consumed cycles when no updates has occurred), the alternative is some mechanism to notify other cores when and if new data to be communicated has been provided.”

Describes requirements for various core-to-core modules. (A minimal polled shared-memory sketch appears at the end of this page.)

A data item format proposal from the SafeRTOS folks.

Memory Copy/Move vs Sharing Comparison

--Sharing saves memory space
--Moving can put the data in faster-access memory
--Sharing avoids transfer delay
--Moving can free up the memory at the source, though it uses memory at the destination
--Copying can give the data to multiple cores to work on in parallel

SAFEXchange by SAFERTOS

The SafeRTOS folks offer SafeXchange, which also targets multi-processor systems. They use DDS (Data Distribution Service) to describe the concept rather than implying compatibility.

RTI Connext DDS Micro

https://www.rti.com/developers/case-code/automotive

Publish-subscribe framework, decentralized (peer-to-peer), and designed to be mostly independent of OS services; can also run bare-metal. Currently able to run on Linux (x86), Windows, FreeRTOS (ARM), VxWorks (PowerPC), and devices without an OS (ARM). Ships with source, and advertises easy porting to a new OS/compiler/architecture. User-configurable modularity, constructed around a small kernel. This company also makes the most popular DDS implementation, Connext DDS Pro.

Less Apt Options

--LINX
--MPI is a communication protocol for programming parallel computers; it also appears not to be suited.
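Relating to the polling option quoted from the certification report above, here is a minimal single-producer/single-consumer mailbox over shared memory using C11 atomics. The .shared_ipc section name is a hypothetical placeholder, and hardware without coherent caches between the cores would additionally need cache maintenance.

/* Sketch only: one-slot SPSC mailbox in shared memory, read by polling.
 * The "full" flag is published with release/acquire ordering so the payload
 * writes are visible before the flag, and vice versa on release of the slot. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t    payload[4];
    atomic_bool full;            /* true = payload holds unread data */
} mailbox_t;

static mailbox_t g_mbox __attribute__((section(".shared_ipc")));

/* Producer core: returns false if the previous message is still unread. */
bool mbox_post(const uint32_t msg[4])
{
    if (atomic_load_explicit(&g_mbox.full, memory_order_acquire))
        return false;                                   /* consumer not done yet */
    for (int i = 0; i < 4; i++) g_mbox.payload[i] = msg[i];
    atomic_store_explicit(&g_mbox.full, true, memory_order_release);  /* publish */
    return true;
}

/* Consumer core: poll; returns true when a message was copied out. */
bool mbox_poll(uint32_t out[4])
{
    if (!atomic_load_explicit(&g_mbox.full, memory_order_acquire))
        return false;                                   /* nothing new */
    for (int i = 0; i < 4; i++) out[i] = g_mbox.payload[i];
    atomic_store_explicit(&g_mbox.full, false, memory_order_release); /* free slot */
    return true;
}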