RTOS
Comparing bare-metal firmware vs. an RTOS for your project: an RTOS can give you faster response to many kinds of events in exchange for some throughput overhead. In other words, bare metal is a good fit for doing a few simple things at a high rate, whereas an RTOS is better for juggling many different kinds of activities. Response requirements generally drive the choice of architecture. Xilinx has a take on this, and the SAFERTOS folks cover it in "Should We Use an RTOS?"
More comparisons. PXROS vs. SAFERTOS, the most important points of difference:
- PXROS is larger in memory, with more features/utilities
- PXROS does not copy data for messaging; it transfers ownership of a memory region between tasks. SAFERTOS copies.
- PXROS uses Messages/Mailboxes/Events, while SAFERTOS uses Queues/Notifications/Semaphores for task sync/communication
- SAFERTOS can implement shared memory between tasks by passing pointers on a Queue
- PXROS claims to take advantage of the Aurix HW architecture (the vendors have a close association)
- The PXROS vendor might be able to provide better platform support (preferred design, Aurix expertise)
- SAFERTOS seems more platform-generic in construction, and is ported to many more processors
- We have identified an ASIL D TCP/IP stack for SAFERTOS
- PXROS allows tasks dynamic memory allocation; SAFERTOS does not
- PXROS runs micro-kernels on each core and the OS handles the internals
- SAFERTOS needs an independent instance on each core plus extra communication management (it supports neither AMP nor SMP)
- SAFERTOS is based on the open-source and very popular FreeRTOS kernel
- SAFERTOS is delivered as source; PXROS is delivered as libraries/headers

You can probably use bare metal if your application fulfills one purpose while handling one slow I/O stream. Beyond that, you want to consider an RTOS.

From a contributor to The Embedded Muse: The basis of such a system is of course a scheduler where all tasks MUST complete before the next timer tick occurs. I took the inspiration for mine from RIOS ( https://www.cs.ucr.edu/~vahid/rios/ ), which I then refined to add debug printing from a buffer in the "dead" zone, a "run" LED which also served as a simple load monitor, and other niceties. There are of course a few considerations with such a system:
1. Obviously all tasks must complete in the allotted time, or you have a fatal error.
2. Everything is single-threaded, so generally atomicity is only a problem for memory objects which are also accessed in ISRs. The general rule to only write to shared buffers etc. from ONE (higher-priority) source serves very well here, but specific cases do need to be thought through.
3. Tasks needing finer granularity than the timer tick (in my case, actual generation of pulses for the motors) can operate in higher-priority interrupts, driven from additional timers or other sources.
4. If tasks are kept reasonably simple, determining worst-case timing is pretty easy: you measure the worst case for each task, take the worst-case sum, and add the worst-case time given to ISRs. This for me is the KEY advantage. (No worries about task-switching overhead or when certain tasks might pop up and throw things out. They all run, every time.)
5. Obviously blocking in a task (or an ISR) is a complete no-no. Many tasks end up being little state machines.

Eventually the system becomes a group of state machines with separate responsibilities, which communicate via their APIs. C lends itself to this approach quite well if you see each .c/.h pair as an "object". (These days I might have multiple files in a folder, with only one "external API", to allow for unit testing within a module using Ceedling.) In such a way, separation of concerns and modularity are quite achievable. In modern systems, for instance certain SoCs that combine several embedded processors with FPGA real estate, such an approach could be hugely powerful: one bare-metal processor (with hardware support as needed) does the hard real-time work, while the other perhaps runs embedded Linux or similar to provide networking and higher-level tasks. All in all, it seems to me that this approach has many advantages over an RTOS, at least up to a certain level of complexity, from the point of view of ensuring hard real-time performance.
I find it a little disappointing that Gliwa entirely passes over such an approach in a book dedicated to the subject of timing in real-time systems, and particularly the implication that anything more than the simplest tasks must use an RTOS. For me, one of the key factors in building systems successfully is to select the simplest and most appropriate techniques for "getting it done". I think that Gliwa has fallen back on the "defaults" of the auto industry here (well-proven though they may be) and bypassed a very important class of simpler solutions.

Terminology

Device stacks are all about making hardware peripherals available to application code through abstract and generic software interfaces. By placing more or fewer modules on a stack, you can choose the abstraction level you want to use in your application. The lowest-level modules are specific to a particular hardware device; on top of those, you can stack higher-level modules that provide more generic functionality to access the device. For example, at the higher, abstract level you could choose a module that exposes a file system to your application, while at the lower levels you still select modules that decide which specific storage device is accessed (a hard drive, SD card, RAM drive, ...). Thus, the lower-level modules are specific to a particular peripheral, while the higher-level modules are less hardware-specific and can even be used with multiple peripheral devices.

Middleware is a term sometimes used to describe software that lies between the device or service layer and the application software; alternatively, a service that connects applications to a device of some type could itself be called middleware. A TCP/IP stack is sometimes called middleware. An RTOS itself is sometimes lumped in with middleware, given that it often operates above a target-specific platform abstraction layer that holds all the peripheral/driver modules.
Some RTOS vendors use the terms peripheral and device specifically: a peripheral would be the lowest-level module, while a driver would make use of the peripheral to provide services to an application. I like the construct that devices are off-chip ICs while a peripheral is an interface module that is part of the SoC; device drivers then make use of the lower-level peripheral drivers.

If not using an RTOS, be aware that an increasing number of threads will decrease run-time determinism.

Mechanisms

How does a context switch work? In the case of the TriCore, the OS definition code includes a struct for a special chunk of memory called the Context Save Area (CSA). On a switch, the core registers are dumped to this memory block. Presumably there should be one for each task.

RTOS basics

SAFERTOS training materials: https://www.highintegritysystems.com/rtos/rtos-tutorials/

context switching

Does context switching (aside from ISRs) happen even if there is only one task? Possibly, because the scheduler may be invoked periodically by the kernel for housekeeping; it depends on how the kernel behaves when only one task is created.

interrupts

In real-time kernel-based systems, the routines that service hardware interrupts are typically small and fast. Their main function is to capture or send data and to notify the task-scheduling kernel of any further processing required. The bulk of application processing is carried out by tasks running at the "background" level of the processor, i.e. its normal state when it is not executing an ISR. There are two general types of task scheduling in real-time kernel-based systems, commonly event-driven (preemptive, priority-based) scheduling and time-sharing (round-robin) scheduling.
multi-threading

ThreadX recommends: stack size is always an important debug topic in multithreading. Whenever unexplained behavior is observed, it is usually a good first guess to increase the stack sizes of all threads, especially the stack size of the last thread to execute!

If you want to end a thread that normally just runs an infinite loop, you can't use join alone, because the master will wait forever. Instead, one idea is to have the thread check a global flag (such as an atomic) each time through its loop; the master sets the stop flag when it wants the thread to end, and then joins it.

re-entrant (reentrant) functions

A reentrant function is one that can be used safely by multiple threads. Also, a recursive function must be reentrant. ARM says, "Code is reentrant if it can be interrupted in the middle of its execution and then be called again before the previous invocation has completed." Per Ganssle, a routine must satisfy several conditions to be reentrant:
Testing for re-entrancy is not straightforward, but here are some ideas: http://www.ganssle.com/articles/areentra.htm

An example of a non-reentrant function is the string token function strtok() found in the standard C library. It remembers the previous string pointer across calls by keeping it in a static variable, so if it is called from multiple threads it will most likely return an invalid pointer. Functions that are not reentrant must be protected from interrupts; one way to accomplish this is to put a wrapper around the calls that disables and re-enables interrupts.

thread safety

Thread safety is a property that allows code to run in multithreaded environments by re-establishing, by means of synchronization, some of the correspondence between the actual flow of control and the text of the program. Mechanisms such as mutual exclusion (locks), atomic operations, and thread-local storage can be used to provide safety.
watchdog

A nice feature of embOS allows setting up software watchdogs: https://www.segger.com/doc/UM01001_embOS.html#Watchdog

Multi-core

Information about OpenAMP, a framework for communication between OSes in a multi-core system: http://openamp.github.io/docs/linaro-2017/OpenAMP-Intro-Feb-2017.pdf

Some RTOSes build in handling of multi-core systems, while others don't. For example, PXROS, if configured correctly, will run a microkernel on each core to provide common system services and automate delivery of messages between processes even if they are on different cores. ThreadX requires you to set up and run an instance on each core and to create an inter-core messaging system yourself.

Security Principles

Green Hills INTEGRITY has been certified at the highest security rating: https://www.eetimes.com/document.asp?doc_id=1169789

Program Space

For example, a ThreadX application program might look like this in memory:

ROM/Flash:
  instruction area (machine code)
  constant area (used to set up the RAM initialized data)
  ...
RAM:
  initialized data
  uninitialized data
  stack

The two RAM data areas contain all the global and static variables.

I/O

General Practices

ISRs can be part of the peripheral driver, which allows non-blocking communication that does not stall the send/receive threads. ISRs could also be associated with the task (application code) instead.

Latency Estimation Chart
QNX Intro
VxWorks
ThreadX

Here is an example application showing use of timers and queues: ThreadX Example Application

The SMP solution was approved in 2018 for critical applications with 100% code coverage. This certification ensures compliance with the industrial safety standard IEC 61508 and standards derived from it, including IEC 61508 SIL 4, IEC 62304 Class C, ISO 26262 ASIL D, and EN 50128 SW-SIL 4.

Thread Init
Messaging

ThreadX also uses messages and queues/mailboxes for IPC, with notifications. A mailbox is a single-message queue. Create a queue to hold messages; each message queue is a public resource, and ThreadX places no constraints on how message queues are used. The memory area for buffering messages is specified during queue creation, and like other memory areas in ThreadX it can be located anywhere in the target's address space.

How did ThreadX add SMP support?

This review only covers the headers/interface, not the code body.

The time-slice measure became per-core, and a global interrupt-active count was added:

  TIMER_DECLARE ULONG _tx_timer_time_slice;
  /* became */
  TIMER_DECLARE ULONG _tx_timer_time_slice[TX_THREAD_SMP_MAX_CORES];

  /* Define count to detect when timer interrupt is active. */
  TIMER_DECLARE ULONG _tx_timer_interrupt_active;

A series of SMP-specific internal routines was added:

  /* Define all internal SMP prototypes. */
  void _tx_thread_smp_current_state_set(ULONG new_state);
  UINT _tx_thread_smp_find_next_priority(UINT priority);
  void _tx_thread_smp_high_level_initialize(void);
  void _tx_thread_smp_rebalance_execute_list(UINT core_index);

  /* Define all internal ThreadX SMP low-level assembly routines. */
  VOID _tx_thread_smp_core_wait(void);
  void _tx_thread_smp_initialize_wait(void);
  void _tx_thread_smp_low_level_initialize(UINT number_of_cores);
  void _tx_thread_smp_core_preempt(UINT core);

Scheduling and mapping data structures were added:

  /* Define the ThreadX SMP scheduling and mapping data structures. */
  THREAD_DECLARE TX_THREAD * _tx_thread_smp_schedule_list[TX_THREAD_SMP_MAX_CORES];
  THREAD_DECLARE ULONG _tx_thread_smp_reschedule_pending;
  THREAD_DECLARE TX_THREAD_SMP_PROTECT _tx_thread_smp_protection;
  THREAD_DECLARE volatile ULONG _tx_thread_smp_release_cores_flag;
  THREAD_DECLARE ULONG _tx_thread_smp_system_error;
  THREAD_DECLARE ULONG _tx_thread_smp_inter_core_interrupts[TX_THREAD_SMP_MAX_CORES];

The system stack pointer and the current/next execution thread pointers became per-core arrays:

  THREAD_DECLARE VOID * _tx_thread_system_stack_ptr;
  /* became */
  THREAD_DECLARE VOID * _tx_thread_system_stack_ptr[TX_THREAD_SMP_MAX_CORES];
  /* likewise for current_ptr, execute_ptr, etc. */

An option was also added to remap the system-state struct and the current-thread pointer to function calls instead.

ThreadX vs. ThreadX SMP in a multicore environment

Their advice is to use the standard, single-core ThreadX on as many cores as necessary, if it is reasonable for the application to distribute the processing load; this mode of use is typically called Asymmetric Multiprocessing (AMP). If automatic load balancing is required because of the dynamic nature of the application, then ThreadX SMP (Symmetric Multiprocessing) is better. ThreadX SMP has more overhead and is more complicated in general; SMP also requires that all processors share the same cache-coherent memory space and that suitable inter-core locks and interrupts are available.

As for how to communicate between the cores in AMP, most customers write their own shared-memory mechanism for inter-core communication. We have also done an integration with OpenAMP, but certification is an issue: a majority of OpenAMP is open source and not under our control, and there might also be coding and testing issues that would make it very difficult to certify. As for ThreadX SMP, it isn't available on the TriCore; it would likely require NRE and considerable schedule as well.
Again, if your application can distribute its load across the processors, then we would recommend sticking with standard ThreadX anyway.

What's a BSP, anyway?

https://www.windriver.com/products/bsp_web/what_is_a_bsp.pdf

The above is a nice write-up explaining why "BSP" is a very imprecise term.
Timing

JG: A little bit of C code that looks quite deterministic probably makes calls into the black hole that is the runtime library, which is generally uncharacterized (in the time domain) by the vendor. Does that call take a microsecond or a week? No one knows.