Glossary

CCI = cache coherent interconnect
CCS = current connect status (field in the USB PORTSC1 register)
DCD = device configuration data
DTR = data transfer register, passes data between the debugger and the core
ITR = instruction transfer register, allows debugger to put instructions into the core when in debug state
IVT = image vector table

What is the ARM ABI?

The Application Binary Interface (ABI) is a suite of specifications split into several components, such as the C++ ABI, the ARM ELF standard, and the Procedure Call Standard (PCS). There are 32-bit and 64-bit versions. ARM publishes these specs in PDF format. The PCS, for example, defines how separately compiled and separately assembled routines can work together: type sizes and natural alignments, use of memory and the stack, and so on. Someone summed up an ABI by saying "an ABI is a standard that defines a mapping between low-level concepts in high-level languages and the abilities of a specific hardware/OS platform's machine code."

Notes

armcc is ARM's commercial compiler, which is different from the GCC-based open source compilers (arm-none-eabi-gcc).

For the iMX6 Freescale SDK, the debug UART port selection is made in board/common/hardware_modules.c

Memory map in sdk/include/mx6sdl/soc_memory_map.h

Good write up on interrupts: http://www.cadence.com/Community/blogs/sd/archive/2011/07/22/arm-generic-interrupt-controller-architecture-howto.aspx

ARM uses PL0, PL1, and PL2 to refer to privilege levels. PL0 is user level, PL1 covers the privileged modes (system, IRQ/FIQ, supervisor, abort, undefined), and PL2 is Hyp mode, which exists only in Non-secure state. Monitor mode is on PL1 of Secure state. A secure OS runs on PL1 of Secure state and secure apps run on PL0 of Secure state. A normal OS runs on PL1 of Non-secure state and normal apps run on PL0 of Non-secure state.
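
A minimal sketch of the mode-to-privilege-level mapping, assuming the ARMv7-A CPSR mode encodings (bits [4:0]). The enum and function names are made up for illustration and do not come from any SDK.

```c
#include <stdint.h>

/* ARMv7-A CPSR mode field encodings (CPSR bits [4:0]). */
enum arm_mode {
    ARM_MODE_USR = 0x10,
    ARM_MODE_FIQ = 0x11,
    ARM_MODE_IRQ = 0x12,
    ARM_MODE_SVC = 0x13,
    ARM_MODE_MON = 0x16,
    ARM_MODE_ABT = 0x17,
    ARM_MODE_HYP = 0x1A,
    ARM_MODE_UND = 0x1B,
    ARM_MODE_SYS = 0x1F
};

/* Return 0, 1 or 2 for PL0, PL1 or PL2, or -1 if the mode bits
 * do not encode a valid mode. Hypothetical helper name. */
int arm_privilege_level(uint32_t cpsr)
{
    switch (cpsr & 0x1Fu) {
    case ARM_MODE_USR:
        return 0;
    case ARM_MODE_HYP:
        return 2;
    case ARM_MODE_FIQ:
    case ARM_MODE_IRQ:
    case ARM_MODE_SVC:
    case ARM_MODE_MON:
    case ARM_MODE_ABT:
    case ARM_MODE_UND:
    case ARM_MODE_SYS:
        return 1;
    default:
        return -1;
    }
}
```

Secure versus Non-secure state is not visible in the mode bits (apart from Monitor mode, which is always Secure), so this only answers the PL question.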

U-Boot has a series of start.S assembly files that do preliminary setup of various key registers, such as arch/arm/cpu/armv7/start.S. The Freescale SDK has something similar. Among other things, these files set up the exception vector table. The ROM vector table at 0x0 is set to point to the RAM vector table put in by the startup code.

SCU Power Status register is unused by Freescale in the iMX6 design.

The exception vector table must be 32-byte aligned because the low 5 bits of the VBAR are reserved; each exception's offset (0x00 to 0x1C) is added to the base address. The GNU assembler directive .align 5 handles that on ARM, where the argument is a power of two.
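
The same alignment rule can be checked from C. This is a sketch only: the table and helper names are hypothetical, and a real table would hold branch instructions placed by the startup assembly rather than a zeroed array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VBAR_RESERVED_MASK 0x1Fu  /* VBAR bits [4:0] are reserved */

/* Eight 4-byte slots: reset, undefined, SVC, prefetch abort,
 * data abort, reserved, IRQ, FIQ. C11 _Alignas forces the
 * 32-byte alignment that .align 5 gives in GNU as. */
_Alignas(32) uint32_t example_vector_table[8];

/* True if base could be written to the VBAR as-is. */
bool vbar_base_is_valid(uintptr_t base)
{
    return (base & VBAR_RESERVED_MASK) == 0;
}

/* Address the core fetches from for an exception at the given
 * offset (0x00, 0x04, ... 0x1C) from the table base. */
uintptr_t vector_address(uintptr_t base, unsigned offset)
{
    assert(offset <= 0x1Cu && (offset % 4u) == 0);
    return base + offset;
}
```

For example, vbar_base_is_valid((uintptr_t)example_vector_table) holds because of the _Alignas, while an address ending in 0x04 is rejected.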

Image Vector Table (IVT) is the data structure that the ROM reads from the boot device supplying the program image containing the required data components to perform a successful boot. The IVT includes the program image entry point, a pointer to Device Configuration Data (DCD) and other pointers used by the ROM during the boot process. The ROM locates the IVT at a fixed address that is determined by the boot device connected to the Chip.

U-Boot note: The difference between u-boot.bin and u-boot.imx is the IVT header. After u-boot.bin is built, an IVT header is added in front of it. The boot ROM uses this IVT header to identify U-Boot's location, function, etc.

Device Configuration Data (DCD) is configuration information contained in a Program Image as a big endian byte array of commands, external to the ROM, that the ROM interprets to configure various peripherals on the Chip.
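
A sketch of what the IVT might look like as a C struct. The field order and the 0xD1 tag are my recollection of the i.MX6 layout and should be checked against the reference manual; the struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 4-byte IVT header (assumed layout). */
typedef struct {
    uint8_t tag;        /* 0xD1 marks an IVT */
    uint8_t length[2];  /* IVT length in bytes, big endian */
    uint8_t version;    /* 0x40 in U-Boot-built images (assumption) */
} ivt_header_t;

/* 32-byte IVT: header plus seven 32-bit words (assumed layout). */
typedef struct {
    ivt_header_t header;
    uint32_t entry;      /* program image entry point */
    uint32_t reserved1;
    uint32_t dcd;        /* pointer to DCD, 0 if none */
    uint32_t boot_data;  /* pointer to boot data */
    uint32_t self;       /* address of this IVT */
    uint32_t csf;        /* pointer to signature data, 0 if unsigned */
    uint32_t reserved2;
} ivt_t;

/* Decode the big-endian length field. */
uint16_t ivt_length(const ivt_t *ivt)
{
    assert(ivt != NULL);
    return (uint16_t)((ivt->header.length[0] << 8) | ivt->header.length[1]);
}

/* Basic sanity check: right tag and a length matching the struct. */
bool ivt_header_is_valid(const ivt_t *ivt)
{
    assert(ivt != NULL);
    return ivt->header.tag == 0xD1 && ivt_length(ivt) == sizeof(ivt_t);
}
```

Something like this is handy when dumping the front of a u-boot.imx to see whether the header made it in.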

Families

Most chips are SoCs built around an ARM core. As of 2019 the current cores are the Cortex line, with designations like Cortex-M. The families are M (microcontroller), R (real-time, often run in lockstep), and A (application), in increasing size.

The ARM7, ARM9, and ARM11 are older core families.

The Cortex-M3 has an optional MPU rather than an MMU. This is true of all the M series: no Cortex-M core has an MMU, and the smallest ones, such as the Cortex-M0, have no MPU either.

USB on iMX6

The PORTSC1 register CCS field will be set to 1, indicating device connect, for the on-board USB hub. This does not mean a device is actually connected.
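
A tiny sketch of reading that field, assuming CCS sits at bit 0 as in the EHCI PORTSC layout. The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PORTSC1_CCS (1u << 0)  /* Current Connect Status, assumed bit 0 */

/* True if the port reports something attached. On a board with an
 * on-board hub this is always true, so it says nothing about whether
 * a device is plugged into the hub's downstream ports. */
bool portsc1_connected(uint32_t portsc1)
{
    return (portsc1 & PORTSC1_CCS) != 0;
}
```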

Warm Start

For ARM, warm start may mean that pre-reset signals are asserted to peripherals to prep for reset by finishing transactions, and SW state may be saved before resetting. A warm start means that a lot of register settings are good on boot and don't need to be re-written.

ARM Core Registers

The core regs are set up in the SDK using an assembly file cortexa9.S which has a series of functions for setting and clearing bits in the registers. Those are called directly from the platform init code in many cases, or through other startup modules.

There's also a set of bytes near the front of the .bin called Device Configuration Data which configures the MMDC for the external memory. In the SDK, you can edit the DCD with the board/mx6sdl/nitrogen6s/dcd.c file.

Serial Port

The iMX family calls serial ports ttymxc instead of ttyS.

Toolchain

The nm tool from the toolchain (for example arm-none-eabi-nm) works on an ELF file and displays the addresses of the symbols. You can also get a lot of info from the objdump tool, such as whether the object files are stripped or not.

When doing a make menuconfig for an embedded target, make sure you specify the target arch with make ARCH=arm menuconfig or else you will be setting up a config for the local host.

iMX6 SPI

One issue I've found is that our SPI slave requires the chip select to be held low for the entire transfer. The SPI dev driver, however, is releasing the chip select after each word. I've set cs_change=0 in the spi_ioc_transfer struct, but it seems to have no effect. Is this a bug in the driver or is there another way to force the CS to stay low for the entire length of the buffer being transferred?

This is a common question. cs_change is not used in our SPI driver. You will need to modify the driver to control the SS_CTLx bits in the ECSPIx_CONFIGREG register. The other option is to use a GPIO to drive the CS.

The following posts in the community talk about similar issues:
https://community.nxp.com/message/626019?commentID=626019#comment-626019
https://community.nxp.com/thread/309866
https://community.nxp.com/message/824459

Understanding DTB

Buildroot puts the .dtb in the output/images/ folder. The name is given in the menuconfig option of 'device tree source' under Kernel. This points to the .dts file in arch/arm/boot/dts in your Linux source tree. This is the primary device tree source, but it also includes various .dtsi files sometimes. These are just more source for the DTB. Typically the .dtsi has SoC-level information, while the .dts has board-level information. The device drivers are consumers of this information, reading needed attributes of the device.

You can see what your DTB configuration is on your target by looking in /sys/firmware/devicetree/base/

Great primer:
https://events.linuxfoundation.org/sites/events/files/slides/petazzoni-device-tree-dummies.pdf
The compatible string is used to bind a device with the driver.
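
To make the binding idea concrete, here is a simplified sketch of matching a node's compatible property against a driver's match table. It is not the kernel's implementation, the function name is hypothetical, and the strings in the usage note are only examples. It assumes the property is stored as a run of NUL-terminated strings, most specific first, which is how it appears under /sys/firmware/devicetree/base/.

```c
#include <stddef.h>
#include <string.h>

/* Walk a "compatible" property (len bytes of NUL-terminated strings)
 * and return the first entry that also appears in the driver's
 * NULL-terminated table, or NULL if nothing matches. */
const char *match_compatible(const char *prop, size_t len,
                             const char *const *table)
{
    size_t pos = 0;

    while (pos < len) {
        const char *entry = prop + pos;
        const char *end = memchr(entry, '\0', len - pos);

        if (end == NULL)
            break;  /* malformed: last string is not terminated */

        for (size_t i = 0; table[i] != NULL; i++) {
            if (strcmp(entry, table[i]) == 0)
                return entry;
        }
        pos += (size_t)(end - entry) + 1;  /* skip string and its NUL */
    }
    return NULL;
}
```

With a property of "fsl,imx6q-uart" followed by "fsl,imx21-uart" and a driver table listing only "fsl,imx21-uart", the second entry is returned, so a generic driver can still bind to a newer SoC's node.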

As of 2016, device trees are required for new ARM board support in the Linux kernel.

MMU

Enabling and disabling the MMU:
The MMU can be enabled and disabled by writing the M bit (bit[0]) of register 1 of the System Control coprocessor. On reset, this bit is cleared to 0, disabling the MMU. When the MMU is disabled, memory accesses are treated as follows:

All data accesses are treated as uncacheable and strongly ordered. Unexpected data cache hit behavior is IMPLEMENTATION DEFINED.
If a Harvard cache arrangement is used then all instruction accesses are cacheable, non-sharable, normal memory if the I bit (bit[12]) of CP15 register 1 is set (1), and non-cacheable, non-sharable normal memory if the I bit is clear (0). The other cache related memory attributes (for example, Write-Through cacheable, Write-Back cacheable) are IMPLEMENTATION DEFINED. If a unified cache is used, all instruction accesses are treated as non-shared, normal, non-cacheable.
All explicit accesses are strongly ordered. The value of the W bit (bit[3], write buffer enable) of CP15 register 1 is ignored.
No memory access permission checks are performed, and no aborts are generated by the MMU.
The physical address for every access is equal to its modified virtual address (this is known as a flat address mapping).
The FCSE PID (see Register 13: Process ID on page B4-52) Should Be Zero (SBZ) when the MMU is disabled. This is the reset value for the FCSE PID. If the MMU is to be disabled, the FCSE PID should be cleared. The behavior is UNPREDICTABLE if the FCSE is not cleared when the MMU is disabled.
Cache CP15 operations act on the target cache whether the MMU is enabled or not, and regardless of the values of the memory attributes. However, if the MMU is disabled, they use the architected flat mapping. CP15 TLB invalidate operations act on the target TLB whether the MMU is enabled or not.
Instruction and data prefetch operations work as normal.
Accesses to the TCMs work as normal if the TCM is enabled.

Before the MMU is enabled all relevant CP15 registers must be programmed. This includes setting up suitable translation tables in memory. Prior to enabling the MMU, the instruction cache should be disabled and invalidated. The instruction cache can then be re-enabled at the same time as the MMU is enabled.
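
The bit positions quoted above (M = bit 0, W = bit 3, I = bit 12 of CP15 register 1) can be captured in a few helpers. This is a sketch that only manipulates a register value; the names are made up, and on real hardware the value would be read with MRC p15, 0, Rd, c1, c0, 0 and written back with the matching MCR.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCTLR_M (1u << 0)   /* MMU enable */
#define SCTLR_W (1u << 3)   /* write buffer enable, ignored with MMU off */
#define SCTLR_I (1u << 12)  /* instruction cache enable */

bool sctlr_mmu_enabled(uint32_t sctlr)
{
    return (sctlr & SCTLR_M) != 0;
}

/* Step 1: turn the I-cache off before touching the MMU.
 * The cache invalidate itself is a separate CP15 operation. */
uint32_t sctlr_disable_icache(uint32_t sctlr)
{
    return sctlr & ~SCTLR_I;
}

/* Step 2: once translation tables are in place, enable the MMU and
 * re-enable the I-cache in the same register write. */
uint32_t sctlr_enable_mmu_and_icache(uint32_t sctlr)
{
    return sctlr | SCTLR_M | SCTLR_I;
}

uint32_t sctlr_disable_mmu(uint32_t sctlr)
{
    return sctlr & ~SCTLR_M;
}
```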

Note: Enabling or disabling the MMU effectively changes the virtual-to-physical address mapping (unless the translation tables are set up to implement a flat address mapping). Any virtually tagged caches, for example, that are enabled at the time need to be flushed (see Memory coherency and access issues on page B2-20). In addition, if the physical address of the code that enables or disables the MMU differs from its modified virtual address, instruction prefetching can cause complications (see PrefetchFlush CP15 register 7 on page B2-19). It is therefore strongly recommended that code which enables or disables the MMU has identical virtual and physical addresses.

Recall that you don't have to use the MMU. Typically there are some MMU init and enable functions. With the MMU off, all data accesses are uncacheable, so every access goes out to physical memory and it will probably slow you down. But sometimes the MMU init code is a few marbles short and might need to be debugged at a later time.

For safety-critical, worst-case-optimized systems, it seems that a write-back cache policy is preferred to a write-through policy for efficiency. A big reason that safety-critical systems don't normally use multicore processors is the increase in worst-case execution times.

Write-through: write is done synchronously both to the cache and to the backing store.
Write-back (also called write-behind): initially, writing is done only to the cache. The write to the backing store is postponed until the modified content is about to be replaced by another cache block.

The benefit of write-through to main memory is that it simplifies the design of the computer system. With write-through, the main memory always has an up-to-date copy of the line. So when a read is done, main memory can always reply with the requested data.

If write-back is used, sometimes the up-to-date data is in a processor cache, and sometimes it is in main memory. If the data is in a processor cache, then that processor must stop main memory from replying to the read request, because the main memory might have a stale copy of the data. This is more complicated than write-through.

Write allocate (also called fetch on write): data at the missed-write location is loaded to cache, followed by a write-hit operation. In this approach, write misses are similar to read misses.
No-write allocate (also called write-no-allocate or write around): data at the missed-write location is not loaded to cache, and is written directly to the backing store. In this approach, data is loaded into the cache on read misses only.
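
The four policies above can be compared with a toy model. This is a sketch under heavy simplifying assumptions: a direct-mapped cache with one word per line, invented names throughout, and no relation to any real cache controller. With one-word lines the fetch on a write-allocate miss is redundant, but it is kept so the memory traffic counts follow the definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MEM_WORDS   64
#define CACHE_LINES 4

typedef struct {
    bool write_back;       /* false = write-through */
    bool write_allocate;   /* false = no-write allocate */
    bool valid[CACHE_LINES];
    bool dirty[CACHE_LINES];
    unsigned tag[CACHE_LINES];  /* word address held by each line */
    uint32_t data[CACHE_LINES];
    uint32_t mem[MEM_WORDS];    /* backing store */
    unsigned mem_reads;
    unsigned mem_writes;
} toy_cache;

void toy_init(toy_cache *c, bool write_back, bool write_allocate)
{
    memset(c, 0, sizeof *c);
    c->write_back = write_back;
    c->write_allocate = write_allocate;
}

static bool toy_hit(const toy_cache *c, unsigned addr)
{
    unsigned i = addr % CACHE_LINES;
    return c->valid[i] && c->tag[i] == addr;
}

/* Evict the line addr maps to (writing it back if dirty), then load addr. */
static void toy_fill(toy_cache *c, unsigned addr)
{
    unsigned i = addr % CACHE_LINES;
    if (c->valid[i] && c->dirty[i]) {
        c->mem[c->tag[i]] = c->data[i];
        c->mem_writes++;
    }
    c->data[i] = c->mem[addr];
    c->mem_reads++;
    c->tag[i] = addr;
    c->valid[i] = true;
    c->dirty[i] = false;
}

uint32_t toy_read(toy_cache *c, unsigned addr)
{
    assert(addr < MEM_WORDS);
    if (!toy_hit(c, addr))
        toy_fill(c, addr);
    return c->data[addr % CACHE_LINES];
}

void toy_write(toy_cache *c, unsigned addr, uint32_t value)
{
    assert(addr < MEM_WORDS);
    if (!toy_hit(c, addr)) {
        if (!c->write_allocate) {   /* write around the cache */
            c->mem[addr] = value;
            c->mem_writes++;
            return;
        }
        toy_fill(c, addr);          /* fetch on write, then handle as a hit */
    }
    unsigned i = addr % CACHE_LINES;
    c->data[i] = value;
    if (c->write_back) {
        c->dirty[i] = true;         /* memory is updated only on eviction */
    } else {
        c->mem[addr] = value;       /* write-through: update memory now */
        c->mem_writes++;
    }
}
```

Writing the same address three times costs zero memory writes under write-back with write allocate (one write happens later, on eviction), and three under write-through with no-write allocate.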

Cache Attributes

In the Arm architecture, the inner attributes are used to control the behavior of the L1 caches and write buffers. The outer attributes are exported to the L2 or an external memory system. These attributes are set in the MMU translation table (a .S file) that is defined for both bootloaders and applications.

ARM uses the terms clean and invalidate instead of flush.

Invalidation of a cache or cache line means to clear it of data. This is done by clearing the valid bit of one or more cache lines. The cache must always be invalidated after reset as its contents will be undefined.

Cleaning a cache or cache line means writing the contents of dirty cache lines out to main memory and clearing the dirty bit(s) in the cache line. This makes the contents of the cache line and main memory coherent with each other.

A dirty cache line is one that is most up to date but still needs to be written back to main memory. A cache line which is out-of-date and needs to be updated is typically referred to as stale.
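
The difference between clean and invalidate comes down to what happens to the valid and dirty bits. A minimal model of a single line, with hypothetical names and a single word standing in for the line's data:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool valid;
    bool dirty;
    uint32_t data;
} cache_line;

/* Clean: write a dirty line out to its backing location and clear the
 * dirty bit. The line stays valid and now matches memory. */
void line_clean(cache_line *line, uint32_t *backing)
{
    assert(line != NULL && backing != NULL);
    if (line->valid && line->dirty) {
        *backing = line->data;
        line->dirty = false;
    }
}

/* Invalidate: drop the line without writing it back. Any dirty data
 * that was not cleaned first is lost. */
void line_invalidate(cache_line *line)
{
    assert(line != NULL);
    line->valid = false;
    line->dirty = false;
}

/* Clean and invalidate: the safe way to hand a buffer to another
 * bus master and then force a re-read from memory. */
void line_clean_invalidate(cache_line *line, uint32_t *backing)
{
    line_clean(line, backing);
    line_invalidate(line);
}
```

Invalidating a dirty line without cleaning it first is how stale data ends up in memory, which is worth remembering when debugging DMA buffers.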

Instructions

DSB = data synchronization barrier, stalls execution until all outstanding explicit memory accesses have completed
ISB = instruction synchronization barrier, flushes the pipeline and pre-fetch buffers so that all following instructions are fetched from cache or memory; required when synchronizing between data and instruction caches

Links

http://elinux.org/ARMCompilers
http://www.embedded.com/design/mcus-processors-and-socs/4026080/Building-Bare-Metal-ARM-Systems-with-GNU-Part-3
memtool: https://gist.github.com/mike0/2910170