Main / Memory

immap is the internal memory map struct supplied by the processor vendor (e.g., in Freescale PowerPC header files).

To software, accesses to memory-mapped registers look the same as accesses to ordinary memory: the same load/store instructions are used, so the program doesn't know the difference.

/dev/mem

  Provides access to the computer's physical memory.

/dev/kmem

  Provides access to the virtual address space of the operating system kernel, excluding memory that is associated with an I/O device.

/dev/allkmem

  Provides access to the virtual address space of the operating system kernel, including memory that is associated with an I/O device.

Freescale's memtool for physical memory accesses: https://gist.github.com/mike0/2910170
You may need root (su) privileges to run it.

It seems like mmap is for user-space applications, while ioremap is for kernel modules: http://stackoverflow.com/questions/10928978/mmap-slower-than-ioremap

mmap and ioremap

mmap is a great tool: as far as drivers are concerned, memory mapping can be implemented to provide user programs with direct access to device memory. Mapping a device means associating a range of (virtual) user-space addresses with device memory. Whenever the program reads or writes in the assigned address range, it is actually accessing the device. Not every device lends itself to the mmap abstraction; serial ports and other stream-oriented devices, for example, cannot be mapped. Another limitation of mmap is that mapping is PAGE_SIZE grained. The kernel can manage virtual addresses only at the level of page tables; therefore, the mapped area must be a multiple of PAGE_SIZE and must live in physical memory starting at an address that is a multiple of PAGE_SIZE.

When a user-space process calls mmap to map device memory into its address space, the system responds by creating a new virtual memory area to represent that mapping. A driver that supports mmap (and, thus, that implements the mmap method) needs to help that process by completing the initialization of that VMA. The virtual memory area (VMA) is the kernel data structure used to manage distinct regions of a process’s address space. A VMA represents a homogeneous region in the virtual memory of a process: a contiguous range of virtual addresses that have the same permission flags and are backed up by the same object.
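A driver's mmap method mostly just finishes initializing the VMA the kernel has already created, typically by mapping the device's physical pages into it. A kernel-side sketch, shown as pseudocode since it only builds against kernel headers; the device name and physical base address are made up:

```c
/* Sketch only: builds against kernel headers, not standalone. */
#include <linux/mm.h>

static unsigned long mydev_phys_base = 0x80000000UL;  /* made-up device address */

static int mydev_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long size = vma->vm_end - vma->vm_start;

    /* Map the device's physical pages into the VMA the kernel created. */
    if (remap_pfn_range(vma, vma->vm_start,
                        mydev_phys_base >> PAGE_SHIFT,
                        size, vma->vm_page_prot))
        return -EAGAIN;
    return 0;
}
```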

For example, every character driver needs to define a function that reads from the device. The file_operations structure holds the address of the module's function that performs that operation. Here is what the definition looks like for kernel 2.4.2:

    struct file_operations {
       struct module *owner;
       loff_t (*llseek) (struct file *, loff_t, int);
       ssize_t (*read) (struct file *, char *, size_t, loff_t *);
       ssize_t (*write) (struct file *, const char *, size_t, loff_t *);
       unsigned int (*poll) (struct file *, struct poll_table_struct *);
       int (*readdir) (struct file *, void *, filldir_t);
       int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
       int (*mmap) (struct file *, struct vm_area_struct *);
       int (*open) (struct inode *, struct file *);
       int (*flush) (struct file *);
       int (*release) (struct inode *, struct file *);
       int (*fsync) (struct file *, struct dentry *, int datasync);
       int (*fasync) (int, struct file *, int);
       int (*lock) (struct file *, int, struct file_lock *);
       ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, loff_t *);
       ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, loff_t *);
       /* ... remaining members elided ... */
    };

What are the alternatives to mmap? read/write? Is the following true? In short, mmap() is great if you're doing a large amount of I/O in terms of total bytes transferred; this is because it reduces the number of copies needed and can significantly reduce the number of kernel entries needed for reading cached data. However, mmap() requires a minimum of two trips into the kernel (three if you clean up the mapping when you're done!) and does some complex internal kernel accounting, so the fixed overhead can be high. read(), on the other hand, involves an extra memory-to-memory copy and can thus be inefficient for large I/O operations, but it is simple, so the fixed overhead is relatively low. In short, use mmap() for large bulk I/O, and read() or pread() for one-off, small I/Os.

Accessing I/O Memory

The main mechanism used to communicate with devices is through memory-mapped registers and device memory; both are called I/O memory. I/O memory is simply a region of RAM-like locations that the device makes available to the processor over the bus. This memory can be used for a number of purposes, such as holding video data or Ethernet packets, as well as implementing device registers that behave just like I/O ports. Once equipped with ioremap (and iounmap), a device driver can access any I/O memory address, whether or not it is directly mapped to virtual address space.

#include <asm/io.h>
void *ioremap(unsigned long phys_addr, unsigned long size);
void *ioremap_nocache(unsigned long phys_addr, unsigned long size);
void iounmap(void * addr);

The addresses returned from ioremap should not be dereferenced directly; instead, accessor functions provided by the kernel should be used. The proper way of getting at I/O memory is via a set of functions provided for that purpose, declared in <asm/io.h> (in newer kernels, include <linux/io.h>).
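Putting ioremap and the accessors together, a kernel-side sketch (pseudocode in the sense that it only builds against kernel headers; the FPGA base address, size, and register offsets are made-up values):

```c
/* Sketch only: builds against kernel headers, not standalone. */
#include <linux/io.h>

#define FPGA_PHYS 0x80000000UL   /* hypothetical device base address */
#define FPGA_SIZE 0x1000UL

static void __iomem *regs;

static int fpga_init(void)
{
    regs = ioremap(FPGA_PHYS, FPGA_SIZE);   /* map the I/O memory region */
    if (!regs)
        return -ENOMEM;

    u32 id = ioread32(regs);                /* read register at offset 0 */
    iowrite32(0x1, regs + 0x4);             /* write register at offset 4 */
    (void)id;
    return 0;
}

static void fpga_exit(void)
{
    iounmap(regs);                          /* always unmap on teardown */
}
```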

To read from I/O memory, use one of the following:

unsigned int ioread8(void *addr);
unsigned int ioread16(void *addr);
unsigned int ioread32(void *addr);

Here, addr should be an address obtained from ioremap (perhaps with an integer offset); the return value is what was read from the given I/O memory.

Never use the code/headers in /usr/include/asm from user-space code. Use the headers in /usr/include/sys instead. What you are doing by using /usr/include/asm is building your code against a specific revision of the kernel headers, which is subject to breakage when the kernel headers change. By including the other location, you use a more stable form of the interface in glibc, which refers to the kernel headers as needed.

What about proc?

cat /proc/meminfo will give memory usage stats/counts
cat /proc/iomem will dump the platform's physical memory map, but it does not look comprehensive; it appears to show how physical address ranges are pre-allocated. When you use the reserved-memory node, the results are reflected in this list.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt

Page Table

A page table is the mechanism for translating virtual addresses into physical addresses. It is a tree-structured set of arrays containing virtual-to-physical mappings plus permission and status flags.

http://www.makelinux.net/ldd3/chp-15-sect-1

Reserved Memory

You can describe and reserve specific portions of physical memory using the reserved-memory node described here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt

An example use: you can set aside some memory shared with an FPGA, declare it here, and then pass the physical address to the driver at init time to use as a base address.
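A sketch of what such a node might look like, following the binding document above; the label, address, and size are made-up values for an FPGA shared-memory region:

```dts
/* Inside the device tree's root node; values below are hypothetical. */
reserved-memory {
    #address-cells = <1>;
    #size-cells = <1>;
    ranges;

    fpga_shm: fpga-shm@60000000 {
        reg = <0x60000000 0x100000>;   /* 1 MB at 0x6000_0000 */
        no-map;                        /* keep the kernel from mapping/using it */
    };
};
```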

MMUs and MPUs

Software uses "logical" addresses and hardware uses "physical" addresses. With no MMU, these have the same value. An MMU provides translation or mapping between the two. An MMU translation can be disabled, for example if paging is turned off in some processors. In that case it's basically a pass-through to physical memory. The program counter register is no exception and also goes through the MMU - the MMU sits on the address lines between the CPU and memory.

An MMU implementation in hardware is much more complex than that of an MPU. That's why many computer systems (such as real-time embedded systems) that don't need virtual memory but do need memory protection have a much simpler MPU instead of a full-blown MMU. An MPU monitors transactions, including instruction fetches and data accesses from the processor, and can trigger a fault exception when an access violation is detected. The main purpose of memory protection is to prevent a process from accessing memory that has not been allocated to it.

A common use of an MMU is to implement an operating system that uses the process model, like Linux. In this case, each task has one or more dedicated areas of memory for its code and data. When a task is made current by the scheduler, the MMU maps these physical addresses onto a logical address area starting from 0. All the physical memory belonging to other tasks (processes) and to the OS itself is hidden from view and thus protected. Each process behaves as if it has free use of the entire CPU. Although this mechanism is safe and elegant, it has the drawback of overhead (the MMU remapping) on every context switch.

MMUs have typically not been used with RTOSes, but with many ARM chips that include MMUs now running RTOSes, this isn't quite so true anymore.

Note that just because you have an address above the available RAM space does not mean the MMU must be enabled to access it; it depends on the system. For example, you may have an FPGA IP block accessed through the AXI bus at address 0x8000.0000 even though there is only 0x4000.0000 of physical RAM. The IP block may translate the address down to the right place.

Translation Table

For some ARM cores, this is a file called translation_table.S that is generated for the BSP. It contains MMU init information in the form of entries for each fixed-size segment of the 4GB addressable memory space. These entries are grouped into regions, like DDR, flash, FPGA blocks, memory-mapped devices, peripheral registers, etc. Each entry might cover a 1MB section of memory, for example, so for 4GB you'd have 4096 entries. An entry consists of control bits that define how the region is used, with attributes like bufferable, cacheable, cache mode, permissions, and section base address.

The table is loaded as a section in the ELF file with a header like .mmu_tbl. You can use objdump to verify it has the right size. During the boot sequence before a program starts, the base address of the table is loaded into a register so the MMU can use it and then the MMU is enabled.

You can use this table to set aside a region of memory as invalid and cause a fault if it is ever accessed, for example.

ARM Assembly Memory Test Loop

MemTest:
	movw r0, #(0x0000)      @ r0 = test start address 0x801F0000 (low half)
	movt r0, #(0x801F)      @   (high half)
	movw r1, #(0x9876)      @ r1 = test pattern 0xBEEF9876 (low half)
	movt r1, #(0xBEEF)      @   (high half)
MemLoop:
	str r1, [r0]            @ write pattern to memory
	ldr r2, [r0]            @ read it back
	sub r2, r1, r2          @ r2 = pattern - readback (0 if they match)
	cmp r2, #0
	bne MemTestFail         @ mismatch: test fails
	add r0, r0, #4          @ advance to next word
	movw r2, #(0x0008)      @ r2 = end address 0x801F0008 (low half)
	movt r2, #(0x801F)      @   (high half)
	sub r2, r0, r2          @ reached the end address yet?
	cmp r2, #0
	beq L2Work              @ yes: test passed, continue boot
	b MemLoop               @ no: test next word

MemTestFail:

L2Work:

Page last modified on July 13, 2023, at 04:06 PM