Main / Muse

Thoughts from Jack Ganssle and friends.

"After over 40 years in this field I've learned that 'shortcuts make for long delays' (an aphorism attributed to J.R.R. Tolkien). The data is stark: doing software right means fewer bugs and earlier deliveries. Adopt best practices and your code will be better and cheaper. This is the entire thesis of the quality movement, which revolutionized manufacturing but has somehow largely missed software engineering. Studies have even shown that safety-critical code need be no more expensive than the usual stuff if the right processes are followed." -JG

"When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind." -Lord Kelvin

Coding

I was going over some code recently and found a password stored in malloc-ed memory. That was quickly freed, so the password was deleted. But, of course, it wasn't. The password was still in the heap, potentially visible if those locations were ever returned to fulfill an allocation request. The code's author should have scrubbed those locations first, and then freed the memory.
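A minimal sketch of the fix, using volatile writes so the compiler can't optimize the scrub away. The function names here are mine, purely for illustration; explicit_bzero (BSD/glibc) and SecureZeroMemory (Windows) are real platform helpers that exist for the same reason.

```c
#include <stdlib.h>

/* Sketch: zero secret data before freeing it. A plain memset() just
   before free() can legally be removed as a dead store by the
   optimizer; the volatile qualifier here blocks that. */
static void scrub(void *p, size_t len)
{
    volatile unsigned char *v = (volatile unsigned char *)p;
    while (len--)
        *v++ = 0;
}

static void secure_free(void *p, size_t len)
{
    if (p != NULL) {
        scrub(p, len);   /* wipe first... */
        free(p);         /* ...then release */
    }
}
```

Call secure_free() in place of free() wherever the allocation held a password or key.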

Michael Pollan's advice about eating is concise: "Eat food. Not too much. Mostly plants." In a similar vein, I'd suggest about C code: "Use parentheses. More than you'd think. Clarify expressions." The company's firmware standard should mandate that expressions mixing operators be fully parenthesized. That is:
if(x && 3 == y)....
...should be illegal. It actually parses as:
if(x && (3 == y))...
...because == binds tighter than &&. Write the parentheses explicitly; if the programmer meant (x && 3) == y instead, the mandatory parentheses would have exposed the bug.
And do use tools that will capture these sorts of mistakes. Lint, for instance, will throw a warning when it sees a construct like:
if(rap & dissonance == 0)return 0;
Lint warns that the result of the expression is probably not what the developer wants, and that the expression always resolves to false.
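A two-line check of the precedence rules at work here (the function names are mine, for illustration):

```c
/* How C actually parses the two expressions from the text:
   == binds tighter than both && and &, which is rarely what a
   quick reading suggests. */
static int mixed_and_eq(int x, int y)
{
    return x && 3 == y;            /* parses as: x && (3 == y) */
}

static unsigned masked_eq(unsigned rap, unsigned dissonance)
{
    return rap & dissonance == 0;  /* parses as: rap & (dissonance == 0) */
}
```

With x = 1, y = 1 the first returns 0, while (x && 3) == y is 1; with rap = 4, dissonance = 0 the second returns 0, while (rap & dissonance) == 0 is 1. Same characters, different answers.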

At 100K LOC it's almost impossible, or at least extremely expensive, to get better than 97% test coverage. The one shining exception seems to be projects done to the DO-178C avionics standard which, for the most critical systems, requires complete testing traceability. The cost? By some reports, about 65% more than code not done to that standard. The upside? Code done to that standard has never caused a plane to crash.

These days I like to use variables rather than #defines to get a bit of extra type checking. I also run my code on both Windows and the target - this gives more chances to find obscure bugs (e.g. Visual Studio finds stack corruption "for free"; it can be difficult on target). So I wrote the following:

const int size = 5;

char buffer[size];

Visual Studio worked as I expected - it generated a fixed-size array on the stack. The Texas Instruments ARM compiler instead generated a variable-length array, as permitted by C99, and then decided to place it on the heap (where a VLA lives is an implementation detail). Strictly speaking the TI compiler has the better claim: in C, unlike C++, a const int is not a constant expression, so "buffer" is a VLA under C99 rules. My application doesn't have a heap, so the code crashed. VLAs are prohibited by MISRA (rule 18.8), but my MISRA checker (Gimpel's PC-Lint) agreed with Visual Studio that the array is fixed-size and didn't complain. Moral: stick to #defines for array sizes.
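Two portable alternatives, sketched below with illustrative names. Both are true integer constant expressions, so no conforming compiler can turn either array into a VLA:

```c
/* Guaranteed compile-time array sizes - unlike a const int, which in
   C (though not in C++) is not a constant expression. */
#define RX_BUF_SIZE 5              /* preprocessor constant */
enum { TX_BUF_SIZE = 8 };          /* enum constant: scoped and debugger-visible */

static char rx_buffer[RX_BUF_SIZE];
static char tx_buffer[TX_BUF_SIZE];
```

The enum form keeps a little of the type checking the const int was after, without the VLA risk.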

Tips

"Regarding bare-metal : a simple task scheduler controlling state machines, all tasks required to run to completion, all errors considered fatal, is still my "go to" approach for hard real time control." - contributor
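The pattern that tip describes can be sketched in a few lines. Everything here (task_fn, scheduler_step, the blink task) is illustrative, not a real API:

```c
/* A table of run-to-completion tasks, each a small state machine,
   driven by a simple loop. No preemption, no locks. */
typedef void (*task_fn)(void);

static int blink_state;                       /* toy state machine */
static void blink_task(void) { blink_state = !blink_state; }

static task_fn tasks[] = { blink_task };

static void scheduler_step(void)
{
    /* Each task runs to completion before the next starts. */
    for (unsigned i = 0; i < sizeof tasks / sizeof tasks[0]; i++)
        tasks[i]();
}
```

In a real system the main loop calls scheduler_step() forever, and each task must return quickly enough to meet the hard deadlines.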

"Assume you are writing comments for an intern because that may very well represent the skill level of the next person to work on your code, or your skill level for this piece of code a year from now." - contributor

"I am 72, retired, and wrote code in the 70's and 80's because I had to. There was no one else to do it. And I have written 'self-documenting' code - and lived to understand how incredibly stupid it is. After years of software/firmware and hardware design and maintenance, I know that self-documenting code documents nothing. Developers need to be told that commenting is company policy, as I believe it is everywhere that counts, or they will be replaced. Their code should be randomly examined, and if they don't comply they should be replaced before they can do more harm." - contributor

New Year's software resolutions

Found on a napkin at some watering hole in 1996:

I will learn to touch type
I will never, ever write my own OS ever again
I will stop reinventing the wheel and use libraries
I will swear off complex pointers
I will stop betting that "I can do that in one line of C"
I will learn to use in-line assembly sparingly
I will learn to use a real debugger instead of printf's
I will stop saying "It's gotta be the hardware"
I will restrict version numbers to 5 decimal digits
I will go to "Beta" with something that actually works
I will stop saying "We can always go back and optimize"
I will think first before committing a "quick fix"
I will stop patching the unfixable
I will remember that small is beautiful and restrict my functions to 1 or 2 screens
I will stop turning everything into an object
I will really learn C++
I will stop trying to beat the source control system
I will design first, then code

Development Process

Faster... Better... Cheaper... Pick any two.

Yes, we document our code, but an even better approach is to "code the document," as in the SWDD.

Work Estimating

According to Capers Jones, a very rough guide to estimating the number of people needed on a project, and the project's duration, is:

    Number of developers = (function points)/150
    Calendar months = (function points)^0.4

One function point is somewhere around 130 lines of C code.

Firmware is hideously expensive. Most commercial firmware costs around $20 to $40 per line, measured from the start of a project till it's shipped. When developers tell you they can "code that puppy over the weekend" be very afraid. When they estimate $5/line, they're on drugs or not thinking clearly. Defense work with its attendant reams of documentation might run upwards of $100 per line or more; the space shuttle code was closer to $1000 per line, but is without a doubt the best code ever written. (2023 dollars)

Auditing Firmware Teams

From time to time companies have me come in to examine their firmware engineering practices. That involves poking around, interviewing team members, reviewing documents, and conferences with managers. In case you want to audit your own team, here are my most common findings, in no particular order:

  • Developers are unaware of the company's software development lifecycle, if there is one.
  • For firmware developed to an IEC/ANSI standard, developers aren't familiar with the standard.
  • Non-conformance with the company's own firmware standards.
  • Or, a complete lack of such standards.
  • Manually checking the code against the standard rather than using automated tools.
  • Inadequate testing.
  • Test is the only procedure used to identify errors.
  • No, or inadequate, code inspections. The good news: Teams are generally getting better at inspections.
  • Few metrics generated. Often those that are go to /dev/null.
  • Optimistic programming: The default assumption is that everything will be peachy, despite bitter experience.
  • Weak managers and/or team leads who don't enforce the rules.
  • Unrealistic schedules.
  • Lousy tools. Most common: clumsy build processes.
  • Poor elicitation of requirements. I can't stress this enough. While getting to 100% is tough to impossible, too many teams practically abdicate their responsibility to do a good job at this. The following chart shows what typically happens. LOC is the size of the program in lines of code, the second column lists typical number of pages of the requirements document, and the last shows the document's completeness:

When it comes to software, I view the data as an impressionist painting, where the outlines might be fuzzy, but one can still make out the general shape of things.

LOC         Requirements (pages)   Requirements Completeness
1,000       14                     97%
10,000      115                    95%
100,000     750                    80%
1,000,000   6,000                  60%

Adapted from The Economics of Software Quality, Capers Jones

Here's another take on this. It's adapted from Joel Spolsky's software team quality test:

  • Do you use source control?
  • Can you make a build in one step?
  • Do you make daily builds?
  • Do you have a bug database?
  • Do you fix bugs before writing new code?
  • Do you have an up-to-date schedule?
  • Do you have a spec?
  • Do programmers have quiet/private working conditions?
  • Do you use the best tools money can buy?
  • Do you have testers?
  • Do new candidates demonstrate code/documentation understanding during their interview?
  • Do you do hallway usability testing?

Cost of Good Code

Finally, did you know great code, the really good stuff, that which has the highest reliability, costs the same as cruddy software? This goes against common sense. Of course, all things being equal, highly safety-critical code is much more expensive than consumer-quality junk.

But what if we don't hold all things equal? O. Benediktsson (Safety Critical Software and Development Productivity, conference proceedings, Second World Conference on Software Quality, Sept 2000) showed that using higher and higher levels of disciplined software process lets one build higher-rel software at a constant cost. If your projects march from low reliability along an upward line to truly safety-critical code, and if your outfit follows increasing levels of software discipline, the cost remains constant.

Capers Jones showed that the best people excel on small (one man-month) projects, typically being 6 times more productive than the worst members of the team. That advantage diminishes as the system grows. On an 8 man-month effort the ratio shrinks to under 3 to 1. At 64 man-months it's about 1.5 to 1, and much beyond that the best do as badly as the worst. Or the worst as well as the best. Whatever.

Rants

One of my top ten reasons software projects fail is when teams don't resist the urge to start coding. Coding is just one part of the field of software engineering, one that's similar to paving a bridge deck. Without the pavement none of us will drive, but bridge construction requires careful engineering, zoning, funding and a host of other activities far more complex than paving, and far more important to the final result. Bad pavement can be replaced or patched, but a badly designed bridge will collapse. Bad software engineering ensures a project will fail no matter how well the coders - the programmers - do their job.

Engineering is the art of solving problems, which is what building firmware is all about.

"Since we started letting the developers use the [chaos, Capability Maturity Model, Personal Software Process, Test-Driven Development] process they have been much happier and more productive." Really? I can't tell you how many managers have told me some version of this story. It's almost always complete nonsense.

Reliability

"On the same die, does not fly": Among other things, this meant we couldn't rely on or even use the hardware watchdog timer built into the die of the microprocessor, primarily because it was WAY too easy to defeat. The watchdog had to be external and a totally independent entity (also rad hardened, etc.). Indeed, I used to demonstrate just how easy it was to defeat an internal WDT by exposing an 8751 (EPROM) micro to the overhead fluorescent lighting. With the WDT enabled, the overhead lights would cause the micro to freeze up within seconds of removing the opaque cover over the erase window.

"The embedded system needs to serve a specific purpose and be stable. It does not need to interoperate with every digital device in the world!" - Michael Covington

"Testing has ever been a thorny problem in this industry. The agile community uses test to get their code right; my philosophy is to get the code right and then use test to prove correctness." - JG

Safety-Critical SW

Software doesn't run in isolation. It's merely a component of a system. Watchdogs are not "software safeties." They're system safeties, designed to bring the product back to life in the event of any transient event that corrupts operation, like cosmic rays. Xilinx, Intel, Altera and many others have studied these high-energy particles and have concluded that our systems are subject to random single event upsets (SEUs) due to these intruders from outer space.
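One common way to make the watchdog a genuine system safety, rather than a reflexive kick buried in a timer interrupt, is to service it only when every monitored task has recently checked in. The sketch below uses illustrative names throughout; kick_hardware_watchdog() stands in for the board-specific pulse to an external WDT.

```c
#include <stdint.h>

/* Multi-task watchdog supervision: each task sets its bit, and the
   hardware watchdog is kicked only when all tasks have checked in. */
#define NUM_TASKS 3
#define ALL_TASKS ((1u << NUM_TASKS) - 1u)

static volatile uint32_t checkins;       /* one bit per task */
static int wdt_kicked;                   /* stands in for real hardware */

static void kick_hardware_watchdog(void) { wdt_kicked = 1; }

void task_checkin(unsigned task_id)      /* each task calls this */
{
    checkins |= 1u << task_id;
}

void watchdog_service(void)              /* called from the main loop */
{
    if (checkins == ALL_TASKS) {
        checkins = 0;                    /* start the next window */
        kick_hardware_watchdog();
    }
}
```

If any one task hangs, its bit never gets set, the kick stops, and the (external, independent) watchdog resets the system.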

Chips

"The hardware engineer I work with asks me for basic Requirements (number of gpio pins needed, needed interfaces I2C, UART,..). He then goes off and decides which chip to use based on price, package,.. This sometimes means a different CPU from a different vendor with a different architecture for each project. (with very exotic requirements and high volume projects this is the only way to go)." - contributor

Top 10 Reasons for Project Failures

http://www.ganssle.com/tem/tem381.html

  1. Not enough resources for firmware/software team
  2. Coding before designing
  3. Misuse of C
  4. Misunderstanding the science
  5. Poorly defined process
  6. Vague requirements
  7. Weak management
  8. Inadequate testing
  9. Writing optimistic (rather than defensive) code
  10. Unrealistic schedules

Hardware vs Software

John Sloan
As a software engineer, the projects I've worked on over the past couple of decades have been based on RTOSes like VxWorks and, more recently, Linux. True, a lot of the software developers on those projects didn't have to know that much about the hardware. But in my experience there has to be at least one software engineer who surprises the hardware team by insisting on getting a copy of the hardware schematics and bill of materials so they can look up most of the data sheets. And that person will inevitably end up not only writing most of the device drivers and other low-level software, but will also be crucial in hardware/firmware/software integration, systems testing, and even debugging when the higher-level developers get caught by some low-level weirdness - the kind of bug that sends you down to the assembly code to figure it out. Sure, those sorts of systems people are expensive in the short run. But in my experience they are highly cost effective in the long run. (Yeah, I'm typically that sort of systems person.) I like to say "I'm not a hardware person, but I have a hardware person on speed dial."

John Kougoulos
A friend of mine refers to these people as "self-oscillating": they don't need someone to tell them what to do all the time, they find ways to keep themselves busy by learning new things, and they lean toward the hands-on side. A certain percentage of people think and work this way. Can we have more in our field? Maybe, but it sounds difficult because it needs long-term planning. If companies pay for specialization, available immediately, how do you convince someone to grow his knowledge wide but deep enough, instead of very deep (which probably pays more)? And when is it wide enough to hire him as a "systems person", and, more importantly, how do you interview them? The same friend told me the most important interview question for hiring one of these people, e.g. in the IT field, is to ask whether they have a car and know how a four-stroke engine works. That gives you a clear hint that the candidate was curious enough to learn how something works, despite his enthusiasm for digital things. But it doesn't sound like a legitimate interview question, and it doesn't mean he is ready yet.

Team Function Analysis

Here are Jack's questions he asks when consulting on team performance: http://www.ganssle.com/tem/tem445.html

The idea is to get a sense of team pressures, what it's like to work there, what documentation is produced, what development process is used, how good is product quality, how are issues tracked, etc.


Page last modified on May 09, 2023, at 03:21 PM