Author: Hüseyin KUTLUCA, Principal SW Architect
Defensive programming is one of the main topics in software engineering that differentiates school-project-style coding from robust, real-life software development. It is all about writing code that detects, isolates and, if possible, recovers from failures. Defensive programming practices are essential for embedded systems, because high availability, safety and security are critical quality factors for them. Therefore, you should decide on a general approach to detecting, isolating and recovering from failures.
Failures mostly arise from the problems listed below:
- Bugs generated by developers themselves
- Data from external sources
- Hardware malfunctions
- Errors in protocol or standards documents, etc.
Assertions, exceptions and error-handling code are the main approaches for detecting failures. Here are some best practices:
- Use assertions to document pre-conditions and post-conditions
- Use assertions for conditions that should never occur
- Use error handling code for conditions you expect to occur
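The split between assertions and error handling can be sketched in C as follows. This is a minimal illustration, not a prescribed design: the 4-byte frame layout, the additive checksum and the names `sensor_decode`, `sensor_status_t` and `sensor_status_str` are all hypothetical. Caller bugs (NULL pointers) and impossible enum values are caught with `assert`, while short frames and bad checksums, which are expected on real hardware, are reported through return codes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for conditions we EXPECT to happen at runtime. */
typedef enum { SENSOR_OK = 0, SENSOR_SHORT_FRAME, SENSOR_BAD_CHECKSUM } sensor_status_t;

/* Hypothetical 4-byte frame: [id, value_hi, value_lo, checksum]. */
#define FRAME_LEN 4u

/* Assumed checksum: 8-bit sum of the first three bytes. */
static uint8_t frame_checksum(const uint8_t *frame)
{
    return (uint8_t)(frame[0] + frame[1] + frame[2]);
}

sensor_status_t sensor_decode(const uint8_t *frame, size_t len, uint16_t *out_value)
{
    /* Preconditions: violating them is a programming bug, so assert. */
    assert(frame != NULL);
    assert(out_value != NULL);

    /* Expected runtime conditions: handled with error codes, not asserts. */
    if (len < FRAME_LEN) {
        return SENSOR_SHORT_FRAME;   /* incomplete frame received */
    }
    if (frame_checksum(frame) != frame[3]) {
        return SENSOR_BAD_CHECKSUM;  /* line noise is expected on real buses */
    }

    *out_value = (uint16_t)((frame[1] << 8) | frame[2]);
    return SENSOR_OK;
}

const char *sensor_status_str(sensor_status_t s)
{
    switch (s) {
    case SENSOR_OK:           return "ok";
    case SENSOR_SHORT_FRAME:  return "short frame";
    case SENSOR_BAD_CHECKSUM: return "bad checksum";
    }
    /* A condition that should never occur: flag it loudly in debug builds. */
    assert(0 && "unknown sensor_status_t value");
    return "unknown";
}
```

Note that `assert` is compiled out when `NDEBUG` is defined, which is one more reason not to rely on it for conditions you expect to occur in the field.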
Establish “barricades” (also called “safe zones” or “trust boundaries”): everything outside the boundary is considered dangerous, and everything inside it is considered safe. In the barricade code, validate all input data: check every input parameter for the correct type, length and range of values.
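A barricade might look like the sketch below, assuming a hypothetical 3-byte "set pressure" command `[command id, value_hi, value_lo]` arriving from an external link. The command id `0x10`, the 0 to 50 limits and all function names are illustrative. `barricade_handle_command` is the only entry point for untrusted bytes and checks length, type and range; the internal `controller_apply_setpoint` trusts its input and only asserts, because a violation there would be a bug in our own code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limits and layout for an externally supplied command. */
#define SETPOINT_MIN 0
#define SETPOINT_MAX 50
#define CMD_SET_PRESSURE 0x10u
#define CMD_LEN 3u   /* [command id, value_hi, value_lo] */

static int32_t g_setpoint;  /* internal state, only written inside the barricade */

/* Inside the barricade: data is trusted, so a violation is a bug -> assert. */
static void controller_apply_setpoint(int32_t value)
{
    assert(value >= SETPOINT_MIN && value <= SETPOINT_MAX);
    g_setpoint = value;
}

int32_t controller_setpoint(void)
{
    return g_setpoint;
}

/* The barricade: validates length, type and range before anything
 * crosses into the safe zone. Returns false if the input is rejected. */
bool barricade_handle_command(const uint8_t *buf, size_t len)
{
    if (buf == NULL || len != CMD_LEN) {
        return false;                        /* wrong length */
    }
    if (buf[0] != CMD_SET_PRESSURE) {
        return false;                        /* wrong type */
    }
    int32_t value = (int32_t)((buf[1] << 8) | buf[2]);
    if (value < SETPOINT_MIN || value > SETPOINT_MAX) {
        return false;                        /* out of range */
    }
    controller_apply_setpoint(value);        /* now safe to pass inward */
    return true;
}
```

Rejected commands leave the internal state untouched, so the code behind the barricade never has to re-validate the same data.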
You may use time or space partitioning techniques to isolate failures within certain modules of your system.
Error recovery is another important aspect of defensive programming. You should decide what to do in case of failures:
- Substitute the next piece of valid data. For example, if some of the data in a file is corrupted, skip the corrupted data and return the next valid data.
- Return the same answer as the previous time.
- Substitute the closest legal value. Suppose that pressure values between 0 and 50 are valid in your system. If you read a negative value, return 0; if the value is bigger than 50, return 50.
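The three recovery strategies listed above can be sketched as small C functions. The record layout and its checksum (low byte of the value XOR `0xA5`) are assumptions made only so that "corrupted" has a concrete meaning, and the initial "previous" value of 0 is likewise an assumed safe default; the 0 to 50 range comes from the pressure example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PRESSURE_MIN 0
#define PRESSURE_MAX 50

/* Hypothetical record: a value protected by a simple checksum. */
typedef struct {
    int16_t value;
    uint8_t checksum;  /* assumed: low byte of value XOR 0xA5 */
} record_t;

static bool record_is_valid(const record_t *r)
{
    return r->checksum == (uint8_t)(((uint16_t)r->value & 0xFFu) ^ 0xA5u);
}

/* Strategy 1: skip corrupted records and return the next valid one.
 * Returns the index of the first valid record at or after 'start',
 * or 'count' if there is none. */
size_t next_valid_record(const record_t *recs, size_t count, size_t start)
{
    assert(recs != NULL || count == 0);
    for (size_t i = start; i < count; ++i) {
        if (record_is_valid(&recs[i])) {
            return i;
        }
    }
    return count;
}

/* Strategy 2: return the same answer as the previous time. */
int16_t reading_or_previous(bool ok, int16_t raw)
{
    static int16_t last_good = 0;  /* assumed safe initial value */
    if (ok) {
        last_good = raw;
    }
    return last_good;
}

/* Strategy 3: substitute the closest legal value. */
int16_t clamp_pressure(int16_t raw)
{
    if (raw < PRESSURE_MIN) return PRESSURE_MIN;
    if (raw > PRESSURE_MAX) return PRESSURE_MAX;
    return raw;
}
```

Which strategy fits depends on the data: skipping works for independent records, repeating the previous answer suits slowly changing signals, and clamping suits values with well-defined physical limits.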
In embedded systems, you should make a choice between “correctness” and “robustness”.
- Correctness means never returning an inaccurate result; returning no result is better than returning an inaccurate one.
- Robustness means always trying to do something that will allow the software to keep operating, even if that leads to results that are sometimes inaccurate.
Safety-critical applications tend to favor correctness over robustness, whereas consumer applications tend to favor robustness over correctness.
While designing embedded systems that integrate sensors, actuators or subsystems, you should not assume that the interface definition document is correct. It is likely that:
- It has not been updated to reflect some changes in message definitions,
- The equipment firmware version and the document submitted to you are inconsistent,
- There are extra messages that are not documented because you do not need them,
- You misunderstand and miscode some of the pre/postconditions described in the document.
The same can be true of standards documents. Multiple times, I have encountered equipment vendors who misinterpreted a standard, so that their implementations were compatible neither with other implementations nor with the standard itself.
Protocol standards that are supposed to enable interoperability between vendors can be confusing, too. As a result, interoperability problems often still occur between different implementations of the same protocol. Companies hold interoperability workshops to reduce such risks, but most of the time they do not run extensive tests covering many different scenarios.
If you are implementing such standards, you should always use defensive programming techniques. Your protocol implementation should be robust against messages that are incompatible with the document or that are not expected in the current state.
The best practice is to log a warning and ignore such a message. If you do not log a warning or error, the users of your protocol library will simply complain that your implementation is incorrect, while you insist that you have tested your module and it works correctly.
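The log-and-ignore practice can be sketched as a small protocol state machine in C. The two-state link, the message ids and the `log_warning` helper are hypothetical; a real implementation would use the standard's own states and messages and the system's logging facility instead of `fprintf`. Unknown message ids and known messages arriving in the wrong state are both logged and dropped without changing state, and a counter of ignored messages gives integrators something concrete to look at.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical protocol: states and message ids are illustrative only. */
typedef enum { ST_IDLE, ST_CONNECTED } link_state_t;
enum { MSG_CONNECT = 1, MSG_DATA = 2, MSG_DISCONNECT = 3 };

typedef struct {
    link_state_t state;
    unsigned ignored;   /* count of dropped messages, useful for diagnostics */
} link_t;

/* Hypothetical logging helper; replace with the system's logger. */
static void log_warning(const link_t *l, uint8_t msg_id, const char *why)
{
    fprintf(stderr, "WARN: ignoring msg 0x%02X in state %d: %s\n",
            (unsigned)msg_id, (int)l->state, why);
}

void link_on_message(link_t *l, uint8_t msg_id)
{
    assert(l != NULL);
    switch (msg_id) {
    case MSG_CONNECT:
        if (l->state == ST_IDLE) { l->state = ST_CONNECTED; return; }
        break;
    case MSG_DATA:
        if (l->state == ST_CONNECTED) { /* process payload here */ return; }
        break;
    case MSG_DISCONNECT:
        if (l->state == ST_CONNECTED) { l->state = ST_IDLE; return; }
        break;
    default:
        /* Not in the interface document: log, ignore, keep running. */
        log_warning(l, msg_id, "unknown message id");
        l->ignored++;
        return;
    }
    /* Known message, but not valid in the current state. */
    log_warning(l, msg_id, "unexpected in current state");
    l->ignored++;
}
```

Because the warning names both the message and the state, a log excerpt is often enough to tell whether the problem lies in your implementation or in the peer's interpretation of the document.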
Since interoperability is one of the main enablers of IoT (Internet of Things) and Industrial IoT, standards for integrated systems should not contain only happy-path sequences. Standards documents should describe exceptional cases and the proper behavior in each of them. They should also include robust solutions for integrating different versions of the same standard.