Part 3 of this 4-part series analyzes methods and tools involved in debugging software at different layers of the software stack.
Software debugging involves identifying and resolving issues ranging from functional misbehaviors to crashes. The essential requirement for validating software programs is the ability to monitor code execution on the underlying processor(s).
Software debugging practices and tools vary significantly depending on the layer of the software stack being addressed. As we move up the stack from bare-metal software to operating systems and finally to applications, three key factors undergo significant changes:
- Lines of Code (LOC) per Task: The number of lines of code per task increases substantially as we move up the stack.
- Computing Power (MIPS) Requirements: The computing power needed to execute software within a feasible timeframe for debugging grows exponentially.
- Hardware Dependency: The dependency on underlying hardware decreases as we ascend the software stack. Bare-metal software is highly hardware-dependent, while applications are typically hardware-independent.
Additionally, the skills required of software developers vary considerably depending on the specific software layer they are working on. Lower-level software development often necessitates a deep understanding of hardware interactions, making it a natural fit for firmware developers. In contrast, operating system (OS) development demands the expertise of seasoned software engineers who should collaborate closely with the hardware design team to ensure seamless integration. At the application software layer, the focus shifts toward logic, user experience, and interface design, requiring developers to prioritize user interaction and intuitive functionality.
Table I below summarizes these comparisons, highlighting the differences in software debugging requirements across various layers of the software stack.
Effective software debugging is a multidimensional challenge influenced by a variety of factors. The scale of the software program, the computational resources available for validation, and the specific hardware dependencies all play critical roles in determining the optimal tools and methodologies for the task.
Software Debug at the Bottom of the Software Stack
The bare-metal software layer sits between the hardware and the operating system, allowing direct interaction with the hardware without any operating system intervention. This layer is crucial for systems that demand high performance, low latency, or have specific hardware constraints.
Typically, the bare-metal layer includes the following components:
- Bootloader: Responsible for initializing the hardware and setting up the system to ensure that all components are ready for operation.
- Hardware Abstraction Layer (HAL): A comprehensive set of APIs that allow the software to interact with hardware components. This layer enables the software to work with the hardware without needing to manage low-level details, providing a simplified and consistent interface (see the sketch after this list).
- Device Drivers: These software components initialize, configure, and manage communication between software and hardware peripherals, ensuring seamless interaction between different system parts.
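To make the HAL concept concrete, below is a minimal C sketch of a memory-mapped HAL routine for a hypothetical UART peripheral. The base address, register offsets, and bit definitions are invented for illustration; in practice they come from the SoC's memory map.

```c
#include <stdint.h>

/* Hypothetical base address and register layout for a UART peripheral;
 * real values come from the SoC's memory map. */
#define UART0_BASE   0x40001000u
#define UART_DATA    (*(volatile uint32_t *)(UART0_BASE + 0x00))
#define UART_STATUS  (*(volatile uint32_t *)(UART0_BASE + 0x04))
#define UART_TX_FULL (1u << 0)

/* HAL API: callers send a byte without knowing any register details. */
static inline void hal_uart_putc(char c)
{
    while (UART_STATUS & UART_TX_FULL)
        ;                        /* busy-wait until the TX FIFO has room */
    UART_DATA = (uint32_t)c;     /* volatile write reaches the hardware */
}
```

Application and driver code then calls `hal_uart_putc()` and remains portable across boards; only the HAL definitions change when the hardware does.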
Prerequisites to Perform Software Validation at the Bottom of the Software Stack
When validating software at the lower levels of the software stack, two key prerequisites must be considered.
First, processing software code that goes beyond simple routines requires a substantial number of clock cycles, often numbering in the millions. This can be efficiently handled by virtual prototypes or hardware-assisted platforms, such as emulators or FPGA prototypes.
Second, the close interdependence of hardware and software at this level necessitates a detailed hardware description, typically provided by RTL. This is where hardware-assisted platforms excel. However, for designs modeled at a higher level than RTL, virtual prototypes can still be effective, provided the design is represented accurately at the register level.
Processor Trace for Bare-Metal Software Validation
Processor trace is a widely used method for software debugging that involves capturing the activity of one or more CPUs non-intrusively. This includes monitoring memory accesses and data transfers with peripheral registers, and sending the captured activity to external storage for analysis, either in real time or offline, after reconstructing it into a human-readable form.
In essence, processor trace tracks the detailed history of program execution, providing cycle counts for performance analysis and global timestamps for correlating program execution across multiple processors. This capability is essential for debugging software coherency problems. Processor trace offers several advantages over traditional debugging methods like JTAG, including minimal impact on system performance and enhanced scalability.
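As a conceptual illustration, the C sketch below models a decoded trace record and reconstructs it into human-readable lines. The record layout is invented for illustration; real trace formats (such as those discussed below) are highly compressed binary streams decoded by vendor tooling.

```c
#include <inttypes.h>
#include <stdio.h>

/* Illustrative (not vendor-specific) decoded trace record. */
typedef struct {
    uint64_t timestamp;  /* global timestamp for cross-core correlation */
    uint32_t core_id;    /* which CPU emitted the record */
    uint64_t pc;         /* program counter of the executed instruction */
    uint64_t cycles;     /* cycle count for performance analysis */
} trace_record_t;

/* Offline reconstruction: turn captured records into readable lines. */
static void dump_trace(const trace_record_t *rec, size_t n)
{
    for (size_t i = 0; i < n; i++)
        printf("t=%" PRIu64 " core=%" PRIu32 " pc=0x%" PRIx64
               " cycles=%" PRIu64 "\n",
               rec[i].timestamp, rec[i].core_id,
               rec[i].pc, rec[i].cycles);
}
```

The global timestamp is what allows records from different cores to be interleaved into a single coherent history, which is the basis for debugging coherency problems.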
However, processor trace also presents some challenges, such as accessing DUT (Device Under Test) internal data, storing large amounts of captured data, and the complexity and time-consuming nature of analyzing that data.
DUT Data Retrieval to External Storage
Retrieving DUT internal data in a hardware-assisted platform can be achieved through an interface consisting of a fabric of DPI-based transactors. The mechanism is relatively simple: it adds no hardware overhead and only marginally impacts execution speed. The state of any register and net can be monitored and saved to external storage, though the volume of retrieved data grows rapidly as the design gets larger and the run-time extends.
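The C fragment below sketches what the host side of such a DPI-based transactor might look like. The function names and record layout are hypothetical; in a real flow, a SystemVerilog wrapper on the emulator side would declare these routines with `import "DPI-C"` and call them as monitored state changes.

```c
#include <stdint.h>
#include <stdio.h>

/* Host-side C functions of a hypothetical DPI-C transactor. The
 * emulator's SystemVerilog wrapper would declare, for example:
 *   import "DPI-C" function void trace_sample(...);
 * and invoke it whenever a monitored register changes. */
static FILE *trace_fp;

void trace_open(const char *path)  { trace_fp = fopen(path, "wb"); }
void trace_close(void)             { if (trace_fp) fclose(trace_fp); }

/* Stream one sample (cycle, register id, value) to external storage. */
void trace_sample(uint64_t cycle, uint32_t reg_id, uint64_t value)
{
    if (!trace_fp)
        return;
    fwrite(&cycle,  sizeof cycle,  1, trace_fp);
    fwrite(&reg_id, sizeof reg_id, 1, trace_fp);
    fwrite(&value,  sizeof value,  1, trace_fp);
}
```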
Despite efforts to standardize the format of the collected data, there is currently no universal format, which complicates analysis. We must also acknowledge that DUT architectures such as x86, Arm, and RISC-V may simply be too different from one another for a universal format to ever emerge.
In summary, even with these challenges, processor trace has been in use for many years and is broadly supported by modern processors based on major architectures such as Arm and RISC-V. Because Arm trace technology comes from a single vendor, standardization has been easier to come by; RISC-V, on the other hand, is open source and multi-vendor.
Arm TARMAC & CoreSight
Arm TARMAC and CoreSight are complementary Arm technologies for debugging and performance analysis.
TARMAC is a post-execution analysis tool capturing detailed instruction traces for in-depth investigations. It records every executed instruction, including register writes, memory reads, interrupts, and exceptions in a textual format. It generates reports and summaries based on the trace data, such as per-function profiling and call trees. This allows developers to replay and analyze the sequence of events that occurred during program execution.
CoreSight is an on-chip solution providing real-time visibility into system behavior without halting execution. It provides real-time access to the processor’s state, including registers, memory, and peripherals, without stopping the CPU. Table II compares Arm TARMAC vs CoreSight.
In essence, CoreSight is the hardware backbone that enables the generation of trace data, while Arm TARMAC is the software tool that makes sense of that data.
RISC-V E-Trace
E-Trace is a high-compression tracing standard for RISC-V processors. By recording branch points rather than every instruction, it significantly reduces trace data volume, making it practical to trace multiple cores simultaneously and to store longer trace histories within fixed-size buffers. E-Trace is useful for debugging custom RISC-V cores with multiple extensions and custom instructions, ensuring that all customizations work correctly. It also supports performance profiling and code coverage analysis.
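The toy C model below illustrates the core idea behind branch-based trace compression: only branch outcomes are recorded, and the full instruction flow is reconstructed offline from the static program image. The instruction representation is invented for illustration and is far simpler than the actual E-Trace packet format.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy model: the trace stores one bit per conditional branch
 * (taken / not taken); everything between branches is recovered
 * from the static program image. Here pc is an instruction index. */
typedef struct {
    bool     is_branch;
    uint64_t taken_target;  /* next instruction index if taken */
} insn_t;

/* Replay execution from pc, consuming branch bits from the trace. */
static void reconstruct(const insn_t *image, uint64_t pc,
                        const bool *branch_bits, size_t n_bits)
{
    size_t bit = 0;
    while (bit < n_bits) {
        printf("pc=%llu\n", (unsigned long long)pc);
        if (image[pc].is_branch)
            pc = branch_bits[bit++] ? image[pc].taken_target : pc + 1;
        else
            pc++;            /* sequential flow needs no trace data */
    }
}
```

Because straight-line code contributes nothing to the trace, the data volume scales with the number of branches taken rather than the number of instructions executed, which is where the compression comes from.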
Synopsys Verdi Hardware/Software Debug
Verdi HW/SW Debug provides a unified view of hardware and software interactions. By synchronizing software elements (C code, assembly, variables, registers) with hardware aspects (waveforms, RTL, assertions), it enables seamless navigation between the two domains. This integrated approach facilitates efficient debugging by correlating software execution with hardware behavior, allowing users to step through code and waveforms simultaneously and pinpoint issues accurately. See Figure 1.
Synopsys ZeBu® Post-Run Debug (zPRD)
ZeBu Post-Run Debug (zPRD) is a comprehensive debugging platform that supports efficient and repeatable analysis. By decoupling the debug session from the original test environment, zPRD accelerates troubleshooting by allowing users to deterministically recreate any system state. It simplifies the debugging process by providing a centralized control center for common debugging tasks like signal forcing, memory access, and waveform generation. Leveraging PC resources, zPRD optimizes waveform creation for faster analysis.
Moving up the Software Stack: OS Debug
Operating systems consist of a multitude of software programs, libraries, and utilities. While some components are larger than others, collectively they demand billions of execution cycles, with hardware dependencies playing a crucial role.
For debugging an operating system when hardware dependencies are critical, the processor trace method is still helpful. However, this approach, while effective, becomes more complex and time-consuming when dealing with the largest components of an OS.
GNU Debugger
Among the most popular C/C++ software debugging tools in the UNIX environment is GDB (the GNU Debugger). GDB is a powerful command-line tool used to inspect and troubleshoot software programs as they execute. It helps developers identify and fix bugs, understand program behavior, and optimize performance.
GDB's key features include (illustrated in the example session after this list):
- Setting breakpoints: Pause program execution at specific points to inspect variables and program state.
- Stepping through code: Execute code line by line to understand program flow.
- Examining variables: Inspect the values of variables at any point during execution.
- Backtracing: Examine the function call stack to understand how the program reached a particular point.
- Modifying variables: Change the values of variables on the fly to test different scenarios.
- Core dump analysis: Analyze core dumps to determine the cause of program crashes.
- Remote debugging: GDB can debug programs running on a different machine than the one it is running on, which is useful for debugging embedded systems or programs running on remote servers.
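A minimal example session, assuming a small buggy program compiled with debug information; the file and function names are hypothetical, while the GDB commands shown are standard:

```c
/* buggy.c -- hypothetical example; compile with debug info:
 *   gcc -g -O0 buggy.c -o buggy
 */
#include <stdio.h>

int sum_to(int n)
{
    int total = 0;
    for (int i = 0; i <= n; i++)   /* off-by-one: should be i < n */
        total += i;
    return total;
}

int main(void)
{
    printf("%d\n", sum_to(10));
    return 0;
}

/* Typical GDB session:
 *   gdb ./buggy
 *   (gdb) break sum_to        # pause at the function
 *   (gdb) run
 *   (gdb) next                # step line by line
 *   (gdb) print total         # inspect a variable
 *   (gdb) backtrace           # show the call stack
 */
```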
GDB can be employed to debug a wide range of issues in various programming languages. Among common use cases are:
- Segmentation faults: These occur when a program tries to access memory it doesn’t own. GDB can help pinpoint the exact location where this happens.
- Infinite loops: GDB can help you identify code sections that are looping endlessly.
- Logical errors: By stepping through code line by line, you can examine variable values and program flow to find incorrect logic.
- Memory leaks: While GDB doesn’t have direct tools for memory leak detection, it can help you analyze memory usage patterns.
- Core dumps: When a program crashes unexpectedly, a core dump is generated. GDB can analyze this dump to determine the cause of the crash, as sketched after this list.
- Performance bottlenecks: By profiling your code with GDB, you can identify sections that are consuming excessive resources.
- Debugging multi-threaded programs: GDB supports debugging multi-threaded applications, allowing you to examine the state of each thread.
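As a minimal sketch of the core-dump use case, the hypothetical program below crashes with a segmentation fault; the comments show a typical GDB post-mortem session.

```c
/* crash.c -- hypothetical segfault example; build with:
 *   gcc -g crash.c -o crash
 * Enable core dumps in the shell first:  ulimit -c unlimited
 */
#include <stddef.h>

int main(void)
{
    int *p = NULL;
    *p = 42;          /* write through a null pointer -> SIGSEGV */
    return 0;
}

/* After the crash produces a core file:
 *   gdb ./crash core
 *   (gdb) backtrace       # shows the faulting frame in main()
 *   (gdb) frame 0
 *   (gdb) print p         # $1 = (int *) 0x0
 */
```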
GDB is an effective debugging tool for software developers, especially those working with low-level or performance-critical code.
At the Top of the Software Stack: Application Software Debug
Application software spans a wide range of complexity and execution time. Some applications execute within a few million cycles; others require billions. All demand efficient development environments. Virtual prototypes offer near-silicon execution speed, making them ideal for pre-silicon software development.
A diverse array of debuggers serves different application needs, operating systems, programming languages, and development environments. Popular options include GDB, Google Chrome DevTools, LLDB, Microsoft Visual Studio Debugger, and Valgrind.
To further streamline development, the industry has adopted Integrated Development Environments (IDEs), which provide a comprehensive platform for coding, debugging, and other development tasks.
IDEs: Software Debugger’s Best Friend
An Integrated Development Environment (IDE) is a software application that streamlines software development by combining essential tools into a unified interface. These tools typically include a code editor, compiler, debugger, and often additional features like code completion and version control integration. By consolidating these functionalities, IDEs enhance developer productivity, reduce errors, and simplify project management. Available as both open-source and commercial products, IDEs can be standalone applications or part of larger software suites.
Further Software Debugging Methodologies and Processes
Error prevention and detection are integral to software development. While debugging tools are essential, they complement a broader range of strategies and processes aimed at producing error-free code.
Development methodologies such as Agile, Waterfall, Rapid Application Development, and DevOps offer different approaches to project management, each with its own emphasis on quality control.
Specific practices like unit testing, code reviews, and pair programming are effective in identifying and preventing errors. Unit testing isolates code components for verification (a minimal example follows below). Code reviews leverage peer expertise to catch oversights. Pair programming fosters real-time collaboration and knowledge sharing.
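A minimal assert-based unit test in C might look like the sketch below; real projects typically use a test framework such as Unity or CMocka, and sum_to() is a hypothetical function under test.

```c
/* test_sum_to.c -- minimal assert-based unit test sketch. */
#include <assert.h>

int sum_to(int n)                 /* hypothetical function under test */
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += i;
    return total;
}

int main(void)
{
    assert(sum_to(0)  == 0);      /* edge case: empty range */
    assert(sum_to(10) == 45);     /* 0+1+...+9 */
    return 0;                     /* all assertions passed */
}
```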
By combining these strategies with debugging tools, developers can significantly enhance software quality and reliability.
Conclusion
Debugging is an integral part of the software development process that spans the entire software stack, from low-level firmware to high-level application software. Each layer presents unique challenges and requires specialized tools and techniques.
In low-level debugging, understanding hardware interactions and system calls is crucial. Tools like processor trace help developers trace issues at this foundational level. This is where users tend to be comfortable with register models, address maps, memory maps, etc. Moving up the stack, debugging becomes more abstract, involving memory management, API calls, and user interactions. Here, debuggers like GDB and integrated development environments (IDEs) with built-in debugging tools prove invaluable. Users in this space are more comfortable with the APIs provided by the OS or the application, and they depend on hardware or firmware engineers to identify issues in the lower levels of the stack.
During the pre-silicon phase, all software debugging tools rely on the ability to execute the software on a fast execution target: a virtual prototype, emulation, or FPGA-based prototyping. Besides the performance of the underlying pre-silicon target, the flexibility and ease with which different types of debug data can be extracted for the different software stack levels drive debug productivity. With more and more workloads moving to emulation and prototyping platforms, the user community is asking for ever more help debugging their environments and system issues. However, a delicate balance must be struck, because debuggability and platform performance are inversely related.
Looking forward, the evolution of debugging tools and methodologies is expected to embrace machine learning and AI to predict potential bugs and offer solutions, thereby transforming the landscape of software debugging.