The vibe coding environment is growing.
(Editor's Note: Vibe coding is used in software development to build apps or websites by describing what a developer wants in natural language and letting an AI assistant write, test and debug the code. Developers guide the process using "vibes" and big-picture goals, usually without needing to manually write or read the code line by line.)
Over the last few years, we have all become aware of the impact that AI has had on the software industry, including gloomy signs such as college students being steered away from traditional software development roles and toward more of a design role in software engineering. Indeed, just over the past year we have seen stories in mainstream and social media forecasting the doom of many software developers and staff reductions at companies that produce large and powerful software packages.
Higher-level languages such as Python and JavaScript lend themselves well to the analysis and then synthesis of new code, given the sheer amount of both prompts and code revolving in a giant feedback loop that continually trains the LLMs.
A question fed to the AI universe confirms that Python is indeed at the top of the list given its straightforward syntax, large number of libraries and incredible rate of growth in public repositories. The popular Python repository pypi.org has nearly doubled in size from the beginning of 2024 to now and is expected to reach a total of 1 million published projects by the end of this year.
That also presents an incredible amount of training data to use, which includes previous code with error corrections, causing the code to be continuously refined. I have used prompt engineering on multiple occasions to create small Python-based utilities to help gather collections of design files, hardware libraries and data sheets from various PCB design projects I have done over the years, with very few modifications to the AI-generated code needed now that my prompting skills have improved.
Vibe Coding Achilles Heel – Data Sheets
The warm and fuzzy comfort of vibe coding success fades quite a bit when software engineers get “closer to the metal” as we embedded systems engineers like to say. Embedded firmware engineering is a wonderful yet stressful combination of both pure software coding skills and enough hardware knowledge to not trust anything that’s happening unless verified by another piece of hardware, usually a trusted oscilloscope or logic analyzer. Most of the code created in this space directly relies on the information provided by the manufacturer on the target device and example code provided either on the vendor’s website or through specific target repos scattered throughout the web.
In the soft, fuzzy environment of high-level language vibe coding, the target hardware is essentially split between the x86-64 CPU architecture that essentially owns the cloud, spread across three operating systems (Windows, Linux and macOS), and its counterparts in the mobile world, Android and iOS. These two target environments provide a wonderfully homogeneous development space where the hardware is far removed from the executing code.
However, the embedded development space is a completely different environment since the target device is, for the most part, a unique take on an architecture that may use a proprietary processor core attached to custom and proprietary hardware interfaces.
Although common bus interfaces such as I2C, SPI and USB smooth the path to both the outside world and other devices in the local system, the key to the castle is the device data sheet, which in today’s world is presented as a multi-thousand-page PDF document.
The PDF parsing problem
A critical translation layer is required for every SoC or embedded controller target: the translation of the information contained in the data sheet into plain-text, readable information that performs a few critical tasks.
It translates the memory map of the device, going past the basic information about where on-board Flash and RAM are located, into a detailed numeric description of the memory-mapped register space of all the peripherals present on the target device. These translations result in the creation of header files, which are consumed by the C/C++ compilers to then translate the information back into the raw numeric values required by the code generators.
- Internal memory addresses, for example a USART receive data register, get defined something like:
#define USART0_RXDATAL _SFR_MEM8(0x0800)
- This translation allows the software engineer to use symbolic names when writing code instead of the actual memory address, in this case 0x0800. Even at the assembly language level these translation definitions are used. For other registers, bit fields are further defined which symbolically describe the control and status functions that the peripherals on the target device need in order to function.
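As a rough illustration of what these definitions buy the firmware engineer, the sketch below mimics a vendor-style header on a host PC. Everything prefixed with DEMO_ or SIM_ is a hypothetical stand-in: the register offsets other than 0x0800 and all bit masks are assumptions for illustration only, and a simulated memory array replaces the real address space so the code can run anywhere.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side stand-in for the device's data memory so the sketch can run on a PC.
   On real hardware the vendor macro would dereference the absolute address. */
static volatile uint8_t sim_mem[0x1000];
#define SIM_SFR_MEM8(addr) (sim_mem[(addr)])

/* Register names in a vendor-like style. Offsets and bit masks here are
   illustrative assumptions and must be checked against the real header. */
#define DEMO_USART0_RXDATAL SIM_SFR_MEM8(0x0800)
#define DEMO_USART0_STATUS  SIM_SFR_MEM8(0x0804)
#define DEMO_USART0_CTRLB   SIM_SFR_MEM8(0x0806)

#define DEMO_USART_RXCIF_bm 0x80u  /* receive-complete flag in STATUS */
#define DEMO_USART_RXEN_bm  0x80u  /* receiver enable in CTRLB */
#define DEMO_USART_TXEN_bm  0x40u  /* transmitter enable in CTRLB */

/* Enable receiver and transmitter with a read-modify-write on symbolic names. */
static void demo_usart_enable(void)
{
    DEMO_USART0_CTRLB = (uint8_t)(DEMO_USART0_CTRLB |
                                  DEMO_USART_RXEN_bm | DEMO_USART_TXEN_bm);
}

/* Return 1 and store the received byte if the flag is set, otherwise return 0. */
static int demo_usart_try_read(uint8_t *out)
{
    assert(out != 0);
    if ((DEMO_USART0_STATUS & DEMO_USART_RXCIF_bm) == 0u) {
        return 0;
    }
    *out = DEMO_USART0_RXDATAL;
    return 1;
}
```

Nothing in the two functions mentions a numeric address. If a single mask or offset in the header is wrong, the code still compiles and simply talks to the wrong bit.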
The creation of these header files is still mostly done manually with increasing assistance from software utilities. Apart from occasional typographical errors, or indeed verified design flaws in the device itself, the result is a very accurate description of the target.
So where does the AI vibe coding problem get introduced?
A critical part of the data sheet is not only to confirm the mapping of the various registers but also to describe how these registers are to be used in the proper order for correct operation.
In a vibe coding situation, the software engineer will need to properly identify the target device (for example, SAMD21G18), the target peripheral (SERCOM1) and perhaps some desired function names, functions and return values. The underlying LLM will then need to properly extract the matching text from the datasheet and the matching direct text from the header files, and interpret the text describing the process, configuration data and configuration order for these registers to be used.
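To make the ordering point concrete, here is a minimal sketch using a purely hypothetical peripheral model. It assumes a baud register that ignores writes while the peripheral is enabled, a behavior some real peripherals have; the type, names and bit mask are all invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical peripheral model: BAUD is "enable-protected", so writes made
   while the ENABLE bit is set are silently ignored. */
typedef struct {
    uint8_t  ctrl;  /* bit 0 = ENABLE */
    uint16_t baud;
} demo_periph_t;

#define DEMO_CTRL_ENABLE_bm 0x01u

static void demo_write_baud(demo_periph_t *p, uint16_t value)
{
    assert(p != 0);
    if ((p->ctrl & DEMO_CTRL_ENABLE_bm) == 0u) {
        p->baud = value;  /* accepted only while the peripheral is disabled */
    }
}

static void demo_enable(demo_periph_t *p)
{
    assert(p != 0);
    p->ctrl = (uint8_t)(p->ctrl | DEMO_CTRL_ENABLE_bm);
}

/* Correct order: configure first, enable last. */
static uint16_t demo_init_correct(demo_periph_t *p, uint16_t baud)
{
    demo_write_baud(p, baud);
    demo_enable(p);
    return p->baud;
}

/* Same two writes in the wrong order: compiles cleanly, configures nothing. */
static uint16_t demo_init_wrong(demo_periph_t *p, uint16_t baud)
{
    demo_enable(p);
    demo_write_baud(p, baud);
    return p->baud;
}
```

Both init functions use only valid register names and valid values, so no compiler or IDE will flag the second one. Only the procedure text in the data sheet, or a scope on the pin, reveals the difference.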
Many of the configuration parameters in datasheets are presented in table format, and in some cases values to be written into configuration registers are calculated using a simple formula requiring knowledge of additional system parameters such as the CPU core or peripheral clock speed.
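A small sketch of such a calculation follows. It uses the well-known divisor formula for a classic AVR-style asynchronous UART in normal-speed mode, divisor = f_clk / (16 × baud) − 1, purely as an example; the formula for any particular target, including the devices named in this article, may differ and has to come from that device's data sheet. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Divisor for a classic AVR-style asynchronous UART in normal-speed mode:
   divisor = f_clk / (16 * baud) - 1, rounded to the nearest integer.
   Assumption: other devices use different formulas. */
static uint32_t demo_uart_divisor(uint32_t f_clk_hz, uint32_t baud)
{
    assert(baud > 0u);
    uint32_t step = 16u * baud;
    assert(f_clk_hz >= step);  /* otherwise the divisor would underflow */
    return (f_clk_hz + step / 2u) / step - 1u;
}

/* Baud rate actually produced by a given divisor, truncated to whole baud. */
static uint32_t demo_uart_actual_baud(uint32_t f_clk_hz, uint32_t divisor)
{
    return f_clk_hz / (16u * (divisor + 1u));
}
```

With a 16 MHz clock, demo_uart_divisor(16000000, 9600) yields 103, which produces roughly 9615 baud. The same call with the wrong clock value compiles just as happily, which is why the system clock has to be stated correctly in the prompt and extracted correctly from the table.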
Parsing tables in PDF format remains a particular weakness in LLM accuracy, usually related to errors in determining field width of columns and wrapped lines.
Additionally, data sheets may split sequential procedures among various top-level options that may result in the wrong branch of this decision tree being followed.
This process of extracting incorrect information and incorrect procedural flow eventually causes incorrect example code to be generated.
Embedded code hallucinations
Over the past several months I have experimented with both internal and external chatbots by submitting code development requests from very basic LED blinking to more complex state machine control systems. There are a few patterns that began to show up as noted below.
Close, but still no
Even in some of the basic cases the resulting code was not only incorrectly constructed in terms of basic register operations, but also failed the procedural functional test. The submitted code that was copied and pasted into the target development environment did compile correctly but did not provide a functionally correct solution. The device configuration was either incomplete or, in a few cases, incorrect. This situation has improved somewhat over the past year as AI engines have learned from their mistakes. One improvement noted over the past few months was the ability to correctly create an LED blinky example app across multiple development boards that share the same target MCU.
Target devices that still had data sheets in the old two-column research/academic paper format tended to have higher failure rates, as interpretation across column splits and hyphenated words proved more challenging.
Incorrect Extrapolation
In a few cases, the incorrect results stemmed from an accidentally incorrect prompt for which the AI engine tried its best to provide a solution. In one specific case, an incorrect pin assignment was specified that prevented a correct solution from being possible. However, code was still submitted that used header file definitions that did not actually exist, usually in the form of bit field definitions, which were extrapolated from patterns observed in the header files and extended to meet the requested configuration.
These errors were easily caught even before a build attempt because of the auto parsing / indexing functions of modern IDEs which will immediately identify undefined registers and bit fields within those new registers. It was interesting that the LLM engines did not do a reverse sanity check against the newly defined attributes, but instead went forward with the code submission.
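The kind of reverse sanity check described here can be approximated with nothing more than the C preprocessor. The sketch below is one hypothetical way to do it: the DEMO_ names stand in for definitions a vendor header would provide, and DEMO_USART_AUTOFLOW_bm plays the part of a plausible-looking name that was extrapolated but never defined.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for bit masks a vendor header really provides (names assumed). */
#define DEMO_USART_RXEN_bm 0x80u
#define DEMO_USART_TXEN_bm 0x40u
/* DEMO_USART_AUTOFLOW_bm is deliberately NOT defined. */

/* Record at compile time whether each symbol the generated code uses exists. */
#ifdef DEMO_USART_RXEN_bm
#define DEMO_HAS_RXEN 1
#else
#define DEMO_HAS_RXEN 0
#endif

#ifdef DEMO_USART_TXEN_bm
#define DEMO_HAS_TXEN 1
#else
#define DEMO_HAS_TXEN 0
#endif

#ifdef DEMO_USART_AUTOFLOW_bm
#define DEMO_HAS_AUTOFLOW 1
#else
#define DEMO_HAS_AUTOFLOW 0
#endif

typedef struct {
    const char *name;
    int defined;
} demo_symbol_check_t;

static const demo_symbol_check_t demo_checks[] = {
    { "DEMO_USART_RXEN_bm",     DEMO_HAS_RXEN },
    { "DEMO_USART_TXEN_bm",     DEMO_HAS_TXEN },
    { "DEMO_USART_AUTOFLOW_bm", DEMO_HAS_AUTOFLOW },
};

#define DEMO_CHECK_COUNT (sizeof demo_checks / sizeof demo_checks[0])

/* Report whether the symbol at the given table index was defined. */
static int demo_is_defined(size_t index)
{
    assert(index < DEMO_CHECK_COUNT);
    return demo_checks[index].defined;
}

/* Count how many symbols used by the generated code do not exist. */
static size_t demo_count_missing(void)
{
    size_t missing = 0;
    for (size_t i = 0; i < DEMO_CHECK_COUNT; i++) {
        if (!demo_checks[i].defined) {
            missing++;
        }
    }
    return missing;
}
```

In a real build the same idea is usually expressed more bluntly, with an #ifndef around each required name followed by an #error directive, so that compilation stops before any invented bit field reaches the target.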
Practice makes perfect
As noted previously, a large disadvantage of AI assistance in this space is the limited amount of available good training material. Large embedded firmware projects that did not start from the open-source environment tend to stay that way.
The bulk of the available code base comes from vendor-specific example code and large open-source projects. It will take some time for the training code base in bare-metal environments to reach a nominal level, but the situation will continue to improve.
A bright spot already over the horizon is that Zephyr RTOS is gaining popularity and is open-source which allows for unrestricted access to the core code repositories. Recent AI interactions have proven very productive in getting example projects pulled together very quickly, although occasionally a project using deprecated code is generated; however, in most cases the content generated has been of excellent quality.
The PDF file parsing problem will continue to shrink, so asking your favorite AI engine to summarize a certain part of a data sheet again and again will be a good way to gauge improvement firsthand.
For the embedded space, I will continue to rely on the tools I am familiar with along with experience gained over the past several years. But I will always entertain AI-constructed code … with a skeptical eye.
Bob Martin is senior staff engineer at Microchip Technology where he is also known as Wizard of Make.
Author's Disclaimer: Although various popular AI engines were used to collect and summarize data presented above, the language, text and phrasing in this article are mine. I looked for consistent results across multiple LLM portals. Some engines produced more accurate results than others due to differences in available training data. This was expected, especially in some of the more specific cases; however, the overall results were consistent in theme.