Adventures in the RISC-V land
RISC-V is here to stay. With the big boys joining the alliance and companies like Renesas pushing out RISC-V based MCUs and SiPs, it is safe to say hardware hackers will start to see more and more RISC-V in devices. Unfortunately, this is not all good news. Things will get considerably more proprietary and custom, even if the ISA is open-source.
This post explores my adventures in the RISC-V land and how I’ve had to contort my usual reverse-engineering workflow to cope with a whole new pile of unknowns.
Introduction
If you’re new to the acronym soup, RISC-V (pronounced “risk-five”) is an open, royalty-free instruction-set architecture (ISA). That single sentence is why VCs hyperventilate and why it keeps getting dragged into geopolitical headlines: anyone can design a CPU that understands RISC-V, add whatever proprietary extensions they fancy, and ship silicon without paying Arm-style licence fees.
Since the spec went public in 2015 the ecosystem has exploded – from hobby-grade FPGAs to full-fat Linux-capable SoCs and, of course, a fresh crop of microcontrollers. Renesas, for example, rolled out a general-purpose 32-bit RISC-V MCU family in March 2024, complete with its own in-house core and a pretty respectable toolchain. Add the fact that everyone from Intel Foundry Services to the CHIPS Alliance is now a dues-paying member of RISC-V International and you get the picture: RISC-V is not going anywhere, anytime soon.
Great! So everything’s open now, right? Yeah…
Building understanding – or not
The usual hardware hacking starting point – check the package markings, grab a datasheet and reference manuals – falls apart fast in RISC-V land.
The silicon behind the package markings, which are unidentifiable most of the time, might be a SiFive, Andes or Allwinner core, a tuned clone of these, or something that started out as a vendor employee’s personal side project. Anything really. Unless the part number leaks, good luck guessing which IP the chip is using and which optional ISA extensions (Zicsr? Zfinx? Custom crypto?) are actually implemented.
Many of the RISC-V gadgets I cracked open, some from reputable brands, arrived in wafer-level CSPs or weird SiP packages, all with markings that return nothing in search engines and say nothing about what is inside. There was only one case where the FCC documents showed a different unit, likely a pre-production model, that revealed Allwinner logos on the SoC.
That is a good example of why it is necessary to leave no stone unturned. It pointed me in the right direction: grabbing some Linux images built for Allwinner RISC-V SoCs revealed commonalities with the dump I had at hand, meaning the vendor likely used Allwinner IP on the chip and, as such, the Allwinner SDK.
It often takes hours or days of GitHub archaeology and Google-fu just to confirm basic facts like “Is this a SoC or an MCU?”. Compared to non-RISC-V hardware hacking, you will most likely just have to accept that you cannot say definitively what is inside the package.
This means a lot of work needs to be put in to work around the lack of information. Be prepared for some serious head scratching. It is a good idea to make this a more iterative process than usual. When you come across any new information later, make sure to run through ALL of these steps again. There might be some stuff you missed earlier.
Libraries and RISC-V core IPs
Yes, the ISA is open, but the actual core RTL, the debug module, the clock gating fabric, the DMA engines – those can be pay-walled, NDA’d or wrapped in encrypted bitstreams or firmware blobs. Many SoCs license Andes, Codasip, Sipeed or build internal cores and then bolt on custom extensions for DSP, AI or DRM.
The libraries that accompany them (HALs, CMSIS-style headers, even GCC multilib variants) often only leak via eval boards, prototypes or accidentally shared git repos. Keep a running collection; even a half-complete header for custom0_csr.h is gold when your disassembly hits a csrr with unknown bits.
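As an illustration of spotting such an access in raw opcodes, here is a minimal Python sketch that decodes a 32-bit Zicsr instruction word and flags CSR numbers in the machine-level ranges the privileged specification sets aside for custom use. The function names are made up for this example, and only the machine-level custom ranges are covered.

```python
# Machine-level CSR address ranges designated for custom use in the
# RISC-V privileged specification (other privilege levels are not covered).
CUSTOM_M_CSR_RANGES = [(0x7C0, 0x7FF), (0xBC0, 0xBFF), (0xFC0, 0xFFF)]

SYSTEM_OPCODE = 0x73  # major opcode shared by all Zicsr instructions


def decode_csr_access(word):
    """Return (csr, rd, rs1_or_uimm, funct3) for a Zicsr instruction, else None."""
    if word & 0x7F != SYSTEM_OPCODE:
        return None
    funct3 = (word >> 12) & 0x7
    if funct3 in (0, 4):  # 0 is ecall/ebreak/mret etc., 4 is not a CSR operation
        return None
    csr = (word >> 20) & 0xFFF
    rd = (word >> 7) & 0x1F
    rs1 = (word >> 15) & 0x1F
    return csr, rd, rs1, funct3


def is_custom_machine_csr(csr):
    """True if the CSR number falls in a machine-level custom range."""
    return any(lo <= csr <= hi for lo, hi in CUSTOM_M_CSR_RANGES)
```

For example, the word 0xF1402573 (csrr a0, mhartid) decodes to CSR 0xF14, which is a standard register, while a read of CSR 0x7C0 would be flagged as custom and is worth looking up in whatever vendor headers you have collected.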
Dumps, logs and more
Because the silicon is a black box, everything external to the die becomes extra precious. Before you even think about Ghidra, grab everything the board will cough up.
Start with the obvious, the external flash. Nine times out of ten the RISC-V MCU or SoC is booting straight out of an off-chip flash device. Even if the payload turns out to be encrypted, there are often still some cleartext artefacts and remnants of the manufacturing or development process.
The partition tables or proprietary headers might give clues about the SDK being used. A good idea is to search the internet for various RISC-V firmwares and images, download all of them and then see if there are any similarities.
If you got an unencrypted flash dump, do all of the normal steps: strings, binwalk, grep, scrolling through it in a hex editor. Be extra mindful of path strings ending in .cpp, .c or .ld, section names, vendor prefixes or suffixes in filenames and paths, and unfamiliar names like Codasip, Andes, T-Head or Sipeed, all of which are RISC-V vendors. Cross-reference anything you come across with the material you have downloaded, for example from GitHub. This may reveal what you are dealing with.
If the device has external wireless chips, as most of them do, capture the traffic on the bus between them and the SoC. Most wireless and other external chips have public datasheets, so it is possible to find, for example, the HCI bus pins. Capturing these buses might yield the SoC vendor’s SDK name or reveal other details.
Looking at where these pins connect on your unknown RISC-V chip also lets you slowly start building an at least partial pinout.
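The string triage above can be sketched in a few lines of Python. This is a rough filter, not a replacement for strings or binwalk: the keyword list and function names are hypothetical, and you would extend them with whatever names your own research turns up.

```python
import re

# Hypothetical keyword list; extend it as your research progresses.
VENDOR_HINTS = (b"codasip", b"andes", b"t-head", b"thead", b"sifive", b"allwinner")
# Matches source and linker-script file names such as boot.c or flash.ld.
SOURCE_FILE = re.compile(rb"\.(?:cpp|c|ld|h|s)\b")


def printable_strings(data, min_len=6):
    """Yield (offset, bytes) for runs of printable ASCII, similar to strings(1)."""
    for match in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, data):
        yield match.start(), match.group()


def interesting_strings(data, min_len=6):
    """Return (offset, kind, text) for strings hinting at a vendor or a source path."""
    hits = []
    for offset, raw in printable_strings(data, min_len):
        lowered = raw.lower()
        if any(hint in lowered for hint in VENDOR_HINTS):
            hits.append((offset, "vendor", raw.decode("ascii")))
        elif SOURCE_FILE.search(lowered):
            hits.append((offset, "source", raw.decode("ascii")))
    return hits
```

Run it over the dump with something like `interesting_strings(open("dump.bin", "rb").read())` (the file name is a placeholder) and feed the hits into your GitHub searches.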
Speaking of Wi-Fi, BT or Ethernet, MAC addresses might yield some information too. Check if the first three octets (OUI) reveal information about the original chip vendor; maybe the address was already burnt into the chip at the fab.
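A small Python sketch for pulling the OUI out of a MAC address found in a dump or a bus capture. The function names are made up for illustration, and the actual vendor lookup against the IEEE registry is left out. The second helper checks the locally administered bit, since an address with that bit set was not assigned from a vendor's registered block and its prefix will not identify anyone.

```python
import re


def oui_of(mac):
    """Return the OUI (first three octets) of a MAC address as 'AA:BB:CC'."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac)
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    digits = digits.upper()
    return ":".join(digits[i:i + 2] for i in (0, 2, 4))


def is_locally_administered(mac):
    """True if the locally administered bit (0x02 of the first octet) is set."""
    first_octet = int(oui_of(mac)[:2], 16)
    return bool(first_octet & 0x02)
```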
When all the high-tech methods dry up, go old-school. Take a thin steel brush (yes, the hardware-store 2€ variety) and lightly brush it across exposed signal vias, pads or pins while the board is running.
The resulting shorts can corrupt an SPI read or I²C register and crash the device. If you’re lucky the dump prints a back-trace with absolute addresses or, better yet, some debug information and function names. The picture above is a great example. This crash resulted from brushing over the SD card socket pins.
This of course requires access to, for example, a UART console, or a device that records logs you can download via its UI.
There is of course a possibility that you will nuke the board with this method, but if the device is not very expensive, it is worth a try when all else fails.
Proprietary image formats
Most commercial RISC-V SoCs and IP share similarities while still being very different: often a tiny always-on core, a number of performance clusters, maybe an FPGA fabric, and a handful of peripherals. As a result the firmware images are, unsurprisingly, complex:
- Proprietary vendor headers and magic bytes.
- Multi-stage bootloaders. In my research the split has been roughly 50:50, with half of the initial bootloaders encrypted and half not. Nearly all of them carry some kind of signature.
- Various blobs, often encrypted, housing the firmwares and drivers for various IP and peripherals.
- The actual firmware, split into multiple sections and loaded at seemingly random addresses.
No universal “bin + dtb” here. You’ll spend quality time carving chunks out with a hex editor, checking for embedded ELF headers or LZ4 markers, then verifying offsets by hand. Pattern editors, like the one in ImHex, are valuable companions and make life a lot easier.
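Before reaching for the hex editor, a quick scan for well-known magic values can shortlist offsets worth a closer look. The Python sketch below is a starting point under a few assumptions: the magic table is a small, non-exhaustive selection, the function names are invented for this example, and the ELF check only accepts little-endian images.

```python
import struct

# Small, non-exhaustive selection of magic values.
MAGICS = {
    "ELF": b"\x7fELF",
    "LZ4 frame": b"\x04\x22\x4d\x18",         # 0x184D2204 stored little-endian
    "gzip": b"\x1f\x8b\x08",
    "device tree blob": b"\xd0\x0d\xfe\xed",  # 0xD00DFEED stored big-endian
}
EM_RISCV = 243  # e_machine value for RISC-V in the ELF header


def find_magics(data, magics=MAGICS):
    """Return a sorted list of (offset, name) for every magic value found."""
    hits = []
    for name, magic in magics.items():
        start = 0
        while (pos := data.find(magic, start)) != -1:
            hits.append((pos, name))
            start = pos + 1
    return sorted(hits)


def looks_like_riscv_elf(data, pos):
    """Cheap plausibility check of an ELF header (assumes little-endian)."""
    header = data[pos:pos + 20]
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return False
    if header[4] not in (1, 2) or header[5] != 1:  # ELF class, little-endian
        return False
    (machine,) = struct.unpack_from("<H", header, 18)
    return machine == EM_RISCV
```

Short magics such as the three-byte gzip marker will produce false positives in high-entropy data, so treat every hit as a lead to verify by hand.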
The proprietary image header above is a great example of what RISC-V images can look like.
At first glance, and even on closer inspection, it comes across as incomprehensible garbage. But with access to multiple firmware versions, diffing them makes patterns start to emerge.
Looking at these further and cross-referencing with the other collected information, the data starts to make sense. The normal things appear, such as image part size, load address, length, hash/signature.
What really sent me in the wrong direction with this particular example was the load addresses, which initially looked very strange: 0xb0278400, 0xb02ba000… Normally, you have load addresses like 0x20000000 or something similar. Having been preconditioned to what a load address should look like, I missed these and subsequently wasn’t able to parse the header.
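The diffing step can be sketched in Python as follows. The helpers report which byte ranges differ between the headers of two firmware versions and then show those ranges as little-endian 32-bit words, which is often how sizes and load addresses are stored. The function names are made up for illustration, and the 4-byte little-endian field layout is an assumption about the format, not a given.

```python
import struct


def changed_ranges(a, b):
    """Return [(start, end)] byte ranges (end exclusive) where two headers differ."""
    length = min(len(a), len(b))
    ranges, start = [], None
    for i in range(length):
        if a[i] != b[i]:
            if start is None:
                start = i
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, length))
    return ranges


def le32_fields(header, start, end):
    """Show a changed range as 4-byte-aligned little-endian words: [(offset, value)]."""
    lo = start & ~3
    hi = min((end + 3) & ~3, len(header))
    return [(off, struct.unpack_from("<I", header, off)[0])
            for off in range(lo, hi - 3, 4)]
```

Fields that change between versions tend to be sizes, addresses, versions and hashes, while fields that stay constant are candidates for magic values and flags. A value like 0xb02ba000 showing up in a changed word is exactly the kind of thing to test as a load address rather than dismiss.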
A few tips:
- Ditch any assumptions you have of what a firmware image should look like or contain. These can be literally anything.
- Try to get as many different firmware versions as possible. Dump the devices before updating them, as most of the time devices ship with old firmware versions that you cannot necessarily download from anywhere.
- Entropy gradients can point out plaintext blocks (config tables, IVT, etc.). Ensure you have sufficiently fine resolution. Tools like Veles and binvis.io are very helpful.
- The image most likely consists of multiple parts that you need to somehow carve out. Speciality cores like image processing, DSP, AI or crypto might have their own firmware images. It is not uncommon to have a proprietary image inside another proprietary image, inside which you have an IP vendor’s binary blob or an FPGA bitstream.
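For the entropy tip, a windowed Shannon entropy profile is easy to compute yourself when you want finer resolution than a visual tool gives. A minimal Python sketch, with the window size and the 6.0 bits-per-byte threshold as assumptions to tune per image:

```python
import math
from collections import Counter


def shannon_entropy(block):
    """Shannon entropy of a byte block in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_profile(data, window=256):
    """Return [(offset, entropy)] for consecutive windows of the data."""
    return [(off, shannon_entropy(data[off:off + window]))
            for off in range(0, len(data), window)]


def low_entropy_regions(data, window=256, threshold=6.0):
    """Offsets of windows that look like plaintext or structured data."""
    return [off for off, h in entropy_profile(data, window) if h < threshold]
```

Encrypted or compressed regions sit close to 8 bits per byte, so windows well below that are where headers, tables and strings tend to hide. A smaller window gives finer resolution but noisier values.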
Will it import into Ghidra?
Without a public memory map, loading the binary into your favourite disassembler is going to be challenging.
Here’s one trick I have found to work reasonably well across the different firmwares I have come across: guessing the load base with an AUIPC + ADDI pair. There is one precondition for this to work: you need some understanding of the various parts of the firmware, as you cannot run the script against the whole firmware blob.
RISC-V position-independent code often builds a 32-bit address with the following two-instruction combination.
auipc rd, imm20 ; rd ← pc + (imm20 << 12)
addi rd, rd, imm12 ; rd ← rd + sign_extend(imm12)
or in pseudocode representation
target_address = pc_base + (imm20 << 12) + imm12
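To make the immediates concrete, here is a minimal Python sketch that scans a raw code block for adjacent AUIPC + ADDI pairs and extracts the file offset, imm20 and imm12 of each. The function names are made up for illustration. The sketch assumes little-endian code, that the two instructions sit back to back and use the same register, and it steps in 2-byte increments so code built with the compressed extension is still covered.

```python
import struct


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    mask = 1 << (bits - 1)
    return ((value & ((1 << bits) - 1)) ^ mask) - mask


def find_auipc_addi_pairs(code):
    """Return [(offset, imm20, imm12)] for back-to-back AUIPC + ADDI pairs."""
    pairs = []
    for off in range(0, len(code) - 7, 2):
        auipc, addi = struct.unpack_from("<II", code, off)
        if auipc & 0x7F != 0x17:                              # AUIPC opcode
            continue
        if addi & 0x7F != 0x13 or (addi >> 12) & 0x7 != 0:    # ADDI opcode, funct3
            continue
        rd = (auipc >> 7) & 0x1F
        # ADDI must read and write the register AUIPC just set (and not x0).
        if rd == 0 or (addi >> 7) & 0x1F != rd or (addi >> 15) & 0x1F != rd:
            continue
        imm20 = sign_extend(auipc >> 12, 20)
        imm12 = sign_extend(addi >> 20, 12)
        pairs.append((off, imm20, imm12))
    return pairs
```

Expect some false positives from data that happens to match the pattern, and some misses where the compiler scheduled other instructions between the two halves.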
If you can see the immediates (imm20, imm12) in the raw opcodes but you don’t know the pc_base the code was linked for, you can treat this as an equation with a single unknown and brute-force it.
Here’s a simple brute-force script for that.
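What follows is a minimal sketch of such a brute-force in Python, not necessarily the same script the author used. It makes several assumptions that are mine: a 32-bit address space, a load base aligned to 0x1000, and a caller-supplied `target_ok` predicate that decides whether a computed target address is plausible (for example, it falls inside a RAM or MMIO window you already know from a header or an SDK). Note that a pair whose target lies inside the same blob moves together with the code and therefore constrains nothing; the idea only bites on pairs that point at something with a known absolute location.

```python
def candidate_bases(pair, target_ok, start=0, stop=1 << 32, align=0x1000):
    """Return every aligned load base for which one pair's target is plausible.

    pair is (file_offset, imm20, imm12) with signed immediates.
    """
    file_offset, imm20, imm12 = pair
    # target = (base + file_offset) + (imm20 << 12) + imm12, wrapped to 32 bits
    delta = file_offset + (imm20 << 12) + imm12
    return {
        base
        for base in range(start, stop, align)
        if target_ok((base + delta) & 0xFFFFFFFF)
    }


def intersect_bases(pairs, target_ok, **search):
    """Brute-force each pair, then keep only the bases every pair agrees on."""
    surviving = None
    for pair in pairs:
        found = candidate_bases(pair, target_ok, **search)
        surviving = found if surviving is None else surviving & found
    return surviving if surviving is not None else set()
```

As a usage example with invented numbers: given the pairs `(0x100, 0x6FD88, -0xF0)` and `(0x2340, 0x6FD88, -0x350)` and a predicate accepting targets in a hypothetical RAM window 0x20000000 to 0x20001FFF, each pair alone leaves two candidate bases, and the intersection leaves only 0xB0278000. If the image header suggests finer alignment (a load address such as 0xb0278400 would need `align=0x400`), lower `align` and narrow `start` and `stop` to keep the search fast.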
Collect several AUIPC + ADDI pairs scattered through the block in question. Run the brute-force for each, then intersect the result sets.
The genuine base address survives because AUIPC always uses the current PC as its origin and the compiler bakes in a fixed link-time delta. That delta never changes at runtime: shift the whole binary up or down in memory and the AUIPC sequence overshoots or undershoots.
This leads to a finite set of valid possibilities for the load address, which can then be tried in Ghidra to see which gives the best results.
Leverage linker waste. Some toolchains pad sections with ASCII strings or version banners – align those to spot the correct segment offsets.
Properly encrypted firmware
If the image is a single, high-entropy blob and the boot ROM verifies a signature… you’re probably toast. At that point silicon attacks (fault injection and side-channels) are the only way forward – and even then, secure enclaves are getting more challenging to beat every year.
Emulation?
If the device happens to use one of the common RV32 or RV64 cores with fairly standard PMP/MMU, QEMU will usually boot such an ELF. This is very low effort, so try all the machine types QEMU offers via -machine. Even if it doesn’t fully work, watching the firmware panic in emulation can still reveal some strings, syscalls or error codes.
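Trying every machine type by hand gets tedious, so here is a small Python sketch that loops over a few of them and collects whatever the firmware prints before a timeout. The machine names listed exist in upstream QEMU's RISC-V targets, but your build may offer more (`-machine help` prints the list). The function names, the timeout and the choice of `-bios none` are assumptions for this example.

```python
import shutil
import subprocess

# A few machine types from upstream QEMU; `-machine help` lists them all.
MACHINES = ("virt", "sifive_e", "sifive_u", "spike")


def qemu_command(image, machine, bits=32):
    """Build the QEMU command line for one machine type."""
    return [f"qemu-system-riscv{bits}", "-machine", machine,
            "-nographic", "-bios", "none", "-kernel", image]


def try_machines(image, bits=32, timeout=10):
    """Run the image on each machine type and return {machine: captured output}."""
    binary = f"qemu-system-riscv{bits}"
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary} not found in PATH")
    results = {}
    for machine in MACHINES:
        try:
            proc = subprocess.run(qemu_command(image, machine, bits),
                                  capture_output=True, timeout=timeout,
                                  stdin=subprocess.DEVNULL)
            results[machine] = proc.stdout + proc.stderr
        except subprocess.TimeoutExpired as exc:
            # A hang is normal here; keep whatever was printed before the kill.
            results[machine] = (exc.stdout or b"") + (exc.stderr or b"")
    return results
```

Grep the collected output for panic messages, assertion strings or addresses, and compare runs across machine types to see how far each one gets.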
Summary
RISC-V’s promise is openness, but the practical reality for hardware hackers is a paradox: more freedom at the ISA level, less transparency everywhere else. In practice you’ll face:
- An explosion of vendor-specific cores and extensions.
- Sparse or nonexistent datasheets.
- Multi-layer proprietary firmware formats.
- Toolchains and debug that may change per chip.
Treat everything you can capture – from UART dumps to GitHub-sourced SDK headers – as forensic evidence. Cross-reference much more than usual, assume nothing, and keep digging. With enough work you can still get something from nothing – even in RISC-V land.
Some tools
Here are some simple Python tools I have created to help me with the analysis.