VPR: Nordic's First RISC-V Processor

(danielmangum.com)

154 points | by hasheddan 2 days ago

5 comments

  • Scene_Cast2 1 day ago
    I was looking into these recently. The current batch of nRF54L15s was recalled, so I wonder when mass availability will happen. It looks like an interesting upgrade to the nRF52 though.

    The reason I was looking at it is that I'm trying to hook up a 1 kHz sampling ADC while streaming the data over BLE, and I need either a good DMA engine that can trigger on an external pin without a CPU interrupt, or a second core. I went down the dual-core route, but I'd love to hear about people's experience with DMA on the nRF52s and ESP32-H2s: whether it's finicky or not, and whether it's worth investing time into.
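
    For what it's worth, the nRF52 can do exactly the pin-triggered part without the CPU: GPIOTE turns the pin edge into an event, PPI routes that event to the SAADC SAMPLE task, and EasyDMA writes the results straight to RAM. A rough register-level sketch (names as in the Nordic MDK headers; the trigger pin and buffer size are made up, and the SAADC channel/gain/acquisition-time setup is assumed to happen elsewhere):

      #include <stdint.h>
      #include "nrf.h"

      #define TRIGGER_PIN 11            /* hypothetical external trigger input */
      #define N_SAMPLES   256

      static volatile int16_t samples[N_SAMPLES];

      static void adc_on_pin_edge_init(void)
      {
          /* GPIOTE channel 0: generate an event on each rising edge of the pin */
          NRF_GPIOTE->CONFIG[0] =
              (GPIOTE_CONFIG_MODE_Event      << GPIOTE_CONFIG_MODE_Pos) |
              (TRIGGER_PIN                   << GPIOTE_CONFIG_PSEL_Pos) |
              (GPIOTE_CONFIG_POLARITY_LoToHi << GPIOTE_CONFIG_POLARITY_Pos);

          /* SAADC writes results straight to RAM via EasyDMA; channel setup
             (input pin, gain, acquisition time) is assumed to be done elsewhere */
          NRF_SAADC->RESULT.PTR    = (uint32_t)samples;
          NRF_SAADC->RESULT.MAXCNT = N_SAMPLES;
          NRF_SAADC->ENABLE        = 1;
          NRF_SAADC->TASKS_START   = 1;

          /* PPI channel 0: GPIOTE IN[0] event triggers the SAADC SAMPLE task,
             so each pin edge takes one sample with no CPU involvement */
          NRF_PPI->CH[0].EEP = (uint32_t)&NRF_GPIOTE->EVENTS_IN[0];
          NRF_PPI->CH[0].TEP = (uint32_t)&NRF_SAADC->TASKS_SAMPLE;
          NRF_PPI->CHENSET   = 1u << 0;
      }

    The CPU then only sees EVENTS_END once per full buffer, which is where you'd hand the block to the BLE stack; on the nRF53/54 generation the same wiring goes through DPPI instead of PPI.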

    • dwnw 1 day ago
      DMA will work fine. I remember a time when we were rewarded for making do with what we had rather than putting a handful of CPU cores on a sensor. These do not sound interesting to me, honestly. It sounds more like they can't figure out what they need to make.
      • gonzo 7 hours ago
        That was a time when gates were not as plentiful as they are now. Both of the new Nordic SoCs are designed at 22 nm.

        It can be a challenge as to what to use all the gates for.

  • vardump 2 days ago
    So many different cores.

    Dual M33s, plus VPR, consisting of PPR (a low-power 16 MHz RISC-V RV32E? for external I/O) and FLPR (a RISC-V RV32E running at 320 MHz, for software-defined peripherals).

    I wonder what the max square-wave GPIO toggle rate is with FLPR. Can it compete with, for example, the RP2040/2350 PIO?

    • crest 13 hours ago
      They would have to fuck up massively for an I/O co-processor to lack single-cycle GPIO access, which would at least allow driving a 320/2 50% duty-cycle square wave, but sustaining that deterministically through priority access to memory and DMA is something else. At 150 MHz the RP2350 is able to sustain that bandwidth with hard realtime guarantees, allowing it to do things which are hard to impossible on other chips in the same price class, e.g. glitch-free 720p video output "in software".
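
      For a sense of that ceiling: even a perfect memory-mapped toggle loop can't quite hit clk/2 because of the loop branch. A tiny sketch, assuming nRF-style OUTSET/OUTCLR registers, a made-up pin, and single-cycle stores (whether FLPR has a faster dedicated GPIO path is exactly the open question):

        #include "nrf.h"

        #define PIN 13                       /* hypothetical output pin */

        static void toggle_forever(void)
        {
            for (;;) {
                NRF_P0->OUTSET = 1u << PIN;  /* drive high: one store */
                NRF_P0->OUTCLR = 1u << PIN;  /* drive low:  one store */
                /* plus the backwards branch, so at least ~3 cycles per period
                   even with single-cycle stores; unrolling gets you close to
                   the 2-cycles-per-period (clk/2) limit mentioned above */
            }
        }
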
    • petra 13 hours ago
      I wonder if that will be the effect of the RP2350: to push the whole market toward software-defined peripherals?
      • MisterTea 5 hours ago
        I would say not really, as software-defined peripherals have been around for some time. There are the Parallax Propeller and GreenArrays chips, which have no hard I/O blocks; everything is done in CPU cores. The TI PRU is similar.
  • hasheddan 1 day ago
    Author here. I’ve got a few more posts on VPR coming in the next couple of weeks. If you have any requests for deep dives into specific aspects of the architecture, feel free to drop them here!
    • hn3er1q 20 hours ago
      Thank you so much for asking, I have oh so many requests...

      Personally, I'm mostly interested in the ARM vs RISCV compare and contrast.

      - I'd be very interested in comparing static memory and RAM requirements for programs that are as similar as you can make them at the C level, using whatever toolchain Nordic wants you to use.

      - Since you're looking to do deep dives, I think looking into differences in the interrupt architecture, and any implications for stack memory requirements and/or latency, would be interesting, especially as VPR is a "peripheral processor".

      - It would be interesting to get cycle counts for similar programs between ARM and RISC-V. This might not be very comparable though, as the ARM cores seem more complex and so we'd expect a lower CPI from them. Anyway, I think CPI numbers would be interesting (a minimal way to grab them on each side is sketched below).
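
      (For the last one, something like this is probably the minimum needed to get raw numbers on each side: the DWT cycle counter on the M33 and the mcycle CSR on the RISC-V side. Whether the VPR cores actually implement mcycle/mcycleh is an assumption worth checking in the datasheet.)

        #include <stdint.h>

        #if defined(__riscv)
        /* RISC-V side: read the machine cycle counter CSR (low 32 bits) */
        static inline uint32_t cycles(void)
        {
            uint32_t c;
            __asm__ volatile ("csrr %0, mcycle" : "=r"(c));
            return c;
        }
        #else
        /* Cortex-M33 side: DWT cycle counter from CMSIS (the DWT/CoreDebug
           symbols come in via the vendor device header). Enable once at boot:
             CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
             DWT->CTRL        |= DWT_CTRL_CYCCNTENA_Msk;                      */
        #include "nrf.h"
        static inline uint32_t cycles(void)
        {
            return DWT->CYCCNT;
        }
        #endif

        /* usage: uint32_t t0 = cycles(); work(); uint32_t dt = cycles() - t0; */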

      I could go on but I don't want to be greedy. :)

    • rwmj 17 hours ago
      Why did they go with the 64 bit Arm core instead of an RV64 core? (Or an alternative question: why go with the 32 bit RISC-V core instead of an Arm M0?)

      Does having mixed architectures cause any issues, for example in developer tools or build systems? (I guess not, since already having 32 vs 64 bit cores means you have effectively a "mixed architecture" even if they were both Arm or RISC-V)

      What's the RISC-V core derived from (e.g. Rocket Chip? Pico?) or is it their own design?

      • crest 13 hours ago
        They haven't gone with a 64-bit ARM core. The ARMv8*M* isn't 64-bit, unlike ARMv8R and ARMv8A (the nomenclature can get confusing). The differences between ARMv7M (especially with the optional DSP and FPU extensions) and ARMv8M Mainline are fairly minor unless you go bigger with an M55 or M85, which (optionally) add the Helium SIMD extension. At the low end, ARMv8M Baseline adds a few quality-of-life features over ARMv6M (e.g. the ability to reasonably efficiently load large immediate values without resorting to a constant pool). Also the MPU got cleaned up to make it a little less annoying to configure.
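
        To make the constant-pool point concrete, a C-level illustration (exactly how it compiles depends on compiler and flags): on ARMv6M a 32-bit constant like this typically comes from an LDR out of a literal pool placed near the function, while ARMv8M Baseline, like v7M, can build it with MOVW/MOVT and never touch memory.

          #include <stdint.h>

          uint32_t load_constant(void)
          {
              /* ARMv6-M:          LDR  r0, =0xDEADBEEF   ; literal pool load */
              /* ARMv8-M Baseline: MOVW r0, #0xBEEF       ; MOVT r0, #0xDEAD  */
              return 0xDEADBEEFu;
          }
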
        • pm215 9 hours ago
          ARMv8A and ARMv8R can both be pure 32 bit as well, incidentally -- e.g. Cortex-A32 and Cortex-R52. v8A added 64 bit, but it didn't take away 32 bit. It's not until v9A that 32 bit at the OS level was removed, and even there it's still allowed to implement 32 bit support for userspace.
        • rwmj 11 hours ago
          Thanks for the clarification. Confusing terminology!
      • als0 14 hours ago
        > Why did they go with the 64 bit Arm core

        ARM Cortex-M33 is a 32-bit core, not 64-bit.

    • crest 13 hours ago
      Will open-source developers unable or unwilling to sign an NDA get access to a toolchain to run their own code on the RISC-V co-processors? Is the bus matrix documented somewhere? Does the fast co-processor have access to DMA engines and interrupts?
      • janice1999 13 hours ago
        FYI Nordic said on their YouTube channel that the RISC-V toolchain that already ships with Zephyr's SDK will support the cores. See around 00:56:32.520 [1]

        [1] https://www.youtube.com/watch?v=ef87Gym_D5c

        • hasheddan 12 hours ago
          Indeed. It is used in this post to compile the Zephyr Hello World example for the PPR.
    • SV_BubbleTime 22 hours ago
      All of this reeks of complexity crisis to me. That you need to know so much and do so much work, just in order to do the work you want to do.

      Explain why I’m wrong, please.

      • fidotron 13 hours ago
        You are wrong.

        When more general-purpose hardware (i.e. CPU cores) is added to chips like this, it is to replace the need for single-purpose devices. True nightmarish complexity comes from enormous numbers of highly specific single-purpose devices which all have their own particular oddities.

        There was a chip a while back which took this to a crazy extreme but threw out the whole universe in the process: https://www.greenarraychips.com/

        • awjlogan 11 hours ago
          Not wrong, especially for microcontrollers where micro/nanosecond determinism may be important: software running on general-purpose cores is not suitable for that. Fixed-function blocks can also be orders of magnitude more energy efficient than running a full core just to twiddle some pins.

          I’ve got a project that uses 4 hardware serial modules, timers, ADC, event system, etc., all dedicated function. Sure, they have their quirks, but once you’ve learnt them you can reuse a lot of the drivers across multiple products, especially for a given vendor.

          Of course there is some cost, but it’s finding the balance for your product that is important.

          • fidotron 11 hours ago
            > They can also be orders of magnitude more energy efficient than running a full core just to twiddle some pins.

            This used to be true, but as fabrication shrinks you first move to quasi-FSMs (like the PIO blocks) and eventually to mini processors, since those are smaller than the dedicated units of the previous generation. When you get the design a bit wrong you end up with the ESP32, where the lack of general computation in peripherals radically bumps memory requirements and so the power usage.

            This trend also occurs in GPUs where functionality eventually gets merged into more uniform blocks to make room for newly conceived specialist units that have become viable.

            • awjlogan 10 hours ago
              No, still true: you're never going to beat the determinism, size, and power of a few flip-flops and some logic driving a common interface directly, compared to a full core with architectural state and memory. E.g., just entering an interrupt is 10-15 odd cycles, then a memory access or two to set a pin, and then 10-15 cycles again to restore and exit.
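
              A rough sketch of where those cycles go, assuming a Cortex-M part with nRF-style register names and a made-up pin: even with a trivial handler, exception entry and return dominate, which is the cost a hardwired block or a pin-triggered state machine doesn't pay.

                #include "nrf.h"

                /* ~12 cycles: exception entry, hardware stacks r0-r3, r12, lr, pc, xPSR */
                void GPIOTE_IRQHandler(void)
                {
                    NRF_GPIOTE->EVENTS_IN[0] = 0;  /* acknowledge the event: one store */
                    NRF_P0->OUTSET = 1u << 13;     /* set the pin:           one store */
                }
                /* ~10+ cycles: exception return and unstacking, so roughly 25-30
                   cycles end to end for two stores' worth of useful work */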

              Additionally, micros have to be much more robust electrically than a cutting-edge (or even 14 nm) CPU/GPU, and available for extended (decade) timespans, so the economics driving the shrink are different.

              Small, fast cores have eaten the lunch of e.g. large dedicated DSP blocks for sure, but those are niche cases where the volume is low, so eventually the hardware cost and the cost of developing on weird architectures outweigh running a general-purpose core.

              • fidotron 9 hours ago
                > No, still true - you’re never going to beat the determinism, size, and power of a few flops and some logic to drive a common interface directly compared to a full core with architectural state and memory.

                But you must know what you intend to do when designing the MCU, and history shows (and some of the questioning here also shows) that this isn’t the case. As you point out expected lifespans are long, so what is a designer to do?

                The ESP32 case is interesting because it comes so close, to the point I believe the RMT peripheral probably partly inspired the PIO, thanks to how widely it has been used for other things and how it breaks.

                The key weakness of the RMT is that it expects the data structures used to control it to already be prepared in memory, almost certainly by the CPU. This means that altering the data being sent out requires the main app processor, the DMA and the peripheral all to be involved, and we are hammering the memory bus while doing it.

                A similar thing occurs with almost any non trivial SPI usage where a lot of people end up building “big” (relatively) buffers in memory in advance.

                Both of those situations are very common and bad. Assuming the tiny cores can have their own program memory they will be no less deterministic than any other sort of peripheral while radically freeing up the central part of the system.

                One of the main things I have learned over the years is people wildly overstate the cost of computation and understate the cost of moving data around. If you can reduce the data a lot at the cost of a bit more computation that is a big win.

                • awjlogan 6 hours ago
                  > But you must know what you intend to do when designing the MCU, and history shows (and some of the questioning here also shows) that this isn’t the case. As you point out expected lifespans are long, so what is a designer to do?

                  Designers do know that UARTs, SPIs, I2C, timers etc will be around essentially forever. Anything new has to be so much faster/better, the competition being the status quo and its long tail, that you would lay down a dedicated block anyway.

                  I think we'll disagree, but I'm not convinced by many of the cases given here (usually DVI on an RP2040...) as you would just buy a slightly higher-spec and better-optimised system that has the interface already built in. Personal opinion: great fun to play with and definitely good to have a couple to handle niche interfaces (e.g. OneWire), but not for the majority of use cases.

                  > A similar thing occurs with almost any non trivial SPI usage where a lot of people end up building “big” (relatively) buffers in memory in advance.

                  This is neither here nor there for a "PIO" or a fixed function - there has to be state and data somewhere. I would rather allocate just what is needed for e.g. a UART (on my weapon of choice, that amounts to a heady 40 bits local to the peripheral, written once to configure it, overloaded with SPI and I2C functionality) and not trouble the memory bus other than for data (well said on data movement, it burns a lot and it's harder to capture).

                  > Assuming the tiny cores can have their own program memory they will be no less deterministic than any other sort of peripheral while radically freeing up the central part of the system.

                  Agreed, only if it's dedicated to a single function of course, otherwise you have access contention. And, of course, we already have radically freed up the central part of the system :P

                  Regardless, enjoyed the conversation, thank you!

                  • fidotron 5 hours ago
                    > Regardless, enjoyed the conversation, thank you!

                    Likewise, very much so!

              • kragen 9 hours ago
                If you have a programmable state machine that's waiting for a pin transition, it can easily do the thing it's waiting to do in the clock cycle after that transition. It doesn't have to enter an interrupt handler. That's how the GA144 and the RP2350 do their I/O. Padauk chips have a second hardware thread and deterministically context switch every cycle, so the response latency is still less than 10–15 cycles, like 1–2. I think old ARM FIQ state also effectively works this way, switching register banks on interrupt so no time is needed to save registers on interrupt entry, and I think the original Z80 (RIP this year) also has this feature. Some RISC-V cores (CH32V003?) also have it.

                An alternate register bank for the main CPU is bigger than a PWM timer peripheral or an SPI peripheral, sure, but you can program it to do things you didn't think of before tapeout.

      • AlotOfReading 21 hours ago
        The article goes into more detail than it strictly needs to because the purpose is educational. However, a lot of what it's presenting is simplified interfaces and relevant details rather than the true complexity of the whole.

        Modern hardware is just fundamentally complex, especially if you want to make full use of the particular features of each platform.

  • bfrog 22 hours ago
    I kind of wonder why Nordic bothered sticking with Arm cores at all. The competition isn't
    • eschneider 15 hours ago
      Ok, I'm on a project that just picked N54 processors for a set of new designs. W/O arm cores it wouldn't even be considered. The RISC-V cores are useful and will be put to work, but they aren't exactly a selling point.

      Why did we end up with N54s? Price, Performance, BLE6 support, PRICE.

    • janice1999 13 hours ago
      A lot of software stacks are still only optimised for ARM. TrustZone is also far more mature and better supported than the RISC-V equivalents.
      • explodingwaffle 5 hours ago
        What would you consider the “RISC-V equivalent” of TrustZone? Last time I was curious I didn’t find anything.

        (FWIW I agree with the other commenter that these ""security"" features are useless, and feel to me more like check-box compliance than anything else (Why does TrustZone work with function calls? What’s wrong with IPC! Also, what’s wrong with privileged mode?). Just seems like a bit of a waste of silicon really.)
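
        (For anyone wondering what the function-call model looks like in practice: with CMSE you build the secure image with -mcmse and mark entry points as below; the toolchain emits SG veneers in non-secure-callable memory and the non-secure side literally just calls them. A minimal sketch with made-up function names; the header, flag, and attribute are the standard ARMv8-M CMSE ones.)

          #include <arm_cmse.h>
          #include <stdint.h>

          /* Stand-in for whatever the secure side actually protects (keys, etc.) */
          static int32_t do_secure_work(const uint8_t *msg, uint32_t len)
          {
              (void)msg;
              (void)len;
              return 0;
          }

          /* Callable from the non-secure world through an SG veneer; the compiler
             makes it return with BXNS so the core drops back to non-secure state. */
          int32_t __attribute__((cmse_nonsecure_entry))
          secure_service(const uint8_t *msg, uint32_t len)
          {
              return do_secure_work(msg, len);
          }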

      • bfrog 5 hours ago
        Is TrustZone really all that useful? I haven't had a need for anything like it myself!

        ESP32-C6 module parts are like $2-3 and already have an FCC cert, low power, and support all the same protocols I believe. Are nRF54 modules really that cheap? Can I program them natively with Rust yet?

        • jdietrich 3 hours ago
          If you care at all about security, then yes, it is tremendously useful.
          • bfrog 3 hours ago
            Care about security in what way? The radio and protocol stacks are still all written in C, with the endless CVEs that brings about, aren't they?
    • IshKebab 17 hours ago
      They'll probably replace them, but given they had no RISC-V expertise I imagine they made these small coprocessors first in order to gain experience. You can't magically do everything all at once.