Give bare-metal multicore processing a try

Multicore processing boosts performance and energy efficiency in many coding situations. Bare-metal algorithms further enhance these benefits. The post Give bare-metal multicore processing a try appeared first on EDN.

Electronics, Apr 15, 2026

Many embedded firmware engineers have seemingly not yet tried multicore processing. Using processors with more than one core can actually make your architecture and coding easier. And the part you may find surprising is that setting up and using multicore processors is very easy to do.

Typically, multicore programs are used in systems with an OS, but if you’re like me, your projects are typically bare-metal. I have long used multicore under an RTOS but historically avoided it on bare-metal systems. Ever since I discovered how easy multicore is to use on bare-metal, though, it has become part of my go-to design architecture.

Let’s look at how this is accomplished. The examples and discussions that follow are based on an RP2040 two-core microcontroller, with code developed in the Arduino IDE. RP2040 development boards can be found for around $5 (USD). Also, although I will discuss a two-core setup, the same concepts carry over to processors with more cores.

So why didn’t I use multicore designs sooner? I had some concerns that there were difficulties that I wasn’t ready to take on. Some of these were:

How to keep each core’s code separate
How to start multiple cores
How the cores talk to each other (i.e., how to transfer data among cores)
What peripherals each core can use; do they need to be checked out or registered, for example

It turns out that all these issues are actually very easy to deal with. Let’s look at them one at a time.

First, how do you separate the code for each core? In a single-core Arduino C program there are two major sections: the setup section (which begins like this: void setup()) and the loop section (which begins like this: void loop()). If you are using two cores, the first core, core 0, will use these sections just as used in a single-core design.

The code for the second core, core 1, will have a function defining its main loop. Let’s name it core1_main. Then, in core 0’s setup section, enter the line multicore_launch_core1(core1_main);. That line will start the function called core1_main running on the second core. (Note: I find it much cleaner to put the core 1 code in a separate tab in the Arduino IDE.) Unlike the main loop in an Arduino C program, which is called repeatedly for you, the core 1 code must be wrapped in its own while (1) { } loop so that the function never returns. Another item to include is the line #include "pico/multicore.h" at the top of the code.

Be aware that there are other approaches for setting up code in the second core. They include methods that allow core 1 to use its own setup function. Use your favorite AI or other research tool to discover other methods of setting up code and executing code on the second core.

Here’s a very simple example having each core blinking its own LED:

#include <Arduino.h>
#include "pico/multicore.h"

// -----------------------------
// Core 1 code
// -----------------------------
void core1_main() {
    pinMode(14, OUTPUT);

    while (1) {
        digitalWrite(14, HIGH);
        delay(500);
        digitalWrite(14, LOW);
        delay(500);
    }
}

// -----------------------------
// Core 0 code
// -----------------------------
void setup() {
    pinMode(15, OUTPUT);

    // Start core 1
    multicore_launch_core1(core1_main);
}

void loop() {
    digitalWrite(15, HIGH);
    delay(300);
    digitalWrite(15, LOW);
    delay(300);
}

This example gives you an idea of how to get the two cores executing their own tasks. Typically, though, you would want to have some sort of communication between the cores. This can be achieved very simply by using a variable that each core can see and modify. It turns out that the entire memory space of the microcontroller can be seen and modified by either core. So, if you define a global variable at the top of the code (just below the #include statements), it can be used to transfer data between cores.

Declare the variable volatile, since the other core can change it at any time. Also remember that the RP2040 is a 32-bit microcontroller, so reading or writing a 64-bit value is not atomic; take care not to read a transferred 64-bit value before both halves have been written. A simple flag can help here. Using shared memory to transfer data this way is easy but, much like global variables in general, can be dangerous if you’re not careful. Bare-metal developers, though, typically like this tight control over resources.
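
As a rough illustration of the flag idea, here is a minimal sketch of a single-slot mailbox for handing a 64-bit value from one core to the other. The names Mailbox64, publish, take, and timestamp_box are hypothetical and not from any library. It also assumes your toolchain provides <atomic>, and it uses std::atomic<bool> for the flag instead of a plain volatile so that the compiler keeps the payload write ahead of the flag write.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical single-slot mailbox: one core writes, the other core reads.
struct Mailbox64 {
    uint64_t value = 0;              // payload; takes two 32-bit accesses on a 32-bit MCU
    std::atomic<bool> full{false};   // true while an unread value sits in the slot

    // Writer core: returns false if the previous value has not been taken yet.
    bool publish(uint64_t v) {
        if (full.load(std::memory_order_acquire)) return false;
        value = v;                                     // write both halves first...
        full.store(true, std::memory_order_release);   // ...then raise the flag
        return true;
    }

    // Reader core: returns false if nothing new is waiting.
    bool take(uint64_t &out) {
        if (!full.load(std::memory_order_acquire)) return false;
        out = value;                                   // the writer is finished with it
        full.store(false, std::memory_order_release);  // hand the slot back
        return true;
    }
};

Mailbox64 timestamp_box;   // global, so both cores can see it
```

One core would call timestamp_box.publish(...) and the other would poll timestamp_box.take(...) from its loop. Because the reader only touches the payload while the flag is set, it never sees a half-written value.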

This method of transferring data is good for simple tasks, but you may want to use FIFOs to handle more data and to remove some syncing issues. These are not difficult to write, and you’ll also find pre-written packages online. For more sophisticated programs, you can investigate mailboxes, semaphores, flags, etc. from various sources…but now we’re getting into RTOS functions.
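
To give a feel for how small such a FIFO can be, here is a minimal sketch of a software ring buffer for one producer core and one consumer core. SpscFifo and its push and pop methods are hypothetical names, and the code assumes <atomic> is available on your toolchain. (The RP2040 also has hardware inter-core FIFOs, which the Pico SDK exposes through calls such as multicore_fifo_push_blocking() and multicore_fifo_pop_blocking(); this sketch does not use them.)

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical single-producer/single-consumer FIFO. N must be a power of two.
template <typename T, uint32_t N>
class SpscFifo {
    static_assert(N >= 2 && (N & (N - 1)) == 0, "N must be a power of two");
    T buf_[N];
    std::atomic<uint32_t> head_{0};  // total items written; only the producer stores to it
    std::atomic<uint32_t> tail_{0};  // total items read; only the consumer stores to it

public:
    // Producer core: returns false when the FIFO is full.
    bool push(const T &item) {
        uint32_t head = head_.load(std::memory_order_relaxed);
        uint32_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == N) return false;                // no free slot
        buf_[head & (N - 1)] = item;                       // fill the slot first...
        head_.store(head + 1, std::memory_order_release);  // ...then publish it
        return true;
    }

    // Consumer core: returns false when the FIFO is empty.
    bool pop(T &out) {
        uint32_t tail = tail_.load(std::memory_order_relaxed);
        uint32_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;                    // nothing to read
        out = buf_[tail & (N - 1)];                        // copy the item out...
        tail_.store(tail + 1, std::memory_order_release);  // ...then free the slot
        return true;
    }
};
```

Each index is written by only one core, so no lock is needed. A global such as SpscFifo<uint32_t, 16> would let one core push samples while the other pops them at its own pace.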

Now let’s look at sharing peripherals between cores. In our bare-metal architecture, the best explanation is that any core can use any peripheral at any time. This situation is both good and bad. Good because there are no flags to set, no checkouts that need to happen, and no negotiations to be made: just use the peripheral. Bad in the sense that without some form of coordination the two cores can attempt to set up the same peripheral at the same time, in different configurations.

What I have found useful in my designs is to separate the code in the two cores so that each core only uses peripherals that the other core does not. If that is not the case in your designs, you may want to implement a resource lock using flags. A related and interesting fact is that both cores can use the serial port (configured by only one core) without any handshaking or flags. Do note, though, that output from the two cores will be interleaved. That said, I find this very handy since I can Serial.print() from either core during debugging.
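
One way to build a resource lock out of nothing but flags is Peterson’s algorithm, which works for exactly two contenders. The sketch below is a minimal version under stated assumptions: TwoCoreLock and its methods are hypothetical names, the caller passes its own core number (0 or 1), and the toolchain supports std::atomic loads and stores. The Pico SDK also ships its own mutex and spin-lock primitives, which may well be the better choice in a real design.

```cpp
#include <atomic>

// Hypothetical two-core lock based on Peterson's algorithm. Core ids are 0 and 1.
class TwoCoreLock {
    std::atomic<bool> wants_[2];  // wants_[c] is true while core c wants or holds the lock
    std::atomic<int> turn_;       // which core goes first when both want the lock

public:
    TwoCoreLock() : turn_(0) {
        wants_[0].store(false);
        wants_[1].store(false);
    }

    // Blocks (spins) until core `me` owns the resource.
    void lock(int me) {
        const int other = 1 - me;
        wants_[me].store(true);
        turn_.store(other);       // be polite: let the other core go first
        while (wants_[other].load() && turn_.load() == other) {
            // spin until the other core releases the lock
        }
    }

    // Non-blocking variant: returns false instead of spinning.
    bool try_lock(int me) {
        const int other = 1 - me;
        wants_[me].store(true);
        turn_.store(other);
        if (wants_[other].load() && turn_.load() == other) {
            wants_[me].store(false);   // back out; the other core has it
            return false;
        }
        return true;
    }

    void unlock(int me) { wants_[me].store(false); }
};
```

A core would call lock(0) or lock(1) before reconfiguring a shared peripheral and unlock() afterward. The try_lock() form suits a loop() that should keep running rather than wait.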

Let’s answer one last question: why do I want to use more than one core? The first reason is the obvious one: you get more computing power. But more than that, by separating tasks from each other you may find coding easier and cleaner. That’s because there are no concerns about the multiple tasks fighting for cycles, especially for time-sensitive tasks. Also, if you are using multiple interrupts, separating these tasks between cores can remove the complexity of interrupts occurring at the same time and thereby holding off one or the other. Another benefit is that you may have faster response to events happening as you can essentially monitor and respond to twice as many events.

Here’s another code example using some of the concepts discussed earlier. This code uses core 1 to monitor the serial port, looking for a G or an R. If it sees a G, it sets the shared variable led_color to 1. Core 0 continuously monitors led_color and turns on the green LED if led_color is 1. Similarly, if core 1 sees an R, it changes led_color to 0 and core 0 then turns on the red LED:

#include <Arduino.h>
#include "pico/multicore.h"

// ----------------------------
// LED pin assignments
// ----------------------------
#define RED_LED_PIN    14
#define GREEN_LED_PIN  15

// ----------------------------
// Shared variable between cores
// 0 = RED, 1 = GREEN
// ----------------------------
volatile int led_color = 0;

// ======================================================
// Core 1: Serial monitor
// ======================================================
void core1_entry() {
    while (!Serial) { delay(10); }

    while (1) {
        if (Serial.available() > 0) {
            char c = Serial.read();

            if (c == 'G' || c == 'g') {
                led_color = 1;
                Serial.println("Set LED = GREEN");
            }
            else if (c == 'R' || c == 'r') {
                led_color = 0;
                Serial.println("Set LED = RED");
            }
        }
        delay(2);
    }
}

// ======================================================
// Core 0 setup
// ======================================================
void setup() {
    Serial.begin(115200);

    pinMode(RED_LED_PIN, OUTPUT);
    pinMode(GREEN_LED_PIN, OUTPUT);

    // Launch Core 1
    multicore_launch_core1(core1_entry);
}

// ======================================================
// Core 0 loop: drive the LEDs from the shared led_color
// ======================================================
void loop() {
    if (led_color == 1) {
        digitalWrite(GREEN_LED_PIN, HIGH);
        digitalWrite(RED_LED_PIN, LOW);
    } 
    else {
        digitalWrite(RED_LED_PIN, HIGH);
        digitalWrite(GREEN_LED_PIN, LOW);
    }
    delay(5);
}

By now it may be clearer where the benefits of using more than one core lie. Think of something more complex, say, a program that monitors the serial port for changes to settings while also reading a high-speed ADC with a tight tolerance on jitter. Running the serial port code on one core and the ADC code on the other would make this combination much easier to get working cleanly.

Give multicore code design a try! It’s easy, I think you’ll find lots of uses for it, and you’ll also find it makes coding easier and more organized.

P.S. Both pieces of code shown in this article were initially written by Copilot per the author’s instructions. The author subsequently made only minor modifications.

Damian Bonicatto is a consulting engineer with decades of experience in embedded hardware, firmware, and system design. He holds over 30 patents.

Phoenix Bonicatto is a freelance writer.
