
Distributed PubSub for Microcontrollers

Matt Liberty - Joulescope - Watch Now - Duration: 43:32

Do you waste time “plumbing” firmware to connect a new feature?  Worry about managing dependencies between modules?  Struggle to manage state when things go wrong?  

In this session, we discuss real-world software architecture, dependencies, and state.  We examine the publish-subscribe (PubSub) design pattern, what problems it solves, and what challenges it creates.  We discuss how to create a distributed, reliable PubSub implementation that can span multiple microcontrollers with state recovery on failure.  We explore one solution, Fitterbap, a new open-source C library with host Python bindings, which includes:

  • An efficient, distributed PubSub implementation with simple metadata that allows you to quickly add, remove, and modify firmware controls.
  • A small, high-reliability data link layer, suitable for local data streams including UART.
  • Multiplexed, fast, reliable data streams, such as for sample waveform data.
  • A UI (python + Qt + PySide) that runs on your host computer.  The UI automatically instantiates controls from the metadata and plots streaming data.

Matt_LibertySpeaker
Score: 1 | 4 months ago | no reply

Thank you to everyone who checked out the presentation, and thank you for the excellent feedback. Based upon the feedback I received, I made a few significant changes, and I just released 0.4.0. Visit the link to see the full list of changes.

I also now have a fitterbap_example_stm32g4 that targets the ST NUCLEO-G491RE, a $21 dev board. Each dev board supports 5 comm ports so that you can really play with the distributed PubSub implementation.

If you have any additional questions, feel free to post here. If you run into problems or have feedback on the code, please feel free to create an issue over on GitHub.

-- Matt

RyanMac
Score: 0 | 5 months ago | 1 reply

Good stuff Matt!

Matt_LibertySpeaker
Score: 0 | 5 months ago | no reply

Thanks, @RyanMac!

RaulPando
Score: 0 | 5 months ago | 1 reply

Hi Matt,
Thanks for the presentation. I was wondering whether you have given any thought to how Fitterbap would fit on top of multidrop deterministic protocols like CANbus. The distributed nature of pub/sub would add an interesting twist to a topology capable of containing numerous nodes, all of them sharing the same bus, which could potentially leverage a single message publication reaching all the subscribers. I suspect such an approach would have some areas to consider for a successful integration, since in the case of CAN 2.0 there are only 8 bytes of payload data available for a transmitted frame.
Thanks

Matt_LibertySpeaker
Score: 0 | 5 months ago | 1 reply

Hi @RaulPando!

I am not that familiar with CANbus, but I just read csselectronics.com and Wikipedia. It seems that CANbus is just a robust many-to-many protocol, where all nodes receive all messages. According to Higher Layer Protocols by Kvaser, the data format is assigned by the application. You could certainly create a PubSub layer over CANbus, if one doesn't already exist. However, check out UAVCAN and the C implementation libcanard.

Before I stumbled upon UAVCAN, I wrote this: Since you only have 8 bytes of payload, you probably want to be more efficient than topic strings. For example, you could construct topic integers where you assign a certain number of bits for your topic hierarchy. Alternatively, you could create some type of dynamic global registration with one node assigning new integers to publishers based upon topic metadata. You could then have the remaining payload for integer data. You may also need to reserve some bits to allow for segmentation & reassembly if you want to send larger data over CANbus frames.
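
As a rough sketch of the "topic integers" idea above: the bit widths, the function names, and the 32-bit value field are all assumptions chosen for illustration. They do not come from Fitterbap or UAVCAN.

```python
import struct

# Hypothetical layout for a 16-bit topic id:
# 4 bits node, 6 bits module, 6 bits feature.
NODE_BITS, MODULE_BITS, FEATURE_BITS = 4, 6, 6


def pack_topic(node, module, feature):
    """Pack a three-level topic hierarchy into one 16-bit integer."""
    for value, bits in ((node, NODE_BITS), (module, MODULE_BITS), (feature, FEATURE_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError("topic field out of range")
    return (node << (MODULE_BITS + FEATURE_BITS)) | (module << FEATURE_BITS) | feature


def unpack_topic(topic_id):
    """Recover (node, module, feature) from a packed topic id."""
    feature = topic_id & ((1 << FEATURE_BITS) - 1)
    module = (topic_id >> FEATURE_BITS) & ((1 << MODULE_BITS) - 1)
    node = topic_id >> (MODULE_BITS + FEATURE_BITS)
    return node, module, feature


def encode_frame(topic_id, value):
    """2-byte topic id + 4-byte signed value = 6 of the 8 payload bytes."""
    return struct.pack("<Hi", topic_id, value)


def decode_frame(payload):
    """Return (topic_id, value) from a 6-byte payload."""
    return struct.unpack("<Hi", payload)
```

With this layout, two of the eight CAN 2.0 payload bytes stay free, which is where segmentation and reassembly bits could go.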

The Fitterbap PubSub implementation is definitely not optimized for CAN, but it looks like UAVCAN is. What do you think?

Best regards,
-- Matt

RaulPando
Score: 0 | 5 months ago | 1 reply

Thanks Matt for the thorough response and the reference to UAVCAN, another quality project. It looks great for CAN; ArduPilot leverages it for node peripheral communication.

Matt_LibertySpeaker
Score: 0 | 5 months ago | no reply

I took a quick look through the UAVCAN spec, and it looks very interesting.
It appears they recommend CAN FD, likely for the increased performance. It appears to make some different tradeoffs than Fitterbap but has many of the same goals.

If you get a chance to play with UAVCAN, I'd love to hear what you think! You should be able to post here even after the conference, or you can email me (see the last slide of the presentation).

remco_at_itsonlyaudio.com
Score: 0 | 5 months ago | 1 reply

Great work Matt. I've been working with a client framework that is 100x more complicated than this and pretty much does the same. Liking this a lot more: you chose the right compromises.

  1. The way I see it when a message is NACK'ed and gets retransmitted the packets go out of order. Sometimes it would be necessary to restart the correct sequence from the missed packet. Is there a way to force that or is it easier to just NACK every packet on the receiver side until the oldest gets retransmitted?
  2. Would it be possible to set timeouts on ACK? For some fleeting states, an ACK after a certain time is meaningless or even disastrous.
    / Remco
Matt_LibertySpeaker
Score: -1 | 5 months ago | 1 reply

Hi Remco,

Thanks! If you find something that does not seem like the right tradeoff, let me know! I'd love to discuss it.

  1. I didn't spend much time discussing network retransmission approaches in the talk, but there are three very common methods:
    a. STOP & WAIT
    b. GO BACK N
    c. SELECTIVE REPEAT

STOP & WAIT only has a single frame outstanding at a time. If the sender gets a NACK or times out, it retransmits. The problem is throughput as you only have the one frame outstanding and have to wait for the processing delay plus the ACK delay. The upside is that you only need a single frame buffer on both the transmit and receive side.

GO BACK N fixes the throughput issue in the normal case. Instead of one outstanding frame, you can have N. The upside is that we can utilize the full bandwidth, assuming that N is large enough to account for the processing and 2 x transmit delay. The buffer requirements grow to N on the transmit side and 1 on the receive side. However, on error, the transmitter has to repeat every frame since the errored frame. While we (meaning I) would like errors to be normally (Gaussian) distributed, in reality, we often have an underlying normal distribution of errors with some additional errors that "group". The "grouping" is usually due to other events that increase the error probability, such as ESD. So, GO BACK N has a much higher chance of data loss than the normally distributed error assumption often leads us to believe.

With Selective Repeat, the sender only repeats errored frames. The receiver records all correct frames and plays them out in order when possible. This gives the best throughput and error resiliency at the expense of increased buffering. Both the transmitter and receiver must have an N frame buffer.

Fitterbap implements (c), selective repeat. So, it only needs to repeat the errored frames.
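
To make the receive side of selective repeat concrete, here is a minimal sketch. It is not Fitterbap's implementation: the class name, the unbounded (non-wrapping) frame ids, and the "return the list of missing ids" convention are assumptions made for illustration.

```python
class SelectiveRepeatReceiver:
    """Buffers out-of-order frames and plays them out in order."""

    def __init__(self, window):
        self.window = window   # N: how many frames may be outstanding
        self.expected = 0      # next frame id to deliver in order
        self.pending = {}      # frame_id -> payload, received early
        self.delivered = []    # payloads handed to the application

    def on_frame(self, frame_id, payload):
        """Accept one error-free frame; return the frame ids still missing."""
        if not self.expected <= frame_id < self.expected + self.window:
            return []  # duplicate or outside the window: ignore
        self.pending[frame_id] = payload
        # Play out every frame that is now contiguous with what was delivered.
        while self.expected in self.pending:
            self.delivered.append(self.pending.pop(self.expected))
            self.expected += 1
        if not self.pending:
            return []
        # Gaps below the highest buffered frame are what the sender must repeat.
        return [i for i in range(self.expected, max(self.pending))
                if i not in self.pending]
```

If frames 0, 2, 3 arrive, only frame 1 is reported missing, and frames 2 and 3 wait in the buffer until it shows up. That buffer is the N frames of receive memory mentioned above.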

For further research, you can start with Wikipedia.

  2. Yes, Fitterbap includes timeouts, which are necessary for selective repeat to work reliably. Fitterbap allows you to configure the timeout for your data link. The longer the timeout, the more effect a lost packet has on your system and the more buffering you need to keep full throughput. Fitterbap is designed to be quick since I am considering the case of UARTs, SPI and I2C where propagation times are short and we want to keep the buffers small. My reference implementation has a 16 millisecond timeout.

Did I answer your questions? Any additional questions?

Best regards,
-- Matt

remco_at_itsonlyaudio.com
Score: 0 | 5 months ago | 1 reply

Thanks Matt for the answers and for sharing your work! I think I can work with that and have a nice project in mind that could put this framework to work.

Matt_LibertySpeaker
Score: 0 | 5 months ago | no reply

Awesome! Look forward to your feedback, and please contact me if you have questions or run into any issues!

tcmichals
Score: 0 | 5 months ago | 1 reply

Good talk, I have been using nanopb/google protocol buffers for routing messages for desktop to microcontrollers/FPGA.

Matt_LibertySpeaker
Score: 0 | 5 months ago | 1 reply

Hi @tcmichals,

Thanks for sharing! If I understand correctly, nanopb is a microcontroller-friendly version of Google protocol buffers. I have a few questions regarding how you are using it:

  1. How are you communicating between your microcontroller and the desktop computer? For example, USB-CDC? USB-CDC uses USB bulk mode which provides reliable USB frame delivery. As long as you don't drop USB frames, you don't need the error detection and retransmission features in the fitterbap datalink.

  2. You mentioned routing. Did you actually mean network-style routing, or do you really have a one-to-one message queue in each direction, where the messages are Google protocol buffers? If you mean network-style routing, could you elaborate?

  3. How reliable has your setup been? Would you recommend it?

Thanks for checking out the talk, and look forward to hearing more about your experiences!

Best regards,
-- Matt

tcmichals
Score: 0 | 5 months ago | 1 reply

1. How are you communicating between your microcontroller and the desktop computer?
CDC-Serial with a simple escape sequence for serial and TCP. Yes, fitterbap has retries, which is nice.
2. Routing is within the application. With Google protocol buffers you can have a message type, which is your message ID. Then use a switch to forward the packet within the app.
3. The reason I like Google protocol buffers and nanopb is that the host code can be Python, Java, C, C++, etc. So for testing I can do an HTML app on the PC with a GUI and JavaScript to do simple testing and checkout. Google protocol buffers also provide documentation on what the messages are. Also, the protocol is compressed.

Matt_LibertySpeaker
Score: 0 | 5 months ago | no reply

Thanks for the detail!

  1. Understood. USB Bulk also has error detection and retries. I have seen problems where either the microcontroller software or host software drops frames due to software issues or overflows. As long as your buffers are big enough and both sides are responsive enough, USB-CDC works reliably.

  2. You can think of PubSub as a message dispatcher, which sounds like what you have. Likewise, if your dispatcher currently has hard-coded recipients based upon message type, you could add a subscribe feature to break that dependency. Then, you would have a nanopb-based PubSub implementation with content-sensitive (not topic sensitive) filtering.
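
A minimal sketch of that step, replacing a hard-coded switch with a subscribe call. The `Dispatcher` class and the dict-based messages are hypothetical stand-ins for a decoded nanopb message. This is neither nanopb nor Fitterbap API.

```python
class Dispatcher:
    """Content-based dispatch: recipients register by message type."""

    def __init__(self):
        self._subscribers = {}  # message type -> list of callbacks

    def subscribe(self, msg_type, callback):
        """Register a callback instead of hard-coding it in a switch."""
        self._subscribers.setdefault(msg_type, []).append(callback)

    def dispatch(self, message):
        """Forward a decoded message to every subscriber of its type."""
        for callback in self._subscribers.get(message["type"], ()):
            callback(message)


if __name__ == "__main__":
    dispatcher = Dispatcher()
    dispatcher.subscribe("led", lambda m: print("led handler got", m))
    dispatcher.dispatch({"type": "led", "on": True})
    dispatcher.dispatch({"type": "unknown"})  # no subscribers: silently dropped
```

The dispatcher no longer needs to know who consumes each message type, which is the dependency the subscribe feature removes.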

The fitterbap comm stack really passes messages. The messages can be anything. Fitterbap includes a transport layer with ports, but you don't need that for your current usage. If you want to reliably pass your protobuf messages between microcontrollers over normal UART, the datalink layer can easily do this for you!

  3. Cool. Yes, protobufs have lots of language bindings. With Fitterbap, the development UI creation is automatic ;) However, it sounds like you have a good solution that is working for you!

hdonahue
Score: 0 | 5 months ago | 1 reply

Awesome talk Matt! I ran into exactly this problem when working on a product that communicated over two radios and a UART where data could come in over any one of those three and change the state, and then had to be reflected back over the other two. Thanks!

Matt_LibertySpeaker
Score: 0 | 5 months ago | no reply

Thanks Harrison, and great to hear that you enjoyed the talk! Feel free to contact me if you have questions!

nathancharlesjones
Score: 0 | 5 months ago | 2 replies

Thanks for the talk, Matt! Fitterbap seems like a really neat project and I'm excited to see where it goes. Is there a README or other documentation that lists which modules are currently available and stable enough to use? I can't find anything on the Git repo that lays out what, exactly, is included, like on the slide you show at 23:30. Also:

  1. At 5:05, you mention that the technique in languages like C to encapsulate state "only works until you have a large-scale interaction". Could you elaborate on that? What is it that starts to fail?
  2. At 7:45, you mention that dependency injection "doesn't really work if that state changes over time." Is that because when state changes, the modules don't inherently have a way of "notifying" the other parts of the system that depend on it?
  3. It seems to me that many of the examples in the beginning of mutable global state have to do with system settings or device state. Having certain settings or threads be active or inactive under certain conditions (at startup or as the device is in operation) seems to lend itself naturally to an overarching FSM. Would I be correct in saying that's a form of "context manager" and I'd still have the problem of notifying certain parts of the system when a state is entered/exited, so something like Observer/PubSub is still needed?

Matt_LibertySpeaker
Score: 0 | 5 months ago | 1 reply
  2. Yes. To change the state of those modules, you would need to continue to talk to them from other parts of the system. This creates a dependency and new interactions that reduce the benefit of the initial dependency injection.

  3. If you have a finite state machine (FSM) controlling your system, then the current FSM state is the global state, at least with respect to what it's controlling. As you mention, if you need to communicate that state to other parts of the system or other parts of the system need to behave differently based upon the current FSM state, then you have shared global state. You can easily use PubSub to carry the events to your FSM. The FSM can also publish its state to keep other parts of the system in sync. You probably would not want other parts of the system publishing the FSM state, though, so you would have to enforce that by design.
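
Here is a small sketch of an FSM that receives its events over PubSub and publishes its state. The `PubSub` class, the topic names `fsm/event` and `fsm/state`, and the state names are hypothetical. This is an in-process toy, not the Fitterbap API.

```python
class PubSub:
    """Toy in-process PubSub with exact-match topics."""

    def __init__(self):
        self._subscribers = {}  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, value):
        for callback in list(self._subscribers.get(topic, [])):
            callback(topic, value)


class SystemFsm:
    """Owns the system state; the only intended publisher of 'fsm/state'."""

    TRANSITIONS = {
        ("idle", "start"): "running",
        ("running", "stop"): "idle",
        ("running", "fault"): "error",
        ("error", "reset"): "idle",
    }

    def __init__(self, pubsub):
        self._pubsub = pubsub
        self.state = "idle"
        pubsub.subscribe("fsm/event", self._on_event)

    def _on_event(self, topic, event):
        next_state = self.TRANSITIONS.get((self.state, event))
        if next_state is None:
            return  # event not valid in the current state: ignore it
        self.state = next_state
        self._pubsub.publish("fsm/state", next_state)
```

Nothing in this sketch prevents another module from publishing `fsm/state`, which is why single ownership of that topic has to be enforced by design.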

Does this answer your questions? Any follow up questions?

nathancharlesjones
Score: 0 | 5 months ago | no reply

Thanks! I'm still digesting what you've written, so no follow-ups at the moment.

Matt_LibertySpeaker
Score: 1 | 5 months ago | no reply

Hi @nathancharlesjones!

The Fitterbap project does not currently have a list of the available modules and their maturity. The project does use doxygen, but I have not published it. Something else for the list! For now, you can peruse the include directory. With the exception of pubsub and the communication modules that were in the talk, everything else is in production code for products I have built.

  1. This is a tough one to answer. I know at least one functional programming convert who would say it fails immediately, by design! In practice, shared state tends to generate code smells like tightly coupled code, code that is difficult to test, intertwined dependencies, and that piece of code that no one wants to touch. As a consultant, I took on extending a C/C++ project where the entire codebase was effectively a single giant object. It was actually multiple objects, but they all had public members that other objects modified at will. It was a mess. My solution was to encapsulate the mess (the customer did not want to pay to fix it!) and extend around it. Last I heard, the customer was very happy with the new code, but unable to successfully make changes to that old code. Shared state has a way of getting out of hand over time, so any choices we can make to keep it under control during the initial design are valuable. I personally think PubSub is one option.

EEngstrom77
Score: 0 | 5 months ago | 1 reply

Top notch demo and all the source on GitHub! For a Fitterbap logo, I am imagining an Apple Fritter with boxing gloves. Also, great advice using an Electrostatic Discharge Generator to generate bit errors over your serial connection. I had never considered that before. Have you also tried that with physical layers that use differential signals, and do you have any advice on preventing damage to your board?

Matt_LibertySpeaker
Score: 0 | 5 months ago | no reply

Thanks @EEngstrom77! I like the logo visual image!

I have used the ESD generator with USB, both high-speed and super-speed, which use differential signals. With USB, common-mode is also a huge problem, so the ESD generator works great for injecting errors. You don't have to hit your signals that hard with ESD to create bit errors. A discharge nearby is often enough, which many devices can handle without special ESD protection. If you have an expensive target board, you may only want to use this approach on signals that have ESD protection.

MatthewZaleski
Score: 0 | 5 months ago | 1 reply

Great talk! I've used PubSub on the desktop but always considered it too heavy and memory intensive for microcontrollers. I like your approach to getting most of the heavy-weight PubSub benefits on resource-constrained systems, especially for comms between 2 micros.

Matt_LibertySpeaker
Score: 0 | 5 months ago | no reply

Great to hear that you enjoyed the talk, Matthew! While this PubSub implementation is not going to fit on the small ATtiny's, it is definitely suitable for many microcontrollers including Cortex-M family parts.

Erwin
Score: 1 | 5 months ago | 1 reply

A really interesting approach! As noted in prior comments, I also intend to try it out. As I haven't had much time to dive deeper into PubSub so far, I have some more (maybe silly) questions:

  • Is this a mechanism that works inside one uC (e.g. between different tasks of an RTOS) and also across multiple controllers connected via arbitrary interfaces (like UART in your demo)?
  • Is there some kind of permanent communication (like keep-alive signals) needed, or is data transfer only event based?
  • Are all topics available/visible to all nodes, or better, how is metadata shared across nodes?
  • In your video you mentioned that exceptional circumstances like watchdog restarts can be handled by repopulating state. If I have some hierarchy like A -> B -> C and B does restart, is every topic retransmitted as soon as B subscribes again to the so-called link subscriber?
  • What happens with topics from C subscribed by A if B is temporarily unavailable?

Maybe you have some good resources about the fundamentals of PubSub you can share and which are worth reading.
Hopefully I can manage to join your Q&A, as I get more and more questions in my mind while writing :-)

Matt_LibertySpeaker
Score: 3 | 5 months ago | 1 reply

Hi Erwin.

  1. PubSub at its simplest is a design pattern that you can implement within a single program or single microcontroller. Networked PubSub protocols, like MQTT, and PubSub brokers allow publishers & subscribers over a network. The distributed PubSub that I am talking about is a simplified version of network PubSub that works both within a single microcontroller and across multiple microcontrollers.

  2. At the API of the data-link layer, you send a message and it is transferred to the microcontroller on the other side of the connection. The PubSub port packs & unpacks data-link layer messages that contain the topic and value. The data-link does not currently have a keep-alive, but it does use timeouts. This means that it can determine loss of connection, but only when it has something to say. I have considered adding an optional keep-alive for more guaranteed loss of connection detection.

  3. Topics are visible up the hierarchy, but not down. Metadata is shared whenever a publisher posts $ or my/topic/$. Metadata is shared just like any other PubSub message, but with the reserved $ topic suffix.

  4. Yes, state is repopulated. Check out this explanation.

  5. C will continue operating, and potentially publish new topics. Normal non-retained messages will be passed to subscribers local to C, but A will never know. Retained messages will also be passed to subscribers local to C. When C establishes connection with B, then it will update B with the retained values. Likewise, A's retained values will be passed to B. If the retained values on both A and C changed while B was down, then the value will resolve, but the winner is not well-defined with the existing implementation. It could be the value from either A or C depending upon the order in which the connections come up.
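
The difference between retained and normal messages can be sketched as follows. This is a simplified single-instance model with hypothetical names and prefix matching. It does not show Fitterbap's API or its cross-link resolution.

```python
class RetainedPubSub:
    """Toy PubSub that remembers the last value of each retained topic."""

    def __init__(self):
        self._retained = {}     # topic -> last retained value
        self._subscribers = []  # (topic prefix, callback)

    def publish(self, topic, value, retain=False):
        if retain:
            self._retained[topic] = value
        for prefix, callback in self._subscribers:
            if topic.startswith(prefix):
                callback(topic, value)

    def subscribe(self, prefix, callback):
        """Subscribe, then replay retained values so late joiners catch up."""
        self._subscribers.append((prefix, callback))
        for topic, value in sorted(self._retained.items()):
            if topic.startswith(prefix):
                callback(topic, value)
```

A subscriber that connects late (for example, a link coming back up after a restart) receives the retained `c/enable` value on subscribe, but never sees a non-retained event that was published while it was away.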


PubSub itself is relatively simple. Anyone can publish a message. Subscribers receive the messages for which they subscribed. That's really about it conceptually! Does the figure on slide 12 make sense to you?

Now, implementation details vary greatly. You can check out Wikipedia. There are lots of variations, especially for how the PubSub instance decides how to forward which messages to which subscribers.

I hope to chat live during the Q&A!
-- Matt

Erwin
Score: 0 | 5 months ago | no reply

Thanks for this fast and detailed response! This makes it a lot clearer, although I still need some more time rethinking and maybe playing around a bit.

Steve_Wheeler
Score: 0 | 5 months ago | 1 reply

Impressive, and I can definitely see the appeal and intend to try it out, but I have to admit that I'm unclear on parts of it. It sounds to me as though topics are ways of referring to I/O devices, and publishers and subscribers refer to either tasks in an RTOS or separate microcontrollers. Is that a good understanding, or is it more generic than that?
Then, when discussing publishing and subscribing and the associated traversals of the trees, it sounds as though each publish message can potentially be sent many times - at least once per each subscriber to a topic. I think I'm missing something, because that sounds like an obvious inefficiency to avoid. In most of the message-passing schemes I've used in the past, each message went to one destination. In a couple of them, the messages went everywhere, and the receivers determined if the message was for them. There was no subscription involved, though.

Matt_LibertySpeaker
Score: 1 | 5 months ago | no reply

Hi Steve,
The top-level of the topic must match the "owner" PubSub instance, which is typically one per microcontroller. The remaining topic hierarchy is up to you. I typically think of the hierarchy in terms of matching my module architecture, but it doesn't have to. In the case of retained values, the last part of the hierarchical name is a feature, which that module exposes so that other parts of the system can control it. That feature can do anything, including writing hardware registers, changing software behavior, and starting a waveform stream. Basically, you are free to assign the topics and meaning of values to fit your system.

The top-level topic allows some limited routing. As you mentioned this is not a typical network stack, but it is not that inefficient either. Let's take the case of the PubSub instances A, B, and C, where A is the top of the tree, B is in the middle and C is at the bottom. A module connected to PubSub instance C publishes C/enable = 1. The PubSub instance C forwards that to all matching subscribers, including the link subscriber for B. PubSub instance B receives the message and forwards it to all matching subscribers except the publisher, the PubSub instance C's link subscriber. One subscriber is the link subscriber for A. PubSub instance A receives the message and forwards it to all subscribers except the publisher. Any and all subscribers have now received the message.

Now a module connected to PubSub instance A can issue C/enable = 0. PubSub B subscribes to PubSub A with both top-level topics B & C. Upon publish, A forwards the message to all matching subscribers, including PubSub B. Likewise, B forwards the message to all matching subscribers, which includes PubSub C.

Now, let's say that a module connected to PubSub C publishes C/enable = 1, but there are no external subscribers at B or A. What happens? Well, the message is still forwarded to B, and its only subscriber is A. A receives the message and discards it because it has no other matching subscribers. This is somewhat wasteful of bandwidth. We could architect the system so that C would know that its parents have no subscribers. However, this requires more coordination and RAM, so the existing implementation just forwards the message regardless. We do have the "_" topic, which is never forwarded. This is a very simple way to keep more frequent communications within a single microcontroller from taking up distribution bandwidth unnecessarily.
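
The A, B, C walk-through above can be modeled in a few lines. This is a deliberately simplified sketch with hypothetical names: link subscribers are modeled as direct method calls, matching is by topic prefix, and any topic starting with "_" is treated as local-only. It is not the Fitterbap implementation.

```python
class Node:
    """One PubSub instance in a tree; links are modeled as direct calls."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.local = []          # (topic prefix, callback)
        self.forwarded = 0       # messages this node sent over a link
        if parent is not None:
            parent.children.append(self)

    def subtree_names(self):
        """Top-level topics owned by this node and everything below it."""
        names = {self.name}
        for child in self.children:
            names |= child.subtree_names()
        return names

    def subscribe(self, prefix, callback):
        self.local.append((prefix, callback))

    def publish(self, topic, value, source=None):
        for prefix, callback in self.local:
            if topic.startswith(prefix):
                callback(topic, value)
        if topic.startswith("_"):
            return  # local-only topic: never crosses a link
        owner = topic.split("/", 1)[0]
        # Upward link: forwarded regardless of whether anyone above listens.
        if self.parent is not None and source is not self.parent:
            self.forwarded += 1
            self.parent.publish(topic, value, source=self)
        # Downward links: a child subscribes only to topics its subtree owns.
        for child in self.children:
            if child is not source and owner in child.subtree_names():
                self.forwarded += 1
                child.publish(topic, value, source=self)
```

In this model each message crosses each link at most once, and it is never sent back to the link it arrived on.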

Yes, the PubSub approach allows for many-to-many communications. By design, many-to-many allows multiple subscribers to receive the same message. However, it can also support one-to-one communication if a topic has one publisher and one subscriber. You do this by adding a topic with one and only one publisher and one and only one subscriber. However, if you need two subscribers at a later date, the PubSub approach makes this trivial.

If you have two modules that always communicate, especially at high bandwidth, using a one-to-one message-passing queue is a great solution. Nothing about this stops you from doing that. If you want to communicate directly over the data-link layer without pubsub, the extra unassigned ports are perfect for this. You saw a very brief example with the waveform.

Does this help? The live Q&A is tomorrow (Mon) at 10:30 am EDT. I'd love to chat more and answer any questions!

Best regards,
-- Matt

WD
Score: 0 | 5 months ago | no reply

Great talk, awesome project.

Miro
Score: 0 | 5 months ago | 1 reply

Hi Matt,
I enjoyed your talk very much. Your library seems very useful. In my work with event-driven systems, people are asking a lot about distributing the "active objects" among different CPUs...

It would be a great help for people like me to have a self-contained example project that I could just build. For example, a project for KEIL uVision or IAR EWARM (the free eval versions, if possible)... If the software build works, I could just buy the evaluation board you chose and move to the next step of trying it on real hardware.

I hope that my comments make sense to you.

I'm looking forward to live Q&A on Monday.

All the best,
Miro

Matt_LibertySpeaker
Score: 0 | 5 months ago | no reply

Hi Miro,
Distributed PubSub is definitely a great fit for sending events (messages) around your system! It may be a good way for QP to encompass a larger system of microcontrollers.

I definitely agree that I need to create an example project. I have been thinking of targeting Raspberry Pi Pico since no one can argue with the price. I started, but have not had enough time to really dive in.
Would the Pico (RP2040) work for you?

Best regards,

-- Matt

sergio_prado
Score: 1 | 5 months ago | 1 reply

Nice talk Matt, thanks!

Matt_LibertySpeaker
Score: 0 | 5 months ago | no reply

Thank you, Sergio. Looking forward to your talk, too!

Miro
Score: 0 | 5 months ago | 1 reply

Hi Matt,
I'm trying to run your example on my Windows PC. I've downloaded fitterbap from GitHub, but I get a problem right at the beginning. When I launch the Python UI:
python -m pyfitterbap comm_ui
I first got the missing numpy. I installed it with: pip install numpy.
But then, I get ImportError: cannot import name 'crc' from 'pyfitterbap'
I tried: pip install crc, and it did install something, but I still get the import error.
I must be missing something, I know. I'm running Windows 10 with Python 3.9.1. It's not the absolute latest, but not ancient either.
Any help would be appreciated,
--Miro

Matt_LibertySpeaker
Score: 0 | 5 months ago | no reply

Hi Miro,
Thanks for trying out pyfitterbap! The project definitely needed the finishing touches for Python packaging. I just pushed an updated version 0.3.1 to GitHub that will make things much easier. You can now follow the normal Python install process using PyPI:
pip3 install -U pyfitterbap
You can then run the Fitterbap Communications UI:
fitterbap comm_ui
or
python -m pyfitterbap comm_ui
If you would prefer to run from the source, you do need to build the native code using Cython, which is a little more work to configure on Windows. First, you need Visual C++ or MinGW. Once the tools are installed, you can build pyfitterbap. Here are the instructions to run in place:
cd c:\path\to\fitterbap
git pull
pip3 install -U -r requirements.txt
python setup.py build_ext --inplace
python -m pyfitterbap comm_ui

Does this work for you?
-- Matt

acarvalho
Score: 1 | 5 months ago | no reply

Great talk and great toolkit. Thank you!

esaias
Score: 1 | 6 months ago | 2 replies

Hi,
Is the fitterbap GitHub repo supposed to be public? I get a 404 message when I click on the link.

Matt_LibertySpeaker
Score: 0 | 5 months ago | no reply

The fitterbap GitHub repo is now available:
https://github.com/jetperch/fitterbap

Matt_LibertySpeaker
Score: 0 | 6 months ago | no reply

Hi @esaias and thanks for the question. The fitterbap GitHub repo is not yet public, but it will be before the conference. While the code is already being used, I want to take another pass through the API, which gets much harder to change after making it public.
