Wednesday, January 15, 2025

Taking Up Space – The Daily WTF

April Fool’s Day is a day when websites lie to you or create complex pranks. We’ve generally avoided the former, but have done a few of the latter, and we also like to use April Fool’s as a chance to change things up.

So today, we’re going to do something different. We’re going to talk about my Day Job. Specifically, we’re going to talk about a tool I use in my day job: cFS.

cFS is a NASA-designed architecture for designing spaceflight applications. It’s open source, and designed to be accessible. A lot of the missions NASA launches use cFS, which gives it a lovely proven track record. And it was designed and built by people much smarter than me. Which doesn’t mean it’s free of WTFs.

The Big Picture

cFS is a C framework for spaceflight, designed to run on real-time OSes, though fully capable of running on Linux (with or without a realtime kernel), and even Windows. It has three core modules- a Core Flight Executive (cFE) (which provides services around task management, and cross-task communication), the OS Abstraction Layer (helping your code be portable across OSes), and a Platform Support Package (low-level support for board-connected hardware). Its core concept is that you build “apps”, and the whole thing has a pitch about an app store. We’ll come back to that. What exactly is an app in cFS?

Well, at their core, “apps” are just Actors. They’re a block of code with its own internal state, that interacts with other modules via message passing, but basically runs as its own thread (or a realtime task, or whatever your OS appropriate abstraction is).

These applications are wired together by a cFS feature called the “Core Flight Executive Software Bus” (cFE Software Bus, or just Software Bus), which handles managing subscriptions and routing. Under the hood, this leverages an OS-level message queue abstraction. Since each “app” has its own internal memory, and only reacts to messages (or emits messages for others to react to), we avoid pretty much all of the main pitfalls of concurrency.
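To make the routing idea concrete, here’s a toy sketch in plain C. It is not the cFE API: every name in it (Bus, Pipe, bus_subscribe, bus_send, pipe_receive) and both size limits are made up for illustration, and a real Software Bus sits on top of OS message queues rather than arrays. It only shows the shape of the thing: apps own a pipe, subscribe to message IDs, and get copies of matching messages.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SUBS  8   /* hypothetical limits, chosen for the sketch */
#define QUEUE_LEN 4

typedef struct {
    uint16_t mid;      /* message ID used for routing */
    int32_t  payload;  /* stand-in for a real message body */
} Msg;

/* A "pipe": a fixed-size FIFO owned by exactly one app. */
typedef struct {
    Msg    slots[QUEUE_LEN];
    size_t head;
    size_t count;
} Pipe;

typedef struct {
    uint16_t mid;
    Pipe    *pipe;
} Subscription;

typedef struct {
    Subscription subs[MAX_SUBS];
    size_t       nsubs;
} Bus;

/* Register interest in one message ID. Returns 0 on success, -1 if the table is full. */
int bus_subscribe(Bus *bus, uint16_t mid, Pipe *pipe)
{
    if (bus->nsubs >= MAX_SUBS) {
        return -1;
    }
    bus->subs[bus->nsubs].mid  = mid;
    bus->subs[bus->nsubs].pipe = pipe;
    bus->nsubs++;
    return 0;
}

/* Copy the message into every subscribed pipe that has room.
 * Returns the number of pipes that received it. */
int bus_send(Bus *bus, const Msg *msg)
{
    int delivered = 0;
    for (size_t i = 0; i < bus->nsubs; i++) {
        Pipe *p = bus->subs[i].pipe;
        if (bus->subs[i].mid != msg->mid || p->count == QUEUE_LEN) {
            continue;
        }
        p->slots[(p->head + p->count) % QUEUE_LEN] = *msg;
        p->count++;
        delivered++;
    }
    return delivered;
}

/* Pop the oldest message. Returns 1 if one was available, 0 if the pipe was empty. */
int pipe_receive(Pipe *pipe, Msg *out)
{
    if (pipe->count == 0) {
        return 0;
    }
    *out = pipe->slots[pipe->head];
    pipe->head = (pipe->head + 1) % QUEUE_LEN;
    pipe->count--;
    return 1;
}
```

Because each app only ever touches its own Pipe and gets a copy of the message, there’s no shared mutable state between apps in this sketch, which is the property the actor model is buying us.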

This all feeds into the single responsibility principle, giving each “app” one job to do. And while we’re throwing around buzzwords, it also grants us encapsulation (each “app” has its own memory space, unshared), and helps us design around interfaces- “apps” emit and receive certain messages, which defines their interface. It’s almost like full object oriented programming in C, or something like how the BeamVM languages (Erlang, Elixir) work.

The other benefit of this is that we can have reusable apps which provide common functionality that every mission needs. For example, the app DS (Data Storage) logs any messages that cross the software bus. LC (Limit Checker) allows you to configure expected ranges for telemetry (like, for example, the temperature you expect a sensor to report), and raise alerts if it falls out of range. There’s SCH (Scheduler) which sends commands to apps to wake them up so they can do useful work (also making it easy to sleep apps indefinitely and minimize power consumption).
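The core idea behind a limit check is small enough to sketch. This is not LC’s actual table format or code; the Limit struct, the LimitResult enum, and limit_check are hypothetical names, and the real app is driven by configuration tables rather than hard-coded values.

```c
#include <stddef.h>

/* One configured range for a telemetry point (hypothetical layout). */
typedef struct {
    const char *name;  /* label for the telemetry point */
    double      min;   /* lowest acceptable value */
    double      max;   /* highest acceptable value */
} Limit;

typedef enum {
    LIMIT_OK,
    LIMIT_LOW,
    LIMIT_HIGH
} LimitResult;

/* Compare one sample against its configured range. */
LimitResult limit_check(const Limit *limit, double value)
{
    if (value < limit->min) {
        return LIMIT_LOW;
    }
    if (value > limit->max) {
        return LIMIT_HIGH;
    }
    return LIMIT_OK;
}
```

An app built around this would run the check on each incoming telemetry message and emit an alert message whenever the result isn’t LIMIT_OK.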

All in all, cFS constitutes a robust, well-tested framework for designing spaceflight applications.

Even NASA annoys me

This is TDWTF, however, so it’s not all sunshine and roses. cFS is not the prettiest framework, and the developer experience may, ah… leave a lot to be desired. It’s undergoing constant improvement, which is good, but it still has its pain points.

Speaking of constant improvement, let’s talk about versioning. cFS is the core flight software framework which hosts your apps (via the cFE), and cFS is getting new versions. The apps themselves also get new versions. The people writing the apps and the people writing cFS are not always coordinating on this, which means that when cFS adds a breaking change to its API, you get to play the “which versions of cFS and App X play nice together” game. And since everyone has different practices around tagging releases, you often have to walk through commits to find the last version of the app that was compatible with your version of cFS, and you see things like releases tagged “compatible with Draco rc2 (mostly)”. The goal of “grab apps from an App Store and they just work” is definitely not actually happening.

Or, this, from the current cFS readme:

Compatible list of cFS apps
The following applications have been tested against this release:
TBD

Messages in cFS are represented by structs. Which means when apps want to send each other messages, they need the same struct definitions. This is just a pain to manage- getting agreement about which app should own which message, who needs the definition, and how we get the definition over to them is just a huge mess. It’s such a huge mess that newer versions of cFS have switched to using “Electronic Data Sheets”- XML files which describe the structs, which doesn’t really solve the problem but adds XML to the mix. At least EDS makes it easy to share definitions with non-C applications (popular ground software is written in Python or Java).

Messages also have to have a unique “Message ID”, but the MID is not just an arbitrary unique number. It secretly encodes important information, like whether this message is a command (an instruction to take action) or telemetry (data being output), and if you pick a bad MID, everything breaks. Also, keeping MID definitions unique across many different apps who don’t know any of the other apps exist is a huge problem. The general solution that folks use is bolting on some sort of CSV file and code generator that handles this.
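Whatever generator you bolt on, the check at the heart of it is just “does any MID appear twice?” Here’s a minimal sketch of that check, assuming the CSV has already been parsed into a table. The MidEntry struct and mid_find_duplicate are hypothetical names for illustration, not part of cFS.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of a mission-wide MID table (hypothetical layout). */
typedef struct {
    const char *app;   /* app that owns the message */
    const char *name;  /* symbolic name of the message */
    uint16_t    mid;   /* message ID assigned to it */
} MidEntry;

/* Return the index of the first entry whose MID repeats an earlier
 * entry, or -1 if every MID in the table is unique. */
int mid_find_duplicate(const MidEntry *table, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        for (size_t j = 0; j < i; j++) {
            if (table[i].mid == table[j].mid) {
                return (int)i;
            }
        }
    }
    return -1;
}
```

The quadratic scan is fine for the few hundred MIDs a mission is likely to have, and running it at build time turns a mysterious routing bug into a build failure.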

Those MIDs also don’t exist outside of cFS- they’re a unique-to-cFS abstraction. cFS, behind the scenes, converts them to different parts of the “space packet header”, which is the primary packet format for the SpaceWire networking protocol. This means that in realistic deployments where your cFS module needs to talk to components not running cFS- your MID also represents key header fields for the SpaceWire network. It’s incredibly overloaded and the conversions are hidden behind C macros that you can’t easily debug.
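To show what’s hiding in a MID, here’s a sketch that pulls one apart. It assumes the simplest mapping, where the 16-bit MID is the first 16 bits of the CCSDS space packet primary header: a 3-bit version, a 1-bit packet type (1 for command, 0 for telemetry), a 1-bit secondary header flag, and an 11-bit APID. cFS can be configured with other mappings, so treat the bit positions as an assumption about your build. MidFields and mid_decode are made-up names, not cFS functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields packed into the first 16 bits of a space packet primary header. */
typedef struct {
    uint8_t  version;        /* 3-bit packet version number */
    bool     is_command;     /* packet type bit: 1 = command, 0 = telemetry */
    bool     has_sec_header; /* secondary header flag */
    uint16_t apid;           /* 11-bit application process ID */
} MidFields;

/* Split a 16-bit MID into header fields, assuming the MID is the
 * first 16 bits of the primary header (an assumption, see above). */
MidFields mid_decode(uint16_t mid)
{
    MidFields f;
    f.version        = (uint8_t)((mid >> 13) & 0x7u);
    f.is_command     = ((mid >> 12) & 0x1u) != 0;
    f.has_sec_header = ((mid >> 11) & 0x1u) != 0;
    f.apid           = (uint16_t)(mid & 0x07FFu);
    return f;
}
```

Under that assumption, 0x1880 decodes as a command with APID 0x080 and 0x0880 as telemetry with the same APID, which is why picking a MID “at random” can silently turn your telemetry into a command.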

But my biggest gripe is the build tooling. Everyone at work knows they can send me climbing the walls by just whispering “cFS builds” in my ear. It’s a nightmare (that, I believe, has gotten better in newer versions, but due to the whole “no synchronization between app and cFE versions” problem, we’re not using a new version). It starts with make, which calls CMake, which also calls make, but also calls CMake again in a way that doesn’t let variables propagate down to other layers. cFS doesn’t provide any targets you link against, but instead requires that any apps you want to use be inserted into the cFS source tree directly, which makes it incredibly difficult to build just parts of cFS for unit testing.

Oh, speaking of unit testing- cFS provides mocks of all of its internal functions; mocks which always return an error code. This is intentional, to encourage developers to test the failure paths in their code, but I like to test our success paths too.
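The general pattern for getting both paths covered is a stub whose return value the test can override. The sketch below is a toy, not the cFS unit test framework: fw_send_message, stub_set_send_retval, app_report_temperature, and the STUB_ status codes are all hypothetical names. The stub fails by default, matching the behavior described above, and the test opts in to success.

```c
#include <stdint.h>

#define STUB_SUCCESS 0
#define STUB_ERROR   (-1)

/* Value the stub hands back; it fails unless a test says otherwise. */
static int32_t stub_send_retval = STUB_ERROR;

/* Test-side hook: choose what the stubbed call returns next. */
void stub_set_send_retval(int32_t value)
{
    stub_send_retval = value;
}

/* Stub standing in for a (hypothetical) framework send call. */
int32_t fw_send_message(const void *msg)
{
    (void)msg; /* the stub ignores its argument */
    return stub_send_retval;
}

/* Code under test: sends a reading and counts failed sends. */
int32_t app_report_temperature(int32_t temperature, uint32_t *error_count)
{
    int32_t status = fw_send_message(&temperature);
    if (status != STUB_SUCCESS) {
        (*error_count)++;
    }
    return status;
}
```

A test then calls app_report_temperature once with the default stub to exercise the failure path, calls stub_set_send_retval(STUB_SUCCESS), and calls it again to confirm the error counter doesn’t move.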

Summary

Any tool you use on a regular basis is going to be a tool that you’re intimately familiar with; the good parts frequently vanish into the background and the pain points are the things that you notice, day in, day out. That’s definitely how I feel after working with cFS for two years.

I think, at its core, the programming concepts it brings to doing low-level, embedded C, are good. It’s certainly better than needing to write this architecture myself. And for all its warts, it’s been designed and developed by people who are extremely safety conscious and expect you to be too. It’s been used on many missions, from hobbyist cube sats to Mars rovers, and that proven track record gives you a good degree of confidence that your mission will also be safe using it.

And since it is Open Source, you can try it out yourself. The cFS-101 guide gives you a good starting point, complete with a downloadable VM that walks you through building a cFS application and communicating with it from simulated ground software. It’s a very different way to approach C programming (and makes it easier to comply with C standards, like MISRA), and honestly, the Actor-oriented mindset is a good attitude to bring to many different architectural problems.

Peregrine

If you were following space news at all, you may already know that our Peregrine lander failed. I can’t really say anything about that until the formal review has released its findings, but all indications are that it was very much a hardware problem involving valves and high pressure tanks. But I can say that most of the avionics on it were connected to some sort of cFS instance (there were several cFS nodes).
