Live Q&A - Mars Perseverance Software

Steve Scandore - Duration: 58:22

Live Q&A with Steve Scandore for the keynote session titled Mars Perseverance Software

iitgrad
Score: 2 | 3 years ago | 1 reply

Fabulous presentation and Q&A, Steve. You mentioned that, with respect to coding, you do not allow any developers to use semaphores. Just curious why that is the case. I have a hunch :-)

Steve S (Speaker)
Score: 0 | 3 years ago | 1 reply

That's correct: the general use of semaphores (and task locks in general) is not allowed in the software. This avoids some classic misuses and unexpected task dependencies (e.g., priority inversions and deadlocks) in an architecture where we want tasks to operate as independently and deterministically as possible. Semaphores also complicate runtime analysis and testing in a system with processing deadlines. In short, we remove, or use only cautiously, conventions that could call the correct operation of the code into question. In many cases we have easily redesigned code to avoid the casual use of a semaphore. Having said all that, we do have cases where waivers to this rule are granted. For example, IPC waits on messages are implemented using a semaphore. Disallowing semaphores by default, and then granting waivers in the few places where they are really required, helps ensure both their safe use and the safe operation of the overall system.
As I mentioned in the talk, we have learned a lot over the years. Here's a link to a related dependency problem from our past: https://www.youtube.com/watch?v=C2xKhxROmhA

iitgrad
Score: 0 | 3 years ago | no reply

Thanks Steve. I used to work on the avionics software for the F-16 at General Dynamics in the late '80s. Back then we had no RTOS, just a homegrown "cyclic executive". We weren't even allowed to pass parameters to a function because it took too long to push and pop data on the stack, so everything was in global data. There was even an entire software team just to manage the global data. I am really glad to hear that VxWorks is now used. I used to work with them at various contractors when I worked for Rational Software. The Ada days. :-)

Yunus
Score: 3 | 3 years ago | 1 reply

Thanks for the great presentation.
You mentioned "compression and data streaming" on the Mars Perseverance FSW slide. Which compression formats do you use? Are they custom-made or public/known protocols?
I remember NASA's articles about TTEthernet and how they use it. Did you also use (TT)Ethernet? Why?

Steve S (Speaker)
Score: 0 | 3 years ago | no reply

Yunus, see the related comment from DavidKnight below. We do not use time-triggered (TT) Ethernet on this mission. Unfortunately, it takes many years to get new technology (in space terms) introduced into the mission avionics baseline. I do hope it happens.

mgaron
Score: 4 | 3 years ago | 1 reply

I often wondered what kind of state machine implementation was used in this kind of mission-critical SW. I was pretty happy to learn that Dr. Samek's awesome QP framework was used alongside a traditional RTOS. This framework deeply changed the way I program embedded SW.
Also, thank you so much for giving us an insight into the SW used for the Mars rover missions: it gives us a small sense of having been part of this big adventure.

Miro
Score: 2 | 3 years ago | 1 reply

Yes, I was also happy to hear that "Samek's hierarchical state machines were used". (So they are apparently on Mars now! Awesome!). But Steve didn't actually say that the whole QP framework was used in this mission and from my understanding this was actually NOT the case. But the system was clearly event-driven, which is sufficient to apply at least the state machine part...

ChrisP
Score: 0 | 3 years ago | no reply

Steve, in the past there have been papers published by JPL and NASA about software engineering techniques employed in various subsystems (e.g., in MRO's radios, where QP was, at least initially, used and verified using SPIN/Promela).

I would be interested in learning more about how your team designed, implemented and tested the software to achieve this level of reliability. Are there any resources from which we can learn more details about this mission?

MateusHercules
Score: 5 | 3 years ago | no reply

Very inspiring, thanks for the presentation, Steve. The amount of redundancy and reliability needed in a mission on this scale is crazy. It's also astonishing how much one can achieve on such a limited processor with a good architecture.

burak.seker
Score: 3 | 3 years ago | 1 reply

Perfect presentation, thanks a lot for your time and effort.
Can you give some info about unit test coverage in this magnificent project?

Steve S (Speaker)
Score: 1 | 3 years ago | no reply

Thank you for the comment. We require 100% code coverage in unit testing. Waivers to the 100% rule are allowed in specific cases where coverage is not possible (e.g., intentional spin loops). We use gcov to measure and report the coverage. It's not perfect: we can't easily measure code path coverage, so we rely more on test reviews to ensure the right tests exist. We can then use gcov to see which parts of the code have not been tested and fill in those test gaps.

DavidKnight
Score: 4 | 3 years ago | 1 reply

Hey Steve, Thank you so much for this great presentation.
At one point in the Q&A I think you mentioned having a custom version of gzip for compression. I was wondering whether this is the only compression algorithm used, or does the rover use other algorithms like Huffman coding or Rice encoding?
I'm also wondering how the downlink works: does the rover use the CCSDS Space Packet Protocol or some custom packetization protocol? If CCSDS is used, do you still use the custom gzip compression for downlink, or do you use the standard Rice encoding that CCSDS recommends?

Steve S (Speaker)
Score: 1 | 3 years ago | no reply

My initial response focused on the engineering data and science data aspects of compression. The other types we use in this area are: LZO (data), JPEG (image), ICER (image), and LOCO (image). Each of these has its own encoding method, and some of them originated from JPL/NASA missions.
For downlink, I would say we use a tailored but compliant version of the CCSDS Space Packet Protocol. The data in the space packets are compressed using gzip or one of the others mentioned above. The CCSDS transfer frames are then streamed through additional telecom-specific encoders for reliability, not really for bandwidth management. This can be Reed-Solomon encoding, but we more commonly use Turbo encoding methods.

dcomer
Score: 2 | 3 years ago | no reply

Excellent presentation. Wonderful insight into how this national treasure was constructed, tested, successfully launched, and landed. I learned a great deal about how embedded software development and software modeling are carried out by one of the nation's brightest! Thank you so much for your time!!! 73, Dave Comer, NM5DC

Andrii
Score: 0 | 3 years ago | 1 reply

Such a cool system! Thanks for the presentation. By the way, why did you choose the PowerPC 750? Is it the best rad-hard processor on the US space market now?

Steve S (Speaker)
Score: 1 | 3 years ago | no reply

The RAD750 is a space-qualified, radiation-hardened processor from the early 2000s. The qualification process is long and expensive. It was the best choice for this mission (Perseverance) given the reuse directive and the implementation timeline. There are newer versions and options, but they were not yet fully qualified in the early-2010s time frame for this mission.

Taki
Score: 4 | 3 years ago | no reply

Hi Steve, thank you for the great presentation!
I'm an embedded systems engineer, and my career started with a simple CubeSat, so I was glad to hear about the Perseverance architecture. Perseverance is one of the largest embedded systems. You could probably talk for more than an hour on each topic (e.g., the customized processor, cruise software, fail-safe design for radiation tolerance, temperature, ...), and until now I had never been able to hear about even part of them. Today I could. Thank you, Steve and the EOC2021 team, for giving me this opportunity.

RaulPando
Score: 2 | 3 years ago | no reply

Steve, great presentation. The work that you guys are doing is simply awe-inspiring to human civilization.
Thank you

PhilKasiecki
Score: 3 | 3 years ago | no reply

Thank you, Steve - this was excellent and very informative. There was also plenty to think about, seeing the complexity of the system and even how the flight software had 1.2 million lines of flight code and over 50 percent more (1.9 million) lines of unit test code.

12:40:54	 From  Dave Nadler : Could you tell us a bit about how you tested the software?
12:41:10	 From  Keith J : Thanks for that Steve.  That was awesome to see how advanced things have become from a compute standpoint... I'm old enough to remember Apollo - although as a young kid.
12:41:18	 From  Tom.Davies : Awesome presentation
12:41:28	 From  Matjaž Finc : There is no room for error on such missions. How do you cope with the stress of "what if my code goes wrong" while developing and also during the mission? Which mission stage makes you the most nervous?
12:42:27	 From  Jeremy Schreiber : Awesome talk!  How large is the development team?  What type of development process (agile, waterfall, etc) do you follow to pull off a project of this size and complexity?
12:42:30	 From  Raul Pando : How much do you rely on Over The Air (Space) updates :)?
12:42:31	 From  Alex Burka : Can you comment any more on what went wrong with the first Ingenuity flights and what was the fix that "works 85% of the time"?
12:42:43	 From  Radu Pralea : C only? C++? All 100 tasks handled by a single core running at 200  MHz (<1% of computing power of a Raspberry Pi)?
12:42:47	 From  Matjaž Finc : Which QP kernel did you use? QKX?
12:42:48	 From  David : Was all imaging and other high-data-rate traffic passed over RS422 or 1553, or were there additional higher-speed buses?
12:43:18	 From  Radu Pralea : *10%
12:43:18	 From  afwaanquadri : What framework did you use for state machines ?
12:44:13	 From  Jonnyvb : Following on from the stress of "what if my code goes wrong" question from Matjaz - what sort of processes do you go through when something does go wrong, to stop that kind of issue from happening again and to learn the lessons from it?
12:44:18	 From  David Potter : Can you talk about your top down architecture and associated documentation process?
12:44:47	 From  Dave Comer : my apologies in advance for the naïve question. I worked on the Galileo mission back in the 1980s. How, or did, that mission help the current efforts on Mars?
12:46:12	 From  Radu Pralea : Do you use TDD? :)
12:46:36	 From  ken H : can you tell us how many person-hours went into software development? What percent was test/validation?
12:48:23	 From  Miro Samek : Very interesting that you mention the following practices used by NASA: event-driven architecture, threads structured as event-loops, blocking in one place only, NO blocking during message processing. These best practices are collectively known as the "Active Object" design pattern. Do you use this name ("Active Object") to quickly refer to your architecture?
12:48:23	 From  Davy Baker : If you could start over,  what would you do differently ?
12:49:35	 From  Alex : What was the biggest enabler (e.g. test bed/automated builds) for the firmware development?
12:49:51	 From  David Kanceruk : I imagine you use a build server. How long does it take to compile the code?
12:50:48	 From  afwaanquadri : Did you have any User-Interface to test specific modules of the software?
12:51:35	 From  Meenal Burrows : How big is the flight software team?
12:53:55	 From  Simon Voigt Nesbo : Was there anything that didn't work? That we wouldn't know just watching the news
12:54:06	 From  Gopinath : What caught my eye is how low the frequencies are in the system - processor running at 135 kHz, buses at 8 Hz, 64 Hz, etc. Is there a reason for this? EMI?
12:55:53	 From  Miro Samek : For anyone interested in the NASA software architecture used originally on the Pathfinder, which apparently is still very much influencing the current missions, there is a paper: "Managing Concurrency in Complex Embedded Systems" by Dr. David Cummings (you can google for it).
12:56:33	 From  Simon Voigt Nesbo : The slides said 132 MHz for the CPU, not 135 kHz
12:59:30	 From  Tim Michals : Are the checklists and design methodology open source or available?
13:00:24	 From  David Potter : What code analysis tools? 
13:01:30	 From  Gopinath : Correct, my bad. But even 132 MHz is low.
13:01:52	 From  Leopy : Is everything human-coded, or are some functions on the onboard computer handled by ML/AI?
13:05:00	 From  jvillasante : Fantastic! Are you hiring? :)
13:06:15	 From  David Potter : Are your software design rules available to the public?
13:09:56	 From  Dave Comer : Is there a SysML talk or material that the public can access?
13:09:56	 From  Michael Kirkhart : https://yurichev.com/mirrors/C/JPL_Coding_Standard_C.pdf
13:15:34	 From  Tom.Davies : What tools do you use to autogenerate the code?
13:16:10	 From  Radu Pralea : How do you deal with real-time behavior in the SW simulation environment? I guess the models of the "peripherals" could be slower than the actual hardware, so how do you test the actual software (which I assume depends on real timings on the real system) in a fully simulated environment? Do you have some timing abstraction layer taking care of this?
13:16:25	 From  Tom.Davies : Now that Perseverance is on the surface, how long will you remain on the project before moving onto the next project?
13:17:26	 From  Kurtovic, Tarik (1.59) : On-target unit testing is mandatory in some industries. Do you (need to) do on-target unit testing?
13:18:10	 From  Dave Nadler : What caused the resets during transit to Mars?
13:19:21	 From  Dave Comer : Alpha particle....This was a key concern in testing SRAMs, EPROMs, FLASH, EEPROM....
13:23:54	 From  Jay : Can you talk about telemetry and logging? What do you log, how large is the memory footprint, and how do you encode the logs? How do you ensure that you can use them to diagnose unexpected events?
13:30:24	 From  Andrei : Might be a silly question (and I may have missed it)... Are the comms encrypted with publicly available algorithms?
13:34:30	 From  Dave Nadler : I need a desktop pyro simulation...
13:34:39	 From  Meenal Burrows : :-)
13:35:09	 From  afwaanquadri : This was great! Thanks for your presentation!
13:35:10	 From  Keith J : Thank you very much Steve!  Fascinating stuff. Very much appreciated you taking the time.
13:35:37	 From  Dave Nadler : Thanks Steve - Awesome work and awesome presentation!
13:36:43	 From  Simon Voigt Nesbo : Yeah thanks for the great presentation. And thanks for answering my question :)
13:36:54	 From  Stefan Petersen : Thanks Steve! Great hearing about your work and set ups, great presentation and QnA.
13:37:07	 From  Meenal Burrows : Brilliant keynote and awesome Q&A. Thanks Steve for your time with us all.
13:37:52	 From  Rob Meades : That was excellent, many, many, thanks.
13:38:03	 From  Jay : Great presentation and discussion!
13:38:04	 From  Gopinath : Excellent presentation, Steve. Thank you very much.
13:38:06	 From  mdohring : Wonderful presentation!  Thank you very much!
13:38:08	 From  Yuriy Kozhynov : Thanks a lot!!!
13:38:10	 From  Juan : thank you!
13:38:11	 From  Erwin : Awesome Job you do and a great talk! Fantastic to get this insights on how you work!
13:38:11	 From  Dan Rittersdorf : Great talk and QA, Steve!   Thank you so much.
13:38:17	 From  Tom.Davies : Thank you Steve
13:38:23	 From  David Pastl : Great presentation, thank you Steve!
13:38:25	 From  Jose E. : Thanks for the presentation!
13:38:26	 From  Doug Peters : Great talk!!
13:38:29	 From  Eric : Thank you for a great presentation!
13:38:44	 From  James G : Awesome! Extremely interesting and enlightening, Steve. Thank you!
13:38:45	 From  Andrey Shevelov : Great presentation! Thanks a lot!
13:38:46	 From  PeteMehn : Well done.  Thanks for sharing!
13:38:54	 From  Michael Kafarowski : Thank you!
13:39:00	 From  Sam : thank you.
13:39:06	 From  Leandro Pérez : Thanks
13:39:27	 From  Christopher Long : Thank you
