PyIPND: Fast Prototyping on the Interplanetary Internet Using Python
BLUF: Bottom Line Up Front
A few years ago, NASA realized that for future missions, they will need a standardized way to detect neighboring nodes in resource-constrained computer networks used for space exploration.
At the time, the techniques for neighbor discovery in a resource-constrained network had recently been specified in an Internet Draft, and their aim was to build a working implementation, but not from scratch.
NASA uses an Open Source Software (OSS) software distribution called Interplanetary Overlay Network, or ION-DTN for short, to communicate with network nodes in space and their plan was to extend ION-DTN to add support for neighbor discovery.
In order to achieve their goal quickly, they collaborated with industry and academia to commission a project where Internet strangers were asked to add neighbor discovery to ION-DTN which turned out to be successful.
The rest of this article describes that project in detail.
Part I: A Look at Signal Propagation in Space
Sometimes the Speed of Light is not Enough
The speed of light is plenty adequate for Internet communication here on Earth, but it falls short when exploring other planets.
On Earth, communication signals that travel close to the speed of light are perceived instantly by our senses. This is why it is possible for anyone, anywhere on Earth to communicate over the Internet in real-time, due to the speed of light being so fast that such signals can round-trip the Earth in just a fraction of a second.
But in space, the same is not true.
Interplanetary Internet
There is a huge disparity between Earth-bound and space-bound communications. There are two reasons for this:
- the rotation of planets, differences in planetary orbits, and constraints in transmission power of spacecraft can interrupt the propagation of signals through space. This manifests as a disruption in connectivity;
- the distances between planets that a propagated signal must traverse is so vast that it adds up significantly. This manifests as a time delay in communication.
The delays, which are long and variable, can range from 1.26 seconds for an Earth-to-Moon signal, to tens of minutes (Earth-to-Mars), to hours (Pluto-to-Earth). The delays severely limit wholesale use of the Internet as a conduit for space communications.
To get around these limitations, computer networks designed for space applications use what is called the Interplanetary Internet.
A Network Architecture that can Tolerate Delays
Today, exploration of our solar system via the Interplanetary Internet in the face of the two issues identified above, i.e.:
- connectivity disruptions and;
- communication delays;
is made possible by the development of a network architecture that includes a set of new protocols and techniques that are tolerant to disruptions and delays.
The network architecture is called Disruption/Delay-Tolerant Networking, or DTN, for short, and it uses a store and forward mechanism for routing. DTN is an Internet standard specified in RFC 4838 by the IETF.
There are a number software distributions, many of which are OSS, that implement some or all aspects of the DTN standard. DTN2, ION-DTN, IBR and µPCN are examples of DTN implementations but DTN2 and ION-DTN are the most mature.
DTN2: The Reference Implementation for DTNs
DTN2 is the reference implementation maintained at UC Berkeley that implements the DTN architecture as described in RFC 4838. It is written in C++ and was designed as a research vehicle and is thus available as an OSS software distribution, making it widely used and well supported for terrestrial applications.
The computers used in space flight applications are radiation-hardened and extremely resource-constrained compared to computers used in terrestrial applications. This, combined with the low-bandwidth nature of wireless links used in space communications make DTN2 unsuitable for the needs of space missions.
The constraints imposed by space flight motivated the development of an alternative implementation that can interoperate with DTN2.
ION-DTN: An Alternate Implementation for DTNs
The NASA Jet Propulsion Laboratory (JPL) is one of ten (10) field centers owned by NASA. JPL is a Federally Funded Research and Development Center (FFRDC) managed for NASA by Caltech.
The lab specializes in the construction and operation of planetary robotic spacecraft and it was there an alternative implementation better suited for space flight was developed.
JPL’s alternative implementation is called ION-DTN or simply ION, which means Interplanetary Overlay Network. ION-DTN is a collection of command-line utilities, many of which interoperate with DTN2.
Like DTN2, ION-DTN is also an OSS software distribution but because it needs to work on spacecraft hardware, it is written in C and not C++ like DTN2.
A Product of Collaboration
The Interplanetary Internet is a product of collaboration but this wasn’t always the case. According to the Wikipedia entry on Interplanetary Internet:
“Space communication technology has steadily evolved from expensive, one-of-a-kind point-to-point architectures, to the re-use of technology on successive missions, to the development of standard protocols agreed upon by space agencies of many countries.”
CCSDS: Consultative Committee for Space Data Systems
Today, the development of standard protocols in space-bound communications is done through a multinational forum made up of major space agencies of the world. That multinational forum was formed in 1982 and is called the Consultative Committee for Space Data Systems (CCSDS).
The CCSDS is currently made up of 11 Member Agencies including NASA and their stated goals are to:
- enhance collaboration between government, quasi-government and commercial interests, and;
- reduce risk, development time & project costs.
More Collaboration to Further Space Research
In addition to collaborating with multinational space agencies through CCSDS, NASA also collaborates with industry and academia to develop techniques and protocols that will make end-to-end delivery of information between DTN nodes more reliable.
These collaborations allow DTNs to be useful not just in space but also on Earth.
A few notable instances of NASA collaborating with industry and academia are presented next.
CoECI: Crowdsourcing to Obtain Novel Ideas
The Center of Excellence for Collaborative Innovation (CoECI) was created in 2011 by NASA after a highly successful pilot program in 2009 that evaluated the use of crowdsourcing as a means of obtaining fast solutions to tough, mission-critical problems.
Today, all engagements under CoECI that use crowdsourcing are managed under the umbrella of NASA Tournament Lab (NTL), also created in 2011.
NTL: A Joint Effort with Harvard University
NTL is a joint effort between NASA and Harvard University as is evident from the excerpt below taken from this article:
“NASA’s National Tournament Lab was established in 2011 in a joint effort with Harvard University. The lab was born out of a 2009 TopCoder challenge that developed algorithms designed for medical kits that would one day be used in long-term human space missions.”
NASA would go on to commission many more projects that were managed by Topcoder that used crowdsourcing to obtain solutions.
Topcoder: A Quick Primer on Crowdsourcing
Topcoder offers software development & delivery services to enterprise, government and mid-size clients. The company was founded in 2001 by Jack Hughes and they are essentially a consulting services business, which means they use a mix of full-time employees and independent contractors to deliver software projects.
But their business model differs from traditional consulting services in one aspect. When there is a need to scale up the use of independent contractors for a project, rather than parcel out work directly, they use competitions.
Topcoder relies on their proprietary software platform to organize tournaments where any individual with an Internet connection can participate as an independent contractor in the design and development of complex software projects.
This twist of using tournaments to complete complex software projects attracts a lot of interest from top technical talent looking to earn tournament-style prizes in exchange for working on time-limited tasks very quickly. As a result, the members that make up Topcoder’s community are pretty diverse, in terms of professional qualifications and skill sets.
Topcoder calls this unique aspect of their business crowdsourcing.
Part II: The Challenge
To achieve the goal of extending the ION-DTN software distribution to support neighbor discovery, NASA approached Topcoder to help manage & deliver the software project using crowdsourcing.
Project Brief: DTN Neighbor Discovery
To kick off the project, NASA provided the following problem statement to Topcoder:
The goal of this project is very simple – and that is to update ION-DTN so that it includes IP Neighbor Discovery (IPND).
IPND is a method for otherwise oblivious nodes to learn of the existence, availability, and addresses of other DTN participants. IPND both sends and listens for small IP UDP announcement beacons that are addressed to an IP unicast, multicast, or broadcast address to discover specified remote neighbors, or unspecified local neighbors in the network topology.
Other DTN implementations, such as DTN2 and IBR, already support IPND. We now need to update ION-DTN to enable neighbor discovery capabilities as well. Neighbor discovery is also needed to enable future dynamic routing capability for the ION-DTN implementation.
Project Team
Just as they had done in previous DTN projects, in addition to the project brief, NASA provided Topcoder with access to internal Subject Matter Experts (SMEs) so they could contribute expert guidance as needed.
NASA provided access to two SMEs:
- Scott Burleigh, Principal Engineer, NASA JPL, California Institute of Technology;
- Adam Schlesinger, Communication Systems Engineer, Wireless and Communication System Branch, NASA Johnson Space Center (JSC).
As the technical lead on the project, I already had a bit of experience working with Scott Burleigh and NASA SMEs in general, as I had worked on a few NASA sponsored DTN projects in the past.
Topcoder Competition Playbook
As is standard practice at Topcoder, the project brief of a sponsored challenge is first scoped by breaking it up into manageable chunks of work. The chunks of work are essentially sets of logically related tasks that have been scoped appropriately. The scope of a set of tasks must be small enough such that they can be completed in a competition with a 5- to 7-day span.
Initial Strategy: Playing it Safe
Initially, I followed Topcoder’s standard playbook but hit a snag fairly quickly when the second and third competitions both ended without any submissions—they were failures.
In addition to interfacing with the client (in this case NASA) as the project’s tech lead, my responsibilities also included liaising with the developer community, so I contacted several of the developers that participated in the two failed competitions and overwhelmingly, the feedback was that the project was too complex / large.
The failures forced a strategy reassessment which led to the following findings:
- many developers are unfamiliar with the concepts of a DTN or a DTN distribution like ION-DTN;
- although ION-DTN is an OSS project, even experienced developers found the C code base to be pretty daunting at 200k LoC;
- the IPND spec makes for terse reading for anyone who is new to DTN.
Another important realization was that the bulk of competitions organized by Topcoder use programming languages like JavaScript, TypeScript, Java, Go, C#, Objective-C, Swift & Python—all Object-Oriented Programming (OOP) languages—and sometimes a dash of C++ but almost no C.
This meant that developers that use C are not as plentiful on the Topcoder platform, relative to developers that use mainstream OOP languages.
Armed with this information, I had to come up with a different strategy.
Revamped Strategy: Fast Prototyping
I read the IPND specification carefully and did multiple re-readings to take notes. Looking at my notes, I realized that it was possible to build a working implementation that covers salient aspects of the IPND spec in two passes.
To move quickly, rather than use C for the prototype, I theorized that we could make do with a scripting language like Python.
So I penned following to the NASA team as justification for the new strategy:
We're going to build a working implementation in Python that covers most of the IPND spec over two [development competitions]. Most not all features will be covered, because certain IPND aspects depend on features in ION which would not be viable in a Python only implementation. A successful Python implementation will mean there will not be any need for an architecture [competition].
With a partial IPND implementation, we will gradually integrate IPND into ION using [development competitions]. In a way, the [development] work will be reduced to asking [developers] to figure out the integration points and doing a straightforward port from Python to C.
Scott Burleigh accepted the proposal and remarked: “it's a very interesting strategy.” He added: “As someone pointed out, the IPND Internet Draft is so detailed that it might suffice as a specification in itself, in which case coding directly from that Draft might well be doable (and more fun than writing requirements). So I say sure, let's give it a try.”
Once we switched to a fast prototyping approach, developer participation increased significantly in the remaining competitions, allowing the project to move along smoothly, until the project ended successfully.
Part III: The Proof of Concept
Below are my implementation notes taken from multiple re-readings of the IPND Internet Draft.
The notes are made up of mostly Python-like pseudo-code, spread across seven (7) heavily commented files. The notes were provided to the developers that competed to build the IPND prototype in Python which we called PyIPND.
IPAdmin.py
:
// Will be used to manage IPND.py instances and IPNDService.py instances.
'''
Expected usage:
ipnadmin.py add=service name=cla-tcp-v4 addr=<cla-tcp-address> port=<cla-tcp-port>
This will add a cla-tcp-v4 service entry into each IPND beacon sent by the given agent.
The given CLA address and port will match those of an active CLA in the ION daemon but note that IPND does not check the validity of
the given address and port number for the CLA.
'''
2. IPNDService.py
:
// Will be used to create and update Service definitions for standard ("Constructed") and custom ("Private") services.
3. IPND-service.conf
:
// IPService configuration file.
4. IPND-node.conf
:
// This file should define and set the following properties.
// [IPND Internet Draft: "2.1. Beacon Period", page 5]
beacon_interval_broadcast = 5 # interval in seconds
beacon_interval_multicast = 10
beacon_interval_unicast = 15
// [IPND Internet Draft: "2.2. Unknown Neighbors", page 6]
multicast_discovery_address = 224.0.0.1 # IPv6: ff02::1
multicast_discovery_ttl = 2
// [IPND Internet Draft: "2.3. Enumerated Neighbors", pg 6]
enumerated_neighbors = 10.0.0.2, 10.0.0.3, 10.0.0.4, 192.168.5.2, 192.168.5.3
// [IPND Internet Draft: "2.6.3. Services", Figure 5, pg 13]
advertised_services = CLA-TCP-v4, CLA-UDP-v4, CLA-TCP-v6, CLA-UDP-v6, CLA-TCP-HN, CLA-UDP-HN, NBF-Hashes, NBF-Bits
// [IPND Internet Draft: "2.6.4. Neighborhood Bloom Filter", pages 14 - 15]
bloom_filter_hash_algorithms = md5, Murmur
5. IPND.py
:
// Load configured values from ipnd-node.conf
beaconIntervalBroadcast = 5
beaconIntervalMulticast = 10
beaconIntervalUnicast = 15
enum Address.Type addressType = {BROADCAST, MULTICAST, UNICAST}
beaconIntervals = {} // Python dict.
[beaconIntervals[type] = interval for type in addressType for interval in {beaconIntervalBroadcast, beaconIntervalMulticast, beaconIntervalUnicast}]
enumeratedNeighbors = set(ipnd-node.conf#enumerated_neighbors) // IP strings converted to Address objects.
destinations = {new Address('255.255.255.255'), new Address(ipnd-node.conf#multicast_discovery_address)} // Broadcast and multicast discovery addresses.
destinations = destinations | enumeratedNeighbors // Merge the two Address sets.
// print(destinations)
// >>> {Address('BROADCAST', '255.255.255.255'), Address('MULTICAST', '224.0.0.1'), Address('UNICAST', '10.0.0.2'), Address('UNICAST', '10.0.0.3'),
// >>> Address('UNICAST', '10.0.0.4'), Address('UNICAST', '192.168.5.2'), Address('UNICAST', '192.168.5.2')}
node = new Node(EID, port, destinations)
6. Node.py
:
sentBeacons = {} // Python dict.
now() = datetime.datetime.now() // Short-cut for referring to current time.
main() // Start worker threads.
def sendBeaconWorker():
'''
Send out beacons on node.port at periodic intervals.
'''
while(True)
// In future, beacon intervals may be updated through IPNDAdmin, so re-compute it on each thread run.
beacon_interval = max(1, min(beaconIntervals)) # use the minimum beacon interval.
sleep(beacon_interval)
beacon = new Beacon()
// add ipnd-node.conf#advertised_services to beacon
// ...
beacon.nbf = Helper.computeNBF(neighbors)
for address in destinations:
Beacon sentBeacon = sentBeacons.get(address, new Beacon())
if (now() >= sentBeacon.sendTime + beaconIntervals[address.type]):
diff = beacon - sentBeacon // diff = Helper.diff(beacon1, beacon2)
if (diff != 0):
// advertised services (e.g. 1-hop neighborhood) information has changed.
sendBeacon(...)
else:
// advertised services (e.g. 1-hop neighborhood) information is unchanged.
if (!hasActiveDataConnection(address)):
// ["2.4. Allowing Data to Substitute for Beacons", page 6]
sendBeacon(...)
else:
// suppress beacon messages to this destination.
else:
// move to the next destination address
def sendBeacon(beacon, address, sentBeacon = None):
beacon.EID = node.EID
beacon.sequenceNumber = sentBeacon.sequenceNumber + 1 if sentBeacon != None else 1
beacon.sendTime = now()
beacon.period = beaconIntervals[address.type]
bytes = Helper.encode(beacon)
try:
socket.send(bytes, ...)
except Exception as ex:
print "Beacon sending to {0} failed! {1}".format(address, ex)
else:
print "Beacon sending to {0} succeeded.".format(address)
finally:
sentBeacons[address] = beacon
def hasActiveDataConnection(address):
// Only UNICAST addresses can establish an active "data" connection.
// Another Bool should be used in conjunction with this method to write tests for Test 1.0.
return False if !address.isUnicast else True
def receiveBeaconWorker():
'''
Bind on all addresses (on all network interfaces) but not on the loopback address.
Refer to https://wiki.python.org/moin/UdpCommunication.
'''
while(True)
data, addr = sock.recvfrom(...)
sender = neighbors.get(address, new Neighbor())
// >>> type(sender)
// <class 'Neighbor'>
sender.address = new Address(addr)
sender.link = new Link()
sender.beacon = new Beacon()
sender.beacon.receiveTime = now()
sender.beacon.nbf = None
_beacon = Helper.decode(data)
if (_beacon != None):
// ["2.6. Beacon Message Format", page 7]: "An IPND node MUST use UDP checksums to ensure correctness."
// Verify the UDP checksum.
// ...
// Assign beacon attributes from "_beacon" to "beacon".
// ...
sender.beacon.nbf = _beacon.nbf
if (beacon.nbf != None && beacon.nbf.contains(this.node.EID)):
// ["2.5. Discovering Bidirectional Links", paragraph 2, page 7]
sender.link.isBidirectional = True
else:
// ["2.4. Allowing Data to Substitute for Beacons", paragraph 2, pages 6 - 7]
if (address.isUnicast):
// We might be dealing with a data packet from a unicast address.
sender.beacon.nbf = new NBF()
sender.beacon.nbf.add(this.node.EID)
sender.beacon.period = beaconIntervals[address.type]
// Fill in other attributes like beacon.EID, beacon.sequenceNumber etc.
// ...
else if (address.isBroadcast || address.isMulticast):
oldSender = neighbors[address]
if (oldSender != None):
sender.link.isBidirectional = oldSender.link.isBidirectional
sender.beacon.EID = oldSender.beacon.EID
sender.beacon.sequenceNumber = oldSender.beacon.sequenceNumber + 1
sender.beacon.period = oldSender.beacon.period
sender.beacon.services = oldSender.beacon.services
else:
// No services are associated with the sender.
// ["2.1. Beacon Period", paragraph 3, pages 5 - 6]
// Naive algorithm to determine sender link state. Feel free to suggest improvements.
sender.link.isUp = now() < sender.beacon.receiveTime + beaconIntervals[address.type]
neighbors[address] = sender // Update the neighbor's state.
destinations.append(address) // This should result in a no-op for an existing Address.
linkEvent = new LinkEvent()
linkEvent.state = LinkEvent.UP if sender.link.isUp else LinkEvent.DOWN
if (sender.link.isUp):
oldSender = neighbors[address]
if (oldSender == None):
// Announce the presence of this a newly discovered neighbor.
// ["2.7. IPND and CLAs", paragraph 2, page 16]
notifyCLA(sender, linkEvent)
else:
// Don't bombard dependent CLAs with notifications; this could lead to needless connection attempts by the CLA.
else:
// Notify dependent CLAs that this neighbor's link has gone down.
// ["2.7. IPND and CLAs", paragraph 3, page 16]
notifyCLA(sender, linkEvent)
pass
def notifyCLA(neighbor, linkEvent):
pass
def expireBeaconWorker():
'''
Purge the neighbors set of node whose beacon intervals have expired.
See ["2.8. Disconnection", paragraph 1, page 16]
'''
while (True)
beacon_interval = max(30, max(beaconIntervals)) # maxmimum beacon interval.
sleep(beacon_interval)
for n in neighbors:
if (now() > n.beacon.receiveTime + n.beacon.period):
linkEvent = new LinkEvent(DOWN)
notifyCLA(n, linkEvent)
node.removeNeighbor(n)
if (n.address.isUnicast) node.removeDestination(n.address)
def main():
// Start the 3 worker threads.
beaconSentWorker = new BeaconSentWorker(this.node, target=sendBeaconWorker).start()
beaconReceivedWorker = new BeaconReceivedWorker(this.node, target=receiveBeaconWorker).start()
beaconExpiredWorker = new BeaconExpiredWorker(this.node, target=expireBeaconWorker).start()
7. Helper.py
:
// Provides helper methods for use by other classes.
// This class should leverage the provided SNDV.py and TLV.py files.
def encode(beacon):
'''
To encode a beacon's required fields in bytes, see: ["2.6. Beacon Message Format", pages 7 - 14]
'''
bytes = bytearray()
// ...
// ["2.6.4. Neighborhood Bloom Filter", pages 14 - 15]
if (!address.isUnicast):
// Do not include NBF info for unicast destinations.
diff = beacon.nbf - sentBeacon.nbf // diff = Helper.diff(nbf1, nbf2)
if (diff != 0):
// "An NBF need not be included within every beacon, but one SHOULD be present within at least one
// broadcast or multicast beacon following a change in the 1-hop neighborhood of the node."
bytes += Helper.encodeNBF(beacon.nbf)
else:
// No need to send an unchanged NBF -- save bandwidth.
return bytes
def decode(data):
'''
Best effort parsing of a block of bytes into a Beacon object otherwise return None.
To decode, see ["2.6. Beacon Message Format", pages 7 - 14]
'''
return new Beacon(data) if isBeacon(data) else None
def computeNBF(neighbors):
'''
The import below is for illustration.
If experience trouble with your implementation of this method, please bring it up on the forum.
'''
from bloomfilter import BloomFilter
nbf = new BloomFilter()
nbf.add([n for n in neighbors if n.link.isUp])
return nbf