NRP simulation with SpiNNaker in closed-loop fails


#1

Hi everyone,

I’ve just completed a fresh local installation of the NRP on my machine to test running brain models on a SpiNN-5 board hosted in our lab.

I started by trying out different PyNN8 examples on the board from my machine, and they work well. Afterwards, I reinstalled the NRP from Bitbucket and ran a few basic experiments, like the Husky robot, and those also work correctly!

However, when I try to run the Holodeck Husky Braitenberg experiment with SpiNNaker, I get some errors:

  • First, the file braitenberg_husky_linear_twist_spinnaker.py referred to in the .bibi file in the experiment folder does not exist in the Bitbucket repository, so I just copied braitenberg_husky_linear_twist.py and renamed the copy, as I assume the syntax of the transfer function should remain the same.
  • With that solved, the experiment loads, but when I try to start it, the closed-loop engine stops with this error message:

2017-12-19 18:37:41 INFO: Simulating for 1 1000.000000ms timesteps using a hardware timestep of 1000000us
2017-12-19 18:37:41 ERROR: Error in CLE (General Error): signal only works in main thread
Traceback (most recent call last):
  File "/home/gabs48/src/hbp/NRP/CLE/hbp_nrp_cle/hbp_nrp_cle/cle/ClosedLoopEngine.py", line 257, in __loop
    self.run_step(self.timestep)
  File "/home/gabs48/src/hbp/NRP/CLE/hbp_nrp_cle/hbp_nrp_cle/cle/ClosedLoopEngine.py", line 198, in run_step
    self.bca.run_step(timestep * 1000.0)
  File "/home/gabs48/src/hbp/NRP/CLE/hbp_nrp_cle/hbp_nrp_cle/brainsim/pynn/PyNNControlAdapter.py", line 188, in run_step
    self.__sim.run(dt)
  File "/usr/local/lib/python2.7/dist-packages/spynnaker8/__init__.py", line 515, in run
    return __pynn_run(simtime, callbacks=callbacks)
  File "/usr/local/lib/python2.7/dist-packages/pyNN/common/control.py", line 110, in run
    return run_until(simulator.state.t + simtime, callbacks)
  File "/usr/local/lib/python2.7/dist-packages/pyNN/common/control.py", line 92, in run_until
    simulator.state.run_until(time_point)
  File "/usr/local/lib/python2.7/dist-packages/spynnaker8/spinnaker.py", line 128, in run_until
    self._run(tstop - self.t)
  File "/usr/local/lib/python2.7/dist-packages/spynnaker8/spinnaker.py", line 175, in _run
    AbstractSpiNNakerCommon.run(self, duration_ms)
  File "/usr/local/lib/python2.7/dist-packages/spynnaker/pyNN/abstract_spinnaker_common.py", line 312, in run
    AbstractSpinnakerBase._run(self, run_time)
  File "/usr/local/lib/python2.7/dist-packages/spinn_front_end_common/interface/abstract_spinnaker_base.py", line 781, in _run
    signal.signal(signal.SIGINT, self.signal_handler)
ValueError: signal only works in main thread
2017-12-19 18:37:41 INFO: Initiating transition from state started to state halted...
2017-12-19 18:37:41 INFO: Exited state started
2017-12-19 18:37:41 INFO: Entered state halted
[ INFO] [1513705061.242735566, 1.001000000]: Physics dynamic reconfigure ready.
2017-12-19 18:37:46 ERROR: exception calling callback for <Future at 0x7f071b245790 state=finished raised ValueError>
Traceback (most recent call last):
  File "/home/gabs48/.opt/platform_venv/lib/python2.7/site-packages/concurrent/futures/_base.py", line 331, in _invoke_callbacks
    callback(self)
  File "/home/gabs48/src/hbp/NRP/ExDBackend/hbp_nrp_cleserver/hbp_nrp_cleserver/server/SimulationServerLifecycle.py", line 120, in __handle_crash
    severity=CLEError.SEVERITY_CRITICAL)
  File "/home/gabs48/src/hbp/NRP/ExDBackend/hbp_nrp_cleserver/hbp_nrp_cleserver/server/ROSCLEServer.py", line 147, in publish_error
    self.lifecycle.failed()
  File "/home/gabs48/.opt/platform_venv/lib/python2.7/site-packages/transitions/extensions/locking.py", line 22, in trigger
    super(LockedEvent, self).trigger(*args, **kwargs)
  File "/home/gabs48/.opt/platform_venv/lib/python2.7/site-packages/transitions/core.py", line 222, in trigger
    return self.machine.process(f)
  File "/home/gabs48/.opt/platform_venv/lib/python2.7/site-packages/transitions/extensions/locking.py", line 15, in __call__
    return self.func(*args, **kwargs)
  File "/home/gabs48/.opt/platform_venv/lib/python2.7/site-packages/transitions/core.py", line 526, in process
    return trigger()
  File "/home/gabs48/.opt/platform_venv/lib/python2.7/site-packages/transitions/core.py", line 247, in _trigger
    if t.execute(event):
  File "/home/gabs48/.opt/platform_venv/lib/python2.7/site-packages/transitions/core.py", line 151, in execute
    machine.callback(func, event_data)
  File "/home/gabs48/.opt/platform_venv/lib/python2.7/site-packages/transitions/extensions/locking.py", line 15, in __call__
    return self.func(*args, **kwargs)
  File "/home/gabs48/.opt/platform_venv/lib/python2.7/site-packages/transitions/core.py", line 516, in callback
    func(event_data)
  File "/home/gabs48/src/hbp/NRP/ExDBackend/hbp_nrp_cleserver/hbp_nrp_cleserver/server/SimulationServerLifecycle.py", line 141, in fail
    self.__cle.stop(forced=True)
  File "/home/gabs48/src/hbp/NRP/CLE/hbp_nrp_cle/hbp_nrp_cle/cle/ClosedLoopEngine.py", line 287, in stop
    raise Exception("The simulation loop could not be completed")
Exception: The simulation loop could not be completed
2017-12-19 18:37:49 GMT+0100 [REQUEST from ::ffff:127.0.0.1] GET /experiments

Does anyone working on the SpiNNaker integration have any idea how to correct this?

Best!
Gabriel


#2

Hi Gabriel,

I don’t have a solution, but since I’ll be leaving the project tomorrow I’ll leave a comment :smile: . This is a common issue in multi-threaded Python applications that use the signal module: a library expects to be running as the main “application” and not in a separate thread.

In this case, the PyNN/SpiNNaker side expects to be run as part of a standalone application, but the NRP is interacting with it on a dedicated simulation Python thread. Registering a handler with signal.signal raises a ValueError in this case; by design, handlers can only be set from the main Python application thread. From the stacktrace I’m assuming SpiNNaker is using an interrupt signal to tell the underlying process/interface to pause after the set simulation time, which, if true, would be a core interaction.
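The failure can be reproduced without the NRP or SpiNNaker. The following is a minimal sketch written for Python 3 (the traceback above comes from Python 2.7, where the behaviour is the same); the function names are made up for illustration. It registers a SIGINT handler from a worker thread and returns the resulting error message, whose exact wording varies slightly between Python versions.

```python
import signal
import threading


def try_register_sigint():
    """Try to install a SIGINT handler; return the error text, or None on success."""
    try:
        signal.signal(signal.SIGINT, signal.default_int_handler)
        return None
    except ValueError as exc:
        # CPython refuses to set signal handlers outside the main thread.
        return str(exc)


def run_in_worker_thread():
    """Call try_register_sigint() on a non-main thread and return its result."""
    results = []
    worker = threading.Thread(target=lambda: results.append(try_register_sigint()))
    worker.start()
    worker.join()
    return results[0]


if __name__ == "__main__":
    # Prints the ValueError message raised inside the worker thread.
    print(run_in_worker_thread())
```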

The “right” solution would probably be to spawn the simulation as a separate process instead of a thread (using the multiprocessing library instead of threading), but that will introduce a lot of other potential errors, especially with ROS if I remember correctly. I’m not sure there is a quick workaround or solution that you would be able to try yourself.
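To illustrate why a separate process sidesteps the error: code running in a child process executes on that process's own main thread, so it may register signal handlers. The sketch below is Python 3 only and assumes a POSIX host where the "fork" start method is available; simulation_step is a hypothetical stand-in for the simulation loop, not NRP code, and the sketch ignores the ROS complications mentioned above.

```python
import multiprocessing
import signal


def simulation_step(queue):
    """Hypothetical stand-in for the simulation loop: registers a SIGINT handler."""
    try:
        signal.signal(signal.SIGINT, signal.SIG_DFL)
        queue.put("registered")
    except ValueError as exc:
        queue.put("failed: %s" % exc)


def run_step_in_child_process():
    """Run simulation_step in a child process and return what it reported."""
    # Assumption: POSIX host; "fork" is not available on Windows.
    ctx = multiprocessing.get_context("fork")
    queue = ctx.Queue()
    child = ctx.Process(target=simulation_step, args=(queue,))
    child.start()
    outcome = queue.get(timeout=10)  # read the result before joining the child
    child.join()
    return outcome


if __name__ == "__main__":
    print(run_step_in_child_process())
```

The cost of this design is that the simulation state then lives in another process, so everything the thread used to share directly has to go through queues or pipes.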

I believe there is going to be work moving forward on running SpiNNaker in an asynchronous mode without halting, but this may be an issue in that case as well. @georg.hinkel would know best what the plan is moving forward.

Kenny


#3

Hi Gabriel,

Sorry, we are currently very short on documentation for running SpiNNaker.

Currently, as Kenny said, running SpiNNaker from the NRP basically requires patching that signal call away. I am adding a new story to our issue log to let the NRP do this sort of patching itself. For the moment, what you need to do is comment out line 781 of “/usr/local/lib/python2.7/dist-packages/spinn_front_end_common/interface/abstract_spinnaker_base.py”, which is the signal.signal call shown at the bottom of your traceback.

Best,

Georg


#4

Hi,

Your solution worked like a charm @georg.hinkel, thank you! Is it planned to add the SpiNNaker repos to the NRP installation, or should we patch the user install directly?

Thanks a lot for the explanations @kennysharma, it makes sense! I wish you the best in your new job and hope to see you again later (maybe on a bridge in a touristic city when I’m driving around on the boat, who knows :wink: ) !


#5

Hi Gabriel,

The plan is to override the function that registers signal handlers with one that simply records the handlers; when a signal occurs, we look up the registered handlers ourselves. That way, the patch would no longer be necessary, and it would become possible to use libraries that add signal handlers in a TF. So hopefully this patch will soon be obsolete. For now, just patch the user install directly; there are no NRP patches of the SpiNNaker repos.
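A rough sketch of what such a recording function could look like is given below. It is an illustration under assumptions, not the NRP implementation: the class name SignalHandlerRegistry and its methods are hypothetical, and how the NRP would actually receive signals and call dispatch is left open.

```python
import signal
import threading


class SignalHandlerRegistry(object):
    """Hypothetical sketch: records signal handlers instead of installing them."""

    def __init__(self):
        self._handlers = {}
        self._lock = threading.Lock()

    def register(self, signalnum, handler):
        """Stand-in for signal.signal: store the handler, return the previous one."""
        with self._lock:
            previous = self._handlers.get(signalnum, signal.SIG_DFL)
            self._handlers[signalnum] = handler
        return previous

    def dispatch(self, signalnum, frame=None):
        """Call the recorded handler, if any; return True if one was called."""
        with self._lock:
            handler = self._handlers.get(signalnum)
        if callable(handler):
            handler(signalnum, frame)
            return True
        return False


if __name__ == "__main__":
    registry = SignalHandlerRegistry()
    received = []
    # Registration works from any thread because nothing touches the OS.
    worker = threading.Thread(
        target=registry.register,
        args=(signal.SIGINT, lambda num, frame: received.append(num)),
    )
    worker.start()
    worker.join()
    registry.dispatch(signal.SIGINT)
    print(received)
```

Since register never calls into the operating system, it can be used from the simulation thread or from a TF; a single real handler on the main thread would then forward incoming signals to dispatch.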

Best,

Georg