Core dumped when pressing play


#1

Hi there,

after updating my local install (that was working fine) with

./user_scripts rebase all
./user_scripts build all
./nrp_configure

I have a problem with starting experiments.

It launches fine but after pressing the play button the backend crashes, I get:

2017-09-07 14:27:25,688 [uWSGIWorker1] [hbp_nrp_back] [INFO]  starting State Machines...
2017-09-07 14:27:25 INFO: Initiating transition from state paused to state started...
2017-09-07 14:27:25 INFO: Executing callback 'start' before transition.
2017-09-07 14:27:25 INFO: Exited state paused
2017-09-07 14:27:25 INFO: Entered state started
2017-09-07 14:27:25 INFO: Executed callback '<transitions.extensions.locking.LockedMethod instance at 0x7fb5e837f050>' after transition.
python: ../nestkernel/scheduler.h:774: librandom::RngPtr nest::Scheduler::get_rng(nest::thread) const: Assertion `thrd < static_cast< thread >( rng_.size() )' failed.
python: ../nestkernel/scheduler.h:774: librandom::RngPtr nest::Scheduler::get_rng(nest::thread) const: Assertion `thrd < static_cast< thread >( rng_.size() )' failed.
python: ../nestkernel/scheduler.h:774: librandom::RngPtr nest::Scheduler::get_rng(nest::thread) const: Assertion `thrd < static_cast< thread >( rng_.size() )' failed.
python: ../nestkernel/scheduler.h:774: librandom::RngPtr nest::Scheduler::get_rng(nest::thread) const: Assertion `thrd < static_cast< thread >( rng_.size() )' failed.
bash: line 6: 11186 Aborted                 (core dumped) python $HBP/ExDBackend/hbp_nrp_cleserver/hbp_nrp_cleserver/server/ROSCLESimulationFactory.py
[ INFO] [1504787246.544857988]: Handling Request: /
[ INFO] [1504787246.546497088]: Handling Request: /

I have reinstalled the NRP and gazebo according to the bitbucket guide, but the same problem persists.

any ideas?

Thanks,

Alexander


#2

Hi Alexander,

Two things need to be properly updated for this issue not to appear:

  1. ExDBackend needs to be updated (and you need to launch in a new terminal)
  2. nest-simulator needs to be updated and rebuilt

If you did rebase and build all, it should have done those two steps, but I seem to remember this happening and the cause being nest not properly being rebuilt on some installs. Just to be sure I’d recommend deleting and re-cloning nest-simulator:

rm -rf $HBP/nest-simulator
cd $HBP/user-scripts
./clone-all-repos
./update-nrp build all

That should fix the issue, but if it doesn’t, let me know.

Kenny


#3

Hi Kenny,

It actually gives the same error…

Alexander


#4

Are you launching a standard experiment or something custom?


#5

the standard Husky Braitenberg experiment


#6

Ok, a couple of things to test then, can you report your repo versions for:

nest-simulator
CLE
ExDBackend

just using git log -1 and pasting the latest revision. If nest is cleanly rebuilt, it’s either the CLE or ExDBackend repos that are not in sync.

Also, in the terminal you are running, can you see if

echo $OMP_NUM_THREADS

prints anything?
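
For anyone who prefers to collect this in one go, here is a minimal Python sketch that gathers the same information: the latest commit of each of the three repos and the current value of OMP_NUM_THREADS. It assumes $HBP points at the root of the NRP checkout (as in the commands earlier in this thread) and that git is on the PATH. The helper names latest_commit and report are made up for illustration and are not part of the NRP tooling.

```python
import os
import subprocess


def latest_commit(repo_dir):
    """Return the hash of the latest commit in repo_dir, or None on failure."""
    if not os.path.isdir(repo_dir):
        return None
    try:
        out = subprocess.check_output(
            ["git", "log", "-1", "--format=%H"],
            cwd=repo_dir,
            stderr=subprocess.STDOUT,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is missing, or the directory is not a git repository
        return None
    return out.decode("utf-8").strip() or None


def report(hbp_root, repos=("nest-simulator", "CLE", "ExDBackend")):
    """Map each repo name to its latest commit hash (None if unavailable)."""
    return {name: latest_commit(os.path.join(hbp_root, name)) for name in repos}


if __name__ == "__main__":
    # Assumption: $HBP is the NRP checkout root, as in the commands above.
    root = os.environ.get("HBP", "")
    for name, commit in sorted(report(root).items()):
        print("%-15s %s" % (name, commit))
    # An unset variable shows up as None here.
    print("OMP_NUM_THREADS = %r" % os.environ.get("OMP_NUM_THREADS"))
```

A None next to a repo name means the directory was not found or git could not read it, which is worth checking before comparing revisions.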


#7

Hi

echo $OMP_NUM_THREADS does not return anything!

here are the logs :

alexander@alexander-HP-EliteBook-8760w:~/Documents/NRP$ cd CLE
    alexander@alexander-HP-EliteBook-8760w:~/Documents/NRP/CLE$ git log -1
    commit 1d1ee6c88ffed29a7aa9cc58b3fa4f1afb88a2e0
    Merge: 46edb25 92c30dc
    Author: Emmanouil Angelidis <angelidis@fortiss.org>
    Date:   Fri Sep 1 13:03:53 2017 +0200
        Merge "[NRRPLT-5384] Support multiple monitored populations"
    alexander@alexander-HP-EliteBook-8760w:~/Documents/NRP/CLE$ cd ../ExDBackend/
    alexander@alexander-HP-EliteBook-8760w:~/Documents/NRP/ExDBackend$ git log -1
    commit 803049ab5440a400ba1c166f2b687739ee147d07
    Merge: 37b1084 893bb42
    Author: Axel von Arnim <axel.vonarnim@fortiss.org>
    Date:   Wed Aug 23 18:03:57 2017 +0200
        Merge "[NRRPLT-5453] Duplicate Population Name Check in the backend"
    alexander@alexander-HP-EliteBook-8760w:~/Documents/NRP/ExDBackend$ cd ../nest-simulator/
    alexander@alexander-HP-EliteBook-8760w:~/Documents/NRP/nest-simulator$ git log -1
    commit 907cf6f955d931eb3632d40f9031c1a32ee53ae9
    Author: Kenny Sharma <kenny.sharma@tum.de>
    Date:   Thu Aug 3 17:14:39 2017 +0200
        [NRRPLT-4853] Enable multi-threading when used in the NRP.
        This patch provides a basic solution for:
        https://github.com/nest/nest-simulator/issues/725
        which originally prevented using multiple threads for Nest simulations
        when various methods were invoked by different threads in the NRP.
        It always forces the number of OpenMP threads before any relevant
        '#pragma omp' statements that invoke paralellism, which should ensure
        the proper behavior even though it is not the cleanest solution. The
        performance overhead of this is negligible.
        This patch will not be accepted upstream as they are working on a
        more fundamental solution (see Github issue above).
        Change-Id: Id528e6ce9aed5cee9d16df5eb5a76ef994a8066b

#8

That all looks ok. If the nest-simulator repo built correctly, then there could be a conflicting version of Nest 2.10.0 somewhere that is being imported with a higher priority than the one that should have been installed by the build, or the build was installed in the wrong place.

What’s the output of:

ldconfig -p | grep -i nest
which nest

Also if you can cle-kill everything, open new terminals and re-run - can you attach the full output of the backend log (cle-start) terminal here?

Thanks,
Kenny


#9

results in

/home/alexander/.local/bin/nest

and cle-start prints :

    alexander@alexander-HP-EliteBook-8760w:~/Documents/NRP/nest-simulator$ cle-start
[1] 13728
... logging to /home/alexander/.ros/log/c64a7ff4-9470-11e7-aca1-672ed88ddcd4/roslaunch-alexander-HP-EliteBook-8760w-13728.log
Checking log directory for disk usage. This may take awhile.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.

started roslaunch server http://127.0.0.1:38458/
ros_comm version 1.12.7


SUMMARY
========

PARAMETERS
 * /rosdistro: kinetic
 * /rosversion: 1.12.7

NODES

auto-starting new master
process[master]: started with pid [13740]
ROS_MASTER_URI=http://127.0.0.1:11311/

setting /run_id to c64a7ff4-9470-11e7-aca1-672ed88ddcd4
process[rosout-1]: started with pid [13753]
started core service [/rosout]
[2] 13770
[ INFO] [1504859776.297777179]: Waiting For connections on 0.0.0.0:8081
[3] 13799
registered capabilities (classes):
 - rosbridge_library.capabilities.call_service.CallService
 - rosbridge_library.capabilities.advertise.Advertise
 - rosbridge_library.capabilities.publish.Publish
 - rosbridge_library.capabilities.subscribe.Subscribe
 - <class 'rosbridge_library.capabilities.defragmentation.Defragment'>
 - rosbridge_library.capabilities.advertise_service.AdvertiseService
 - rosbridge_library.capabilities.service_response.ServiceResponse
 - rosbridge_library.capabilities.unadvertise_service.UnadvertiseService
[INFO] [1504859778.598864]: Rosbridge WebSocket server started on port 9090
[4] 13824
2017-09-08 10:36:20,574 [MainThread  ] [hbp_nrp_cles] [WARNING]  Could not write to specified logfile or no logfile specified, logging to stdout now!
2017-09-08 10:36:20,778 [Thread-3    ] [rospy.intern] [INFO]  topic[/rosout] adding connection to [/rosout], count 0
[5] 13843
[uWSGI] getting INI configuration from /home/alexander/.local/etc/nginx/uwsgi-nrp.ini
*** Starting uWSGI 2.0.12-debian (64bit) on [Fri Sep  8 10:36:22 2017] ***
compiled with version: 5.3.1 20160412 on 13 April 2016 08:36:06
os: Linux-4.4.0-92-generic #115-Ubuntu SMP Thu Aug 10 09:04:33 UTC 2017
nodename: alexander-HP-EliteBook-8760w
machine: x86_64
clock source: unix
pcre jit disabled
detected number of CPU cores: 8
current working directory: /home/alexander/Documents/NRP/nest-simulator
detected binary path: /usr/bin/uwsgi-core
chdir() to /home/alexander/Documents/NRP/ExDBackend/hbp_nrp_backend/
*** WARNING: you are running uWSGI without its master process manager ***
your processes number limit is 63837
your memory page size is 4096 bytes
detected max file descriptor number: 1024
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uwsgi socket 0 bound to UNIX address /home/alexander/.local/etc/nginx/nrp-services.sock fd 3
Python version: 2.7.12 (default, Nov 19 2016, 06:48:10)  [GCC 5.4.0 20160609]
Set PythonHome to /home/alexander/.opt/platform_venv
Python main interpreter initialized at 0x10497d0
python threads support enabled
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 144896 bytes (141 KB) for 8 cores
*** Operational MODE: threaded ***
/home/alexander/Documents/NRP/ExDBackend/hbp-flask-restful-swagger-master/flask_restful_swagger/swagger.py:7: ExtDeprecationWarning: Importing flask.ext.restful is deprecated, use flask_restful instead.
  from flask.ext.restful import Resource, fields
/home/alexander/Documents/NRP/ExDBackend/hbp-flask-restful-swagger-master/flask_restful_swagger/swagger.py:7: ExtDeprecationWarning: Importing flask.ext.restful.fields is deprecated, use flask_restful.fields instead.
  from flask.ext.restful import Resource, fields
2017-09-08 10:36:23,049 [MainThread  ] [hbp_nrp_back] [WARNING]  Application started with uWSGI or any other framework. logging to console by default !
WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0x10497d0 pid: 13845 (default app)
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI worker 1 (and the only) (pid: 13845, cores: 8)
[6] 13879
2017-09-08 10:36:24 GMT+0200 Polling Backend Servers for Experiments, Health & Running Simulations every 5000 ms.
2017-09-08 10:36:24 GMT+0200 Listening on port: 8000
alexander@alexander-HP-EliteBook-8760w:~/Documents/NRP/nest-simulator$ [INFO] [1504859787.905769]: Client connected.  1 clients total.
the rosdep view is empty: call 'sudo rosdep init' and 'rosdep update'
[INFO] [1504859788.012471]: [Client 0] Subscribed to /ros_cle_simulation/status
[INFO] [1504859788.020690]: [Client 0] Subscribed to /ros_cle_simulation/logs
[INFO] [1504859788.025176]: [Client 0] Subscribed to /ros_cle_simulation/cle_error
2017-09-08 10:36:28 GMT+0200 Image obtained for ExperimentID: 'ExDTutorialBaseballExercise'
2017-09-08 10:36:28 GMT+0200 Image obtained for ExperimentID: 'ExDNao'
2017-09-08 10:36:28 GMT+0200 Image obtained for ExperimentID: 'ExDTemplateDockedLauron'
2017-09-08 10:36:28 GMT+0200 Image obtained for ExperimentID: 'ExDBraitenbergMouse'
2017-09-08 10:36:28 GMT+0200 Image obtained for ExperimentID: 'ExDRetinaICubMockup'
2017-09-08 10:36:28 GMT+0200 Image obtained for ExperimentID: 'ExDTemplateICub'
2017-09-08 10:36:28 GMT+0200 Image obtained for ExperimentID: 'ExDDemoHuskySBC'
2017-09-08 10:36:28 GMT+0200 Image obtained for ExperimentID: 'ExDTemplateManipulation'
2017-09-08 10:36:32 GMT+0200 Image obtained for ExperimentID: 'ExDTemplateHusky'
2017-09-08 10:36:32 GMT+0200 Image obtained for ExperimentID: 'ExDInteractiveManipulation'
2017-09-08 10:36:32 GMT+0200 Image obtained for ExperimentID: 'user-avatar_test-env'
2017-09-08 10:36:32 GMT+0200 Image obtained for ExperimentID: 'ExDTemplateHuskyTimeout'
2017-09-08 10:36:32 GMT+0200 Image obtained for ExperimentID: 'ExDTemplateMouseV2'
2017-09-08 10:36:32 GMT+0200 Image obtained for ExperimentID: 'ExDTutorialBaseballSolution'
2017-09-08 10:36:32 GMT+0200 Image obtained for ExperimentID: 'ExDTigrilloCPG'
2017-09-08 10:36:32 GMT+0200 Image obtained for ExperimentID: 'ExDTigrillo3'
2017-09-08 10:36:35 GMT+0200 Image obtained for ExperimentID: 'ExDXMLExampleRobotZip'
2017-09-08 10:36:35 GMT+0200 Image obtained for ExperimentID: 'ExDBraitenbergLauron'
2017-09-08 10:36:35 GMT+0200 Image obtained for ExperimentID: 'ExDBraitenbergMouseLab'
2017-09-08 10:36:35 GMT+0200 Image obtained for ExperimentID: 'ExDTemplateICubVirtualLab'
2017-09-08 10:36:35 GMT+0200 Image obtained for ExperimentID: 'ExDManipulation'
2017-09-08 10:36:35 GMT+0200 Image obtained for ExperimentID: 'ExDBraitenbergHuskySBC'
2017-09-08 10:36:35 GMT+0200 Image obtained for ExperimentID: 'ExDDvsIcub'
2017-09-08 10:36:36 GMT+0200 Image obtained for ExperimentID: 'ExDBraitenbergLauronSBC'
2017-09-08 10:36:38 GMT+0200 Image obtained for ExperimentID: 'ExDTemplateLauron'
2017-09-08 10:36:38 GMT+0200 Image obtained for ExperimentID: 'ScreenSwitchingHuskyExperiment'
2017-09-08 10:36:38 GMT+0200 Image obtained for ExperimentID: 'ExDSpiNNakerExample'
2017-09-08 10:36:39 GMT+0200 Image obtained for ExperimentID: 'ExDDvsRobotHead'
2017-09-08 10:36:39 GMT+0200 Image obtained for ExperimentID: 'ExDVisualTrackingICub'
2017-09-08 10:36:39 GMT+0200 Image obtained for ExperimentID: 'NeuronalRedDetection_Husky'
2017-09-08 10:36:39 GMT+0200 Image obtained for ExperimentID: 'roboy'
2017-09-08 10:36:39 GMT+0200 Image obtained for ExperimentID: 'ExDXMLExample'

Alexander


#10

Hmm, this might be a bit tricky to debug, it really seems like some other nest is being linked against somewhere.

Can you also try this in a terminal:

python -c "import nest; print nest.__file__"

and send me the output?

Are you attending the HBP Young Researchers event or CodeJam next week? If not, you can try a workaround by running export OMP_NUM_THREADS=1 before cle-start and see if the issue is fixed. That would mean that you’re most likely running against a different Nest version (an old or completely separate one) from the one built in nest-simulator, but it should at least work.
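
The variable has to be set before cle-start because environment variables are inherited by child processes at launch time, so anything already running will not see a later change. A small sketch of that behaviour, using a stand-in child process rather than the real backend (the function name run_with_single_omp_thread is hypothetical):

```python
import os
import subprocess
import sys


def run_with_single_omp_thread(command):
    """Run command with OMP_NUM_THREADS=1 in its environment; return its stdout."""
    env = dict(os.environ)  # copy, so the current process is left unchanged
    env["OMP_NUM_THREADS"] = "1"
    out = subprocess.check_output(command, env=env)
    return out.decode("utf-8").strip()


if __name__ == "__main__":
    # Stand-in child that only reports the value it inherited.
    child = [sys.executable, "-c",
             "import os; print(os.environ.get('OMP_NUM_THREADS'))"]
    print(run_with_single_omp_thread(child))
```

The same inheritance is why a terminal where the variable was exported keeps passing it on to everything started from it.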


#11

I am not joining the events next week.

Running export OMP_NUM_THREADS=1 does fix the issue! (even though it complains about ‘some assets not found’ first).

alexander@alexander-HP-EliteBook-8760w:~$ python -c "import nest; print nest.__file__"

              -- N E S T --

  Copyright (C) 2004 The NEST Initiative
  Version 2.10.0 Oct 19 2016 10:23:31

This program is provided AS IS and comes with
NO WARRANTY. See the file LICENSE for details.

Problems or suggestions?
  Visit http://www.nest-simulator.org

Type 'nest.help()' to find out more about NEST.
/home/alexander/opt/nest/lib/python2.7/site-packages/nest/__init__.pyc

Thanks,

Alexander


#12

Ok that’s the issue, you have another version of nest 2.10.0 installed there, for me the path is:

/home/kenny/.local/lib/python2.7/site-packages/nest/__init__.pyc

I guess the other version comes earlier in your PATH or PYTHONPATH than the one that is installed by the NRP to ~/.local.

Do you know why you have the other version or is it something you need? If not, you can probably just move the whole nest directory somewhere else to back it up and see if that fixes the issue (rerun the one line Python command from above to ensure it’s using the proper Nest).

You’ll need to use a new terminal that does not have OMP_NUM_THREADS set to test properly.
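
To see every copy of a package that Python could pick up, not just the winning one, something like the following sketch can help. It only looks for plain packages and modules on sys.path (it ignores namespace packages, compiled extension modules and import hooks), and it does not import nest itself. find_package_candidates is a hypothetical helper, not an NRP or NEST function.

```python
import os
import sys


def find_package_candidates(name, search_path=None):
    """List each directory on search_path that holds a package or module `name`.

    Entries are returned in search order, so the first one is the copy a
    plain `import name` would normally use.
    """
    if search_path is None:
        search_path = sys.path
    found = []
    for entry in search_path:
        base = os.path.join(entry, name)
        is_package = (os.path.isfile(os.path.join(base, "__init__.py")) or
                      os.path.isfile(os.path.join(base, "__init__.pyc")))
        is_module = os.path.isfile(base + ".py")
        if (is_package or is_module) and entry not in found:
            found.append(entry)
    return found


if __name__ == "__main__":
    for directory in find_package_candidates("nest"):
        print(directory)
```

If more than one directory is printed, the first is the one shadowing the others; moving it away or reordering PYTHONPATH should change which Nest gets imported.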


#13

Yes it is the Nest I had installed pre NRP,

it works indeed after moving the old directory…

thanks for helping out,

Alexander