Changes between Initial Version and Version 1 of WAM/Troubleshooting


Timestamp:
May 13, 2011, 5:37:39 PM (14 years ago)
Author:
edison
Comment:

--

  • WAM/Troubleshooting

    v1 v1  
     1= WAM Troubleshooting =
     2
     3== Checking the Error Log ==
     4One of the first steps in diagnosing many WAM problems is to examine the error log file /var/log/syslog. Here is a normal startup log for a 7-DOF WAM:
     5{{{
     6Jul 29 20:27:41 WAM WAM: ...Starting btdiag program...
     7Jul 29 20:27:44 WAM WAM: Waking all pucks
     8Jul 29 20:27:44 WAM WAM: getBusStatus(): canReadMsg returned error
     9Jul 29 20:27:45 WAM last message repeated 22 times
     10Jul 29 20:27:45 WAM WAM: getBusStatus: Only status != -1 is shown.
     11Jul 29 20:27:45 WAM WAM: getBusStatus: status[1] = 2
     12Jul 29 20:27:45 WAM WAM: getBusStatus: status[2] = 2
     13Jul 29 20:27:45 WAM WAM: getBusStatus: status[3] = 2
     14Jul 29 20:27:45 WAM WAM: getBusStatus: status[4] = 2
     15Jul 29 20:27:45 WAM WAM: getBusStatus: status[5] = 2
     16Jul 29 20:27:45 WAM WAM: getBusStatus: status[6] = 2
     17Jul 29 20:27:45 WAM WAM: getBusStatus: status[7] = 2
     18Jul 29 20:27:45 WAM WAM: getBusStatus: status[10] = 2
     19Jul 29 20:27:45 WAM WAM: About to allocate space for 8 nodes
     20Jul 29 20:27:45 WAM WAM: getBusStatus(): canReadMsg returned error
     21Jul 29 20:27:45 WAM last message repeated 22 times
     22Jul 29 20:27:45 WAM WAM: getBusStatus: Only status != -1 is shown.
     23Jul 29 20:27:45 WAM WAM: getBusStatus: status[1] = 2
     24Jul 29 20:27:45 WAM WAM: getBusStatus: status[2] = 2
     25Jul 29 20:27:45 WAM WAM: getBusStatus: status[3] = 2
     26Jul 29 20:27:45 WAM WAM: getBusStatus: status[4] = 2
     27Jul 29 20:27:45 WAM WAM: getBusStatus: status[5] = 2
     28Jul 29 20:27:45 WAM WAM: getBusStatus: status[6] = 2
     29Jul 29 20:27:45 WAM WAM: getBusStatus: status[7] = 2
     30Jul 29 20:27:45 WAM WAM: getBusStatus: status[10] = 2
     31Jul 29 20:27:45 WAM WAM: Puck: ID=1 CTS=4096 IPNM=2700.00 PIDX=0 GRPB=1
     32Jul 29 20:27:45 WAM WAM: Puck: ID=2 CTS=4096 IPNM=2562.00 PIDX=1 GRPB=1
     33Jul 29 20:27:45 WAM WAM: Puck: ID=3 CTS=4096 IPNM=2562.00 PIDX=2 GRPB=1
     34Jul 29 20:27:45 WAM WAM: Puck: ID=4 CTS=4096 IPNM=2700.00 PIDX=3 GRPB=1
     35Jul 29 20:27:45 WAM WAM: Puck: ID=5 CTS=4096 IPNM=4961.00 PIDX=0 GRPB=2
     36Jul 29 20:27:45 WAM WAM: Puck: ID=6 CTS=4096 IPNM=4961.00 PIDX=1 GRPB=2
     37Jul 29 20:27:45 WAM WAM: Puck: ID=7 CTS=4096 IPNM=17474.00 PIDX=2 GRPB=2
     38Jul 29 20:27:45 WAM WAM: Actuator data dump:
     39Jul 29 20:27:45 WAM WAM: [0]:Bus-0,ID-1,G-1,O-0,M-0,Off-0,Enc-4096
     40Jul 29 20:27:45 WAM WAM: [1]:Bus-0,ID-2,G-1,O-1,M-0,Off-0,Enc-4096
     41Jul 29 20:27:45 WAM WAM: [2]:Bus-0,ID-3,G-1,O-2,M-0,Off-0,Enc-4096
     42Jul 29 20:27:45 WAM WAM: [3]:Bus-0,ID-4,G-1,O-3,M-0,Off-0,Enc-4096
     43Jul 29 20:27:45 WAM WAM: [4]:Bus-0,ID-5,G-2,O-0,M-0,Off-0,Enc-4096
     44Jul 29 20:27:45 WAM WAM: [5]:Bus-0,ID-6,G-2,O-1,M-0,Off-0,Enc-4096
     45Jul 29 20:27:45 WAM WAM: [6]:Bus-0,ID-7,G-2,O-2,M-0,Off-0,Enc-4096
     46Jul 29 20:27:45 WAM WAM: Data for Bus 0
     47Jul 29 20:27:45 WAM WAM: Bus Data: There were 7 Pucks sorted by ID
     48Jul 29 20:27:45 WAM WAM: [0]: Actuator 0 Puck 1
     49Jul 29 20:27:45 WAM WAM: [1]: Actuator 1 Puck 2
     50Jul 29 20:27:45 WAM WAM: [2]: Actuator 2 Puck 3
     51Jul 29 20:27:45 WAM WAM: [3]: Actuator 3 Puck 4
     52Jul 29 20:27:45 WAM WAM: [4]: Actuator 4 Puck 5
     53Jul 29 20:27:45 WAM WAM: [5]: Actuator 5 Puck 6
     54Jul 29 20:27:45 WAM WAM: [6]: Actuator 6 Puck 7
     55Jul 29 20:27:45 WAM WAM: Bus Data: There were 2 Groups
     56Jul 29 20:27:45 WAM WAM: Group 1: A0 P1 A1 P2 A2 P3 A3 P4
     57Jul 29 20:27:45 WAM WAM: Group 2: A4 P5 A5 P6 A6 P7 A-1 P0
     58Jul 29 20:27:45 WAM WAM: device_name=WAM7
     59Jul 29 20:27:45 WAM WAM: wam->name=WAM7
     60Jul 29 20:27:45 WAM WAM: bus=0, num_actuators=7
     61Jul 29 20:27:45 WAM WAM: OpenWAM(): for link from 0 to 7
     62Jul 29 20:27:45 WAM WAM: parseGetVal: key not found [WAM7.link[7].rotorI]
     63Jul 29 20:27:45 WAM WAM: Motor[0] is joint 0
     64Jul 29 20:27:45 WAM WAM: Motor[1] is joint 1
     65Jul 29 20:27:45 WAM WAM: Motor[2] is joint 2
     66Jul 29 20:27:45 WAM WAM: Motor[3] is joint 3
     67Jul 29 20:27:45 WAM WAM: Motor[4] is joint 4
     68Jul 29 20:27:45 WAM WAM: Motor[5] is joint 5
     69Jul 29 20:27:45 WAM WAM: Motor[6] is joint 6
     70Jul 29 20:27:45 WAM WAM: WAM zeroed by application
     71Jul 29 20:27:45 WAM WAM: About to set safety limits, VL2 = 46
     72Jul 29 20:27:45 WAM WAM: WAMControl period Sec:0.002000, ns: 2000000
     73}}}
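
A failing startup can be compared against the normal log above by extracting the node IDs that answered on the bus. The following is a minimal sketch, not a Barrett tool: the function names are hypothetical, and the regular expressions assume the exact `getBusStatus: status[N] = V` and `Puck: ID=N` line formats shown in the log. For a 7-DOF WAM, the normal log reports status for IDs 1 through 7 and 10.

```python
import re

# Patterns assume the line formats shown in the normal startup log.
STATUS_RE = re.compile(r"getBusStatus: status\[(\d+)\] = (-?\d+)")
PUCK_RE = re.compile(r"Puck: ID=(\d+)")


def summarize_startup(log_text):
    """Return (status_by_id, puck_ids) found in a startup log excerpt."""
    status = {}
    pucks = set()
    for line in log_text.splitlines():
        m = STATUS_RE.search(line)
        if m:
            status[int(m.group(1))] = int(m.group(2))
        m = PUCK_RE.search(line)
        if m:
            pucks.add(int(m.group(1)))
    return status, pucks


def missing_nodes(log_text, expected_ids):
    """Return the expected node IDs that never reported a bus status."""
    status, _ = summarize_startup(log_text)
    return sorted(set(expected_ids) - set(status))


# Short excerpt in which node 3 did not answer.
sample = """\
Jul 29 20:27:45 WAM WAM: getBusStatus: status[1] = 2
Jul 29 20:27:45 WAM WAM: getBusStatus: status[2] = 2
Jul 29 20:27:45 WAM WAM: getBusStatus: status[10] = 2
Jul 29 20:27:45 WAM WAM: Puck: ID=1 CTS=4096 IPNM=2700.00 PIDX=0 GRPB=1
"""
print(missing_nodes(sample, [1, 2, 3, 10]))
```

To check a real system, pass the contents of /var/log/syslog and the ID list from the normal log, for example `list(range(1, 8)) + [10]`.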
     74
     75== Common Problems ==
     76The symptoms listed in this section were either generated by Barrett’s own lab WAMs or were reported by Barrett’s customers.
     77
     78 == Problem: Cannot log in to the WAM PC    ==
     79  === Reason: Someone changed the password   ===
     80'''Solution(s):'''
     81        a. Ask the people in your lab for the new password
     82  === Reason: The drive is full (External PC)   ===
     83'''Solution(s):'''
     84        a. If the drive is full, the system is not able to record your login event, and the login will fail. It is likely that the /var/log/syslog files are very large, especially if you are logging data or errors from within the WAM control loop. Boot from a bootable CD (like sysresccd.org), and delete the syslog files (or other files) to make some room.
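
Once you can reach a shell (for example from the rescue CD, with the WAM PC's drive mounted), a quick check shows how full the drive is and which syslog files are largest. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and the default paths assume the drive is mounted at `/`, so adjust them to your mount point.

```python
import glob
import os
import shutil


def log_usage(pattern="/var/log/syslog*"):
    """Return (size_in_bytes, path) pairs for matching files, largest first."""
    sizes = []
    for path in glob.glob(pattern):
        try:
            sizes.append((os.path.getsize(path), path))
        except OSError:
            # File vanished or is unreadable; skip it.
            continue
    return sorted(sizes, reverse=True)


def percent_used(path="/"):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    print("drive is %.1f%% full" % percent_used("/"))
    for size, path in log_usage():
        print("%12d  %s" % (size, path))
```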
     85  === Reason: The network connection is not active (logging in over the network)   ===
     86'''Solution(s):'''
     87        a. Make sure the Ethernet cable is plugged into the PC
     88        a. Make sure your DHCP server is active (if using DHCP)
     89        a. Check that the Ethernet driver is loaded (from local terminal): lspci | grep -i eth; dmesg | grep -i eth; lsmod
     90  === Reason: The drive is corrupted or damaged due to jarring, poor ventilation, or a software bug.   ===
     91        a. Try a new hard drive (external PC). O/S installation instructions are on wiki.barrett.com.
     92        a. Try a new !CompactFlash (internal PC). Contact Barrett for a replacement.
     93
     94----
     95
     96 == Problem: WAM control application crashes/segfaults    ==
     97  === Reason: The WAM PC cannot communicate with the Safety Board at program launch   ===
     98'''Solution(s):'''
     99    a. Turn on the WAM power supply.
     100    a. Make sure that the CANbus cable is securely plugged in.
     101    a. Make sure that the CANbus cable is plugged into the correct port on the PC (external WAM PCs)
     102    a. Make sure that the CANbus cable is properly terminated (a wrist or blank outer link is attached).
     103    a. Make sure the CANbus card is properly seated in its PCI slot (external WAM PCs)
     104    a. Make sure the CAN driver is installed and loaded correctly: dmesg | grep -i pcan; cat /proc/pcan; lsmod
     105    a. Check for a broken CANbus or power wire, usually in a connector or at the safety puck.
     106    a. Check for a loose CANbus or power crimp in the connectors.
     107    a. Attach the Puck serial cable to the safety board to verify that the safety puck is functional. Could it have overheated?
     108  === Reason: You are accessing data outside of the program’s memory space (not likely with standard example programs).   ===
     109'''Solution(s):'''
     110        a. Use gdb to determine the offending line of source code.
     111
     112----
     113
     114 == Problem: The pendants do not initialize when the WAM power supply is turned on ==
     115
     116  === Reason: The safety board fuse is blown  ===
     117'''Solution(s):'''
     118        a. Remove, test, and replace the fuse on the safety board
     119  === Reason: There is no power  ===
     120'''Solution(s):'''
     121        a. Check for proper voltage at the source (wall or battery)
     122        a. Check for any custom current-limiting circuits you may be using
     123        a. Check the output voltage of the power supply (should be 48 VDC, nominal)
     124  === Reason: There is some other electrical problem  ===
     125'''Solution(s):'''
     126        a. Contact Barrett for repair
     127
     128----
     129
     130 == Problem: The pendant initialization sequence is incorrect when the WAM power supply is turned on   ==
     131  === Reason: The Data/Clock/Latch signals to the pendants are weak  ===
     132        '''Solution(s):'''
     133        a. Make sure both pendant cables are plugged securely into the WAM
     134  === Reason: There is some other electrical problem  ===
     135'''Solution(s):'''
     136        a. Contact Barrett for repair
     137
     138----
     139
     140 == Problem: Some joints have resistive braking, some do not. The angles that btdiag returns for the joints without resistive braking are incorrect. This is easiest to check for at the home position.   ==
     141  === Reason: The Pucks without resistive braking are not powering up correctly. NOTE: It is normal to have no resistive braking in all joints after turning on the WAM and pressing Shift-Idle, but before you launch a WAM control program.  ===
     142'''Solution(s):'''
     143        a. Check for a broken CANbus or power wire, usually in a connector near the Puck.
     144        a. Check for a loose CANbus or power crimp in the connectors near the Puck.
     145        a. Check for a loose CANbus or power wire at the Puck itself.
     146        a. Check the syslog for clues (compare it with the syslog above).
     147        a. Contact Barrett for repair
     148
     149----
     150
     151 == Problem: Joint readings “bounce” from reasonable/actual values to values that are not/cannot be true and back again.   ==
     152  === Reason: Realtime control violation  ===
     153'''Solution(s):'''
     154        a. Make sure the control loop avoids system calls that are incompatible with realtime control: no UI such as printf() or getch(), and no I/O such as read() or write(). Avoid just about anything except for loops and pure math.
     155        a. Make sure your total WAM control loop time, including the WAMCallback(), does not exceed the time allowed by the control rate. Keep in mind that the data time-of-flight on the CANbus is approximately 850 µs for a 7-DOF and 500 µs for a 4-DOF. This takes up a significant amount of the total control loop time.
     156        a. If you built your own WAM PC, a system hardware interrupt might be causing a realtime glitch. Check your installation against: http://www.rtai.dk/cgi-bin/gratiswiki.pl?Latency_Killer
     157        a. Extra getProperty() or setProperty() calls could be exceeding the CANbus bandwidth. setProperty() takes 75 µs without verification; getProperty() takes 150 µs. You might try staggering these function calls across multiple control cycles.
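
The timing figures above can be combined into a rough per-cycle budget. The sketch below is only an estimate: it assumes the costs simply add up serially within one control period, and the function and constant names are hypothetical. The 2000 µs period matches the `WAMControl period` line in the normal startup log; measure your own callback time.

```python
# Figures taken from the text above, in microseconds.
CAN_TIME_OF_FLIGHT_US = {7: 850.0, 4: 500.0}  # keyed by degrees of freedom
SET_PROPERTY_US = 75.0    # setProperty() without verification
GET_PROPERTY_US = 150.0   # getProperty()


def remaining_budget_us(period_us, dof, n_get=0, n_set=0, callback_us=0.0):
    """Estimate the time left in one control cycle.

    Assumes (as a simplification) that bus traffic, extra property calls,
    and the user callback run one after another within the period.
    """
    used = (CAN_TIME_OF_FLIGHT_US[dof]
            + n_get * GET_PROPERTY_US
            + n_set * SET_PROPERTY_US
            + callback_us)
    return period_us - used


# 7-DOF WAM at a 2000 us period with two extra reads and one extra write.
print(remaining_budget_us(2000.0, dof=7, n_get=2, n_set=1))  # 775.0
```

A negative or very small result suggests staggering the property calls across several cycles or trimming the callback.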
     158  === Reason: CANbus communication problem  ===
     159'''Solution(s):'''
     160        a. Use the btutil utility application to verify that the Puck firmware versions match for all motor Pucks. Load matching firmware onto all motor Pucks, if necessary.
     161        a. Ensure that the CANbus is properly terminated to minimize signal reflections. There is a 120 Ohm termination resistor in the wrist module (7-DOF) and blank outer link (4-DOF). If you are using the internal WAM PC, switches 1-2 and 1-3 should be “out” (see Section 1.5). This terminates the CANbus at the Safety Board with a 120 Ohm resistor. If you are using an external WAM PC, the purple CANbus cable has a 120 Ohm resistor in the connector at the PC end. In this case, switches 1-2 and 1-3 should be “in”.
     162        a. Use btutil to ensure that there are no conflicting Puck IDs on the CANbus. Each Puck must have a unique ID. If the Pucks are not enumerating correctly, disconnect them from the CANbus one-by-one until you find the one causing the conflict. Use btutil to set a new Puck ID for that Puck.
     163  === Reason: Grounding problem  ===
     164'''Solution(s):'''
     165        a. Eliminate CANbus ground loops. The CANbus shield is connected to Earth ground on the Safety Board. If your PC is also grounded, you should use an opto-isolated CANbus card to prevent ground loops.
     166        a. Eliminate power bus ground loops. The frame of the WAM is connected to Earth ground through the blue DC power cable. The frame of the 48 VDC supply is also Earth-grounded. If there is an additional electrical connection between the frame of the WAM and the frame of the power supply (such as a metal table or mounting bracket), this can cause a ground loop and undesired operation.
     167
     168----
     169
     170 == Problem: Running “sh makeall” causes a number of error messages to appear, including “cannot find -lntcan”.   ==
     171  === Reason: The linker is looking for the wrong CAN driver  ===
     172'''Solution(s):'''
     173        a. Edit your btclient/config.mk file to specify the correct CAN driver
     174  === Reason: The CAN driver is not installed ===
     175'''Solution(s):'''
     176        a. Install the correct CAN driver for your CAN hardware
     177
     178----
     179
     180 == Problem: A mechanical cable is fraying, birdnesting, or has come loose   ==
     181  === Reason: The maximum recommended payload was exceeded  ===
     182'''Solution(s):'''
     183        a. Note that the payload specification does not take into account accelerated loads. If you attach a 3 kg load to the 4-DOF and accelerate it at 2 g’s, the WAM experiences a 6 kg load, exceeding the recommended payload. The cable should be replaced before further use.
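
The arithmetic in the note above follows a simple rule: the effective load is the attached mass multiplied by the acceleration expressed in g. A minimal sketch of that check follows; the function names are hypothetical, and the 3 kg limit in the example call is a placeholder, so substitute the rated payload for your arm.

```python
def effective_load_kg(mass_kg, accel_g):
    """Effective load seen by the WAM, using the convention in the text:
    attached mass multiplied by the acceleration in g."""
    return mass_kg * accel_g


def within_payload(mass_kg, accel_g, max_payload_kg):
    """True if the accelerated load stays at or under the payload limit."""
    return effective_load_kg(mass_kg, accel_g) <= max_payload_kg


# The example from the text: 3 kg accelerated at 2 g acts like 6 kg.
print(effective_load_kg(3.0, 2.0))    # 6.0
print(within_payload(3.0, 2.0, 3.0))  # False
```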
     184  === Reason: A brass termination slipped off the end of the cable.  ===
     185'''Solution(s):'''
     186        a. This is a manufacturing defect. Contact Barrett to get the cable replaced.
     187
     188----
     189
     190  == Problem: During operation, you hear a “popping” sound accompanied by a distinct nasty smell.   ==
     191  === Reason: An electrical component burned up.  ===
     192'''Solution(s):'''
     193        a. Carefully record all events leading up to the failure. Contact Barrett for repair.
     194
     195----
     196
     197 == Problem: The WAM returns to home position before starting a new trajectory.   ==
     198=== Reason: The first point in the trajectory is the home position. ===
     199'''Solution(s):'''
     200        a. Inspect the trajectory file with a text editor to determine if this is the case. Every trajectory file is just a comma-delimited file with time (in seconds) in the left column and (depending on the mode) the joint positions or the Cartesian positions in the remaining columns.
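
The same inspection can be scripted. The sketch below reads the first data row of a comma-delimited trajectory (time in the first column, positions in the rest) and compares it with a home position. The function names, the sample data, and the home values are all hypothetical; use the home position configured for your WAM.

```python
import csv
import io


def first_point(traj_text):
    """Return (time_s, positions) from the first non-empty row."""
    for row in csv.reader(io.StringIO(traj_text)):
        if not row:
            continue  # skip blank lines
        values = [float(v) for v in row]
        return values[0], values[1:]
    raise ValueError("trajectory contains no data rows")


def starts_at_home(traj_text, home, tol=1e-3):
    """True if the first trajectory point matches `home` within `tol`."""
    _, positions = first_point(traj_text)
    if len(positions) != len(home):
        return False
    return all(abs(p - h) <= tol for p, h in zip(positions, home))


# Hypothetical 4-joint trajectory and home position, for illustration only.
sample = "0.0, 0.0, -2.0, 0.0, 3.1\n0.5, 0.1, -1.9, 0.0, 3.0\n"
home = [0.0, -2.0, 0.0, 3.1]
print(starts_at_home(sample, home))  # True
```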
     201
     202----
     203
     204 == Problem: A slight knocking can be heard when moving a particular cable circuit of the WAM (it may also be felt if backdriving the WAM by hand). As the knock occurs, a slight backlash may be felt.   ==
     205  === Reason: Debris in a joint bearing.  ===
     206'''Solution(s):'''
     207        a. Apply several drops of oil (3-in-1, WD-40 in liquid form, or another light-viscosity mineral oil should work). The debris has about a 50% chance of working itself out eventually.
     208  === Reason: Loose ball-bearing retainer  ===
     209'''Solution(s):'''
     210        a. This was eliminated in newer designs. It usually does not interfere with operation, apart from the clicking noise.
     211  === Reason: Insufficient motor shaft axial preload/shimming – the rotor/shaft should not be able to move axially inside the motor.  ===
     212'''Solution(s):'''
     213        a. Return to Barrett for repair.
     214
     215----
     216
     217 == Problem: The robot fails to follow scaled trajectories.   ==
     218  === Reason: The WAM does not have the latest OS/firmware/software.  ===
     219'''Solution(s):'''
     220        a. Follow the directions on wiki.barrett.com to update your firmware/software.
     221
     222----
     223
     224 == Problem: Gravity Compensation does not operate correctly.   ==
     225  === Reason: The wam.conf file links to the incorrect configuration file based on the present WAM setup  ===
     226'''Solution(s):'''
     227        a. Change the link in wam.conf to the correct configuration file: WAM4 (4-DOF), WAM7 (4-DOF with wrist), or WAMG (4-DOF with gimbals)
     228  === Reason: The parameters in the configuration file do not match your system’s actual configuration.  ===
     229'''Solution(s):'''
     230        a. Check the linked configuration file for the following:
     231                * Correct the tool center point (DH parameters d, a, alpha). These are offsets from the final link’s frame.
     232                * Correct the tool Center-Of-Mass (COM), relative to the tool center point
     233                * Correct the tool mass
     234                * In pre-2007 WAMs, Motor 4 (M4) was located over M
     235                * After 2007, M4 was reversed and located over M
     236                * The transformation matrices need to match your WAM’s M4 configuration. Basically, the sign changed from + to – for the M4 transmission.
     237                * The latest files in config/ will not necessarily match your WAM. You should always use the config file that shipped with your WAM. Update that file with *new features* from the latest config files, but be careful not to change the kinematic and dynamic constants that are specific to your WAM.
     238  === Reason: Mass parameters are inaccurate due to model errors, unmodeled parts, or machinist tolerances.  ===
     239'''Solution(s):'''
     240        a. On-line calibration (either static or dynamic) can yield better data than the model. Barrett is working on calibration software and will notify the WAM User List when it is released.
     241  === Reason: Electrical wiring stiffness requires additional joint torque to overcome pulling effect near joint limits  ===
     242'''Solution(s):'''
     243        a. This requires complex modeling algorithms that Barrett has not yet developed.
     244  === Reason: We depend on a sampled motor torque constant to be accurate across a batch of motors in a manufacturing run. In fact, the torque constants could vary slightly.  ===
     245'''Solution(s):'''
     246        a. Calibration software could reduce errors due to inaccurate torque constants. Barrett is working on calibration software and will notify the WAM User List when it is released.
     247
     248----
     249
     250 == Problem: Safety Board is randomly rebooting. The pendants reinitialize themselves.   ==
     251  === Reason: There are electrical noise spikes/dips in the power circuitry of the Safety Board.  ===
     252'''Solution(s):'''
     253        a. Contact Barrett for a replacement board.
     254
     255----
     256
     257 == Problem: Pendant lights do not come on (specifically the IDLE light). The Pucks are online anyway, but they do not make the PWM sound when <SHIFT+ACTIVATE> is pressed. ==
     258  === Reason: The pendant is electrically damaged  ===
     259'''Solution(s):'''
     260        a. Contact Barrett for repair.
     261
     262----
     263
     264 == Problem: One or more joints of the WAM vibrate during trajectory following. ==
     265  === Reason: The PID control gains for the joint are too high. The default PID control gains assume some non-zero payload in gravity. If you have no payload, or the joint is facing the floor or the ceiling (that is, if gravity is not a factor and there is no externally-applied torque on the joint), then the joint control gains could be too high.  ===
     266'''Solution(s):'''
     267        a. Lower the PID gains for that joint (in the wam.conf file). Start by cutting P and D in half.
     268
     269----
     270
     271 == Problem: One or more joints of the WAM oscillate or otherwise go unstable during trajectory following.   ==
     272  === Reason: The PID control gains for the joint are too low.  ===
     273'''Solution(s):'''
     274        a. Raise the PID gains for that joint (in the wam.conf file)
     275  === Reason: The payload is too heavy, the motor torque is saturating at Max Torque (MT), and the control cannot keep up.  ===
     276'''Solution(s):'''
     277        a. Reduce the payload. Remember that you must take into account accelerated loads while staying under the max payload specification. For example, the max payload for the 7-DOF is 3 kg. This means you can accelerate 1.5 kg at 2 g, or you can hold 3 kg statically at the max reach.
     278        a. Increase the MT parameter for the motors. However, the default MT values are set to stay within the breaking strength of the steel cables. Increasing these values increases the likelihood of cable failure.
     279  === Reason: The simple PID control that ships with the example code is not robust enough to handle loads offset significantly from the tool center point or loads with unusual inertial characteristics.  ===
     280'''Solution(s):'''
     281        a. Design a more robust control method for your application
     282
     283----
     284
     285 == Problem: WAM will not enter Idle mode. The WAM starts with a low voltage fault, and when <Shift+Reset/Idle> is pressed, the pendants reinitialize instead of entering Idle mode.   ==
     286  === Reason: The Puck power bus is shorted together. This could be due to a stuck relay, loose wire, metallic debris on Safety Board, or a damaged Puck.  ===
     287'''Solution(s):'''
     288        a. Contact Barrett for repair