Trouble Engine
The initial version of the trouble engine was written in Perl, and was intended to run only on RHEL3.
My, how times have changed. We are now working on the second version of the system.
The new version is being written in Perl to take advantage of more rapid development and more centralized libraries.
It will, however, require the Archive-Tar and YAML libraries.
The trouble modules are now distributed as YAML files packaged in compressed archives.
The job of the trouble engine is to select and apply a problem. It does this via the following process:
-
System identification - The engine determines what kind of system it is running on, to limit
the trouble modules from which it may select. This information may be cached for future runs.
At this time, valid systems are specified with --version="string":
-
Trouble selection - The engine chooses a trouble module and applies it to the system. These
modules may be selected with the --selection="string" argument:
-
random - For the purpose of self-testing and gaining experience, this mode randomly selects a
trouble module and applies it to your system. There is no check to prevent this trouble module
from being selected again. This is in line with realistic system scenarios.
-
[file] - For the purpose of strict testing and training, you may predefine a set of trouble modules in a YAML file.
In this mode, once a module has been chosen and applied, it is marked as inactive and will not
be re-used in the session. Alternatively, the argument may refer to a tar file, in which case that file is loaded as the trouble file.
If the --selection="string" command-line argument is not specified, the system will default to random selection.
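The difference between the two selection modes can be sketched as follows. This is an illustrative Python sketch, not the engine's own Perl code: the names select_random, FileSelection and choose are hypothetical, and it assumes that file mode offers modules in the order the file lists them, which the description above does not state.

```python
import random


def select_random(modules, rng=random):
    """Random mode: any module may be picked, including one picked before."""
    return rng.choice(list(modules))


class FileSelection:
    """File mode: modules come from a predefined list and are used at most once."""

    def __init__(self, modules):
        # Assumption: modules are offered in the order the file lists them.
        self.active = list(modules)
        self.inactive = []

    def next_module(self):
        """Return the next active module and mark it inactive for this session."""
        if not self.active:
            raise LookupError("no active trouble modules remain")
        module = self.active.pop(0)
        self.inactive.append(module)
        return module


def choose(modules, selection=None):
    """Fall back to random selection when no file-based selection is supplied."""
    if selection is None:
        return select_random(modules)
    return selection.next_module()
```

Random mode keeps no state, so repeats are possible. File mode keeps an active list that shrinks as modules are applied, which is what prevents re-use within a session.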
-
Trouble parsing - After selecting a trouble module, that module is parsed into its components.
These components are sub-selected if required by the module, and a final set of trouble components
is selected.
-
Trouble check - The engine checks the requirements for the trouble module before attempting to run it. If the check does not pass, the engine exits and explains why.
-
Trouble application - At this point, the trouble components are executed in order.
-
All files listed are backed up to the backup directory.
The scenario backup script is run to back up key information to the backup directory, passing in the OS version as the sole argument.
This information, as well as a list of files that could not be backed up, is stored in a file named 'BACKUP' in the backup directory.
The backup directory defaults to /tmp/trouble-maker/backup/ but may be overridden with the --backupdir="string" argument.
-
The scenario description is copied into the rescue directory as a file named 'DESCRIPTION'.
The rescue directory defaults to /tmp/trouble-maker/rescue/ but may be overridden with the --rescuedir="string" argument.
-
The scenario details are copied into the rescue directory as a file named 'DETAILS'.
The rescue directory defaults to /tmp/trouble-maker/rescue/ but may be overridden with the --rescuedir="string" argument.
-
The scenario check script is copied into the rescue directory as a file named 'CHECK'.
The rescue directory defaults to /tmp/trouble-maker/rescue/ but may be overridden with the --rescuedir="string" argument.
-
The scenario script is copied into the rescue directory as a file named 'TROUBLE-SCRIPT' and executed, passing in the OS version as the sole argument.
The rescue directory defaults to /tmp/trouble-maker/rescue/ but may be overridden with the --rescuedir="string" argument.
-
The scenario description is presented to the user via standard out.
-
The exercise of rebooting the server is left as a manual step for the user.
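The file handling in the application steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the engine's Perl implementation: the function names, the scenario dictionary keys and the layout of the 'BACKUP' report are assumptions. It also flattens backed-up files by base name and does not execute the backup or trouble scripts, both of which are simplifications.

```python
import shutil
from pathlib import Path

# Default locations taken from the text; both can be overridden by the caller.
DEFAULT_BACKUP_DIR = Path("/tmp/trouble-maker/backup")
DEFAULT_RESCUE_DIR = Path("/tmp/trouble-maker/rescue")


def backup_files(files, backup_dir=DEFAULT_BACKUP_DIR):
    """Copy each file into backup_dir; return the paths that could not be copied."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for name in files:
        src = Path(name)
        try:
            shutil.copy2(src, backup_dir / src.name)
        except OSError:
            # Missing or unreadable files are recorded rather than aborting.
            failed.append(str(src))
    return failed


def write_backup_report(info, failed, backup_dir=DEFAULT_BACKUP_DIR):
    """Store the backup information and the failed files in 'BACKUP'."""
    report = Path(backup_dir) / "BACKUP"
    lines = [info.rstrip("\n"), "Files that could not be backed up:"]
    lines += failed
    report.write_text("\n".join(lines) + "\n")
    return report


def write_rescue_files(scenario, rescue_dir=DEFAULT_RESCUE_DIR):
    """Write DESCRIPTION, DETAILS, CHECK and TROUBLE-SCRIPT into rescue_dir."""
    rescue_dir = Path(rescue_dir)
    rescue_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical scenario keys mapped to the file names given in the text.
    names = {
        "description": "DESCRIPTION",
        "details": "DETAILS",
        "check": "CHECK",
        "script": "TROUBLE-SCRIPT",
    }
    for key, filename in names.items():
        (rescue_dir / filename).write_text(scenario[key])
    return rescue_dir
```

Passing a different backup_dir or rescue_dir corresponds to the --backupdir and --rescuedir arguments. A real run would additionally execute the backup script and TROUBLE-SCRIPT with the OS version as the sole argument.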