Brewer diagnostic tool
This document describes the Brewer diagnostic system. All the code is available in the rbcce Subversion repository.
The main component is a parser. It extracts information from the B and D files and stores the useful information in status files and in the database.
Application Requirements
To provide a good indicator of a brewer's status, the following error types must be detected and then condensed into a set of bits that can be decoded again. Every error type has an associated mask that sets or unsets a single bit in this status bit set. At the beginning of the project, 4 bytes were needed to represent 32 error types.
- Limit on the number of error types: the more errors we detect, the more bits we need. At some point we will run out of bits and need another byte to accommodate the upgrade. To avoid this problem we added a unique integer error code, which allows for more than 2^64 error types. In the future we can get rid of the bit sets, because this requirement is a limitation in itself.
| Mask A | Mask B | Mask C | Mask D | Code | Level | Description |
|---|---|---|---|---|---|---|
| 00000001 | 00000000 | 00000000 | 00000000 | 11 | Critical | Azimuth tracker zeroing failure |
| 00000010 | 00000000 | 00000000 | 00000000 | 12 | Critical | Brewer failed to respond |
| 00000100 | 00000000 | 00000000 | 00000000 | 13 | Critical | Filterwheel #3 reference not found |
| 00001000 | 00000000 | 00000000 | 00000000 | 14 | Critical | HG failed |
| 00010000 | 00000000 | 00000000 | 00000000 | 15 | Critical | Lamp not detected |
| 00100000 | 00000000 | 00000000 | 00000000 | 16 | Critical | Lamp not on |
| 01000000 | 00000000 | 00000000 | 00000000 | 17 | Critical | Micrometer is jammed |
| 10000000 | 00000000 | 00000000 | 00000000 | 18 | - | Error code not used |
| 00000000 | 00000001 | 00000000 | 00000000 | 21 | Critical | Micrometer reset attempted |
| 00000000 | 00000010 | 00000000 | 00000000 | 22 | Critical | No intensity detected |
| 00000000 | 00000100 | 00000000 | 00000000 | 23 | Critical | Reset failed |
| 00000000 | 00001000 | 00000000 | 00000000 | 24 | Critical | Micrometer alignment trouble |
| 00000000 | 00010000 | 00000000 | 00000000 | 25 | Critical | Zenith zeroing failure |
| 00000000 | 00100000 | 00000000 | 00000000 | 26 | Warning | Illegal command |
| 00000000 | 01000000 | 00000000 | 00000000 | 27 | - | Error code not used |
| 00000000 | 10000000 | 00000000 | 00000000 | 28 | - | Error code not used |
| 00000000 | 00000000 | 00000001 | 00000000 | 31 | Info | UV scan started |
| 00000000 | 00000000 | 00000010 | 00000000 | 32 | Info | UV scan finished |
| 00000000 | 00000000 | 00000100 | 00000000 | 33 | Warning | UV scan aborted |
| 00000000 | 00000000 | 00001000 | 00000000 | 34 | Info | End of day started |
| 00000000 | 00000000 | 00010000 | 00000000 | 35 | - | Error code not used |
| 00000000 | 00000000 | 00100000 | 00000000 | 36 | - | Error code not used |
| 00000000 | 00000000 | 01000000 | 00000000 | 37 | - | Error code not used |
| 00000000 | 00000000 | 10000000 | 00000000 | 38 | Critical | User break or unbreak |
| 00000000 | 00000000 | 00000000 | 00000001 | 41 | Critical | No data |
| 00000000 | 00000000 | 00000000 | 00000010 | 42 | Critical | UTC date failure |
| 00000000 | 00000000 | 00000000 | 00000100 | 43 | Critical | Julian date failure |
| 00000000 | 00000000 | 00000000 | 00001000 | 44 | Critical | Time failure |
| 00000000 | 00000000 | 00000000 | 00010000 | 45 | Error | Manual sighting requested |
| 00000000 | 00000000 | 00000000 | 00100000 | 46 | Info | Total ozone submitted |
| 00000000 | 00000000 | 00000000 | 01000000 | 47 | Warning | Unexpected schedule running |
| 00000000 | 00000000 | 00000000 | 10000000 | 48 | - | Error code not used |
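To illustrate how the masks and codes in the table relate, here is a minimal Python sketch. It assumes the pattern visible in the table: the tens digit of the code selects Mask A to D, and the units digit selects the bit. The function names `code_to_mask`, `encode` and `decode` are hypothetical and are not taken from the rbcce code.

```python
from typing import Iterable, List, Tuple


def code_to_mask(code: int) -> Tuple[int, int]:
    """Return (byte_index, mask) for a two-digit error code such as 11 or 48."""
    byte_index = code // 10 - 1  # tens digit 1..4 -> Mask A..D (index 0..3)
    bit_index = code % 10 - 1    # units digit 1..8 -> bit 0..7
    if not (0 <= byte_index < 4 and 0 <= bit_index < 8):
        raise ValueError(f"unsupported error code: {code}")
    return byte_index, 1 << bit_index


def encode(codes: Iterable[int]) -> bytes:
    """Condense a collection of error codes into the 4 status bytes."""
    status = bytearray(4)
    for code in codes:
        byte_index, mask = code_to_mask(code)
        status[byte_index] |= mask  # set the bit for this error type
    return bytes(status)


def decode(status: bytes) -> List[int]:
    """Recover the error codes from the 4 status bytes."""
    codes = []
    for byte_index, value in enumerate(status):
        for bit_index in range(8):
            if value & (1 << bit_index):
                codes.append((byte_index + 1) * 10 + bit_index + 1)
    return codes


# Codes 11, 14 and 48 survive a round trip through the status bytes.
print(decode(encode([11, 14, 48])))  # [11, 14, 48]
```

The sketch also shows the limitation mentioned above: any code outside 11 to 48 cannot be represented without adding another byte.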
Another set of tests compares a parsed value against a threshold. For each test, the threshold can be obtained in one of 3 ways.
- default value: the database stores a value +- threshold.
- brewer-dependent value: the database stores a value +- threshold per brewer.
- brewer-dependent value from config: the database provides a value taken from the brewer config +- a threshold.
| Mask A | Mask B | Mask C | Mask D | Code | Level | Description |
|---|---|---|---|---|---|---|
| 00000001 | 00000000 | 00000000 | 00000000 | 11 | Critical | Azimuth tracker zeroing failure |
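The three threshold sources listed above could be resolved roughly as in the following sketch. Everything here is an assumption made for illustration: the function names, the `source` labels, the layout of the `db` dictionary that stands in for the database, and the idea that the threshold for config-based tests is stored in the database.

```python
from typing import Dict, Tuple

Limits = Tuple[float, float]  # (reference value, allowed deviation)


def resolve_limits(source: str, test: str, brewer_id: int,
                   db: Dict[str, dict], brewer_config: Dict[str, float]) -> Limits:
    """Pick the reference value and threshold for one test (hypothetical layout)."""
    if source == "default":
        # One value +- threshold shared by all brewers.
        return db["default"][test]
    if source == "brewer":
        # One value +- threshold stored per brewer.
        return db["brewer"][(brewer_id, test)]
    if source == "config":
        # Value comes from the brewer config; threshold assumed to be in the database.
        return brewer_config[test], db["config_threshold"][test]
    raise ValueError(f"unknown threshold source: {source}")


def within_limits(parsed: float, limits: Limits) -> bool:
    """True when the parsed value lies inside reference +- threshold."""
    reference, threshold = limits
    return abs(parsed - reference) <= threshold
```

A test would then fail whenever `within_limits` returns `False` for the value parsed from the B or D file.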
Error Information Extracted
In addition to the information described above, we need to extract further details, because we need to check the error sources for false positives. We may also need to build complex database queries to make a more accurate diagnosis.
The information extracted from the files is described below.
| Field | Description |
|---|---|
| 'brewerid' | unique identifier of the brewer the error comes from |
| 'date' | date when the error occurred |
| 'gmt' | time in GMT when the error occurred, taken from the files |
| 'level' | integer that represents the importance of the error |
| 'code' | unique error code for a given error type |
| 'filename' | name of the file from which the error was extracted |
| 'line' | the line inside the file where the error was extracted |
| 'message' | the original message extracted from the file |
| 'count' | occurrences of the same error type in this file so far |
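One way to picture a single extracted error is as a small record type. The sketch below uses the field names from the table; the class name `BrewerError`, the Python types, and the reading of 'line' as a line number are assumptions, and the sample values are made up.

```python
import datetime
from dataclasses import asdict, dataclass


@dataclass
class BrewerError:
    brewerid: int          # assumed to be numeric
    date: datetime.date    # date when the error occurred
    gmt: datetime.time     # GMT time taken from the file
    level: int             # importance of the error
    code: int              # unique error code, e.g. 14
    filename: str          # file the error was extracted from
    line: int              # assumed to be the line number inside the file
    message: str           # original message from the file
    count: int = 1         # occurrences of this error type in the file so far


# Made-up example record, ready to be written to a status file or database row.
error = BrewerError(
    brewerid=157,
    date=datetime.date(2015, 6, 1),
    gmt=datetime.time(10, 30),
    level=3,
    code=14,
    filename="example_B_file",
    line=42,
    message="HG failed",
)
row = asdict(error)
```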
Errors pending implementation
Time related Errors
The following errors are difficult to extract: UTC date failure, Julian date failure and Time failure. The only way to know the state of the remote clock is through the files' modification times. When the files are copied to the local machine, the modification time is lost, so it is impossible to check whether the remote clock has shifted.
Proposed solution: we need a mechanism that allows the status tool to poll the remote clock in order to check whether it is synchronized.
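As a rough sketch of the check such a mechanism would perform, the comparison itself is simple once a remote timestamp is available. How the remote time is obtained is left open here; the function name `clock_is_synced` and the one-minute default tolerance are assumptions.

```python
from datetime import datetime, timedelta, timezone


def clock_is_synced(remote_now: datetime, local_now: datetime,
                    tolerance: timedelta = timedelta(minutes=1)) -> bool:
    """True when the remote clock is within `tolerance` of the local clock."""
    return abs(remote_now - local_now) <= tolerance


# Example with a made-up remote reading that is 30 seconds ahead.
local = datetime(2015, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
remote = local + timedelta(seconds=30)
synced = clock_is_synced(remote, local)
```

Both timestamps should be in UTC so that the comparison does not depend on the time zone of either machine.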
Manual sighting request
This problem is hard to parse because in the B or D files it appears as a user command. This means that users can type the message in their own language. Sometimes the message is in English, but in most cases it is in a different language, which makes it impossible to parse.
Flow Graph
- Every 15 minutes the Eubrewnet system polls the B and D files from the brewers.
- The diagnostic tool parser receives them as input.
- The output is written to the database and to files.
- The Eubrewnet webpage shows the extracted information.
Front End
The front end consists of two views: the main view, which is the brewer overview, and the detailed view.
Main View
When users open the front page, they see a map with the status of the brewers. A colour code indicates the status of each brewer: green means OK, yellow means WARNING, orange means ERROR, red means CRITICAL, grey means no data in the last 24 hours and black means no data in the last 48 hours.
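The colour code can be expressed as a small lookup. In this sketch the function name `status_colour` is hypothetical, and two details are assumptions: missing data takes precedence over the error level, and the 24 and 48 hour limits are inclusive.

```python
LEVEL_COLOURS = {
    "OK": "green",
    "WARNING": "yellow",
    "ERROR": "orange",
    "CRITICAL": "red",
}


def status_colour(level: str, hours_since_last_data: float) -> str:
    """Map a brewer's worst level and data age to the colour shown on the map."""
    if hours_since_last_data >= 48:
        return "black"  # no data in the last 48 hours
    if hours_since_last_data >= 24:
        return "grey"   # no data in the last 24 hours
    return LEVEL_COLOURS[level]


print(status_colour("WARNING", 0.25))  # yellow
print(status_colour("OK", 30))         # grey
```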
Detailed View
When the user clicks on a brewer, they are redirected to the details of that brewer. The Brewer Status section shows a list of the original messages taken from the D or B files, explaining the status of the brewer.
The next example shows a brewer in an OK status.
The next example shows a brewer with several CRITICAL problems.
Detailed operation
The entry point of the diagnostic tool is monitor.py. The pseudocode of monitor.py is the following:
- create a list of paths to possible files
- test whether each file exists
- parse the file
- write the status to a file
- write the status to the database
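The steps above can be sketched in Python as follows. This is not the actual content of monitor.py: the function name `monitor` and its callable parameters (`parse`, `write_status_file`, `write_status_db`) are placeholders introduced so the flow can be shown without the real parser or database.

```python
import os
from typing import Callable, Iterable, List


def monitor(candidate_paths: Iterable[str],
            parse: Callable[[str], object],
            write_status_file: Callable[[str, object], None],
            write_status_db: Callable[[str, object], None]) -> List[str]:
    """Run the parse-and-store cycle for every candidate file that exists."""
    processed = []
    for path in candidate_paths:
        # Skip candidate files that have not been copied yet.
        if not os.path.isfile(path):
            continue
        status = parse(path)             # parse the file
        write_status_file(path, status)  # write the status to a file
        write_status_db(path, status)    # write the status to the database
        processed.append(path)
    return processed
```

Passing the parser and the two writers as arguments keeps the loop easy to test with stand-in functions.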
The third item in the previous list is the parser itself; the file that contains the parser is parseSingleFile.py
parseSingleFile is the parent class of parseDFile.py and parseBFile.py. It contains much of the functionality for handling files and database access.
ParseBFile.py and ParseDFile.py contain an abstract function called parse_lines() that takes the list of lines of a given file and creates a list of errors. The pseudocode is the following:
- iterate over all lines of a file
  - iterate over all error types
    - if the line contains a specific string
      - create a new error
      - push the new error onto the errorList, grouped by error code.
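The loop above might look like the following sketch. It is a standalone function rather than the real class method, and it assumes that each error type is recognised by a fixed substring. The codes and descriptions come from the error table, but using the description text as the search string is an assumption about how the messages appear in the B and D files.

```python
from typing import Dict, Iterable, List, Tuple

# (code, level, substring to look for); the substrings are assumed, not confirmed.
ERROR_TYPES: List[Tuple[int, str, str]] = [
    (14, "Critical", "HG failed"),
    (16, "Critical", "Lamp not on"),
]


def parse_lines(lines: Iterable[str],
                error_types: List[Tuple[int, str, str]] = ERROR_TYPES
                ) -> Dict[int, List[dict]]:
    """Scan the lines of one file and group the detected errors by error code."""
    error_list: Dict[int, List[dict]] = {}
    for number, line in enumerate(lines, start=1):
        for code, level, needle in error_types:
            if needle in line:
                group = error_list.setdefault(code, [])
                group.append({
                    "code": code,
                    "level": level,
                    "line": number,             # position inside the file
                    "message": line.strip(),    # original message
                    "count": len(group) + 1,    # occurrences so far in this file
                })
    return error_list


sample = ["measurement ok", "HG failed", "Lamp not on", "HG failed"]
errors = parse_lines(sample)
print(sorted(errors))   # [14, 16]
print(len(errors[14]))  # 2
```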

