====== Brewer diagnostic tool ======

In this document we describe the Brewer diagnostic system. All the code is available in the [[http://rbcce.aemet.es/svn/python/monitor/|rbcce subversion repository]].

The main component is a parser. This parser extracts information from the B and D files and then stores the useful information both in status files and in the database.

=== Application Requirements ===

To have a good indicator of the Brewer's status, the error types listed below must be detected and then condensed into a set of bits that can be decoded again. Every error type has an associated mask that sets or unsets a single bit in this status bit set. At the beginning of the project we needed 4 bytes to represent 32 error types.

  * Limit on the number of error types: the more errors we detect, the more bits we need. At some point we will run out of bits for new error types and will need another byte to accommodate them. To avoid this problem we added a unique integer error code, which allows more than 2^64 error types. In the future we can get rid of the bit sets, because this requirement is a limitation in itself.
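The bit-set scheme can be sketched in Python, the language of the monitor code. This is a minimal sketch, not the code in the repository. It assumes that the four status bytes (Mask A to Mask D) are packed into one 32-bit integer with Mask A as the least significant byte, and it derives each mask from the two-digit error code in the table below: the first digit selects the byte (1 = A ... 4 = D) and the second digit selects the bit (1 = lowest ... 8 = highest). The function names are hypothetical.

```python
from __future__ import annotations


def code_to_mask(code: int) -> int:
    """Return the 32-bit mask for a two-digit error code (11..48).

    Assumption: first digit = byte (1 = Mask A ... 4 = Mask D),
    second digit = bit inside that byte (1 = lowest ... 8 = highest),
    and Mask A is the least significant byte of the packed integer.
    """
    byte_index, bit_index = divmod(code, 10)
    if not (1 <= byte_index <= 4 and 1 <= bit_index <= 8):
        raise ValueError(f"code {code} has no bit in the status set")
    return 1 << ((byte_index - 1) * 8 + (bit_index - 1))


def set_error(status: int, code: int) -> int:
    """Set the bit that corresponds to an error code."""
    return status | code_to_mask(code)


def clear_error(status: int, code: int) -> int:
    """Unset the bit that corresponds to an error code."""
    return status & ~code_to_mask(code)


def decode_status(status: int) -> list[int]:
    """Recover the error codes whose bits are set, in table order."""
    codes = []
    for byte_index in range(1, 5):
        for bit_index in range(1, 9):
            code = byte_index * 10 + bit_index
            if status & code_to_mask(code):
                codes.append(code)
    return codes


def status_bytes(status: int) -> list[str]:
    """Render the status as the four 8-bit masks A, B, C, D of the table."""
    return [format((status >> (8 * i)) & 0xFF, "08b") for i in range(4)]


if __name__ == "__main__":
    status = set_error(0, 14)          # HG failed
    status = set_error(status, 31)     # UV scan started
    print(decode_status(status))       # [14, 31]
    print(status_bytes(status))        # ['00001000', '00000000', '00000001', '00000000']
```

Deriving the mask from the code keeps the bit sets and the integer codes consistent with each other, which is what the table below encodes row by row.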
^ Mask A ^ Mask B ^ Mask C ^ Mask D ^ Code ^ Level ^ Description ^
| 00000001 | 00000000 | 00000000 | 00000000 | 11 | Critical | Azimuth tracker zeroing failure |
| 00000010 | 00000000 | 00000000 | 00000000 | 12 | Critical | Brewer failed to respond |
| 00000100 | 00000000 | 00000000 | 00000000 | 13 | Critical | Filterwheel #3 reference not found |
| 00001000 | 00000000 | 00000000 | 00000000 | 14 | Critical | HG failed |
| 00010000 | 00000000 | 00000000 | 00000000 | 15 | Critical | Lamp not detected |
| 00100000 | 00000000 | 00000000 | 00000000 | 16 | Critical | Lamp not on |
| 01000000 | 00000000 | 00000000 | 00000000 | 17 | Critical | Micrometer is jammed |
| 10000000 | 00000000 | 00000000 | 00000000 | 18 | - | Error code not used |
| 00000000 | 00000001 | 00000000 | 00000000 | 21 | Critical | Micrometer reset attempted |
| 00000000 | 00000010 | 00000000 | 00000000 | 22 | Critical | No intensity detected |
| 00000000 | 00000100 | 00000000 | 00000000 | 23 | Critical | Reset failed |
| 00000000 | 00001000 | 00000000 | 00000000 | 24 | Critical | Micrometer alignment trouble |
| 00000000 | 00010000 | 00000000 | 00000000 | 25 | Critical | Zenith zeroing failure |
| 00000000 | 00100000 | 00000000 | 00000000 | 26 | Warning | Illegal command |
| 00000000 | 01000000 | 00000000 | 00000000 | 27 | - | Error code not used |
| 00000000 | 10000000 | 00000000 | 00000000 | 28 | - | Error code not used |
| 00000000 | 00000000 | 00000001 | 00000000 | 31 | Info | UV scan started |
| 00000000 | 00000000 | 00000010 | 00000000 | 32 | Info | UV scan finished |
| 00000000 | 00000000 | 00000100 | 00000000 | 33 | Warning | UV scan aborted |
| 00000000 | 00000000 | 00001000 | 00000000 | 34 | Info | End of day started |
| 00000000 | 00000000 | 00010000 | 00000000 | 35 | - | Error code not used |
| 00000000 | 00000000 | 00100000 | 00000000 | 36 | - | Error code not used |
| 00000000 | 00000000 | 01000000 | 00000000 | 37 | - | Error code not used |
| 00000000 | 00000000 | 10000000 | 00000000 | 38 | Critical | user break or unbreak |
| 00000000 | 00000000 | 00000000 | 00000001 | 41 | Critical | no data |
| 00000000 | 00000000 | 00000000 | 00000010 | 42 | Critical | UTC date failure |
| 00000000 | 00000000 | 00000000 | 00000100 | 43 | Critical | Julian date failure |
| 00000000 | 00000000 | 00000000 | 00001000 | 44 | Critical | time failure |
| 00000000 | 00000000 | 00000000 | 00010000 | 45 | Error | Manual sighting requested |
| 00000000 | 00000000 | 00000000 | 00100000 | 46 | Info | Total ozone submitted |
| 00000000 | 00000000 | 00000000 | 01000000 | 47 | Warning | Unexpected schedule running |
| 00000000 | 00000000 | 00000000 | 10000000 | 48 | - | Error code not used |

There is another set of tests that compare a parsed value against a threshold. For each test, the threshold can be obtained in one of 3 different ways:

  - Default value: the database stores a minimum and a maximum value.
  - Brewer-dependent value: the database stores a minimum and a maximum per Brewer.
  - Brewer-dependent value from the configuration: we get a value from the Brewer configuration and then subtract and add 2 values stored in the database, giving the range [config - valueA, config + valueB].

If the tested parameter is inside this range the Level should be INFO; otherwise it should be WARNING.

^ Code ^ Threshold Type ^ Threshold Default ^ Description ^
| 51 | default | [-5,5] | Azimuth discrepancy |
| 52 | config | [config - 5, config + 5] | Azimuth steps per revolution |
| 53 | default | [-2,2] | Zenith discrepancy |
| 54 | default | [-15,15] | time correction |
| 55 | TBD | TBD | micro is moved |
| 56 | 15?? | 15?? | dark count |
| 57 | TBD | TBD | micrometer #1 diode found |
| 58 | TBD | TBD | micrometer #1 reference found |
| 61 | NA* | NA* | running schedule |

== A/D Values ==

We assign a single code to every parameter. The A/D values that we extract are described below. For now we only extract the A/D values measured with the lamps on; in the future we will extract both sets of values (lamps on and lamps off).
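The threshold rule described above (three ways of obtaining the range, INFO inside it and WARNING outside) applies to the threshold codes above and, once their thresholds are defined, to the A/D codes below. A minimal sketch in Python follows; it is not the repository code. The Threshold record, its field names and the way the per-Brewer and config values are passed in are hypothetical stand-ins for what would be read from the database and the Brewer configuration, and the range bounds are assumed to be inclusive.

```python
from __future__ import annotations

from dataclasses import dataclass, field

INFO = "INFO"
WARNING = "WARNING"


@dataclass
class Threshold:
    """Hypothetical threshold record for one test code."""
    kind: str                 # "default", "brewer" or "config"
    low: float = 0.0          # "default": minimum; "config": valueA
    high: float = 0.0         # "default": maximum; "config": valueB
    # "brewer": {brewer_id: (minimum, maximum)}
    per_brewer: dict[int, tuple[float, float]] = field(default_factory=dict)


def threshold_range(th: Threshold, brewer_id: int,
                    config_value: float | None = None) -> tuple[float, float]:
    """Return the [min, max] range according to the threshold type."""
    if th.kind == "default":
        return th.low, th.high
    if th.kind == "brewer":
        return th.per_brewer[brewer_id]
    if th.kind == "config":
        if config_value is None:
            raise ValueError("a 'config' threshold needs the Brewer config value")
        # [config - valueA, config + valueB]
        return config_value - th.low, config_value + th.high
    raise ValueError(f"unknown threshold type: {th.kind!r}")


def check_value(value: float, th: Threshold, brewer_id: int,
                config_value: float | None = None) -> str:
    """INFO if the parsed value lies inside the range (bounds included), else WARNING."""
    low, high = threshold_range(th, brewer_id, config_value)
    return INFO if low <= value <= high else WARNING


if __name__ == "__main__":
    azimuth = Threshold("default", -5, 5)             # code 51
    print(check_value(3, azimuth, brewer_id=1))       # INFO
    print(check_value(-7, azimuth, brewer_id=1))      # WARNING
    steps = Threshold("config", 5, 5)                 # code 52, config +/- 5
    print(check_value(106, steps, brewer_id=1, config_value=100))  # WARNING
```

The config value of 100 in the usage lines is only illustrative; in practice it would come from the Brewer configuration.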
^ Code ^ Threshold Type ^ Threshold Default ^ Description ^
| 62 | TBD | TBD | brewer temp 1 |
| 63 | TBD | TBD | brewer temp 2 |
| 64 | TBD | TBD | brewer temp 3 |
| 65 | TBD | TBD | h.t. voltage |
| 66 | TBD | TBD | +15v power supply |
| 67 | TBD | TBD | + 5v power supply |
| 68 | TBD | TBD | -15v power supply |
| 71 | TBD | TBD | +24v power supply |
| 72 | TBD | TBD | rate meter |
| 73 | TBD | TBD | relative humidity |
| 74 | TBD | TBD | atm. pressure |
| 75 | TBD | TBD | external temp |
| 76 | TBD | TBD | + 5v ss |
| 77 | TBD | TBD | - 8v ss |
| 78 | TBD | TBD | standard lamp current |
| 81 | TBD | TBD | standard lamp voltage |
| 82 | TBD | TBD | pmt temp |
| 83 | TBD | TBD | fan temp |
| 84 | TBD | TBD | base temp |
| 85 | TBD | TBD | +12v power supply |
| 86 | TBD | TBD | -12v power supply |
| 87 | TBD | TBD | below spectro temp |
| 88 | TBD | TBD | window area temp |
| 91 | TBD | TBD | mercury lamp current |
| 92 | TBD | TBD | mercury lamp voltage |
| 93 | TBD | TBD | moisture |
| 94 | TBD | TBD | - 5v ss (v) |

There are a few more errors that we have not parsed yet.

^ Code ^ Threshold Type ^ Threshold ^ Description ^ Comments ^
| TBD | TBD | TBD | Total Ozone of the day submitted | Probably already in DB |
| TBD | TBD | TBD | Alignment of two Spectrometers | Never found in any file |
| TBD | TBD | TBD | dead time high intensity | Never found in any file |
| TBD | TBD | TBD | dead time low intensity | Never found in any file |
| TBD | TBD | TBD | found diode at step | Which diode should be extracted? |
| TBD | TBD | TBD | humidity | Is this parameter different from the A/D values? Redundant |
| TBD | TBD | TBD | manual sighting HC | Already being worked on |
| TBD | TBD | TBD | manual sighting NC | Already being worked on |
| TBD | TBD | TBD | mercury lamp intensity | Probably already in DB |
| TBD | TBD | TBD | standard lamp intensity | Probably already in DB |
| TBD | TBD | TBD | standard lamp R5 intensity | Probably already in DB |
| TBD | TBD | TBD | standard lamp R6 intensity | Probably already in DB |
| TBD | TBD | TBD | Run/Stop | Difficult to parse |
| TBD | TBD | TBD | UV Index | Already in DB |
| TBD | TBD | TBD | Ozone Quality indicator | TBD |
| TBD | TBD | TBD | UV Quality indicator | TBD |

TBD: to be defined.

NA: not applicable (there is no value to be tested).

=== Error Information Extracted ===

In addition to the information described above, we need to extract more information, because we need to check at the source whether an error is a false positive. We may also need to build more complex database queries to make a more accurate diagnosis. The information that we extract from the files is described below.

^ Field ^ Description ^
^ 'brewerid' | Unique identifier of the Brewer the error comes from |
^ 'date' | Date when the error was produced |
^ 'gmt' | Time (GMT) when the error was produced, taken from the files |
^ 'level' | Integer that represents the importance of the error |
^ 'code' | Unique error code for a given error type |
^ 'filename' | Name of the file the error was extracted from |
^ 'line' | Line inside the file where the error was extracted |
^ 'message' | Original message extracted from the file |
^ 'count' | Occurrences of the same error type in this file so far |

=== Errors pending implementation ===

== Time-related Errors ==

The following errors are difficult to extract: UTC date failure, Julian date failure and time failure. The only way to know the status of the remote clock is through the files' modification time.
When we copy those files to the local machine we lose the modification time, so it is impossible to check whether the remote clock has drifted.

Proposed solution: create a mechanism that allows the status tool to poll the remote clock in order to check whether it is synced.

== Manual sighting request ==

This error is hard to parse because in the B or D files it appears as a user command. This means that users can type the message in their own language. Sometimes we find the message in English, but most of the time it is written in a different language, which makes it impossible to parse.

=== Flow Graph ===

{{:wiki:flow_diagram_diagnosis_tool.png?500|}}

  - Every 15 minutes the Eubrewnet system pulls the B and D files from the Brewers.
  - The diagnostic tool parser receives them as input.
  - The output is written to the database and to files.
  - The Eubrewnet web page shows the extracted information.

=== Front End ===

The front end consists of two views: the main view, which is the Brewer overview, and the detailed view.

== Main View ==

When users open the front page, they see a map with the status of the Brewers. A colour code indicates the status of each Brewer: green means OK, yellow means WARNING, orange means ERROR, red means CRITICAL, grey means no data in the last 24 hours, and black means that the last reception was 15 days ago or earlier.

Example:

{{:wiki:monitor_map_overview.png?700|}}

== Detailed View ==

When the user clicks on a Brewer, they are redirected to the details of that Brewer. The Brewer Status section lists the original messages, taken from the B or D files, that explain the status of the Brewer.

The next example shows a Brewer in OK status.

{{:wiki:monitor_detailed_view1.png?700|}}

The next example shows a Brewer with several CRITICAL problems.

{{:wiki:monitor_detailed_view2.png?700|}}

=== Detailed operation ===

The entry point of the diagnosis tool is monitor.py.
The pseudocode of monitor.py is the following:

  - Create a list of paths to possible files.
  - Test whether each file exists.
  - Parse the file.
  - Write the status to a file.
  - Write the status to the database.

The third item in the previous list is the parser itself, which lives in parseSingleFile.py. parseSingleFile is the parent class of parseDFile.py and parseBFile.py, and it contains a lot of functionality for file handling and database access. parseBFile.py and parseDFile.py implement the abstract function parse_lines(), which takes the list of lines of a given file and creates a list of errors. Its pseudocode is the following:

  - Iterate over all lines of a file.
    - Iterate over all error types.
      - If the line contains the error type's specific string:
        - Create a new error.
        - Push the new error into the errorList, grouped by error code.
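The parse_lines() pseudocode can be sketched in Python as follows. This is a minimal, hypothetical sketch and not the code in parseBFile.py or parseDFile.py. The BrewerError class, the integer level values, the ERROR_TYPES search strings (copied from the Description column of the error table, assuming they appear verbatim in the files) and the hh:mm:ss pattern used to fill 'gmt' are all assumptions. Each error carries the fields listed under Error Information Extracted, the errorList is represented as a dict from error code to a list of errors, and 'count' is the number of errors with the same code found so far in the file.

```python
from __future__ import annotations

import re
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical integer levels; the values stored in the database may differ.
INFO, WARNING, ERROR, CRITICAL = 1, 2, 3, 4

# (search string, error code, level) taken from the error table.
# Assumption: the description text appears verbatim in the B/D file lines.
ERROR_TYPES = [
    ("HG failed", 14, CRITICAL),
    ("Lamp not detected", 15, CRITICAL),
    ("UV scan aborted", 33, WARNING),
]

# Assumption: the GMT time, when present, is written as hh:mm:ss on the line.
GMT_PATTERN = re.compile(r"\b(\d{2}:\d{2}:\d{2})\b")


@dataclass
class BrewerError:
    """One extracted error, with the fields listed in the documentation."""
    brewerid: int
    date: str
    gmt: str | None
    level: int
    code: int
    filename: str
    line: int
    message: str
    count: int


def parse_lines(lines, brewerid, date, filename):
    """Return {code: [BrewerError, ...]} for every line matching an error type."""
    errors = defaultdict(list)
    for line_number, text in enumerate(lines, start=1):
        for pattern, code, level in ERROR_TYPES:
            if pattern in text:
                match = GMT_PATTERN.search(text)
                errors[code].append(BrewerError(
                    brewerid=brewerid,
                    date=date,
                    gmt=match.group(1) if match else None,
                    level=level,
                    code=code,
                    filename=filename,
                    line=line_number,
                    message=text.strip(),
                    count=len(errors[code]) + 1,  # occurrences so far, this one included
                ))
    return dict(errors)


if __name__ == "__main__":
    lines = ["12:01:03 HG failed", "some other line", "12:30:00 HG failed"]
    found = parse_lines(lines, brewerid=1, date="2013-01-01", filename="B_example.txt")
    print([(e.line, e.gmt, e.count) for e in found[14]])
    # [(1, '12:01:03', 1), (3, '12:30:00', 2)]
```

Grouping by code makes it cheap to decide afterwards which bits of the status set to turn on, while each record keeps the file name, line and original message needed to check for false positives.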