This chapter describes the position of reliability analysis in mechanical engineering design. It outlines the most frequently used reliability analysis techniques, and describes the problem statement, the research approach, and the outline of this thesis.
A designer usually uses a deterministic method to design a structure. Standards prescribe discrete values for the loads on the structure and for the allowable material stresses in the structure. The standards guarantee that the structure will not fail if the actual loads do not exceed the prescribed loads and the stresses in the structure do not exceed the allowable stresses. More precisely: the probability that a structure will fail is acceptably small if the designer follows the standards. (The loads and the allowable stresses in these standards are themselves determined with a probabilistic method that assumes probability distributions for the load and for the material strength.) In many cases this method is satisfactory.
However, in a growing number of cases this design method is not adequate. Since the Second World War, mankind has created structures that are far more complex than ever before, such as spacecraft and nuclear power plants. It appears that when a large number of reliable components is combined into a large structure, the result is not necessarily a reliable structure. The consequences of failure of these structures can be extremely large: failure could lead to loss of human lives or to large economic damage. Because of these consequences, it is very important to secure the reliability of these structures. Therefore, risk analysis experts help the designers to guarantee the safety of these complex structures. They analyse with a probabilistic model whether the system can perform its major functions, such as carrying load and executing motion. Both the load carrying capacity of a system and the external loads can show stochastic behaviour. This thesis concentrates on modelling the stochastic behaviour of the system; minor attention is given to the stochastic character of the loads.
Risk analysis techniques have evolved from analysis techniques for hazardous situations into techniques for the economic optimisation of maintenance. Nuclear and rocket scientists first applied the techniques in the United States. The development of nuclear power plants and of the storm surge barriers in the Delta Works introduced risk analysis in The Netherlands. The experts who worked on these projects now teach students at Delft University of Technology how to use risk analysis [41], and use the techniques to economically optimise the maintenance of the Dutch conventional power plants. They also give lectures [23] to transfer their knowledge to a larger group of people and to stimulate the application of risk analysis. Originally, the techniques were applied only in cases that could lead to hazardous situations; now they are applied more and more in everyday cases.
The most frequently used risk analysis techniques are Failure Modes and Effects Analysis (FMEA), Fault Tree Analysis (FTA), and event tree analysis. Appendix D, Beem and others [8], Henley [11], Rao [12], Knezevic [13], Van Gestel and others [22], O’Connor [44], Carter [45], Zacks [55], Lewis [56], Ushakov [57], Vrijling [60], and Ansell [64] describe these techniques in detail. This section gives only a brief description of the techniques.
Fault tree analysis is the most commonly used technique, because the results of the analysis quantify the reliability, which makes it possible to compare design solutions and to locate the critical spots in a structure.
This technique, however, is complex and labour intensive. First, to construct the fault tree it is necessary to understand both the behaviour of the structure to be analysed and the fault tree analysis method. Second, to analyse the fault tree it is necessary to understand the rules of Boolean algebra. Third, to quantify the probability of failure, it is necessary to understand probabilistic mathematics and to know the appropriate failure data of the components of the structure. Yamashina [35] puts it as follows: ‘However, the fault tree construction itself is a tedious, error prone, and time-consuming task.’
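As an illustration of the quantification step only - the component names, probabilities, and cut sets below are invented examples, and this is not the method of any of the references above - the probability of the top event of a small fault tree can be bounded from its minimal cut sets, the combinations of component failures found with Boolean algebra:

```python
# Illustrative sketch of quantifying a fault tree from its minimal cut sets.
from math import prod

# Failure probabilities of the basic events (components) per demand (example values).
p = {"motor": 1e-3, "brake": 5e-4, "pump_A": 2e-3, "pump_B": 2e-3}

# Minimal cut sets: the top event occurs if all components in one set fail.
minimal_cut_sets = [
    {"motor"},               # single-point failure
    {"brake"},               # single-point failure
    {"pump_A", "pump_B"},    # redundant pumps must both fail
]

# Rare-event (upper-bound) approximation:
# P(top) <= sum over the cut sets of the product of the component probabilities.
p_top = sum(prod(p[c] for c in cut_set) for cut_set in minimal_cut_sets)
print(f"Top event probability (upper bound): {p_top:.2e}")   # about 1.50e-03
```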
Therefore, the designer usually does not execute the fault tree analysis himself; a risk analysis expert assists him in this task. The expert analyses the structure at the end of the design process, when the layout of the structure has been determined. At this stage of the design process, however, it is no longer possible to introduce major changes to the structure. Therefore, the results of the analysis have little influence on the design.
The reliability analysis would have a major influence on the design if it were applied during the conceptual design phase. This would result in more reliable and less expensive structures: a structure that is reliable in concept is less expensive than a structure that is not reliable in concept and has to be improved in a later phase of the design process.
The introduction of reliability analysis in the conceptual design phase would have consequences for the design process. The designer and the risk analyst would have to work closely together, and the risk analyst would have to make fault tree analyses for many design variants, which increases the cost of the design.
Automation can make the analysis less complex, can reduce the time of an analysis, and can prevent errors. Therefore, attempts are made to automate parts of this process. Software that analyses the fault tree and quantifies the probability of failure has been implemented successfully (Isograph Ltd. [69], and many others). Kócza [28], Robitaile [32], O’Hern [32b], Sacks [33], Matsuoka [34], Yamashina [35], Kohda [36], and Takahashi [37] also attempt to automate the construction of the fault tree. They all use some sort of flow diagram to model the structure being analysed; their software automatically generates a fault tree from the flow diagrams and analyses it.
This thesis moves one step further: can automation support the reliability analysis in such a way that the designer can execute the analysis himself? In this way the designer would not have to depend on the availability of a risk analysis expert. He would not have to wait for the results of the analysis. He could execute the analysis himself, and immediately decide whether the design should be improved or not. The designer could optimise the reliability of a structure by determining the reliability for a number of concepts. Thus, reliability analysis could be applied in many more cases than before. This should lead to better designs.
To verify this idea, this thesis will answer the following questions:
This thesis focuses on the analysis of the structure only, but it should be clear to the reader that failure of the structure is only a (small) part of the total risk considered by a risk analyst. The paragraphs below give a general overview of the environment of a structure. The structure and its environment form a system, which Vrijling [42], [43] divides into five layers: the natural system, the technical system (the structure itself), the professional system, the users, and the bystanders.
The first two layers of this system cause risk to the next three layers. Examples of risks caused by the natural system are storms, extremely high tides, and flooding. Examples of risks caused by the technical system are the collapse of a structure or the explosion of an installation such as a chemical plant.
The amount of risk that is acceptable depends on the benefits that people receive from taking the risk. The people in the fifth layer, the bystanders, receive no benefits; thus, the acceptable risk for them is very low. The people in the fourth layer, the users, receive some benefits from using the structure, such as crossing the bridge or using the electricity; thus, the acceptable risk for them is higher. The people in the third layer, the professional system, receive benefits in the form of salaries; they receive the highest benefits, and thus the acceptable risk for them is the highest.
The paragraph above describes the acceptable risk for an individual. A different approach is necessary for a group of individuals. If a large group of people is exposed to the same source of risk, failure could lead to a large number of casualties. Society does not accept large numbers of casualties, even though the individual risk is equal for all individuals and might be acceptably low. Therefore, the acceptable risk is lower for risk sources that can cause a large number of casualties than for sources that can cause a smaller number of casualties.
Of course it is technically possible to reduce the risk caused by a structure to a very low level. However, to reduce the risk, extra investments in the structure are necessary. Section 2.1 discusses the economically optimal acceptable risk.
This thesis discusses the analysis of the second layer, the technical system: the structure. Within the analysis of the structure, the analysis of the drive train is only a small part. Berenbak [63] divides the reliability analysis of the storm surge barrier in the New Waterway into many subanalyses of subsystems, such as the civil structure, the mechanical installation, the electrical installation, and the software. The TAW [24], [25] also demonstrates that the drive train is a small part of the total risk analysis. It distinguishes three subsystems within the analysis: first, the water barrier - the dike; second, the artefact in the barrier - the navigation lock or the storm surge barrier; third, the closing operation of the artefact. Nevertheless, the drive train is important for the total safety, because the closing operation, in which the drive train plays a central role, consumes a large part of the allowable probability of failure.
The third and fourth layers not only determine the allowable probability of failure, they also influence the actual probability of failure of a structure. The way a structure is maintained and used can have both a positive and a negative influence on the performance of the structure. This thesis does not discuss this subject, and assumes that the circumstances for the analysed structures are all equal and comparable with the circumstances of civil structures in The Netherlands.
The term Computer Aided Design system suggests that such a system actively supports the designer in the design process. The classic CAD systems, however, are more or less automated drawing boards; these programs support drafting, not designing. A CAD drawing consists of geometry only, and the designer, not the system, knows the interpretation of the geometry. Thus, a classic CAD system offers only limited design support.
Drafting programs do not use the extra possibilities of a computer, which can store not only the geometry but also the semantics (the meaning of objects and the relationships between objects) of a design. The extra knowledge that is stored in this manner offers the possibility to automate design analysis. Since analysing design solutions plays an important role in the design process, integration of this task in a design system can improve the support of the designer. The system can present the results of expert calculations directly to the designer, which increases the speed of the design process. Thus, design will take less time, more design variants can be made and compared, and the quality of the designs will be higher.
The individual components of a design and the way they are connected determine the analysis results. Therefore, a design should not consist of geometry only, but should be built up from components with meaning - elements or objects - that are recognisable to both the computer and the designer. In this way, a design is not built from geometric entities, such as lines, circles, and arcs, but from meaningful construction components, such as gears, roller bearings, shafts, racks and pinions, and hydraulic cylinders, so the computer can interpret the design. Schwab and Van der Werff [1] call this method Design with Discrete Components, while Wouters [2] calls it Design with Design Elements. More generally, it is the method of Primitive Instancing (Taylor [68]).
This research project implemented this idea in software. The software contains a component modeller that helps the designer to compose a structure, and an analysis program that calculates the reliability of the structure. The modeller enables the designer to think in terms of mechanical components rather than in terms of reliability analysis. The modeller stores the design as components instead of plain geometry, and it also stores non-geometrical data, such as the probabilities of failure, with the components. The modeller is coupled directly to the fault tree analysis program, which determines the failure modes and quantifies the (un)reliability of a design. Furthermore, it indicates which components of a structure influence the reliability the most.
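The sketch below illustrates what such 'components with meaning' can look like in software. The class names, attributes, and numbers are hypothetical and only illustrate the idea; they are not the data model of the developed modeller.

```python
# Minimal sketch of a component modeller that stores meaningful components
# together with non-geometrical reliability data.
from dataclasses import dataclass, field

@dataclass
class Component:
    name: str            # e.g. "gearbox", "roller bearing", "hydraulic cylinder"
    nodes: tuple         # the nodes the component connects in the drive train
    failure_rate: float  # failures per hour, taken from a failure data source

@dataclass
class DriveTrainModel:
    components: list = field(default_factory=list)

    def add(self, component: Component) -> None:
        self.components.append(component)

# The designer composes the structure in terms of mechanical components;
# the stored model also carries the data needed for the reliability analysis.
model = DriveTrainModel()
model.add(Component("motor", (1, 2), failure_rate=5e-6))
model.add(Component("gearbox", (2, 3), failure_rate=2e-6))
model.add(Component("rack and pinion", (3, 4), failure_rate=1e-6))
```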
To automate the fault tree analysis, it is necessary to make a more abstract description of the functions of a structure. It is too ambitious to describe all functions of every type of structure. Therefore, this research project applies the theory to the design practice of the Mechanical Engineering Department of the Construction Division of the Ministry of Transport, Public Works, and Water Management. This department designs the drive trains of moveable bridges, lock gates, and other structures. A reliability analysis is part of the design process of these structures: the law and the standards demand such an analysis for structures in water barriers, and the customers also ask for a reliability and availability analysis of other structures. Until now, external experts executed these analyses. However, the design department wishes to integrate reliability analysis into the design process to improve its designs.
The major functions of a drive system can be decomposed into a function 'carry load' and a function 'execute motion'. Thus, an abstract description of only two functions is necessary. It is possible to describe these two functions with a specially adapted finite element theory. The equilibrium equations, Dᵀσ = f, describe the function 'carry load', and the continuum equations, ε = Du, describe the function 'execute motion'. This idea was inspired by Besseling [10] and Van der Werff [17].
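Written out as display equations (a sketch in conventional finite element notation; the symbol names used here are the usual ones and may differ in detail from the notation of Chapter 3), both functions are governed by the same matrix D:

```latex
% Function 'carry load': the generalised stresses \sigma must be in
% equilibrium with the external nodal loads f.
D^{\mathsf{T}} \sigma = f

% Function 'execute motion': the deformations \varepsilon follow from the
% nodal displacements u; a kinematically permissible u must exist that
% realises the desired motion.
\varepsilon = D\,u
```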
The analysis program uses these equations for the reliability analysis. Assume that a particular combination of components has failed; this can be expressed in the finite element equations. When it is no longer possible to find a permissible stress distribution σ that carries the load, or a kinematically permissible displacement field u that realises the desired motion, the combination of failed components is a failure mode. All failure modes are found by trying all combinations. Finally, the probability of failure of each failure mode is calculated.
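A minimal sketch of this enumeration is given below. It assumes a linear model in which every component contributes columns to Dᵀ and shows only the 'carry load' test; the example structure, the component names, and the numbers are invented for illustration and are not taken from the thesis software.

```python
# Illustrative sketch: try combinations of failed components and test whether
# the remaining structure can still carry the external load.
from itertools import combinations
import numpy as np

def carries_load(D_T, f, failed, cols_of):
    """True if a stress distribution sigma with D^T . sigma = f exists using
    only the stress components of the surviving (non-failed) components."""
    surviving = [c for name, cols in cols_of.items() if name not in failed for c in cols]
    if not surviving:
        return False
    A = D_T[:, surviving]
    # f is attainable iff appending it as an extra column does not raise the rank.
    return np.linalg.matrix_rank(np.column_stack([A, f])) == np.linalg.matrix_rank(A)

# Hypothetical example: two hydraulic cylinders, either of which can hold the load.
cols_of = {"cylinder_A": [0], "cylinder_B": [1]}   # columns of D^T per component
D_T = np.array([[1.0, 1.0]])                       # both act on the same degree of freedom
f = np.array([10.0])                               # external load to be carried

components = list(cols_of)
failure_modes = []
for k in range(1, len(components) + 1):
    for failed in combinations(components, k):
        if not carries_load(D_T, f, set(failed), cols_of):
            failure_modes.append(set(failed))

print(failure_modes)   # [{'cylinder_A', 'cylinder_B'}]: only the joint failure is critical
```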
A fault tree analysis can describe the reliability of only one function of a system at a time, for one specific geometrical configuration and load case. A reliability analysis, however, should consider all significant functions of a structure. Therefore, the designer should make a separate fault tree analysis for each function.
Chapter 2 describes the design process, various reliability analysis techniques, and the functions of a drive train. It ends with an overview of the automated reliability analysis method.
Chapter 3 discusses the finite element method and an adaptation of the method to describe the behaviour of hydraulic components in drive systems. The classic finite element theory is appropriate to describe the behaviour of mechanical components, such as a shaft, roller bearings, and a pair of gears. However, the theory is not suitable to model hydraulic components, such as pipes and valves. To model these components, this thesis uses a specially adapted finite element theory that is analogous to the theory for mechanical components. The theory for mechanical components describes the mechanical domain, while the theory for hydraulic components describes the hydraulic domain. Schwab and Van der Werff [1] combine both domains into a multi-domain system. This thesis also combines both domains, but with a slightly different method.
Chapter 4 describes in detail the software that was built. It introduces a new concept of program architecture for finite element software, which fully separates the implementation of the algorithms from the implementation of the elements. Usually, the programmer stores the definitions of the elements in separate source files. The main program calls the routines in these files, for instance a routine to number nodes, a routine to number deformations, a routine to build matrices, and many more. This program architecture requires the implementation of the same general theory in many different places in the source code. The programmer must adjust all source files when an algorithm is changed or a new one is developed, and to create a new element many algorithm routines must be copied and adjusted. Adjusting one of them is easily forgotten.
This research project produced both new algorithms, such as an algorithm to determine the failure modes of a structure, and special elements, such as a gear element (Rankers [70]). The architecture of the developed program tackles the problem described above: it separates the implementation of the theory from the implementation of the elements in such a way that changing or developing an algorithm cannot introduce errors in the elements and, vice versa, changing or developing an element cannot introduce errors in the algorithms. To achieve this, the elements are defined in recipe files (see Figure 1.4-1), which the program reads at run time.
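The sketch below illustrates the recipe-file idea. The file format, field names, and element description are invented for this illustration and are not the format shown in Figure 1.4-1.

```python
# Sketch: element types are described in a data file that the program reads at
# run time, so the algorithms never contain element-specific source code.
from dataclasses import dataclass

RECIPE = """
element: truss
nodes: 2
deformations: 1
failure_modes: rupture, buckling
"""

@dataclass
class ElementType:
    name: str
    nodes: int
    deformations: int
    failure_modes: list

def read_recipe(text: str) -> ElementType:
    fields = {}
    for line in text.strip().splitlines():
        key, value = (part.strip() for part in line.split(":", 1))
        fields[key] = value
    return ElementType(
        name=fields["element"],
        nodes=int(fields["nodes"]),
        deformations=int(fields["deformations"]),
        failure_modes=[m.strip() for m in fields["failure_modes"].split(",")],
    )

truss = read_recipe(RECIPE)
print(truss)   # the algorithms only ever see this generic description
```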
Chapter 5 describes methods to quantify the minimal cut sets - the failure modes - of a fault tree. These methods are not new; ample literature is available describing various models to quantify minimal cut sets. However, it is not always clear when and how these models should be applied. Chapter 5 gathers theory from various sources and describes how it can be applied in the analysis of drive trains of civil structures.
Van Gestel and others [22] describe a method to calculate the unavailability from a fault tree. They consider unavailability only, which is an important parameter for the reliability of continuously operating systems. For discontinuously operating systems, however, the probability of failure is also important. Furthermore, Van Gestel and others consider only continuous and discontinuous processes, which is not sufficient (Kiestra [6] and Van Geijlswijk [9]). Failure of a system can occur during four phases: rest, start, action, and stop. Rest and action are continuous processes, while start and stop are discontinuous processes. A different failure mechanism takes place in each phase; for example, a roller bearing fails due to corrosion during the rest phase and due to fatigue during the action phase. Different data should be applied for each failure mechanism, even though both phases are continuous processes. Chapter 5 demonstrates how to calculate the probability of failure and the unavailability in each phase; depending on the phase, different models must be used.
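The sketch below illustrates this distinction with common textbook models: a constant failure rate for the continuous phases (rest, action) and a probability of failure per demand for the discontinuous phases (start, stop). The numerical values are invented, and the models actually used in Chapter 5 may differ in detail.

```python
# Illustrative textbook models for the four phases: rest, start, action, stop.
from math import exp

def p_fail_continuous(rate_per_hour: float, duration_h: float) -> float:
    """Probability of failure in a continuous phase (rest or action),
    assuming a constant failure rate."""
    return 1.0 - exp(-rate_per_hour * duration_h)

def p_fail_demand(p_per_demand: float, demands: int = 1) -> float:
    """Probability of failure in a discontinuous phase (start or stop),
    modelled as a probability of failure per demand."""
    return 1.0 - (1.0 - p_per_demand) ** demands

# Roller bearing example from the text: different data for each phase.
p_rest   = p_fail_continuous(rate_per_hour=1e-7, duration_h=8000.0)  # corrosion at rest
p_action = p_fail_continuous(rate_per_hour=5e-6, duration_h=200.0)   # fatigue in action
p_start  = p_fail_demand(p_per_demand=1e-4, demands=12)              # demands per year
print(p_rest, p_action, p_start)
```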
Chapter 6 elaborates on the quality aspects of this project. It divides the quality of the project results into three categories: the quality of the results of the algorithms that were developed, the ability to model and analyse real structures, and the influence of the software on the designs of these structures. Chapter 6 refers to the quality control of these categories as qualification, verification, and evaluation.
Chapter 7 describes the introduction of the automated reliability analysis software in the design process. The introduction of software does not always succeed; Chapter 7 discusses the factors for success (and failure). It is only possible to introduce such a design system successfully when the users accept the system and are willing to use it. A test group of future users can ease the acceptance of the system. Test group members should be carefully selected: they should be representative of the total group of users, they should be held in high esteem by their colleagues, and they should be willing to accept changes - young people accept changes more easily.