COMPUTER SYSTEMS RESEARCH
Software Testing 3 - Process Modeling
3rd Quarter 2006
- Also see the Testing Report worksheet to attach your results

  • Resources:

    Mathematical modeling for verification and validation: find formulas that predict your program's outputs and behaviors independently of running it. When you run your program, does it behave in a way that is consistent with the predictions of the mathematical formulas? Investigate why your program may act differently from what the theoretical formula(s) predict.
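As a minimal sketch of this kind of check, assume a hypothetical program that simulates an object falling from rest in fixed time steps. The closed-form prediction d = g t^2 / 2 is computed independently of the simulation and the two are compared. The function names, the step sizes, and the use of explicit Euler integration are all assumptions made for illustration.

```python
def simulate_fall(t_end, dt, g=9.81):
    """Distance fallen from rest, computed with explicit Euler steps."""
    steps = round(t_end / dt)
    y, v = 0.0, 0.0
    for _ in range(steps):
        y += v * dt   # advance position using the current velocity
        v += g * dt   # then advance velocity
    return y


def predicted_fall(t_end, g=9.81):
    """Closed-form prediction, independent of the simulation."""
    return 0.5 * g * t_end ** 2


for dt in (0.1, 0.01, 0.001):
    gap = predicted_fall(2.0) - simulate_fall(2.0, dt)
    print(f"dt={dt}: prediction minus simulation = {gap:.5f}")
```

In this sketch the gap shrinks in proportion to the step size, which points to discretization error in the simulation rather than an error in the formula.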

    1. Vocabulary for process modeling
      • Process modeling
           Process modeling is the concise description of the total variation in one
        quantity, y, by partitioning it into

        1. a deterministic component given by a mathematical function of one or more other quantities, x_1, x_2, \ldots, plus

        2. a random component (What is process modeling?)
      • Model components:
        	"There are three main parts to every process model. These are
        
        1. the response variable, usually denoted by y,

        2. the mathematical function, usually denoted as f(\vec{x};\vec{\beta}), and

        3. the random errors, usually denoted by \varepsilon." (terminology)
      • The response variable
          The response variable, y, is a quantity that varies in a way that we hope to be able to
          summarize and exploit via the modeling process. Generally it is known that the variation
          of the response variable is systematically related to the values of one or more other variables
          before the modeling process is begun, although testing the existence and nature of this
          dependence is part of the modeling process itself.
        
        

      • The mathematical function
           The mathematical function consists of two parts.  
           These parts are the predictor variables,  x_1, x_2, \ldots , and the parameters, \beta_0, \beta_1, \ldots.  
           The predictor variables are observed along with the response variable. 
           They are the inputs to the mathematical function, f(\vec{x};\vec{\beta}).  
           The collection of all of the predictor variables is denoted by \vec{x} for short.  
           

         \vec{x} \equiv (x_1, x_2, \ldots)

        The parameters are the quantities that will be estimated during the modeling process. Their true values are unknown and unknowable, except in simulation experiments. As for the predictor variables, the collection of all of the parameters is denoted by  \vec{\beta} for short.

         \vec{\beta} \equiv (\beta_0, \beta_1, \ldots)

        The parameters and predictor variables are combined in different forms to give the function used to describe the deterministic variation in the response variable. For a straight line with an unknown intercept and slope, for example, there are two parameters and one predictor variable:

         f(x;\vec{\beta}) = \beta_0 + \beta_1x.

        For a straight line with a known slope of one, but an unknown intercept, there would only be one parameter

         f(x;\vec{\beta}) = \beta_0 + x.

        For a quadratic surface with two predictor variables, there are six parameters for the full model.

         f(\vec{x};\vec{\beta}) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_{12}x_1x_2 + \beta_{11}x_1^2 + \beta_{22}x_2^2.

        (Terminology for process models)
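The three example functions above can be written directly as code, which makes the parameter counts easy to check. This is only a sketch: the function names and the tuple layout of `beta` are assumptions, not part of the quoted terminology.

```python
def line(x, beta):
    """Straight line: two parameters (beta_0, beta_1), one predictor."""
    b0, b1 = beta
    return b0 + b1 * x


def line_known_slope(x, beta):
    """Straight line with the slope fixed at one: one parameter (beta_0)."""
    (b0,) = beta
    return b0 + x


def quadratic_surface(x, beta):
    """Full quadratic surface: six parameters, two predictors (x_1, x_2)."""
    x1, x2 = x
    b0, b1, b2, b12, b11, b22 = beta
    return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2 + b11 * x1 ** 2 + b22 * x2 ** 2


print(line(2.0, (1.0, 3.0)))                                          # 7.0
print(line_known_slope(2.0, (1.0,)))                                  # 3.0
print(quadratic_surface((1.0, 2.0), (1.0, 1.0, 1.0, 1.0, 1.0, 1.0)))  # 11.0
```

Unpacking `beta` into a fixed number of names means that passing the wrong number of parameters raises an error instead of silently giving a wrong value.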

      • Random error
            Like the parameters in the mathematical function, the random errors
        are unknown.  They are simply the difference between the data and the
        mathematical function.  They are assumed to follow a particular probability
        distribution, however, which is used to describe their aggregate behavior.
        The probability distribution that describes the errors has a mean of zero
        and an unknown standard deviation, denoted by  \sigma,
        that is another parameter in the model, like the  \beta's.
        
            (Terminology for process models)
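A simulation experiment is the one setting where the true parameters are known, so it is a convenient way to see the random errors directly. The sketch below generates data as y = f(x; beta) + epsilon for a straight line and then recovers the errors as data minus function. The parameter values, the sample size, and the choice of a normal distribution for the errors are assumptions made for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# "True" values, known here only because the data are simulated
beta0, beta1, sigma = 2.0, 0.5, 1.5

xs = [i / 100 for i in range(10000)]
# Each observation is the deterministic part plus a random error
ys = [beta0 + beta1 * x + random.gauss(0.0, sigma) for x in xs]

# Random errors: the difference between the data and the mathematical function
errors = [y - (beta0 + beta1 * x) for x, y in zip(xs, ys)]

print("mean of errors:", statistics.mean(errors))
print("standard deviation of errors:", statistics.stdev(errors))
```

With this many points the mean of the errors should come out close to zero and their standard deviation close to the chosen sigma.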
         

    2. Four main purposes for process models:
          Process models are used for four main purposes:
      	   1. estimation
      	   2. prediction
      	   3. calibration
      	   4. optimization
      
         (Process models)
      
      • Estimation
                The goal of estimation is to determine the value of the regression function  
        	(i.e., the average value of the response variable), for a particular combination 
        	of the values of the predictor variables.
              
      • Prediction
                The goal of prediction is to determine either
        	   1. the value of a new observation of the response variable, or
        	   2. the values of a specified proportion of all future observations of the response variable
        	for a particular combination of the values of the predictor variables.
              
      • Calibration
                 The goal of calibration is to quantitatively relate measurements made using one 
        	 measurement system to those of another measurement system.
               
      • Optimization
                 Optimization is performed to determine the values of process inputs that should be 
        	 used to obtain the desired process output. Typical optimization goals might be to 
        	 maximize the yield of a process, to minimize the processing time required to fabricate 
        	 a product, or to hit a target product specification with minimum variation in order to 
        	 maintain specified tolerances.
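Calibration and optimization can each be sketched in a few lines once a model has been fitted. In the example below, the instrument readings, the reference values, and the quadratic yield coefficients are all made-up numbers, and `fit_line`, `calibrate`, and `best_input` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


# Calibration: relate readings from one measurement system to reference values.
reference = [0.0, 2.0, 4.0, 6.0, 8.0]
readings = [0.5 + 1.1 * r for r in reference]  # made-up, noise-free readings
b0, b1 = fit_line(reference, readings)


def calibrate(reading):
    """Convert an instrument reading to the reference scale by inverting the fit."""
    return (reading - b0) / b1


# Optimization: choose the input that maximizes a fitted quadratic model.
def best_input(c0, c1, c2):
    """Maximizer of c0 + c1*x + c2*x**2; only defined when c2 < 0."""
    if c2 >= 0:
        raise ValueError("the model has no maximum unless c2 < 0")
    return -c1 / (2 * c2)


print("reading 6.0 on the reference scale:", calibrate(6.0))
x_best = best_input(10.0, 4.0, -0.5)
print("best input:", x_best, "modeled yield:", 10.0 + 4.0 * x_best - 0.5 * x_best ** 2)
```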
               

    3. Statistical methods for model building
         There is often more than one statistical tool that can be effectively 
         applied to a given modeling application. 
      
         Some of the more well-established statistical techniques useful for different 
         model-building situations are:
      
      	   1. Linear Least Squares Regression
      	   2. Nonlinear Least Squares Regression
      	   3. Weighted Least Squares Regression
      	   4. LOESS (aka LOWESS) 
      
      	   (Statistical methods for model building)
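To make the difference between two of these methods concrete, the sketch below fits a straight line by ordinary and by weighted least squares. The data are made up, and the weights are chosen by hand for illustration; in practice weights are usually taken to be inversely proportional to the variance of each observation. `fit_line` is a hypothetical helper, not a library function.

```python
def fit_line(xs, ys, weights=None):
    """Weighted least-squares line; equal weights give ordinary least squares."""
    if weights is None:
        weights = [1.0] * len(xs)
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 20.0]  # follows 1 + 2x except for a noisy last point

ordinary = fit_line(xs, ys)
weighted = fit_line(xs, ys, weights=[1.0, 1.0, 1.0, 1.0, 0.01])
print("ordinary (intercept, slope):", ordinary)
print("weighted (intercept, slope):", weighted)
```

Giving the unreliable point a small weight pulls the fitted slope back toward the value suggested by the other four points.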
      

    4. Basic steps for developing an effective process model
           	The basic steps of the model-building process are:
      
      	   1. model selection
      	   2. model fitting
      	   3. model validation 
      
      	These three basic steps are used iteratively until an appropriate model 
      	for the data has been developed. In the model selection step, plots of 
      	the data, process knowledge and assumptions about the process are used 
      	to determine the form of the model to be fit to the data. Then, using 
      	the selected model and possibly information about the data, an appropriate 
      	model-fitting method is used to estimate the unknown parameters in the model. 
      	When the parameter estimates have been made, the model is then assessed 
      	to see if the underlying assumptions of the analysis appear plausible. 
      	If the assumptions seem valid, the model can be used to answer the 
      	scientific or engineering questions that prompted the modeling effort. 
      	If the model validation identifies problems with the current model, 
      	however, then the modeling process is repeated using information from 
      	the model validation step to select and/or fit an improved model.
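The select, fit, validate loop can be sketched with two candidate models and a deliberately crude validation check. The data below are simulated from a curve, so the straight line in x should fail validation and the line in x^2 should pass. Counting sign changes in the residuals is an assumption chosen for brevity: residuals from an adequate model should flip sign often, while a systematic pattern gives long runs of one sign. Real validation would normally also use residual plots.

```python
import random


def fit_line(zs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * z."""
    n = len(zs)
    mean_z = sum(zs) / n
    mean_y = sum(ys) / n
    szz = sum((z - mean_z) ** 2 for z in zs)
    szy = sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
    b1 = szy / szz
    return mean_y - b1 * mean_z, b1


def sign_changes(residuals):
    """Count sign flips between neighbouring residuals (a crude pattern check)."""
    return sum(1 for a, b in zip(residuals, residuals[1:]) if a * b < 0)


random.seed(0)
xs = [i / 10 for i in range(51)]
ys = [2.0 + 3.0 * x ** 2 + random.gauss(0.0, 0.5) for x in xs]  # made-up curved data

# Candidate models, both linear in the parameters: y = b0 + b1 * g(x)
candidates = {"line in x": lambda x: x, "line in x^2": lambda x: x ** 2}

results = {}
for name, g in candidates.items():           # 1. model selection
    zs = [g(x) for x in xs]
    b0, b1 = fit_line(zs, ys)                # 2. model fitting
    residuals = [y - (b0 + b1 * z) for y, z in zip(ys, zs)]
    results[name] = sign_changes(residuals)  # 3. model validation
    print(f"{name}: {results[name]} sign changes in residuals")
```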
      

    5. Predictions and estimations
          	Once a model that gives a good description of the process has been developed, 
      	it can be used for estimation or prediction.
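The difference between estimation and prediction shows up in the uncertainty, not in the point value. The sketch below fits a straight line to made-up data and reports both at one input, x0. The multiplier 2 is a rough stand-in for the proper t-distribution quantile, and all variable names and data values are assumptions for illustration.

```python
import math
import random

random.seed(2)
xs = [float(i) for i in range(20)]
ys = [1.0 + 0.8 * x + random.gauss(0.0, 1.0) for x in xs]  # made-up data

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
b0 = mean_y - b1 * mean_x

# Residual standard deviation, an estimate of sigma (n - 2 degrees of freedom)
s = math.sqrt(sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))

x0 = 10.0
fitted = b0 + b1 * x0                        # same point value for both tasks
leverage = 1.0 / n + (x0 - mean_x) ** 2 / sxx
se_estimate = s * math.sqrt(leverage)        # uncertainty of the average response
se_predict = s * math.sqrt(1.0 + leverage)   # uncertainty of one new observation

print(f"estimate at x0:   {fitted:.2f} +/- {2 * se_estimate:.2f}")
print(f"prediction at x0: {fitted:.2f} +/- {2 * se_predict:.2f}")
```

The prediction interval is always the wider of the two, because a new observation carries its own random error in addition to the uncertainty in the fitted line.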