When testing software, it is important to consider which inputs will be used and how they will affect the outcome and usefulness of the test. There are several design techniques for determining input variables for test cases, but the two I will focus on in this post are boundary value testing and equivalence class testing.
Boundary value testing is designed to, as the name suggests, test the boundary values of a given variable. Specifically, this design technique targets the extreme minimum, center, and extreme maximum values of a variable, and it assumes that only a single variable is faulty at a time. In normal boundary value testing, the following values are selected for a given variable: the minimum possible value, the minimum plus one, a nominal value near the middle, the maximum minus one, and the maximum possible value. Normal testing only tests valid inputs. Robust boundary value testing tests the same values as normal testing plus invalid values: the minimum value minus one and the maximum value plus one.
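As a rough sketch, picking the boundary values for a single variable might look something like this in Python (the 1-to-12 limits are just an invented example of a month field):

```python
# A minimal sketch of boundary value selection for a single variable.
# The limits (1 and 12 for a made-up "month" field) are purely illustrative.

def normal_boundary_values(minimum, maximum):
    """Min, min+1, a nominal middle value, max-1, and max (valid inputs only)."""
    nominal = (minimum + maximum) // 2
    return [minimum, minimum + 1, nominal, maximum - 1, maximum]

def robust_boundary_values(minimum, maximum):
    """The normal values plus the invalid values just outside the range."""
    return [minimum - 1] + normal_boundary_values(minimum, maximum) + [maximum + 1]

print(normal_boundary_values(1, 12))   # [1, 2, 6, 11, 12]
print(robust_boundary_values(1, 12))   # [0, 1, 2, 6, 11, 12, 13]
```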
Equivalence class testing divides the possible values of a variable into equivalence classes based on the output each value produces. Like boundary value testing, it can be divided into robust and normal, where robust testing includes invalid values and normal does not. In addition, equivalence class testing can be weak or strong. Weak testing uses one value from each partition and assumes that only a single variable is at fault; strong testing tests all possible combinations of equivalence classes and accounts for multiple variables being at fault.
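Here is a small sketch of the weak versus strong distinction using two made-up variables; the partitions and representative values are invented purely for illustration:

```python
# A hedged sketch of weak vs. strong equivalence class testing for two variables.
# The partitions (age and income brackets) and values below are not from any real system.
from itertools import product

age_reps    = {"minor": 10, "adult": 35, "senior": 70}         # one value picked from each class
income_reps = {"low": 20_000, "mid": 60_000, "high": 150_000}

# Weak: cover each class at least once, one value per partition (single-fault assumption).
weak_tests = list(zip(age_reps.values(), income_reps.values()))

# Strong: every combination of classes across the variables (multiple-fault assumption).
strong_tests = list(product(age_reps.values(), income_reps.values()))

print(weak_tests)       # 3 test cases
print(strong_tests)     # 9 test cases
```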
Something that has haunted me since last semester is Gradle. Gradle is a flexible tool used in the automation process that can build nearly any software and allows for the testing of that software to happen automatically. I know how to use it, but it has been difficult for me to understand how Gradle actually works. So I wanted to do some research and attempt to rectify this.
Gradle’s own user manual, linked below in the ‘Sources’ section, provides a high-level summary of the tool’s inner workings that has been instrumental in developing my own understanding of it. According to the manual, Gradle “models its builds as Directed Acyclic Graphs (DAGs) of tasks (units of work).” When building a project, Gradle sets up a series of tasks and links them together; this linkage, which accounts for the dependencies of each task, forms the DAG. Because most build processes can be modeled this way, Gradle can be so flexible. The actual tasks consist of actions, inputs, and outputs.
Gradle’s build lifecycle consists of three phases: initialization, configuration, and execution. During the initialization phase, Gradle establishes the environment the build will use and determines which projects are involved in the build. The configuration phase builds the Directed Acyclic Graph discussed previously: it evaluates the project’s build code, configures the tasks that need to be executed, and determines the order in which they must run. This evaluation happens every time the build is run. Those tasks are then run during the execution phase.
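As a conceptual illustration only (not Gradle’s actual code), the idea of a DAG of tasks whose dependencies determine an execution order can be sketched in a few lines of Python; the task names below are just typical stand-ins for a Java-style build:

```python
# A conceptual sketch (not Gradle's implementation) of modeling a build as a DAG of
# tasks and resolving an execution order from task dependencies.
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical tasks; each maps to the tasks it depends on.
task_dependencies = {
    "compileJava": [],
    "processResources": [],
    "classes": ["compileJava", "processResources"],
    "test": ["classes"],
    "jar": ["classes"],
    "build": ["test", "jar"],
}

# "Configuration" works out the order; "execution" would then run each task's actions.
execution_order = list(TopologicalSorter(task_dependencies).static_order())
print(execution_order)  # e.g. ['compileJava', 'processResources', 'classes', 'test', 'jar', 'build']
```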
When considering how to test software, there are two major options: white box testing and black box testing. Both are uniquely useful depending on the purpose of your test.
White box testing is a technique that examines the code and internal structure of a piece of software. This technique is often automated and used within CI/CD pipelines. It is intended to focus on the implementation and architecture of an application’s code. As such, it is useful for identifying security risks, finding flaws in the code’s logic, and testing the implementation of the code’s algorithms. Some examples of white box testing include unit testing, integration testing, and static code analysis.
Black box testing is a technique that tests a system and requires no knowledge of that system’s code. This technique involves a tester providing input to the system and monitoring for an expected output. It is meant to focus on user interaction with the system and can be used to identify issues with usability, reliability, and system responses to unexpected input. Unlike white box testing, black box testing is difficult to fully automate. In the event of a failed test, it can be difficult to determine the source of the failure. Some examples of black box testing include functional testing, non-functional testing, and regression testing.
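To make the contrast concrete, here is a hedged sketch of black box style checks against a made-up shipping_cost function; the function body is only a stand-in so the sketch runs, and the tests themselves never rely on it:

```python
# A sketch of black box style checks: the tests only feed inputs to shipping_cost and
# compare against expected outputs. shipping_cost is a made-up stand-in so the example
# runs; a real black box tester would not see (or need) its body.
import unittest

def shipping_cost(weight_kg, express):
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    return round(weight_kg * 2.5 * (2 if express else 1), 2)

class TestShippingCost(unittest.TestCase):
    def test_typical_order(self):
        self.assertEqual(shipping_cost(weight_kg=2, express=False), 5.0)

    def test_unexpected_input_is_rejected(self):
        # Black box tests also probe how the system responds to invalid input.
        with self.assertRaises(ValueError):
            shipping_cost(weight_kg=-1, express=False)

if __name__ == "__main__":
    unittest.main()
```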
Grey box testing exists in addition to these. It is a combination of white and black box testing which tests a system from the perspective of the user but also requires a limited understanding of the system’s inner structure. This allows the application to be tested for both internal and external security threats. Some examples of grey box testing include matrix testing, regression testing, and pattern testing.
This week I’ll be writing about test automation since the class has already covered this topic, specifically in section five and homework assignment two. According to the link above, test automation is the practice of running tests automatically, managing test data, and utilizing test results to improve software quality. In order for a test to be automated, it must fulfill three criteria. First, the test must be repeatable, since there’s no real sense in automating something that can only be run once. Second, a test must be deterministic, meaning that the result should be consistently the same given the same input every time. Third, the test must be unopinionated, which means that the aspects being tested shouldn’t be matters of opinion. These criteria allow automated testing to save time, money, and resources, which allows software to be improved efficiently.
Personally, I’ve experienced how efficient test automation is simply through working on assignments, for this class even. I don’t quite remember which assignment I was working on, but I recall using GitLab’s built-in automated testing environment. I was nearly finished with the assignment, but there was one function that wasn’t working correctly. Instead of having to go to the directory, run the tests, edit the code, and push the changes to GitLab, I had the option of simply running the tests and editing the faulty code on GitLab. The option was extremely convenient and freed up a decent chunk of time, and that was just a minor assignment. I’d imagine that applying automated testing to larger projects would save even more time.
A transpiler is a program that converts code from one programming language to another programming language. It is comparable to a compiler, which converts code into machine code rather than into another high-level language, and it is related to an interpreter, which follows a similar process except that, rather than writing new code, it executes the code directly.
In my work on the Sea programming language I’m making, I took a long time writing a custom system for transpiling. However, while it succeeds at managing indentation pretty well, it makes actually transpiling statements much more challenging. So, recently I’ve gone back to the drawing board and have decided to pursue the classic model. If it ain’t broke, don’t fix it.
I’m working off of David Callanan’s Interpreter Tutorial. While it’s a very useful tutorial, the code is admittedly pretty poor, as it crams everything into a few files that are each hundreds of lines long. I’m also using Python exceptions to carry errors, since as far as I’m aware, Python has one of the safest exception systems (unlike C++): I can safely catch and handle exceptions to create useful messages for the user. The tutorial, on the other hand, manually passes errors around from function to function. Still, the explanations are decent, so I’ll just have to do a lot of modification and refactoring after each episode of the tutorial. With that said, let’s go over how a transpiler works fundamentally:
The Process
The first step in transpilation is reading the source file. The lexer goes character by character and matches the characters against a set of predefined tokens. These tokens define a significant part of the language’s syntax. If the lexer encounters a symbol it doesn’t recognize, it can raise an error that alerts the programmer. If there aren’t any errors, the lexer goes through the entire file (or files) and creates a list of the matched tokens, in the order the elements appeared in the file. Whitespace and other syntactically meaningless symbols are not passed on.
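A stripped-down lexer for the arithmetic examples used below might look something like this (the token names are my own, not from any particular tutorial):

```python
# A minimal lexer sketch for integers, +, *, and parentheses.
def lex(source):
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():                      # whitespace is not passed on
            i += 1
        elif ch.isdigit():                    # group consecutive digits into one INT token
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(("INT", int(source[start:i])))
        elif ch in "+*()":
            tokens.append(({"+": "PLUS", "*": "MUL", "(": "LPAREN", ")": "RPAREN"}[ch], None))
            i += 1
        else:
            raise SyntaxError(f"Unrecognized character {ch!r} at position {i}")
    return tokens

print(lex("5+22*3"))  # [('INT', 5), ('PLUS', None), ('INT', 22), ('MUL', None), ('INT', 3)]
```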
Next, the list of tokens is sent to the parser. The parser goes through the list of tokens and creates an Abstract Syntax Tree (AST): a tree of tokens whose structure encodes the order of operations of the language’s syntax. In this stage the flat order of the list is lost, but that order isn’t what matters; what matters is the order in which the tokens should be evaluated. For instance, the list of tokens for 5+22*3 might look something like [INT:5, PLUS, INT:22, MUL, INT:3], and the list of tokens for (5+22)*3 might look like [LPAREN, INT:5, PLUS, INT:22, RPAREN, MUL, INT:3]. The ASTs for these token lists will look something like this respectively:
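Sketched with made-up Python node classes (not the tutorial’s own), the multiplication ends up below the addition in the first tree, while the parentheses push the addition below the multiplication in the second:

```python
# A rough sketch of the two ASTs using invented node classes.
from dataclasses import dataclass

@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str
    left: object
    right: object

# 5+22*3 -> the multiplication is grouped first, so PLUS sits at the root:
ast_one = BinOp("PLUS", Num(5), BinOp("MUL", Num(22), Num(3)))

# (5+22)*3 -> the parentheses group the addition first, so MUL sits at the root:
ast_two = BinOp("MUL", BinOp("PLUS", Num(5), Num(22)), Num(3))
```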
Lastly, you traverse the tree using depth-first search (DFS), or more specifically, preorder traversal of the tree. This means we start at the root node, work our way down the left side, and then down the right side. This is incredibly simple to implement using recursion: each new node you check can be treated as the root of a new tree where you repeat the search, and this continues until the entire tree is traversed.
This final stage is also where transpilers, compilers, and interpreters differ; until now, the same code could be used for all three. At this point, if you want a transpiler, you use the AST to write new code. If you want a compiler, you use the AST to write machine code. If you want an interpreter, you use the AST to run the code. This is why there is such a performance benefit to using a compiler over an interpreter: every time you interpret code, assuming there is no caching system in place, the interpreter has to recreate the entire token list and AST, whereas once you compile code, it is ready to be run again and again. The trade-off is that compiled output can become more complicated for higher-level language features, and a new compiler has to be written for every CPU architecture, since different architectures use different machine instructions.
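Here is a minimal sketch of that last step, using nested tuples for the AST: the same recursive, preorder-style walk either writes equivalent source code (transpiling) or performs the work directly (interpreting). A compiler would emit machine code at the same point.

```python
# A hedged sketch of the final stage; the node format (nested tuples) is my own.

def transpile(node):
    """Walk the tree and write equivalent source code in a target language."""
    if node[0] == "INT":
        return str(node[1])
    op = {"PLUS": "+", "MUL": "*"}[node[0]]
    return f"({transpile(node[1])} {op} {transpile(node[2])})"

def interpret(node):
    """Walk the same tree but perform the work instead of writing code."""
    if node[0] == "INT":
        return node[1]
    left, right = interpret(node[1]), interpret(node[2])
    return left + right if node[0] == "PLUS" else left * right

ast = ("PLUS", ("INT", 5), ("MUL", ("INT", 22), ("INT", 3)))  # 5+22*3
print(transpile(ast))   # (5 + (22 * 3))
print(interpret(ast))   # 71
```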
This week’s post is yet again about unit testing, but this time it focuses on a much broader question. After spending the past two posts trying to determine exactly what a unit test is and what variety of patterns are available, it is only natural that this next post focuses on how to write the tests well. As someone who personally has not written many, I can acknowledge that there may be some best practices I am not aware of. Thus, for this week’s post I am going to discuss another blog post, this one by Sergey Kolodiy, that goes into how to write a good unit test.
So how do you write a good unit test? Conveniently enough, Sergey has compiled some principles: the tests should be easy to write, readable, reliable, fast, and truly unit tests, not integration tests. Easy to write and readable are pretty straightforward and go hand in hand, as both just mean the tests should be easy to implement to cover lots of different cases and their output should easily identify problems. Being reliable means the tests must give the correct output and actually detect bugs rather than just passing. Sergey also brings up a good reason for keeping tests fast: lazy developers might skip tests that take too long. Finally, there is the truly unit, not integration principle, which sounds more complex than it is. It simply means that the unit test and the code under test should not access any external data or resources, such as a database or network, which ensures that the code itself is what is being verified. Sergey chooses to focus on another very important part of writing good unit tests after this.
The rest of his blog revolves around writing testable code as a good unit testing principle. He gives a plethora of examples of bad practices, such as using non-deterministic factors. To clarify, this means a variable in a method that can have a different value every time the method is run; the example he uses helps put this into perspective. The original purpose of this post was simply to discuss writing the tests themselves, so I do not want to stray too much. I just wanted to mention this part, as it is interesting! If you want to learn more, check out the link below.
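As a small illustration of my own (not one of Sergey’s examples), here is what removing a non-deterministic factor can look like: the current time is passed into the method rather than read inside it, so a unit test can pin it to a fixed value and always get the same result.

```python
# The is_happy_hour function and its hours are invented purely for illustration.
from datetime import datetime, time

def is_happy_hour(now=None):
    now = now or datetime.now()            # non-deterministic default, deterministic when injected
    return time(16, 0) <= now.time() < time(18, 0)

def test_is_happy_hour():
    assert is_happy_hour(datetime(2024, 1, 5, 17, 0)) is True
    assert is_happy_hour(datetime(2024, 1, 5, 9, 30)) is False
```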
Black and white box testing are the testing methods you usually hear about, but what is grey box testing? You have probably done a sort of grey box testing multiple times before learning other structured testing methods. While in white box testing the code structure is known and in black box testing it is unknown, in grey box testing the structure is partially known. Grey box testing is sort of a combination of both black and white box testing. For example, when testing a drop-down menu in a UI that you are creating, you can test the drop-down in the application, then change its internal code and try again. This allows you to test both sides of the application: its presentation and its code structure. It is primarily used for integration testing.
The main advantages of grey box testing are that it combines the pros of both black and white box testing while eliminating many of the negatives of each, that you get testing and feedback from both the developers and the testers, creating a more solid application, and that it makes the testing process quicker than running one kind of testing at a time. The time saved also gives developers more time to fix the issues found. Lastly, it lets you test the application from both the developer’s and the user’s point of view. Some negatives of grey box testing are that there is usually only partial access to the code, so you do not have full code coverage of what you are testing, and that it is weaker at pinpointing defects.
Grey box testing does not mean that the tester must have access to the source code, only that they have information on the algorithms, structure, and high-level descriptions of the program. Techniques for grey box testing include matrix testing (stating the status report of the project), regression testing (rerunning test cases once changes are made), orthogonal array testing, and pattern testing (verifying the architecture and design). Grey box testing is well suited to GUI testing, functional testing, security assessment, and web services/applications; it is especially good for web services because of their distributed nature.
Now that we have a good base understanding of unit tests we can dive a little deeper into the subject. When reading through the previous blog I saw mentions of different types of unit tests and my interest was piqued. From past examples I had seen, I assumed all of these tests followed the same format. Thus for this week’s post I wanted to discuss the different types of unit tests, as I only just learned that there were multiple. To aid in this I found a blog post from a programmer named Jonathan Turner who clarifies what each type is.
This blog post identifies three major types of unit tests: arrange-act-assert; one act, many assertions; and test cases. The arrange-act-assert format is the more traditional method of unit testing and the one most people are probably familiar with. It involves setting up the conditions for the test, running the code under those conditions, and then examining the results. The one act, many assertions pattern uses the same basic setup as the previous pattern but differs in making multiple assertions about the code at the end of the test. Finally, there is the test cases pattern, which takes a different approach from the other two by running a collection of many inputs and checking their respective outputs. Now that we understand what each of these patterns is, we can discuss their advantages.
Each of these patterns has its own use cases where it will be most efficient. The arrange-act-assert pattern is the traditional method and, thus, the most straightforward to implement; it should mostly be used for testing specific conditions or situations of a certain system. The one act, many assertions pattern is best used when you have code with different sections that act independently of each other: use it if testing a method that has multiple blocks of code that do not affect each other but must each be validated. Finally, the test cases method is very advantageous if you have a program with a wide span of input/output values, such as one implementing an algorithm that converts values; the blog post gives a very good example. I hope that this post gave you a glimpse into the variety of unit tests available, and I would recommend checking out the blog post by Jonathan Turner for further information.
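To make the three patterns concrete, here is a hedged sketch of each applied to a hypothetical temperature conversion function, written with pytest:

```python
# The celsius_to_fahrenheit function is a made-up example, not from the blog post.
import pytest

def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

def test_freezing_point():                      # arrange-act-assert: one condition, one check
    c = 0                                       # arrange
    f = celsius_to_fahrenheit(c)                # act
    assert f == 32                              # assert

def test_boiling_point_properties():            # one act, many assertions
    f = celsius_to_fahrenheit(100)              # act once
    assert f == 212                             # then validate several independent facts
    assert isinstance(f, float)
    assert f > 100

@pytest.mark.parametrize("c,expected", [        # test cases: many input/output pairs
    (-40, -40), (0, 32), (37, 98.6), (100, 212),
])
def test_conversion_table(c, expected):
    assert celsius_to_fahrenheit(c) == pytest.approx(expected)
```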
On the first exam for my Software Quality Assurance and Testing course, and in the activities leading up to it, Black-Box, Gray-Box, and White/Clear-Box Testing were important topics to thoroughly understand. Not only did we have to know the meanings of these terms, but we had to be able to compare them and know how those testing methods are used.

White/Clear-Box Testing is when the tester knows the contents of a function or method. This comes with its advantages and disadvantages, of course. The advantages are that it makes it easier to navigate the code’s complexity, write legible test cases, and debug. The disadvantages include bias on the part of the tester and possibly longer, more expensive testing in general.

On the other hand, Black-Box Testing is quite the opposite. The tester is not able to view the inner workings of the function or method and can only test based on what inputs are given and what outputs are received. Although this seems counterintuitive for a testing method, it also has advantages and disadvantages that make it a viable option. The advantages are that it takes less time and expense to test and that it eliminates tester bias altogether. The disadvantages are that, because the tester cannot see the inner workings of the function or method, it is harder to debug, gauge complexity, and write easy-to-read test cases. The two methods are basically opposites.

Lastly, Gray-Box Testing is somewhere in between the two. The tester knows a little bit about the inner workings of methods and functions but is not focused on them completely like in White/Clear-Box Testing. This evens out the advantages and disadvantages overall, which could be good in some cases but might not make it a valid testing option in others. Before this semester, I had never even heard of these terms, and it was interesting to research them for this post and for my course!
As a beginner programmer, testing my code meant putting in a few inputs, and if the code ran, then I had myself a successful program. Recently, I’ve learned of two better testing methods that give you the information needed to ensure that the program can handle any particular scenario: Boundary Value Testing and Equivalence Class Testing. While they work differently, both methods are similar in that they choose inputs from the pool of values defined by the program’s conditions.
In Equivalence Class Testing, the focus is on the conditions. Looking at the given conditions, we can determine which values are valid and which are invalid. The range of valid values and the range of invalid values together form the pool of values to be tested, and that pool is divided into intervals. For each interval, if a given input passes, then it stands to reason that all inputs within that interval will pass. By the same reasoning, if an input does not pass, all inputs within the interval will not pass.
For example, consider a program for a vending machine, with a variable named cash for the amount of money the machine can accept. The range of valid values for cash is 0 <= cash <= 100. If we put in a value for cash that is between 0 and 100 and it passes, that means all values between 0 and 100 will pass. Likewise, if that value does not pass, then all values in the range will not pass. All values below 0 and above 100 are invalid, so testing those numbers should result in the program rejecting them.
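Here is a quick sketch of that vending machine example as equivalence classes, with one representative value tested from each interval (the accepts_cash function is just a stand-in):

```python
# A sketch of weak equivalence class testing for the 0 <= cash <= 100 condition.
# accepts_cash is a hypothetical stand-in for the vending machine's input check.

def accepts_cash(cash):
    return 0 <= cash <= 100

equivalence_classes = {
    "below range (invalid)": -5,    # any value < 0 should behave like this one
    "within range (valid)":  40,    # any value in 0..100 should behave like this one
    "above range (invalid)": 250,   # any value > 100 should behave like this one
}

for name, representative in equivalence_classes.items():
    print(name, "->", "accepted" if accepts_cash(representative) else "rejected")
```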
Boundary Value Testing is similar, in that we take valid and invalid values and test them. The difference is that there are five particular values we test. Say that the pool of values is between 0 and 100 inclusive. The values that will be tested are: the minimum valid value (0), the maximum valid value (100), a nominal value between the minimum and maximum (20), a value just below the minimum (-1), and a value just above the maximum (101).
These inputs exercise every boundary scenario, so if they all behave as expected, we can be reasonably confident the program handles its limits correctly. The minimum, nominal, and maximum values cover valid inputs, while the values just below the minimum and just above the maximum cover invalid inputs.
Equivalence testing and boundary testing are both great methods to use when testing your program. They can both be used to test valid and invalid values and, in doing so, help catch errors at and around the edges of a program’s accepted input.