Category Archives: CS-443

Creating a Transpiler

A transpiler is a program that converts code from one programming language into another programming language. It is comparable to a compiler, which converts code into machine code instead of into another high-level language. It is also related to an interpreter, which goes through much of the same process, except that rather than writing new code, it executes the code directly.

In my work on the Sea programming language I’m making, I spent a long time writing a custom system for transpiling. While it manages indentation pretty well, it makes actually transpiling statements much more challenging. So, recently I’ve gone back to the drawing board and decided to pursue the classic model. If it ain’t broke, don’t fix it.

I’m working off of David Callanan’s Interpreter Tutorial. While it’s a very useful tutorial, the code is admittedly pretty poor, as it crams hundreds of lines into just a few files. I’m also using Python exceptions to carry errors, since as far as I’m aware, Python has one of the safest exception systems (unlike C++): I can safely catch and handle exceptions to create useful messages for the user. The tutorial, on the other hand, manually passes errors around from function to function. That said, the explanations are decent and it is a very useful tutorial; I’ll just have to do a lot of modifying and refactoring after each episode. With that out of the way, let’s go over how a transpiler works fundamentally:

The Process

The first step in transpilation is reading the source file. The lexer goes through it character by character and matches the characters to a set of predefined tokens. These tokens define a significant part of the syntax of a language. If the lexer doesn’t recognize a symbol, it can give an error that alerts the programmer. If there aren’t any errors, the lexer goes through the entire file (or files) and creates a list of these matched tokens, in the order the elements appeared in the file. Whitespace and otherwise meaningless syntax symbols are not passed on.
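
To make this concrete, here is a rough sketch of a lexer for simple integer arithmetic. The Sea project itself is written in Python and its real lexer handles far more, so treat this Java version purely as an illustration; the token names match the example in the next paragraph.

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative lexer: turns "5+22*3" into [INT:5, PLUS, INT:22, MUL, INT:3].
    class Lexer {
        record Token(String type, String value) {
            public String toString() { return value == null ? type : type + ":" + value; }
        }

        static List<Token> tokenize(String source) {
            List<Token> tokens = new ArrayList<>();
            int i = 0;
            while (i < source.length()) {
                char c = source.charAt(i);
                if (Character.isWhitespace(c)) { i++; continue; }   // meaningless whitespace is skipped
                if (Character.isDigit(c)) {                          // group consecutive digits into one INT token
                    int start = i;
                    while (i < source.length() && Character.isDigit(source.charAt(i))) i++;
                    tokens.add(new Token("INT", source.substring(start, i)));
                    continue;
                }
                switch (c) {
                    case '+' -> tokens.add(new Token("PLUS", null));
                    case '*' -> tokens.add(new Token("MUL", null));
                    case '(' -> tokens.add(new Token("LPAREN", null));
                    case ')' -> tokens.add(new Token("RPAREN", null));
                    default -> throw new IllegalArgumentException("Unrecognized character: " + c);
                }
                i++;
            }
            return tokens;
        }

        public static void main(String[] args) {
            System.out.println(tokenize("(5+22)*3")); // [LPAREN, INT:5, PLUS, INT:22, RPAREN, MUL, INT:3]
        }
    }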

Next, the list of tokens is sent to the parser. The parser goes through the list of tokens and creates an Abstract Syntax Tree (AST). This is a tree of tokens whose structure encodes the order of operations of the language’s syntax. In this stage, the flat order of the list is lost; however, that order isn’t what matters anymore. What matters is the order in which the operations should be evaluated. For instance, the list of tokens for 5+22*3 might look something like [INT:5, PLUS, INT:22, MUL, INT:3] and the list of tokens for (5+22)*3 might look like [LPAREN, INT:5, PLUS, INT:22, RPAREN, MUL, INT:3]. The ASTs for these token lists will look something like this, respectively:

[Diagram of the two ASTs, created on https://app.diagrams.net/]
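
Since the diagram itself isn’t reproduced here, the same two trees can be written out as nested nodes. The node names below are made up just for illustration:

    // Illustrative AST nodes: a leaf for integers and a node for binary operators.
    interface Node {}
    record Num(int value) implements Node {}
    record BinOp(String op, Node left, Node right) implements Node {}

    class AstExamples {
        public static void main(String[] args) {
            // 5+22*3 -- multiplication binds tighter, so the MUL node sits below the PLUS node
            Node withoutParens = new BinOp("+", new Num(5), new BinOp("*", new Num(22), new Num(3)));

            // (5+22)*3 -- the parentheses force the PLUS node below the MUL node instead
            Node withParens = new BinOp("*", new BinOp("+", new Num(5), new Num(22)), new Num(3));

            System.out.println(withoutParens);
            System.out.println(withParens);
        }
    }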

Lastly, you traverse the tree using depth-first search (DFS), or more specifically, a preorder traversal of the tree. This means we start at the root node and work our way down the left side and then down the right side. This is incredibly simple to implement using recursion: each new node you reach can be treated as the root of a new subtree, where you then repeat the search. This continues until the entire tree is traversed.

This final stage is also where transpilers, compilers, and interpreters differ; until now, the same code could be used for all three. At this point, if you want a transpiler, you use the AST to write code in another language. If you want a compiler, you use the AST to write machine code. If you want an interpreter, you use the AST to run the code directly. This is why there is such a performance benefit to using a compiler over an interpreter: every time you interpret code, assuming there is no caching system in place, the interpreter has to recreate the entire token list and AST, whereas once you compile code, it is ready to be run again and again. The trade-off is that compiled output can become complicated for higher-level language features, and a new compiler has to be written for every CPU architecture, since different architectures use different machine instructions.
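
Putting the last two steps together, here is a sketch of that recursive preorder walk over the node types from the previous sketch (repeated so it stands alone). The only difference between “transpiling” and “interpreting” is what each node visit does: one builds a string of target code, the other computes a value. Again, this is illustrative only, not how Sea actually does it.

    // Same illustrative node types as before, repeated so this compiles on its own.
    interface Node {}
    record Num(int value) implements Node {}
    record BinOp(String op, Node left, Node right) implements Node {}

    class TreeWalk {
        // "Transpile": walk the tree and emit equivalent code in a target language.
        static String transpile(Node node) {
            if (node instanceof Num n) return Integer.toString(n.value());
            BinOp b = (BinOp) node;
            return "(" + transpile(b.left()) + " " + b.op() + " " + transpile(b.right()) + ")";
        }

        // "Interpret": walk the same tree, but compute the result instead of writing code.
        static int evaluate(Node node) {
            if (node instanceof Num n) return n.value();
            BinOp b = (BinOp) node;
            int left = evaluate(b.left());
            int right = evaluate(b.right());
            return b.op().equals("+") ? left + right : left * right;
        }

        public static void main(String[] args) {
            Node tree = new BinOp("+", new Num(5), new BinOp("*", new Num(22), new Num(3))); // 5+22*3
            System.out.println(transpile(tree)); // (5 + (22 * 3))
            System.out.println(evaluate(tree));  // 71
        }
    }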

From the blog CS@Worcester – The Introspective Thinker by David MacDonald and used with permission of the author. All other rights reserved by the author.

Unit Testing: Principles of Good Tests

This week’s post is yet again about unit testing, but this time it focuses on a much broader question. After spending the past two posts trying to determine exactly what unit tests are and what variety of patterns is available, it is only natural that this next post focuses on how to write the tests well. As someone who personally has not written many, I can acknowledge that there may be some best practices I am not aware of. Thus, for this week’s post I am going to discuss another blog post, this one by Sergey Kolodiy, that goes into how to write a good unit test.


So how do you write a good unit test? Conveniently enough, Sergey has compiled some principles: the tests should be easy to write, readable, reliable, fast, and truly unit, not integration. Easy to write and readable are pretty straightforward and go hand in hand, as both mean the tests should be easy to implement so they can cover lots of different cases, and the output of the tests should easily identify problems. As for being reliable, this means the tests must give the correct output and actually detect bugs rather than just passing. Sergey also brings up a good reason for keeping the tests fast: lazy developers might skip tests that take too long. Finally there is the truly unit, not integration principle, which sounds more complex than it is. This simply means that the unit test and the unit under test should not access any external data or resources, such as a database or network, which ensures that the code itself is what is being verified. Sergey chooses to focus on another very important part of writing good unit tests after this.
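
Before moving on, here is a small sketch of that last principle in JUnit 5. The interface and class names are invented for the example; the point is that the test talks to an in-memory stand-in rather than a real database or network:

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import org.junit.jupiter.api.Test;

    // Hypothetical production code: it depends on an interface, not on a concrete database.
    interface PriceRepository {
        double priceOf(String item);
    }

    class Cart {
        private final PriceRepository prices;
        Cart(PriceRepository prices) { this.prices = prices; }

        double total(String... items) {
            double sum = 0;
            for (String item : items) sum += prices.priceOf(item);
            return sum;
        }
    }

    class CartTest {
        @Test
        void totalAddsUpItemPrices() {
            // An in-memory fake keeps the test fast, reliable, and truly "unit".
            PriceRepository fake = item -> item.equals("apple") ? 0.50 : 1.25;
            Cart cart = new Cart(fake);

            assertEquals(1.75, cart.total("apple", "banana"), 0.001);
        }
    }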

The rest of his blog revolves around writing testable code as a principle of good unit testing. He gives a plethora of examples of bad practices, such as relying on non-deterministic factors. To clarify, this means some variable in a method that can have a different value every time it is run; the example he uses puts this into perspective more effectively. The original purpose of this post was simply to discuss writing the tests themselves, so I do not want to stray too much. I just wanted to mention this part, as it is interesting! If you want to learn more, check out the link below.
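
As a rough illustration of that idea (not Sergey’s exact example), compare a method that reads the clock internally, which gives a different answer every run, with one that takes the time as a parameter and is therefore trivial to test:

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import java.time.LocalTime;
    import org.junit.jupiter.api.Test;

    class Greeter {
        // Hard to test: the result depends on whenever the test happens to run.
        String greetingNow() {
            return greetingAt(LocalTime.now());
        }

        // Easy to test: the non-deterministic factor (the current time) is passed in.
        String greetingAt(LocalTime time) {
            return time.getHour() < 12 ? "Good morning" : "Good afternoon";
        }
    }

    class GreeterTest {
        @Test
        void greetingDependsOnlyOnTheGivenTime() {
            Greeter greeter = new Greeter();
            assertEquals("Good morning", greeter.greetingAt(LocalTime.of(9, 0)));
            assertEquals("Good afternoon", greeter.greetingAt(LocalTime.of(15, 0)));
        }
    }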

Source:

https://www.toptal.com/qa/how-to-write-testable-code-and-why-it-matters

From the blog CS@Worcester – My Bizarre Coding Adventures by Michael Mendes and used with permission of the author. All other rights reserved by the author.

Grey Box Testing

Black and white box testing are the testing methods you usually hear about, but what is grey box testing? You have probably done a sort of grey box testing multiple times before learning other structured testing methods. While in white box testing the code structure is fully known and in black box testing the structure is unknown, in grey box testing the structure is partially known. Grey box testing is sort of a combination of both black and white box testing. For example, when testing a drop-down menu in a UI that you are creating, you can test the drop-down in the running application, then change its internal code and try again. This allows you to test both sides of the application: its presentation and its code structure. It is primarily used for integration testing.

The main advantages of using grey box testing are that it combines the pros of both black and white box testing while eliminating many of the negatives of each, that you get testing and feedback from both developers and testers, creating a more solid application, and that it makes the testing process quicker than testing each side separately. The time saved also gives developers more time to fix the issues found. Lastly, it lets you test the application from both the developer’s and the user’s point of view. Some negatives of grey box testing are that the tester usually has only partial access to the code, so you do not get full code coverage of what you are testing, and that it is weaker at defect identification.

Grey box testing does not mean that the tester must have access to the source code, but rather that they have information on the algorithms, structure, and high-level descriptions of the program. Techniques for grey box testing include matrix testing (stating the status report of the project), regression testing (rerunning the test cases once changes are made), orthogonal array testing (covering input combinations with a comparatively small set of tests), and pattern testing (verifying the architecture and design). Grey box testing is highly suitable for GUI testing, functional testing, security assessment, and web services/applications. It is especially good for web services because of their distributed nature.

Sources:

Gray box testing. (2021, January 31). Retrieved April 02, 2021, from https://en.wikipedia.org/wiki/Gray_box_testing

What is grey box testing? Techniques, example. (n.d.). Retrieved April 02, 2021, from https://www.guru99.com/grey-box-testing.html

From the blog CS@Worcester – Austins CS Site by Austin Engel and used with permission of the author. All other rights reserved by the author.

Unit Testing: What Types to Use

Now that we have a good base understanding of unit tests we can dive a little deeper into the subject. When reading through the previous blog I saw mentions of different types of unit tests and my interest was piqued. From past examples I had seen, I assumed all of these tests followed the same format. Thus for this week’s post I wanted to discuss the different types of unit tests, as I only just learned that there were multiple. To aid in this I found a blog post from a programmer named Jonathan Turner who clarifies what each type is.


This blog post identifies three major types of unit tests: arrange-act-assert; one act, many assertions; and test cases. The arrange-act-assert format is the more traditional method of unit testing and the one most people are probably familiar with. This format involves setting up the conditions for the test, running the code under those conditions, and then examining the results. The one act, many assertions pattern uses the same basic setup as the previous pattern, but differs in making multiple assertions about the code at the end of the test. Finally, there is the test cases pattern, which takes a different approach from the other two by running a collection of many inputs and checking their respective outputs. Now that we understand what each of these patterns is, we can discuss their advantages.
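
For a concrete picture, here is roughly what each of the three patterns might look like in JUnit 5. The Calculator class is made up for the example:

    import static org.junit.jupiter.api.Assertions.assertAll;
    import static org.junit.jupiter.api.Assertions.assertEquals;
    import org.junit.jupiter.api.Test;
    import org.junit.jupiter.params.ParameterizedTest;
    import org.junit.jupiter.params.provider.CsvSource;

    // Hypothetical class under test.
    class Calculator {
        int add(int a, int b) { return a + b; }
        int negate(int a) { return -a; }
    }

    class CalculatorTest {
        // 1. Arrange-act-assert: set up, run, check one result.
        @Test
        void addTwoNumbers() {
            Calculator calc = new Calculator();   // arrange
            int result = calc.add(2, 3);          // act
            assertEquals(5, result);              // assert
        }

        // 2. One act, many assertions: a single action checked from several angles.
        @Test
        void negateProducesConsistentResults() {
            Calculator calc = new Calculator();
            int result = calc.negate(7);
            assertAll(
                () -> assertEquals(-7, result),
                () -> assertEquals(7, calc.negate(result))
            );
        }

        // 3. Test cases: the same check run over a table of inputs and expected outputs.
        @ParameterizedTest
        @CsvSource({"1, 1, 2", "0, 5, 5", "-2, 2, 0"})
        void addHandlesManyInputs(int a, int b, int expected) {
            assertEquals(expected, new Calculator().add(a, b));
        }
    }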

Each of these patterns has its own use cases where it will be most effective. The arrange-act-assert pattern is the traditional method and, thus, the most straightforward to implement. This pattern should mostly be used for testing specific conditions or situations of a certain system. The one act, many assertions pattern is best used when you have code with sections that act independently of each other. To clarify, use this when testing a method that has multiple blocks of code that do not affect each other but must each be validated. Finally, the test cases method is very advantageous if you have a program with a wide span of input and output values. This could be one implementing an algorithm that converts values; the blog post gives a very good example. I hope that this post gave you a glimpse into the variety of unit tests available, and I would recommend checking out the blog post by Jonathan Turner for further information.

Source:

https://www.pluralsight.com/tech-blog/different-types-of-unit-tests/

From the blog CS@Worcester – My Bizarre Coding Adventures by Michael Mendes and used with permission of the author. All other rights reserved by the author.

Software Quality Assurance and Testing Blog Post #3 (Black-Box vs. Gray-Box vs. White/Clear-Box Testing)

On the first exam for my Software Quality Assurance and Testing course, and in activities previous to it, Black-Box, Gray-Box, and White/Clear-Box Testing were important topics to thoroughly understand. Not only did we have to know the meanings of these terms, but we had to be able to compare them and know how those testing methods are used. White/Clear-Box Testing is when the tester knows the contents of a function or method. This comes with its advantages and disadvantages, of course. The advantages are that it is very easy to navigate the code’s complexity, write legible test cases, and make debugging smoother. The disadvantages include the tester’s bias creeping in and possibly longer, more expensive testing in general.

On the other hand, Black-Box Testing is quite the opposite. The tester is not able to view the inner workings of the function or method and can only test based on what inputs are given and what outputs are received. Although this seems counterintuitive for a testing method, it also has advantages and disadvantages that make it a viable option. The advantages are that it takes less time and expense to test and that it eliminates tester bias altogether. The disadvantages are that, because the tester cannot see the inner workings of the function or method, it is harder to debug, judge complexity, and write easy-to-read test cases. The two methods are basically opposites.

Lastly, Gray-Box Testing is somewhere in between the two. The tester knows a little bit about the inner workings of the methods and functions but is not focused on them completely as in White/Clear-Box Testing. This evens out the advantages and disadvantages overall, which could be good in some cases but might not make it a valid testing option in others. Before this semester, I had actually never even heard of these terms, and it was interesting to research them for this post and for my course!

From the blog CS@Worcester – Tim Drevitch CS Blog by timdrevitch and used with permission of the author. All other rights reserved by the author.

Improved Testing Methods

As a beginner programmer, testing my code meant putting in a few inputs, and if the code ran then I had myself a successful program. Recently, I’ve learned of two better testing methods that give you the information needed to ensure that the program can handle any particular scenario: Boundary Value Testing and Equivalence Class Testing. While they work differently, both methods are similar in that they choose test inputs from the pool of values defined by the program’s conditions.

In Equivalence Class Testing, the focus is on the conditions. Looking at the given conditions, we can determine which values are valid and which are invalid. The range of valid values and the range of invalid values together form the pool of values to be tested, and this pool is divided into intervals. For each interval, if one input passes, it stands to reason that all inputs within that interval will pass. By the same reasoning, if an input does not pass, all inputs within the interval will not pass.

For example, consider a test program for a vending machine, with a variable named cash for the amount of money the machine can accept. The range of valid values for cash is 0 <= cash <= 100. Now, if we put in a value for cash that is between 0 and 100 and it passes, that means all values between 0 and 100 will pass. Likewise, if the value does not pass, then no value in that range will pass. All values below 0 and above 100 are invalid, so testing those numbers should result in the program reporting an error.
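
Here is a sketch of what that might look like as JUnit tests, with a made-up VendingMachine class: one representative value per equivalence class stands in for its whole interval.

    import static org.junit.jupiter.api.Assertions.assertFalse;
    import static org.junit.jupiter.api.Assertions.assertTrue;
    import org.junit.jupiter.api.Test;

    // Hypothetical class under test: accepts cash amounts between 0 and 100 inclusive.
    class VendingMachine {
        boolean accepts(double cash) {
            return cash >= 0 && cash <= 100;
        }
    }

    class VendingMachineEquivalenceTest {
        VendingMachine machine = new VendingMachine();

        @Test
        void validClassRepresentative() {
            assertTrue(machine.accepts(50));   // stands in for every value in 0..100
        }

        @Test
        void invalidClassBelowRange() {
            assertFalse(machine.accepts(-20)); // stands in for every value below 0
        }

        @Test
        void invalidClassAboveRange() {
            assertFalse(machine.accepts(150)); // stands in for every value above 100
        }
    }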

Boundary Value Testing is similar, in that we take valid and invalid values and test them. The difference is that there are five particular values we test. Say that the pool of valid values is between 0 and 100 inclusive. The values that will be tested are:
1. the minimum valid value, 0
2. the maximum valid value, 100
3. a nominal value between the minimum and maximum, 20
4. a value just below the minimum, -1
5. and a value just above the maximum, 101.

These five inputs cover the boundary scenarios, so if they all pass, we can be confident the program handles its limits correctly. The minimum, nominal, and maximum values test valid inputs, and the just-below-minimum and just-above-maximum values test invalid inputs.
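
Using the same made-up VendingMachine from the previous sketch, the five boundary values translate directly into test inputs:

    import static org.junit.jupiter.api.Assertions.assertFalse;
    import static org.junit.jupiter.api.Assertions.assertTrue;
    import org.junit.jupiter.api.Test;

    // Same hypothetical class as the equivalence sketch, repeated so this compiles on its own.
    class VendingMachine {
        boolean accepts(double cash) {
            return cash >= 0 && cash <= 100;
        }
    }

    class VendingMachineBoundaryTest {
        VendingMachine machine = new VendingMachine();

        @Test
        void fiveBoundaryValues() {
            assertTrue(machine.accepts(0));    // minimum valid value
            assertTrue(machine.accepts(100));  // maximum valid value
            assertTrue(machine.accepts(20));   // nominal value in between
            assertFalse(machine.accepts(-1));  // just below the minimum
            assertFalse(machine.accepts(101)); // just above the maximum
        }
    }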

Equivalence testing and boundary testing are both great methods to use when testing your program. They can both be used to test valid and invalid values, and by doing so they go a long way toward ensuring that a program handles its inputs correctly.

Helpful Source:

https://www.guru99.com/equivalence-partitioning-boundary-value-analysis.html

From the blog CS@Worcester – CSBlogger by mjaber54 and used with permission of the author. All other rights reserved by the author.

Erockwood Post #3

Today we will be discussing some things I find interesting in one of my classes. In this class, we are learning about cloud, parallel, and distributed computing. I find it very interesting that we are able to use one fast computer to send tasks to a bunch of slower computers that calculate the data in a very quick and efficient manner, rather than running all of the data through one single computer. I also find it interesting how systems like Hadoop maintain data integrity by storing all information in three different places, so that if a computer dies the data still exists in two other places, and the system then copies it to another computer to keep three copies going.

We are currently discussing machine learning, which I personally think is a little overrated. All it really is is telling a computer whether each of its guesses is right or wrong so that it adjusts its guesses accordingly until it is correct about 99% of the time. This is called training. Once the computer is “trained”, it is good to go. Statistics and machine learning go hand in hand, as both take data and use it to predict outcomes based on the input data. One subset of machine learning is data mining, which is a collection of methods used to extract insights from data.
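
As a toy illustration of that guess-and-adjust loop (far simpler than any real model, and nothing to do with Hadoop), a program can “learn” a numeric cutoff purely from being told whether each guess was right or wrong:

    import java.util.Random;

    // Toy "training": adjust a guessed threshold whenever a prediction turns out to be wrong.
    class ToyLearner {
        public static void main(String[] args) {
            double hiddenThreshold = 50;   // the rule the computer is trying to learn
            double guess = 0;              // the "model": a single number adjusted over time
            Random random = new Random(42);

            for (int step = 0; step < 10_000; step++) {
                double x = random.nextDouble() * 100;
                boolean truth = x >= hiddenThreshold;   // feedback: was the prediction right or wrong?
                boolean predicted = x >= guess;
                if (predicted != truth) {
                    guess += predicted ? +0.1 : -0.1;   // nudge the threshold toward fewer mistakes
                }
            }
            System.out.println("Learned threshold: " + guess); // ends up near 50
        }
    }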

From the blog CS@Worcester – Erockwood Blog by erockwood and used with permission of the author. All other rights reserved by the author.

Unit Testing: What and Why

For this class I felt the most fitting first post was one that explains what unit testing is, as this will be key for understanding more complicated topics later in the semester. I have only had brief experiences with unit tests in the past, usually when they were included in code I got from a professor. Thus this is still a relatively foreign concept to me, granted the general purpose is pretty self-explanatory. To tackle this subject I found a blog post that goes sufficiently in depth on the topic, explaining what it is and why it’s useful.


So first things first, what exactly is unit testing? As it’s defined in the post on the testingxperts.com blog, this is a type of testing done in the earlier stages of software development. This is advantageous as it allows testers and developers to isolate specific modules for any necessary fixes, rather than having to deal with the whole system. Unit tests usually consist of three phases: arrange, act, and assert. The arrange phase involves setting up the part of the program that is to be tested in your testing environment. The act phase is where you apply the test’s stimuli, running the code under test. Finally there is the assert phase, where the behavior of the program is observed and checked for any potential abnormalities. So now that we have a common understanding of these tests, we can ask ourselves why we should even bother using them.
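
In practice, those three phases often look something like this in JUnit 5 (the WordCounter class is invented for the example):

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import org.junit.jupiter.api.Test;

    // Hypothetical unit under test.
    class WordCounter {
        int count(String text) {
            return text.isBlank() ? 0 : text.trim().split("\\s+").length;
        }
    }

    class WordCounterTest {
        @Test
        void countsWordsSeparatedByWhitespace() {
            // Arrange: set up the unit and the input we want to exercise.
            WordCounter counter = new WordCounter();
            String input = "unit tests are worth writing";

            // Act: run the code under test with that input.
            int result = counter.count(input);

            // Assert: compare the observed behavior to what we expect.
            assertEquals(5, result);
        }
    }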

Admittedly, on the few past occasions I had seen these, I thought the unit tests were mostly superfluous and not very useful. In reality, however, these tests can be an invaluable asset for larger systems and, used properly, can save a lot of time and pain otherwise spent bug-fixing later in development. The primary, and pretty major, benefit is that components of a program are tested early in the process and individually. This is great for the immediately obvious reason that you do not have to worry about progress halting later in the project, because you are testing long before then. Besides this, it also helps development teams understand the code better, as the tests must be tailored to the specific component they are working on. This familiarity has other benefits as well, giving certain members knowledge of how code in some components can be changed or reused for further benefit. There is also the fact that this debugging process is simpler than a more traditional approach, as it is done before some of the more complex code is written to interconnect components. These are the major reasons listed in the article, and that I could deduce, but there could very well be more! I do not quite have enough space to discuss the rest, but I would recommend reading the rest of the article if you have any interest in unit testing, as it goes into further detail.

Thanks for reading!


Source:

From the blog CS@Worcester – My Bizarre Coding Adventures by Michael Mendes and used with permission of the author. All other rights reserved by the author.

Erockwood Post #2

Software testing is very important, as it allows one to test their own or others’ programs for mistakes, whether logical mistakes or syntax mistakes, to ensure as much as possible that software performs as expected. A common testing framework for Java programs is JUnit 5. JUnit 5 comprises three sub-projects: JUnit Platform, JUnit Jupiter, and JUnit Vintage. The differences between the three are that Platform serves as the foundation for launching testing frameworks on the Java Virtual Machine (JVM), Jupiter is the programming and extension model for writing tests in JUnit 5, and Vintage is used for running JUnit 3 and JUnit 4 based tests on the newer JUnit 5 platform.

Projects like JUnit are very important because they allow you to automate testing as much as possible. This saves a tester time: instead of testing one thing at a time by hand, they can run all sorts of tests at once. It also keeps the testing code separate from the main code, so the two do not accidentally mix and mess things up. JUnit lets you write tests using methods like assertTrue, assertFalse, assertEquals, and assertThrows, each useful in its own way. For example, assertEquals can be used to check that a customer created with a customer ID of 1 actually has an ID of 1: we create the customer object with ID 1, then assert that the ID it reports equals the 1 we expect. Another thing we can use is assertThrows, to test that invalid parameters throw the correct error, or to make sure an error is thrown at all.
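
A rough sketch of those two assertions in JUnit 5, with a made-up Customer class standing in for whatever the real code would be:

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import static org.junit.jupiter.api.Assertions.assertThrows;
    import org.junit.jupiter.api.Test;

    // Hypothetical class under test: rejects non-positive IDs.
    class Customer {
        private final int id;
        Customer(int id) {
            if (id <= 0) throw new IllegalArgumentException("Customer ID must be positive");
            this.id = id;
        }
        int getId() { return id; }
    }

    class CustomerTest {
        @Test
        void customerKeepsTheIdItWasCreatedWith() {
            Customer customer = new Customer(1);
            assertEquals(1, customer.getId());
        }

        @Test
        void invalidIdThrowsTheExpectedError() {
            assertThrows(IllegalArgumentException.class, () -> new Customer(0));
        }
    }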

Source:

Bechtold, S., et al. (n.d.). JUnit 5 user guide. Retrieved March 30, 2021, from https://junit.org/junit5/docs/current/user-guide/

From the blog CS@Worcester – Erockwood Blog by erockwood and used with permission of the author. All other rights reserved by the author.

Scrum Quality Assurance

Given that we are working in a Scrum team in our Software Development capstone and that we got ample practice with Scrum in our Software Project Management course, I thought it would be interesting to look at the crossover between quality assurance and the Scrum workflow. Scrum is also widely used, so this should be important to know for the future as well. I found this article: https://medium.com/serious-scrum/how-does-qa-fit-with-scrum-4a92f86bec5b which talks about the role of quality assurance on a Scrum team.

It’s important to remember that the members of a development team do not have pre-defined roles. It’s assumed each member of the team can collaborate on any part of a project, even if certain members have more specialization in certain areas, like product testing in this case. With that in mind, it makes the Definition of Done all the more important. Requirements for testing should be documented and understood by the development team and the product owner. This prevents conflicts during the sprint review if, say, the development team thinks something like compatibility testing needs more work while the product owner is ready to deploy. With a more comprehensive Definition of Done, these conflicts are avoided since they would have been discussed ahead of time.

Quality assurance on a Scrum team is a large part of the process, and its requirements are developed during sprint planning. It’s important that there is close collaboration during the sprint between the development team and the product owner, even though the development team is delegated most of the decision-making responsibility for how work will be completed. This keeps the whole team in tune and avoids conflict. It also makes for a closer, more efficient work environment, as understanding is enhanced across the team and the owner.

From the blog CS@Worcester – Marcos Felipe&#039;s CS Blog by mfelipe98 and used with permission of the author. All other rights reserved by the author.