Category Archives: Languages

Testing in Python and Sea

Something that is worth noting is Python’s assert keyword. Python has a built in way to create basic unit tests beyond simply printing and making comparisons. This has me thinking about my Sea programming language.

I think much of the hassle I’ve had with unit testing has come from simply getting the environment set up in the first place. If unit tests could be more like a data structure or design pattern – something you know the design of and you can simply implement – I think that would simplify a lot. So while I could create a Sea library for unit testing similar to numerous other libraries (or wait for someone else to make one in Sea), I’d prefer to create a more internal solution.

Sea is fundamentally C, so that means it doesn’t have a lot of high level features that can make unit testing easier. I mean, Sea isn’t even object oriented. That said, the entire design philosophy behind Sea is based on simple syntax that doesn’t add a runtime performance cost. One way I could achieve this is by adding another “stage” to the language. The current design of Sea involves a preprocessor, a lexer, a parser, and a visitor (interpreter, transpiler, or compiler). What I could do is add a tester that would run before the visitor, to test compile-time things (checking types or any other value known at compile time) and then a runtime tester. These could be modular features of the language itself and could be removed if desired.

Another simple solution is to add a handful of keywords to Sea similar to assert. The problem with that is C doesn’t have runtime exceptions. The design would have to be based on the notion that the program is the unit tests. I’ll continue to think this over. After all, Sea is still in its early stages. I’m currently rewriting it (just on the visiting functions now) and then I’ll add functions, pointers, etc. However, if Sea is to become a real, usable language then there will have to be ways of testing code; the simpler and more convenient, the better. After all, that’s the whole point of Sea.

From the blog CS@Worcester – The Introspective Thinker by David MacDonald and used with permission of the author. All other rights reserved by the author.

Testing Functional Code

In programming, it is essential that developers can ensure that their code works as expected. One way to ensure this is unit testing. However, unit testing and many of the general testing styles are only convenient and simple for basic types of projects. For example, unit tests are great for public libraries made up of classes and functions that can easily be tested. Not only is it easy to test these, but it is simple to understand. You just make objects and call methods and check the results. However, not all code is made up in that fashion.

A clear example of this is a programming language such as my Sea language. In theory, I could take a few weeks and create proper tests for everything involved. However, creating production-level unit tests for a project of this size by yourself is tedious and time consuming, to say the least. One way I find convenient to help test my language is to simply create a sea file with code I want to test the interpretation/transpilation of and then run both the transpiler and interpreter to check the results. Another thing I did was add a debug option to print out the generated tokens, AST, and memory.

This is admittedly janky. There is no standardization for my testing process. Luckily, most of the problems I run into either generate a Python exception, or an obviously incorrect output. That said, maybe that’s only because those kinds of problems are the only ones I find and there are tons of hidden errors that are laying quietly. That overall is one of the best things about Python, and my motivation for Sea. In Python, it is really easy to write thirty lines of code and then run it and have it execute without any errors. This is because the syntax is so simple and readable that its easy to find errors before you even run the code.

Sometimes, I think its alright to not properly commit to testing code. Code that relies more on user input than parameter input (what I’ve called functional code) can be incredibly challenging to properly test. For instance, if I were to properly test Sea, I would need to create sea files that have almost every combination of valid syntax. Not only that, but I’d need combinations of valid and invalid syntax as well to make sure the code finds the errors. The point of testing is to save time and to ensure valid code. In the case of a programming language, I think its easy enough to ensure valid code and in exchange you can save almost half of the total time you’d otherwise need to be spending.

From the blog CS@Worcester – The Introspective Thinker by David MacDonald and used with permission of the author. All other rights reserved by the author.

Starting off Clean

I have a noticeable tendency when I’m coding or even when I’m gaming. After a while of making progress, I often reach a point where I’ve learned enough to remake – whatever it is that I made – much better. So, I’ll get to work on modifying what I currently have in an attempt to improve it. This can work sometimes; usually, however, I end up with a lot of complexity laid out in front of me and I become lost in it. Then, I follow my tendency to simply start over.

In my Sea programming language for instance, I started without any real knowledge of how to make a language. I followed a tutorial with pretty bad code and modified it to serve my needs, learning along the way. I’ve recently added all of the combined assignment operators as well as loops. The next step is to add a method of mimicking main memory so I can add functions and proper variable scopes, as well as memory management. So, I want to start off by refactoring the code I currently have. I made decent progress but then I found myself with numerous files open without any new working code. Since I was rewriting basic features anyway, I finally gave in and started rewriting all of the Python code from scratch. The new code can currently be seen in the overhaul branch, until I finish and merge it. (If you’re reading this in the future, that link might not be useful anymore so try this compare link.)

So, I’d like to discuss the benefits and drawbacks of this habit of mine. First off, sometimes, starting off clean just helps manage the complexity of a problem. Being able to go through every file of code and determining which lines are good is incredibly useful and you can find things you otherwise won’t notice. With git and the ability to copy and paste, it takes hardly any time to reincorporate good code. However, I tend to still think of this process as something I ideally shouldn’t do. Maybe it would save me a lot of time to simply refactor what I need to. It’s possible I just need to get better at refactoring. However, it’s also possible that I’m simply new at a lot of this and in the future I’ll write more solid code from the beginning anyway.

I think, at least in this current Sea overhaul, starting off clean has been nothing but positive for me. This is code I’m very new with and having added so many features, I understand it significantly better than when I first wrote it. I think at the end of the day, both refactoring and starting off from the beginning are valid strategies. As we learn, we’ll learn which to use when. If you only need to make a few changes, then just refactor the code you already have. However, when you’re faced with a massive redesign, just take the time to rewrite it. The problem isn’t so much which you choose to do, but rather when you decide to do one over the other.

From the blog CS@Worcester – The Introspective Thinker by David MacDonald and used with permission of the author. All other rights reserved by the author.

Apprenticeship Pattern – Unleash Your Enthusiasm

The key focus of this design pattern is the situation in which you have more enthusiasm for software development then the rest of your colleagues. Due to this, you end up holding back to fit with the group. I have a few experiences related to both sides of this. While I’ve never partaken in a software job, I have taken part in software courses. There have been plenty of times when I’m in a group project and the topic is so interesting to me I can just get to town and code almost everything. I guess rather than having it hold me back, I end up usually embracing it.

Then there are situations similar to that which I’m in now. It isn’t precisely a mirror for the apprenticeship pattern, but it’s pretty close. I find that when I have other things I need to do are when I’m most passionate about other things. For example, I’ll most intensely want to theorize about physics when I have an assignment due; however, once I actually have free time, I’m mostly content just wasting my time playing video games. Over the last month, I’ve become fixated with Sea. I had been working on creating a Minecraft Server manager in Node.js for my friends and I, then I moved onto creating a way of backing up my playlists. Then I realized it’d probably be easiest to do it in Python, so I started rewriting the code. In that process, I fell in love with Python.

I had used it before and enjoyed it, but I was almost something of an elitist. It had no strict types, it was easy to write. I treated it as if it were a beginners language. So I never really took the time to learn it. I rewrote my code in Python and learned a lot of the joys of Python such as context managers, list/set/dict/string comprehension, etc. (I just need to figure out the SQL commands and maybe I’ll eventually finish it). I had always been aware of the fact that Python running in a single threaded interpreter will never be able to perform as well as something like C. C by itself has a lot of joy to write in. Every language has its own personality you get to learn. That said, C can be tedious to write in and to read. That began my quest to create Sea – a version of the C language with Python-like syntax. I have become passionate about designing what the ideal language would be. Something modular, with high and low level features, and is easy to write and debug. Sea is just the first step in that.

I have found that I can so easily spend six hours straight coding, debugging, and refactoring Sea code. I can then go to bed and while I’m trying to fall asleep or even while I’m dreaming, I’ll be making design decisions. Classes that are otherwise fine can seem boring by comparison. Being able to just create something functional that has a clear use case feels great. At least sometimes, it can be really easy to share that enthusiasm with other students and it overall helps all of us.

From the blog CS@Worcester – The Introspective Thinker by David MacDonald and used with permission of the author. All other rights reserved by the author.

Creating a Transpiler

A transpiler is a program that converts code from one programming language to another programming language. This is comparable to a compiler, which is a transpiler that converts into machine code. It is also related to an interpreter, which behaves similarly, except rather than writing new code, it performs the code.

In my work on the Sea programming language I’m making, I took a long time writing a custom system for transpiling. However, while it succeeds at managing indentation pretty well, it makes actually transpiling statements much more challenging. So, recently I’ve gone back to the drawing board and have decided to pursue the classic model. If it ain’t broke, don’t fix it.

I’m working off of David Callanan’s Interpreter Tutorial. While it’s a very useful tutorial, the code is admittedly pretty poor, as it contains a few files with hundreds of lines. I’m also using Python exceptions to carry errors, since as far as I’m aware, Python has one of the safest exception systems (unlike C++). I can safely catch and handle exceptions to create useful messages for the user. The tutorial, on the other hand, is manually passing around errors from function to function. That said, the explanations are decent and it is a very useful tutorial. I’ll just have to make a lot of modifications and refactoring after each episode in the tutorial. That said, let’s go over how a transpiler works fundamentally:

The Process

The first step in transpilation is reading the source file. The lexer goes character by character and matches them to a set of predefined tokens. These tokens define a significant part of the syntax of a language. If it doesn’t recognize symbols, it can give an error that alerts the programmer. If there aren’t any errors, the lexer will go through the entire file (or files) and create a list of these matched tokens. The order of the list encodes the order that elements appeared in the file. Empty space and otherwise meaningless syntax symbols are not passed on.

Next, the list of tokens is sent to the parser. The parser will then go through the list of tokens and create an Abstract Syntax Tree (AST). This is a tree of tokens whose structure encodes the order of operations of the language’s syntax. In this stage, the order of the list is lost; however, that order isn’t important. What matters is in what order tokens should be. For instance, the list of tokens for 5+22*3 might look something like [INT:5, PLUS, INT:22, MUL, INT:3] and the list of tokens for (5+22)*3 might look like [LPAREN, INT:5, PLUS, INT:22, RPAREN, MUL, INT:3]. The ASTs for these token lists will look something like this respectively:

Created on https://app.diagrams.net/

Lastly, you then traverse the tree using depth-first-search (DFS), or more specifically, Preorder Traversal of the tree. This means we start at the root node and we the work our way down the left side and then down the right side. This is incredibly simple to implement using recursion. Each new node you check can be treated as the root to a new tree where you can then proceed to repeat the search. This occurs until the entire tree is traversed.

In this final stage, this is also where transpilers, compilers, and interpreters differ. Until now, the same code could be used for all three. At this point, if you want a transpiler, you use the AST to write new code. If you want a compiler, you use the AST to write machine code. If you want an interpreter, you use the AST to run the code. Notice this is why there is such a performance benefit to using a compiler over an interpreter. Every time you interpret code, assuming there is no caching system in place, the interpreter has to recreate the entire token list and AST. Once you compile code, it is ready to be run again and again. The problem then comes from compiled code potentially being more complicated for higher-level language features, and thus making it a pain to write a new compiler for every CPU architecture, due to different architectures using different machine instructions.

From the blog CS@Worcester – The Introspective Thinker by David MacDonald and used with permission of the author. All other rights reserved by the author.

From C to Shining Sea

The Snake

As of recently, I’ve been spending most of my personal coding time in Python. I enjoy a lot of languages and Python certainly isn’t for everything, but when you can use Python, boy is it a joy. As someone who strictly indents in any language, I love the indentation style of denoting blocks. Curly braces have their use, but the vast majority of the time, they’re purely redundant. The same goes for semicolons. I completely agree with the movement of programming languages towards spoken language. The main downfall of Python, comes from how high-level of a language it is.

Being a high-level language allows for it to be as convenient to write in as it is, however you are completely unable to use low level features. It also means Python’s performance is often much lower than that of C++ or other languages. Of course, everyone says that each language has its own use and Python isn’t meant for performance-intensive programs. But why not? Wouldn’t it be nice if there were a single modular language that had Python-like simple syntax with the features of JS, Python, C++, etc.

The Sea

Before I take on the task of creating such a language, I want to start smaller. Introducing Sea- It’s C, just written differently. I am currently working on a language called Sea which is effectively C, but with Python-like syntax. I say Python-like because much of the syntax of Python relies on internal data types. My goal is to keep Sea true to C. That is, no increase performance penalty; all of the penalty should be paid at compile time. That’s phase one. Start off with a more concise face for C. Then, I want to create libraries for Sea that take it one step further – introducing data types and functions innate to Python like range, enumerate, tuples, etc. Lastly, I want to use the knowledge I’ve gained to create the language to end all languages as described above.

I’m starting off with a Sea-to-C Transpiler, which is available on Github. In its present state, I am able to transpile a few block declarations and statements. I’m currently working on a data structure for representing and parsing statements. Once that’s made, I can add them one by one. The final result should look something like this:

include <stdio.h>
include "my_header.hea"

define ten as 10
define twelve as 12

void func():
    pass

int main():
    if ten is defined and twelve is defined as 12:
        undefine twelve
        // Why not

    c block:
        // Idk how to do this in Sea so I'll just use C
        printf("Interesting");

    do:
        char *language = "Python"

        print(f"This is an f-string like in {language}")

        for letter in language:
            pass

        break if size(language) == 1
    while true and ten == 11

    return 0

Once the transpiler is done, I want to create an actual compiler. I’ll also want to make a C-to-Sea transpiler eventually as well. I’ll also want to create syntax highlighting for VS Code, a linter, etc. It has come a surprisingly long way in such a short while, and I’ve learned so much Python because of it. I’m also learning a good amount about C. I’m hoping once I create this, there will never be any reason to use C over Sea. There are reasons why certain languages aren’t used in certain scenarios. However, I see no reason why certain syntaxes are limited in the same way. Making indentation a part of the language forces developers to write more readable code while removing characters to type. Languages should be made more simple, without compromising on functionality. That is my goal.

From the blog CS@Worcester – The Introspective Thinker by David MacDonald and used with permission of the author. All other rights reserved by the author.