everyone should write a programming language

i know, it sounds completely bonkers. ive been toying with the notion for a year or two, and my original thinking on the matter was more of an ideal than a practical thing.

the first question is, “not everyone can write a programming language.” ok, thats not even a question.

not everyone can learn to code, either. wait, yes– everyone (that can learn to read) can learn to code. they might not be able to code in python, or java– but they can definitely code in something. theres absolutely no question. its a matter of making the language simple enough– and thats a hint.

why, why, why, should everyone write a programming language?

because its simply the best way to learn to code. its not the only way! its not always the easiest way– and if learning how to write a programming language isnt helping the student learn to code, then perhaps do something else.

but here i am, explaining to a good friend and fellow blogger who tells me im very good at explaining programming in plain english (thank you, its a huge compliment) that they dont even have to “learn” to code if they can write a programming language–

i know how crazy it sounds. and this is not the first but the second person ive said it to this week. im starting to think i could even be serious.

of course some people need help to learn how to code– and other people need help writing a programming language.

most people would need a lot of help writing a programming language– but then they wouldnt have to learn the language, get it? they would make it up, and thus skip the whole chore of learning it.

now that doesnt mean they would have it memorised– i still have to look up a command in the language i wrote myself, if im not using that command all the time. ive known most of those commands for more than year but hey, theres almost 100 of them. did i make this one with one parameter or two? oh, right, it had two.

what it means is if they made it up, they understand how it will work. steering them towards keeping it simple could be important, though in some cases it could be better to show them how to do things in a more sophisticated way. some people will find it worth the trouble, others will want an easy way (i sure did.)

if creating a language in python (source-to-source compiler or interpreter) i recommend implementing variables, i/o, simple math, loops, conditionals and functions– if its an interpreter you can put all of it in a loop, and have the exit command simply break the loop. that saves having to implement a real loop feature.

for a long time ive wanted to create a programming language designed to make it easy enough for anyone to create their own programming language.

python is the closest thing to that i know, but i mean something easier.

one of the reasons people dont know how easy it is to create a programming language, is that they start with a parser and try to make it sophisticated. as they teach at brown university– dont! start with functions. its what i did to create my language, and i found out its what they teach. you can add a parser later.

a parser can be simple. as they teach at brown, you can build an api and then have more than one parser for different styles of language. i did it that way because i wasnt trying to build a language first, and i didnt know i was going to. thats how easy it was, done that way– “what if i told you… that a language implementation could be written just like any other program?”

parser, scanner, lexer– cut that out! you need to loop through a string of text (possibly containing newlines) and do things based on what the text says. thats your “parser.” stop trying to make an enterprise-grade general purpose language as your first language, and youll have something to show for it a lot sooner.

people often say these days, you need something to accomplish if youre going to learn to code– an app, or a website. it isnt true! you can just practice little tiny snippets that are fun but dont do much thats useful. people learn that way sometimes.

but if youre looking for a thing to accomplish while learning how to code, try a language implementation! seriously– of a very simple language. you can do it! itll be fun. and it will take less time to learn how to code, because youll end up understanding how languages work before youve even learned your first. and yes, at this point i am completely serious.

 

 

newb update: version 1

newB version 1 has been launched ! newB the language maker newB is more than just a new language. We have in mind to enable anybody to create his own language. So, keeping this in mind, we now allow users to custom define their very OWN KEYWORDS. It is a humble step in the wake […]

via newB-version 1 launched : The democratisation and customisation of languages — A bit of everything – copy

newb– a fun proglang being developed by arj

 

now someone is doing just that:

the blog entry: https://abdurrahmaanjanhangeer.wordpress.com/2017/02/27/newb-a-new-language-just-surfaced/

the github: https://github.com/Abdur-rahmaanJ/newB

and my first contribution, in case he can use it: (under any license he likes.)

license: creative commons cc0 1.0 (public domain) 
http://creativecommons.org/publicdomain/zero/1.0/ 
7c7,14
< with open('txt.nub','r+') as f: --- > filepath = 'txt.nub'
> from sys import argv
> if len(argv) > 1:
>     if len(argv[1]):
>         if ".nub" in argv[1]:
>             filepath = argv[1]
> 
> with open(filepath,'r+') as f:

 

since arj isnt using diff, im going to make these contributions simpler:

#### license: creative commons cc0 1.0 (public domain) 
#### http://creativecommons.org/publicdomain/zero/1.0/ 
from sys import argv
if len(argv) > 1:
    if len(argv[1]):
        if ".nub" in argv[1]:
            filepath = argv[1]

with open(filepath,'r+') as f:

 

 

how to write a programming language

(3080 words)

 

introduction

writing a programming language is a science, like math– or an art, like writing a play. one day theyll have a word for things that are both an art and a science.

you can learn how i wrote a very simple demo programming language in 25 minutes just by taking it apart, or if you prefer you i can take you on a tour of the features and talk about how to generalize the concepts. (thats what this post is for.)

programming is fractal in nature– the way you write a complex program isnt fundamentally different from the way you write a simple one. its true, that as you write more complex programs you will incorporate little tricks that you didnt need for simpler programs– they dont change the fundamental nature of the task but they refine it. writing a programming langauge is the same way.

but remember that all programming becomes these tiny, very simple little steps, so fundamentally all programs (and programming) work the same way. talking about one way to do it can help you understand a wide variety of ways, minus less significant details you can pick up along the way.

one of the most important differences between programming language designs is “interpreted or compiled?” these dont have to be as different as they seem:

if your target language already has all the features you need, then writing a compiled language will probably be easier– just translate the input code into output code and youre done (if that sounds like an oversimplification, it will be addressed in slightly more detail to demonstrate.)

certainly if youre compiling down to machine code, youre going to want to know at least two “languages” (other than the one youre writing) including the machine code itself, and the higher-level language youre probably writing the compiler in.

the thing is, compiling to machine code isnt the rule it used to be. coffeescript, as one example– compiles to javascript. (“compiles” is the right word– you can just as easily call it “translating” which is very common, and also correct.)

fig is written (implemented) in python and compiles to python. the demo language shown in this post is also written (implemented) in python.

technically speaking, the language is the language and the program that makes it possible to use is the “implementation.” fig is the name of my programming language, and it is also the name of the implementation (the translator or “compiler”) which is written in python. a good reason to distinguish a language from an implementation, is that you can have implementations for the same language, written in more than one language–

the most popular implementation of python is written in c, and there are other implentations of python– pypy is another typical implementation of python, written in a simplified version of python!

 

compiling, or?

but speaking in fundamentals, the difference between implementing a compiler and implementing an interpreter is this:

an interpreter will actually run code, after translating it.

a compiler will take the translated code and write it to a file instead.

if you think thats a huge difference, think about it this way:

for every line of code your intepreter runs, you can convert the part that runs it into a print statement and compile to stdout. then instead of running the interpreter this way:

$ name-of-interpreter name-of-inputfile

 

you would run it this way:

$ name-of-compiler name-of-inputfile > outputfile.targetlanguage

 

that will be demonstrated with a “real language” in this post.

one of the biggest differences between a simple language and a sophisticated one, is the complexity of the parser (and so, the complexity of the source it can parse.) people keep implementing variations of lisp, because its probably the easiest (least complicated) recursive language you can parse. basic is more complicated (to parse.) so is javascript.

if you want to learn how to write a complex language first, get a big, big book on implementing programming languages, and get to work. people do, its probably how youd go about it if you had a college course in the subject and wanted your first language to look and walk and quack like the big languages you use everyday.

if, on the other hand you want to write a programming language that is simple enough to understand in a week or so, and start working your way up to more complex languages after having an easy one under your belt– well, keep reading. after i wrote fig, i started looking at how more complex languages were written, hoping to find the “secret” of making them far more complicated.

the secret is, there isnt one. learn how to get a more complicated parser, and youll have a more complicated language.

you can do it the following ways:

  • get a parser generator and learn and use its syntax (its supposed to be easier, especially if your target language is c.)
  • make your language free software and adapt an existing free software parser to your needs
  • learn how to write recursive parsers, perhaps starting as so many do with a parser for lisp-like languages

a big, big book can help you with any of those. as i said, i skip all that and go for:

  • write a simpler language to parse, and a simpler parser to parse it

writing an interpreter is easier if your language is going to have simple features. if its not going to have loops and conditionals, then an interpreter is simply easier to implement– you might even do it without it feeling like youre writing a programming language.

a couple tips for starters:

you can have limited loop functionality (the easy way) in an interpreted language, by putting the entire interpreter in a big loop, and implementing an exit command to stop looping. i watched one person with a tiny interpreter do this, and its one way to have some looping functionality without a lot of trouble.

you can have limited conditional functionality by creating functions that are conditional based on input– use your imagination there.

what makes an interpreter difficult to implement with full loop and conditional functionality is that you have to implement branching– you have to be able to move from one part of your program to another, rather than simply intepreting each line. in the old days, this was done with an intermediate translation to a “bytecode.” (this isnt a dead practice, as java shows, but i think it has probably evolved.)

im going to walk you through a simple interpreter, tell you how to avoid bytecode, and then offer you the option of compiling to a high-level language as the easiest way to have all the complex branching features you like. (thats what fig does.)

in other words, fig (as a language) offers the functionality of conditionals and loops, by translating (compiling) to python conditionals and python loops. thats the easiest way to have that– write a source-to-source compiler. but if you dont need that functionality (no loops or conditionals in the language i will walk you through) then an interpreter is even easier.

 

too easy:

if you really want to “cheat” then you can implement at the very least, a pseudo-language interpreter by abusing the exec() command in python. python has a ton of features that are wonderful for language implementation, and exec() is the one i deliberately avoided in fig, both for security and educational purposes.

wanna see a print command implemented using exec in python 2? you can laugh:

x = raw_input() ; x = x.split()
if len(x) > 1:
    if x[0].lower() == "print":
        exec("print " + " ".join(x[1:]))

 

im not saying this is useless or should never be attempted, but i suspect the language you create will be famous one day for programs with security problems, (exec is simply too powerful and easy to abuse, though someone with more experience can perhaps easily prove me wrong, and that would be pretty cool) and to me this is so close to “cheating” that youd have to do something really novel with it to make it worthwhile.

other than that, i dont want to discourage you. but you may really prefer source-to-source compiling, over this.

 

variables

a python feature i do recommend for intepreters however, is hash tables (aka python dictionaries.) theyre a great place to put all the variables for your interpreter. so the first line of the demo interpreter creates a dictionary for storing variables:

variables = {}

now if you want your intepreted program to set a variable, all the code you need to set it is, run this code:

variables["identifier"] = value

 

this also isolates your languages variables from the rest of the program. if we cheat using exec instead:

exec("x = 5")
exec("print x")

 

that actually works, but youre storing x right in the same variable space / scope that the rest of the program is working in. that could lead to some fun issues later! (but if your language design is complex enough, you could have some real sciencey fun playing with that sort of thing. exec isnt all bad, its just more powerful than you probably even want.)

if you want to print your variable the dictionary way instead, run this code:

print variables["identifier"]

 

copy a variable? sure:

variables["2ndvariableidentifier"] = variables["1stvariableidentifier"]

 

now be careful here, if youre using python– because what i just showed you will have the same features (and issues) of copying a variable in python using =

for a string or numeric, theres no problem. if youre copying a list however, youll be linking to it, not cloning an independent copy. i get out of this by using type():

if type(variables["1stvariableidentifier"]) == list:
   variables["2ndvariableidentifier"] = variables["1stvariableidentifier"][:]
else:
   variables["2ndvariableidentifier"] = variables["1stvariableidentifier"]

 

if your language doesnt have python lists as a feature (hint: tuples of constants dont have this issue) then you can just use the single-line variable copy.

for a source-to-python compiler, if i want to set x to 5, i just do more or less this:

newline = "\n"
outfile.write("x = 5" + newline)

 

so its more like exec, except that its safer and (because it doesnt run until the compiler exits) it doesnt clash with the variable space of the translator program. (and yes, i still use the trick i showed you for dealing with list copying.)

 

you can teach yourself by trying to do it

now you may wonder how i learned to implement features like this. i honestly didnt learn from anything except learning to write programs– and this is a secret i want to pass along to you–

i wrote a programming language exactly the same way i would write any other program:

* i thought about what i wanted to accomplish
* i broke those steps down into individual tasks
* i implemented each of those tasks using code, using what i know about how the implementation language (python) works– and occasionally looking up things

(but for example, i already knew you could clone a list using [:] and i vaguely knew type() existed.)

the goal of your language implementation (program) should be:

1. take text (or numbers, or even pixels or sound data– but information) as input
2. break text down into pieces that the program understands
3. do things (intepret) with those pieces, or:
4. write things to a file (compile) based on those pieces

 

when you understand the simplest notion of programming, you can figure out yourself how to write a programming language. or as ive heard it put very, very simply:

“read text and do stuff based on the contents.”

 

function calls

probably half the trouble of implementing a simple language is parsing. you can separate that into lexing and parsing, etc. etc. fine– lets just say “parsing” for now because thats what its called when you “parse” phrases in english. (in programming languages, you might call it lexing.)

but almost half the work (for me) is a parser, and at least half the work is writing functions.

in fig, a function is what you call a command. you can call it a command, but a function is a function, a sub is a function, a command is a function, etc. this is largely due to the python underneath:

def figprint(p): print p

 

thats not a whole lot more complicated than using exec, is it? in 30+ versions of fig, i never needed more for the “print” command. the “prints” command is a little more involved:

def figprints(p): stdout.write(str(p)) ; sys.stdout.flush()

 

and what about the “left” command?

def figleft(p, s): return p[:s]

 

these fig commands illustrate the point, but our 25-minute demo language doesnt use them. it implements these 4 commands, all of which have 2 parameters each (that makes parsing a little easier. i was trying to be quick!)

def repeatprint(p, x): print ((p + " ") * int(x)).rstrip()

def printsum(p, x): print float(p) + float(x)

def setvariable(p, x): global variables ; variables[p] = x

def repeatvariable(p, x): print ((str(variables[p]) + " ") * int(x)).rstrip()

 

do functions have to be one-liners? of course not. some functions in fig are at least several lines long.

but our little demo language (based on its function calls) can do 4 things, using 4 commands:

repeatprint string count

printsum numeric numeric

setvariable identifier constant

repeatvariable identifier count

 

of course “identifier” is a fancy way of saying “name.”

 

demo program

now that we know what the demo language can do, lets implement a demo program:

* set the variable d to 12
* repeat “hello” 5 times, and print 5 + 9
* print the variable d, 9 times

putting the program in a multiline string instead of bothering to load a text file, heres what it looks like:

demoprogram = """
setvariable d 12
repeatprint hello 5 printsum 5 9
repeatvariable d 9"""

 

a ridiculously simple parser

now it pays off that every function has exactly 2 parameters (to make it easier to parse.)

lets process all the excess spaces (this is to make the language easier to use, but you could demand strict interpretation of spaces instead)

xp = []
for p in demoprogram.replace("\r", " ").replace("\n", " ").split(" "):
    p = p.strip()
    if len(p): xp += [p]

 

that just turned our multiline string demo into this:

[‘setvariable’, ‘d’, ’12’, ‘repeatprint’, ‘hello’, ‘5’, ‘printsum’, ‘5’, ‘9’, ‘repeatvariable’, ‘d’, ‘9’]

 

which is even easier to process. so (keeping it simple,) we go through the list xp, and add to another list called “build.” when its length is 3, we do a function call with the name, the second list item as a parameter, and the third list item as a parameter:

build = []
for p in xp:
    build += [p]
    if len(build) == 3: 
        if build[0].lower() == "repeatprint": repeatprint(build[1], build[2])
        if build[0].lower() == "printsum": printsum(build[1], build[2])
        if build[0].lower() == "setvariable": setvariable(build[1], build[2])
        if build[0].lower() == "repeatvariable": repeatvariable(build[1], build[2])
        build = []

 

so here is our demo program:

setvariable d 12
repeatprint hello 5 printsum 5 9
repeatvariable d 9

 

here is the expected functionality:

* set d to 12
* print hello 5 times
* print the sum of 5 and 9 as a float
* print the value of d, 9 times

here is what it actually does when you run the program:

it puts “hello hello hello hello hello” on the screen
it puts “14.0” (the sum of 5 and 9) on the screen
it puts “12 12 12 12 12 12 12 12 12” on the screen

 

compiling

now what about compiling instead of interpreting?

well the way fig works, is instead of declaring these functions:

def repeatprint(p, x): print ((p + " ") * int(x)).rstrip()
def printsum(p, x): print float(p) + float(x)
def setvariable(p, x): global variables ; variables[p] = x
def repeatvariable(p, x): print ((str(variables[p]) + " ") * int(x)).rstrip()

 

it puts them in a multiline string:

f = """
def repeatprint(p, x): print ((p + " ") * int(x)).rstrip()
def printsum(p, x): print float(p) + float(x)
def setvariable(p, x): global variables ; variables[p] = x
def repeatvariable(p, x): print ((str(variables[p]) + " ") * int(x)).rstrip()
"""

 

and writes that string to a the file it compiles. (you can get fancy and only write the functions that are called by the program, but i didnt want to– better imo to let people edit the python code it output. if you werent compiling to source, youd definitely want to do a pass to count the functions actually called, and only compile those!)

then instead of calling a function when its encountered like this:

if build[0].lower() == "printsum": printsum(build[1], build[2])

 

write it to the same file that the functions are defined in, like this:

if build[0].lower() == "printsum": outfile.write("printsum (" + str(build[1]) + ", " + str(build[2]))

 

improving the language

one very trivial way to make this existing language more sophisticated, is to have it allow more or fewer parameters for each command. you can do this by making it known where the end of each command is.

instead of always looking for two parameters, have it look for a separator of lines (logical lines of code, or lloc.) a popular way to do this is to look for a semicolon:

setvariable d 12
repeatprint hello ; 5 printsum 5 9
repeatvariable d 9

 

you dont need one at the end of every line, if a newline also counts.

in javascript, you need one at the end of every line– in python, you dont.

that doesnt matter, since youre translating things to python, not from it– you decide how the translation (and thus the language youre implementing) works.

one thing about this program and that extra sophistication is:

setvariable d 12
repeatprint hello ; 5 printsum 5 9
repeatvariable d 9

 

now you can have differing parameter counts, but you cant offer a semicolon as a string. you can get around this in several ways:

* strings have to go in quotes
* escaped semicolons \; dont separate lloc
* the only way to print a semicolon is to put a numeric in a variable, then convert using a chr command or something
* instead of using semicolons, demand a one-command-per physical line (ploc) convention

when youre creating your own language, you get to decide which of those (or other) solutions are most reasonable to you, and to the goals of the language.

 

what fig does with this

just so you know, fig addresses the matter this way:

block commands and function definitions have to go on their own line (“own-line” commands)

most commands (“shared-line” commands) start with a “main variable” and share the line– each command in fig has a fixed number of parameters, and the translator has a dictionary that tells it how many parameters to expect for each command

“own-line” commands can have any (unexpected) number of parameters. the only command in fig that takes advantage of this is “function”, which lets you define a function in fig like this:

function identifier

 

or this:

function identifier paramname

 

or this:

function identifier paramname paramname

 

…etc. this is like using def identifier(paramname, paramname): #in python

as a result, fig does not require punctuation for most code– but it does tolerate some. it requires “quotes for strings” and # hashes for comments and decimals do what you would tend to expect (and dont use them unless you need them for something.)

 

 

what if everyone wrote a programming language?

what would happen? the literal and obvious answer is that we dont know. but if we allow ourselves to speculate, here are some thoughts springboarded by a few facts and metaphors:

 

  • reality: not everyone is going to write a programming language

i know, but this is speculation. a lot of it is really “what if a lot more people wrote programming languages,” but the question is only fair because i encourage everyone to write one.

 

  • natural selection: most of these languages will not be very good

thats ok; in fact its not even important. if people took a few hours to write a programming language, it is a single lesson that would substantially increase the way they understand programming.

 

  • true scotsman: most of the languages wont even be “real languages”

actually, its nearly impossible to create a “programming language” that isnt a programming language. lots of useful software projects can have scripting engines without creating full-fledged languages, markdown is probably not the very last “standard” of its kind, dont be afraid of being “cute,” because it will be fun.

i made a language that consisted only of the goto statement. it could do 25 different things, and i adapted it from another simple language interpreter someone else wrote. did i mention that i redid their interpreter in my own programming language?

 

  • aiming for the stars: most people cant create a programming language

this is like saying most people cant code. the trick is to make it easier to do. making it easier to create a programming language is something ive been working on for a while. and yes, just like you need things like scratch or basic or logo to get “everyone” programming, you need something simpler if everyone is going to make their own language.

thats not a real problem.

 

  • convergent evolution: too many languages, and no one will know the same one

this isnt true at all. if everyone made a programming language, the features would coalesce into more serious languages. even if a few inspired a new feature (or approach) in serious or educational languages, it would be worthwhile. but although this would likely happen, even if it didnt:

 

  • optimal learning: we should focus on serious languages

i dont agree with this either. first of all, educational languages have made it easier for younger programmers to start earlier and get comfortable with computing. the idea that we can skip this step is like saying we could skip educational programs to introduce young children to reading, because the best time to start learning to read is grade 4.

no one learns how to read “the wrong way,” or how to code the wrong way: thats paranoid nonsense. some people are not good at coding, and other people have not yet learned best practices, but dabbling on a toy piano is not going to prevent you from becoming a skilled pianist or composer.

only in programming is there a fear that early education and practice could somehow “taint” the future student. when there is more (or any) reasonable scientific evidence that this is the case, thats another matter. (even then, its still more paranoid to think it is irreversible.)

 

  • but why? you havent said why…

the reasons vary, so i will give mine.

  • its fun
  • it will make you a better programmer
  • its not nearly as difficult as its made out to be
  • you will better appreciate and understand programming
  • its just one more “programming exercise” worth trying
  • it will inspire you to write lots of other programs
  • i believe it will ultimately push language development forward
  • literacy is not just about reading, but authoring

you can argue against all of those– and i can argue for them. really, my job isnt to convince you to do something you dont want to do; only to make a case for trying it; and to help people that are interested in making it easy enough to actually achieve something.

thats pretty much all i have to say about it for now, i would love to revisit the subject or have other people share their thoughts on the matter.

note that when i talk about “writing a programming language” i am talking about a simple starter language: something that could be done in hours, days, or a week (depending on the level of sophistication.)

i do include writing your own program to compile (probably to another high-level language) or interpret the language; you could also adapt an existing compiler or interpreter.

whether it became a bigger project than that would depend on how much the author got out of the effort. if it was boring and they didnt learn much, i wouldnt necessarily push going forward from there. but if they had fun at all, they could continue with it or write a slightly more enhanced language after that.

if there were a push towards trying, i think a lot of people would realize it isnt that difficult; even if they thought it was something they would never do.