creating my own programming language

Hello,

I would like to know where to start in creating my own programming language. I have no idea where to begin, so any help would be appreciated. I don't know much about how compilers and parsers work, so I would be grateful if anyone could point me to a website, book, or online article to get me started.

Thank You.

Getting Started

Please see our Getting Started resources. I personally would suggest starting with Essentials of Programming Languages and then moving on to Types and Programming Languages. You can (and should!) also find Shriram Krishnamurthi's Programming Languages: Application and Interpretation—if, as I suspect, you're a starving student, you can download it for free. If not, then I recommend buying it.

Two other very important books that are somewhat more focused (read: you'll learn how to implement languages in the Lisp family) are Structure and Interpretation of Computer Programs and LISP In Small Pieces. SICP is, famously, used in introductory (!) computer science courses at MIT. LISP In Small Pieces is, IMHO, the definitive current text on the various design choices available to developers of members of the Lisp family. If you want to deeply understand how Common Lisp and Scheme are different (and how they're the same!) as well as understand several other Lisp dialects that are of historical note—and perhaps more importantly to you, understand how to implement whatever type of Lisp you might wish for—this is your book.

Fair warning: precisely zero of these is a light read. On one hand, they're all based on a seemingly trivial observation: an interpreter or compiler for a programming language is just another program. Like many seemingly trivial insights, though, this one is actually extremely deep, perhaps even profound. As Guy Steele has observed, creating a new programming language isn't actually hard. What's hard is creating a new programming language that satisfies some difficult-to-articulate set of constraints in ways that aren't already satisfactorily available in an existing language. So please give some thought to whether you want to do this for your own education and enlightenment (in which case, knock yourself out and have fun!) or whether you're doing this in anger—that is, you think that all existing languages (that you know—a crucial, but frequently overlooked, point) suck in unbearable ways, and that you will succeed in creating the Next Big Language™—in which case, good luck. :-)

Parsing

Keep in mind that the texts Paul mentioned usually do not deal with the early phases of language processing: lexical analysis and parsing. These phases can be delegated to tools (lex/yacc, flex/bison, etc.), but if you want to learn about them, you should consult a textbook that covers parsing. The Dragon Book is the classic reference.
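To make "lexical analysis" concrete, here is a minimal hand-written sketch of what a tool like lex or flex would otherwise generate. The token names and the tiny arithmetic token set are my own assumptions for illustration, not taken from any of the books mentioned.

```python
import re

# Hypothetical token set for a tiny arithmetic language (illustrative names only).
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, lexeme) pairs; raise SyntaxError on stray input."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":  # whitespace separates tokens but is dropped
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("x = 3.5 * (y + 2)"))
```

The parser then works on these (kind, lexeme) pairs instead of raw characters, which is the division of labor that lex/yacc and flex/bison follow.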

What's your background? And purpose?

What languages do you know right now? Are you willing to learn another language to implement your compiler? Do you have time to do that? Do you understand regular expressions? Backus-Naur form for expressing grammars? The languages that you know should probably affect your approach.

There are tools that help you generate the lexer and parser of a compiler in a variety of implementation languages: Lex and Yacc or Flex and Bison for C and C++, JFlex and JavaCUP for Java, ANTLR, and others.

What is your language going to do? Will it be focused on mathematics, or text parsing, or is it a special-purpose language intended to be embedded into some other application, or what? Languages should not be designed in a vacuum--all of the design decisions you make are tradeoffs, making it perhaps more suited to one mode of use than another, and making programs easier or harder to write.

If you honestly have no idea where to start, I might recommend a more practical and less theoretical approach. Start by building a calculator.

Your programming language will almost certainly have to parse and evaluate mathematical expressions, so this is a good starting point. Learn how to parse standard mathematical notation, including getting the order of operations right.
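As a rough idea of how small such a calculator can be, here is a sketch of a hand-written recursive-descent evaluator. The function names and the grammar in the comments are my own assumptions; a generated parser would look different. Note that this sketch silently skips characters it does not recognize, which a real lexer should report.

```python
import re


def evaluate(text):
    """Evaluate +, -, *, / and parentheses with the usual order of operations."""
    tokens = re.findall(r"\d+(?:\.\d+)?|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():  # expr := term (('+' | '-') term)*
        value = term()
        while peek() in ("+", "-"):
            if advance() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():  # term := factor (('*' | '/') factor)*
        value = factor()
        while peek() in ("*", "/"):
            if advance() == "*":
                value *= factor()
            else:
                value /= factor()
        return value

    def factor():  # factor := NUMBER | '-' factor | '(' expr ')'
        token = advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            value = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token == "-":
            return -factor()
        if token in "+*/)":
            raise SyntaxError(f"unexpected token {token!r}")
        return float(token)

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


print(evaluate("1 + 2 * 3"))
print(evaluate("(1 + 2) * 3"))
```

The order of operations falls out of the call structure: expr calls term, which calls factor, so multiplication binds tighter than addition without any explicit precedence table.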

If you try to build the calculator using two different parser generators, one that accepts LL grammars and one that accepts LALR grammars, you'll learn good lessons about using the right tool for the job, about the practical differences between these types of grammars, and about explicit and implicit precedence. (The O'Reilly book "Lex and Yacc" gives a good introduction to building a simple, working calculator in straight C, but you might want something higher-level. Its discussion of precedence is helpful.)
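To illustrate what "explicit" precedence means, here is a sketch that declares binding power and associativity in a table, similar in spirit to yacc's %left and %right declarations. It is a hand-written precedence-climbing parser, not the output of any parser generator, and the table layout and names are my own assumptions. The implicit alternative is to encode precedence in the grammar itself, with one rule per precedence level.

```python
import operator
import re

# Explicit precedence table: operator -> (binding power, associativity, function).
# This table is an illustrative assumption, not any tool's actual format.
OPERATORS = {
    "+": (1, "left", operator.add),
    "-": (1, "left", operator.sub),
    "*": (2, "left", operator.mul),
    "/": (2, "left", operator.truediv),
    "^": (3, "right", operator.pow),
}


def evaluate(text):
    """Evaluate binary operators using the precedence table (no unary minus)."""
    tokens = re.findall(r"\d+(?:\.\d+)?|[-+*/^()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def primary():
        token = advance()
        if token == "(":
            value = expression(1)
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is None or token in OPERATORS or token == ")":
            raise SyntaxError(f"unexpected token {token!r}")
        return float(token)

    def expression(min_power):
        left = primary()
        # Keep consuming operators that bind at least as tightly as min_power.
        while peek() in OPERATORS and OPERATORS[peek()][0] >= min_power:
            power, assoc, apply = OPERATORS[advance()]
            next_min = power + 1 if assoc == "left" else power
            left = apply(left, expression(next_min))
        return left

    result = expression(1)
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


print(evaluate("10 - 4 - 3"))  # left-associative
print(evaluate("2 ^ 3 ^ 2"))   # right-associative
```

Adding an operator here means adding one table row, whereas with implicit precedence it means adding or reshaping grammar rules.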

Once you can parse and evaluate simple numeric equations with parentheses and the like, add the ability to set and read from variables. This will give valuable lessons about "stuff has to be stored somewhere". Where does that stuff get stored? Well, that's up to you, but it will have to get stored somewhere that you can access.
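One simple answer to "where does that stuff get stored?" is a dictionary that maps names to values. The sketch below borrows Python's own ast module as the parser so that it can focus on storage only; that shortcut, and the names run_line and eval_node, are my assumptions for illustration (it needs Python 3.8 or later for ast.Constant).

```python
import ast
import operator

BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def eval_node(node, env):
    """Evaluate a parsed expression, reading variables from the env dictionary."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in env:
            raise NameError(f"undefined variable {node.id!r}")
        return env[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
        left = eval_node(node.left, env)
        right = eval_node(node.right, env)
        return BINARY_OPS[type(node.op)](left, right)
    raise SyntaxError("unsupported construct")


def run_line(line, env):
    """Run 'name = expr' or a bare expression against the environment."""
    statement = ast.parse(line).body[0]
    if (
        isinstance(statement, ast.Assign)
        and len(statement.targets) == 1
        and isinstance(statement.targets[0], ast.Name)
    ):
        value = eval_node(statement.value, env)
        env[statement.targets[0].id] = value  # this is where the "stuff" lives
        return value
    if isinstance(statement, ast.Expr):
        return eval_node(statement.value, env)
    raise SyntaxError("unsupported statement")


env = {}
run_line("x = 2 + 3", env)
run_line("y = x * 4", env)
print(run_line("y - x", env), env)
```

In your own calculator the parser would be yours, but the environment can stay exactly this simple until you add scopes.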

From there, you can start to build functions, more complex data types such as arrays, etc.
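For functions, the main new idea is that each call gets its own scope, linked to the scope where the function was defined. Here is a minimal sketch over a hand-built tree of nested tuples; the node names ("num", "var", "fn", "call") and the Env class are hypothetical choices of mine, and parsing is left out entirely.

```python
class Env:
    """A scope: local bindings plus a link to the enclosing scope."""

    def __init__(self, bindings=None, parent=None):
        self.bindings = dict(bindings or {})
        self.parent = parent

    def lookup(self, name):
        scope = self
        while scope is not None:
            if name in scope.bindings:
                return scope.bindings[name]
            scope = scope.parent
        raise NameError(name)


def evaluate(node, env):
    kind = node[0]
    if kind == "num":   # ("num", 3)
        return node[1]
    if kind == "var":   # ("var", "x")
        return env.lookup(node[1])
    if kind == "add":   # ("add", left, right)
        return evaluate(node[1], env) + evaluate(node[2], env)
    if kind == "mul":   # ("mul", left, right)
        return evaluate(node[1], env) * evaluate(node[2], env)
    if kind == "fn":    # ("fn", ["x"], body): capture the defining scope
        return ("closure", node[1], node[2], env)
    if kind == "call":  # ("call", fn_expr, [arg_exprs])
        _, params, body, defining_env = evaluate(node[1], env)
        args = [evaluate(arg, env) for arg in node[2]]
        if len(args) != len(params):
            raise TypeError("wrong number of arguments")
        return evaluate(body, Env(dict(zip(params, args)), defining_env))
    raise SyntaxError(f"unknown node kind {kind!r}")


# make_adder = fn(n) -> fn(x) -> x + n
make_adder = ("fn", ["n"], ("fn", ["x"], ("add", ("var", "x"), ("var", "n"))))
add5 = ("call", make_adder, [("num", 5)])
print(evaluate(("call", add5, [("num", 10)]), Env()))
```

The inner function still sees n after make_adder has returned because the closure keeps a reference to its defining scope; that is the same "stuff has to be stored somewhere" lesson, one level up.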

By the way, when someone mentions "the Dragon Book," it's likely that they're referring to the book Compilers: Principles, Techniques, and Tools by Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman, which will be easier to find in the card catalog. :)

True. But if you use google

True. But if you use google (and who doesn't) you'll find both the wikipedia page and the amazon page for the book, simply by plugging in "dragon book". Such is the power of technology...

Prepare to throw one away

I agree that the best way to learn is to start by building a simple interpreter and continue from there. But it is good advice to keep in mind that your first attempts are likely to be dead ends. There are many useful patterns in the design of language-processing tools, and you shouldn't have to rediscover them for yourself. After playing a bit with your own designs, throw them away happily and go to the literature (or web sites) to find standard compiler and interpreter designs to copy and modify...

One? Several!

Which is part of the fun!

You are quite right, of

You are quite right, of course.

More on Dragon Books

There are three of them, each bearing a different-colored dragon on the cover.

Principles of Compiler Design, the "green dragon", is the first book by the set of authors in question (or a subset thereof); it came out in 1977 and was written by Aho and Ullman. Much of the theory is a bit out of date, and I'd recommend one of the successors, but if you can find a used copy cheap, go for it.

Compilers: Principles, Techniques, and Tools, aka the "red dragon", by Aho, Ullman, and Ravi Sethi. The second edition of this work adds Monica Lam as a co-author, and is the "purple dragon". The PD contains most (if not all) of the content of the red dragon, plus a lot of new material on modern compiling techniques, such as JIT. The 2nd edition just came out this year, and you should expect to pay near full price (which might be a consideration if you're a Starving College Student--we all were at one point and some of us still are).

The Dragon Books aren't terribly heavy on "programming language" theory--i.e. documenting and classifying the different features and issues involved in designing a PL or analyzing an existing one. They instead focus on implementing compilers, particularly compilers which translate from an HLL to machine code. They might not be the best place to start--it's better (and easier!) to write an interpreter for your language at first, and only if you find it useful should you consider writing translators.

One other book I'd recommend, as a bit of a gentler intro to the topic, is Michael Scott's excellent Programming Language Pragmatics (I see someone else has recommended it below). If you have a limited theoretical background in PLT (programming language theory), type theory, or other related topics, this book will help you get your feet wet. PLP is probably my favorite undergraduate-level text which is specific to PLT. One other one I highly recommend is Peter van Roy's Concepts, Techniques, and Models of Computer Programming. Like SICP, this isn't a book on PLT so much as it is a book on programming techniques--but still an important consideration, as a good PL will make the common programming techniques easily accessible to the programmer.

Finally, a few paradigm-specific works: Bertrand Meyer's Object-Oriented Software Construction is still the definitive work on OO (though it tends to beat the Eiffel drum somewhat). Cousineau and Mauny's The Functional Approach to Programming is a good intro to FP, and you'd never know from reading it that it's a translation of a French work. (Which lazy-FP book is best for beginners?) Oh, and although it isn't a PL work, you should have a few books on your shelf by database guru Chris Date, starting with An Introduction to Database Systems and proceeding to one of the various versions of The Third Manifesto (there are three versions and all can be found at a bookseller near you).

I almost forgot...

to credit Peter van Roy's co-author, Seif Haridi. PVR, as he's known around these parts, is a regular contributor to LtU.

Modern Compiler Implementation in ...

Regarding the dragon books, I'd recommend the more recent and, in my opinion, more accessible series by Appel, "Modern Compiler Implementation in (your favorite language)".

The million dollar question ...

WHY?

The million dollar answer

Why do learners solve tons of tasks which have already been solved by others?

Resolving

I would venture to suggest this is because the solved tasks are not packaged in a reusable way, and that the task of creating such packaging remains an unsolved problem. Thus people, not just learners, re-solve problems because they must. Of course, there are numerous partial solutions, but a real solution requires popular acceptance as well as the appropriate technical properties.

The book that you want

Programming Language Pragmatics

It is a common-sense kind of book. It covers a little bit of everything and helps you see the big picture as well as some of the details. Buy it now.

You might also want to consider...

...writing an embedded domain-specific language (DSL). Rather than having to write your own lexer and parser, you can override features of an existing language to give you what you want. C++, Python and Haskell seem to be good choices for the 'host' language, depending on the type of task you're trying to achieve. I also know a paper comparing languages for this purpose here. If you do this, you may find you inherit all kinds of useful features from the 'host' language for free.
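As a small illustration of "overriding features of an existing language", here is a sketch of an embedded expression language in Python that uses operator overloading to build a tree instead of computing a value. The class names (Expr, Const, Var, BinOp) are hypothetical and not taken from any particular library.

```python
class Expr:
    """Base class: overloaded operators build a tree rather than a number."""

    def __add__(self, other):
        return BinOp("+", self, wrap(other))

    def __radd__(self, other):
        return BinOp("+", wrap(other), self)

    def __mul__(self, other):
        return BinOp("*", self, wrap(other))

    def __rmul__(self, other):
        return BinOp("*", wrap(other), self)


class Const(Expr):
    def __init__(self, value):
        self.value = value

    def evaluate(self, env):
        return self.value

    def show(self):
        return str(self.value)


class Var(Expr):
    def __init__(self, name):
        self.name = name

    def evaluate(self, env):
        return env[self.name]

    def show(self):
        return self.name


class BinOp(Expr):
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def evaluate(self, env):
        left, right = self.left.evaluate(env), self.right.evaluate(env)
        return left + right if self.op == "+" else left * right

    def show(self):
        return f"({self.left.show()} {self.op} {self.right.show()})"


def wrap(value):
    """Lift plain Python numbers into the embedded language."""
    return value if isinstance(value, Expr) else Const(value)


x = Var("x")
formula = 2 * x + 1           # host-language syntax, parsed by Python itself
print(formula.show())
print(formula.evaluate({"x": 5}))
```

No lexer or parser was written here: Python's own syntax, precedence rules, and error messages come along for free, which is the inheritance the comment above describes.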

When it comes to DSELs

When it comes to DSELs (domain specific embedded languages), don't forget Prolog. See here for several examples and techniques.

A prolog tangent

Prolog can itself be a DSEL. Here's a Prolog compiler in Lisp from Norvig's Paradigms of Artificial Intelligence Programming. Then you can write code like this:

(<- (zebra ?h ?w ?z)
       ;; Each house is of the form:
       ;; (house nationality pet cigarette drink house-color)
       (= ?h ((house norwegian ? ? ? ?) ; 1, 10
              ?
              (house ? ? ? milk ?) ? ?)) ; 9
       (member (house englishman ? ? ? red) ?h) ; 2
       (member (house spaniard dog ? ? ?) ?h) ; 3
       (member (house ? ? ? coffee green) ?h) ; 4
       (member (house ukrainian ? ? tea ?) ?h) ; 5
       (iright (house ? ? ? ? ivory) ; 6
               (house ? ? ? ? green) ?h)
       (member (house ? snails winston ? ?) ?h) ; 7
       (member (house ? ? kools ? yellow) ?h) ; 8
       (nextto (house ? ? chesterfield ? ?) ; 11
               (house ? fox ? ? ?) ?h)
       (nextto (house ? ? kools ? ?) ; 12
               (house ? horse ? ? ?) ?h)
       (member (house ? ? luckystrike oj ?) ?h) ; 13
       (member (house japanese ? parliaments ? ?) ?h) ; 14
       (nextto (house norwegian ? ? ? ?) ; 15
               (house ? ? ? ? blue) ?h)
       (member (house ?w ? ? water ?) ?h) ;Q1
       (member (house ?z zebra ? ? ?) ?h)) ; Q2

;; Solve the Zebra problem:
(?- (zebra ?houses ?water-drinker ?zebra-owner))

Interestingly, an optimized version of the code from PAIP can solve this sort of problem much faster than a straightforward Lisp implementation -- thus showing the usefulness of Prolog as an embedded language for logical inference. Here's a comparison backing that up with code and numbers.
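The ?h-style logic variables in the zebra rule get their values through unification, which is the core of any embedded Prolog. Below is a minimal sketch of unification over nested tuples, written in Python rather than Lisp. It is not Norvig's implementation: it has no occurs check and no backtracking search, and treating a bare "?" as an anonymous variable that matches anything is a simplifying assumption of mine.

```python
def is_var(term):
    """Named logic variables are strings such as '?who'; a bare '?' is anonymous."""
    return isinstance(term, str) and term.startswith("?") and len(term) > 1


def walk(term, bindings):
    """Follow bindings until reaching a non-variable or an unbound variable."""
    while is_var(term) and term in bindings:
        term = bindings[term]
    return term


def unify(a, b, bindings=None):
    """Return an extended bindings dict, or None if a and b cannot match."""
    bindings = {} if bindings is None else bindings
    a, b = walk(a, bindings), walk(b, bindings)
    if a == "?" or b == "?":  # anonymous variable: matches without binding
        return bindings
    if a == b:
        return bindings
    if is_var(a):
        return {**bindings, a: b}
    if is_var(b):
        return {**bindings, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            bindings = unify(x, y, bindings)
            if bindings is None:
                return None
        return bindings
    return None


pattern = ("house", "?who", "zebra", "?")
fact = ("house", "japanese", "?pet", "red")
print(unify(pattern, fact))
```

A full embedded Prolog adds a clause database and a search over candidate clauses on top of this; the PAIP compiler goes further and turns clauses into host-language functions.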

Impractical Advice

One way to go about it (which is why I have my own pet language project) is to find languages that you think are really cool and learn them. If you find a language you truly fall in love with, then don't write your own language -- master that one instead. If not, remember what dissatisfied you about that language and repeat with a new language until you're either hacking in a language you love or you know what it is that you really want in a language.

By then all you've gotta do is to figure out how to implement it, and all the papers & books they've referenced above will hold you in good stead.

Or write some great tooling for it

You gotta admit that Java has some pretty advanced development environments. Write a nice parser plugin for an IDE.

One more thing

Don't expect your language to take over the world. Guys who "strike gold" like Guido or Matz or Larry Wall are the exception and not the rule. You have greater chances of starting a garage band and winning a major-label record deal than of the fruits of your labor becoming the next Ruby or Perl.

If you do hope that your language will attract a following, find a niche. One, preferably, that you are experienced with.

You have greater chances of

You have greater chances of starting a garage band and winning a major-label record deal than of the fruits of your labor becoming the next Ruby or Perl.

Unless he has an award-winning idea that comes out of his studies. Be friendly to a guest, since he could be the mask of a god.

It's true that you may not

It's true that you may not write the next big thing, but that doesn't mean you can't take what you've learned and contribute to a larger project. At the very least, you'll take from the experience a greater understanding of how PLs are implemented and of the languages that you use.

That said, a good start would be a tutorial on flex/yacc or related tools in the language of your choice. Then you can pick up a compiler book - both the Dragon Book and 'Modern Compiler Implementation in *' are very helpful.

No unfriendliness intended

None at all. And I don't intend to cast any aspersions at the original poster or his ideas or capabilities.

Simply pointing out that the vast majority of programming language projects fail to become "popular". Heck, the vast majority fail to attract any users beyond their developer(s).

Which is OK--in many cases, popularity is not the intent--the purpose of the language is to demonstrate some principle or otherwise edify the creator.

But if someone is here thinking that they will get to be BDFL for a programming language which gets mentioned on people's resumes, and has an O'Reilly book with a critter on the cover...the line forms on the right. :)

At any rate, welcome to LtU!

...the line forms on the

...the line forms on the right. :)

And here's some helpful reading material while standing on that line: Some words of advice on language design, by Frank Atanassow. Even though the advice is geared towards someone trying to create an important new language, it may help keep priorities straight even if you're aiming lower.

Actual languages

Sometimes, after tiring of reading the theory, it helps to look at the implementation of actual languages. For this I recommend the njs JavaScript compiler. It is self-hosting, written in JavaScript itself with a C virtual machine, and it provides excellent examples of compiling to bytecodes, optimising, bootstrapping, and language constructs like eval.
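If "compiling to bytecodes" sounds abstract, here is a toy sketch of the idea: a compiler that flattens an expression tree into stack instructions, and a small virtual machine that runs them. The instruction names and tuple format are my own invention for illustration and have nothing to do with the actual bytecode format used by njs.

```python
import operator

# Hypothetical instruction set for a toy stack machine.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}
HANDLERS = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
    "DIV": operator.truediv,
}


def compile_expr(node, code=None):
    """Emit postfix stack instructions for a nested-tuple expression tree."""
    code = [] if code is None else code
    if isinstance(node, (int, float)):
        code.append(("PUSH", node))
    else:
        op, left, right = node
        compile_expr(left, code)   # operands first...
        compile_expr(right, code)
        code.append((OPCODES[op], None))  # ...then the operator
    return code


def run(code):
    """Execute the instruction list on an operand stack and return the result."""
    stack = []
    for opcode, operand in code:
        if opcode == "PUSH":
            stack.append(operand)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(HANDLERS[opcode](left, right))
    return stack.pop()


program = compile_expr(("+", 1, ("*", 2, 3)))
print(program)
print(run(program))
```

Real virtual machines add jumps, local variable slots, and calls, but the split between a compile step and a run loop is the same.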

I recommend this over looking at the source for the Lua virtual machine, which although informative (particularly if you're implementing constructs like closures or coroutines), is a mess. A real mess.

Compiler Construction

Since we are making recommendations, I want to mention an excellent little book by a researcher who has fallen out of fashion. Compiler Construction by Niklaus Wirth is only 131 pages long and guides you through the construction of an Oberon compiler. The source code is fully included in the PDF, and after all, Wirth created some of the most readable programming languages.

Project Oberon

I second this recommendation of Niklaus Wirth's Compiler Construction, and would like to add that Project Oberon provides an excellent analysis by example of how compilers and language tools tie in with an operating system as a whole (see particularly section 6, The Module Loader).

I find it strange..

..that the original poster never replied to this thread, especially since he was asked a few semi-direct questions and lots of suggestions have been made. He hasn't given us any idea of why he wants to make a language or what kind, though that seems to be less of an issue, since I'm sure we've covered almost every topic, if not all of them, with papers and books.

I have a feeling he left the

I have a feeling he left the conversation a while ago.