How to decrease bugs in the code

After a couple of years of using Haskell, I noticed that the number of errors in my code had not decreased (that was a lie, as were "you will be more productive", "you will write code faster", and "the code will be smaller"), so I started thinking about how to reduce bugs. I know these methods:

  • More unit and property tests (available in most languages)
  • Free monads: they allow checking the logic of code that looks "imperative" without involving real IO
  • Indexed monads: they prevent forbidden transitions in a monad (IO, etc.); IMHO this leads to complex, less readable code
  • Proof tools (Agda, Idris, etc.): they seem difficult to use, but I'm not sure
  • Liquid Haskell: I'm not sure how it helps in real-world applications, or whether it will survive once dependent types are introduced in future GHC versions
  • Literate programming: IMHO good for reducing logical errors
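
To make the free-monad item concrete, here is a minimal sketch. The `Free` type is hand-rolled so the example needs no packages, and `ConsoleF`, `greet`, `runIO` and `runPure` are hypothetical names invented for illustration. The same "imperative-looking" program is interpreted once against real IO and once purely, so its logic can be checked without touching the console.

```haskell
-- A tiny hand-rolled free monad (an assumption for this sketch;
-- real projects would more likely use an existing library).
data Free f a = Pure a | Free (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap g (Pure a)  = Pure (g a)
  fmap g (Free fa) = Free (fmap (fmap g) fa)

instance Functor f => Applicative (Free f) where
  pure = Pure
  Pure g  <*> x = fmap g x
  Free fg <*> x = Free (fmap (<*> x) fg)

instance Functor f => Monad (Free f) where
  Pure a  >>= k = k a
  Free fa >>= k = Free (fmap (>>= k) fa)

-- Hypothetical instruction set: the only effects the logic may request.
data ConsoleF next
  = ReadLine (String -> next)
  | WriteLine String next

instance Functor ConsoleF where
  fmap g (ReadLine k)    = ReadLine (g . k)
  fmap g (WriteLine s n) = WriteLine s (g n)

readLine :: Free ConsoleF String
readLine = Free (ReadLine Pure)

writeLine :: String -> Free ConsoleF ()
writeLine s = Free (WriteLine s (Pure ()))

-- The business logic, written in ordinary do-notation.
greet :: Free ConsoleF ()
greet = do
  writeLine "Name?"
  name <- readLine
  writeLine ("Hello, " ++ name)

-- Interpreter 1: run against the real console.
runIO :: Free ConsoleF a -> IO a
runIO (Pure a)               = pure a
runIO (Free (ReadLine k))    = getLine >>= runIO . k
runIO (Free (WriteLine s n)) = putStrLn s >> runIO n

-- Interpreter 2: pure; takes scripted input lines and collects the output.
-- When the scripted input runs out, reads yield the empty string.
runPure :: [String] -> Free ConsoleF a -> ([String], a)
runPure _      (Pure a)               = ([], a)
runPure ins    (Free (WriteLine s n)) =
  let (out, a) = runPure ins n in (s : out, a)
runPure (i:is) (Free (ReadLine k))    = runPure is (k i)
runPure []     (Free (ReadLine k))    = runPure [] (k "")
```

A test can then compare `fst (runPure ["Ann"] greet)` with the expected list of output lines, with no IO involved.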

What bothers me most is that, as I understand it, most real-world errors (not typos and other silly mistakes) cannot be caught by the type system because: 1) they happen at run time; 2) their root cause is unexpected behavior of something external; 3) often they are logical: some complex business logic is not fully correct, but it is difficult even to describe it formally. I'm also not sure it is possible to capture everything as some kind of type: a typical Haskell app already has so many types that an attempt to use more complex types will lead to something absolutely unreadable and unmaintainable.

I think Design by Contract can help to cover some of these errors, but I cannot find a good DbC framework for Haskell. IMHO it could be something like a "predicate under a monad", because a contract surely has to be executed with side effects (I'm interested in verifying some external entities, etc.).

In this case all functions like `f :: a -> b -> IO c` become `f :: Ctr a -> Ctr b -> CtrIO c` or something similar. But I'm not sure here, because I need to check not only pre- and postconditions but also invariants. What would they look like in Haskell, where you have only a spaghetti of functions? How do you code assertions in those monads? I found the work of Andres Löh and Markus Degen, but it does not help me. Most of the articles (Peyton Jones, Andres Löh, etc.) look very shallow, academic, or impractical (at the research/experimental level, of course, like everything else in Haskell). What do you use to decrease bugs in Haskell code? Real success stories would also be very interesting!
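
One way the "predicate under a monad" idea could look is sketched below. This is only an illustration under stated assumptions, not an existing DbC framework: `Contract`, `enforce`, `withContract`, `withInvariant` and the two examples are hypothetical names, and a violated contract is simply reported as an exception.

```haskell
import Control.Exception (Exception, throwIO, try)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- A contract is a named predicate that is allowed to perform side effects
-- (for example, to inspect some external entity).
data Contract a = Contract
  { contractName :: String
  , holds        :: a -> IO Bool
  }

newtype ContractViolation = ContractViolation String
  deriving Show

instance Exception ContractViolation

-- Helper for contracts that need no effects.
pureContract :: String -> (a -> Bool) -> Contract a
pureContract name p = Contract name (pure . p)

-- Check one contract; throw an exception naming it if it does not hold.
enforce :: Contract a -> a -> IO ()
enforce c x = do
  ok <- holds c x
  if ok then pure () else throwIO (ContractViolation (contractName c))

-- Wrap an effectful function with a precondition and a postcondition.
withContract :: Contract a -> Contract b -> (a -> IO b) -> a -> IO b
withContract pre post f x = do
  enforce pre x
  y <- f x
  enforce post y
  pure y

-- An invariant over mutable state: checked before and after the action.
withInvariant :: Contract s -> IORef s -> IO r -> IO r
withInvariant inv ref action = do
  readIORef ref >>= enforce inv
  r <- action
  readIORef ref >>= enforce inv
  pure r

-- Hypothetical example: square root with pre- and postcondition.
safeSqrt :: Double -> IO Double
safeSqrt = withContract
  (pureContract "argument is non-negative" (>= 0))
  (pureContract "result is non-negative" (>= 0))
  (pure . sqrt)

-- Hypothetical example: the balance must never become negative.
withdraw :: IORef Int -> Int -> IO ()
withdraw balance amount =
  withInvariant (pureContract "balance is non-negative" (>= 0)) balance
    (modifyIORef balance (subtract amount))

-- Small usage demo: a violated precondition surfaces as a Left value.
demo :: IO ()
demo = do
  r <- try (safeSqrt (-1)) :: IO (Either ContractViolation Double)
  print r
  account <- newIORef 10
  withdraw account 3
  readIORef account >>= print
```

In this sketch the invariant check runs after the state has already been modified, so it detects a violation but does not roll it back; a real framework would have to decide how to handle that.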

How to program

Slowly, over several decades of trying to build a better programming language, I've come to suspect we have no clue how to program, and our efforts to do so may be a subject for hilarity by future students of history... if civilization survives our blundering so that there are future students of history. My perspective, fwiw:

I started out figuring what we need is vastly more abstractive power, which at the time I thought I knew how to do, though my early thinking on that now seems to me prosaic and naive. I did try to develop a highly general theory of abstractive power, and still believe abstraction is a key thing to understand although the specific formal treatment I've developed thus far might not be what's wanted. My ideas on specific tactics have changed greatly, and my thinking currently focuses on the difference between sapient minds and non-sapient technology. I blogged on my skepticism about formal types some years back, that typing has become more hindrance than help, and have since developed from this the insight —not yet on my blog, as the relevant draft posts are still incomplete— that the reason mathematical abstraction is so profoundly more facile than programming abstraction is that mathematics is a conversation between sapient minds, whereas when programming abstractly we try to "explain" what we're doing to computers that, being technological artifacts and thus non-sapient, are fundamentally incapable of understanding anything, ever. This I've found is the unifying theme of Lisp, that it strives to make programming less inhuman by minimizing conversation with the computer: besides the elementary evaluator algorithm (making evaluation more humanly accessible), garbage collection avoids elaborate negotiation over memory allocation, bignums avoid elaborate wrangling over numerical representations — and traditional Lisp simply omits wrangling over elaborate typing. I'm currently interested in the relationship between sapience and interpreted programming languages.

Notably, technology (being non-sapient) is particularly ill-suited to error handling, so it's entirely unsurprising that we are particularly clueless about how to program it well. Which ties in, again, to my rising sense that we don't have a clue how to program computers.

Very important to find somebody with similar feeling!

John, thank you for links to your blog!

I had some intuitive feelings and some thoughts about types. I'm not a scientist or mathematician, so I can talk about this only from a developer's, practical point of view.

I see a trend in Microsoft's solutions: they borrow FP/Haskell ideas but rethink them. I can only show some examples from my own observations. A nullable type is like Maybe but doesn't force you to handle it: currently misuse raises warnings, but type checking passes. Sure, reason #1 is backward compatibility, but IMHO it's not the only one. Example: I may have some information that I want to hardcode literally (as constants) in my application, organized as a map/dictionary. Fine, but getting values will propagate 'Maybe'. I can use "fromJust", but it's not total and leads to flames in reviews :) So a warning instead of a type-check failure is a possible approach in a language too, and C# does exactly this (I'm absolutely far from thinking that people at Microsoft are more stupid than those in the Haskell community ;)))
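
The hardcoded-dictionary situation can be sketched as follows. The table and all names here are hypothetical; an association list with the Prelude's `lookup` stands in for a map so the example needs no packages. The last part shows one possible way around the problem (a closed key type with a total function), which is an assumption of this sketch rather than something proposed in the comment.

```haskell
import Data.Maybe (fromJust)

-- Hypothetical table of constants hardcoded in the application.
ports :: [(String, Int)]
ports = [("http", 80), ("https", 443)]

-- The lookup propagates Maybe even though we "know" the key is present.
httpPort :: Maybe Int
httpPort = lookup "http" ports

-- fromJust removes the Maybe, but the function is partial:
-- a typo in the key compiles fine and fails only at run time.
httpPortUnsafe :: Int
httpPortUnsafe = fromJust (lookup "http" ports)

-- Possible alternative: make the set of keys a closed type,
-- so the "dictionary" becomes a total function with no Maybe.
data Service = Http | Https
  deriving (Show, Eq, Enum, Bounded)

port :: Service -> Int
port Http  = 80
port Https = 443
```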

Another example: C# allows dynamic types. The Hindley-Milner algorithm can infer types (without explicit signatures or type annotations), and in Python we have dynamic types, so what is the problem with hiding types from the programmer altogether? We have a lot of constructs at the low (compiler/optimizer) level that the programmer does not need to know about; types could live at the same layer too (if they are necessary). For example, if a function's argument seems to be used as both Int and String, its type can be automatically inferred as Int|String; the compiler can then check that the programmer does not confuse the two branches (one uses only Int operations, the other only String operations), that there are no calls with other types, etc. Such an algorithm could exist, and I don't think it would be an error: the compiler can raise warnings about such cases without treating them as type violations. Mostly I, as a programmer, am not interested in playing with types (HKT and other complex abstractions). I need types to help me with: 1) refactoring; 2) some kind of documentation. But all of this involves another problem: you can print Haskell code on paper and you will not understand it. You need an IDE to check the real type of an object, because the operators used are very generic and you don't know what "handle" or "a" or "ws" (normal variable names in Haskell) means. You can hope for good naming, but... :) Larry Wall, as a linguist, found his own solution: in English we have various "service" words, prepositions, etc., so he uses sigils. Sigils look fine (they even exist in PowerShell), but he also introduced twigils, which look less pretty ;)

Another observation of mine: strong typing leads to a very large set of types in the code. Most of them duplicate external entities that are already strictly defined (as XML/JSON schemas, DDL, etc.). Microsoft offers a solution with type providers; the Haskell community does not even recognize the problems of strong typing.

Another observation: types are "markers" or "qualifiers", but often we cannot use them in a way that is adequate to reality. An example is the famous Path library by Chris Done. There, a Path can be Absolute|Relative (a) and File|Directory (b), so the type is "Path a b". And you cannot have POSIX and NT paths in the same application :) which is surely totally wrong for enterprise applications. What is more interesting is what happens if we do add such possibilities, because then we need more parameters for Path. I can also imagine path-like manipulations with URLs. There are also NETBEU paths, etc. So "Path a b" becomes "Path a b c d...", or "Path (a,b,c...)", or "Path a::b::c..." (the best case IMHO). So a properly designed Path type will be complex and awkward in real-world usage. To me this means that you cannot use strong types in the most correct way, only as some trade-off, a cut-down version. Otherwise the application becomes very complex.

And, of course, my (very subjective) experience: bugs in Haskell programs are no fewer than in other languages, but more interesting is that the code becomes more complex and puzzle-like, and DSL-style code is difficult to refactor...

It's only an intuitive feeling, but IMHO the strong-typing direction is not so great; I mean it has its own serious problems in the real world.

Unit tests, Types, and Abstraction.

If you Google "unit tests dont reduce bugs" you will find plenty of anecdotal evidence that what unit tests do is prevent regressions. Such testing is probably necessary whether the language is typed or untyped, so to me unit-tests are actually orthogonal to the question at hand.

My experience is that bugs hide in complexity, so the simpler things are the better. Often this comes down to finding the right abstraction. With the right abstraction the operations the user wants to perform map to simple primitive operations in the abstraction.

Regarding types, I cannot understand how a function that accepts Int|String can ever be necessary. I do not think allowing arbitrary paths, for example, is good practice. Let's say I am a new coder working on a project (and there may be none of the original team there), and I have to rewrite a function that is passed a variable called 'path': what kind of path can I expect it to contain? If it conforms to some Path type I have a good idea. Now let's say I put a NETBEU path in there, but the rest of the program cannot cope with such paths: I will have a lot of debugging to do to understand why things have gone wrong. Much better to have a type that reflects the common understanding of what a Path is.

So types form a contract between modules, and the more powerful the type system, the more invariants can be specified. If you get to a full proof system then you can have axioms in the interfaces to express the invariants you want.

However I think this is straying away from bugs. I think types help with bugs like adding a distance in feet to a distance in metres, which is useful for one class of bugs. I think unit testing helps against regression. However I think the biggest single factor in reducing bugs is reducing the amount and complexity of code to achieve a task, and I think that comes down to abstractive power and finding the right abstraction.
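
The feet/metres class of bug can be illustrated with a minimal sketch using plain newtypes. The names `Metres`, `Feet`, `addMetres` and `feetToMetres` are assumptions made for this example, not a units library.

```haskell
-- Two distinct types that both wrap a Double.
newtype Metres = Metres Double deriving (Show, Eq)
newtype Feet   = Feet   Double deriving (Show, Eq)

-- Addition is only defined for two distances in the same unit.
addMetres :: Metres -> Metres -> Metres
addMetres (Metres a) (Metres b) = Metres (a + b)

-- The conversion has to be explicit (1 foot = 0.3048 m).
feetToMetres :: Feet -> Metres
feetToMetres (Feet f) = Metres (f * 0.3048)

total :: Metres
total = addMetres (Metres 10) (feetToMetres (Feet 10))

-- The following line would be rejected by the type checker:
-- wrong = addMetres (Metres 10) (Feet 10)
```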

The best understanding of abstraction I have found to date is in Alexander Stepanov's "Elements of Programming", but the book is not like a normal textbook, it does not spoon feed you the answers, rather it's like a long worked example that you have to read several times before you can get the underlying principles.

Yes, tests are still the best weapon

The feet/metres example is good: today it's easy to solve with units of measure in F#. There are Haskell libraries for units of measure too, but they don't look very good :)

Your idea about simplicity (clean semantics of a method/function) is good and means it is easy to test. It reminds me of Smalltalk: small methods, TDD. But usually we write tests after the code :(

Yes, the idea of a Path type is good, but my point was that in a real-world application I can have different kinds of paths that cannot be described with only 2 type "markers": relative/absolute and directory/file. So my phantom type will be expanded to cover POSIX/NT paths, NETBEU paths, etc. And when I think about path-join I see these cases:

Path Posix Abs Dir -> Path Posix Rel Dir -> Path Posix Abs Dir

Path Posix Abs Dir -> Path Posix Rel File -> Path Posix Abs File

Path Posix Rel Dir -> Path Posix Rel Dir -> Path Posix Rel Dir

Path Posix Rel Dir -> Path Posix Rel File -> Path Posix Rel File

Path Nt Abs Dir -> Path Nt Rel Dir -> Path Nt Abs Dir

Path Nt Abs Dir -> Path Nt Rel File -> Path Nt Abs File

Path Nt Rel Dir -> Path Nt Rel Dir -> Path Nt Rel Dir

Path Nt Rel Dir -> Path Nt Rel File -> Path Nt Rel File

Path Posix Abs Dir -> Path Nt Rel Dir -> Path Posix Abs Dir

Path Posix Abs Dir -> Path Nt Rel File -> Path Posix Abs File

Path Posix Rel Dir -> Path Nt Rel Dir -> Path Posix Rel Dir

Path Posix Rel Dir -> Path Nt Rel File -> Path Posix Rel File

Path Nt Abs Dir -> Path Posix Rel Dir -> Path Nt Abs Dir

Path Nt Abs Dir -> Path Posix Rel File -> Path Nt Abs File

Path Nt Rel Dir -> Path Posix Rel Dir -> Path Nt Rel Dir

Path Nt Rel Dir -> Path Posix Rel File -> Path Nt Rel File

And here we use only 3 classifiers: Posix/Nt, Rel/Abs, File/Dir. What will happen if I try to cover URL manipulation with the same Path concept... :) Sure, we can generalize it: with type classes (separating the cases at run time) or with type-level lists (separating the cases at compile time, for example). Lists, because we can have more generic and more specific phantom types, and lists *may* allow us to skip enumerating all the classifiers. But this example shows my point: classifying the real world with types can be very tedious :)
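
For what it's worth, the sixteen cases listed above follow one pattern: the result takes the platform and Abs/Rel marker of the first argument and the File/Dir marker of the second. A minimal sketch with phantom types (this is not the API of Chris Done's path library; all names are hypothetical) can express that as a single polymorphic signature:

```haskell
-- Phantom tags: empty types used only as markers.
data Posix
data Nt
data Abs
data Rel
data File
data Dir

-- The tags appear only in the type; the value is a list of components.
newtype Path platform base node = Path [String]
  deriving (Show, Eq)

-- One signature covers every listed case: the left side must be a
-- directory, the right side must be relative, and the result keeps the
-- platform and base of the left side and the node kind of the right side.
(</>) :: Path p b Dir -> Path q Rel t -> Path p b t
Path xs </> Path ys = Path (xs ++ ys)

etc :: Path Posix Abs Dir
etc = Path ["etc"]

hosts :: Path Nt Rel File
hosts = Path ["drivers", "hosts"]

-- Corresponds to the listed case
-- Path Posix Abs Dir -> Path Nt Rel File -> Path Posix Abs File
joined :: Path Posix Abs File
joined = etc </> hosts

-- Rejected by the type checker: the right-hand side is not relative.
-- bad = etc </> etc
```

This does not answer the wider point about URLs or further classifiers; each new classifier still adds a type parameter.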

I have a feeling that strong typing and theorem proving are good tools, but not in the enterprise, where there are a lot of exceptions to the generic rules: exceptional cases far outnumber the generic and elegant rules, and business logic with all its details cancels out the advantages of these methods.

Use Unix :-)

To blame an operating system problem on types seems a bit of a stretch to me :-) Windows clearly has a problem with paths, whereas Unix manages to get away with a single global definition. This is a good example of my point about the wrong abstraction causing complexity in the application.

If we see a path type as a contract between all the developers working on a project, agreeing what a valid path is, then when the paths on a system are complex of course the type will be complex too, but what alternative is there? You could write documentation to describe what a valid path is, but then it has to be intelligible (and human languages tend to be vague and full of different interpretations), it has to be precise, and it has to be kept up to date with the code. So it seems a formal language is required for the precision and universal intelligibility, and if you have a formal language why not get the compiler to check the code against it, to make sure the documentation is up to date. So it seems to me a type system is exactly what you need. Current day type systems may not be ideal, but that is a reason to improve them, not abandon them.

You could use a simple disjoint sum type for the Path, so the type would always be 'Path' and you switch between the type-tags at runtime. This is still safer than putting all your paths in a string and expecting everyone who reads/writes the string to know what it means and what is valid.
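
A minimal sketch of that disjoint-sum approach follows. The constructors and the `append` function are hypothetical, and only four kinds of path are modelled. The type is always `Path`, and the tags are inspected at run time, so an invalid combination shows up as `Nothing` instead of a compile error.

```haskell
-- One type for all paths; the constructor is the run-time tag.
data Path
  = PosixAbs [String]
  | PosixRel [String]
  | NtAbs Char [String]   -- drive letter plus components
  | NtRel [String]
  deriving (Show, Eq)

-- Appending switches on the tags at run time. Combinations that make no
-- sense (mixed platforms, absolute on the right) yield Nothing.
append :: Path -> Path -> Maybe Path
append (PosixAbs xs) (PosixRel ys) = Just (PosixAbs (xs ++ ys))
append (PosixRel xs) (PosixRel ys) = Just (PosixRel (xs ++ ys))
append (NtAbs d xs)  (NtRel ys)    = Just (NtAbs d (xs ++ ys))
append (NtRel xs)    (NtRel ys)    = Just (NtRel (xs ++ ys))
append _             _             = Nothing
```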

In fact the argument against types is poor, because the alternative, of just expecting everyone to know what kind of paths are valid in some string, is complete chaos, and will almost certainly be a bug magnet in any application.

BTW, I like the idea of TDD, and I try to do it as much as I can, but the reality is that you want to write the program and not the tests and it is hard to keep going with it. Often the time to write mock-services and script UI interaction is significant, and it is easier to just ad-hoc test by starting the application and clicking. Automating testing of UI interaction is also tricky but not impossible.

I don't see why the path

I don't see why the path separator would need to be lifted into a type. A path is simply a list of path components, it's up to the file system abstraction to apply whatever separator is needed.
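
As a rough sketch of this view (all names are hypothetical, and a real file system abstraction would need more than a separator), the path carries only its components and the file system decides how they are rendered:

```haskell
import Data.List (intercalate)

-- A path is just a list of components; no separator is stored.
newtype Path = Path [String]
  deriving (Show, Eq)

-- The file system abstraction knows which separator to apply.
newtype FileSystem = FileSystem { separator :: String }

posixFs, ntFs :: FileSystem
posixFs = FileSystem "/"
ntFs    = FileSystem "\\"

-- Rendering is the only place where the separator appears.
render :: FileSystem -> Path -> String
render fs (Path components) = intercalate (separator fs) components
```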

I even question making a distinction between relative/absolute paths and file/directory. In fact, I'm not even sure why you'd do this with paths anyway, and not proper handles. Only a directory type should support path lookup.

The complexity results from simply using the wrong abstraction.

Test weapons

NO! Tests are the WORST weapon. They're the method of last resort.

THINKING is the best weapon. Visual appearance is important.
The compiler's type checking. A sketch of a proof.

Writing tests is a good weapon. The tests themselves are not there to find bugs, they're there to reinforce confidence attained from the OTHER methods of assurance.

"if you Google "unit tests

"if you Google "unit tests dont reduce bugs" you will find plenty of anecdotal evidence that what unit tests do is prevent regressions"

So they reduce *future* bugs. I have a suite of regression tests for my product and they are VERY good at catching changes that break things, be they fixes or extensions. In fact I have a rule: if it passes the tests, commit; if it doesn't, don't. CI servers shouldn't store artefacts unless the tests pass.

The best method is code

The best method is code reviews:

https://kev.inburke.com/kevin/the-best-ways-to-find-bugs-in-your-code/

But it is important to think: what *increases* the bug count?

An increase in line count.

And the lack of time to work on those lines to be sure they are OK. "Code review" is in fact a combination of the type system, automated testing, proof tools, manual inspection, etc.

If you HAVE THE TIME to do it, your bug count should get near 0.

The problem is that most people are under pressure to produce MORE LINES and not TO REVIEW THE LINES.

I think it all comes down to this.

So tools will NEVER be good enough, because they automate only a small sample of the possibilities and can't solve the problem of a yelling boss asking why you haven't shipped features 300...450 yesterday...

Six suggestions

1) Reduce complexity as a matter of course (refactor and abstract).

2) Write a formal / semi-formal specification for your program; keep it separate from the implementation.

3) Leverage type systems (mostly only reduces more trivial bugs, but a bug is a bug).

4) Write unit tests for code you can't make trivially simple.

5) Make the system debuggable as a whole, even if it is distributed (e.g., consider using MetaFunctions rather than Microservices).

6) Code reviews also help, but don't scale very well.

That should be a good start, at any rate.