How disable comments make static analysis tools worse

// linter:disable comments have always made me feel uneasy, so much so that they aren’t available in elm-review.

I wrote down my thoughts on how disable comments make static analysis tools worse, and how we can do without them. It’s not strictly Elm-related, but it’s largely inspired by my work with it. I hope you find it interesting anyway.

13 Likes

Thanks for another fun read :slightly_smiling_face:

I was wondering though, don’t some of the comment ignore drawbacks apply to elm-review?

Specifically “Disabling rules that should never be disabled”, “Unnecessary disable comments”, and “Original intent is lost”. These seem like problems that come up when ignoreErrorsForFiles and ignoreErrorsForDirectories are used as well.

Before elm-review v2 (I think), there was no way of ignoring errors at all. Unfortunately, in some cases, like for generated or vendored code, ignoring becomes necessary. If you look at the documentation for the ignoreErrorsFor* functions, you’ll find the cases where I think it’s okay to use them, and the ones where it’s not.
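
For illustration, a minimal configuration using these functions could look something like the following (the rule and the paths are only examples):

module ReviewConfig exposing (config)

import NoUnused.Variables
import Review.Rule as Rule exposing (Rule)

config : List Rule
config =
    [ NoUnused.Variables.rule
        -- Generated and vendored code are among the few cases where
        -- ignoring errors is okay: we don't control that code.
        |> Rule.ignoreErrorsForDirectories [ "src/Generated/" ]
        |> Rule.ignoreErrorsForFiles [ "src/Vendored/SomeLibrary.elm" ]
    ]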

If you go against those recommendations, then, just like with any tool used in a way that is not recommended, the experience is not going to be great. Unfortunately I don’t have control over how users use these functions, because I can’t tell whether their use is justified or not.

At the moment, you’d still likely use these functions for temporarily ignored errors too, but that’s where the proposal for temporarily suppressed errors comes in. When that happens, we’d be able to tell the difference. For now, the only way to tell the difference is to leave good comments in the configuration.

One thing I do argue in the article, though, is that ignoring errors in the configuration is better than ignoring errors in the code. Disable comments are jarring, can be duplicated, are easier to add, …, none of which is the case IMO for ignores done in the configuration.

1 Like

Every time I receive this request, I ask why they think it would be useful, and the typical answer is something like “well… similar tools have it?”.

My own thoughts on disable comments are a little bit more nuanced than what I will present here, but I figured it would be good to get a stronger case for disable comments (or more broadly inline disable annotations, comments or otherwise) than “similar tools have it” for you to have a go at :).

The case for inline disabling (whether via a comment or via a custom annotation) basically boils down to three things:

  1. Inevitably, certain rules must be disabled for certain programming constructs (if this were not the case, there would be a strong case that those constructs should just be part of the language proper).
  2. We’d like to limit the scope of the disabling to the absolute smallest unit possible (ideally just the relevant expression).
  3. Disable annotations (and the comments describing why) are a form of documentation, and documentation that lives closest to the code it concerns is the most likely to be accurate.

First, I want to point out that my personal experience has apparently been pretty different from yours.

We will go through this in later sections, but in my experience the reason why a disable comment was used is rarely explained next to the comment.

This is really just anecdata back and forth, but it has not been my experience. At every place I’ve ever worked, we had a policy that every disable annotation must be accompanied by a comment detailing why it’s disabled, and of course you could build that into your tool.

What kind of error did the static analysis tool report? Was it something that didn’t apply? Why did the developer choose to ignore it? Were they being lazy? Did they understand the error? What is the risk of keeping it ignored? If it’s being ignored, why is the rule even enforced?

All of those would be answered by an inline comment, which brings us back to the initial three points.

Inevitably rules must be disabled

The section “Why do users want disable comments?” leaves out a major reason I’ve disabled various static analysis and linting tools over my career: performance optimization. 99% of the time you don’t have to worry about it. But on any sufficiently large codebase, over a long enough period of time, there will eventually be that 1% performance bottleneck, a hot path that must be made fast. There you might need the code to have a very specific shape for the relevant optimizations to kick in, and that shape often conflicts with what the static analysis tool allows.

False positives are the other big reason, as you mentioned in your article. There are many useful rules whose utility outweighs the small but non-zero false-positive rate they bring.

Now, if you are going to end up disabling rules, you’ll want to minimize their blast radius, which brings me to:

We’d like to limit the scope of the disabling to the absolute smallest unit possible

This is a minor one, but if there’s only one subexpression that needs to be excluded then ideally we want only that one subexpression to be excluded and all other code to still be included in the static analysis. This is trivial to do with inline annotations. This is significantly more difficult to do in an external configuration file. Probably the best granularity you can hope for in an external configuration file is top-level expressions.

Finally:

Disable annotations (and the comments describing why) are a form of documentation, and documentation that lives closest to the code it concerns is the most likely to be accurate

Inline comments stand a good chance of being up-to-date with the code they annotate. File headers are getting pretty questionable. Comments in another file altogether? You’d better be absolute masters of discipline to have even a hope of keeping the majority of those in sync on a large team.

Any time code that has been excluded from a certain static analysis view changes, I want to review whether that exclusion is still necessary. That is difficult to remember for anyone who wasn’t the author of the original exclusion when the exclusion lives in a separate configuration file altogether, and much, much easier when it sits inline with the code being excluded. I have seen many PRs get flagged because the person touching the code at that particular time had not removed an inline disable annotation that was no longer required. I don’t think I’ve ever seen that happen for an external configuration file, unless the PR author was the author of both the original code and the configuration entry.

1 Like

In my town there are 20mph speed limits on the roads, which is a relatively new thing; they used to be 30mph. But the 20mph limit is practically never enforced: speed cameras are only found in 30mph zones and up. So everybody drives at 30, except for the occasional rule stickler insisting on going 20. I think the rule is okay, since it has been shown to help lower fatalities and life-changing injuries to pedestrians, cyclists, and children. But I also think: why have a rule if it is not enforced? This leads to confusion and non-compliance.

So yeah, disabling static analysis checks makes it kind of pointless.

One reason to allow disable comments: for some projects you might be introducing static analysis and want to enforce good rules on new code, without being required to immediately fix all existing code.

I feel this applies less so to Elm, especially since elm-review rules often come with an automatic fix too. Elm is quite a minimal language and there is often one way of doing things, so automatically applied fixes are likely to be more acceptable in Elm.
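
For reference, here is a minimal sketch of how a rule attaches an automatic fix to an error, using elm-review’s Review.Rule and Review.Fix modules (the surrounding rule logic is elided and the message is made up):

module MyRuleExample exposing (reportWithFix)

import Elm.Syntax.Range exposing (Range)
import Review.Fix as Fix
import Review.Rule as Rule

-- Report an error together with an automatic fix; running
-- `elm-review --fix` then offers to delete the offending range.
reportWithFix : Range -> Rule.Error {}
reportWithFix range =
    Rule.errorWithFix
        { message = "This expression is redundant"
        , details = [ "It has no effect and can be removed." ]
        }
        range
        [ Fix.removeRange range ]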

Also, since Elm code is free of side effects and therefore easier to reason about, I feel more confident that automatic fixes will not break Elm code.

I have also seen rules applied without thinking, for example by using the default settings that come with SonarQube. On a Java project that produced errors which forced developers to write bad exception-handling code. When I queried this, the people who set up and controlled the rules agreed it was wrong but would not change them. For me, the most important thing about static analysis tools is that whatever rules there are should always be open to discussion and change by the developers themselves. Better to have that conversation and gain consensus than to let developers sneak around the rules with disable comments.

1 Like

I’m very glad that this has been your experience. It hasn’t been on my teams, but if I were to work with a tool that had disable comments, I would make that a practice as well. Maybe someone at your company made that policy because they had previously been bitten by or gotten frustrated with disable comments.

Yes, that would be my recommendation as well. In these sections I wrote a few suggestions, which I may not have explicitly stated as improvements, but this is one. The fact that some tools don’t make this mandatory, even though it’s probably very valuable, means that it needs to be repeated and communicated to their authors, and this was my way of doing it (no clue whether it will ever reach them though).

That is a very good point. But I would like to argue this may be an issue of language design and/or the design of the enforced rules.

(I’m going to say some things while knowing that changing existing languages is not trivial; please humor me and take this with a nuanced view as well.) In an ideal language, the most performant solutions would be the simple and easy constructs of the language, and you wouldn’t need to go out of your way (too much) to make something really performant. This is of course a bit idealistic, because in very critical situations you need to balance different performance metrics (speed vs memory, for instance), but I think the language designers and implementers can aim for reasonable balances for most things.

Even if you need to use constructs that are a bit more complex, why are static analysis tools forbidding them? Would the optimization require such an obscure or discouraged pattern? (You’ll probably tell me yes for your use cases, and I’d argue that in an ideal language the designers would have made other choices.) Could the rule not be less aggressive, or try to understand the specific optimization?

I’ll give you an example from an elm-review rule. a ++ b gets compiled to _Utils_ap(a,b) (which is reasonably fast) while a ++ b ++ "" gets compiled to a + b + "" (which is faster).
The compiler could try to optimize the former by inferring or remembering the practical types of a and b, which should work in most cases. It currently doesn’t, but I expect that one day it will.
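
To make that concrete, here is a small sketch (the generated JavaScript in the comments is the behavior described above):

module StringConcat exposing (concat, concatFast)

concat : String -> String -> String
concat a b =
    -- Compiles to _Utils_ap(a, b), which has to check at runtime
    -- whether it is joining Strings or Lists.
    a ++ b

concatFast : String -> String -> String
concatFast a b =
    -- The "" literal lets the compiler see that the operands are
    -- Strings, so this compiles to the faster a + b + "".
    a ++ b ++ ""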

My Simplify rule reports errors when encountering a ++ b ++ "" because it considers the ++ "" redundant. I haven’t yet, but I’m thinking of making the rule configurable so that it won’t report this specific issue. That way most applications will still get this simplification (which can uncover further simplifications), while performance-critical libraries/applications can keep benefiting from the optimization they wanted to have.
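
Configuration-wise, I imagine something along these lines, where keepAppendEmptyString is a hypothetical option that does not exist in Simplify today:

module ReviewConfig exposing (config)

import Review.Rule exposing (Rule)
import Simplify

config : List Rule
config =
    [ -- `keepAppendEmptyString` is hypothetical: it sketches the
      -- configurable direction described above, deliberately leaving
      -- `x ++ ""` alone so performance-critical code keeps the
      -- optimization.
      Simplify.rule
        (Simplify.defaults |> Simplify.keepAppendEmptyString)
    ]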

I very much agree that I am likely idealizing situations, especially for things like complex C++ code, but at the same time, I think there are likely opportunities to improve our languages and our static analysis tools.

Maybe we’ll still need disable comments, likely even, but that doesn’t mean we shouldn’t aim for environments without them. And that’s maybe the entire point of my article.

Yes, I absolutely agree. But my personal belief is that inline comments have a number of drawbacks, some of which I mentioned in the article. Also, disable comments tend to be copied around, and the more people see them, the more common they become.

Experienced developers may very well carefully weigh all the trade-offs before adding a disable comment, but I doubt beginners will be able to do the same. They’ll likely copy-paste a section and, if it contains a disable comment, keep it there. If they don’t understand a reported error well enough and have seen errors being ignored in some places, they will be more likely to ignore this one as well.

In elm-review, I tried solving the issue of having to ignore errors by raising the quality of my rules and reducing the number of false positives, through all the techniques I explained in the article, and sometimes by not implementing a rule at all. Sometimes it’s through tagging a piece of code with something that looks like a disable comment but can’t be used for other rules.

Disable comments are not the only solution, and there are often better ones. And that’s what I want people to get out of my article.

I am writing this coming from the context of Elm, so my views can be a bit idealized. But I want people to realize that languages with other tradeoffs can help in non-obvious ways, such as with static analysis reports.

Maybe one day I will give in and add disable comments to elm-review because too many rules somehow started requiring them, who knows. But in the meantime, I’ll do my best so we don’t have to.

1 Like

I’ve been told by coworkers that situations like this have caused huge errors, despite multiple senior devs reviewing the code. No one understood the impact of the linting warning they got, so they disabled it. This resulted in breaking the app.

1 Like

[BEGIN DEVIL’S ADVOCATE]

I’ve been told by coworkers that situations like this have caused huge errors, despite multiple senior devs reviewing the code. No one understood the impact of the linting warning they got, so they disabled it. This resulted in breaking the app.

@wolfadex it cuts both ways. I know of several bugs that shipped to production because someone blindly followed a linting rule (e.g. one was a non-null linting rule that was turned on for a file interop-ing with a third-party library; the library expected nulls for certain behavior but could take any other type in its signature, causing it to silently accept a Maybe-like type that did not trigger the null case). When it comes to non-semantics-preserving linting rules, there is no substitute for understanding their impact.

So yeah, disabling static analysis checks makes it kind of pointless.

@rupert This is a rather black and white take. Having some static analysis checks disabled in some places in your code does not make the places where they still apply any less valuable. Looking at a piece of code and knowing that, because it doesn’t have certain annotations, you have certain guarantees is very valuable (even if further down the call stack you don’t have those guarantees anymore). Take Elm’s promise of total functions, for example. It’s not 100% there (equality checks, blowing the stack, and spinning in an endless loop are all ways to break it), but we all derive massive benefit from it. Even in languages with exceptions (e.g. Haskell, or even more so Scala), an expectation of no runtime exceptions in pure code makes it more pleasant to work with than languages which use exceptions willy-nilly. You don’t need 100% of something for it to be valuable (although it is valuable to acknowledge non-linearities in the spectrum).

Even if you need to use constructs that are a bit more complex, why are static analysis tools forbidding them?

Would the optimization require such an obscure or discouraged pattern? (You’ll probably tell me yes for your use cases, and I’d argue that in an ideal language the designers would have made other choices.) Could the rule not be less aggressive, or try to understand the specific optimization?

@jfmengels This assumes that whether or not a certain optimization is required is knowable from the code. That is not always true. Whether a certain piece of code needs a certain level of performance is ultimately a business question, and it can be those business needs that dictate whether a certain piece of code needs to be optimized a certain way or not. Indeed, oftentimes I explicitly don’t want the constructs I’ve allowed for one very specific piece of code, written one very specific way, to be allowed anywhere else in my codebase! I want the static analysis rule to disallow them 99% of the time! It’s only because of a very specific business reason (e.g. a particular user interaction) that I need to write this very small snippet of code this particular way.

BTW, the reason this matters even for something that doesn’t show up 99% of the time is the tyranny of large numbers: while individually rare, over the lifetime of a project among many devs these cases become quite common.

Moreover, even in the cases where this is knowable from the code, it goes far beyond having the ideal language. It requires having an ideal stack, all the way down to the hardware (e.g. the caching hierarchy). Optimizing your tools for this ideal combined hardware-software stack seems like a painful and inadvisable tradeoff the overwhelming majority of the time.

You bring up ++ (another Elm one is the compare-subtraction-against-0 trick), but one common linting example where simple fixes to the language are unlikely to ever be enough is long argument lists vs passing in a configuration object. How that ends up affecting JIT compilation, caching, pointer chasing, etc. is a sufficiently complex interaction that I doubt it’s possible to ever have a sufficiently flexible rule when push comes to shove in a hot path.

I haven’t yet, but I’m thinking of making the rule configurable so that it won’t report this specific issue.

Configuration comes with its own costs. One of the great joys of working with the Elm ecosystem is that by and large there’s no configuration needed. You have a single integrated compiler/debugger/package manager with very few knobs to turn. Just point and run. It is often more user-friendly to allow specific local overrides than to add additional levers and knobs in configuration.

Disable comments are not the only solution, and there are often better ones. And that’s what I want people to get out of my article.

This is a far more measured take than the title of the article suggests (which is that they are an unalloyed bad).

[END DEVIL’S ADVOCATE]

As I mentioned at the beginning of this, my own actual personal take is more nuanced, and my personal preference would actually be something in the middle: not a compromise, but a sum that’s greater than its parts.

As far as I understand it, @jfmengels, your objections to inline static analysis rule disabling boil down to one thing: it makes it too easy to disable static analysis rules. And I understand and sympathize with that.

And yet that’s not an intrinsic part of inline disable annotations! You can make it arbitrarily harder to disable static analysis rules while keeping all the benefits of the locality that inline disables give you. The trick is to make the user do a lot of (computer-checked, but otherwise unaided) work in the direction of disabling rules, and then make it really easy to re-enable rules.

So e.g. what you can do is force all inline disable annotations to have an accompanying entry in the centralized config file. In that config file you can then have a further meta-annotation on whether this is meant to be a permanent or a temporary exception, and if it’s a temporary exception, require an end date past which the static analysis tool will report an error.

E.g. (purposefully in non-Elm pseudo-code to show its generality across languages)

-- In code
@disableRulePermanent(
  rule = "no-unsafe-sort-by-hashcode",
  -- This is a tag name to refer to this specific exception instance later
  -- on in the centralized configuration file, disambiguates among
  -- multiple exceptions in one file
  tag = "hot-path-branch-prediction",
  reason =
    """It turns out that the safe way of writing this code ends up causing
    massive slowdowns due to branch prediction errors and you need to
    do a redundant sort by hash code beforehand. Note that the sort by
    hash-code is okay here because the actual semantics of the code 
    aren't affected by the order, only the performance, so we just need
    some order, any order."""
)
myExpression = ...

@disableRuleTemporary(
  rule = "no-nulls",
  tag = "waiting-on-external-library-version-bump",
  reason =
    """`external-library-5.3.2` introduced a bug in `coolFunction` which
    requires a null arg to get the same behavior that would have been
    achieved by passing it 0 in version 5.3.1. This is tracked at
    https://example.com/issuetracker/31 and they should have a new release
    fixing this in October.
    """,
  -- Past 2021-10-01 our static analysis tool will throw an error
  expirationDate = "2021-10-01" 
)
myOtherExpression = externalLibrary.coolFunction(null)

-- And then later on separately in the central configuration file
inlineExceptions = List(
  -- If any information here disagrees with the inline annotation the
  -- static analysis tool reports an error
  Temporary(
    file = "src/MyFile",
    rule = "no-nulls",
    tag = "waiting-on-external-library-version-bump",
    expirationDate = "2021-10-01"
  ),
  Permanent(
    file = "src/MyFile",
    rule = "no-unsafe-sort-by-hashcode",
    tag = "hot-path-branch-prediction"
  )
)

You make users do redundant work that is checked by the computer (so it stays in sync) but is burdensome enough that users aren’t tempted to reach for it willy-nilly (this also prevents the issue of users mindlessly copy-pasting code). When it comes to PRs, you also have a single file that is the authoritative record of all exceptions to the various static analysis rules. Then you make the other direction completely automated: there is a command you can run that deletes all inline disable annotations that have no accompanying entries in the config file, and vice-versa. This tilts users towards a process that makes it hard to disable rules, but easy to re-enable them.

You get all the benefits of the locality of inline annotations (any developer can see immediately that if coolFunction is no longer used, they can get rid of that annotation), along with all the purposeful friction and the single centralized listing of all exceptions in your project that you get with the config-file approach.

2 Likes
