My general disposition toward testing code is that it is a distinct job that needs to be carefully considered in its own right. What you are producing when you create a test suite is a product that serves a dual purpose: ensuring that the code is correct, and substantively improving the experience of the developer, who includes but is certainly not limited to yourself.

What this entails is that a coherent testing regime can only be considered when there is enough of an artifact to test. Test suites that exclusively grow organically, especially when combined with test-first policy stipulations, tend to incentivize the creation of tests that are either redundant (the conditions are already covered by the test suite of some third-party dependency) or tautological, giving you no new information. As such, a testing regime should be explicitly designed.
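To make those two failure modes concrete, here is a hypothetical sketch in Python; both tests pass, and neither tells you anything about your own artifact.

```python
# Hypothetical examples of the two failure modes described above.

import json
from unittest.mock import Mock


def test_json_parses_objects():
    # Redundant: this exercises the standard library's json module, which
    # ships with its own test suite; it tells us nothing about our artifact.
    assert json.loads('{"a": 1}') == {"a": 1}


def test_fetch_returns_profile():
    # Tautological: the assertion can only confirm what the mock was told to
    # return, so it passes by construction and yields no new information.
    client = Mock()
    client.fetch.return_value = {"name": "Ada"}
    assert client.fetch("user/1") == {"name": "Ada"}
```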

The converse of testing zeal is not testing enough. Writing tests diverts effort that could have been put toward expanding and refining the behaviour of the product itself, and it does not always net positive. We should be honest about that.

There is a natural inflection point for diverting resources to testing, which happens around the time you can no longer just eyeball your results in a REPL. This eyeballing may not be uniform across the entire artifact, but you can get quite far that way.

There is a tendency for us to write tests as if we did not trust the computer itself to behave consistently; if that were actually the case, we would have bigger problems than the integrity of our own software. This is often a symptom of composing test suites upward from the inside out rather than sculpting them from the outside in.

Therefore: Concentrate your initial testing efforts on the behaviour you want the artifact to exhibit, before testing the behaviour you need it to exhibit in order to exhibit the behaviour that you want. Backfill when those ultimate tests encounter inputs you did not anticipate. Aim for sparse coverage: that is, a public interface that does what you expect it to do. If there is something wrong somewhere in the plumbing, those tests are likely to fail too.
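As a rough sketch of what such an outside-in test might look like, here is a minimal Python example in the pytest style; the `invoice` module and its `draft` function are hypothetical stand-ins for whatever public interface your artifact exposes.

```python
# A sketch of an outside-in test: exercise only the public interface and the
# behaviour you want, not the plumbing that supports it. The `invoice` module
# and its `draft` function are hypothetical placeholders.

from decimal import Decimal

import invoice  # hypothetical module under test


def test_total_includes_tax():
    # The behaviour we want: a drafted invoice totals its line items plus tax.
    draft = invoice.draft(
        items=[("widget", Decimal("10.00")), ("gadget", Decimal("5.00"))],
        tax_rate=Decimal("0.10"),
    )
    assert draft.total() == Decimal("16.50")


def test_empty_invoice_totals_zero():
    # An edge case that matters at the interface; deliberately not a test of
    # the rounding helper or the persistence layer underneath.
    draft = invoice.draft(items=[], tax_rate=Decimal("0.10"))
    assert draft.total() == Decimal("0.00")
```

If the rounding logic or the storage layer underneath misbehaves, assertions at this level are where the failure will surface, which is precisely the point of concentrating effort here first.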

Regarding methodology: Once you have accumulated enough of an artifact that it does something coherent, and eyeballing it in a REPL is delivering sufficiently diminishing returns, set aside some time to sketch out the entire testing regime in one go. Begin with a single file; write out declarative prescriptions and proscriptions (i.e., artifact should [not] do…) as a flat list of bullet points in prose commentary. Formal language is too restrictive at this stage. It will very quickly become apparent how these expectations ought to be grouped, which expectations are redundant (i.e., they are implied by more salient assertions), and what shared resources and fixtures are required among the tests. Once you have arranged these intermediate results in a sensible manner, you can stub them in as code in a second pass, and fill in the contents as you progress.
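A minimal sketch, assuming Python and pytest, of what those two passes might look like in a single file; the expectations and names are hypothetical, and marking the stubs with pytest.skip is just one way to keep unwritten tests visible.

```python
# Pass one: expectations written as prose, grouped once the shape becomes clear.
#
# parsing:
#   - artifact should accept UTF-8 input
#   - artifact should not accept input larger than the configured limit
# output:
#   - artifact should emit one record per input line
#   - (redundant: "output is valid JSON" is implied by the record-shape check)

import pytest


@pytest.fixture
def sample_input():
    # Shared resource identified during the grouping exercise.
    return "línea uno\nlínea dos\n"


# Pass two: the same expectations stubbed in as code, filled in as you progress.

def test_accepts_utf8_input(sample_input):
    pytest.skip("not yet implemented")


def test_rejects_oversized_input():
    pytest.skip("not yet implemented")


def test_emits_one_record_per_line(sample_input):
    pytest.skip("not yet implemented")
```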