To wrap or not to wrap? That is the question

2023-04-02

Bopher-NG has surpassed 300 SLOC. Well, it is still under 350 and will probably stay that way for a long, long time. I’m not a big fan of feature creep: for example, adding (optional) clipboard copying support to the link stash functionality took only a single line in the corresponding function (see the README for details). But I needed the existing functions to work as they should, and this finally led me to overturn the assumptions I had made back when creating the original Bopher prototype, the main one being that lines would never wrap.

In the original Bopher logic, wrapping any line of text meant that the physical screen would end up containing more lines than the virtual screen buffer we’re operating on. This breaks all scrolling and line positioning beyond repair. And commanding the terminal to disable line wrapping is not guaranteed to work everywhere, and it is also terrible for the overall UX. This is why we must make sure that whatever is displayed physically is also mapped logically: we must wrap long lines to the desired width ourselves instead of letting the terminal do it. This way, we can track all the physical text the same way we did before. And for this, we need a text reflow algorithm.
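To make the mismatch concrete, here is a small sketch that counts how many screen rows a buffer occupies when the terminal does the wrapping. The name physicalRows() is hypothetical, and it assumes one column per character (no tabs, wide characters or escape sequences):

```javascript
// Hypothetical helper: number of physical terminal rows occupied by a list of
// logical lines when the terminal itself wraps them at `cols` columns.
// Assumes every character takes exactly one column.
function physicalRows(lines, cols) {
  return lines.reduce(function (rows, line) {
    // an empty line still takes one row; a longer one takes ceil(len / cols)
    return rows + Math.max(1, Math.ceil(line.length / cols))
  }, 0)
}

physicalRows(['short line', 'x'.repeat(100), ''], 80) // 4
```

Three logical lines take four physical rows here, so every row below the long line is off by one from where the buffer thinks it is.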

Now, there are different approaches to this problem, involving different alignment strategies, whitespace handling, newline handling and so on. For use in Bopher-NG itself, I have settled on one of the simplest algorithms: it works line by line only, produces left-aligned output with no word breaks (unless a single word doesn’t fit into the page width) and preserves all the whitespace characters it encounters in the input. Since its primary field of use is Gopher clients and formatting text documents for publishing on Gopher, I called this algorithm Phlow. Here it is:

1. Accept the line L and target page width W as parameters.
2. Get the line length LL: LL = len(L).
3. If W is 0 or LL < W, emit L and quit.
4. Allocate a variable LWS to track the last whitespace position. Set it to 0.
5. Allocate a variable CPOS to track the current relative position. Set it to 0.
6. Allocate a variable BPOS to track the current base position. Set it to 0.
7. Allocate an empty output string buffer OUT.
8. For every index I from 0 to LL - 1, perform steps 9 to 19:
9. If the CPOS value is less than W, go to step 10, otherwise go to step 13.
10. Fetch the current character C from the line L at position I.
11. If C is a whitespace character (0x20), set the LWS value to CPOS.
12. Append the value of C to the output buffer OUT. Go to step 19.
13. If the LWS value is 0, set it to W.
14. Emit the value of the output buffer OUT truncated to LWS characters.
15. Empty the output buffer OUT.
16. Set BPOS to BPOS + LWS.
17. Set CPOS and LWS to 0.
18. Set I to BPOS.
19. Increment CPOS. End of iteration.
20. If the output buffer OUT is not empty, emit its value. End of algorithm.

The “emit” operation here usually means something like “output with a newline at the end”, but this is not part of the algorithm, just an implementation detail: you may append the emitted lines to an array instead or do anything else with them. When translated to Bash (see, for example, the phlow_lite() function in the bopher-ng.sh source code), this algorithm just works for viewing plain texts with long lines. But this is exactly what pushed the overall codebase beyond 300 SLOC. Well, whatever.
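For comparison, here is a minimal JavaScript sketch of the same idea. It is not a port of phlow_lite() or a line-by-line translation of the steps: the name phlow() is hypothetical, “emit” is modelled as collecting lines into an array (as allowed above), break and restart positions are computed with slice() and lastIndexOf() instead of a per-character loop, and a space sitting exactly at the width limit is also accepted as a break point.

```javascript
// Hypothetical sketch of the Phlow idea (not Bopher-NG's phlow_lite()).
// Wraps one line to at most `width` characters, left-aligned: it breaks at the
// last space (0x20) that fits and splits a word only when there is no usable
// space to break at. The space a break happens at is dropped; all other
// whitespace is kept as is. Width 0 means "don't wrap".
function phlow(line, width) {
  if (width === 0 || line.length <= width) return [line] // nothing to wrap
  var out = [], base = 0 // base: where the current output line starts (cf. BPOS)
  while (line.length - base > width) {
    // look one character past the limit: a space exactly there is a valid break
    var lws = line.slice(base, base + width + 1).lastIndexOf(' ')
    if (lws > 0) { // break at the last space and skip over it
      out.push(line.slice(base, base + lws))
      base += lws + 1
    } else { // no usable space: hard-break at `width`
      out.push(line.slice(base, base + width))
      base += width
    }
  }
  if (base < line.length) out.push(line.slice(base)) // the remainder fits
  return out
}
```

For instance, phlow('aaa bbb ccc', 5) returns ['aaa', 'bbb', 'ccc'], while a line with nowhere to break is split hard: phlow('abcdefghij', 4) returns ['abcd', 'efgh', 'ij'].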

The real problem here is far more fundamental, though. You see, since the advent of truly online-oriented mobile OSes, the big Web has mostly addressed the problem of viewing content on small screens. The Gopher community, which has in fact been alive and active all these years, on the other hand, turned out to be divided into two factions: fans of old-school, hard fixed-width text preformatting (70 or 80 characters wide, no matter what) and fans of the more modern, Web-like, “mobile-friendly” approach of “all the wrapping must be done on the client, and the client only”. There are even two long posts (on Gopherspace itself, of course) advocating each position, each written in the corresponding formatting style: a pro-fixed rant by Wandering Geek and a pro-wrap rant by Magical Fish. Both texts were really amusing to read, but where does this leave me, the author of Kopher, a Gopher client for the platform with the narrowest screens a smartphone can have these days (240px wide), which can only provide an effective width of 25 characters in a font that is still comfortable to read?

Yes, the post by Magical Fish is, at its core, more mobile-friendly and can be read on Kopher with no issues whatsoever. The Wandering Geek’s post is not so easy to read in the mobile setting, even with text wrapping turned on, because hard wraps that occur at the end of already soft-wrapped lines leave single words on their own lines, which just annoys the hell out of me. That’s why, as the author of Kopher, in order to make both document types actually readable on a 25-character-wide screen, I had to come up with a more intelligent wrapping mode than the one I had before this post. Unfortunately, reversing the reflow process on a hard-wrapped text (which can also be surrounded by spaces, tabs, CRs and other non-printables on both sides) is not so easy, so some compromises and sacrifices have to be made. With that said, let me introduce the Unphlow algorithm.

1. Read the input text into an array A of lines, separating them by LF (line feed, 0x0a) character.
2. Allocate an empty output string buffer BUF and an empty output string array OUT.
3. For each line L in the array A, perform the steps 4 to 10:
4. Remove from the line L all leading and trailing occurrences of the following characters: whitespace (0x20), TAB (0x09), CR (carriage return, 0x0d).
5. If the line L is empty, go to step 6, otherwise go to step 10.
6. Remove all trailing occurrences of the whitespace character (0x20) from the buffer BUF.
7. Append the value of buffer BUF to the array OUT.
8. Empty the buffer BUF.
9. Append an empty string to the array OUT. End of iteration.
10. Append the value of L and a whitespace character (0x20) to the buffer BUF. End of iteration.
11. If the buffer BUF is not empty, perform the operations described in steps 6 and 7.
12. (Optional step) For each line OL in the OUT array, replace all sets of consecutive whitespace characters listed in step 4 with a single whitespace (0x20).
13. Emit the array OUT. End of algorithm.

Now, the array OUT will contain all the lines ready to be output in the soft-wrapped fashion (or even passed to the Phlow algorithm to be rewrapped at another width). Note that step 12 is optional and only necessary if the source text is width-aligned (justified, with an uneven number of spaces between words on each line) or even right-aligned, but since I don’t know which kind of alignment my client will encounter, I implemented all of it in Kopher 0.0.4. Also, on the implementation side: since switching the wrapping mode no longer involves just CSS, and I don’t feel like making another network request to fetch the same content, I prepare both the “original” and the “wrapping-friendly” version of every response and just switch between them along with the corresponding content container attribute. And, for the time being, I decided to enable Unphlow only for plain text content and not for Gophermap entries, where you’re not supposed to put long descriptions anyway.

Here’s how the Unphlow algo itself was actually implemented in Kopher in JavaScript:

function unphlow(str) { // Unphlow algorithm implementation
  var lines=str.split('\n'), line, l = lines.length, i, buf = '', out = []
  for(i=0;i<l;i++) {
    line = lines[i].trim() // remove all leading/trailing whitespace-class chars
    if(line.length) // if the line is not empty, just append it and a whitespace to the buffer 
      buf += line + ' '
    else { // output logic
      out.push(buf, '')
      buf = ''
    }
  }
  if(buf.length) // process the remaining output
    out.push(buf)
  return out.map(function(s) { // final whitespace sanitation
    return s.replace(/\s+/g, ' ').trim()
  }).join('\n')
}

A tiny piece of code that has such a large impact on mobile devices and makes peace between old- and new-fashioned text documents. And this is what Pocket Gopher and other mobile clients should actually implement themselves if they opt to force-wrap Gopher-published content; otherwise, it looks like a total mess even on slightly wider screens. Either wrap it properly, or don’t wrap at all. That’s my motto. And you should give your users a choice: to wrap or not to wrap. I don’t really know if Kopher is the first mobile client ever where all of this is finally done the way it should be, but I won’t be surprised if it is.
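As for the earlier remark that Unphlow’s output can be passed on to be rewrapped at another width, here is a rough sketch of that round trip. It is not Kopher’s code: unphlowSteps(), wrapWords() and reflow() are hypothetical names. unphlowSteps() follows the numbered steps literally, so only space, TAB and CR count as whitespace (trim() and \s in the code above cover a broader Unicode whitespace set), and a plain greedy word wrap stands in for Phlow to keep the example short.

```javascript
// Hypothetical sketch, not Kopher's code: Unphlow written step by step, with
// only space, TAB and CR treated as whitespace (as listed in step 4).
function unphlowSteps(text, collapse) {
  var out = [], buf = ''
  text.split('\n').forEach(function (line) {
    line = line.replace(/^[ \t\r]+|[ \t\r]+$/g, '') // step 4
    if (line.length) {
      buf += line + ' ' // step 10
    } else {
      out.push(buf.replace(/ +$/, ''), '') // steps 6, 7 and 9
      buf = '' // step 8
    }
  })
  if (buf.length) out.push(buf.replace(/ +$/, '')) // step 11
  if (collapse) { // optional step 12
    out = out.map(function (s) { return s.replace(/[ \t\r]+/g, ' ') })
  }
  return out // step 13: one entry per paragraph or blank line
}

// Simple greedy word wrap standing in for Phlow (not Phlow itself): assumes
// single spaces between words (step 12 applied); overlong words stay whole.
function wrapWords(para, width) {
  var lines = [], cur = ''
  para.split(' ').forEach(function (word) {
    if (!cur) cur = word
    else if (cur.length + 1 + word.length <= width) cur += ' ' + word
    else { lines.push(cur); cur = word }
  })
  lines.push(cur)
  return lines
}

// Reflow a hard-wrapped text to another width: unwrap, then rewrap.
function reflow(text, width) {
  return [].concat.apply([], unphlowSteps(text, true).map(function (p) {
    return wrapWords(p, width)
  })).join('\n')
}
```

With a 25-column target, reflow('Gopher text is often hard-wrapped\nat 70 columns, which looks odd\non a narrow screen.\n\nSecond paragraph.', 25) joins the first three lines into one paragraph, rewraps it into four lines no wider than 25 characters, and keeps the blank line before the second paragraph.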

To wrap things up with Kopher (no pun intended), at least for today, I must also say that I’m not quite sure yet what to do with terminal escape sequences, some of which are even present on my own HNB Finger/Gopher page. For the next version, 0.0.5, I’m at least going to make it recognize and remove those sequences before even saving the content to the rendering buffers. Of course, ideally I’d do something like the authors of Lagrange did (Lagrange actually supports ANSI-code formatting, at least partially), but this would violate the entire idea of simplicity: just putting the file contents into the textContent property when we’re dealing with plain text, as opposed to the complex rendering into innerHTML that we only really need for Gophermaps. Maybe Kopher will eventually support inline ANSI coloring for Gophermap info lines only, who knows.
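A sketch of that removal step could look like the following, under the assumption that only the common 7-bit forms need handling: CSI sequences (syntax per ECMA-48), OSC sequences terminated by ST or by BEL (a common xterm convention), and other two-byte ESC sequences. The names stripEscapes() and ANSI_ESCAPES are hypothetical.

```javascript
// Hypothetical sketch (not Kopher code): strip common 7-bit terminal escape
// sequences before the text is saved to the rendering buffers.
//   CSI: ESC [ parameter bytes (0x30-0x3F), intermediate bytes (0x20-0x2F),
//        final byte (0x40-0x7E), e.g. SGR colors such as ESC[1;31m
//   OSC: ESC ] ... terminated by BEL (0x07) or ST (ESC \), e.g. window titles
//   any other two-byte ESC sequence with a final byte in 0x40-0x5F
// 8-bit C1 controls and rarer sequence types are deliberately left alone.
var ANSI_ESCAPES = /\x1b\[[0-?]*[ -\/]*[@-~]|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)|\x1b[@-_]/g

function stripEscapes(text) {
  return text.replace(ANSI_ESCAPES, '')
}

stripEscapes('\x1b[1;31mred\x1b[0m plain') // 'red plain'
```

Keeping this as a single replace() pass fits the textContent approach: the cleaned string can still be assigned as-is, with no HTML rendering involved.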

By the way, in another follow-up post, Wandering Geek also says that Lynx’s text wrapping is broken and doesn’t wrap at word boundaries. That’s true. Many things in Lynx are so broken that I don’t even consider it a point of reference anymore. And yes, they should think about their userbase and at least implement the above Phlow algorithm, to start with. But again, WG also complains about other things that would be solved so easily had he chosen a more thought-out client (or even written his own; it’s not that difficult, as we’ve already found out). For instance, he complains about mixing plain text with Gophermap syntax. Again, this is something I wouldn’t do (I have Lagrange to test this), and I have created a set of tools (like gmi2map.sh or gopherinfo.sh) to make sure I only serve well-formed Gophermaps generated from my Gemtext or plain text documents that need some navigable links. But, knowing perfectly well that such things happen in real life in modern Gopherspace, I made both of my own clients support these deviations from the RFC quite naturally, by treating any plain non-tabbed lines in a Gophermap as info lines. The bottom line is that, with this approach, you can see more Gopherspace content than “orthodox” clients like Lynx or Lagrange would allow you to. Well, I’m old enough to remember a similar debate about writing webpages in strict XHTML versus ordinary, relaxed HTML. And, just like with that issue, guess who won? Practice. As it does every single time.
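That lenient rule fits in a few lines. The following is a hypothetical sketch, not code from Kopher or Bopher-NG: parseMenuLine() is an illustrative name, regular lines are split per the RFC 1436 layout (type character and display string, then selector, host and port separated by TABs), and a line without any TAB becomes an info line of type 'i', the widely used (though non-RFC) info-line type.

```javascript
// Hypothetical sketch (not Kopher or Bopher-NG code) of lenient Gophermap parsing.
// A regular menu line is <type><display> TAB <selector> TAB <host> TAB <port>;
// a line with no TAB at all is shown as an info line (type 'i') instead.
function parseMenuLine(line) {
  line = line.replace(/\r$/, '') // tolerate CRLF line endings
  if (line.indexOf('\t') === -1) // plain non-tabbed line: treat it as info
    return { type: 'i', display: line, selector: '', host: '', port: 0 }
  var f = line.split('\t')
  var port = parseInt(f[3], 10)
  return {
    type: f[0].charAt(0), // first character is the item type
    display: f[0].slice(1),
    selector: f[1] || '',
    host: f[2] || '',
    port: isNaN(port) ? 70 : port // assume the standard port 70 if missing
  }
}
```

A client built this way never rejects a hand-written Gophermap: anything it cannot read as a menu item is still displayed as text.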

Going back to the wrapping topic: yep, I’ve solved the issues of displaying unwrapped content on terminals and hard-wrapped content on mobiles, but what’s my personal preference when it comes to content authoring? Am I on WG’s or MF’s side in this great whale vs. elephant battle? Well, my answer is: it totally depends on which kind of text you are presenting. It makes absolutely zero sense to hard-wrap program source code (or, for that matter, any text with program snippets in between, like this one), a hierarchical document like an XML or HTML file, a Markdown or Gemtext document, an INI, JSON, YAML, TOML, CSV or TSV file (wait, oh shi…), a hexdump -C output or a log file, i.e. any text file that is still human-readable but not explicitly designed for presentation on Gopherspace. And MF’s posts are just like that: they are Markdown documents, so it’s totally fine that they are not hard-wrapped. If, for some reason, I want to read them in a fancier way than my clients allow, I’ll download them and pass them to whatever Markdown renderer/viewer I can get my hands on. Problem solved. On the other hand, if you are writing an article-like post, a diary entry, an information digest or e-zine, an essay or another piece of literature, and it is something to be published only on Gopherspace (and/or other plaintext-only places), to be read in its entirety by a living human and not copied/parsed/processed, then I’m all for hard-wrapping, as well as leading whitespace, ASCII lines and other decorations on top of that. Because, besides the content itself, all this, as I said in my previous post, is the only way to express your creativity in its presentation.

TL;DR: unwrapped Markdowns and Gemtexts are cool for tech docs and as the source format for anything else. Making users read them as plain text instead of a properly preformatted and decorated text file is akin to making them read MFWS instead of BMFWS or Medium. I hope you get my point by now.

P.S. And yes, I’m preparing to launch a gopherhole on my $10/mo VPS that is already serving plenty of personal stuff. It will be announced here, on my main homepage and, of course, on my HNB page.

#gopher #bash #plaintext #algorithms