Leetcode – Bash: Transpose file

Next notable challenge from Leetcode is to transpose a file’s contents. Given an input of

name age
alice 21
ryan 30

transpose it to get

name alice ryan
age 21 30

At first glance this looks pretty easy. After we know tr can be used to replace spaces with newline, effectively transposing the first line. But the remaining lines will also get shafted into newlines, effectively creating a single column we don’t want. I took some time to get this, mostly because I had gone down too deep into the first rabbit hole of improving a flawed answer which failed several test cases. Eventually I made it work but not before getting several failures in between.

Throughout the whole time I found repl.it tremendously helpful. The website simulates an online IDE for almost any language including Bash and even lets you create and save files.

It was when I was doing another challenge at HackerRank that I realised there was a much simpler solution using the ‘cut’ and ‘paste’ utilities.

Paste allows you to merge a few files together. For example, if you are working with lists a.txt and b.txt

a.txt
1
2
3
4
b.txt
one
two
three
four

paste a.txt b.txt will give

1  one
2  two
3  three
4  four

You can replace the default delimiter of tab-space with anything. Cut allows you to select and print specific columns from input. I guess you can see where this is going. Together, one can select each column, then use paste to transpose and print it as rows. In the HackerRank challenge where one had to group three entries together as one row the solution was

cat file.txt | paste -d ";" - - -

to give

Buffalo, N.Y.;Burlington, Vt.;Caribou, Maine
Casper, Wyo.;Charleston, S.C.;Charleston, W.Va.

Now of course this presumes we know there will be three columns in the output. If we don’t then we need to figure out how many columns there’ll be. This can be done, somewhat cleverly with grep. Grep is normally used to find strings of words, but can be used to print the exact instance of each string. If one greps for the delimiter (-o switch), in this case space ” “, then pipes this to wc -l to count the number of lines, then we effectively have the number of columns minus one.

Why -1? Because the delimiter (space) occurs only between words in the input file. This is known famously as fencepost error, where the answer to the question

If you build a straight fence 30 meters long with posts spaced 3 meters apart, how many posts do you need?

is 11, not 10 because we need to count the first and last posts holding up the fence. Applied to this problem, grep gives the number of delimiters as say two, which means there are three columns with only two spaces separating the 1st from the 2nd, and 2nd from 3rd.

So the code to count the number of columns would be

col=$(($(head -n1 file.txt | grep -o " " | wc -l)+1))

Note the confusing number of brackets. Bash requires all arithmetic operations to be enclosed with double brackets (( )). As opposed to the triple brackets with a more sinister meaning, but I digress.

With the number of columns, we can construct a for loop to select each column with cut then pipe the output to paste to transpose it. Each iteration with column k would be

cut -d " " -f"$k" file.txt | paste -d " " -s

So putting it all together we have

#!/bin/bash

col=$(($(head -n1 file.txt | grep -o " " | wc -l)+1))

for (( k=1; k<=$col; k++ ))
do
  cut -d " " -f"$k" file.txt | paste -d " " -s
done

which is code that is readable and understandable, unlike my first attempt where I copied snippets of awk code online without understanding how it worked. The performance metrics of this isn’t too good though, though I improved on the previous submission by 200MB less memory used.

transpose words runtime.jpg

transpose words memory.jpg

At this point I’ve finished all 4 bash challenges on LeetCode. Hoping they’ll add more soon. Or maybe I’ll start on the database ones or continue doing more Bash ones on HackerRank.

The rabbit hole solution

My original rabbit hole solution is as follows

#count number of columns in input
col=$(awk '{print NF}' file.txt | sort -nu | tail -n 1)
wordcount=$(wc -w < file.txt)

if [ $wordcount == 1 ]
then
    cat file.txt
else
for ((i=1;i<=$col;i++)) 
do      
#pass shell variable $i into awk     
  (awk -v num="$i" '{ printf $num " " }' file.txt ) | awk '{print substr($0, 1, length($0)-1)}'
done
fi

As you can guess the awk code for counting columns was copied from Stackoverflow, and there is an odd conditional just to catch the one case it didn’t work for (single char). I did learn one importing thing though that awk cannot read shell variables. It must be passed to awk. The piped command is just to cut off the trailing newline character.