Xavier Olive

Linux at the command line

4 January 2018

As a research scientist in computer and data science, I use Linux as part of my daily work routine. I made the investment back in my student days (that would be a 15th anniversary!) and it has truly been worth it. Linux was a tad more unstable back in the day, but the terminal experience is still something I need on a daily basis: first as a developer, since developing a graphical user interface is a time-consuming, thankless task, but also as an end user, to get better control over my operating system and over the various kinds of data produced by the devices that have started to track and accompany our lives in the past decade.

I will try to share here the tools I enjoy everyday and also reflect on tools I used to enjoy but which have become a bit irrelevant to me recently.
I did not intend to be comprehensive here, so this link may be a good reference if you need more about basic GNU core tools.

Index: awk, bc, chroot, convert, dmesg, docker, file, find, git, gnuplot, htop, ldd, locate, mdfind, nc, ncdu, nm, pdfjoin, pdfrename, pdftk, ps, python, qemu, screen, sed, sudo, tee, telnet, tree, type, unpack, virtualenv, which, xdg-open, zmv, zsh

(generated by:)

grep '^## ' 2018-01-04-linux-at-the-command-line.md | cut -f2- -d\  | sort |
sed 's,\(.*\),[\1](#\1), ' | paste -s | sed 's/\t/, /g'

Some commands come in two flavours: the BSD style and the UNIX style (not available in MacOS). As I am an avid user of both systems, I tried to collect snippets for both.

✯ System monitoring tools

htop

This tool used to not work with MacOS where I usually fall back to the “Activity Monitor”. It adds a nice and colorful user experience to the regular top executable. Useful for monitoring the CPU and memory consumption of processes.

A nice side usage of this command is to quickly check how many cores are available on your system: htop displays one usage bar per core.
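When htop is not at hand, the same number can be read from the command line. A minimal sketch, assuming that `getconf` knows the `_NPROCESSORS_ONLN` variable (it does on Linux and MacOS, although POSIX does not require it), with GNU `nproc` as a fallback:

```shell
# number of cores currently online
ncores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || nproc)
echo "$ncores cores available"
```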

ps

ps views the processes running on the system. Its most basic usage would be:

# full list of processes with detailed information (BSD style)
ps aux

I also have a few aliases at hand:

alias psu="ps -U $USER"
alias psuf="ps -f -U $USER"

There is this website that presents the tool well.
I particularly like the following snippet:

# Linux version
watch -n 1 'ps -e -o pid,uname,cmd,pmem,pcpu --sort=-pmem,-pcpu | head -15'
# MacOS version
watch -n1 'ps -e -o pid,user,%cpu,%mem,comm  -r | head -15'

screen

I have a very basic usage of screen, which I launch in order to run CPU-intensive or time-consuming processes on a remote computer. I do not really see how tmux is better or worse and to be honest, I am not sure I care :)

sudo

xkcd 149

dmesg

I try to spend my time doing interesting stuff, but in case I break anything, that’s where the kernel logs its messages.

telnet

I like to use telnet and check whether a port (e.g. ssh) is open on a remote server. It is a good backup option when the server is configured to not reply to ping.
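When telnet is not installed, a bash-only check can stand in for it. Note that this sketch relies on a different technique (the `/dev/tcp` pseudo-device of bash, not telnet); the `port_open` name and the 3-second limit are arbitrary choices, and the `timeout` command is assumed to be available (it ships with GNU coreutils):

```shell
# succeeds if a TCP connection to host $1 on port $2 can be opened
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# hypothetical usage: is ssh reachable on localhost?
if port_open localhost 22; then
    echo "port 22 is open"
else
    echo "port 22 is closed or filtered"
fi
```

Like telnet, this only tells whether something accepts connections on the port, not which service is listening.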

nc

nc (netcat) is the Swiss Army knife when you work with sockets. I use it to check that dump1090 or radarcape really produces data, or to infer which port to listen to.

It may also be useful in piped version to redirect traffic to a different computer:

# from a remote machine (ssh) when I can't see server1 from laptop2
nc server1 port1 | nc laptop2 port2

✯ Disk usage monitoring and managing tools

tree

tree lists the contents of a directory in a tree-like format. It has neat options, like showing only directories or limiting the depth:

# it has a neat option for showing only directories
tree -d
# or for limiting itself to a depth level
tree -L 2

ncdu

du (for disk usage) is the historic command line tool to check the space taken by the directories in your file tree. ncdu adds an ncurses interface to the tool and makes it easy to identify which files are clogging your hard disk and where they are hiding.
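Where ncdu is not installed, plain du already answers the most common question. A minimal sketch using only standard options (`-s` for one total per argument, `-k` for sizes in kilobytes); the `biggest` function name is made up for this example:

```shell
# print the five largest entries of a directory (default: current one),
# biggest first, with sizes in kilobytes
biggest() {
    du -sk -- "${1:-.}"/* | sort -rn | head -n 5
}

# usage: biggest ~/Documents
```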

file

file returns information for the given file.

file logo.png
> PNG image data, 16 x 16, 8-bit/color RGBA, non-interlaced

The MacOS complement mdls produces a very rich output with specific metadata attributes, most of which are accessible otherwise from the Finder information window.

pdfjoin

The pdfjam suite is a set of great tools to manipulate PDF files. I mostly use pdfjoin to merge several scans together.

pdfjoin -o output.pdf in1.pdf in2.pdf in3.pdf

pdftk

The pdftk (pdf toolkit) toolset is also a great tool for similar tasks, though it is hard to install on MacOS.

pdftk in1.pdf in2.pdf in3.pdf cat output out1.pdf

I sometimes use the same tool to uncompress pdf files and remove ugly annoying watermarks in ebooks legally purchased:

pdftk book.pdf output uncompressed.pdf uncompress
# and after editing the file
pdftk uncompressed.pdf output clean.pdf compress

Or to rotate pages:

pdftk in.pdf cat 1-endwest output out.pdf

pdfrename

This tool is not part of any suite. It is just one that I developed because of the frustration caused by opening tons of pdf files (mainly scientific publications) in order to copy-paste the title and rename the file accordingly. I cannot say it is a mature piece of software, but it covers my needs.

The downside of the story is that it is based on a library, pdfminer, which does not support Python 3.

pip install --user pdfrename

Source code and explanations available here.

zsh

I never really enjoyed bash as much as I wanted, and tried many different shells, from tcsh in the early days to more modern alternatives like fish. In the end, I am happy with zsh, which has been the best compromise between convenience and availability on different computers.

I like its syntax very close to bash (only one tool to learn!) and its features:

  • easily customisable prompts, with nice “defaults” in oh-my-zsh;
  • powerful globbing:
    cd d/d/<TAB> # expands to Documents/data/
    cd **/xoolive<TAB> # expands to repository/xoolive.github.io
    
  • spelling corrections;
  • easily customisable completion with navigation (use arrows);

Qualifiers are also of a great help, like in the following examples:

# show only directories
print -l zsh_demo/**/*(/)
# show only regular files
print -l zsh_demo/**/*(.)
# show empty files
ls -l zsh_demo/**/*(L0)
# show files greater than 3 KB
ls -l zsh_demo/**/*(Lk+3)
# show files modified in the last hour
print -l zsh_demo/**/*(mh-1)
# sort files from most to least recently modified and show the first 3
ls -l zsh_demo/**/*(om[1,3])

zmv

zmv is a zsh function for renaming files. First option to learn is -n which means no execution. Remove it when you are sure about what you typed.

Then there are many nice examples, including:

# load the function once per session (or in ~/.zshrc)
autoload -U zmv

# move each text file up one level, with its directory name as a prefix
# (dir/file.txt -> dir_file.txt)
zmv -n '(*)/(*.txt)' '${1}_$2'

# add leading zeros to a filename (1.jpg -> 001.jpg, ...)
zmv -n '(<1->).jpg' '${(l:3::0:)1}.jpg'

# change the suffix from *.sh to *.pl
zmv -n -W '*.sh' '*.pl'

convert

convert is part of the imagemagick suite and is a useful tool to resize, rotate or convert image formats.

My latest uses:

convert -background none -density 1000 -resize 1000x file.svg file.png
convert input.png -fuzz 10% -transparent white output.png

xdg-open

open is the magic command that opens anything under MacOS as if you double-clicked it in the file explorer.

xdg-open (Linux) behaves similarly by opening any file, url or directory using default applications. Configuration is made with .desktop files and a good keyword for help on the net is “MIME type”.
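For people who switch between both systems, a small wrapper can hide the difference. A sketch, where the `opener` and `o` names are made up for this example and any system other than MacOS is assumed to provide xdg-open:

```shell
# pick the opening command for the current system
opener() {
    case "$(uname -s)" in
        Darwin) echo open ;;
        *)      echo xdg-open ;;
    esac
}

# open files, URLs or directories with the default application
o() {
    "$(opener)" "$@"
}

# usage: o report.pdf
```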

locate

locate helps you find files on your system by searching a local database. You can update the database by running (sudo) updatedb.

locate used to index the whole tree from the root, but this is not good practice since private data from different users would also be listed somewhere. You can produce a private index for your own home directory though.

alias pupdatedb="updatedb -l 0 -U $HOME --output=$HOME/.mydb.db"
alias plocate="locate -d $HOME/.mydb.db"

mdfind

mdfind is the MacOS interface to the Spotlight search engine. MacOS runs a daemon to index all files on the system. mdfind searches for patterns inside files; the -name option limits the search to file names.

# pictures with more than 15 million pixels in ~/Pictures
mdfind -onlyin ~/Pictures 'kMDItemPixelCount > 15000000'

You can also limit the search to specific directories with -onlyin or work in real-time mode with the -live option.

# yields the number of files matching the criteria
mdfind -live 'kMDItemContentModificationDate >= $time.now'
[Type ctrl-C to exit]
Query update: 1 matches
Query update: 2 matches

A note on metadata attributes, also to be found from the output of mdls.

unpack

This function found in the Data Science at the Command Line repository appears handy to uncompress many kinds of archives.

I made it a zsh function and added the proper completion definition:

 11:35:55 > xo@lushlife > ...Documents/data/cats_dogs >
$ unpack <TAB>
preview/   test1.zip  train/     train.zip  validate/

✯ Tools for software development

type

type is a shell built-in command which gives information about commands.
Common options are -p, -S and -a.

$ type type
type is a shell builtin
$ type -p java  # similar to which
java is /usr/bin/java
$ type -S java  # follows symlinks
java is /usr/bin/java -> /etc/alternatives/java ->
/usr/lib/jvm/java-8-oracle/jre/bin/java
$ type -a java
java is /usr/bin/java
java is /usr/lib/jvm/java-8-oracle/bin/java
java is /usr/lib/jvm/java-8-oracle/jre/bin/java

ldd

ldd resolves shared object dependencies.

nm

nm lists the symbols present in compiled object files.
It is convenient with the -C option, which demangles C++ symbols.

git

I just version any code I write. I sometimes publish it to github.com.

python

Python can be a great option for writing shell scripts. Once stable, I usually wrap them with entry points through setuptools. pdfrename shows a good example of how to do it.

from setuptools import setup

setup(name='pdfrename',
      version='0.1',
      description='A tool for renaming batches of pdf files',
      license='MIT',
      author='Xavier Olive',
      packages=['pdfrename', ],
      entry_points={'console_scripts':
                    ['pdfrename = pdfrename.pdfrename:main']},
      install_requires=['pdfminer'],
      url="https://github.com/xoolive/pdfrename",
      )

On my system, these tools get installed to ~/.local/bin.

python setup.py install --user
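For the installed scripts to be found, `~/.local/bin` has to be listed in the PATH, which is not the case on every system. A minimal sketch for a shell startup file (`~/.zshrc` or `~/.bashrc`), which prepends the directory only when it is missing:

```shell
# prepend ~/.local/bin to PATH unless it is already listed
case ":$PATH:" in
    *":$HOME/.local/bin:"*) ;;
    *) PATH="$HOME/.local/bin:$PATH"; export PATH ;;
esac
```

The surrounding colons make the pattern match the directory as a whole entry, wherever it sits in the list.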

virtualenv

I consider it a good practice to use virtual environments with Python. I am usually not using Anaconda for personal use (I use it with conda environments for teaching though).

virtualenv $HOME/.virtualenv/env_name -p python3.6

Then I have a zsh function with completion:

function activate {
    if [[ -z $1 ]]; then
        echo "Usage: activate [virtualenv name]"
        return
    fi
    if [[ -d $HOME/.virtualenv/$1 ]]; then
        source $HOME/.virtualenv/$1/bin/activate
    else
        echo "$1 not found"
    fi
}

local envdirs
envdirs=($HOME/.virtualenv)
compdef '_files -W envdirs' activate

docker

docker offers a great solution for software containers. I am not as proficient as I could be with this tool but I definitely should practice it more when I share my work.

✯ Chaining tools

tee

tee is a core util which redirects output to multiple files and processes.

I like how it is used in the “vim sudo tip”:

:w !sudo tee %
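Outside vim, the typical use is to keep a copy of a stream while passing it on to the next command. A small sketch, where the log file location is an arbitrary choice:

```shell
# arbitrary location for the copy of the stream
log="${TMPDIR:-/tmp}/numbers.log"

# write ten numbers to the log file and count them at the same time
count=$(seq 1 10 | tee "$log" | wc -l | tr -d ' ')
echo "$count lines written to $log"
```

With the -a option, tee appends to the file instead of overwriting it.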

sed

The most common usage of sed is for substitution, but there are many other one-line applications here with explanations here.
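As a reminder, a substitution looks like the following sketch (the sample strings are made up); the `g` flag replaces every match on a line rather than only the first one:

```shell
# replace every "colour" with "color"
fixed=$(echo "colour by colour" | sed 's/colour/color/g')
echo "$fixed"

# any character can serve as the separator, which helps with paths
moved=$(echo "/usr/local/bin" | sed 's,/usr/local,/opt,')
echo "$moved"
```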

bc

The legacy badass “basic calculator”. Just look at the arbitrary precision and the rather decent computation time.

time bc -l -q <<< 'scale=1000; 4*a(1)'
3.141592653589793238462643383279502884197169399375105820974944592307\
81640628620899862803482534211706798214808651328230664709384460955058\
22317253594081284811174502841027019385211055596446229489549303819644\
28810975665933446128475648233786783165271201909145648566923460348610\
45432664821339360726024914127372458700660631558817488152092096282925\
40917153643678925903600113305305488204665213841469519415116094330572\
70365759591953092186117381932611793105118548074462379962749567351885\
75272489122793818301194912983367336244065664308602139494639522473719\
07021798609437027705392171762931767523846748184676694051320005681271\
45263560827785771342757789609173637178721468440901224953430146549585\
37105079227968925892354201995611212902196086403441815981362977477130\
99605187072113499999983729780499510597317328160963185950244594553469\
08302642522308253344685035261931188171010003137838752886587533208381\
42061717766914730359825349042875546873115956286388235378759375195778\
18577805321712268066130019278766111959092164201988

real	0m0.377s
user	0m0.373s
sys	0m0.004s

time bc -l -q <<< "12345^678 * 2345^837"
23787733548109912264376074416331748074657906964298941301627587525693\
01285513544048395617441770852568324804702160182707810817328853549660\
82950375070028531380241959076006929175997467242709393002360568230832\
70747914352492032824120065268399072567321910976912168862237535699668\
98738881080150116546367294856554927969060531729481978589874659311982\
[trimmed]

real	0m0.037s
user	0m0.031s
sys	0m0.004s

✯ Tools left in cardboard boxes

gnuplot

gnuplot is a great tool for quickly plotting data, which I used a lot in the old days.
I even wrote a blog post about it back then!

I enjoyed the various backends available, with an output to import directly in LaTeX. Last time I used it must have been during a C programming class about Horner’s method. Students had to compute a sine and cosine and I asked them to plot a circle from the stdout output.

Today, I use Python on a daily basis with its good, yet improvable, matplotlib package.
In LaTeX, I enjoy tikz/pgfplots though it led to a bad experience with scientific publishers.

qemu

qemu is a very impressive/efficient piece of software for emulating different operating systems without rebooting. But in the end, virtualbox is more convenient for a basic Windows usage and docker is great for emulating a freshly installed (Linux) system.

chroot

This command changes the apparent root directory in the current terminal. I only used it in the past to reset my root password after booting from a live CD with a different distro. Last time I had to hack myself, I used the following “trick” instead (validated by tech support!).

awk

I learned awk and sed in the early days but I just don’t need awk anymore.
For basic tasks, cut and sed are often enough. If I need anything more, I just switch to Python.
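As an illustration of such a basic task, extracting columns from delimited text takes a single cut. A sketch on a made-up line in the `/etc/passwd` format:

```shell
# a sample line in the /etc/passwd format (made up for this example)
line='alice:x:1000:1000:Alice:/home/alice:/bin/zsh'

# fields 1 and 7 (login and shell), with ':' as the delimiter
login_shell=$(echo "$line" | cut -d: -f1,7)
echo "$login_shell"
```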

find

This is the basic Unix tool for finding files you barely remember on your computer, but locate/mdfind and the comprehensive zsh globbing syntax are enough most of the time.

# basic usage
find . -name 'something'
# something more complicated (but does not work on MacOS)
find . -printf '%T+ %p\n' | sort -r | head
# a zsh alternative
# list all files in current (sub-)directories modified in the past hour
ls **/*(.mh-1)

# MacOS only
# the five latest modified files (limited to the past 5 days)
mdfind -0 -onlyin . 'kMDItemContentModificationDate >= $time.today(-5)' |
xargs -0 stat -f "%m%t%Sm %N" | sort -rn | cut -f2- | head -n 5

A note on metadata attributes, also to be found from the output of mdls.

which

All I need is covered by type. In particular, I keep an alias function in my settings.
The result is a bit different from which as it follows all symbolic links.

function twhich {
    output=$(type -S $1) || {echo >&2 "$output" && return 1;}
    echo "$output" | tr ' ' '\n' | tail -n 1
}

✯ A nice reference