Linux at the command line
As a research scientist in computer and data science, I use Linux as part of my daily work routine. I made the investment back in my student days (that would be a 15th anniversary!) and it has truly been worth it. Linux was a tad more unstable back in the day, but the terminal experience is still something I need on a daily basis: first as a developer, since developing a graphical user interface is a time-consuming, thankless task, but also as an end user, to get better control over my operating system and over the various kinds of data produced by the devices that have started to track and accompany our lives in the past decade.
I will try to share here the tools I enjoy every day and also reflect on tools I used to enjoy but which have become a bit irrelevant to me recently.
I do not intend to be comprehensive here, so this link may be a good reference if you need more about basic GNU core tools.
Index: awk, bc, chroot, convert, dmesg, docker, file, find, git, gnuplot, htop, ldd, locate, mdfind, nc, ncdu, nm, pdfjoin, pdfrename, pdftk, ps, python, qemu, screen, sed, sudo, tee, telnet, tree, type, unpack, virtualenv, which, xdg-open, zmv, zsh
(generated by:)
grep '^## ' 2018-01-04-linux-at-the-command-line.md | cut -f2- -d\ | sort |
sed 's,\(.*\),[\1](#\1), ' | paste -s | sed 's/\t/, /g'
Some commands come in two flavours: the BSD style and the UNIX style (not available in MacOS). As I am an avid user of both systems, I tried to collect snippets for both.
✯ System monitoring tools
htop
This tool used not to work on MacOS, where I usually fall back to the “Activity Monitor”. It adds a nice and colorful user experience to the regular top executable and is useful for monitoring the CPU and memory consumption of processes.
A nice side usage of this command is to quickly check how many cores are available on your system: htop draws one usage bar per core at the top of the screen.
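When htop is not at hand, the core count can also be read directly from the shell. The snippet below is only a sketch: ncores is a hypothetical helper name, and it assumes that at least one of nproc, sysctl or getconf is available on the machine.

```shell
# ncores is a hypothetical helper name, not an existing command.
# It prints the number of CPU cores, trying common tools in turn.
ncores() {
    if command -v nproc >/dev/null 2>&1; then
        nproc                        # GNU coreutils (most Linux systems)
    elif [ "$(uname)" = "Darwin" ]; then
        sysctl -n hw.ncpu            # MacOS
    else
        getconf _NPROCESSORS_ONLN    # fallback on many other Unix systems
    fi
}

ncores
```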
ps
ps views the processes running on the system. Its most basic usage would be:
# full list of processes with detailed information (BSD style)
ps aux
I also have a few aliases at hand:
alias psu="ps -U $USER"
alias psuf="ps -f -U $USER"
There is this website that presents the tool well.
I particularly like the following snippet:
# Linux version
watch -n 1 'ps -e -o pid,uname,cmd,pmem,pcpu --sort=-pmem,-pcpu | head -15'
# MacOS version
watch -n1 'ps -e -o pid,user,%cpu,%mem,comm -r | head -15'
screen
My usage of screen is very basic: I launch it in order to run CPU-intensive or time-consuming processes on a remote computer. I do not really get how tmux is better or worse and, to be honest, I am not sure I care :)
sudo
dmesg
I try to spend my time doing interesting stuff, but in case I break anything, that’s where the kernel logs its messages.
telnet
I like to use telnet and check whether a port (e.g. ssh) is open on a remote server. It is a good backup option when the server is configured to not reply to ping.
nc
nc (netcat) is the Swiss Army knife when you work with sockets. I use it to check that dump1090 or radarcape really produces data, or to infer which port to listen to.
It can also be chained in a pipe to redirect traffic to a different computer:
# from a remote machine (ssh) when I can't see server1 from laptop2
nc server1 port1 | nc laptop2 port2
✯ Disk usage monitoring and managing tools
tree
tree lists the contents of a directory in a tree-like format.
# it has a neat option for showing only directories
tree -d
# or for limiting itself to a depth level
tree -L 2
ncdu
du (for disk usage) is the historic command line tool to check the space taken by the directories in your file tree. ncdu adds an ncurses interface to the tool and makes it easy to identify which files are clogging your hard disk and where they are hiding.
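For machines where ncdu is not installed, plain du combined with sort gives a rough, non-interactive equivalent. This is a minimal sketch: biggest is a hypothetical helper name, sizes are reported in kilobytes, and hidden entries are ignored because of the `*` glob.

```shell
# biggest is a hypothetical helper: list the N largest entries of a
# directory (default: current directory, 10 entries), sizes in kilobytes.
# The body runs in a subshell so that dir and n do not leak.
biggest() (
    dir=${1:-.}
    n=${2:-10}
    # du -sk: one summary line per entry; sort -rn: largest first
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$n"
)

biggest . 5
```

The first argument is the directory to inspect and the second one the number of lines to keep, e.g. `biggest ~/Documents 20`.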
file
file returns information for the given file.
file logo.png
> PNG image data, 16 x 16, 8-bit/color RGBA, non-interlaced
Its MacOS counterpart, mdls, produces a very rich output with specific metadata attributes, most of which are otherwise accessible from the Finder information window.
pdfjoin
The pdfjam suite is a set of great tools to manipulate PDF files. I mostly use pdfjoin to merge several scans together.
pdfjoin -o output.pdf in1.pdf in2.pdf in3.pdf
pdftk
The pdftk (pdf toolkit) toolset is also a great tool for similar tasks, though it is hard to install on MacOS.
pdftk in1.pdf in2.pdf in3.pdf cat output out1.pdf
I sometimes use the same tool to uncompress pdf files and remove ugly annoying watermarks in ebooks legally purchased:
pdftk book.pdf output uncompressed.pdf uncompress
# and after editing the file
pdftk uncompressed.pdf output clean.pdf compress
Or to rotate pages:
pdftk in.pdf cat 1-endwest output out.pdf
pdfrename
This tool is not part of any suite. It is just one that I developed out of the frustration of opening tons of pdf files (mainly scientific publications) in order to copy-paste the title and rename the file accordingly. I cannot say it is a mature piece of software, but it covers my needs.
The downside of the story is that it is based on a library, pdfminer, which does not support Python 3.
pip install --user pdfrename
Source code and explanations available here.
zsh
I never really enjoyed bash as much as I wanted and tried many different shells, from tcsh in the early days to more modern alternatives like fish. In the end I am happy with zsh, which has been the best compromise between convenience and availability on different computers.
I like its syntax very close to bash (only one tool to learn!) and its features:
- easily customisable prompts, with nice “defaults” in oh-my-zsh;
- powerful globbing:
cd d/d/<TAB> # expands to Documents/data/
cd **/xoolive<TAB> # expands to repository/xoolive.github.io
- spelling corrections;
- easily customisable completion with navigation (use arrows).
Qualifiers are also of a great help, like in the following examples:
# show only directories
print -l zsh_demo/**/*(/)
# show only regular files
print -l zsh_demo/**/*(.)
# show empty files
ls -l zsh_demo/**/*(L0)
# show files greater than 3 KB
ls -l zsh_demo/**/*(Lk+3)
# show files modified in the last hour
print -l zsh_demo/**/*(mh-1)
# sort files from most to least recently modified and show the first 3 (the 3 most recent)
ls -l zsh_demo/**/*(om[1,3])
zmv
zmv is a zsh function for renaming files. The first option to learn is -n, which means no execution. Remove it when you are sure about what you typed.
Then there are many nice examples, including:
# rename each file with its directory name as a prefix
zmv -n '(*)/(*.txt)' '${1}_$2'
# adding leading zeros to a filename (1.jpg -> 001.jpg, ...)
autoload zmv
zmv -n '(<1->).jpg' '${(l:3::0:)1}.jpg'
# Change the suffix from *.sh to *.pl
autoload zmv
zmv -n -W '*.sh' '*.pl'
convert
convert is part of the imagemagick suite and is a useful tool to resize, rotate or convert image formats.
My latest uses:
convert -background none -density 1000 -resize 1000x file.svg file.png
convert input.png -fuzz 10% -transparent white output.png
xdg-open
open is the magic command that opens anything under MacOS as if you double-clicked it in the file explorer.
xdg-open (Linux) behaves similarly by opening any file, url or directory using default applications. Configuration is made with .desktop files and a good keyword for help on the net is “MIME type”.
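Since I switch between both systems, a tiny wrapper can hide the difference. The following is only a sketch: opener and o are hypothetical helper names, and the choice relies on uname reporting Darwin on MacOS.

```shell
# opener and o are hypothetical helper names.
# opener prints the name of the "open anything" command for this system.
opener() {
    case "$(uname)" in
        Darwin) echo open ;;       # MacOS
        *)      echo xdg-open ;;   # Linux desktops following freedesktop.org
    esac
}

# o forwards its arguments (files, URLs, directories) to that command.
o() {
    "$(opener)" "$@"
}

opener
```

With this in place, `o report.pdf` or `o .` should behave the same way on both systems, provided xdg-open is installed on the Linux side.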
locate
locate helps you find files on your system by searching a local database. You can update the database by running (sudo) updatedb.
locate used to index the whole root tree, but that is not good practice since private data from different users would also be listed somewhere. You can produce a private index for your own home directory though.
alias pupdatedb="updatedb -l 0 -U $HOME --output=$HOME/.mydb.db"
alias plocate="locate -d $HOME/.mydb.db"
mdfind
mdfind is the MacOS interface to the Spotlight search engine. MacOS runs a daemon to index all files on the system. mdfind searches for patterns inside files; the -name option limits the search to file names.
# pictures of more than 15 megapixels in ~/Pictures
mdfind -onlyin ~/Pictures 'kMDItemPixelCount > 15000000'
You can also limit the search to specific directories with -onlyin or work in real-time mode with the -live option.
# yields the number of files matching the criteria
mdfind -live 'kMDItemContentModificationDate >= $time.now'
[Type ctrl-C to exit]
Query update: 1 matches
Query update: 2 matches
A note on metadata attributes, also to be found in the output of mdls.
unpack
This function found in the Data Science at the Command Line repository appears handy to uncompress many kinds of archives.
I made it a zsh function and added the proper completion definition:
11:35:55 > xo@lushlife > ...Documents/data/cats_dogs >
$ unpack <TAB>
preview/ test1.zip train/ train.zip validate/
✯ Tools for software development
type
type is a shell built-in command which gives information about commands.
Common options are -p, -S and -a.
$ type type
type is a shell builtin
$ type -p java # similar to which
java is /usr/bin/java
$ type -S java # follows symlinks
java is /usr/bin/java -> /etc/alternatives/java ->
/usr/lib/jvm/java-8-oracle/jre/bin/java
$ type -a java
java is /usr/bin/java
java is /usr/lib/jvm/java-8-oracle/bin/java
java is /usr/lib/jvm/java-8-oracle/jre/bin/java
ldd
ldd resolves shared object dependencies.
nm
nm lists the symbols present in compiled object files. It is convenient with the -C option for demangling C++ symbols.
git
I just version any code I write. I sometimes publish it to github.com.
python
Python can be a great option for writing shell scripts. Once stable, I usually wrap them with entry points through setuptools. pdfrename shows a good example of how to do it.
from setuptools import setup

setup(name='pdfrename',
      version='0.1',
      description='A tool for renaming batches of pdf files',
      license='MIT',
      author='Xavier Olive',
      packages=['pdfrename', ],
      # installs a `pdfrename` executable calling pdfrename.pdfrename.main()
      entry_points={'console_scripts':
                    ['pdfrename = pdfrename.pdfrename:main']},
      install_requires=['pdfminer'],
      url="https://github.com/xoolive/pdfrename",
      )
On my system, these tools get installed to ~/.local/bin.
python setup.py install --user
virtualenv
I consider it good practice to use virtual environments with Python. I usually do not use Anaconda for personal use (I use it with conda environments for teaching though).
virtualenv $HOME/.virtualenv/env_name -p python3.6
Then I have a zsh function with completion:
function activate {
    if [[ -z $1 ]]; then
        echo "Usage: activate [virtualenv name]"
        return
    fi
    if [[ -d $HOME/.virtualenv/$1 ]]; then
        source $HOME/.virtualenv/$1/bin/activate
    else
        echo "$1 not found"
    fi
}
local envdirs
envdirs=($HOME/.virtualenv)
compdef '_files -W envdirs' activate
docker
docker offers a great solution for software containers. I am not as proficient as I could be with this tool but I definitely should practice it more when I share my work.
✯ Chaining tools
tee
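tee is a core util which copies its standard input both to one or more files and to its standard output, so that the data can be saved and passed on to another process at the same time.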
I like how it is used in the “vim sudo tip”:
:w !sudo tee %
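Outside of vim, the same idea works in any pipeline. Here is a minimal sketch, where the log file is a throwaway temporary file created for the example:

```shell
# Keep a copy of the intermediate data in a file while the pipeline goes on.
log=$(mktemp)                        # throwaway file, only for the example

# tee writes the three lines to $log AND passes them on to wc
printf 'alpha\nbeta\ngamma\n' | tee "$log" | wc -l

# the file holds the untouched intermediate output
cat "$log"
rm -f "$log"
```

With the -a option, tee appends to the file instead of overwriting it.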
sed
The most common usage of sed is for substitution, but there are many other one-line applications here, with explanations here.
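For the record, substitution boils down to a few variations. The input strings below are made up for the example:

```shell
# Replace the first match on each line
printf 'one fish, two fish\n' | sed 's/fish/cat/'

# Replace every match on each line with the g flag
printf 'one fish, two fish\n' | sed 's/fish/cat/g'

# Any character can serve as the delimiter, which helps with paths;
# \( \) captures a group and \1 refers back to it
printf '/usr/local/bin\n' | sed 's,^/usr/\(.*\),/opt/\1,'
```

The comma-delimited form is the one used in the snippet that generates the index at the top of this page.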
bc
The legacy badass “basic calculator”. Just look at the arbitrary precision and the perfectly decent calculation time.
time bc -l -q <<< 'scale=1000; 4*a(1)'
3.141592653589793238462643383279502884197169399375105820974944592307\
81640628620899862803482534211706798214808651328230664709384460955058\
22317253594081284811174502841027019385211055596446229489549303819644\
28810975665933446128475648233786783165271201909145648566923460348610\
45432664821339360726024914127372458700660631558817488152092096282925\
40917153643678925903600113305305488204665213841469519415116094330572\
70365759591953092186117381932611793105118548074462379962749567351885\
75272489122793818301194912983367336244065664308602139494639522473719\
07021798609437027705392171762931767523846748184676694051320005681271\
45263560827785771342757789609173637178721468440901224953430146549585\
37105079227968925892354201995611212902196086403441815981362977477130\
99605187072113499999983729780499510597317328160963185950244594553469\
08302642522308253344685035261931188171010003137838752886587533208381\
42061717766914730359825349042875546873115956286388235378759375195778\
18577805321712268066130019278766111959092164201988
real 0m0.377s
user 0m0.373s
sys 0m0.004s
time bc -l -q <<< "12345^678 * 2345^837"
23787733548109912264376074416331748074657906964298941301627587525693\
01285513544048395617441770852568324804702160182707810817328853549660\
82950375070028531380241959076006929175997467242709393002360568230832\
70747914352492032824120065268399072567321910976912168862237535699668\
98738881080150116546367294856554927969060531729481978589874659311982\
[trimmed]
real 0m0.037s
user 0m0.031s
sys 0m0.004s
✯ Tools left in cardboard boxes
gnuplot
gnuplot is a great tool for quickly plotting data, which I used a lot in the old days.
I even wrote a blog post about it back then!
I enjoyed the various backends available, with an output to import directly in LaTeX. Last time I used it must have been during a C programming class about Horner’s method. Students had to compute a sine and cosine and I asked them to plot a circle from the stdout output.
Today, I use Python on a daily basis with its good yet perfectible matplotlib package.
In LaTeX, I enjoy tikz/pgfplots, though it led to a bad experience with scientific publishers.
qemu
qemu is a very impressive/efficient piece of software for emulating different operating systems without rebooting. But in the end, virtualbox is more convenient for a basic Windows usage and docker is great for emulating a freshly installed (Linux) system.
chroot
This command changes the apparent root directory in the current terminal. I only used it in the past to reset my root password after booting from a live CD with a different distro. Last time I had to hack my own machine, I used the following “trick” instead (validated by tech support!).
awk
I learned awk and sed in the early days but I just don’t need awk anymore.
For basic tasks, cut and sed are often enough. If I need anything more, I just switch to Python.
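As an illustration of what cut and sed cover without awk, here is a minimal sketch on made-up colon-separated data:

```shell
# Sample colon-separated records (made up for the example)
data='alice:1001:/bin/zsh
bob:1002:/bin/bash'

# cut keeps fields 1 and 3; the awk equivalent would be
#   awk -F: -v OFS=: '{print $1, $3}'
printf '%s\n' "$data" | cut -d: -f1,3

# sed then reshapes each line by replacing the remaining separator
printf '%s\n' "$data" | cut -d: -f1,3 | sed 's,:, uses ,'
```

Anything involving arithmetic on columns or conditions across fields is where I would rather reach for Python.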
find
This is the basic Unix tool for finding files you can hardly remember on your computer, but locate/mdfind and the comprehensive zsh globbing syntax are enough most of the time.
# basic usage
find . -name 'something'
# something more complicated (but does not work on MacOS)
find . -printf '%T+ %p\n' | sort -r | head
# a zsh alternative
# list all files in current (sub-)directories modified in the past hour
ls **/*(.mh-1)
# MacOS only
# the five latest modified files (limited to the past 5 days)
mdfind -0 -onlyin . 'kMDItemContentModificationDate >= $time.today(-5)' |
xargs -0 stat -f "%m%t%Sm %N" | sort -rn | cut -f2- | head -n 5
A note on metadata attributes, also to be found in the output of mdls.
which
All I need is covered by type. In particular, I keep an alias function in my settings. The result is a bit different from which as it follows all symbolic links.
function twhich {
    output=$(type -S $1) || { echo >&2 "$output" && return 1; }
    echo "$output" | tr ' ' '\n' | tail -n 1
}
✯ A nice reference
- Data science on the command line
http://datascienceatthecommandline.com/