A group of researchers is proposing semantics-driven static analysis as a way to ensure that Unix, Linux, and macOS shell programs are safe, bug-free, and work as expected. However, the effort faces unique challenges due to the shell’s “pervasive dynamicity” and “opaque, polyglot commands.”
The researchers, from Brown University, Stevens Institute of Technology, Rice University, and UCLA, make their case in a newly published paper, “From Ahead-of- to Just-in-Time and Back Again: Static Analysis for Unix Shell Programs.” The authors stress that shell programming is as prevalent as ever but remains quite complex, due in part to the structure of shell programs, their use of opaque software components, and their complex interactions with the broader environment. Even extremely careful shell developers discover devastating bugs in their programs only at runtime. At best, a shell program that goes wrong crashes a long-running task; at worst, it silently corrupts the broader execution environment, affecting user data, modifying system files, and rendering entire systems unusable, the paper notes. The paper then asks whether shell users could enjoy the benefits of semantics-driven static analysis before their programs execute, as users of most other production languages do. These benefits would extend to users of Linux, the BSD operating systems (FreeBSD, OpenBSD, and NetBSD), macOS, and anywhere else the shell is used, including containers and Windows Subsystem for Linux.
Shell scripting is very common, as the shell remains the glue that holds modern systems together; modern facilities such as continuous integration and continuous delivery (CI/CD) pipelines are often written in shell, said paper co-author Nikos Vasilakis, from Brown University, in an emailed response to questions. Other popular environments used for tasks such as building software, serving machine learning workloads, and provisioning the cloud are all thin wrappers around scripts, Vasilakis added. However, the shell language does not behave like other languages, he said. As a result, both inexperienced and seasoned users make many mistakes, and those mistakes tend to be catastrophic. “And because the shell is an old language, it lacks many of the facilities we’ve come to expect in modern languages,” Vasilakis said. “What’s more, the shell is used to manipulate programs on files on live systems. Mistakes can cause data corruption, service interruption, irreversible data loss, and leakage of sensitive user information.”