S2argv-execs: something was missing in libc, and now it is even easier


The POSIX standard provides a number of functions to execute a file: the exec family.

Some of them take a variable number of arguments, others take an array of strings. The former have an 'l' in the suffix of their name, the latter a 'v'. There are also flavours with an 'e' suffix to specify the set of environment variables.

But what do you do if you have a string including all the arguments separated by spaces? This is the case, for instance, when the command has been provided by the user at an input prompt or read from a file.

It happened to me dozens of times.

There are two naive solutions: strtok and system. strtok can be used to tokenize the string, while system delegates the problem to a shell. Both solutions are unsatisfactory to me. system is unsafe (its own man page says so!). strtok splits the string at spaces, modifies the original string and does not support quoting or escape characters. It is also very annoying to use strtok to build the arguments for an exec.
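The strtok limitations are easy to reproduce. The following is a minimal sketch: naive_split is a hypothetical helper written only for this illustration, it is not part of any library.

```c
#include <string.h>

/* Hypothetical helper: split `line` at blanks with strtok, storing at most
 * `max` tokens in `out`. Returns the number of tokens stored.
 * The buffer `line` is modified: strtok overwrites each delimiter that ends
 * a token with '\0'. */
int naive_split(char *line, char **out, int max)
{
    int n = 0;
    for (char *tok = strtok(line, " \t\n"); tok != NULL && n < max;
         tok = strtok(NULL, " \t\n"))
        out[n++] = tok;
    return n;
}
```

Given a writable buffer containing grep "hello world" file.txt, this returns four tokens instead of three: the quoted argument is cut in two and the quote characters stay inside the tokens. The buffer itself is left truncated after grep.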

Then often there is the need to run a predefined command from a C program. I think that a lot of programmers prefer to write something like:

     system("ls -l");

rather than:

     pid_t pid;
     int status;
     switch (pid=fork()) {
       case -1: //error
                break;
       case 0:  execl("/bin/ls","ls","-l",(char *)0);
                _exit(127);
       default: waitpid(pid, &status, 0);
     }

Unfortunately the former is nicer but unsafe, and it wastes resources. system starts an entire shell just to run the command: this has a severe memory impact and opens the road to several kinds of threats.
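The fork/exec/wait boilerplate can at least be written once and reused. Here is a minimal sketch using only POSIX calls; run_argv is a hypothetical name, and unlike the execl example it searches the executable through PATH (execvp), which is an assumption made for convenience.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] (searched through PATH) with the given
 * arguments and wait for it to terminate.
 * Returns the exit status of the child, or -1 if fork/waitpid failed or the
 * child was killed by a signal. 127 means that the exec itself failed. */
int run_argv(char *const argv[])
{
    int status;
    pid_t pid = fork();

    switch (pid) {
    case -1:
        return -1;
    case 0:
        execvp(argv[0], argv);
        _exit(127);               /* reached only if execvp failed */
    default:
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    }
}
```

The caller still has to build the argv by hand, e.g. char *argv[] = {"ls", "-l", NULL}; which is exactly the step that becomes awkward when the command arrives as a single string.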

I needed a clean solution to parse the arguments from a string.

So I decided to develop the libs2argv library ten months ago, and now I have just updated it to make it simpler and more useful. The new version generates two libraries:

  • libs2argv: the complete library including several useful features
  • libexecs: the minimal library for embedded systems or memory critical situations.

It is not a milestone in the history of computing, but it is something that was missing and that I felt the need for.

The code has been released on GitHub under LGPL2.1+.

The API of the library is clean and straightforward.

Now there are several possibilities.

s2argv

Convert the string into a dynamically allocated argv using s2argv (string to argv):

 char **s2argv(const char *args, int *pargc);

The return value is a valid argv. If pargc is non-NULL, the number of arguments, i.e. the argc, is stored in the integer pointed to by pargc. s2argv uses dynamic allocation and does not modify its input parameter args.

s2argv tokenizes the arguments at white space (blanks, tabs or newlines), but it also supports single and double quotes to specify arguments that include spaces, and the escape character \.
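To make this kind of tokenizing concrete, here is a minimal sketch of a quote-aware splitter. It is not the code of s2argv: split_args and free_args are hypothetical names, and the exact rules (a backslash is literal inside single quotes and escapes the next character elsewhere) are assumptions of this sketch modelled on common shell behaviour.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split s into a NULL-terminated, dynamically allocated
 * argv. s is not modified. All tokens live in one buffer at argv[0]. */
char **split_args(const char *s, int *pargc)
{
    size_t len = strlen(s);
    /* s holds at most len/2 + 1 tokens; one more slot for the final NULL */
    char **argv = malloc((len / 2 + 2) * sizeof *argv);
    char *buf = malloc(len + 1);   /* the tokens never need more than this */
    char *out = buf;
    int argc = 0;

    if (argv == NULL || buf == NULL) {
        free(argv);
        free(buf);
        return NULL;
    }
    while (*s != '\0') {
        while (isspace((unsigned char)*s))      /* skip separators */
            s++;
        if (*s == '\0')
            break;
        argv[argc++] = out;
        char quote = 0;                         /* 0, '\'' or '"' */
        for (; *s != '\0' && (quote || !isspace((unsigned char)*s)); s++) {
            if (*s == '\\' && quote != '\'' && s[1] != '\0')
                *out++ = *++s;                  /* escaped char, taken as is */
            else if (quote && *s == quote)
                quote = 0;                      /* closing quote */
            else if (!quote && (*s == '\'' || *s == '"'))
                quote = *s;                     /* opening quote */
            else
                *out++ = *s;
        }
        *out++ = '\0';
    }
    argv[argc] = NULL;
    if (argc == 0)
        free(buf);                              /* no token points into it */
    if (pargc != NULL)
        *pargc = argc;
    return argv;
}

/* Release an argv returned by split_args. */
void free_args(char **argv)
{
    if (argv != NULL) {
        free(argv[0]);
        free(argv);
    }
}
```

With the input grep 'hello world' "a \"b\"" c\ d this sketch produces four arguments: grep, hello world, a "b" and c d.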

Usually an argv is passed to an execv. If the execv succeeds there is no point in deallocating the argv: the whole memory image of the calling program has been replaced anyway. But if the execv fails the program has to carry on, and the argv should be deallocated. For that the library provides the following function:

 void s2argv_free(char **argv);

So a code snippet to execute a file from a string (buf in this example) is the following:

                char **argv=s2argv(buf, NULL);
                execvp(argv[0], argv);
                s2argv_free(argv);
                printf("exec error\n");

execs and siblings

The library provides four functions (to tell the truth most of them are macros, but this is just an implementation detail, irrelevant for users):

      int execs(const char *path, const char *args);
      int execse(const char *path, const char *args, char *const envp[]);
      int execsp(const char *args);
      int execspe(const char *args, char *const envp[]);

they have been designed as the "parse args from a string" counterpart of execv, execve, execvp and execvpe respectively.

These functions do not use dynamic allocation; instead they store a copy of the args string on the stack. s2argv is preferable if the same command has to be run several times, as the argv is parsed only once.

The return value and the error cases are the same as for execv and execve.

execsp and execspe use the parsed argv[0] as the filename of the executable (as with execvp and execvpe, the file is searched for in all the directories listed in the PATH environment variable).

Here is a chunk of code using execsp:

           char buf[BUFLEN];
           printf("type in a command and its arguments, e.g. 'ls -l'\n");
           if (fgets(buf, BUFLEN, stdin) != NULL) {
                execsp(buf);
                printf("exec error\n");
           }

execs*_nocopy

The functions (or macros):

      int execs_nocopy(const char *path, char *args);
      int execse_nocopy(const char *path, char *args, char *const envp[]);
      int execsp_nocopy(char *args);
      int execspe_nocopy(char *args, char *const envp[]);

do not allocate extra space on the stack, as they parse the command string args in place. Thus the original value of args is lost. These functions have been designed for embedded systems or, in general, for cases where the memory footprint of the application is critical. It is not possible to use string constants as the args parameter of the _nocopy functions, because the string has to be writable.

system replacement

The system function provided by the libc is very useful because it is so easy to use. It is a real pity that it is unsafe. The new libs2argv-execs includes a number of functions (or macros) to overcome this problem.

       int system_noshell(const char *command);
       int system_execsp(const char *command);
       int system_execs(const char *path, const char *command);
       int system_execsrp(const char *command, int *redir);
       int system_execsr(const char *path, const char *command, int *redir);

These functions run a command specified along with its parameters in a string, as system does. The difference is that they do not start a shell to run the command. They require fewer resources, they are faster, and they are more secure than system. system_noshell and system_execsp are synonyms, and are almost drop-in replacements for system.

They support argument parsing, including quoting (single quotes, double quotes and backslash). They do not support environment variable substitution, command substitution, > and < redirection, | pipes etc.

The name system_noshell is easier to remember than the more consistently named system_execsp.

system_noshell or system_execsp use the PATH environment variable to search for the executable file, so they can still be unsafe for setuid programs. system_execs requires the path of the executable to be specified as a separate argument, which solves this latter security problem.

system_execsrp and system_execsr are extensions of system_execsp and system_execs that provide support for the redirection of the standard streams (in fact, the 'r' stands for redirection). These functions have a trailing redir argument, which is an array of three int file descriptors. The standard input of the command is redirected to redir[0] if it is positive, the standard output is redirected to redir[1] if it is non-negative and not one and, in the same way, the standard error is redirected to redir[2] if it is non-negative and not two.
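The meaning of the redir array can be sketched with plain POSIX calls. This is not the library's implementation: apply_redir and run_redirected are hypothetical names, and the sketch takes an already built argv instead of parsing a command string. It does not handle the corner case of descriptors that swap two standard streams.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, to be called in the child: connect each standard
 * stream to the matching entry of redir[], skipping entries that are
 * negative or that already name the stream itself.
 * Returns 0 on success, -1 on error. */
static int apply_redir(const int redir[3])
{
    for (int fd = 0; fd < 3; fd++)
        if (redir[fd] >= 0 && redir[fd] != fd && dup2(redir[fd], fd) < 0)
            return -1;
    return 0;
}

/* Hypothetical helper: run argv with the redirections in redir[] and wait.
 * Returns the exit status of the command, or -1 on error. */
int run_redirected(char *const argv[], const int redir[3])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (apply_redir(redir) == 0)
            execvp(argv[0], argv);
        _exit(127);               /* redirection or exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, with int redir[3] = {-1, fd, -1}; only the standard output of the command goes to fd, while standard input and standard error are inherited unchanged.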

popen replacement

The new library provides also safe alternatives to popen/pclose:

      FILE *popen_noshell(const char *command, const char *type);
      int pclose_noshell(FILE *stream);
      FILE *popen_execsp(const char *command, const char *type);
      int pclose_execsp(FILE *stream);
      FILE *popen_execs(const char *path, const char *command, const char *type);
      int pclose_execs(FILE *stream);

popen_noshell/pclose_noshell are almost drop-in replacements for popen/pclose. They do not use any shell, thus they are more efficient and safer. popen_noshell/pclose_noshell are synonyms of popen_execsp/pclose_execsp: as for system_noshell above, these names have been added because they are easier to remember. popen_execs has an extra argument to specify the path of the executable file; this is safer as it avoids the search through the PATH environment variable.

coprocessing

The new s2argv-execs library provides a number of functions to support coprocessing, i.e. the execution of a command fully controlled by the calling program. Both the standard input and the standard output of the new process are redirected to pipes, so that the calling program can provide its input and read its output.

  pid_t coprocv(const char *path, char *const argv[], int pipefd[2]);
  pid_t coprocve(const char *path, char *const argv[], char *const envp[], int pipefd[2]);
  pid_t coprocvp(const char *file, char *const argv[], int pipefd[2]);
  pid_t coprocvpe(const char *file, char *const argv[], char *const envp[], int pipefd[2]);
  pid_t coprocs(const char *path, const char *command, int pipefd[2]);
  pid_t coprocse(const char *path, const char *command, char *const envp[], int pipefd[2]);
  pid_t coprocsp(const char *command, int pipefd[2]);
  pid_t coprocspe(const char *command, char *const envp[], int pipefd[2]);

The functions (macros) have different parameters:

  • those having a v in their suffix take the argument array argv, like the execv* functions;
  • functions with an s parse the arguments of the command from the command string;
  • if the suffix includes a p the executable is searched for through the directories listed in the PATH environment variable, otherwise the function takes the pathname of the executable file as its path argument;
  • e means that there is an extra argument to redefine the environment.

Here is an example of a coprocess:

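#include <unistd.h>
#include <s2argv.h>

int main() {
  char buf[1024];
  int pfd[2];
  int n;
  /* start cat as a coprocess: write to pfd[1], read from pfd[0] */
  coprocsp("cat",pfd);
  write(pfd[1],"hello\n",6);  /* 6 bytes: the terminating NUL is not sent */
  n=read(pfd[0],buf,1024);
  write(1,buf,n);
  write(pfd[1],"world\n",6);
  n=read(pfd[0],buf,1024);
  write(1,buf,n);
  return 0;
}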
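
For readers curious about what a coprocess involves, here is a minimal sketch built on two pipes and a fork, using only POSIX calls. It is not the library's implementation: spawn_coproc is a hypothetical name and it takes an argv instead of a command string. It follows the same convention as the example above: pipefd[0] is read by the caller, pipefd[1] is written.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: start argv[0] (searched through PATH) as a coprocess.
 * On success returns the pid of the child; pipefd[0] reads the standard
 * output of the child, pipefd[1] writes to its standard input.
 * Returns -1 on error. */
pid_t spawn_coproc(char *const argv[], int pipefd[2])
{
    int to_child[2], from_child[2];
    pid_t pid;

    if (pipe(to_child) < 0)
        return -1;
    if (pipe(from_child) < 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }
    pid = fork();
    if (pid == 0) {
        /* child: plug the pipe ends into stdin/stdout, drop the rest */
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[0]);
        close(to_child[1]);
        close(from_child[0]);
        close(from_child[1]);
        execvp(argv[0], argv);
        _exit(127);               /* reached only if execvp failed */
    }
    /* parent: the child's ends are never needed here */
    close(to_child[0]);
    close(from_child[1]);
    if (pid < 0) {
        close(to_child[1]);
        close(from_child[0]);
        return -1;
    }
    pipefd[0] = from_child[0];
    pipefd[1] = to_child[1];
    return pid;
}
```

The caller should close pipefd[1] when it has no more input to send, so that the command sees end-of-file, and should then collect the child with waitpid. Commands that buffer their output may not answer until their input is closed.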

libexecs

libexecs is a minimal library for embedded systems or memory critical situations. It is a subset of libs2argv including all the execs* and system_nocopy.

The memory impact is limited both in terms of code to be loaded and in terms of stack usage.