
NAME

App::EvalServerAdvanced::Seccomp - Use of Seccomp to create a safe execution environment

DESCRIPTION

This is a rule generator for setting up Linux::Seccomp rules. It is used internally only, and its API is not given any consideration for backwards compatibility. It is, however, useful to read the source directly.

YAML

The YAML config file for seccomp contains two main sections: profiles and constants.

CONSTANTS

    constants:
      plugins:
        - 'POSIX'
        - 'LinuxClone'
      values:
        TCGETS: 0x5401
        FIOCLEX: 0x5451
        FIONBIO: 0x5421
        TIOCGPTN: 0x80045430

This section is fairly simple, with two subsections of its own: plugins and values.

values

A key/value list of names for constant values to be used later. This lets you define anything not already provided by a plugin, and avoid undocumented magic numbers in your rules. Ideally, make sure these values come from the proper header files or documentation, and that they don't differ between the architectures you run on.

Valid ways to represent the values are as follows:

hex

Standard Perl syntax, e.g. 0x0123456789_ABCDEF. Case-insensitive; underscores are allowed for readability.

binary

Standard Perl syntax for binary values, e.g. 0b1111_0000. Case-insensitive; underscores are allowed for readability.

octal

Standard Perl syntax and YAML syntax for octal values: 0777 and 0o777 are both valid. Underscores are allowed for readability.

decimal integers

Normal base-ten integers, e.g. 1234567890. They cannot begin with a 0, since a leading 0 means octal. Underscores are allowed for readability.
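The value formats above map closely onto Perl's own oct(). The sketch below is a hypothetical helper (parse_value is not part of the module) that applies those rules to a value given as a string. It handles the 0o prefix itself, since only recent Perls accept 0o in oct().

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's API: parse a constant value
# written in one of the formats listed above.
sub parse_value {
    my ($raw) = @_;
    (my $s = lc $raw) =~ s/_//g;    # case-insensitive; drop readability underscores
    return oct $s    if $s =~ /^0x[0-9a-f]+$/;    # hex
    return oct $s    if $s =~ /^0b[01]+$/;        # binary
    return oct "0$1" if $s =~ /^0o([0-7]+)$/;     # YAML-style octal
    return oct $s    if $s =~ /^0[0-7]*$/;        # Perl-style octal (and plain 0)
    return 0 + $s    if $s =~ /^[1-9][0-9]*$/;    # decimal, no leading 0
    die "Unrecognised constant value: $raw\n";
}

printf "%-14s => %d\n", $_, parse_value($_)
    for qw(0x5401 0x8004_5430 0b1111_0000 0777 0o777 1_234_567);
```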

plugins

Right now there are only two plugins provided with the distribution: App::EvalServerAdvanced::Seccomp::Plugin::Constants::POSIX and App::EvalServerAdvanced::Seccomp::Plugin::Constants::LinuxClone. These pull constants from the POSIX and Linux::Clone modules respectively, so values like O_EXCL and CLONE_NEWNS should always be correct for the platform you run on, regardless of the kernel version. That said, they're unlikely to ever change anyway.
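To illustrate the idea behind these plugins, here is a hypothetical sketch (posix_constants is not the plugins' real interface) that builds a name-to-value table by asking the POSIX module for each constant, so the numbers come from the platform Perl was built on rather than being typed in by hand.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical sketch: look each constant up in the POSIX module and
# return a name => value table.
sub posix_constants {
    my %table;
    for my $name (@_) {
        my $code = POSIX->can($name) or die "POSIX has no constant $name\n";
        $table{$name} = $code->();
    }
    return \%table;
}

my $c = posix_constants(qw(O_CREAT O_EXCL O_RDWR O_WRONLY O_TRUNC));
printf "%-8s = %#o\n", $_, $c->{$_} for sort keys %$c;
```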

Plugins can be loaded by a short name, as demonstrated above. The loader first tries the plugin base configured in the App::EvalServerAdvanced configuration file; if it finds the plugin there by its short name (e.g. 'MyPlugin' becomes MyPlugin.pm), it is loaded from there. If it's not found, the loader tries @INC under the fully qualified name App::EvalServerAdvanced::Seccomp::Plugin::Constants::$SHORTNAME. You can also give the full module name under that namespace, in which case it is only loaded from @INC.
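The lookup order can be sketched as a function that lists, in order, where a constants plugin would be looked for. This is a hypothetical illustration: plugin_candidates and its return format are made up here, and it only computes candidates without loading anything.

```perl
use strict;
use warnings;
use File::Spec;

my $NS = 'App::EvalServerAdvanced::Seccomp::Plugin::Constants';

# Hypothetical sketch of the lookup order; returns candidate locations only.
sub plugin_candidates {
    my ($name, $plugin_base) = @_;

    # A fully qualified name under the namespace is only looked up via @INC.
    return ({ via => '@INC', module => $name }) if $name =~ /^\Q$NS\E::/;

    # Short name: plugin base directory first, then @INC under the namespace.
    return (
        { via => 'plugin base', file   => File::Spec->catfile($plugin_base, "$name.pm") },
        { via => '@INC',        module => "${NS}::$name" },
    );
}

for my $try (plugin_candidates('MyPlugin', '/srv/evalserver/plugins')) {
    printf "%-11s %s\n", $try->{via}, $try->{file} // $try->{module};
}
```

For a short name such as 'MyPlugin' with a made-up plugin base of /srv/evalserver/plugins, it yields that directory's MyPlugin.pm followed by the fully qualified module name; for a fully qualified name it yields only the @INC entry.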

profiles

    profiles:
      default:
        include:
          - time_calls
          - file_readonly
          - stdio
          - exec_wrapper
          - file_write
          - file_tty
          - file_opendir
          - file_temp
        rules:
          # Memory-related calls
          - syscall: mmap
          - syscall: munmap
          - syscall: mremap
          - syscall: mprotect
          - syscall: madvise
          - syscall: brk

Profiles are the most important part of setting up seccomp. They are a whitelist of what programs in the sandbox are allowed to do. Anything not specified results in the termination of the process. A profile consists of a name, child profiles, and a set of rules to follow.

Profile name

The name of the profile. Any valid string can be used. A profile named default is expected to exist, but if every language in the config specifies a profile, you can do without one. Names are case-sensitive; no other restrictions apply.

include

A list of profiles that should be included into this one at runtime. This is useful for organizing rules into basic actions and composing them into logical groups to handle programs.
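Including profiles amounts to collecting rules recursively. A minimal sketch, assuming the profiles have already been read from YAML into a Perl hash; the profile contents and the collect_rules helper are hypothetical, and whether the module itself de-duplicates shared includes is an assumption.

```perl
use strict;
use warnings;

# Hypothetical in-memory form of a profiles section.
my %profiles = (
    default       => { include => ['file_readonly', 'file_write'],
                       rules   => [{ syscall => 'brk' }] },
    file_readonly => { include => ['file_open'], rules => [{ syscall => 'read' }] },
    file_write    => { include => ['file_open'] },
    file_open     => { rules => [{ syscall => 'open' }, { syscall => 'close' }] },
);

# Depth-first: rules from included profiles come first, then the profile's
# own. Each profile is visited once, so shared includes and include cycles
# neither duplicate rules nor recurse forever.
sub collect_rules {
    my ($name, $seen) = @_;
    $seen //= {};
    return () if $seen->{$name}++;
    my $p = $profiles{$name} or die "Unknown profile '$name'\n";
    my @rules = map { collect_rules($_, $seen) } @{ $p->{include} // [] };
    return (@rules, @{ $p->{rules} // [] });
}

my @syscalls = map { $_->{syscall} } collect_rules('default');
print "@syscalls\n";
```

For default this prints open close read brk: file_open's rules appear once even though two included profiles pull it in.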

rules

This is a list of the syscalls to be allowed. See the "Rule definitions" section for details.

rule_generator

Use a plugin to generate the rules at runtime. Give a string such as "ExecWrapper::gen_exec_wrapper"; the plugin ExecWrapper is then loaded and its gen_exec_wrapper method called. The method is passed the App::EvalServerAdvanced::Seccomp object and is expected to return a set of rules to use. The source of App::EvalServerAdvanced::Seccomp::Plugin::ExecWrapper is the best place to see exactly how this currently works.
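The "Plugin::method" convention amounts to splitting the string at its last :: and invoking the method on the plugin. A hypothetical sketch: My::Plugin::Demo stands in for a real plugin, run_rule_generator is not the module's API, and loading the plugin (including resolving a short name like ExecWrapper to a full package) is left out.

```perl
use strict;
use warnings;

# Hypothetical stand-in plugin; the real ExecWrapper plugin is different.
package My::Plugin::Demo {
    sub gen_rules {
        my ($class, $seccomp) = @_;    # $seccomp: the object generating rules
        return ({ syscall => 'getpid' });
    }
}

package main;

# Split "Plugin::method" at the last '::' and call the method on the plugin class.
sub run_rule_generator {
    my ($spec, $seccomp) = @_;
    my ($plugin, $method) = $spec =~ /^(.+)::(\w+)$/
        or die "Bad rule_generator spec '$spec'\n";
    return $plugin->$method($seccomp);
}

my @rules = run_rule_generator('My::Plugin::Demo::gen_rules', undef);
print "$_->{syscall}\n" for @rules;
```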

This is useful for handling some edge cases with seccomp. Since seccomp filters can't dereference pointers, they can't fully check system calls that take pointer arguments. What you can do instead is limit which specific pointers may be passed to those system calls. The ExecWrapper plugin uses this to set up rules that allow the execve syscall only with strings from the config singleton object inside the server. This restricts exec(...) to specific interpreters/binaries, with very little security impact after the execve call happens. It does mean that a process could write a new string at one of those addresses and call execve again, but with ASLR this is almost impossible, as long as the seccomp syscall cannot be used to retrieve the existing BPF filter program for examination.
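One way to obtain such addresses from Perl is pack's 'p' template, which yields a pointer to a string's buffer. The sketch below is an assumption about the approach, not the ExecWrapper plugin's actual code: it builds one execve rule per allowed path, testing argument 0 (the pathname pointer) for equality with that buffer's address. The paths are hypothetical, and the strings must stay alive and unmodified, since the rule only compares pointers and never looks at the bytes they point to.

```perl
use strict;
use warnings;

# Hypothetical allow-list. In the server such strings would live in a
# long-lived config object so their buffers keep the same address.
my @allowed = ('/usr/bin/perl', '/usr/bin/python3');

# One execve rule per allowed path: argument 0 must equal the address
# of that path's string buffer.
sub execve_rules {
    my @rules;
    for my $path (@_) {    # $path aliases the caller's strings, not copies
        # pack 'p' gives a pointer to the buffer; 'L!' (native unsigned
        # long) has pointer width on Linux.
        my $addr = unpack('L!', pack('p', $path));
        push @rules, { syscall => 'execve', tests => [[0, '==', $addr]] };
    }
    return @rules;
}

printf "execve allowed only when arg0 == %#x\n", $_->{tests}[0][2]
    for execve_rules(@allowed);
```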

permute

    file_write:
      include:
       - 'file_open'
      permute:
        open_modes:
          - 'O_CREAT'
          - 'O_WRONLY'
          - 'O_TRUNC'
          - 'O_RDWR'
    file_open:
      rules:
        - syscall: open
          tests:
            - [1, '==', '{{open_modes}}']

This is used to specify the flags a syscall may use. In the example above, the file_write profile says that the flags O_CREAT, O_WRONLY, O_TRUNC and O_RDWR should be allowed for the permutation named open_modes. The file_open profile then allows the open syscall with any combination of the flags from open_modes in argument 1 (its flags argument), by specifying the value as '{{open_modes}}'. See "Rule definitions" for more information.
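Pre-generating a permutation means OR-ing together every subset of the listed flags. A minimal sketch, using the Fcntl constants for the open_modes flags; permute_flags is hypothetical, and including the empty combination (0, which is also O_RDONLY) is an assumption about the module's behaviour.

```perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_WRONLY O_TRUNC O_RDWR);

# OR together every subset of the given flags (2**n subsets, including
# the empty one), dropping duplicate results.
sub permute_flags {
    my @flags = @_;
    my %seen;
    for my $mask (0 .. 2**@flags - 1) {
        my $v = 0;
        for my $i (0 .. $#flags) {
            $v |= $flags[$i] if $mask & (1 << $i);
        }
        $seen{$v} = 1;
    }
    return sort { $a <=> $b } keys %seen;
}

# One open rule per permuted value, testing argument 1 (flags) for equality.
my @open_modes = permute_flags(O_CREAT, O_WRONLY, O_TRUNC, O_RDWR);
my @rules = map { +{ syscall => 'open', tests => [[1, '==', $_]] } } @open_modes;
printf "%d values: %s\n", scalar @open_modes, join ', ', @open_modes;
```

With four distinct flag bits this gives 16 values and so 16 open rules; each extra flag doubles the count, which is worth keeping in mind for long permute lists.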

Rule definitions

    file_open:
      rules:
        - syscall: open
          tests:
            - [1, '==', '{{open_modes}}']
        - syscall: openat
          tests:
            - [2, '==', '{{open_modes}}']
        - syscall: close

Rules consist of a few attributes that specify what you're allowed to do.

syscall

The most important part of a rule; without it you will get a fatal error. Best practice is to specify the syscall by name, e.g. open or openat. It is resolved automatically at runtime using the system's syscall map, so you don't need to know the syscall numbers. If a syscall won't resolve for you, you can specify it by number, but this is not recommended: numbers are architecture-dependent and will cause problems if you change architectures (e.g. from x86_64 to i386).
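Name resolution is what keeps rules portable. The sketch below uses a small hard-coded table purely to show why: the same name maps to different numbers on x86_64 and i386, while a bare number is passed through unchanged and is therefore only right for one architecture. The table and resolve_syscall are illustrative; the module uses the system's syscall map, not a table like this.

```perl
use strict;
use warnings;

# Illustrative syscall numbers for two architectures (not a complete map).
my %SYSCALLS = (
    x86_64 => { read => 0, open => 2, close => 3, openat => 257 },
    i386   => { read => 3, open => 5, close => 6, openat => 295 },
);

sub resolve_syscall {
    my ($arch, $syscall) = @_;
    # A bare number is used as-is: it is only correct for one architecture.
    return $syscall if $syscall =~ /^[0-9]+$/;
    my $nr = $SYSCALLS{$arch}{$syscall};
    die "Cannot resolve syscall '$syscall' on $arch\n" unless defined $nr;
    return $nr;
}

printf "%-6s x86_64=%d i386=%d\n", $_,
    resolve_syscall('x86_64', $_), resolve_syscall('i386', $_)
    for qw(open openat close);
```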

action
tests

This is probably the least elegant part of the config file, but I couldn't come up with a better syntax for it. It is a list of tests for the given syscall, all of which must pass for the call to be allowed.

Each test is an array of three things, [argument, operator, value].

argument

Which argument of the syscall to test, counting from 0 for the first argument.

operator

Which operator to use for the test: ==, !=, >=, <=, <, > or =~.

The =~ operator takes the argument to the syscall and uses the value as a bit mask. It passes if all the bits from the mask are set in the argument, ignoring any not present in the mask.
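Putting argument, operator and value together, a rule check can be sketched as below. This models the semantics described in this section; it is not how the module compiles rules into a seccomp filter. decide and %OPS are hypothetical, the rule values are arbitrary, and KILL stands for the default of terminating the process for anything not whitelisted.

```perl
use strict;
use warnings;

# One comparison per operator; '=~' passes when every bit of the mask
# (the rule's value) is also set in the syscall argument.
my %OPS = (
    '==' => sub { $_[0] == $_[1] },
    '!=' => sub { $_[0] != $_[1] },
    '>=' => sub { $_[0] >= $_[1] },
    '<=' => sub { $_[0] <= $_[1] },
    '<'  => sub { $_[0] <  $_[1] },
    '>'  => sub { $_[0] >  $_[1] },
    '=~' => sub { ($_[0] & $_[1]) == $_[1] },
);

# Allowed if some rule names the syscall and all of its tests pass;
# anything else is killed (whitelist behaviour).
sub decide {
    my ($rules, $syscall, @args) = @_;
    RULE: for my $rule (grep { $_->{syscall} eq $syscall } @$rules) {
        for my $t (@{ $rule->{tests} // [] }) {
            my ($idx, $op, $val) = @$t;
            my $check = $OPS{$op} or die "Unknown operator '$op'\n";
            next RULE unless $check->($args[$idx], $val);
        }
        return 'ALLOW';
    }
    return 'KILL';
}

# Arbitrary illustrative rules.
my @rules = (
    { syscall => 'close' },
    { syscall => 'fcntl',  tests => [[1, '==', 4]] },
    { syscall => 'openat', tests => [[2, '=~', 0b0101]] },
);
print decide(\@rules, 'openat', -100, 0, 0b0111), "\n";
```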

value

This is the value to test against. It can be either a literal integer or a string containing a set of constants and bitwise operations. The math is done by App::EvalServerAdvanced::ConstantCalc; see that module for the exact operations supported.

Some examples

    'O_CLOEXEC|O_EXCL|O_RDWR'

Automatically permuted values are also supported, using a string like '{{ open_modes }}'. In this case all possible values are pre-generated and substituted into the rule, allowing any valid set of flags in the syscall.
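A value string therefore either names a permutation or combines constants. The sketch below handles only those two cases, with '|' as the sole operator. The constant table (x86_64 Linux values, for illustration), the permutation table and expand_value are hypothetical; in the module, App::EvalServerAdvanced::ConstantCalc is what evaluates values.

```perl
use strict;
use warnings;

# Hypothetical constant table (x86_64 Linux values, for illustration only).
my %CONST = (O_RDWR => 2, O_EXCL => 0200, O_CLOEXEC => 02000000);

# Hypothetical permutation table, as built from a profile's permute section.
my %PERMUTE = (open_modes => [0, 1, 2, 3]);

my $DECIMAL = qr/^(?:0|[1-9][0-9]*)$/;

# Return the list of integers a rule value stands for: '{{ name }}' expands
# to every permuted value; otherwise the value is an OR of constant names
# and decimal integers. Only '|' is handled in this sketch.
sub expand_value {
    (my $value = shift) =~ s/^\s+|\s+$//g;
    if ($value =~ /^\{\{\s*(\w+)\s*\}\}$/) {
        my $list = $PERMUTE{$1} or die "No permutation named '$1'\n";
        return @$list;
    }
    my $result = 0;
    for my $tok (split /\s*\|\s*/, $value) {
        my $v = $tok =~ $DECIMAL ? $tok : $CONST{$tok};
        die "Unknown constant '$tok'\n" unless defined $v;
        $result |= $v;
    }
    return $result;
}

printf "%s => %s\n", $_, join(', ', expand_value($_))
    for 'O_CLOEXEC|O_EXCL|O_RDWR', '{{ open_modes }}';
```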

SECURITY

This is an exercise in defense in depth. The default rulesets provide some protection against accidentally running known-dangerous syscalls.

This does not provide absolute security. It relies on the fact that the syscalls allowed are likely to be safe, or commonly required for normal programs to function properly.

In particular, two of the allowed syscalls, madvise and mmap, are involved in the Dirty COW kernel exploit, and together they can trigger it. However, because the default rules prevent you from creating threads, you can't create the race condition needed to actually carry it out. You should still take other measures to protect yourself.

KNOWN ISSUES

A compilation error in a plugin loaded from the plugin base directory causes the loader to fall back to loading the fully qualified module name. This will be made a fatal error in future versions.

AUTHOR

Ryan Voots <simcop@cpan.org>