Posted in Software Engineering

SSH Tunneling – port forwarding

SSH tunneling (also referred to as SSH port forwarding) is simply routing local network traffic through SSH to remote hosts. This means that all the forwarded connections are encrypted. It provides an easy way of setting up a basic VPN (Virtual Private Network), useful for connecting to private networks over insecure public networks like the Internet.

It can also be used to expose local servers behind NATs and firewalls to the Internet over secure tunnels, as implemented in ngrok.

SSH sessions permit tunneling network connections by default, and there are three types of SSH port forwarding: local, remote, and dynamic port forwarding.

In this article, we will demonstrate how to quickly and easily set up an SSH tunnel using the different types of port forwarding in Linux.

Testing Environment:

For the purpose of this article, we are using the following setup:

  1. Local Host: 192.168.43.31
  2. Remote Host: Linode CentOS 7 VPS with hostname server1.example.com.

Usually, you can securely connect to a remote server using SSH as follows. In this example, I have configured passwordless SSH login between my local and remote hosts, so it has not asked for user admin’s password.

$ ssh admin@server1.example.com  
Connect Remote SSH Without Password

Local SSH Port Forwarding

This type of port forwarding lets you connect from your local computer to a remote server. Assume you are behind a restrictive firewall, or an outgoing firewall blocks you from accessing an application running on port 3000 on your remote server.

You can forward a local port (e.g. 8080), which you can then use to access the application locally, as follows. The -L flag defines the local port to forward, followed by the destination host and port on the remote side.

$ ssh admin@server1.example.com -L 8080:server1.example.com:3000

Adding the -N flag means do not execute a remote command; you will not get a shell in this case.

$ ssh -N admin@server1.example.com -L 8080:server1.example.com:3000

The -f switch instructs ssh to run in the background.

$ ssh -f -N admin@server1.example.com -L 8080:server1.example.com:3000

Now, open a browser on your local machine. Instead of accessing the remote application using the address server1.example.com:3000, you can simply use localhost:8080 or 192.168.43.31:8080, as shown in the screenshot below.

Access a Remote App via Local SSH Port Forwarding
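
If you need to set up the same local forward from application code rather than the ssh client, SSH libraries expose an equivalent API. Below is a minimal sketch using the third-party JSch library (my own illustration, not part of the original setup); it assumes key-based authentication as configured above, and the key path is a placeholder.

import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;

public class LocalForward {
    public static void main(String[] args) throws Exception {
        JSch jsch = new JSch();
        jsch.addIdentity("/home/admin/.ssh/id_rsa");        // placeholder key path
        Session session = jsch.getSession("admin", "server1.example.com", 22);
        session.setConfig("StrictHostKeyChecking", "no");   // demo only; verify host keys in production
        session.connect();

        // Equivalent of: ssh -L 8080:server1.example.com:3000
        int port = session.setPortForwardingL(8080, "server1.example.com", 3000);
        System.out.println("Forwarding localhost:" + port + " -> server1.example.com:3000");
    }
}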

Remote SSH Port Forwarding

Remote port forwarding allows you to connect from your remote machine to your local computer. By default, SSH binds remote forwarded ports only to the loopback interface on the remote host. To make them reachable from other hosts, enable the GatewayPorts directive in the SSH daemon's main configuration file /etc/ssh/sshd_config on the remote host.

Open the file for editing using your favorite command line editor.

$ sudo vim /etc/ssh/sshd_config 

Look for the required directive, uncomment it and set its value to yes, as shown in the screenshot.

GatewayPorts yes
Enable Remote SSH Port Forwarding

Save the changes and exit. Next, you need to restart sshd to apply the recent change you made.

$ sudo systemctl restart sshd
OR
$ sudo service sshd restart 

Next, run the following command to forward port 5000 on the remote machine to port 3000 on the local machine.

$ ssh -f -N admin@server1.example.com -R 5000:localhost:3000

Once you understand this method of tunneling, you can easily and securely expose a local development server, especially one behind NATs and firewalls, to the Internet over secure tunnels. Services such as ngrok, pagekite, localtunnel and many others work in a similar way.

Dynamic SSH Port Forwarding

This is the third type of port forwarding. Unlike local and remote port forwarding, which forward traffic for a single port, dynamic port forwarding makes a full range of TCP communication possible across a range of ports. It sets up your machine as a SOCKS proxy server which listens on port 1080 by default.

For starters, SOCKS is an Internet protocol that defines how a client can connect to a server via a proxy server (SSH in this case). You can enable dynamic port forwarding using the -D option.

The following command will start a SOCKS proxy on port 1080 allowing you to connect to the remote host.

$ ssh -f -N -D 1080 admin@server1.example.com

From now on, you can make applications on your machine use this SSH proxy server by editing their settings and configuring them to use it, to connect to your remote server. Note that the SOCKS proxy will stop working after you close your SSH session.
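
For example, a Java application can be pointed at this proxy with the standard socksProxyHost and socksProxyPort system properties. A minimal sketch (the target URL only illustrates a host that is reachable from the remote side):

import java.io.InputStream;
import java.net.URL;

public class SocksDemo {
    public static void main(String[] args) throws Exception {
        // Route this JVM's TCP connections through the SSH SOCKS proxy
        System.setProperty("socksProxyHost", "localhost");
        System.setProperty("socksProxyPort", "1080");

        // This connection is now tunneled through admin@server1.example.com
        try (InputStream in = new URL("http://server1.example.com:3000/").openStream()) {
            System.out.println("Read " + in.readAllBytes().length + " bytes through the tunnel");
        }
    }
}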

Posted in Software Engineering

Redshift


In Redshift, if your query operation hangs or stops responding, below are the possible causes along with their corresponding solutions:

Connection to the Database Is Dropped

Reduce the size of the maximum transmission unit (MTU). The MTU size determines the maximum size, in bytes, of a packet that can be transferred in one Ethernet frame over your network connection.

Connection to the Database Times Out

Your client connection to the database appears to hang or time out when running long queries, such as a COPY command. In this case, you might observe that the Amazon Redshift console displays that the query has completed, but the client tool itself still appears to be running the query. The results of the query might be missing or incomplete depending on when the connection stopped. This effect happens when idle connections are terminated by an intermediate network component.

Client-Side Out-of-Memory Error Occurs with ODBC

If your client application uses an ODBC connection and your query creates a result set that is too large to fit in memory, you can stream the result set to your client application by using a cursor. For more information, see DECLARE and Performance Considerations When Using Cursors.

Client-Side Out-of-Memory Error Occurs with JDBC

When you attempt to retrieve large result sets over a JDBC connection, you might encounter client-side out-of-memory errors.
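
A common mitigation, and the one the Amazon Redshift documentation suggests, is to fetch the result set in batches by setting the JDBC fetch size rather than loading everything into memory at once. A minimal sketch, assuming an already open Redshift JDBC connection; the table name and batch size are illustrative:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

void streamResults(Connection conn) throws SQLException {
    conn.setAutoCommit(false);      // many drivers only honor the fetch size outside autocommit
    try (Statement stmt = conn.createStatement()) {
        stmt.setFetchSize(500);     // fetch rows in batches instead of the full result set
        try (ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
            while (rs.next()) {
                // process one row at a time
            }
        }
    }
}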

There Is a Potential Deadlock

If there is a potential deadlock, try the following (a short JDBC sketch follows the list):

– View the STV_LOCKS and STL_TR_CONFLICT system tables to find conflicts involving updates to more than one table.

– Use the PG_CANCEL_BACKEND function to cancel one or more conflicting queries.

– Use the PG_TERMINATE_BACKEND function to terminate a session, which forces any currently running transactions in the terminated session to release all locks and roll back the transaction.

– Schedule concurrent write operations carefully.
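
As a rough illustration of the second and third points, both functions are plain SQL and can be called from any client, including JDBC. A minimal sketch; the lock_owner_pid column comes from STV_LOCKS, and the process id passed to PG_TERMINATE_BACKEND is a placeholder you would take from that output:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

void breakDeadlock(Connection conn) throws SQLException {
    try (Statement stmt = conn.createStatement()) {
        // 1. Inspect current locks
        try (ResultSet locks = stmt.executeQuery("SELECT * FROM stv_locks")) {
            while (locks.next()) {
                System.out.println("lock held by pid " + locks.getString("lock_owner_pid"));
            }
        }
        // 2. Terminate the conflicting session (30841 is a placeholder pid)
        stmt.execute("SELECT pg_terminate_backend(30841)");
    }
}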

Posted in Software Engineering

AWS – System Manager

AWS Systems Manager has powerful features to manage our EC2 instances. Following is an overview:

AWS Systems Manager Patch Manager

AWS Systems Manager Patch Manager automates the process of patching managed instances with security-related updates. For Linux-based instances, we can also install patches for non-security updates. We can patch fleets of Amazon EC2 instances or our on-premises servers and virtual machines (VMs) by operating system type. This includes supported versions of Windows, Ubuntu Server, Red Hat Enterprise Linux (RHEL), SUSE Linux Enterprise Server (SLES), Amazon Linux, and Amazon Linux 2. We can scan instances to see only a report of missing patches, or we can scan and automatically install all missing patches.

AWS Systems Manager State Manager

AWS Systems Manager State Manager is primarily used as a secure and scalable configuration management service that automates the process of keeping our Amazon EC2 and hybrid infrastructure in a state that we define. It does not handle patch management, unlike AWS Systems Manager Patch Manager. With State Manager, we can configure our instances to boot with specific software at start-up, download and update agents on a defined schedule, configure network settings, and much more, but not the patching of our EC2 instances.

AWS Systems Manager Session Manager

 AWS Systems Manager Session Manager is primarily used to comply with corporate policies that require controlled access to instances, strict security practices, and fully auditable logs with instance access details but not for applying OS patches.

AWS Systems Manager Maintenance Windows

AWS Systems Manager Maintenance Windows let us define a schedule for when to perform potentially disruptive actions on our instances such as patching an operating system, updating drivers, or installing software or patches. Each Maintenance Window has a schedule, a maximum duration, a set of registered targets (the instances that are acted upon), and a set of registered tasks. We can also specify dates that a Maintenance Window should not run before or after, and we can specify the international time zone on which to base the Maintenance Window schedule.

Posted in Software Engineering

AWS – DNSSEC

TL;DR

For the web domain registration, use Amazon Route 53 and then register a 2048-bit RSASHA256 encryption key from a third-party certificate service. Enable Domain Name System Security Extensions (DNSSEC) by using a 3rd party DNS provider that uses customer managed keys. Register the SSL certificates in ACM and attach them to the Application Load Balancer. Configure the Server Name Identification extension in all user requests to the website.

Details

Attackers sometimes hijack traffic to internet endpoints such as web servers by intercepting DNS queries and returning their own IP addresses to DNS resolvers in place of the actual IP addresses for those endpoints. Users are then routed to the IP addresses provided by the attackers in the spoofed response, for example, to fake websites. You can protect your domain from this type of attack, known as DNS spoofing or a man-in-the-middle attack, by configuring Domain Name System Security Extensions (DNSSEC), a protocol for securing DNS traffic.

AWS Route 53 – Registered Domain

Amazon Route 53 supports DNSSEC for domain registration. However, Route 53 does not support DNSSEC for DNS service, regardless of whether the domain is registered with Route 53. If you want to configure DNSSEC for a domain that is registered with Route 53, you must either use another DNS service provider or set up your own DNS server.

Posted in Software Engineering

What is Backpressure?

Backpressure is simply the process of handling a fast item producer. If an Observable produces 1,000,000 items per second, how can a subscriber that handles only 100 items per second process them? The Observable class has an unbounded buffer, which means it buffers everything and pushes it to the subscriber; that's where you get the OutOfMemoryError.
By applying backpressure to a stream, it becomes possible to handle items as needed; unnecessary items can be discarded, or the producer can even be told when to create and push new items.

What is the problem?

bigTable.selectAll() // <=== or a hot observable like mouse movements
    .map(/** mapping **/)
    .observeOn(yourScheduler())
    .subscribe(data -> { doSomethingHere(data); });

The problem with the code above is that it fetches all the rows from the database and pushes them downstream, which results in high memory usage because it buffers all the data in memory. Do we want all of the data? Yes! But do we need all of it at once? No.

I see this kind of usage in lots of projects, and I'm pretty sure most of us have done something like this before, even while knowing something is wrong: querying a database and assuming there won't be much data, until one day there is, and we end up with poor app performance in production. If we are lucky we get an OutOfMemoryError, but most of the time the app just behaves slow and sluggish.

What is the solution?

Backpressure to the rescue! Back in RxJava 1, the Observable class was responsible for the backpressure of streams; since RxJava 2 there is a separate class for handling backpressure, Flowable.

How to create a Flowable?

There are multiple ways for creating a backpressure stream:

  1. Converting the Observable to a Flowable with the x.toFlowable() method
Observable.range(1, 1_000_000).toFlowable(BackpressureStrategy.DROP)

With the DROP strategy the downstream doesn't get all one million items; it gets new items only as it finishes handling the previous ones.

example output:
1
2
3
...
100
101
drops some of the items here
100,051
100,052
100,053
...
drops again
523,020
523,021
...
and so on

Note that if you subscribe without changing the scheduler you will get the whole one million items, since everything is synchronous and the producer is blocked by the subscriber.
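
For example, here is a minimal sketch of the asynchronous case, assuming RxJava 2 on the classpath; the sleep durations are arbitrary. Once the observeOn boundary with its bounded buffer is introduced, the slow subscriber only receives what it can handle and the rest is dropped.

import io.reactivex.BackpressureStrategy;
import io.reactivex.Observable;
import io.reactivex.schedulers.Schedulers;

public class DropDemo {
    public static void main(String[] args) throws InterruptedException {
        Observable.range(1, 1_000_000)
                .toFlowable(BackpressureStrategy.DROP)
                .observeOn(Schedulers.computation()) // async boundary with a bounded (128-item) buffer
                .subscribe(i -> {
                    Thread.sleep(1); // simulate a slow consumer
                    System.out.println(i);
                });
        Thread.sleep(5_000); // keep the JVM alive long enough to observe the dropped ranges
    }
}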

BackpressureStrategy

There are a few backpressure strategies:

  • Drop: discards incoming items once the buffer is full and the downstream has not requested more
  • Buffer: buffers all the items from the producer; watch out for OOMs
  • Latest: keeps only the most recent item
  • Error: signals a MissingBackpressureException in case of over-emission
  • Missing: no strategy; a MissingBackpressureException will be thrown sooner or later somewhere downstream

2. Use the Flowable.create() factory method:

We won't get much more functionality than x.toFlowable() here, so let's skip this one.

3. Use the Flowable.generate() factory method:

This is what we were looking for. The generate method has a few overloads; this is the one that satisfies our needs.

Flowable.generate(
    () -> 0,                // initial state
    (state, emitter) -> {   // current state
        emitter.onNext(state);
        return state + 1;   // next state
    }
);

This code generates a stream of increasing numbers: 0, 1, 2, 3, 4, 5, 6, …

The first parameter is a Callable that returns the initial state. The second one is a BiFunction which gets called on every request to create a new item; its parameters are the current state and an emitter. So let's apply it to our database code:

Flowable<List<Data>> select(int page, int pageSize) {
    return Flowable.generate(
        () -> page, // initial page
        (currentPage, emitter) -> {
            emitter.onNext(table.select(
                "SELECT * FROM myTable LIMIT " + pageSize + " OFFSET " + (currentPage * pageSize)));
            return currentPage + 1; // next page
        });
}

(Why is there still no string templating in recent Java releases, Java 9, 10, 11?! WTH, Java.)

Now we can call it like this:

myTable.select(1, 10)
    .map(/** mapping **/)
    .flatMap(items -> processAsync(items), 1) // <== 1 indicates how many concurrent tasks should be executed; processAsync is a placeholder returning a Flowable
    // observeOn uses a default 128 buffer size, so we overwrite it
    .observeOn(Schedulers.single(), false, 1)
    .subscribe(new DefaultSubscriber<List<Data>>() {
        @Override
        protected void onStart() {
            // super.onStart(); the default implementation requests Long.MAX_VALUE
            request(1);
        }

        @Override
        public void onNext(List<Data> data) {
            doSomethingHere(data);
            request(1); // if you want more data
        }

        @Override
        public void onError(Throwable t) {
            t.printStackTrace();
        }

        @Override
        public void onComplete() {
            System.out.println("onComplete");
        }
    });

That’s all there’s to it. 😃

Posted in Software Engineering

Transaction Isolation

As we know, in order to maintain consistency, a database follows the ACID properties. Among these four properties (Atomicity, Consistency, Isolation and Durability), Isolation determines how transaction integrity is visible to other users and systems. It means that a transaction should take place in the system as if it were the only transaction accessing the database's resources.
Isolation levels define the degree to which a transaction must be isolated from the data modifications made by any other transaction in the database system. A transaction isolation level is defined in terms of the following phenomena:

  • Dirty Read – A dirty read is the situation when a transaction reads data that has not yet been committed. For example, say transaction 1 updates a row and leaves it uncommitted; meanwhile, transaction 2 reads the updated row. If transaction 1 rolls back the change, transaction 2 will have read data that is considered never to have existed.
  • Non-Repeatable Read – A non-repeatable read occurs when a transaction reads the same row twice and gets a different value each time. For example, suppose transaction T1 reads data. Due to concurrency, another transaction T2 updates the same data and commits. Now, if transaction T1 rereads the same data, it will retrieve a different value.
  • Phantom Read – A phantom read occurs when the same query is executed twice but the rows retrieved by the two executions are different. For example, suppose transaction T1 retrieves a set of rows that satisfy some search criteria. Now, transaction T2 inserts new rows that match the search criteria for transaction T1. If transaction T1 re-executes the statement that reads the rows, it gets a different set of rows this time.

Based on these phenomena, the SQL standard defines four isolation levels:

  1. Read Uncommitted – Read Uncommitted is the lowest isolation level. At this level, one transaction may read not-yet-committed changes made by other transactions, thereby allowing dirty reads. Transactions are not isolated from each other.
  2. Read Committed – This isolation level guarantees that any data read was committed at the moment it is read, so it does not allow dirty reads. The transaction holds a read or write lock on the current row, and thus prevents other transactions from reading, updating or deleting it.
  3. Repeatable Read – This is a more restrictive isolation level. The transaction holds read locks on all rows it references and write locks on all rows it inserts, updates, or deletes. Since other transactions cannot read, update or delete these rows, it avoids non-repeatable reads.
  4. Serializable – This is the highest isolation level. It guarantees that concurrently executing transactions appear to be executing serially.

The table below depicts the relationship between the isolation levels and the read phenomena:
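
Isolation Level   | Dirty Read   | Non-Repeatable Read | Phantom Read
Read Uncommitted  | Possible     | Possible            | Possible
Read Committed    | Not possible | Possible            | Possible
Repeatable Read   | Not possible | Not possible        | Possible
Serializable      | Not possible | Not possible        | Not possible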


Note that "anomaly serializable" is not the same as Serializable: being free of all three phenomena is necessary, but not sufficient, for a schedule to be serializable.
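
In practice, most database drivers let you request an isolation level per connection. For example, with JDBC (a minimal sketch; the connection URL and credentials are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

void runSerializably() throws SQLException {
    try (Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost:5432/mydb", "user", "secret")) {
        conn.setAutoCommit(false);
        conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
        // ... statements in this transaction are now protected from all three phenomena ...
        conn.commit();
    }
}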

Posted in Software Engineering

Threadpool executor

Overview

Original post: http://blog.csdn.net/qq_25806863/article/details/71172823

Analysis: when a ThreadPoolExecutor is constructed, there is a RejectedExecutionHandler parameter.

RejectedExecutionHandler is an interface:

public interface RejectedExecutionHandler {
    void rejectedExecution(Runnable r, ThreadPoolExecutor executor);
}

It has only one method. When a new task cannot be accepted, because the queue is full and the pool has already reached its maximum number of threads, the task is rejected and the method in this interface is called.

You can implement this interface yourself to handle the tasks that exceed capacity.

ThreadPoolExecutor itself provides four rejection policies:

  1. CallerRunsPolicy
  2. AbortPolicy
  3. DiscardPolicy
  4. DiscardOldestPolicy

These four rejection policies are very simple once you look at their implementations.

AbortPolicy

The default rejection policy in ThreadPoolExecutor is AbortPolicy, which throws an exception directly.

private static final RejectedExecutionHandler defaultHandler =
    new AbortPolicy();

The following is its implementation:

public static class AbortPolicy implements RejectedExecutionHandler {
    public AbortPolicy() { }
    public void rejectedExecution(Runnable r, ThreadPoolExecutor e) {
        throw new RejectedExecutionException("Task " + r.toString() +
                                             " rejected from " +
                                             e.toString());
    }
}

Simple and blunt: it throws a RejectedExecutionException directly and does not execute the task.

Test

First, define a custom Runnable that gives each task a name. We will use this Runnable below.

static class MyThread implements Runnable {
        String name;
        public MyThread(String name) {
            this.name = name;
        }
        @Override
        public void run() {
            try {
                Thread.sleep(2000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            System.out.println("thread:"+Thread.currentThread().getName() +" implement:"+name +"  run");
        }
    }

Then we construct a thread pool with a core pool size of 1, a maximum pool size of 2, and a queue with capacity 2. The rejection policy is AbortPolicy.

ThreadPoolExecutor executor = new ThreadPoolExecutor(1, 2, 0, 
        TimeUnit.MICROSECONDS, 
        new LinkedBlockingDeque<Runnable>(2), 
        new ThreadPoolExecutor.AbortPolicy());
for (int i = 0; i < 6; i++) {
    System.out.println("Adding the _____________"+i+"Tasks");
    executor.execute(new MyThread("thread"+i));
    Iterator iterator = executor.getQueue().iterator();
    while (iterator.hasNext()){
        MyThread thread = (MyThread) iterator.next();
        System.out.println("List:"+thread.name);
    }
}

The output is:

Analysis of the process:

  1. When the first task is added, it is executed directly and the task queue is empty.
  2. When the second task is added, because the core thread is busy executing the first task, the second task is placed in the LinkedBlockingDeque; the queue now holds task 2.
  3. When the third task is added, it is also placed in the queue, which now holds tasks 2 and 3.
  4. When the fourth task is added, because the core thread is still running and the task queue is full, the pool directly creates a new thread to execute it. At this point there are two threads running in the pool, the maximum number of threads, and tasks 2 and 3 are in the queue.
  5. When the fifth task is added, there is nowhere left to queue or execute it, so the thread pool rejects it and calls AbortPolicy's rejectedExecution method, which throws an exception directly (see the sketch after this list for how to handle it).
  6. Ultimately, only four tasks can run; the later ones are rejected.
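
If the exception should not propagate out of the submitting code, you can catch it where the task is submitted. A minimal sketch based on the loop above (RejectedExecutionException comes from java.util.concurrent):

try {
    executor.execute(new MyThread("thread" + i));
} catch (RejectedExecutionException e) {
    // Handle the rejected task here, e.g. log it, queue it elsewhere, or retry later
    System.out.println("rejected: " + e.getMessage());
}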

CallerRunsPolicy

CallerRunsPolicy makes the thread that submitted the task execute the rejected task itself after the task is refused.

The following is its implementation:

public static class CallerRunsPolicy implements RejectedExecutionHandler {
    public CallerRunsPolicy() { }
    public void rejectedExecution(Runnable r, ThreadPoolExecutor e) {
        if (!e.isShutdown()) {
            r.run();
        }
    }
}

It's also simple: the rejected task is simply run in the calling thread.

Test

ThreadPoolExecutor executor = new ThreadPoolExecutor(1, 2, 30,
        TimeUnit.SECONDS,
        new LinkedBlockingDeque<Runnable>(2),
        new ThreadPoolExecutor.CallerRunsPolicy());

Running the same loop as above produces the following output:

Note that the fifth task, task 5, is also rejected by the thread pool, so the rejectedExecution method of CallerRunsPolicy is executed, which directly executes the run method of the task. So you can see that task 5 is executed in the main thread.

It can also be seen that because the fifth task runs in the main thread, the main thread is blocked. By the time the fifth task is finished and the sixth task is added, the first two tasks have finished and there are idle threads, so task 6 can be added to the thread pool and executed.

The disadvantage of this strategy is that it may block the main thread.

DiscardPolicy

This policy's handling is even simpler. Looking at the implementation, we can see that:

public static class DiscardPolicy implements RejectedExecutionHandler {
    public DiscardPolicy() { }
    public void rejectedExecution(Runnable r, ThreadPoolExecutor e) {
    }
}

It does nothing at all.

Therefore, with this rejection policy, tasks rejected by the thread pool are discarded directly: no exception is thrown and the task is never executed.

Test

ThreadPoolExecutor executor = new ThreadPoolExecutor(1, 2, 30,
        TimeUnit.SECONDS,
        new LinkedBlockingDeque<Runnable>(2),
        new ThreadPoolExecutor.DiscardPolicy());

Output:

As you can see, tasks 5 and 6, which were added later, are not executed at all; there is no response, and they are silently discarded.

DiscardOldestPolicy

The DiscardOldestPolicy strategy discards the oldest task in the queue (the one that joined the queue first) when a new task is rejected, and then tries to add the new task again.

public static class DiscardOldestPolicy implements RejectedExecutionHandler {
    public DiscardOldestPolicy() { }
    public void rejectedExecution(Runnable r, ThreadPoolExecutor e) {
        if (!e.isShutdown()) {
            e.getQueue().poll();
            e.execute(r);
        }
    }
}

In rejectedExecution, the task at the head of the queue is polled off to vacate a position, and then execute is called again to add the new task to the queue.

Test

ThreadPoolExecutor executor = new ThreadPoolExecutor(1, 2, 30,
        TimeUnit.SECONDS,
        new LinkedBlockingDeque<Runnable>(2),
        new ThreadPoolExecutor.DiscardOldestPolicy());

The output is:

As you can see,

  1. When the fifth task is added, it is rejected by the thread pool. At this point, tasks 2 and 3 are in the task queue.
  2. The rejection policy then pops the first task off the task queue, which is task 2.
  3. The rejected task 5 is then added to the task queue, so the queue now holds task 3 and task 5.
  4. When the sixth task is added, the same process repeats: task 3 in the queue is discarded and task 6 is added, leaving task 5 and task 6 in the queue.
  5. Therefore, only tasks 1, 4, 5 and 6 run; tasks 2 and 3 are abandoned and never executed.

Custom Rejection Policy

Looking at the four rejection policies provided by the JDK, we can see that implementing a rejection policy is very simple. The same is true of writing your own.

For example, if you want the rejected task to be executed in a new thread, you can write as follows:

static class MyRejectedExecutionHandler implements RejectedExecutionHandler {
    @Override
    public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
        new Thread(r,"New Threads"+new Random().nextInt(10)).start();
    }
}

Then use it normally:

ThreadPoolExecutor executor = new ThreadPoolExecutor(1, 2, 30,
        TimeUnit.SECONDS,
        new LinkedBlockingDeque<Runnable>(2),
        new MyRejectedExecutionHandler());

Output:

Tasks 5 and 6, which were rejected, are executed in new threads.

Posted in Software Engineering

10 Docker security best practices.

1. Prefer minimal base images

A common Docker container security issue is that you end up with big images for your Docker containers. Oftentimes, you might start projects with a generic Docker container image, such as writing a Dockerfile with FROM node as your "default". However, when specifying the node image, you should take into consideration that the fully installed Debian Stretch distribution is the underlying image that is used to build it. If your project doesn't require any general system libraries or system utilities, then it is better to avoid using a full-blown operating system (OS) as a base image.

Top ten most popular docker images each contain at least 30 vulnerabilities

As can be seen in the open source security report 2020, each of the top ten Docker images we inspected on Docker Hub contained known vulnerabilities, except for Ubuntu. By preferring minimal images that bundle only the necessary system tools and libraries required to run your project, you are also minimizing the attack surface for attackers and ensuring that you ship a secure OS.

2. Least privileged user

When a Dockerfile doesn’t specify a USER, it defaults to executing the container using the root user. In practice, there are very few reasons why the container should have root privileges and it could very well manifest as a docker security issue. Docker defaults to running containers using the root user. When that namespace is then mapped to the root user in the running container, it means that the container potentially has root access on the Docker host. Having an application on the container run with the root user further broadens the attack surface and enables an easy path to privilege escalation if the application itself is vulnerable to exploitation.

To minimize exposure, opt-in to create a dedicated user and a dedicated group in the Docker image for the application; use the USER directive in the Dockerfile to ensure the container runs the application with the least privileged access possible.

A specific user might not exist in the image; create that user using the instructions in the Dockerfile.
The following demonstrates a complete example of how to do this for a generic Ubuntu image:

FROM ubuntu
RUN mkdir /app
RUN groupadd -r lirantal && useradd -r -s /bin/false -g lirantal lirantal
WORKDIR /app
COPY . /app
RUN chown -R lirantal:lirantal /app
USER lirantal
CMD node index.js

The example above:

  • creates a system user (-r), with no password, no home directory set, and no shell
  • adds the user we created to an existing group that we created beforehand (using groupadd)
  • adds a final argument set to the user name we want to create, in association with the group we created

If you’re a fan of Node.js and alpine images, they already bundle a generic user for you called node. Here’s a Node.js example, making use of the generic node user:

FROM node:10-alpine 
RUN mkdir /app
COPY . /app
RUN chown -R node:node /app
USER node
CMD ["node", "index.js"]

If you’re developing Node.js applications, you may want to consult with the official Docker and Node.js Best Practices.


3. Sign and verify images to mitigate MITM attacks

Authenticity of Docker images is a challenge. We put a lot of trust into these images, as we are literally using them as the container that runs our code in production. Therefore, it is critical to make sure the image we pull is the one that was pushed by the publisher, and that no party has modified it. Tampering may occur over the wire, between the Docker client and the registry, or by compromising the registry owner's account in order to push a malicious image to it.

Verify docker images

Docker defaults allow pulling Docker images without validating their authenticity, thus potentially exposing you to arbitrary Docker images whose origin and author aren’t verified.

Make it a best practice that you always verify images before pulling them in, regardless of policy. To experiment with verification, temporarily enable Docker Content Trust with the following command:

export DOCKER_CONTENT_TRUST=1

Now attempt to pull an image that you know is not signed—the request is denied and the image is not pulled.

Sign docker images

Prefer Docker Certified images that come from trusted partners who have been vetted and curated by Docker Hub rather than images whose origin and authenticity you can’t validate.

Docker allows signing images, and by this, provides another layer of protection. To sign images, use Docker Notary. Notary verifies the image signature for you, and blocks you from running an image if the signature of the image is invalid.

When Docker Content Trust is enabled, as we exhibited above, a Docker image build signs the image. When the image is signed for the first time, Docker generates and saves a private key in ~/.docker/trust for your user. This private key is then used to sign any additional images as they are built.

For detailed instructions on setting up signed images, refer to Docker’s official documentation.

How is signing docker images with Docker’s Content Trust and Notary different from using GPG? Diogo Mónica has a great talk on this but essentially GPG helps you with verification, not with replay attacks.

4. Find, fix and monitor for open source vulnerabilities

When we choose a base image for our Docker container, we indirectly take upon ourselves the risk of all the container security concerns that the base image is bundled with. This can be poorly configured defaults that don’t contribute to the security of the operating system, as well as system libraries that are bundled with the base image we chose.

A good first step is to make use of as minimal a base image as is possible while still being able to run your application without issues. This helps reduce the attack surface by limiting exposure to vulnerabilities; on the other hand, it doesn’t run any audits on its own, nor does it protect you from future vulnerabilities that may be disclosed for the version of the base image that you are using.

Therefore, one way of protecting against vulnerabilities in open source security software is to use tools such as Snyk, to add continuous docker security scanning and monitoring of vulnerabilities that may exist across all of the Docker image layers that are in use.

Snyk Docker Scanning and Remediation from the CLI

Scan a Docker image for known vulnerabilities with these commands:

# fetch the image to be tested so it exists locally
$ docker pull node:10
# scan the image with snyk
$ snyk test --docker node:10 --file=path/to/Dockerfile

Monitor a Docker image for known vulnerabilities so that once newly discovered vulnerabilities are found in the image, Snyk can notify and provide remediation advice:

$ snyk monitor --docker node:10

5. Don’t leak sensitive information to Docker images

Sometimes, when building an application inside a Docker image, you need secrets such as an SSH private key to pull code from a private repository, or you need tokens to install private packages. If you copy them into the Docker intermediate container they are cached on the layer to which they were added, even if you delete them later on. These tokens and keys must be kept outside of the Dockerfile.

Using multi-stage builds

Another aspect of improving docker container security is through the use of multi-stage builds. By leveraging Docker support for multi-stage builds, fetch and manage secrets in an intermediate image layer that is later disposed of so that no sensitive data reaches the image build. Use code to add secrets to said intermediate layer, such as in the following example:

FROM ubuntu as intermediate

WORKDIR /app
COPY secret/key /tmp/
RUN scp -i /tmp/key build@acme:/files .

FROM ubuntu
WORKDIR /app
COPY --from=intermediate /app .

Using Docker secret commands

Use an alpha feature in Docker for managing secrets to mount sensitive files without caching them, similar to the following:

# syntax = docker/dockerfile:1.0-experimental
FROM alpine

# shows secret from default secret location
RUN --mount=type=secret,id=mysecret cat /run/secrets/mysecret

# shows secret from custom secret location
RUN --mount=type=secret,id=mysecret,dst=/foobar cat /foobar

Read more about Docker secrets on their site.

Beware of recursive copy

You should also be mindful when copying files into the image that is being built. For example, the following command copies the entire build context folder, recursively, to the Docker image, which could end up copying sensitive files as well:

COPY . .

If you have sensitive files in your folder, either remove them or use .dockerignore to ignore them:

private.key
appsettings.json

6. Use fixed tags for immutability

Each Docker image can have multiple tags, which are variants of the same images. The most common tag is latest, which represents the latest version of the image. Image tags are not immutable, and the author of the images can publish the same tag multiple times.

This means that the base image for your Docker file might change between builds. This could result in inconsistent behavior because of changes made to the base image. There are multiple ways to mitigate this issue and improve your Docker security posture:

  • Prefer the most specific tag available. If the image has multiple tags, such as :8 and :8.0.1 or even :8.0.1-alpine, prefer the latter, as it is the most specific image reference. Avoid using the most generic tags, such as latest. Keep in mind that when pinning a specific tag, it might be deleted eventually.
  • To mitigate the issue of a specific image tag becoming unavailable and becoming a show-stopper for teams that rely on it, consider running a local mirror of this image in a registry or account that is under your own control. It’s important to take into account the maintenance overhead required for this approach—because it means you need to maintain a registry. Replicating the image you want to use in a registry that you own is good practice to make sure that the image you use does not change.
  • Be very specific! Instead of pulling a tag, pull an image using the specific SHA256 reference of the Docker image, which guarantees you get the same image for every pull. However notice that using a SHA256 reference can be risky, if the image changes that hash might not exist anymore.

7. Use COPY instead of ADD

Docker provides two commands for copying files from the host to the Docker image when building it: COPY and ADD. The instructions are similar in nature, but differ in their functionality and can result in Docker container security issues for the image:

  • COPY — copies local files recursively, given explicit source and destination files or directories. With COPY, you must declare the locations.
  • ADD — copies local files recursively, implicitly creates the destination directory when it doesn’t exist, and accepts archives as local or remote URLs as its source, which it expands or downloads respectively into the destination directory.

While subtle, the differences between ADD and COPY are important. Be aware of these differences to avoid potential security issues:

  • When remote URLs are used to download data directly into a source location, they could result in man-in-the-middle attacks that modify the content of the file being downloaded. Moreover, the origin and authenticity of remote URLs need to be further validated. When using COPY the source for the files to be downloaded from remote URLs should be declared over a secure TLS connection and their origins need to be validated as well.
  • Space and image layer considerations: using COPY allows separating the addition of an archive from remote locations and unpacking it as different layers, which optimizes the image cache. If remote files are needed, combining all of them into one RUN command that downloads, extracts, and cleans-up afterwards optimizes a single layer operation over several layers that would be required if ADD were used.
  • When local archives are used, ADD automatically extracts them to the destination directory. While this may be acceptable, it adds the risk of zip bombs and Zip Slip vulnerabilities that could then be triggered automatically.

8. Use metadata labels

Image labels provide metadata for the image you're building. This helps users understand how to use the image easily. The most common label is "maintainer", which specifies the email address and the name of the person maintaining this image. Add metadata with the following LABEL command:

LABEL maintainer="me@acme.com"

In addition to a maintainer contact, add any metadata that is important to you. This metadata could contain: a commit hash, a link to the relevant build, quality status (did all tests pass?), source code, a reference to your SECURITY.TXT file location and so on.

It is good practice to adopt a SECURITY.TXT (RFC 5785) file that points to your responsible disclosure policy, and to reference it from your Docker labels, such as the following:

LABEL securitytxt="https://www.example.com/.well-known/security.txt"

See more information about labels for Docker images: https://label-schema.org/rc1/

9. Use multi-stage build for small and secure images

While building your application with a Dockerfile, many artifacts are created that are required only during build-time. These can be packages such as development tooling and libraries that are required for compiling, or dependencies that are required for running unit tests, temporary files, secrets, and so on.

Keeping these artifacts in the base image, which may be used for production, results in an increased Docker image size, and this can badly affect the time spent downloading it as well as increase the attack surface because more packages are installed as a result. The same is true for the Docker image you’re using—you might need a specific Docker image for building, but not for running the code of your application.

Golang is a great example. To build a Golang application, you need the Go compiler. The compiler produces a statically linked executable that runs without any dependencies, even on a scratch image.

This is a good reason why Docker has the multi-stage build capability. This feature allows you to use multiple temporary images in the build process, keeping only the latest image along with the information you copied into it. In this way, you have two images:

  • First image—a very big image size, bundled with many dependencies that are used in order to build your app and run tests.
  • Second image—a very thin image in terms of size and number of libraries, with only a copy of the artifacts required to run the app in production.

10. Use a linter

Adopt the use of a linter to avoid common mistakes and establish best practice guidelines that engineers can follow in an automated way. This is a helpful docker security scanning task to statically analyze Dockerfile security issues.

One such linter is hadolint. It parses a Dockerfile and shows a warning for any errors that do not match its best practice rules.

Dockerfile security linting: static code analysis by hadolint

Hadolint is even more powerful when it is used inside an integrated development environment (IDE). For example, when using hadolint as a VSCode extension, linting errors appear while typing. This helps in writing better Dockerfiles faster.

Be sure to print out the cheat sheet and pin it up somewhere to remind you of some of the docker image security best practices you should follow when building and working with docker images!

How do you harden a docker container image?

You may use linters such as hadolint or dockle to ensure the Dockerfile has secure configuration. Make sure you also scan your container images to avoid vulnerabilities with a severe security impact in your production containers

Posted in Software Engineering

Why Container Orchestrator

Although we can manually maintain a couple of containers or write scripts for dozens of containers, orchestrators make things much easier for operators, especially when it comes to managing hundreds or thousands of containers running on a global infrastructure. Most container orchestrators can:

  • Group hosts together while creating a cluster
  • Schedule containers to run on hosts in the cluster based on resources availability
  • Enable containers in a cluster to communicate with each other regardless of the host they are deployed to in the cluster
  • Bind containers and storage resources
  • Group sets of similar containers and bind them to load-balancing constructs to simplify access to containerized applications by creating a level of abstraction between the containers and the user
  • Manage and optimize resource usage
  • Allow for implementation of policies to secure access to applications running inside containers.

With all these configurable yet flexible features, container orchestrators are an obvious choice when it comes to managing containerised applications at scale. In this course, we will explore Kubernetes, one of the most in-demand container orchestration tools available today.

 

Most container orchestrators can be deployed on the infrastructure of our choice – on bare metal, Virtual Machines, on-premise, or the public cloud. Kubernetes, for example, can be deployed on a workstation, with or without a local hypervisor such as Oracle VirtualBox, inside a company’s data center, in the cloud on AWS Elastic Compute Cloud (EC2) instances, Google Compute Engine (GCE) VMs, DigitalOcean Droplets, OpenStack, etc.

There are turnkey solutions which allow Kubernetes clusters to be installed, with only a few commands, on top of cloud Infrastructures-as-a-Service, such as GCE, AWS EC2, Docker Enterprise, IBM Cloud, Rancher, VMware, Pivotal, and multi-cloud solutions through IBM Cloud Private and StackPointCloud.

Last but not least, there is the managed container orchestration as-a-Service, more specifically the managed Kubernetes as-a-Service solution, offered and hosted by the major cloud providers, such as Google Kubernetes Engine (GKE), Amazon Elastic Container Service for Kubernetes (Amazon EKS), Azure Kubernetes Service (AKS), IBM Cloud Kubernetes Service, DigitalOcean Kubernetes, Oracle Container Engine for Kubernetes, etc.

Posted in Software Engineering

Istio Ingress and DNS

Istio ingress provides external access to your mesh. You can follow the official documentation to find your $INGRESS_HOST:$INGRESS_PORT combination. It works perfectly, but what if the service client knows nothing about the mesh implementation? All they need is a unique hostname to reach a specific service.

For example, you have myapp service running in myns namespace. The goal is to expose this service outside the K8s cluster with its unique DNS name from your domain. Let’s assume that you are the owner of mycompany.com domain. In this case, the exposed myapp service URL would be:

https://myapp-myns.istio-gw.mycompany.com

Let’s follow the steps on how we can achieve this in AWS cloud by configuring:

  • Istio Gateway
  • AWS ELB/NLB
  • AWS Route 53
  • Istio VirtualService

1. Issue Certificates for Istio Ingress

You can follow this guide to issue certificates, or ask your security team to provide them.

We’re looking for a ‘*’ wildcard certificate in your domain to match all the service endpoints

In this demo, we’ll use *.istio-gw.mycompany.com certificate and the guide above:

./generate.sh *.istio-gw.mycompany.com <password>

2. Import certificate to Istio ingress trust store

We’ll continue with TLS but you can also use mTLS instead

$ kubectl create -n istio-system secret tls istio-ingressgateway-certs --key *.istio-gw.mycompany.com/3_application/private/\*.istio-gw.mycompany.com --cert *.istio-gw.mycompany.com/3_application/certs/\*.istio-gw.mycompany.com

3. Configure Istio Gateway Object

Create a Gateway object that expects connections to *.istio-gw.mycompany.com hosts

cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: my-istio-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway # use istio default ingress gateway
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      serverCertificate: /etc/istio/ingressgateway-certs/tls.crt
      privateKey: /etc/istio/ingressgateway-certs/tls.key
    hosts:
    - "*.istio-gateway.aeg.cloud"
EOF

4. Create AWS ELB with TCP listener or NLB

Create the AWS Load Balancer and configure a listener on port 443. Use the K8s worker nodes as target hosts and port 31390 (the default Istio ingress TLS NodePort).

5. Configure Route 53 with ELB/NLB record

Navigate to Route 53 page in AWS

NOTE: mycompany.com hosted zone must be added to AWS

Choose mycompany.com hosted zone and create a Record Set

Configure the Record Set with a wildcard name and add the ELB/NLB DNS name from step 4 as the Alias Target.

6. Configure Istio VirtualService Object for Myapp

In this step, we'll configure the mapping between the externally resolvable DNS name and the internal cluster service.

cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp
  namespace: myns
spec:
  hosts:
  - "myapp-myns.istio-gw.mycompany.com"
  gateways:
  - my-istio-gateway.istio-system.svc.cluster.local
  http:
  - route:
    - destination:
        port:
          number: 8443
        host: myapp.myns.svc.cluster.local
EOF

This VirtualService will listen for https://myapp-myns.istio-gw.mycompany.com:443 requests and convert them to https://myapp.myns.svc.cluster.local:8443 internal cluster calls

Testing

Use a REST client tool, such as curl, for testing:

curl https://myapp-myns.istio-gw.mycompany.com:443/service/hello -o /dev/null -s -w '%{http_code}\n'
200