Posted in Information Technology

AWS vs Azure vs Google Cloud

The competition for leadership in public cloud computing is a fierce three-way race: AWS vs. Azure vs. Google. For infrastructure as a service (IaaS) and platform as a service (PaaS), Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP) clearly hold a commanding position among the many cloud providers.

Service-to-Service Comparison

Enterprises typically look to CSPs for three levels of service: Infrastructure as a Service (IaaS, i.e., outsourcing of self-service compute-storage capacity); Platform as a Service (PaaS, i.e., complete environments for developing, deploying, and managing web apps); and secure, performant hosting of Software as a Service (SaaS) apps. Keeping these levels in mind, we have chosen to compare:

  1. Storage (IaaS)
  2. Compute (IaaS)
  3. Management Tools (IaaS, PaaS, SaaS)

Note: We won’t be comparing pricing, since it is quite difficult to achieve an apples-to-apples comparison without a very detailed use case. Once you have determined your organization’s CSP requirements, you can use the CSP price calculators (AWS, Azure, GCP) to check whether there are significant cost differences. We’ve also written more about AWS EBS pricing here.

Storage

The CSPs offer a wide range of object, block, and file storage services for both primary and secondary storage use cases. You will find that object storage is well suited to handling massive quantities of unstructured data (images, videos, and so on), while block storage provides better performance for structured transactional data. Storage tiers offer varying levels of accessibility and latency to cost-effectively meet the needs of both active (hot) and inactive (cold) data. In terms of differentiators, Azure takes the lead in managed DR and backup services. When it comes to managing hybrid architectures, AWS and Azure have built-in services, while GCP relies on partners.

Object storage
  • AWS: Amazon Simple Storage Service (Amazon S3), the very first AWS public service
  • Azure: Blob Storage
  • GCP: Google Cloud Storage

VM disk storage
  • AWS: Amazon Elastic Block Store (Amazon EBS)
  • Azure: Azure Managed Disks
  • GCP: Persistent Disk (both HDD and SSD)

File storage
  • AWS: Amazon Elastic File System (Amazon EFS)
  • Azure: Azure Files
  • GCP: Cloud Filestore

Disaster recovery
  • AWS: Provides a set of cloud-based disaster recovery services
  • Azure: Site Recovery (DRaaS)
  • GCP: Does not provide out-of-the-box DR or backup services

Backup
  • AWS: Amazon S3 is often used for secondary backup storage
  • Azure: Backup (built into the Azure platform)

Archive storage
  • AWS: S3 One Zone-Infrequent Access (introduced April 2018); Amazon Glacier, with data querying capabilities
  • Azure: Azure Long-Term Storage: Cool Blob Storage (slightly lower availability than Hot) and Archive Storage (offline blob storage)
  • GCP: Archival Cloud Storage: Nearline (low frequency) and Coldline (lowest frequency)

Bulk data transfer
  • AWS: AWS Import/Export Disk (shipping disk drives); AWS Snowball (device-based); AWS Snowmobile (exabyte-scale data transfer via ruggedized shipping container)
  • Azure: Azure Import/Export service (shipping disk drives); Azure Data Box Disk service (in preview)
  • GCP: Storage Transfer Service

Hybrid support
  • AWS: AWS Storage Gateway: provides a managed virtual tape infrastructure across hybrid environments
  • Azure: StorSimple: enterprise-grade hybrid cloud storage
  • GCP: Relies on partners such as Egnyte

Compute

The CSPs offer a range of predefined instance types that define, for each virtual server launched, the type of CPU (or GPU) processor, the number of vCPU or vGPU cores, RAM, and local temporary storage. The instance type determines compute and I/O speeds and other performance parameters, allowing you to optimize price/performance according to different workload requirements. It should be noted that GCP, in addition to its predefined VM types, also offers Custom Machine Types.

The CSPs also offer pay-as-you-go PaaS options that automatically handle the deployment, scaling, and balancing of web applications and services developed in leading frameworks such as Java, Node.js, PHP, Python, Ruby, and more. AWS offers auto scaling at no additional charge, based on scaling plans that you define for all the relevant resources used by the application. Azure offers auto scaling per app, or as part of platforms that manage groups of apps or groups of virtual machines. GCP offers auto scaling only within the context of its Managed Instance Groups platform. Both AWS and Azure offer services that let you create a virtual private server in a few clicks, but GCP does not yet offer this capability.
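To make the instance-type concept concrete, here is a minimal sketch using the AWS SDK for Java (v1) that launches a single EC2 virtual server with a predefined instance type. The AMI ID is a placeholder, and credentials and region are assumed to come from the SDK's default provider chain; the equivalent calls exist in the Azure and GCP SDKs as well.

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.InstanceType;
import com.amazonaws.services.ec2.model.RunInstancesRequest;
import com.amazonaws.services.ec2.model.RunInstancesResult;

public class LaunchInstanceSketch {
    public static void main(String[] args) {
        // Credentials and region are resolved from the SDK's default provider chain
        AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();

        // The instance type fixes the vCPUs, RAM, and network/I/O profile of the VM
        RunInstancesRequest request = new RunInstancesRequest()
                .withImageId("ami-12345678")            // placeholder AMI ID
                .withInstanceType(InstanceType.T2Micro) // predefined instance type
                .withMinCount(1)
                .withMaxCount(1);

        RunInstancesResult result = ec2.runInstances(request);
        System.out.println("Launched instance: "
                + result.getReservation().getInstances().get(0).getInstanceId());
    }
}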

Virtual servers
  • AWS: Amazon Elastic Compute Cloud (Amazon EC2)
  • Azure: Virtual Machines (Windows or Linux servers)
  • GCP: Compute Engine

PaaS
  • AWS: Elastic Beanstalk
  • Azure: Azure Cloud Services
  • GCP: Google App Engine

Scaling
  • AWS: AWS Auto Scaling
  • Azure: Azure Autoscale (per app or for a group of apps as part of an Azure App Service plan); Virtual Machine Scale Sets (for hyperscale, high-availability apps)
  • GCP: Through managed instance groups

Virtual private server support
  • AWS: Lightsail
  • Azure: Virtual machine (VM) image
  • GCP: N/A

Management Tools

As you may have already experienced, managing and orchestrating cloud resources across multiple business units and complex infrastructures can be a daunting challenge. All three CSPs offer platforms and services to streamline and provide visibility into the organization, configuration, provisioning, deployment, and monitoring of cloud resources. These offerings range from predefined deployment templates and catalogs of approved services to centralized access control. However, AWS and Azure seem to have invested more heavily in this area than GCP, and AWS even offers outsourced managed services (AWS Managed Services).

Server management services
  • AWS: AWS Systems Manager: visibility & automation across groups of resources
  • Azure: Azure Operational Insights: operational data analysis, SaaS
  • GCP: N/A

Cloud deployment templates
  • AWS: AWS CloudFormation: text files for modeling & provisioning cloud resources
  • Azure: Azure Resource Manager: deploy & control access to categorized resources; includes templates (Azure Building Blocks)
  • GCP: Resource Manager: group, organize, & control access to resources; track & manage projects. Cloud Deployment Manager: template-driven deployment

Logging & monitoring
  • AWS: Amazon CloudWatch: real-time visibility into apps & infrastructure. AWS CloudTrail: logging & monitoring of AWS accounts
  • Azure: Azure Monitor, including Log Analytics (data collection & proactive insights) and Application Insights (application performance management platform)
  • GCP: Google Stackdriver, including monitoring, logging, error reporting, tracing, & debugging

Server automation
  • AWS: AWS OpsWorks: managed instances of Chef & Puppet. AWS Service Catalog: catalog of IT services approved for AWS
  • Azure: Azure Resource Manager (see above); Azure Automation; VM extensions: post-deployment configuration & automation
  • GCP: N/A
Posted in Information Technology

Spring Cloud Config

 

1. Overview

Spring Cloud Config is Spring’s client/server approach for storing and serving distributed configurations across multiple applications and environments.

This configuration store is ideally versioned under Git version control and can be modified at application runtime. While it fits very well in Spring applications using all the supported configuration file formats together with constructs like Environment, PropertySource, or @Value, it can be used in any environment running any programming language.

In this write-up, we’ll focus on an example of how to set up a Git-backed config server, use it in a simple REST application server, and set up a secure environment including encrypted property values.

2. Project Setup and Dependencies

To get ready for writing some code, we first create two new Maven projects. The server project relies on the spring-cloud-config-server module, as well as the spring-boot-starter-security and spring-boot-starter-web starter bundles:

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-config-server</artifactId>
    <version>1.1.2.RELEASE</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-security</artifactId>
    <version>1.4.0.RELEASE</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <version>1.4.0.RELEASE</version>
</dependency>

For the client project, however, we only need the spring-cloud-starter-config and spring-boot-starter-web modules:

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-config</artifactId>
    <version>1.1.2.RELEASE</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <version>1.4.0.RELEASE</version>
</dependency>

3. A Config Server Implementation

The main part of the application is a config class – more specifically a @SpringBootApplication – which pulls in all the required setup through the auto-configuration annotation @EnableConfigServer:

@SpringBootApplication
@EnableConfigServer
public class ConfigServer {
    
    public static void main(String[] arguments) {
        SpringApplication.run(ConfigServer.class, arguments);
    }
}

Now we need to configure the server port on which our server listens and a Git URL that provides our version-controlled configuration content. The latter can be used with protocols like http, ssh, or a simple file on a local filesystem.

Tip: If you are planning to use multiple config server instances pointing to the same config repository, you can configure the server to clone your repo into a local temporary folder. But be aware that private repositories with two-factor authentication are difficult to handle; in such a case, it is easier to clone them on your local filesystem and work with the copy.

There are also some placeholder variables and search patterns available for configuring the repository URL, but this is beyond the scope of our article. If you are interested, the official documentation is a good place to start.

We also need to set a username and a password for Basic Authentication in our application.properties to avoid an auto-generated password on every application restart:

server.port=8888
spring.cloud.config.server.git.uri=ssh://localhost/config-repo
spring.cloud.config.server.git.clone-on-start=true
security.user.name=root
security.user.password=s3cr3t

4. A Git Repository as Configuration Storage

To complete our server, we have to initialize a Git repository under the configured URL, create some new properties files, and populate them with some values.

The name of the configuration file is composed like a normal Spring application.properties, but instead of the word ‘application’, a configured name (e.g. the value of the client’s property ‘spring.application.name’) is used, followed by a dash and the active profile. For example:

$> git init
$> echo 'user.role=Developer' > config-client-development.properties
$> echo 'user.role=User'      > config-client-production.properties
$> git add .
$> git commit -m 'Initial config-client properties'

Troubleshooting: If you run into SSH-related authentication issues, double-check ~/.ssh/known_hosts and ~/.ssh/authorized_keys on your SSH server!

5. Querying the Configuration

Now we’re able to start our server. The Git-backed configuration API provided by our server can be queried using the following paths:

/{application}/{profile}[/{label}]
/{application}-{profile}.yml
/{label}/{application}-{profile}.yml
/{application}-{profile}.properties
/{label}/{application}-{profile}.properties

Here the {label} placeholder refers to a Git branch, {application} to the client’s application name, and {profile} to the client’s current active application profile.

So we can retrieve the configuration for our planned config client, running under the development profile on the master branch, via:

$> curl http://root:s3cr3t@localhost:8888/config-client/development/master

6. The Client Implementation

Next, let’s take care of the client. This will be a very simple client application, consisting of a REST controller with one GET method.

The configuration needed to reach our server must be placed in a resource file named bootstrap.properties, because this file (as the name implies) is loaded very early, while the application starts:

@SpringBootApplication
@RestController
public class ConfigClient {
    
    @Value("${user.role}")
    private String role;
    public static void main(String[] args) {
        SpringApplication.run(ConfigClient.class, args);
    }
    @RequestMapping(
      value = "/whoami/{username}",
      method = RequestMethod.GET,
      produces = MediaType.TEXT_PLAIN_VALUE)
    public String whoami(@PathVariable("username") String username) {
        return String.format("Hello!
          You're %s and you'll become a(n) %s...\n", username, role);
    }
}

In addition to the application name, we also put the active profile and the connection details in our bootstrap.properties:

spring.application.name=config-client
spring.profiles.active=development
spring.cloud.config.uri=http://localhost:8888
spring.cloud.config.username=root
spring.cloud.config.password=s3cr3t

To test whether the configuration is properly received from our server and the role value is injected into our controller method, we simply curl it after booting the client:

$> curl http://localhost:8080/whoami/Mr_Pink

If the response is as follows, our Spring Cloud Config Server and its client are working fine for now:

Hello! You're Mr_Pink and you'll become a(n) Developer...

7. Encryption and Decryption

Requirement: To use cryptographically strong keys together with Spring’s encryption and decryption features, you need the ‘Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files’ installed in your JVM. These can be downloaded, for example, from Oracle. To install them, follow the instructions included in the download. Some Linux distributions also provide an installable package through their package managers.

Since the config server supports encryption and decryption of property values, you can use public repositories as storage for sensitive data like usernames and passwords. Encrypted values are prefixed with the string {cipher} and can be generated by a REST call to the path ‘/encrypt’, if the server is configured to use a symmetric key or a key pair.

An endpoint to decrypt is also available. Both endpoints accept a path containing placeholders for the name of the application and its current profile: ‘/*/{name}/{profile}’. This is especially useful for controlling cryptography per client. However, before they become useful, you have to configure a cryptographic key, which we will do in the next section.

Tip: If you use curl to call the en-/decryption API, it’s better to use the --data-urlencode option (instead of --data/-d), or to set the ‘Content-Type’ header explicitly to ‘text/plain’. This ensures correct handling of special characters like ‘+’ in the encrypted values.

If a value can’t be decrypted automatically while being fetched through the client, its key is replaced with the name itself, prefixed by the word ‘invalid’. This is meant to prevent, for example, the use of an encrypted value as a password.

Tip: When setting up a repository containing YAML files, you have to surround your encrypted and prefixed values with single quotes! With properties files, this is not the case.

7.1. Key Management

The config server supports, out of the box, the encryption of property values with either a symmetric or an asymmetric key.

To use symmetric cryptography, you simply have to set the property ‘encrypt.key’ in your application.properties to a secret of your choice. Alternatively, you can pass in the environment variable ENCRYPT_KEY.

For asymmetric cryptography, you can set ‘encrypt.key’ to a PEM-encoded string value or configure a keystore to use.

Because we want a highly secured environment for our demo server, we choose the latter option and generate a new keystore, including an RSA key pair, with the Java keytool first:

$> keytool -genkeypair -alias config-server-key \
       -keyalg RSA -keysize 4096 -sigalg SHA512withRSA \
       -dname 'CN=Config Server,OU=Spring Cloud,O=Baeldung' \
       -keypass my-k34-s3cr3t -keystore config-server.jks \
       -storepass my-s70r3-s3cr3t

After that, we add the created keystore to our server’s application.properties and restart the server:

encrypt.key-store.location=classpath:/config-server.jks
encrypt.key-store.password=my-s70r3-s3cr3t
encrypt.key-store.alias=config-server-key
encrypt.key-store.secret=my-k34-s3cr3t

As a next step, we can query the encryption endpoint and add the response as a value to a configuration property in our repository:

$> export PASSWORD=$(curl -X POST --data-urlencode d3v3L \
       http://root:s3cr3t@localhost:8888/encrypt)
$> echo "user.password=$PASSWORD" >> config-client-development.properties
$> git commit -am 'Added encrypted password'
$> curl -X POST http://root:s3cr3t@localhost:8888/refresh

To test whether our setup works correctly, we modify the ConfigClient class and restart our client:

@SpringBootApplication
@RestController
public class ConfigClient {
    ...
    
    @Value("${user.password}")
    private String password;
    ...
    public String whoami(@PathVariable("username") String username) {
        return String.format("Hello!
          You're %s and you'll become a(n) %s, " +
          "but only if your password is '%s'!\n",
          username, role, password);
    }
}

A final query against our client will show us whether our configuration value is decrypted correctly:

$> curl http://localhost:8080/whoami/Mr_Pink
Hello! You're Mr_Pink and you'll become a(n) Developer, \
  but only if your password is 'd3v3L'!

7.2. Using Multiple Keys

If you want to use multiple keys for encryption and decryption (for example, a dedicated one for each served application), you can add another prefix in the form {name:value} between the {cipher} prefix and the Base64-encoded property value.

The config server understands prefixes like {secret:my-crypto-secret} or {key:my-key-alias} nearly out of the box. The latter option requires a configured keystore in your application.properties. This keystore is searched for a matching key alias. For example:

user.password={cipher}{secret:my-499-s3cr3t}AgAMirj1DkQC0WjRv...
user.password={cipher}{key:config-client-key}AgAMirj1DkQC0WjRv...

For scenarios without a keystore, you have to implement a @Bean of type TextEncryptorLocator, which handles the lookup and returns a TextEncryptor object for each key.
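A minimal sketch of such a bean might look like the following. It assumes the locator's single locate(Map&lt;String, String&gt;) method and that the passed map contains a 'name' entry for the requesting application (the exact keys depend on your Spring Cloud Config version); the per-application secrets and the salt are made up for illustration and use Spring Security's Encryptors.text() helper.

import java.util.HashMap;
import java.util.Map;

import org.springframework.cloud.config.server.encryption.TextEncryptorLocator;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.crypto.encrypt.Encryptors;

@Configuration
public class CustomEncryptorConfiguration {

    @Bean
    public TextEncryptorLocator textEncryptorLocator() {
        // Hypothetical mapping from the application name to its own secret
        Map<String, String> secrets = new HashMap<>();
        secrets.put("config-client", "my-499-s3cr3t");

        return keys -> {
            String application = keys.getOrDefault("name", "application");
            String secret = secrets.getOrDefault(application, "fallback-s3cr3t");
            // Encryptors.text() expects a password and a hex-encoded salt
            return Encryptors.text(secret, "deadbeef");
        };
    }
}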

7.3. Serving Encrypted Properties

If you want to disable server-side cryptography and handle the decryption of property values locally, you can put the following in your server’s application.properties:

spring.cloud.config.server.encrypt.enabled=false

Furthermore, you can delete all the other ‘encrypt.*’ properties to disable the REST endpoints.

8. Conclusion

Now we are able to create a configuration server to provide a set of configuration files from a Git repository to client applications. There are a few other things you can do with such a server.

For example:

  • Serve configuration in YAML or properties format instead of JSON, also with placeholders resolved. This can be useful in non-Spring environments, where the configuration is not directly mapped to a PropertySource.
  • Serve plain text configuration files, in turn optionally with resolved placeholders. This can be useful, for example, to provide an environment-dependent logging configuration.
  • Embed the config server into an application, where it configures itself from a Git repository, instead of running as a standalone application serving clients. To do this, some bootstrap properties must be set and/or the @EnableConfigServer annotation must be removed, depending on the use case.
  • Register the config server with Spring Cloud Netflix Eureka service discovery and enable automatic config server discovery in clients. This becomes important if the server has no fixed location or may move.
Posted in Information Technology

Architectural Pattern

What is an Architectural Pattern?

According to Wikipedia,

An architectural pattern is a general, reusable solution to a commonly occurring problem in software architecture within a given context. Architectural patterns are similar to software design patterns but have a broader scope.

In this article, I will be briefly explaining the following 10 common architectural patterns with their usage, pros and cons.

  1. Layered pattern
  2. Client-server pattern
  3. Master-slave pattern
  4. Pipe-filter pattern
  5. Broker pattern
  6. Peer-to-peer pattern
  7. Event-bus pattern
  8. Model-view-controller pattern
  9. Blackboard pattern
  10. Interpreter pattern

1. Layered pattern

This pattern can be used to structure programs that can be decomposed into groups of subtasks, each of which is at a particular level of abstraction. Each layer provides services to the next higher layer.

The most commonly found 4 layers of a general information system are as follows.

  • Presentation layer (also known as UI layer)
  • Application layer (also known as service layer)
  • Business logic layer (also known as domain layer)
  • Data access layer (also known as persistence layer)

Usage

  • General desktop applications.
  • E-commerce web applications.
Layered pattern

2. Client-server pattern

This pattern consists of two parties; a server and multiple clients. The server component will provide services to multiple client components. Clients request services from the server and the server provides relevant services to those clients. Furthermore, the server continues to listen to client requests.
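As an illustration independent of any particular technology, here is a tiny Java sketch in which a server thread listens on a socket and answers requests from a client; the port number and the "echo" service are arbitrary choices made for this example.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class ClientServerSketch {
    public static void main(String[] args) throws Exception {
        // The server component: listens on a port and provides a service to each client
        Thread server = new Thread(() -> {
            try (ServerSocket listener = new ServerSocket(9090)) {
                while (true) {
                    try (Socket client = listener.accept();
                         BufferedReader in = new BufferedReader(
                                 new InputStreamReader(client.getInputStream()));
                         PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                        out.println("Echo: " + in.readLine()); // the "service"
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        server.setDaemon(true);
        server.start();
        Thread.sleep(200); // give the server a moment to start listening

        // A client requests the service from the server
        try (Socket socket = new Socket("localhost", 9090);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {
            out.println("hello server");
            System.out.println(in.readLine()); // prints "Echo: hello server"
        }
    }
}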

Usage

  • Online applications such as email, document sharing and banking.
Client-server pattern

3. Master-slave pattern

This pattern consists of two parties; master and slaves. The master component distributes the work among identical slave components, and computes a final result from the results which the slaves return.
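The following self-contained Java sketch illustrates the idea outside any specific framework: a "master" splits an array among identical worker tasks and combines their partial sums. The thread-pool size and chunking are arbitrary choices for the example.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MasterSlaveSketch {
    public static void main(String[] args) throws Exception {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;

        // The "master": splits the array into chunks and hands them to identical workers
        ExecutorService workers = Executors.newFixedThreadPool(4);
        List<Callable<Long>> tasks = new ArrayList<>();
        int chunk = data.length / 4;
        for (int w = 0; w < 4; w++) {
            final int from = w * chunk;
            final int to = (w == 3) ? data.length : from + chunk;
            tasks.add(() -> {
                long sum = 0;                       // each worker computes a partial sum
                for (int i = from; i < to; i++) sum += data[i];
                return sum;
            });
        }

        long total = 0;
        for (Future<Long> partial : workers.invokeAll(tasks)) {
            total += partial.get();                 // the master combines the partial results
        }
        workers.shutdown();
        System.out.println("Total: " + total);      // prints 500500
    }
}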

Usage

  • In database replication, the master database is regarded as the authoritative source, and the slave databases are synchronized to it.
  • Peripherals connected to a bus in a computer system (master and slave drives).
Master-slave pattern

4. Pipe-filter pattern

This pattern can be used to structure systems which produce and process a stream of data. Each processing step is enclosed within a filter component. Data to be processed is passed through pipes. These pipes can be used for buffering or for synchronization purposes.
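A minimal, framework-free Java sketch of the idea: each filter below is a small function, and the "pipe" is simply the hand-off of one filter's output to the next. The filters themselves (trim, drop empty lines, upper-case) are invented for illustration.

import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class PipeFilterSketch {
    public static void main(String[] args) {
        // Three filters: trim whitespace, drop empty lines, convert to upper case
        Function<List<String>, List<String>> trim =
                lines -> lines.stream().map(String::trim).collect(Collectors.toList());
        Function<List<String>, List<String>> dropEmpty =
                lines -> lines.stream().filter(l -> !l.isEmpty()).collect(Collectors.toList());
        Function<List<String>, List<String>> upperCase =
                lines -> lines.stream().map(String::toUpperCase).collect(Collectors.toList());

        // The pipeline: the output of each filter is piped into the next one
        Function<List<String>, List<String>> pipeline = trim.andThen(dropEmpty).andThen(upperCase);

        List<String> result = pipeline.apply(List.of("  hello ", "", " pipes and filters  "));
        System.out.println(result); // [HELLO, PIPES AND FILTERS]
    }
}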

Usage

  • Compilers. The consecutive filters perform lexical analysis, parsing, semantic analysis, and code generation.
  • Workflows in bioinformatics.
Pipe-filter pattern

5. Broker pattern

This pattern is used to structure distributed systems with decoupled components. These components can interact with each other by remote service invocations. A broker component is responsible for the coordination of communication among components.

Servers publish their capabilities (services and characteristics) to a broker. Clients request a service from the broker, and the broker then redirects the client to a suitable service from its registry.

Usage

  • Message broker software such as Apache ActiveMQ, Apache Kafka, RabbitMQ, and JBoss Messaging.
Broker pattern

6. Peer-to-peer pattern

In this pattern, individual components are known as peers. Peers may function both as a client, requesting services from other peers, and as a server, providing services to other peers. A peer may act as a client or as a server or as both, and it can change its role dynamically with time.

Usage

  • File-sharing networks such as Gnutella and G2.
  • Multimedia protocols such as P2PTV and PDTP.
Peer-to-peer pattern

7. Event-bus pattern

This pattern primarily deals with events and has four major components: event source, event listener, channel, and event bus. Sources publish messages to particular channels on an event bus. Listeners subscribe to particular channels. Listeners are notified of messages that are published to a channel to which they have subscribed before.
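A toy Java sketch of the idea (not a production event bus): channels are mapped to their subscribed listeners, and publishing to a channel notifies only those listeners. The channel names and listeners are made up for the example.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class EventBusSketch {
    // A very small event bus: channels mapped to their subscribed listeners
    static class EventBus {
        private final Map<String, List<Consumer<String>>> channels = new HashMap<>();

        void subscribe(String channel, Consumer<String> listener) {
            channels.computeIfAbsent(channel, c -> new ArrayList<>()).add(listener);
        }

        void publish(String channel, String message) {
            channels.getOrDefault(channel, List.of())
                    .forEach(listener -> listener.accept(message)); // notify subscribers only
        }
    }

    public static void main(String[] args) {
        EventBus bus = new EventBus();
        bus.subscribe("orders", msg -> System.out.println("Billing saw: " + msg));
        bus.subscribe("orders", msg -> System.out.println("Shipping saw: " + msg));

        bus.publish("orders", "order #42 created");   // both listeners are notified
        bus.publish("payments", "payment received");  // no subscribers, nothing happens
    }
}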

Usage

  • Android development
  • Notification services
Event-bus pattern

8. Model-view-controller pattern

This pattern, also known as the MVC pattern, divides an interactive application into three parts:

  1. model — contains the core functionality and data
  2. view — displays the information to the user (more than one view may be defined)
  3. controller — handles the input from the user

This is done to separate internal representations of information from the ways information is presented to, and accepted from, the user. It decouples components and allows efficient code reuse.

Usage

  • Architecture for World Wide Web applications in major programming languages.
  • Web frameworks such as Django and Rails.
Model-view-controller pattern

9. Blackboard pattern

This pattern is useful for problems for which no deterministic solution strategies are known. The blackboard pattern consists of 3 main components.

  • blackboard — a structured global memory containing objects from the solution space
  • knowledge source — specialized modules with their own representation
  • control component — selects, configures and executes modules.

All the components have access to the blackboard. Components may produce new data objects that are added to the blackboard. Components look for particular kinds of data on the blackboard, and may find these by pattern matching with the existing knowledge source.

Usage

  • Speech recognition
  • Vehicle identification and tracking
  • Protein structure identification
  • Sonar signals interpretation.
Blackboard pattern

10. Interpreter pattern

This pattern is used for designing a component that interprets programs written in a dedicated language. It mainly specifies how to evaluate lines of programs, known as sentences or expressions written in a particular language. The basic idea is to have a class for each symbol of the language.
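As a minimal Java (17+) sketch of the idea, the tiny "language" below has one class per symbol (numbers, plus, minus) and interprets postfix sentences; the grammar and the parser are invented purely for illustration.

import java.util.ArrayDeque;
import java.util.Deque;

public class InterpreterSketch {

    // One class per symbol of the (tiny) language: numbers, plus, and minus
    interface Expression { int interpret(); }

    record Num(int value) implements Expression {
        public int interpret() { return value; }
    }
    record Add(Expression left, Expression right) implements Expression {
        public int interpret() { return left.interpret() + right.interpret(); }
    }
    record Subtract(Expression left, Expression right) implements Expression {
        public int interpret() { return left.interpret() - right.interpret(); }
    }

    // Parses a postfix sentence such as "7 3 - 4 +" into an expression tree
    static Expression parse(String sentence) {
        Deque<Expression> stack = new ArrayDeque<>();
        for (String token : sentence.trim().split("\\s+")) {
            switch (token) {
                case "+" -> { Expression r = stack.pop(); Expression l = stack.pop(); stack.push(new Add(l, r)); }
                case "-" -> { Expression r = stack.pop(); Expression l = stack.pop(); stack.push(new Subtract(l, r)); }
                default  -> stack.push(new Num(Integer.parseInt(token)));
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        System.out.println(parse("7 3 - 4 +").interpret()); // (7 - 3) + 4 = 8
    }
}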

Usage

  • Database query languages such as SQL.
  • Languages used to describe communication protocols.
Interpreter pattern

Comparison of Architectural Patterns

The table given below summarizes the pros and cons of each architectural pattern.

Comparison of Architectural Patterns
Posted in Information Technology

Code Monkey

"Code Monkey" 'code mon.key' (/koʊd/ /ˈmʌŋki/)

A "Code Monkey" is a derogatory term used to describe a programmer that:
  • Preforms programming tasks that are considered extremely simple or of no real challenge.
  • Not really allowed to solve problems, or take part in design of the application.
Now "real" programmers sometimes also have preform these types of coding from time to time.

However, the main difference is that a "code monkey" doesn't have a choice about doing anything else.

A "Code Monkey" could be used to either imply a programmer's position OR ability

 

To me, at least, the distinction is that a code monkey merely produces code without really thinking about it, whereas a “proper” programmer is a professional. They use engineering techniques to produce higher-quality code, have an awareness of the system as a whole, do better planning, and carry out more thorough design.

For example, some features of a “proper” programmer (although be aware of cargo cultism) might be:

  • A programmer is involved, to a certain extent, with the entire software development lifecycle, not just coding. Code monkeys may be coding up designs or to requirements that were dumped on them, rather than created in consultation with them.
  • Programmers create extensive designs (including tests) before writing any code. They are fairly certain that the design is good (fast, efficient etc.) before they start writing it. Code monkeys jump straight in. They don’t know if the design is good until they run it.
  • Programmers take responsibility for planning their own work. Code monkeys just do what their manager tells them, when they’re told to.
  • Programmers are valued as an individual for their creativity and skills. Code monkeys are seen as interchangeable black boxes that output code.
  • Programmers are adaptable; they can apply their skills to numerous areas, languages etc. Code monkeys over-specialise, and get lost if they have to work with a new framework.
  • Programmers always look to develop themselves as a professional. Code monkeys stay where they are in terms of skills and experience.

I’ve used two points at opposite ends of a spectrum here – I suspect most jobs will lie somewhere in between. In addition, it’s unlikely that an entire career will stay at the same place – a good company will strive to move its employees towards the programmer end of the scale through training and professional development. It may be worth taking a junior programmer job at the code monkey end if the employer has a graduate scheme or similar that will result in “proper” programmer status eventually.

Posted in Information Technology

AWS S3

The public cloud and Amazon Web Services in particular have seen massive growth over the last few years.  In April 2015, Amazon broke out the revenue figures of AWS for the first time, showing that the subsidiary was a $7.3 billion business with over 1 million active customers, accounting for 8% of Amazon’s total revenue.   At the heart of AWS is S3, the Simple Storage Service, an online object store that is now ten years old and stores trillions of objects (latest figures published in 2013 showed 2 trillion objects, with the amount stored doubling each year).

S3 has been remarkably successful and is the foundation for many well-known services such as Dropbox and Pinterest.  Part of the reason for this success has been the flexibility of object stores compared to standard block and file protocols.  From a user perspective, these protocols give little in the way of controlling how the data is stored and managed (they only support basic I/O commands like read, write, open, and close).

S3, on the other hand, is all about object-level management and manipulation.  With S3 you can describe how you want to store objects, encrypt them, present them (even as a website), and much more.  Each object is validated during I/O operations, unlike a file system (NFS/SMB), which does data integrity checking only at the entire file system level.

In addition to the management capabilities is the relative ease with which data can be stored in the system.  The underlying storage infrastructure isn’t exposed to the customer.  Instead, access is provided through a set of programming interfaces, commonly called the S3 API.  It’s through the combination of features, simplicity, and ubiquity of this API that S3 has been so successful.

S3 Described

The S3 API is an application programming interface that provides the capability to store, retrieve, list and delete objects (or binary files) in S3.  When first released in 2006, the S3 API supported REST, SOAP and BitTorrent protocols as well as development through an SDK for common programming languages such as Java, .NET, PHP, and Ruby.  Storing and retrieving data is remarkably simple; objects are grouped into logical containers called buckets and accessed through a flat hierarchy that simply references the object name, bucket name and the AWS region holding the data.  When using the REST protocol, these pieces combine into a URL that provides a unique reference for the object.  Actions on the object are executed with simple PUT and GET commands that encapsulate the data and response into the HTTP header and body.
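For example, using the AWS SDK for Java (v1), storing and retrieving an object takes just a few calls. This is a minimal sketch: the bucket name, key, file, and region below are placeholders, and credentials are assumed to come from the SDK's default provider chain.

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3Object;

import java.io.File;

public class S3PutGetSketch {
    public static void main(String[] args) {
        // Credentials come from the default provider chain (env vars, ~/.aws/credentials, IAM role)
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withRegion("eu-west-1")             // the AWS region holding the bucket
                .build();

        String bucket = "my-example-bucket";         // placeholder bucket name
        String key = "reports/2016-04.csv";          // the object name within the bucket

        s3.putObject(bucket, key, new File("2016-04.csv"));  // PUT: store the object
        S3Object object = s3.getObject(bucket, key);         // GET: retrieve it again
        System.out.println("Content-Type: " + object.getObjectMetadata().getContentType());
    }
}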

S3 features are reflected in the API and have matured over time to include:

  • Metadata – this includes system metadata and additional information created by the user when the object is stored.
  • Multi-tenancy – S3 is divided into many customers, each of which sees an isolated, secure view of their data.
  • Security & Policy – access is controlled at the account, bucket and object level.
  • Lifecycle Management – objects can be both versioned and managed across multiple tiers of storage over the object lifetime.
  • Atomic Updates – objects are uploaded, updated or copied in a single transaction/instruction.
  • Search – accounts and buckets can be searched with object-level granularity.
  • Logging – all transactions can be logged within S3 itself.
  • Notifications – changes to data in S3 can be used to generate alerts.
  • Replication – data can be replicated between AWS locations.
  • Encryption – data is encrypted in flight and can be optionally encrypted at rest using either system or user generated keys.
  • Billing – service charges are based on capacity of data stored and data accessed.

Due to its longevity in the market and the maturity of its features, the S3 API has become the ‘de facto’ standard for object-based storage interfaces.  In addition to their own proprietary APIs, pretty much every object storage vendor in the marketplace supports S3 in some form or other.  Having support for S3 provides a number of benefits:

  • Standardisation – users/customers that have already written for S3 can use an on-premises object store simply by changing the object location in the URL (assuming security configurations are consistent). All of their existing code should work with little or no modification, as sketched after this list.
  • Maturity – S3 offers a wealth of features (as already discussed) that cover pretty much every feature needed in an object store. Obviously there are some gaps (including object locking, full consistency and bucket nesting), which could be implemented as a superset by object storage vendors.
  • Knowledge – end users who are looking to deploy object stores don’t have to go to the market and acquire specific platform skills. Instead they can use resources that are already familiar with S3, whether they are individuals or companies.
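As a small illustration of the standardisation point, here is a minimal sketch using the AWS SDK for Java (v1): the same PUT/GET calls can be pointed at an S3-compatible on-premises object store just by changing the client's endpoint configuration. The endpoint URL, signing region, and bucket name are placeholders.

import com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class S3CompatibleEndpointSketch {
    public static void main(String[] args) {
        // Point the standard S3 client at an S3-compatible, on-premises object store
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withEndpointConfiguration(
                        new EndpointConfiguration("https://objectstore.example.internal", "us-east-1"))
                .withPathStyleAccessEnabled(true)   // many on-premises stores expect path-style URLs
                .build();

        // The application code itself is unchanged from the AWS case
        s3.putObject("my-example-bucket", "hello.txt", "Hello, object storage!");
        System.out.println(s3.getObjectAsString("my-example-bucket", "hello.txt"));
    }
}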

S3 Compatibility

The current S3 API Developer Guide runs to 625 pages and is updated monthly, so vendors’ claims of compatibility could mean many things.  Both Eucalyptus and SwiftStack claim S3 API support; however, looking at the specific feature support, we see many gaps, especially around bucket-related features and object-based ACLs (rather an important security requirement).  When establishing security credentials, AWS currently uses two versions of request signing (v2 and v4), each of which provides slightly different functionality (such as being able to verify the identity of the requestor).  We will go into the specifics of support in future posts.

As well as features/functionality, there are questions of compatibility in terms of performance and the way in which the S3 interface is implemented.  Some vendors will translate S3 API calls into their own native API, rather than processing them directly.  This can lead to performance issues where on-premises object stores don’t behave and respond with the same error codes or response levels expected when using S3 directly.

Summary

The S3 API is the standard way in which data is stored and retrieved by object stores.  Mature S3 support provides end users with significant benefits around compatibility and simplicity.  In this series of posts we will dig deeper into the S3 API, including a look at the security and policy features, some of the advanced functionality and how S3 is supported across the wider industry.

Posted in Information Technology

AMI or Docker

https://medium.com/aws-activate-startup-blog/a-better-dev-test-experience-docker-and-aws-291da5ab1238

You’ve been tasked with creating the REST API for a mobile app for tracking health and fitness. You code the first endpoint using the development environment on your laptop. After running all the unit tests and seeing that they passed, you check your code into the Git repository and let the QA engineer know that the build is ready for testing. The QA engineer dutifully deploys the most recent build to the test environment, and within the first few minutes of testing discovers that your recently developed REST endpoint is broken.

How can this be? You’ve got thorough code coverage with your unit tests, and all were passing before the handoff to QA. After spending a few hours troubleshooting alongside the QA engineer, you discover that the test environment is using an outdated version of a third-party library, and this is what is causing your REST endpoints to break.

This is an all too common problem in software development. Slight differences between development, test, stage, and production environments can wreak mayhem on an application. Traditional approaches for dealing with this problem, such as change management processes, are too cumbersome for today’s rapid build and deploy cycles. What is needed instead is a way to transfer an environment seamlessly from development to test, eliminating the need for manual and error prone resource provisioning and configuration.

AWS has long offered services that address the need to reliably and efficiently automate the creation of an environment. Services like Amazon EC2 and AWS CloudFormation allow infrastructure to be managed as code. Through the CloudFormation service, AWS resources can be provisioned declaratively using JSON. CloudFormation templates can be versioned right alongside the application code itself. Combined with the automation capabilities of EC2, this allows for a complex environment to be spun up and torn down quickly and reliably. These are just some of the reasons why AWS is such a good choice for development and test workloads.

Container technology, like that being developed by Docker, takes the concept of declarative resource provisioning a step further. Similar to the way CloudFormation can provision an EC2 instance, Docker provides a declarative syntax for creating containers. But Docker containers don’t depend on any specific virtualization platform, nor do they require a separate operating system. A container simply requires a Linux kernel in order to run. This means they can run almost anywhere — be it on a laptop or EC2 instance.

The Docker container architecture is as follows:

Docker containers use an execution environment called libcontainer, which is an interface to various Linux kernel isolation features, like namespaces and cgroups. This architecture allows for multiple containers to be run in complete isolation from one another while sharing the same Linux kernel. Because a Docker container instance doesn’t require a dedicated OS, it is much more portable and lightweight than a virtual machine.

The Docker platform architecture consists of the following:

A Docker client doesn’t communicate directly with the running containers. Instead, it communicates with the Docker daemon via TCP sockets or REST. The daemon communicates directly with the containers running on the host. The Docker client can either be installed local to the daemon, or on a different host altogether.

There are three key concepts to understand when working with Docker: images, registries, and containers.

An image is the build component of a container. It is a read-only template from which one or more container instances can be launched. Conceptually, it’s similar to an AMI.

Registries are used to store images. Registries can be local or remote. When we launch a container, Docker first searches the local registry for the image. If it’s not found locally, then it searches a public remote registry, called DockerHub. If the image is there, Docker downloads it to the local registry and uses it to launch the container. DockerHub is similar to Github, in that we can create both public and private image repositories. This makes it easy to distribute images efficiently and securely.

Finally, a container is a running instance of an image. Docker uses containers to execute and run the software contained in the image.

You can create Docker images from a running container, similar to the way we create an AMI from an EC2 instance. For example, one could launch a container, install a bunch of software packages using a package manager like APT or yum, and then commit those changes to a new Docker image.

But a more powerful and flexible way of creating images is through something called a DockerFile, which allows images to be defined declaratively. The DockerFile syntax consists of a set of commands that we can use to install and configure the various components that comprise the image. Writing a DockerFile is not at all unlike using UserData to configure an EC2 instance after launch. Like a CloudFormation template, a DockerFile can be tracked and distributed using a version control system. You can think of a DockerFile as the build file for an image.

So how could Docker help with our mobile health & fitness application example? The application architecture consists of the following components:

First, let’s define a Docker image for launching a container for running the REST endpoint. We can use this to test our code on a laptop, and the QA engineer can use this to test the code in EC2. The REST endpoints are going to be developed using Ruby and the Sinatra framework, so these will need to be installed in the image. The back end will use Amazon DynamoDB. To ensure that the application can be run from both inside and outside AWS, the Docker image will include the DynamoDB local database. Here’s what the DockerFile looks like:

FROM ubuntu:14.04
MAINTAINER Nate Slater <slatern@amazon.com>
RUN apt-get update && apt-get install -y curl wget default-jre git
RUN adduser --home /home/sinatra --disabled-password --gecos '' sinatra
RUN adduser sinatra sudo
RUN echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
USER sinatra
RUN curl -sSL https://get.rvm.io | bash -s stable
RUN /bin/bash -l -c "source /home/sinatra/.rvm/scripts/rvm"
RUN /bin/bash -l -c "rvm install 2.1.2"
RUN /bin/bash -l -c "gem install sinatra"
RUN /bin/bash -l -c "gem install thin"
RUN /bin/bash -l -c "gem install aws-sdk"
RUN wget -O /home/sinatra/dynamodb_local.tar.gz https://s3-us-west-2.amazonaws.com/dynamodb-local/dynamodb_local_2013-12-12.tar.gz
RUN tar -C /home/sinatra -xvzf /home/sinatra/dynamodb_local.tar.gz

The contents of the DockerFile should be pretty self-explanatory. The RUN keyword is used to execute commands. By default, commands execute as the root user. Since we’re using RVM to install Ruby, we switch to the Sinatra user with the USER keyword, so that the Ruby distribution is installed under the user’s home directory. From the point at which the USER command is specified, all subsequent RUN commands will be executed as the Sinatra user. This also means that when the container is launched, it will execute commands as the Sinatra user.

The Docker daemon is responsible for managing images and running containers, and the Docker client is used to issue commands to the daemon. So to build our image from the above DockerFile, we’ll execute this client command:

$ docker build --tag="aws_activate/sinatra:v1" .

Full documentation of the Docker client commands can be found on the docker.io website. But let’s take a closer look at the command used to build our image. The tag option sets an identifier on the image. The typical value for the tag option is owner/repository:version. This makes it easy to identify what an image contains and who owns it when viewing the images in a registry.

After we execute the build command, we should have an image configured using the declarations in the DockerFile:

$ docker images
REPOSITORY             TAG     IMAGE ID       CREATED        VIRTUAL SIZE
aws_activate/sinatra   v1      84b6d4a5a22b   36 hours ago   942.2 MB
ubuntu                 14.04   96864a7d2df3   6 days ago     205.1 MB

Sure enough, we can see that Docker created our image, assigned it the tag we specified on the command line, as well as a unique image ID. Now let’s launch a container from this newly created image:

$ docker run -it aws_activate/sinatra:v1 /bin/bash

This command launches the container and drops us into a bash shell. From here, we can interact with the container just as we would a Linux server. Since we’re developing a web application, we’ll clone our latest version into the container from the Git repository, run our unit tests, and get ready to hand it off to QA. Once the code has been cloned into the container and is ready for testing, we’ll commit our changes in the running container to a new image. To do this, we need to determine the container ID:

$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b9d03d60ba89 aws_activate/sinatra:v1 "/bin/bash" 11 minutes ago Up 11 minutes nostalgic_davinci

Next, we run the commit command:

$ docker commit -m "ready for testing" b9d03d60ba89 aws_activate/sinatra:v1.1

Now we have a new image in our local registry:

$ docker images
REPOSITORY             TAG     IMAGE ID       CREATED        VIRTUAL SIZE
aws_activate/sinatra   v1.1    40355be9eb8f   21 hours ago   947.5 MB
aws_activate/sinatra   v1      84b6d4a5a22b   3 days ago     942.2 MB
ubuntu                 14.04   96864a7d2df3   8 days ago     205.1 MB

Version 1.1 of our image includes the Sinatra application that will serve up our REST endpoint. We can run this web application as follows:

$ docker run -d -w /home/sinatra -p 10001:4567 aws_activate/sinatra:v1.1 ./run_app.sh

This tells Docker to do the following:

  1. Create a container from image aws_activate/sinatra:v1.1
  2. Run the container in detached mode (-d)
  3. Set the working directory to be /home/sinatra (-w)
  4. Map the container port of 4567 to the host port of 10001 (-p)
  5. Execute a shell script called run_app.sh in the container

The shell script starts up the local DynamoDB database in the container and launches the Sinatra application using the Thin webserver on port 4567. Now if we point our browser on the laptop running this Docker container to http://localhost:10001/activity/1, we should see the following:

{"activity_id":"1",
"user_id":" db430d35-92a0-49d6-ba79-0f37ea1b35f7",
"type":"meal",
"calories":100,
"date":"2014-09-26 15:33:58 +0000"}

Our endpoint seems to be working properly — the activity record was pulled from the local DynamoDB database and returned as JSON from the Sinatra application code.

To make this container available to the QA engineer for further testing, we can push it into DockerHub, the public registry. Similar to Github, DockerHub offers both public and private options if we don’t want to make this container available to the general public.

The QA engineer will be running this container in EC2, which means we’ll need an EC2 instance configured with the Docker daemon and client software. Assuming we’re going to bring up the EC2 instance and the DynamoDB table using CloudFormation, we can bootstrap in the Docker software installation using the UserData property of the CloudFormation AWS::EC2::Instance type. Here’s what the JSON for provisioning the EC2 instance in CloudFormation looks like:

"DockerInstance": {
     "Type": "AWS::EC2::Instance",
     "Properties": {
          "InstanceType": "t2.micro",
          "ImageId": {"Fn::FindInMap" : ["RegionMap",{"Ref" :
          "AWS::Region"}, "64"]},
 "KeyName": {"Ref": "KeyName"},
 "SubnetId": {"Ref": "SubnetId"},
 "SecurityGroupIds": [{"Ref": "SecurityGroupId"}], 
 "Tags": [{"Key": "Name", "Value": "DockerHost"}],
 "UserData": {"Fn::Base64":
"#include https://get.docker.io"}
 }}

Now when the QA engineer logs into the EC2 instance created by the CloudFormation stack, the image can be pulled from the remote DockerHub registry:

$ docker pull aws_activate/sinatra:v1.1

The command used to start the container from this image is virtually identical to the one shown above. The one difference is that an environment variable will be set using the “-e” option to start the Sinatra application using the “test” environment configuration. This configuration will use the regional endpoint for connecting to DynamoDB, instead of the local endpoint:

$ docker run -d -w /home/sinatra -e "RACK_ENV=test" -p 10001:4567 aws_activate/sinatra:v1.1 ./run_app.sh

Now the QA engineer can access the REST endpoint over HTTP using the public DNS name of the EC2 instance and port number 10001 (this requires a security group rule that allows ingress on port 10001). If any bugs are found, the running container can be committed to a new image, tagged with an appropriate version number, and pushed to the registry. The state of the container will be completely preserved, making it easier for us (the software developers) to reproduce any issues found in QA, examine log files, and generally troubleshoot the problem.

We hope this has been a good introduction to Docker, and that you’ve seen how Docker and AWS are a great combination. The portability of Docker containers makes them an excellent choice for dev and test workloads, because we can so easily share the containers across teams. EC2 and CloudFormation make a great combination for running containers in AWS, but the story doesn’t end there. Other services, like AWS ElasticBeanstalk, include support for deploying entire application stacks into Docker containers. Be sure to check out this and the other AWS blogs for more information about running Docker in AWS!

 

Posted in Information Technology

Continuous Integration Tools Comparison

As you build a product, your codebase keeps growing and, unless properly managed, can become a virtual Rubik’s cube for future developers to solve. Back in the day, when waterfall methodology ruled, it could take months or even years to deliver a product’s first shippable version.

Switching to Agile methods helped reduce programming cycles to weeks and introduced steady-interval delivery. Today’s practice of continuous integration (CI) rolls out program updates even faster, within days or hours. That’s the result of the frequent submission of code into a shared repository so that developers can easily track defects using automated tests, and then fix them as soon as possible. In our dedicated article, we explain in detail the benefits of continuous integration, how to approach its adoption, and what challenges to expect along the way.

How Continuous Integration works

How Continuous Integration works, starting from triggering a build in a version control system (VCS) Source: Django Stars

Now, CI isn’t some magic wand that automatically fixes your workflow; you’ll need a corresponding tool. So, let’s dive into the details by asking practical questions: What CI tool do I need? How hard is it to learn? What will it cost me? Will it work with my infrastructure? Which tool is the right fit for me?

Today’s market is flooded with many continuous integration tools, so choosing the right one can be difficult. Even though some tools seem to be more popular than others, they won’t necessarily work for you. Take an easy route and read this article for a clear picture of the best-of-breed CI tools: their functionality and licensing compared. Then, you’ll see which tool meets your business needs best.

How to choose a continuous integration tool

To guide you through the many options on the way to making a choice, we suggest using the following criteria:

Hosting options. Software tools differ in their infrastructure management. Cloud-based tools are hosted on a provider’s side, require minimal configurations, and can be adjusted on demand, depending on your needs. There are also self-hosted solutions. The responsibility for deploying and maintaining them rests solely on your shoulders, or rather on your in-house DevOps team. While on-premises services benefit building process flexibility, hosted solutions spare the setup hardships offering greater scalability.

Integrations and software support. How well is a CI tool integrated with other software used in development? Integration examples may include project management software (e.g. Jira), incident filling tools (e.g. PagerDuty and Bugzilla), static analysis tools, legal compliance tools, etc. A CI tool must be flexible enough to support various types of build tools (Make, Shell Scripts, Ant, Maven, Gradle), and version control software or VCS (Subversion, Perforce, Git) etc.

Usability. Some tools can make a build process much easier than others given their clear-cut and straightforward GUI and UX. A well-designed interface can save your time at the onboarding stage.

Container support. Having a deployment plugin or configuration for container orchestration tools like Kubernetes and Docker makes it easy for a CI tool to connect to the application’s target environment.

Library of reusable code. It’s preferable when a solution has a varied public store of assorted plugins and reusable build steps, whether open source or commercially available.

Now, we’ve selected the best performers in the CI market for further analysis: Jenkins, TeamCity, Bamboo, Travis CI, CircleCI, and CodeShip. To comprehend their popularity, we compared ratings from StackShare, G2 Crowd, and Slant.co.

G2 Crowd Grid of the mid-market CI tools divided into leaders (CircleCI), high performers (Travis CI and Jenkins), niche (CodeShip, TeamCity, Bamboo, and Chef), and contenders represented by TFS

To get a brief overview of the CI tools, take a look at the following table of comparison. Read on for the detailed analysis of each tool.

Comparing the most popular CI tools by price, integrations, support, and main use cases

Jenkins: the most widely adopted CI solution

Jenkins is an open-source project written in Java that runs on Windows, macOS, and other Unix-like operating systems. It’s free, community-supported, and might be your first-choice tool for continuous integration. Primarily deployed on-premises, Jenkins can run on cloud servers as well. Its integrations with Docker and Kubernetes take advantage of containers and perform even more frequent releases.

Main selling points

No expenses required. Jenkins is a free CI tool, which can save you money on the project.

Limitless integrations. Jenkins can integrate with almost any external program used for developing applications. It allows you to use container technology such as Docker and Kubernetes out-of-the-box. G2 Crowd reviewers claim: “No better tool exists for integrating your repositories and code bases with your deployment infrastructure.”

A rich library of plugins is available with Jenkins: Git, Gradle, Subversion, Slack, Jira, Redmine, Selenium, Pipeline, you name it. Jenkins plugins cover five areas: platforms, UI, administration, source code management, and, most frequently, build management. Although other CI tools provide similar features, they lack the comprehensive plugin integration that Jenkins has. Moreover, the Jenkins community encourages its users to extend the functionality with new features by providing teaching resources.

Active community. The Jenkins community provides a guided tour to introduce the basics and advanced tutorials for more sophisticated use of the tool. They also hold an annual conference DevOps World | Jenkins World.

Distribution of builds and test loads on multiple machines. Jenkins uses a Master-Slave architecture, where master is the main server that monitors slaves – remote machines used for distributing software builds and test loads.

Main weaknesses

Documentation isn’t always sufficient. For instance, it lacks info on pipeline creation. This adds time-consuming tasks to the list, as engineers have to work through them on their own.

Poor UI. Its interface seems a bit outdated as it doesn’t follow modern design principles. The absence of whitespaces makes the views look crowded and confusing. A lot of the progress features and icons are super pixelated and don’t refresh automatically when jobs finish.

Jenkins dashboard

It takes manual effort to monitor the Jenkins server, and its slaves, to understand interdependencies among the plugins, and to upgrade them from time to time.

All in all, Jenkins serves best for big projects where you need a lot of customization, which can be done through the various plugins. You can change almost everything here, but this process may take a while. However, if you are planning the quickest possible start with a CI system, consider different options.

TeamCity: another major CI player

TeamCity by JetBrains is a reliable and high-quality CI server. Teams often choose TeamCity for its good number of out-of-the-box authentication, deployment, and testing features, plus Docker support. It’s cross-platform: it supports all recent versions of Windows, Linux, and macOS and also works with Solaris, FreeBSD, IBM z/OS, and HP-UX. TeamCity works right after installation, with no additional setup or customization necessary. It boasts a number of notable features such as detailed history reports, instant feedback on test failures, and reusable settings that save you from duplicating configuration.

Pricing models. TeamCity offers a free version with full access to all product features, but it’s limited to 100 build configurations and three build agents. Adding one more build agent and 10 build configurations currently costs $299. TeamCity offers a 60-day cloud trial that bypasses on-premises installation.

There’s also a paid enterprise edition. Its price varies depending on the number of agents included. TeamCity gives 50 percent off for startups and free licenses for open source projects.

Main selling points

.NET support. TeamCity integrates with .NET tooling better than any other CI tool out there. It includes many important .NET tools, such as code coverage analysis, several .NET testing frameworks, and static code analysis.

Extensive VCS support. TeamCity allows for creating projects from just a VCS repository URL. The tool supports all of the popular version control systems: AccuRev, IBM Rational ClearCase, CVS, Git, GNU Bazaar, Mercurial, Perforce, Borland StarTeam, Subversion, Team Foundation Server, SourceGear Vault, and Visual SourceSafe. A number of external plugins are available to add support for other version control systems.

Platforms and environments supported by TeamCity

Competent docs. The overall functionality can be easily understood by going through the provided user guide, which is thorough and extensive.

Ease of setup. TeamCity is easy to set up and is ready to work right after installation.

Built-in features. TeamCity provides a good set of out-of-the-box functions for building a project: detailed history reports for builds, failures, and any additional changes made; source control; and build chain tools. One of its best features is “Publish Artifacts,” which allows for deploying the code, or even building, directly in any environment. It shows the progress of the build at every step and the number of tests remaining to pass before the build is complete. It also lets you rerun any failed tests right after an overnight execution, so you don’t have to waste time on that the next morning.

Main weaknesses

Difficult learning curve. Although TeamCity is known for its visually appealing UI, its wide range of configuration options can be complex and overwhelming for newcomers. It may take developers some serious study before they are ready to use the tool in production.

Manual upgrading process. Moving from one major version to another is a long process that has to be done manually on your server.

Given its complexity and price, TeamCity will serve best for enterprise needs and self-supporting teams ready to build their own plugins when needed.

Bamboo: out-of-the-box integration with Atlassian products

Beyond assisting with integration, Bamboo has features for deployment and release management. While Bamboo has fewer out-of-the-box options, it integrates natively with other Atlassian products: Bitbucket, Jira, and Confluence. Achieving the same integrations in Jenkins requires a whole set of plugins. Just like the previous two tools, Bamboo runs on Windows, Linux, Solaris, and macOS. For those who run Bamboo on Linux, Atlassian recommends creating a dedicated user to prevent any potential abuse.

Pricing models. Bamboo is free to try for 30 days. After that, its pricing tiers start with unlimited local agents and 10 jobs and scale up to 1,000 remote agents, with prices ranging from $10 to $126,500 accordingly. A 12-month maintenance period is included and can be doubled or tripled for an additional fee. Atlassian software is free for any open source project that meets their defined criteria.

Main selling points

Bitbucket Pipelines. After Atlassian discontinued Bamboo Cloud in 2016, the tool became available only on-premises. However, Atlassian produced Bitbucket Pipelines, a cloud alternative built into Bitbucket – a Git repository management solution. Pipelines can be fully integrated with Bamboo. Bitbucket Pipelines enables automatic building, testing, and deployment of code based on a configuration file in the repository. By utilizing Docker, Bitbucket Pipelines offers fast and efficient builds. Meanwhile, the Bamboo server is still available for on-premises installation and can be hosted in a cloud-based container or VM.
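For illustration, a minimal bitbucket-pipelines.yml might look like the sketch below; the Node.js image and npm commands are placeholder assumptions for whatever stack the repository actually uses:

```yaml
# bitbucket-pipelines.yml -- lives in the root of the repository
image: node:10                 # Docker image every step runs in (assumed stack)

pipelines:
  default:                     # runs on every push to any branch
    - step:
        name: Build and test
        caches:
          - node               # reuse node_modules between builds
        script:
          - npm install
          - npm test
  branches:
    master:                    # extra step only for pushes to master
      - step:
          name: Deploy
          script:
            - npm run deploy   # hypothetical deploy script
```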

Multiple notification methods. Bamboo Wallboard shows build results on a dedicated monitor and sends build results to your inbox or your dev chat room (e.g. HipChat, Google Talk).

Bamboo wallboard displays the status of all the branches and the plan that the branches belong to

Rich and simple integration. Bamboo supports most major technology stacks, such as CodeDeploy, Docker, Maven, Git, SVN, Mercurial, Ant, AWS, and Amazon S3 buckets. In addition, it detects new branches in these repositories and automatically applies the customized triggers and variables to them. Bamboo’s per-environment permissions feature lets developers deploy only to the environments they are permitted to use.

Documentation and support. Bamboo documentation is rich and detailed, and Atlassian provides skilled support. However, its community is far smaller than Jenkins’ user base.

Main weaknesses

Poor plugin support. In contrast to Jenkins and TeamCity, Bamboo doesn’t support that many plugins. There are only 208 apps currently listed on the Atlassian repository.

Complicated first work experience. Some users complain that the setup process of the first deploy task isn’t quite intuitive and it takes time to understand all the different options and how to use them.

Bamboo will work well for teams that are looking for out-of-the-box integration with Atlassian products.

Travis CI: a mature CI solution with simple GitHub integration

As one of the oldest CI solutions, Travis has won the trust of many users. It has a large and helpful community that welcomes new users and provides a great number of tutorials.

Travis CI can test on Linux and macOS. Meanwhile, its documentation warns that enabling multi-OS testing can lead to some tools or languages becoming unavailable, or test failures due to the peculiarities of each file system’s behavior.

Offering many automated CI options, Travis eliminates the need for a dedicated server, as it is hosted in the cloud. However, it also has an on-premises product for companies that want the same CI features while meeting on-site security requirements.

Pricing models. The first 100 builds are free. Otherwise, there are four pricing plans: for hobby projects ($69/month) and for small, growing, and larger teams (from $129 to $489 per month). They differ in the number of concurrent jobs that can be run. You can also contact Travis CI to get a customized plan.

Main selling points

Easy setup and configuration. Travis CI requires no installation – you can begin testing by simply signing up and adding a project. The software can be configured with a simple YAML file, which you place in the root directory of the development project. The user interface is very responsive; most users say it’s convenient for monitoring builds.
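As a sketch, assuming a Python project tested with pytest (the language, versions, and commands are placeholders for your own stack), a minimal .travis.yml might look like this:

```yaml
# .travis.yml -- placed in the root of the GitHub repository
language: python
python:
  - "3.6"
  - "3.7"                  # each entry runs as its own build job
install:
  - pip install -r requirements.txt
script:
  - pytest                 # the command whose exit code decides pass/fail
notifications:
  email:
    on_failure: always     # only email when a build breaks
```

Every push to the connected repository then triggers a build for each listed Python version.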

Direct connectivity with GitHub. Travis CI works seamlessly with popular version control systems, GitHub in particular. Moreover, the tool is free for GitHub open source projects. The CI tool registers every push to GitHub and automatically builds the branch by default.

Travis CI branch build flow and pull request build flow with GitHub

Backup of the recent build. Whenever you run a new build, Travis CI clones your GitHub repository into a new virtual environment. This way you always have a backup.

Main weaknesses

No CD. Unlike other CI tools on the list, Travis CI doesn’t allow for continuous delivery.

GitHub-only hosting. Since Travis CI only supports GitHub-hosted projects, teams that use GitLab or any other alternative are forced to rely on another CI tool.

To sum up, Travis CI is the best solution for open-source projects that need testing in different environments. In addition, it’s the right tool for small projects, where the main goal is to start the integration as soon as possible.

CircleCI: an easy and useful CI tool for early-stage projects

CircleCI is a flexible CI tool that offers up to 16x parallelization. It intelligently notifies users, providing only relevant information via email, HipChat, Campfire, and other channels. CircleCI also provides easy setup and maintenance.

CircleCI is a cloud-based system that also offers an on-prem solution with security and configuration for running the tool in your private cloud or data center.

Pricing models. CircleCI has a free 2-week macOS trial that allows you to build on both Linux and macOS. The free Linux package includes one container. When you add more containers ($50/month each), you can also choose the level of parallelization (from one up to 16 concurrent jobs). macOS plans range from $39/month for 2x concurrency to $249/month for 7x concurrency and email support.

If your enterprise has specialized needs, you can contact CircleCI directly to discuss a personal usage-based price plan. For open source projects, CircleCI allocates four free Linux containers, as well as the macOS Seed plan free at 1x concurrency.
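Parallelism is configured per job in the project’s configuration file checked into the repository. A minimal sketch, assuming CircleCI 2.0-style configuration and a Python test suite (the image name and test commands are placeholders):

```yaml
# .circleci/config.yml
version: 2
jobs:
  build:
    docker:
      - image: circleci/python:3.6   # assumed build image
    parallelism: 4                   # spread this job across 4 containers
    steps:
      - checkout
      - run: pip install -r requirements.txt
      - run:
          name: Run tests split across containers
          command: |
            TESTS=$(circleci tests glob "tests/test_*.py" | circleci tests split)
            pytest $TESTS
```

The `circleci tests split` helper divides the test files among the four containers so each one runs a quarter of the suite.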

Main selling points

Simple UI. CircleCI is recognized for its user-friendly interface for managing builds/jobs. Its single-page web app is clean and easy to understand.

CircleCI dashboard

High-quality customer support. StackShare’s community members highlight CircleCI’s speedy support: They respond to requests within 12 hours.

Broad test coverage. CircleCI runs all types of software tests, including web, mobile, and container environments.

Caching of requirements installation and third-party dependencies. Instead of reinstalling environments from scratch, CircleCI can reuse data across multiple projects using its granular checkout-key options.

No need for manual debugging. CircleCI has a “Debug via SSH” feature, whereas Jenkins users have to debug manually by clicking through jobs.

Main weaknesses

Excessive automation. CircleCI changes its environment without warning, which may be an issue. Jenkins, by contrast, applies changes only when the user instructs it to.

No caching of Docker images. In Jenkins, for instance, you can cache Docker images using a private server; with CircleCI, that is not possible.

No testing in Windows OS. CircleCI already supports building most applications that run on Linux, not to mention iOS and Android. However, the CI tool doesn’t yet allow for building and testing in a Windows environment.

All in all, if you need something built fast and money is not the issue, CircleCI with its parallelization features is your go-to CI tool.

CodeShip: a cloud-based CI tool with fast builds

CodeShip by CloudBees gives you complete control over customizing and optimizing your CI and CD workflows. The tool helps manage teams and streamline projects. With parallel pipelines, concurrent builds, and caching, CodeShip lets you set up powerful deployment pipelines and deploy with ease multiple times per day.

CodeShip dashboard

Pricing models. CodeShip is available in two versions: Basic and Pro. The Basic version offers a pre-configured CI service with a simple web interface, but without Docker support. CodeShip Basic comes in several paid packages, from $49 to $399 a month: the more parallelization power a package has, the higher its price tag. The Pro version supports Docker and is more flexible; you can pick your instance type and parallelization of up to 20x.

There’s also a free plan limited to 100 builds/month, one concurrent build, and one parallel test pipeline. Educational projects and non-profits get 50 percent off. Open source projects are always free.

Main selling points

Centralized team management. CodeShip allows for setting up teams and permissions for your organizations and team members. Currently, they’re offering the following roles: owners, managers, PMs, and contributors.

CodeShip role permissions

Parallel tests. CodeShip has a special Parallel CI feature for running tests in parallel.
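In CodeShip Pro, for example, parallel steps are typically declared in a codeship-steps.yml file. A minimal sketch, assuming a service named app defined in codeship-services.yml and pytest-based test suites:

```yaml
# codeship-steps.yml -- CodeShip Pro build steps
- name: checks
  type: parallel            # everything nested below runs at the same time
  steps:
    - name: lint
      service: app          # service defined in codeship-services.yml (assumed)
      command: flake8 .
    - name: unit_tests
      service: app
      command: pytest tests/unit
- name: integration_tests   # runs only after the parallel block finishes
  service: app
  command: pytest tests/integration
```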

Simplified workflow. An intuitive UI for configuration and file management gets the workflow going quickly. Almost all G2 Crowd reviews point out that CodeShip is very fast to set up and easy to get started with. Reviewers say that everything in CodeShip is much easier and the deployment process much simpler than in Jenkins.

Code shipping with one button. CodeShip makes it easy to continually push out code to all the environments in one click.

Native Docker support. CodeShip is highly customizable with native support for Docker instances.

Main weaknesses

Lack of documentation. Based on reviews, the documentation is lacking, especially in comparison with Jenkins, and should be extended.

Narrow OS support. CodeShip runs on Windows, macOS, and web-based devices, while the majority of the CI tools support Linux, Android, and iOS as well.

Due to its flexible plans, CodeShip can work for small teams and enterprises alike. Still, many prefer CodeShip for its reliability: “CodeShip allows us to have peace of mind with every push, to ensure that all tests and any other necessary preprocessing are completed every time,” says a review at G2 Crowd.

Final recommendations on choosing a CI tool

How do you select, from among the many CI solutions, the one that will not only meet your current integration requirements but also evolve along with your product’s roadmap? Going through the following checklist will make the choice easier:

Listen to your team’s needs. It’s important that the team can quickly learn how to use the chosen tool and start developing the product using its benefits. Depending on your team’s expertise level and the programming software they already work with, the range of CI tools can be narrowed down.

Keep in mind the functionality you need. Jenkins is a great solution for continuous integration, but it’s not that useful if you already have a CI system and are looking for a CD tool. Smaller tools like Spinnaker are great for testing and delivery, but not meant for integration. These examples show that each tool is good at particular functions, so first figure out your primary business goal and what functionality will satisfy it.

Know your company’s data storage rules. Although it’s very convenient to leverage hosted services and tools, sometimes it may not be reasonable to hand infrastructure management to a third party. Legal and statutory requirements can become the deciding criterion if your company operates under strict regulations governing access to data. In this case, your only option is to store data locally on an on-premises server.

Make sure the solution fits your budget. Consider your project’s size and go for an option that satisfies its needs at a reasonable price. For example, startups can make better use of the cloud-based CI solution, as it includes the basic essential features for a small number of users with minimal or no fees.

Open-source CI tools are mostly community-driven with plugins and support available via online tutorials, blogs, chats, and forums. However, if there’s a need for support for pipeline maintenance and no budget constraints, pick a proprietary option. Alternatively, you can stick with an open-source tool, as long as there are organizations offering commercial support for it.

Test the workflow with different CI tools. It’s very rare that a single CI tool will suffice for all scenarios. Even the prominent CI tools covered in our article can hardly meet 80 percent of the automation requirements since projects range from monolithic, monster software systems to microservices-based architectures.

A good strategy is to use multiple CI tools for different needs instead of struggling to fit all in one tool. This approach will also contribute to business continuity, securing projects if a CI tool is discontinued or its support turns out to be insufficient.

Posted in Information Technology

Openstack vs Openshift

A question I am often asked when I talk about OpenShift and OpenShift Origin is “How does OpenShift compete with OpenStack?” Or sometimes, the better question “How does OpenShift relate to OpenStack?”

Both OpenStack and OpenShift Origin are open source projects, and both provide cloud computing foundations. However, they do not compete with each other.

OpenStack provides “Infrastructure-as-a-Service”, or “IaaS”. It provides bootable virtual machines, networking, block storage, object storage, and so forth. Some IaaS service providers based on OpenStack are HP Cloud and Rackspace Cloud. Red Hat is fully engaged with the larger OpenStack user and developer community. Our developers are the source of a significant amount of code to implement features and fix bugs in the OpenStack projects. In addition, we will be on the OpenStack Foundation board, as described in this FAQ and announcement.

The OpenShift hosted service provides “Platform-as-a-Service” or “PaaS”. It provides the necessary parts to quickly deploy and run a LAMP application: the web server, application server, application runtimes and libraries, database service, and so forth.

OpenShift Origin is the open source project of the software that enables the OpenShift hosted service. Using OpenShift Origin, you can build your own PaaS.

A PaaS typically runs on top of an IaaS provider. For example, both the OpenShift hosted service and the Heroku hosted service run on top of Amazon’s AWS IaaS service.

OpenShift Origin can run on top of OpenStack. For example, our Krishna Raman has written a blog post: OpenShift Origin on OpenStack which describes step by step how to set it up.

In addition to running on top of OpenStack, OpenShift Origin can run on top of AWS, on KVM or VMware in your own data center, on VirtualBox on your personal laptop, and even on “bare metal” unvirtualized Linux hosts.

So, when I am asked “How does OpenShift relate to OpenStack?”, I answer “OpenShift Origin can run on top of OpenStack. They are complementary projects that work well together. OpenShift Origin is not presently part of OpenStack, and does not compete with OpenStack. If you stand up your own OpenStack system, you can make it even more useful by installing OpenShift Origin on top of it.”

Posted in Information Technology

Puppet vs Chef vs Ansible vs SaltStack

https://www.intigua.com/blog/puppet-vs.-chef-vs.-ansible-vs.-saltstack

 Puppet, Chef, Ansible and SaltStack present different paths to achieve a common goal of managing large-scale server infrastructure efficiently, with minimal input from developers and sysadmins. All four configuration management tools are designed to reduce the complexity of configuring distributed infrastructure resources, enabling speed, and ensuring reliability and compliance. This article explores the mechanism, value propositions and concerns pertaining to each configuration management solution.

Puppet


Puppet is a pioneering configuration automation and deployment orchestration solution for distributed apps and infrastructure. The product was originally developed by Luke Kanies to automate tasks for sysadmins who would spend ages configuring, provisioning, troubleshooting and maintaining server operations.


This open source configuration management solution is built with Ruby and offers a custom Domain Specific Language (DSL) and Embedded Ruby (ERB) templates to create custom Puppet language files, following a declarative programming paradigm. Puppet uses an agent/master architecture: agents manage nodes and request relevant info from masters that control configuration info. The agent polls status reports and queries regarding its associated server machine from the master Puppet server, which then communicates its response and required commands using the XML-RPC protocol over HTTPS. This resource describes the architecture in detail. Users can also set up a master-less, decentralized Puppet setup, as described here.


The Puppet Enterprise product offers the following capabilities:


  • Orchestration
  • Automated provisioning
  • Configuration automation
  • Visualization and reporting
  • Code management
  • Node management
  • Role-based access control


Pros:


  • Strong compliance automation and reporting tools.
  • Active community support around development tools and modules.
  • Intuitive web UI to take care of many tasks, including reporting and real-time node management.
  • Robust, native capability to work with shell-level constructs.
  • Initial setup is smooth and supports a variety of OSs.
  • Particularly useful, stable and mature solution for large enterprises with adequate DevOps skill resources to manage a heterogeneous infrastructure.


Cons:

  • Can be difficult for new users who must learn Puppet DSL or Ruby, as advanced tasks usually require input from CLI.
  • Installation process lacks adequate error reporting capabilities.
  • Not the best solution available to scale deployments. The DSL code can grow large and complicated at higher scale.
  • Using multiple masters complicates the management process. Remote execution can become challenging.
  • Support is more focused toward Puppet DSL over pure Ruby versions.
  • Lacks push system, so no immediate action on changes. The pull process follows a specified schedule for tasks.


Pricing


Puppet Enterprise is free for up to 10 nodes. Standard pricing starts at $120 per node. (Get more info here.)

Chef


Chef started off as an internal end-to-end server deployment tool for OpsCode before it was released as an open source solution. Chef also uses a client-server architecture and offers configuration in a Ruby DSL using the imperative programming paradigm. Its flexible cloud infrastructure automation framework allows users to install apps to bare metal VMs and cloud containers. Its architecture is fairly similar to the Puppet master-agent model and uses a pull-based approach, except that an additional logical Chef workstation is required to control configurations from the master to agents. Agents poll the information from master servers that respond via SSH. Several SaaS and hybrid delivery models are available to handle analytics and reporting.


Chef products offer the following capabilities:


  • Infrastructure automation
  • Cloud automation
  • Automation for DevOps workflow
  • Compliance and security management
  • Automated workflow for Continuous Delivery


Pros:


  • One of the most flexible solutions for OS and middleware management.
  • Designed for programmers.
  • Strong documentation, support and contributions from an active community.
  • Very stable, reliable and mature, especially for large-scale deployments in both public and private environments.
  • Chef offers hybrid and SaaS solutions for Chef server, analytics and reporting.
  • Sequential execution order.

Cons:


  • Requires a steep learning curve.
  • Initial setup is complicated.
  • Lacks push, so no immediate action on changes. The pull process follows a specified schedule.
  • Documentation is spread out, and it can become difficult to review and follow.


Pricing


A free solution is available to get you started. Pricing starts at $72 per node for the standard Hosted Chef, and is $137 per node for the top-of-the-range Chef Automate version. (Get more info here.)

Ansible


As the latest entrant in the market compared with Puppet, Chef and Salt, Ansible was developed to simplify complex orchestration and configuration management tasks. The platform is written in Python and allows users to script commands in YAML as an imperative programming paradigm. Ansible offers a push model that sends command modules to nodes via SSH, where they are executed sequentially. Ansible doesn’t require agents on every system, and modules can reside on any server. A centralized Ansible workstation is commonly used to tunnel commands through multiple bastion host servers and access machines in a private network.
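For illustration, a minimal playbook might look like the sketch below; the webservers host group and the nginx package are assumptions, and the playbook would be run with something like ansible-playbook -i inventory site.yml:

```yaml
# site.yml -- a minimal Ansible playbook
- hosts: webservers          # group defined in the inventory file (assumed)
  become: yes                # escalate privileges on the target hosts
  tasks:
    - name: Install nginx
      apt:                   # assumes Debian/Ubuntu targets
        name: nginx
        state: present
    - name: Ensure nginx is running and enabled at boot
      service:
        name: nginx
        state: started
        enabled: yes
```

Each task is pushed to the webservers group over SSH and executed in order, top to bottom.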


Ansible products offer the following capabilities:


  • Streamlined provisioning
  • Configuration management
  • App deployment
  • Automated workflow for Continuous Delivery
  • Security and Compliance policy integration into automated processes
  • Simplified orchestration


Pros:

  • Easy remote execution, and low barrier to entry.
  • Suitable for environments designed to scale rapidly.
  • Shares facts between multiple servers, so they can query each other.
  • Powerful orchestration engine. Strong focus on areas where others lack, such as zero-downtime rolling updates to multi-tier applications across the cloud.
  • Easy installation and initial setup.
  • Syntax and workflow is fairly easy to learn for new users.
  • Sequential execution order.
  • Supports both push and pull models.
  • Lack of master eliminates failure points and performance issues. Agent-less deployment and communication is faster than the master-agent model.
  • High security with SSH.


Cons:

  • Increased focus on orchestration over configuration management.
  • SSH communication slows down in scaled environments.
  • Requires root SSH access and Python interpreter installed on machines, although agents are not required.
  • The syntax across scripting components such as playbooks and templates can vary.
  • Underdeveloped GUI with limited features.
  • The platform is new and not entirely mature as compared to Puppet and Chef.


Pricing


The Self-Support offering starts at $5,000 per year, and the Premium version goes for $14,000 per year for 100 nodes each. (Get more info here.)

SaltStack


Salt was designed to enable low-latency and high-speed communication for data collection and remote execution in sysadmin environments. The platform is written in Python and uses a push model for executing commands via the SSH protocol. Salt allows parallel execution of multiple commands encrypted via AES and offers both vertical and horizontal scaling. A single master can manage multiple masters, and the peer interface allows users to control multiple agents (minions) directly from an agent. In addition to the usual queries from minions, downstream events can also trigger actions from the master. The platform supports both master-agent and decentralized, non-master models. Like Ansible, users can script using YAML templates based on an imperative programming paradigm. The built-in remote execution system executes tasks sequentially.
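For illustration, a minimal Salt state file (SLS) might look like this; the nginx package and service are assumptions, and a state like this is typically applied with a command such as salt '*' state.apply:

```yaml
# nginx/init.sls -- a minimal Salt state written in YAML
nginx:
  pkg.installed: []          # make sure the package is present
  service.running:           # keep the service running
    - enable: True
    - require:
      - pkg: nginx           # start the service only after the package is installed
```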


SaltStack capabilities and use cases include:


  • Orchestration and automation for CloudOps
  • Automation for ITOps
  • Continuous code integration and deployment
  • Application monitoring and auto-healing
  • DevOps toolchain workflow automation with support for Puppet, Chef, Docker, Jenkins, Git, etc.
  • … And several other use cases.


Pros:

  • Effective for high scalability and resilient environments.
  • Easy and straightforward usage past the initial installation and setup.
  • Strong introspection.
  • Active community and support.
  • Feature-rich and consistent YAML syntax across all scripting tasks. Python offers a low learning curve for developers.


Cons:


  • Installation process may not be smooth for new users.
  • Documentation is not well managed, and is challenging to review.
  • Web UI offers limited capabilities and features.
  • Not the best option for OSs other than Linux.
  • The platform is new and not entirely mature as compared to Puppet and Chef.


Pricing:


Contact SaltStack for pricing.

Conclusion

Each platform is aimed at a different user segment within the same target market. DevOps teams investing in configuration management solutions must consider unique requirements around their workflows to maximize ROI and productivity. To select the right configuration management solution that fits your organization, consider the architecture and operation model, features, and usability and support, among other key technical and business aspects.

Posted in Information Technology

Comparing 11 IoT Development Platforms

https://dzone.com/articles/iot-software-platform-comparison

1. Abstract

This article presents a general survey of the current IoT software platform landscape based on a detailed analysis we conducted on IoT vendors. We first create a list of key features which are important for any IoT software platform. Next, we compare the extent to which those key features have been implemented in the current IoT software platforms. Finally, we list down the desired features of an IoT software platform based on our observations.

2. Introduction

The Internet of Things (IoT) has undergone rapid transformation since the term was first coined in 1999 by Kevin Ashton. Since the variety – and the number – of devices connected to the Internet has increased exponentially in recent years, IoT has become a mainstream technology with a significant potential for advancing the lifestyle of modern societies.

In terms of the technology and engineering aspects of IoT, there currently exists a clear separation between the hardware and software platforms, with the majority of vendors focused on the hardware. Few vendors in the industry currently offer IoT software platforms: for example, out of the top 100 IoT startups ranked by Mattermark (based on the total funding they received), only about 13 startups provide IoT software platforms [5].

The aim of this article is to make a general survey of the current IoT software platform landscape based on a detailed analysis we conducted on IoT vendors. The IoT vendors were shortlisted for this article purely on the criterion of whether they provide software solutions that allow for processing information from IoT devices/sensors. Note that while we try to be as comprehensive as possible, the article may not reflect some of the latest improvements made to the listed IoT software platforms.

3. Important Features Expected from an IoT Software Platform

Based on several recent surveys [2][7], we’ve selected the following as crucial features of an IoT software platform and as example features for comparison: device management, integration, security, protocols for data collection, types of analytics, and support for visualizations. In the following subsections, we give a brief introduction to these characteristics.

3.1 Device Management and Integration Support

Device management is one of the most important features expected from any IoT software platform. The IoT platform should maintain a list of devices connected to it and track their operation status; it should be able to handle configuration, firmware (or any other software) updates and provide device level error reporting and error handling [2]. At the end of the day, users of the devices should be able to get individual device level statistics.

Support for integration is another important feature expected from an IoT software platform. The API should provide access to the important operations and data that needs to be exposed from the IoT platform. It’s common to use REST APIs to achieve this aim.

3.2 Information Security

The information security measures required to operate an IoT software platform are much higher than general software applications and services. Millions of devices being connected with an IoT platform means we need to anticipate a proportional number of vulnerabilities [3]. Generally, the network connection between the IoT devices and the IoT software platform would need to be encrypted with a strong encryption mechanism to avoid potential eavesdropping.

However, most of the low-cost, low-powered devices involved in modern IoT software platforms cannot support such advanced access control measures [3]. Therefore, the IoT software platform itself needs to implement alternative measures to handle such device-level issues. For example, separating IoT traffic into private networks, enforcing strong information security at the cloud application level [3], requiring regular password updates, and supporting updateable firmware with authenticated, signed software updates [4] can all enhance the level of security in an IoT software platform.

3.3 Data Collection Protocols

Another important aspect which needs attention is the types of protocols used for data communication between the components of an IoT software platform. An IoT platform may need to be scaled to millions or even billions of devices (nodes). Lightweight communication protocols should be used to enable low energy use as well as low network bandwidth functionality.

Note that while (in this article) we use protocols as a blanket term, the protocols used for data collection fall into several categories – such as application, payload container, messaging, and legacy protocols [2].

3.4 Data Analytics

The data collected from the sensors connected to an IoT platform needs to be analysed in an intelligent manner in order to obtain meaningful insights.

There are four main types of analytics which can be conducted on IoT data: real-time, batch, predictive, and interactive analytics [6]. Real-time analytics conduct online (on-the-fly) analysis of the streaming data. Example operations include window based aggregations, filtering, transformation and so on.

Batch analytics runs operations on an accumulated set of data; batch operations thus run at scheduled time periods and may last for several hours or days. Predictive analytics is focused on making predictions based on various statistical and machine learning techniques. Interactive analytics runs multiple exploratory analyses on both streaming and batch data. Of the four, real-time analytics carries the most weight for any IoT software platform.

4. Current IoT Software Platforms

A careful investigation into the current IoT software platform landscape reveals that each of the above mentioned features have been implemented — to different extents. We’ve listed the relevant platforms below, with a summarized feature comparison:

| IoT software platform | Device management? | Integration | Security | Protocols for data collection | Types of analytics | Support for visualizations? |
|---|---|---|---|---|---|---|
| 2lemetry – IoT Analytics Platform** | Yes | Salesforce, Heroku, ThingWorx APIs | Link Encryption (SSL), Standards (ISO 27001, SAS70 Type II audit) | MQTT, CoAP, STOMP, M3DA | Real-time analytics (Apache Storm) | No |
| Appcelerator | No | REST API | Link Encryption (SSL, IPsec, AES-256) | MQTT, HTTP | Real-time analytics (Titanium [1]) | Yes (Titanium UI Dashboard) |
| AWS IoT platform | Yes | REST API | Link Encryption (TLS), Authentication (SigV4, X.509) | MQTT, HTTP 1.1 | Real-time analytics (Rules Engine, Amazon Kinesis, AWS Lambda) | Yes (AWS IoT Dashboard) |
| Bosch IoT Suite – MDM IoT Platform | Yes | REST API | *Unknown | MQTT, CoAP, AMQP, STOMP | *Unknown | Yes (User Interface Integrator) |
| Ericsson Device Connection Platform (DCP) – MDM IoT Platform | Yes | REST API | Link Encryption (SSL/TLS), Authentication (SIM based) | CoAP | *Unknown | No |
| EVRYTHNG – IoT Smart Products Platform | No | REST API | Link Encryption (SSL) | MQTT, CoAP, WebSockets | Real-time analytics (Rules Engine) | Yes (EVRYTHNG IoT Dashboard) |
| IBM IoT Foundation Device Cloud | Yes | REST and Real-time APIs | Link Encryption (TLS), Authentication (IBM Cloud SSO), Identity management (LDAP) | MQTT, HTTPS | Real-time analytics (IBM IoT Real-Time Insights) | Yes (Web portal) |
| ParStream – IoT Analytics Platform*** | No | R, UDX API | *Unknown | MQTT | Real-time analytics, Batch analytics (ParStream DB) | Yes (ParStream Management Console) |
| PLAT.ONE – end-to-end IoT and M2M application platform | Yes | REST API | Link Encryption (SSL), Identity Management (LDAP) | MQTT, SNMP | *Unknown | Yes (Management Console for application enablement, data management, and device management) |
| ThingWorx – MDM IoT Platform | Yes | REST API | Standards (ISO 27001), Identity Management (LDAP) | MQTT, AMQP, XMPP, CoAP, DDS, WebSockets | Predictive analytics (ThingWorx Machine Learning), Real-time analytics (ParStream DB) | Yes (ThingWorx SQUEAL) |
| Xively – PaaS enterprise IoT platform | No | REST API | Link Encryption (SSL/TLS) | HTTP, HTTPS, Sockets/WebSocket, MQTT | *Unknown | Yes (Management console) |

* Cells marked with Unknown indicate that the relevant information could not be found in the available documentation.

** 2lemetry has been acquired by AWS IoT

***ParStream has been acquired by Cisco

It’s clear from the IoT startups listed above that not many have fully fledged device management capabilities. This is a significant void which needs to be addressed by the IoT software platform vendors.

Furthermore, there’s relatively little support for analyzing the generated IoT data in terms of both computation and visualization. Most of the platforms support real-time analytics – a must-have feature in any IoT framework. However, only a few IoT software platforms provide support for the other three types of analytics. In terms of visual interfaces, most are limited to a simple web portal. These dashboards allow for management of IoT ecosystems, but very few provide visual data analytics capabilities.

A few more features commonly observed across different IoT software platforms include REST API based integration, support for the MQTT protocol as a means of data collection, and link encryption using SSL. While not mentioned in Table 1, only ParStream has reported a throughput figure (3-4 million rows/second) in its documentation. This indicates that most of the IoT software platforms are designed without much consideration for the system performance aspects of an IoT deployment, which are critical in real-world operation.

5. Features to improve on

It’s clear that there are several avenues in which improvements are needed. In this section, we first list existing features that need improvement, some of which have already been partially implemented by IoT software platform vendors. We then list new features that have not yet been addressed by any IoT software platform vendor.

5.1 Existing Features

Data Analytics

Most of the current IoT software platforms support real-time analytics, but batch and interactive data analytics may be just as important.

One may argue that such types of analytics are available in other well-known data processing platforms, and that it is simply a matter of configuring those systems for the analysis scenario. However, that’s easier said than done: well-known data processing systems for real-time (Storm, Samza, etc.), batch (Hadoop, Spark, etc.), predictive (Spark MLlib, etc.), and interactive analytics (Apache Drill and so on) cannot be applied directly, as they are, to IoT use cases.

Benchmarks

The IoT software platforms need to be scalable and should encompass facilities to characterize and evaluate the system performance. Well-defined performance metrics need to be devised to model and measure the performance of IoT systems, taking into account network characteristics, energy consumption characteristics, system throughput, computational resource consumption, and other operational characteristics.

Edge Analytics

Measures need to be taken to reduce the potentially huge network bandwidth consumption between the sensor devices and the IoT server. Using lightweight communication protocols is one solution. The other approach is edge analytics, which reduces the amount of raw data transmitted to the IoT server by processing it close to the source. Edge analytics can be implemented even on simple embedded hardware, such as an Arduino.

Other Issues

It should be noted that there are multiple other ethical, moral, and legal concerns associated with IoT software platforms which we have not covered in this article. While important, addressing such issues is out of the scope of this article.

5.2 Features to add

Handling out-of-order processing

Out-of-order event arrival is possible in any IoT application: tuples within an event stream emitted by an IoT sensor may arrive out of order because of network latency, clock drift, and more. Processing IoT events without handling the disorder may result in system failures. Handling the disorder involves a trade-off between result accuracy and result latency.

There are four main techniques of disorder handling: Buffer-based, Punctuation-based, Speculation-based, and Approximation-based techniques. IoT solutions should implement one or more of these in order to handle out-of-order events.

Support for IoT context

Context is primarily made up of an individual’s location, their stated preferences, or their past behaviors. For example, in the case of a mobile phone, we have access to rich context information because of the various types of sensors present in modern phones. IoT analytics should be able to take such contextual data into consideration.

6. Conclusion

The rapid growth of the IoT paradigm calls for powerful IoT software platforms that address the needs presented by IoT use cases. In this article we have investigated the features of the current state-of-the-art IoT software platforms. The investigation focused on aspects such as device management, integration, security, protocols for data collection, types of analytics, and support for visualizations. From this study it is clear that areas such as device management, IoT data analytics, and IoT software system scalability and performance characteristics need special attention from the IoT software platform community.

References

[1] Appcelerator, Inc. (2015), Appcelerator Open Source, http://www.appcelerator.org/

[2] Gazis, V.; Gortz, M.; Huber, M.; Leonardi, A.; Mathioudakis, K.; Wiesmaier, A.; Zeiger, F.; Vasilomanolakis, E. (2015), A survey of technologies for the internet of things, in Wireless Communications and Mobile Computing Conference (IWCMC), 2015 International, pp. 1090-1095, 24-28 Aug. 2015

[3] Jasper (2014), Achieving End-to-End Security in the Internet of Things, http://pages.jasper.com/White-Paper-Cellular-IoT-Security_Cellular-IoT-Security.html

[4] LogMeIn (2015), A Guide To Designing Resilient Products for the Internet of Things, LogMeIn

[5] Louis Columbus (2015), Mattermark Lists The Top 100 Internet Of Things Startups For 2015, http://www.forbes.com/sites/louiscolumbus/2015/10/25/the-top-100-internet-of-things-startups-of-2015/

[6] Perera, S. (2015), IoT Analytics: Using Big Data to Architect IoT Solutions, WSO2 White Paper, http://wso2.com/whitepapers/iot-analytics-using-big-data-to-architect-iot-solutions/

[7] Progress (2015), State of IoT, https://www.progress.com/docs/default-source/default-document-library/progress/documents/papers/iot_surveyreport.pdf

[8] WSO2, Inc. (2015), WSO2 Unveils Open Source WSO2 Data Analytics Server 3.0, Delivering Comprehensive Analysis Optimized for The Internet of Things, http://wso2.com/about/news/wso2-unveils-open-source-wso2-data-analytics-server-3.0-delivering-comprehensive-analysis-optimized-for-iot/

[9] WSO2, Inc. (2015), Open Platform for Internet of Things, http://wso2.com/landing/internet-of-things/

[10] Wijewantha, D. (2014), Demonstration on Architecture of Internet of Things – An Analysis, WSO2 Library Article, http://wso2.com/library/articles/2014/09/demonstration-on-architecture-of-internet-of-things-an-analysis/

Disclaimer

Note that the content of this article is up-to-date as of 23rd December, 2015. The article will not be updated to reflect any changes made to the IoT software platforms since then. If you need any clarifications or need to make any changes to the article’s content please get in touch with the author via miyurud at wso2 dot com.