Deploying Dagster to AWS


Dagster is a data orchestration tool that aims to ease the flow of data for machine learning, analytics, and ETL tasks. To deploy Dagster, two services need to run: a web interface called Dagit and a background process called the Dagster daemon. Dagit is a GUI that acts as the front end of Dagster and displays information about pipelines, their runs, and other Dagster services. Dagit can also be used to launch pipeline runs and inspect their results, as well as all the in-between steps. The Dagster daemon, on the other hand, runs schedulers and sensors and manages run queues.

General architecture for Dagster.

Dagster introduces a new fundamental layer of abstraction into the data application development process, aiming to solve common pain points in orchestrating data workflows. Let’s look in the next section at how this layer is built from the components Dagster proposes.

The most basic Dagster object is a solid. Everything is built by combining solids. A solid takes an input, performs an action, and outputs the result; solids are therefore the functional unit of work in Dagster. One or more solids connected together form a pipeline, which takes the form of a directed acyclic graph (DAG). The following image shows a pipeline of four solids.

A four-solid pipeline.

Multiple pipelines can be grouped into a Dagster repository. The concept of a repository allows Dagster tools to target multiple data pipelines at the same time. Finally, schedules and sensors are key concepts in Dagster, as they allow pipelines to be launched at fixed intervals or in response to external events.

Now that Dagster’s core concepts are laid out, let’s try to illustrate these with a simple pipeline. First, Dagster modules need to be installed in the environment:
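The exact install line from the original post was not preserved; the standard way to get the two core packages (plus pandas, which the example below uses) is:

```
pip install dagster dagit pandas
```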

For this example pipeline, the following dataset, stored as a CSV file, will be used:
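The original file did not survive in this copy of the article; a small illustrative stand-in with the columns the solids below expect (an age column and a 0/1 team-leader flag), saved as team.csv in what follows, might look like this:

```
name,age,team_leader
Alice,34,1
Bob,28,0
Carol,45,1
Dave,31,0
```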

Then, the average age and the number of team leaders are computed using the two following solids:
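The original listing is missing here. A minimal sketch of what the two computing solids might have looked like follows, using Dagster's pre-1.0 solid API that the article is written against; a small loading solid is included so the example is self-contained, and the file name team.csv is an assumption:

```python
import pandas as pd
from dagster import solid


@solid
def load_team_data(context):
    # The file name "team.csv" is an assumption; the original path was not preserved.
    return pd.read_csv("team.csv")


@solid
def average_age(context, team):
    # Mean of the age column.
    return float(team["age"].mean())


@solid
def count_team_leaders(context, team):
    # team_leader is encoded as 0/1, so the column sum is the count.
    return int(team["team_leader"].sum())
```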

Finally, a solid that displays the obtained results is added, and all solids are put together to create a pipeline, which is in turn added to a repository:
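Again, the original code block is missing; continuing in the same file as the solids above, a sketch (solid, pipeline, and repository names are hypothetical) could be:

```python
from dagster import pipeline, repository, solid


@solid
def display_results(context, avg_age, n_leaders):
    # Log the two computed statistics to the Dagster event log.
    context.log.info(f"Average age: {avg_age:.1f}, team leaders: {n_leaders}")


@pipeline
def team_stats_pipeline():
    # Wire the solids into a DAG: load once, fan out to the two
    # computations, then fan in to the display solid.
    team = load_team_data()
    display_results(average_age(team), count_team_leaders(team))


@repository
def team_repository():
    return [team_stats_pipeline]
```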

To run this example, execute the following command:
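Assuming everything above lives in a single file called repo.py (a name chosen for this sketch), Dagit can be started locally with:

```
dagit -f repo.py
```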

Finally, one can head over to the Dagit web UI, where a visual representation of the pipeline is shown. In the Playground tab, it is possible to run the pipeline and inspect the results. If everything went well, something similar to the next image should be visible:

This simple example pipeline ran on a local machine, but what if the pipelines need to leverage the power of the cloud to run? The next sections explain how to easily deploy Dagster to AWS using Infrastructure-as-Code.

The increasing popularity of public cloud platforms (e.g. AWS, Azure, GCP) to develop, use and maintain IT infrastructures went hand in hand with an increased use of Infrastructure-as-Code (IaC). IaC is the concept of using a programming-based approach to provision and manage IT infrastructure with cloud platform services.

Using IaC has multiple benefits, the main one being the ability to incorporate it into continuous integration/continuous delivery (CI/CD) pipelines. This way, the infrastructure can be deployed automatically after thorough tests have been conducted.

Furthermore, codifying the infrastructure increases reproducibility: using IaC makes it possible to create multiple cloud environments (staging, testing, and so on) with the same configuration. In this project, Terraform was used as the IaC tool to deploy the IT infrastructure for the Dagster module in AWS.

Terraform is an open-source infrastructure-as-code tool that helps with the deployment and management of hundreds of cloud services. Terraform allows resources to be created in a concise manner using declarative resource blocks: instead of writing ‘how it should be done’, Terraform code expresses ‘what should be done’. The following example illustrates the deployment of an S3 bucket in an AWS environment.
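The original snippet is missing in this copy; a minimal sketch of such a resource block (the region and bucket name are placeholders) could look like this:

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

provider "aws" {
  region = "eu-west-1" # placeholder region
}

# Declarative style: describe the bucket that should exist,
# and Terraform works out the API calls needed to make it so.
resource "aws_s3_bucket" "dagster_storage" {
  bucket = "my-dagster-storage-bucket" # bucket names must be globally unique
}
```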

In order to create the Terraform Dagster module, an appropriate cloud architecture should be chosen. This section discusses the Dagster module’s cloud architecture and its different components in AWS.

The average cost of this Dagster environment in AWS (minimal setup) is illustrated below:

Estimated minimal cost of the Dagster deployment on AWS.

This module was thoroughly tested with automated tests to check that no errors occurred during the deployment of the defined AWS architecture. Furthermore, additional tests were written to make sure all services were healthy and working as intended. These functionality tests consisted of the following checks:

For now, only the syncing pipeline is present, and the workspace configuration file looks as follows:
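The original file contents were lost; with Dagster's standard workspace.yaml syntax, and assuming the syncing pipeline lives in a file called sync_pipeline.py (the name is a guess), it would look roughly like this:

```yaml
load_from:
  - python_file:
      relative_path: sync_pipeline.py
```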

The dagster.yaml file holds the additional, default configuration necessary for a successful Dagster deployment. These configuration files can be used together with the Terraform module to create the AWS infrastructure as defined earlier.
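The exact contents depend on the chosen architecture and were not preserved here. As a sketch, assuming the module provisions a Postgres database for Dagster's run storage (a common choice for cloud deployments, not confirmed by the original text), the file could contain something like:

```yaml
# dagster.yaml (sketch; connection details are read from the environment)
run_storage:
  module: dagster_postgres.run_storage
  class: PostgresRunStorage
  config:
    postgres_db:
      hostname:
        env: DAGSTER_PG_HOST
      username:
        env: DAGSTER_PG_USER
      password:
        env: DAGSTER_PG_PASSWORD
      db_name: dagster
      port: 5432
# event_log_storage and schedule_storage are configured analogously.
```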

Dagit will run continuously and will be accessible via the internet, allowing users to see and manage all their data pipelines, schedules, sensors, and so on. Since managing data pipelines implies creating new pipelines or adjusting existing ones, the module should support this. Newly created or updated pipelines should be added to the S3 storage bucket. Furthermore, the workspace.yaml configuration file in S3 should be updated so that the running Dagit web server knows where to look for the newly created pipeline. The updated configuration file will then look as follows:
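The original listing is gone; extending the workspace sketch from above with a hypothetical new pipeline file, it would become:

```yaml
load_from:
  - python_file:
      relative_path: sync_pipeline.py
  - python_file:
      relative_path: my_new_pipeline.py # hypothetical newly added pipeline
```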

Finally, the user should use the Dagit web server to launch a run of the syncing pipeline. This pipeline has one sole purpose: to sync files from the S3 storage bucket to the shared volume used by both Dagster containers. By running this pipeline, the updated files become present in the shared volume and accessible to both Dagster containers. As a result, the newly created data pipeline shows up in the Dagit UI. This enables users to manage multiple data pipelines, create new ones, or adjust existing ones.

This article presented a short introduction to Dagster, using an example data pipeline. Next to that, the concepts of infrastructure-as-code and Terraform were introduced. The following section illustrated and discussed the different components of the proposed AWS architecture, needed for a successful Dagster deployment. Finally, instructions on how to manage data pipelines using the AWS Dagster module were given.

We thank you for reading this article and hope it helped you to understand the deployment of Dagster to the cloud. Similar to this module, we also created a Terraform module for deploying Dagster to Azure. More information on both modules can be found in one of the following places:
