Implementation of DataOps With DataBrew

AWS Glue DataBrew is a new visual data preparation tool that makes it easy for data analysts and data scientists to clean and normalize data.

By Ankur Srivastava · Jan. 03, 24 · Tutorial

Organizations nowadays are looking beyond mere economics and leveraging cloud-native capabilities to instill stability, scalability, accuracy, and speed in their applications. 

Many are also working out the best strategy for modernizing their legacy applications and building an advanced, automated DataOps platform.

One of the biggest challenges they face is the time consumed by data preparation and validation, which drives up cost unexpectedly and lowers data quality and accuracy.

To address these challenges and deliver cloud transformation solutions, we are leveraging DataOps with DataBrew.

What Is DataBrew?

AWS Glue DataBrew is a new visual data preparation tool that makes it easy for data analysts and data scientists to clean and normalize data to prepare it for analytics and machine learning. 

According to AWS, DataBrew can reduce the time it takes to prepare data for analytics and machine learning (ML) by up to 80 percent compared to custom-developed data preparation. You can choose from over 250 ready-made transformations to automate data preparation tasks such as filtering anomalies, converting data to standard formats, and correcting invalid values.
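To make those three kinds of transformation concrete, here is a minimal plain-Python sketch of a reusable "recipe" applied to a few rows. It is not DataBrew's API: the field names, the sample rows, and the helper functions are all hypothetical and only illustrate what filtering anomalies, standardizing formats, and correcting invalid values mean in practice.

```python
from datetime import datetime

# Hypothetical sample rows; the field names are illustrative only.
RAW_ROWS = [
    {"order_id": "A1", "order_date": "03/01/2024", "amount": "120.50", "country": "us"},
    {"order_id": "A2", "order_date": "2024-01-04", "amount": "-5", "country": "US"},
    {"order_id": "A3", "order_date": "05/01/2024", "amount": "n/a", "country": "In"},
]


def standardize_date(row):
    """Convert DD/MM/YYYY dates to ISO 8601; leave other values unchanged."""
    try:
        parsed = datetime.strptime(row["order_date"], "%d/%m/%Y")
    except ValueError:
        return row
    return {**row, "order_date": parsed.date().isoformat()}


def correct_invalid_amount(row):
    """Parse the amount as a number; replace unparseable values with None."""
    try:
        amount = float(row["amount"])
    except ValueError:
        amount = None
    return {**row, "amount": amount}


def normalize_country(row):
    """Bring country codes to a single upper-case format."""
    return {**row, "country": row["country"].upper()}


def is_not_anomaly(row):
    """Treat negative amounts as anomalies; keep missing or non-negative ones."""
    return row["amount"] is None or row["amount"] >= 0


# The "recipe": an ordered, reusable list of transformation steps.
RECIPE = [standardize_date, correct_invalid_amount, normalize_country]


def apply_recipe(rows, steps, keep=is_not_anomaly):
    """Run every step on every row, then drop rows flagged as anomalies."""
    cleaned = []
    for row in rows:
        for step in steps:
            row = step(row)
        if keep(row):
            cleaned.append(row)
    return cleaned


if __name__ == "__main__":
    for cleaned_row in apply_recipe(RAW_ROWS, RECIPE):
        print(cleaned_row)
```

Because the recipe is just an ordered list of steps, the same list can be applied to any dataset with the same columns, which is the reuse idea that saved DataBrew recipes provide without custom code.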

After your data is ready, you can immediately use it for analytics and machine learning projects. You only pay for what you use, with no upfront commitment.

The following are its capabilities:

Data Profiling

Evaluate the quality of your data by profiling it to understand data patterns and detect anomalies.
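As a rough illustration of what a column profile contains, the sketch below computes a few basic statistics and flags outliers with a simple z-score rule. This is an assumption-laden stand-in, not how DataBrew computes its profiles; the function name, the threshold, and the sample values are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def profile_column(values, z_threshold=2.0):
    """Summarize one column: counts, missing values, distinct values, outliers."""
    present = [v for v in values if v is not None]
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    profile = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "most_common": Counter(present).most_common(1),
        "outliers": [],
    }
    # A z-score needs at least two numeric values and a non-zero spread.
    if len(numeric) >= 2:
        mu, sigma = mean(numeric), pstdev(numeric)
        if sigma > 0:
            profile["outliers"] = [
                v for v in numeric if abs(v - mu) / sigma > z_threshold
            ]
    return profile


if __name__ == "__main__":
    # Hypothetical column with one missing value and one suspicious reading.
    sample = [10, 12, 11, 13, 12, 11, 10, 12, None, 95]
    print(profile_column(sample))
```

A z-score rule is only one of many ways to detect anomalies; it is used here because it is short and easy to verify by hand.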

Data Lineage

Visually map the lineage of your data to understand the various data sources and transformation steps that the data has been through.

Data Cleaning Automation

Automate data cleaning and normalization tasks by applying saved transformations.

Solution Overview

The proposed solution uses the DataBrew service with a DataOps implementation model. At its heart, DataBrew runs inside an orchestration framework driven by AWS Step Functions.
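One way such an orchestration could be expressed is as an Amazon States Language definition that runs two DataBrew jobs in sequence through the Step Functions service integration for DataBrew. The sketch below builds that definition as a Python dictionary. The state names and job names are hypothetical, and the article does not specify the actual state machine, so treat this as an assumption about its general shape.

```python
import json

# Step Functions service integration for DataBrew; the ".sync" suffix makes
# the state wait until the job run finishes before moving on.
DATABREW_RUN_JOB = "arn:aws:states:::databrew:startJobRun.sync"


def build_state_machine_definition(raw_job_name, warehouse_job_name):
    """Return an Amazon States Language definition as a plain dict.

    Both job names are hypothetical placeholders for DataBrew recipe jobs
    that already exist in the account.
    """
    catch_all = [{"ErrorEquals": ["States.ALL"], "Next": "PipelineFailed"}]
    return {
        "Comment": "Sketch: run two DataBrew jobs in sequence",
        "StartAt": "PrepareRawData",
        "States": {
            "PrepareRawData": {
                "Type": "Task",
                "Resource": DATABREW_RUN_JOB,
                "Parameters": {"Name": raw_job_name},
                "Catch": catch_all,
                "Next": "PrepareForWarehouse",
            },
            "PrepareForWarehouse": {
                "Type": "Task",
                "Resource": DATABREW_RUN_JOB,
                "Parameters": {"Name": warehouse_job_name},
                "Catch": catch_all,
                "End": True,
            },
            "PipelineFailed": {
                "Type": "Fail",
                "Error": "DataBrewJobFailed",
                "Cause": "A DataBrew job run did not complete successfully",
            },
        },
    }


if __name__ == "__main__":
    definition = build_state_machine_definition("raw-prep-job", "warehouse-prep-job")
    print(json.dumps(definition, indent=2))
```

The resulting JSON could be supplied as the definition when creating a state machine. Independent jobs could instead be placed in a `Parallel` state to run side by side.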

Data Flow

  1. The document upload engine uploads the files/documents from the source system repositories to the designated S3 bucket.
  2. In this event-based architecture, the S3 upload raises an event notification that invokes the Lambda function, which starts the orchestration process.
  3. Step Functions orchestrates the step-by-step process, running steps in series or in parallel. CloudWatch is used for monitoring and alerting, and CloudTrail is used for audit logging.
  4. DataBrew runs the recipe (the defined set of transformations) on the data files received from the source. The refined and processed data is then stored in another S3 bucket.
  5. After all data checks have been performed, the processed data is transformed again using DataBrew and stored in Redshift for reporting.
  6. QuickSight connects to the Redshift database to provide self-service reports/dashboards to the end customers.
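Step 2 of the flow above could look roughly like the following Lambda handler, which reads the uploaded object's location from the S3 event and starts a Step Functions execution. This is a sketch under assumptions: the environment variable name `STATE_MACHINE_ARN` and the shape of the execution input are hypothetical, and the client is injectable so the logic can be exercised without AWS access.

```python
import json
import os
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        objects.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return objects


def handler(event, context, sfn_client=None):
    """Start one state machine execution per uploaded object."""
    if sfn_client is None:
        import boto3  # imported lazily so the module loads without boto3

        sfn_client = boto3.client("stepfunctions")

    # Hypothetical environment variable holding the state machine ARN.
    state_machine_arn = os.environ["STATE_MACHINE_ARN"]

    started = []
    for bucket, key in extract_s3_objects(event):
        response = sfn_client.start_execution(
            stateMachineArn=state_machine_arn,
            input=json.dumps({"bucket": bucket, "key": key}),
        )
        started.append(response["executionArn"])
    return {"started": started}
```

Passing the bucket and key as execution input lets later states decide which dataset to process. How that input is wired into the DataBrew jobs depends on how the datasets and jobs are defined, which the article does not describe.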

The Various Capabilities of the Solution

  • Ability to effectively perform data preparation activities. 
  • Reusability of the effective recipes created in DataBrew.
  • A large number of available ready-made transformations will save us time in data preparation.
  • Cost-effective solution based on serverless architecture.
  • DataOps-driven automated framework will provide the fully integrated skeleton for reuse.
  • Efficiency and effectiveness in data preparation.
  • Ability to handle and redirect PII data.
  • Event-based trigger for pipeline processing.
  • Integrated with a monitoring mechanism for timely alerts.
  • Less manual intervention in the fully integrated solution.
  • End-to-end data delivery using cloud-native AWS services provides scalability and cost-effectiveness.

Benefits

  • Process efficiency: Increases overall efficiency in data preparation by up to 80%
  • Effort optimization: A 30-40% reduction in the involvement of in-house teams required for data preparation activities.
  • 250+ ready-made transformations to choose from for data preparation tasks.

Industrial Usage

DataBrew with DataOps has benefits across industries, because most operational processes depend on efficient data preparation. Examples include monthly sales analysis in manufacturing, medical records used to predict upcoming health challenges in health care, and real-time identification of TRP-driven content in media. Overall, the solution delivers cloud transformation at scale with the fast data preparation most organizations need, and building it on AWS services offers quick wins, optimized cost, and high scalability.
