The Benefits of Open-Source ELT

Discover the benefits of open-source ELT for data integration: greater control, more efficient processing, cost savings, and growing adoption across the industry.

By John Lafleur · Feb. 27, 23 · Opinion


Open-source technology is becoming increasingly popular in the data integration industry, and for good reasons. Open source creates the right incentives, allowing users to own their data entirely, unlike closed source, where you build knowledge in a proprietary tool with a price tag. Open source also creates communities around common problems, allowing for the exchange of valuable knowledge and collaborative problem-solving. 

In this article, we will start by investigating the reasons behind open source's adoption success before delving deeper into the data integration industry, focusing specifically on open-source vs. closed-source ELT (Extract, Load, Transform) solutions. We will discuss how open-source ELT allows for greater control over the data integration process, more efficient data processing, and cost savings for organizations. Additionally, we will explore the growing trend of open-source ELT adoption in the industry and examine the future of open-source data integration.

If you're ready to consider open source, Airbyte is a great place to start. Its platform solves the long tail of connectors that closed-source solutions often neglect. We’ll explore its easy-to-use Connector Development Kit and more.

Why Open Source: From Visibility To Open Standards and Flexible Deployment Options

Open source means visibility and flexibility. No single organization can solve every data problem in an ever-growing data ecosystem, so open source is the way to tackle the challenge collaboratively and sustainably: data tools and frameworks get created once for everyone, following the DRY principle.

Open source allows fast iteration, as different companies use the same tools, report bugs, and sometimes fix them for everyone else. The best example is security patches, which must be shipped quickly.

With open source, you are in full control: you process your data through a fully open system whose code is saved and version controlled, for full transparency.

You know the alternative: a custom-built tool at your employer whose original author left a couple of years ago, or a closed-source solution that is missing a critical feature or connector you cannot add yourself, even though you would have the skills.

Open source also creates communities around a common problem. You can exchange valuable knowledge and find solutions collaboratively. You are no longer alone in fighting these problems; suddenly, you have peers at the same stage, just in a different company.

Besides the community, open source creates open standards that are crucial for cross-company integration efforts. With many closed-source vendors, it's hard to agree on standards: code is hidden, and everyone wants to be the standard.

Lastly, open source gives you flexible deployment options. Because the code is open, you can deploy it on-premises in your own infrastructure if you handle sensitive data or work in heavily regulated sectors such as health care or banking. Open source also helps tremendously with security and GDPR, since open-source ELT lets you use patterns like EtLT, where a light transform (for example, masking personal data) runs before loading.
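As a rough sketch of the EtLT idea (all field and function names here are hypothetical), a small transform such as masking personal data runs between extract and load, while heavy modeling still happens later in the warehouse:

```python
import hashlib

def light_transform(record: dict) -> dict:
    """The small 't' in EtLT: mask PII before it ever lands in the warehouse."""
    masked = dict(record)
    if "email" in masked:  # hypothetical PII field
        masked["email"] = hashlib.sha256(masked["email"].encode()).hexdigest()[:12]
    return masked

# Extract -> light transform -> load; the heavy 'T' stays post-load.
records = [{"id": 1, "email": "jane@example.com", "amount": 42}]
loaded = [light_transform(r) for r in records]
```

The destination only ever sees the masked value, which is what makes this pattern attractive for GDPR-sensitive pipelines.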

Why NOT Open-Source?

Although open source is an appreciated buzzword, it can be overwhelming at first if your audience is not engineers. The community is one key argument for open source; if your developers do not overlap with that community, the benefits are smaller. If you have little need for customization and have simple use cases, it can be better to use a standardized closed-source solution and pay for it. Open source requires a lot of education; if that piece of software is outside the core of your value proposition, it might be better not to use open source.

But with the above considerations in mind, remember that with closed source, you are building knowledge in a proprietary tool rather than something generic and easily transferable (e.g., coding in Python). It's powerful for a simple pipeline, but it isn't easy to extend and maintain as it grows. It takes work to follow software engineering best practices like testing and versioning, and licensing is usually rather expensive.

What About Open-Source ELT?

Let's briefly recap what ELT (Extract, Load, Transform) stands for. ELT contrasts with the more traditional ETL approach to data integration, in which data is transformed before it arrives at the destination.

Read More About the Differences Between ETL and ELT
ETL and ELT are two paradigms for moving data from one system to another. We detail the comparison, including diagrams, in our Data Glossary entry on ETL vs. ELT.

The ETL approach was once necessary because of the high costs of on-premises computation and storage. With the rapid growth of cloud-based data warehouses such as Snowflake and the plummeting price of cloud-based computation and storage, there is less reason to keep transforming data before loading it into the final destination.

Indeed, flipping the two steps enables analysts to work more autonomously and supports agile decision-making: they develop insights based on the data that exists, instead of having to come up with ideas beforehand, define schemas, and transform up front.

ETL has several disadvantages compared to ELT. Generally, only transformed data is stored in the destination system, so analysts must know beforehand every way they will use the data and every report they will produce, which creates slower development cycles.

Changes to requirements can be costly, often resulting in re-ingesting data from source systems. Every transformation performed on the data may obscure some underlying information, and analysts only see what was kept during the transformation phase. 

Building an ETL-based data pipeline is often beyond the technical capabilities of analysts. By contrast, ELT solutions tend to be simpler to understand.

ELT promotes data literacy across a data-driven company: with cloud-based business intelligence tools, everyone in the company can explore and create analytics on all the data. Dashboards become accessible even to non-technical users.
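To make the contrast concrete, here is a minimal ELT sketch, with sqlite3 standing in for a cloud warehouse and made-up table and column names: raw data is loaded untouched, and the transform is just SQL that an analyst can change at any time without re-ingesting anything:

```python
import sqlite3

# Stand-in "warehouse" (hypothetical; sqlite3 in place of Snowflake/BigQuery).
wh = sqlite3.connect(":memory:")

# E + L: load raw source rows untransformed -- nothing is discarded.
wh.execute("CREATE TABLE raw_orders (id INTEGER, status TEXT, amount REAL)")
wh.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)",
               [(1, "paid", 10.0), (2, "refunded", 5.0), (3, "paid", 7.5)])

# T: the transform lives in the warehouse as SQL, applied after loading.
wh.execute("""CREATE VIEW paid_revenue AS
              SELECT SUM(amount) AS total FROM raw_orders WHERE status = 'paid'""")
total = wh.execute("SELECT total FROM paid_revenue").fetchone()[0]
print(total)  # 17.5
```

If requirements change (say, refunds should count as negative revenue), only the view is rewritten; the raw rows are still there, which is exactly the ELT advantage described above.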

ELT/ETL Tool Comparison
Need to find the best data integration tool for your business? Which platform integrates with your data sources and destinations? Which one provides the features you're looking for? We made it simple for you and collected the actors in a spreadsheet comparing them all, along with an extensive, detailed comparison in Top ETL Tools Compared in Detail.

Why Airbyte?

Airbyte is the open-source platform that unifies data integration with 300+ connectors (and growing fast) to tackle the long tail of connectors, the largest connector catalog in the industry. Over the past year and a half, more than 35,000 companies have used Airbyte to sync data from sources such as PostgreSQL, MySQL, Facebook Ads, Salesforce, and Stripe to destinations that include Redshift, Snowflake, Databricks, and BigQuery.

Most closed-source companies stagnate at around 150 connectors, because the hardest part is not building the connectors; it is maintaining them. That is costly, and any closed-source solution is constrained by ROI (return on investment) considerations. As a result, ETL suppliers focus on the most popular integrations, yet companies adopt more tools every month, and the long tail of connectors needs to be addressed.

When it comes to the cost of ownership, Airbyte shines in the long run. Closed-source solutions grow more and more expensive over time as more edge cases emerge that aren't supported. Besides paying for the connectors, you also need to maintain an in-house team to create non-supported but essential connectors. Airbyte and open-source ELT make data integration future-proof as you get both in one with a wide variety of out-of-the-box connectors, plus an easy way to extend or create custom connectors.

Furthermore, if you can't find an ELT connector that suits your requirements, Airbyte makes it easy to build one with the Airbyte CDK (Connector Development Kit), which generates about 75% of the code required. Here is the complete list of connectors currently available for Airbyte; templates are included for building new connectors in Java or Python.
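To give a feel for what the CDK scaffolds, here is a deliberately simplified, library-free sketch of the connector pattern. The interface below is hypothetical and much smaller than the real Airbyte CDK, which also handles schemas, auth, and pagination, but the shape is the same: a source checks its connection, declares its streams, and yields records per stream:

```python
from typing import Dict, Iterator, List

class Source:
    """Hypothetical minimal connector interface (not the actual CDK API)."""
    def check_connection(self) -> bool:
        raise NotImplementedError
    def streams(self) -> List[str]:
        raise NotImplementedError
    def read(self, stream: str) -> Iterator[Dict]:
        raise NotImplementedError

class InMemorySource(Source):
    """A toy source; a real connector would wrap an API or database here."""
    def __init__(self, data: Dict[str, List[Dict]]):
        self.data = data
    def check_connection(self) -> bool:
        return True
    def streams(self) -> List[str]:
        return list(self.data)
    def read(self, stream: str) -> Iterator[Dict]:
        yield from self.data[stream]

src = InMemorySource({"customers": [{"id": 1, "name": "Ada"}]})
rows = list(src.read("customers"))
```

Because every connector exposes the same small surface, the platform can schedule, monitor, and route any of them uniformly, which is what makes maintaining hundreds of connectors tractable.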

Airbyte offers robust pre-built features that your engineers would otherwise need to build themselves. You can configure replications to meet your needs: schedule full-refresh, incremental, and log-based CDC replications across all your configured destinations.
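As an illustration of the incremental mode (the helper and field names below are hypothetical), a sync keeps a cursor in its saved state and only forwards rows newer than that cursor, so repeated runs do no redundant work:

```python
def incremental_sync(source_rows, state):
    """Hypothetical incremental sync: 'updated_at' acts as the cursor field."""
    cursor = state.get("updated_at", 0)
    # Only rows past the saved cursor are replicated on this run.
    new_rows = [r for r in source_rows if r["updated_at"] > cursor]
    if new_rows:
        # Advance the cursor so the next run skips everything seen so far.
        state["updated_at"] = max(r["updated_at"] for r in new_rows)
    return new_rows, state

rows = [{"id": 1, "updated_at": 100}, {"id": 2, "updated_at": 200}]
state = {}
first, state = incremental_sync(rows, state)   # first run: both rows sync
second, state = incremental_sync(rows, state)  # second run: nothing new
```

Full refresh simply ignores the cursor and reloads everything, while log-based CDC reads the database's change log instead of querying a cursor column; the state-plus-cursor idea sketched here is the common thread.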

What’s Next for Open-Source ELT?

As we've seen, open-source ELT is rapidly gaining popularity in the data ecosystem and the data integration industry precisely due to its numerous benefits. The increased transparency, openness, and customizability allow for faster interactions and more efficient problem-solving, making open source an ideal solution for businesses of all sizes.

As the industry continues to evolve and data becomes an even more integral part of business operations, it is no surprise that open-source ELT is the future of data integration. Companies that take advantage of these solutions will be better equipped to handle the demands of a data-driven world in the long term. Collaboration and knowledge-sharing within communities also allow for more efficient problem-solving and innovation.


Published at DZone with permission of John Lafleur. See the original article here.

Opinions expressed by DZone contributors are their own.
