Latest Posts

Xplenty: The AWS Solution Architect's Secret Weapon

AWS Solution Architects are in red-hot demand, and the AWS Certification is the highest-paying certification in the United States. As a Solution Architect for Amazon Web Services, you wear many hats: you're a problem-solver, a creative genius, a multitasker, and a big-picture thinker. And you design AWS implementations better than anyone else you know. But there are some things about being an AWS Solution Architect that aren't so rosy. Amazon's ever-changing recommendations.

How to Offload ETL from Redshift to Xplenty

Amazon Redshift is great for real-time querying, but it's not so great for handling your ETL pipeline. Fortunately, Xplenty has a highly workable solution. Xplenty can be used to offload ETL from Redshift, saving resources and allowing each platform to do what it does best: Xplenty for batch processing and Redshift for real-time querying. Redshift is Amazon’s data warehouse-as-a-service, a scalable columnar DB based on PostgreSQL.

5 Customer Data Integration Best Practices

For the last few years, you have heard the terms "data integration" and "data management" dozens of times. Your business may already invest in these practices, but are you benefitting from this data gathering? Too often, companies hire specialists, collect data from many sources and analyze it for no clear purpose. And without a clear purpose, all your efforts are in vain. You can take in more customer information than all your competitors and still fail to make practical use of it.

Protecting Personal Data: GDPR, CCPA, and the Role of ETL

The growth of data has been exponential. By 2023, it's anticipated that approximately 463 exabytes (EB) will be created every day. To put this into perspective, one exabyte is a unit equivalent to 1 billion gigabytes. By 2021, 320 billion emails will be sent daily, many of which contain personal information. Data collected around the globe contains the type of information that businesses leverage to make more informed decisions.

Using Xplenty with Parquet for Superior Data Lake Performance

Building a data lake in Amazon S3 and querying it from a Redshift cluster with Amazon Redshift Spectrum is a common practice. However, when it comes to boosting performance, there are some tricks worth learning. One of those is storing data in Parquet format, which Amazon recommends as a best practice. Here's how to use Parquet format with Xplenty for the best data lake performance.

What Is a Data Stack?

These days, there are two kinds of businesses: data-driven organizations and companies that are about to go bust. And often, the only difference is the data stack. Data quality is an existential issue: to survive, you need a fast, reliable flow of information. The data stack is the entire collection of technologies that make this possible. Let's take a look at how any company can assemble a data stack that's ready for the future.

Introducing Component Previewer

The component previewer is a feature that allows you to preview your data at each component step without having to validate packages and run full-scale production jobs. It gives you the ability to extract, transform and preview your data on any transformation component, allowing you to debug your pipeline and/or to confirm and validate your data flow logic. Component previews are similar to the data previews available on source components, which you might already be familiar with.

Scheduling With Cron Expressions in Xplenty

One of the most requested features in a data integration tool is greater flexibility around the scheduling of packages and workflows. With Xplenty, this can be achieved through our Cron Expression scheduling feature. Cron is a time-based job scheduler found on Unix-like operating systems such as Linux. You can create cron jobs, which execute a script or command at a time of your choosing. Cron has broad applications for tasks that need time-based automation.
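
To make the idea of a cron expression concrete, here is a minimal Python sketch that checks whether a moment in time matches a standard five-field Unix expression (minute, hour, day of month, month, day of week). It is an illustration only, not Xplenty's implementation: the function names `expand_field` and `cron_matches` are hypothetical, Xplenty's own expression format may differ, and the sketch requires every field to match, which is simpler than classic cron's handling of the two day fields.

```python
from datetime import datetime

# Allowed (min, max) for the five standard fields:
# minute, hour, day of month, month, day of week (0 = Sunday).
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def expand_field(field, lo, hi):
    """Expand one field such as '*/15', '1-5' or '0,30' into a set of ints."""
    values = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        step = int(step) if step else 1
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = end = int(base)
        if not (lo <= start <= end <= hi):
            raise ValueError(f"field {field!r} is outside {lo}-{hi}")
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expression, moment):
    """Return True if `moment` satisfies every field of `expression`.

    Simplification (an assumption of this sketch): all five fields must
    match, whereas classic cron treats day-of-month and day-of-week as
    alternatives when both are restricted.
    """
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected five space-separated fields")
    minute, hour, dom, month, dow = (
        expand_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_RANGES)
    )
    return (
        moment.minute in minute
        and moment.hour in hour
        and moment.day in dom
        and moment.month in month
        and moment.isoweekday() % 7 in dow  # map Python's Sunday (7) to cron's 0
    )
```

For example, `cron_matches("*/15 9-17 * * 1-5", datetime(2024, 1, 1, 9, 30))` returns `True`, because 1 January 2024 is a Monday and 09:30 falls on a quarter-hour within the 9-to-17 hour window; the same expression does not match 09:30 on Saturday, 6 January 2024.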

How to Check CloudFront Logs for Big Data Collection

AWS provides many solutions for managing business data. There’s Amazon Relational Database Service, or Amazon RDS, which is ideal for scaling your databases in the cloud. There’s Amazon Redshift for warehousing your data. For collecting big data, we’ve looked at a number of modern data integration platforms, but Amazon CloudFront is more of a content delivery platform. So, why are we talking about CloudFront in terms of big data right now?