SQLServerCentral - www.sqlservercentral.com

A community of more than 1,600,000 database professionals and growing

Featured Contents

The Voice of the DBA

Self-Healing ETL

The process of building ETL flows is cumbersome and time consuming. Many consultants have made a lot of money by building ETL processes that manage the movement of data in and out of databases, including to data warehouses. There is a lot of importance placed on consistent, quick, and accurate data loading. In fact, one of the most popular job needs these days is still for ETL developers. Even as data science grows and becomes popular, quite a bit of the work in data science is data preparation, which often involves ETL, or ELT.

One of the struggles with an ETL process is the tedious nature of building lots of flows that are very similar. There may be a wide variety of ways to get data to move, especially as we have a crazy set of inconsistent formats in which data appears, but once we have the flow, we often repeat that for many different files or tables. I think Biml makes this better, but it's still not simple, and there can be plenty of issues that still arise as the source inputs change.

There was an interesting article that describes the use of AI to assist in ETL flows. While there is some basic matching up of source and target data right now, what if AI would add a few things. The idea of looking at the data you've matched and suggesting alterations that can potentially help with data quality is interesting. The system would have to learn what issues occur over time and what good data looks like, but perhaps this could even extend when the source data gets slightly mangled in formatting. Perhaps detecting fixed width fields that have grown and shifted all columns would be possible, and even adjust the data flow.

I am actually more interested in detecting PII data and alerting developers, or even operations people that the ETL process needs to be changed or secured in some way. AI should be able as well to help Operations people with scheduling, even pausing or moving ETL work to other machines to better scale the flow. My view would be that any AI system ought to be looking for anomalies and detecting those, as pattern recognition is one thing AI is good at.

I'd think this would be more of an alert capability as I'm not sure I trust an AI system to actually adjust data. ETL flows are often too important to have data mangled in an unexpected way. Perhaps that's too cautious, as that's what developers do all the time. They mandle data, and we find ways to recover. Ultimately, I think that having lots of logging on what a developer or AI system does is more important. This might be especially true if voice commands are used to build the flows. I could see lots of ambiguities and mistakes from business analysts that are trying to describe how to build an ETL flow.

Steve Jones from SQLServerCentral.com

Join the debate, and respond to today's editorial on the forums


The Voice of the DBA Podcast

Listen to the MP3 Audio ( 4.2MB) podcast or subscribe to the feed at iTunes and Libsyn. feed

The Voice of the DBA podcast features music by Everyday Jones. No relation, but I stumbled on to them and really like the music.

ADVERTISEMENT
RGUni

Redgate University

Self-paced online training courses with easy to follow classes on getting started, exploring advanced features, and making the most of Redgate products. Learn the fundamentals, best practices, and top tips from the experts - Redgate’s Microsoft Data Platform MVPs and engineers.
Start Learning

Database DevOps

Benchmark your Database DevOps maturity level

Get a better understanding of how advanced your current processes are, receive recommendations for improvements, and see how your maturity level compares with that of your peers. Complete the Database DevOps Maturity Assessment

Featured Contents

 

HTTP Requests Using SQLCLR

Eilert Hjelmeseth from SQLServerCentral.com

Query web APIs directly in SQL Server with its CLR integration More »


 

Fifty Different Ways to Enact Data Privacy Laws?

Additional Articles from SimpleTalk

Now that the GDPR has gone into effect, many are wondering what will happen in the United States. California is the first state to enact similar legislation, called the California Consumer Privacy Act of 2018 (CCPA). In this article, William Brewer explains the history of the law, what it means for companies doing business with California residents, and how it compares to the GDPR. More »


 

Collecting Data with Elastic Database Jobs in SQL Azure

Additional Articles from MSSQLTips.com

In this tip we look at how to collect data for Elastic Database Jobs and easily store the data in a central location for analysis and review. More »


 

From the SQLServerCentral Blogs - The SQL Agents’ view of SQL Server

david.fowler 42596 from SQLServerCentral Blogs

I recently came across a really odd issue with the SQL Agent, there were two agent jobs attached to one... More »


 

From the SQLServerCentral Blogs - Power BI Dashboards, Reports, and Datasets with the SQL Chefs

Bert Wagner from SQLServerCentral Blogs

This week I had the opportunity to learn about dashboards, reports, and datasets in Power BI using the best kind... More »

Question of the Day

Today's Question (by Evgeny Garaev):

What is the new property for the COLUMNPROPERTY() function that appeared in SQL Server 2016?

Think you know the answer? Click here, and find out if you are right.


We keep track of your score to give you bragging rights against your peers.
This question is worth 1 point in this category: T-SQL.

We'd love to give you credit for your own question and answer.
To submit a QOTD, simply log in to the Contribution Center.

ADVERTISEMENT

The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win

The company's new IT initiative, code named Phoenix Project, is critical to the future of Parts Unlimited, but the project is massively over budget and very late. The CEO wants Bill to report directly to him and fix the mess in ninety days or else Bill's entire department will be outsourced.  Get your copy from Amazon today.

Yesterday's Question of the Day

Yesterday's Question (by Steve Jones):

I have a data frame that contains the top ten home run  hitters in baseball. The data frame looks like this:

 > HR.hitters rank players Hr 1 1 Barry Bonds 762 2 2 Hank Aaron 755 3 3 Babe Ruth 714 4 4 Alex Rodriguez 696 5 5 Willie Mays 660 6 6 Ken Griffey, Jr. 630 7 7 Albert Pujols 619 8 8 Jim Thome 612 9 9 Sammy Sosa 609 10 10 Frank Robinson 586 

I want to get just the all time ranking and the number of home runs. How can I do this in R to return this data?

 rank Hr 1 1 762 2 2 755 3 3 714 4 4 696 5 5 660 6 6 630 7 7 619 8 8 612 9 9 609 10 10 586

Answer: HR.hitters[1-3]

Explanation:

I can use brackets and number the columns, separated by a dash. I could actually use 

 HR.hitters[1-2-3]

to return all three columns.

Ref: Extract part of a data frame - click here


» Discuss this question and answer on the forums

Database Pros Who Need Your Help

Here's a few of the new posts today on the forums. To see more, visit the forums.

SQL Server 2017 : SQL Server 2017 - Administration

TempDB keeps growing for no reason. - Hi All, Thank you for looking into my question. I have SQL Server 2017 Enterprise server FailOver cluster (No Shared storage though)...

How else can I view last queries run by anyone on an instance of SQL Server or a particular DB - Is there a way / TSQL to see  last <N> queries run by anyone on an instance of SQL Server or...


SQL Server 2017 : SQL Server 2017 - Development

SQL Server 2017, 2012 Express Edition Insert Select Performance Issue with very less data. - Command 1: (Get Count from View1)

More than one check constraint per column? - In the documentation for CREATE TABLE (Transact-SQL), under check constraints, I see: A column can have any number of CHECK constraints,...

Init-cap the characters in SQL server - Hi Sir, How to Init-cap the characters in SQL server in SQL query without creating any function? create table abc (Month_Name nvarchar(10)) [code...


SQL Server 2016 : SQL Server 2016 - Administration

Windows Reboot time in sys.dm_os_sys_info does not match systeminfo - I've got an unusual situation. to get the Operating System Reboottime (NOT the SQL Service restart time) i've used this query...

Connectivity issue with a multisubnet AlwaysOn - Hello, I have a problem with a multisubnet that I can't figure out how to resolve. I have a 2 replicas AlwaysOn...

Not able to see SSIS packages when SQL Instance and Integration Services are on two separate nodes. - I have a three nodes in Failover Clustered Instance 2016 SQL Server. Also I have SSIS installed on all three...


SQL Server 2016 : SQL Server 2016 - Development and T-SQL

Need help in converting Nvarchar to Datetime - Hi Experts, Please help me in Nvarchar to Datetime, my scenario is as below I'm getting data in NVARCHAR and format is...

What are some really dumb things you could write in a SQL Query - We all know there are plenty of recommendations as far as what to avoid when writing T-SQL. For example, it's...

Difference between two join method - Hi guys, Can anybody tell me the difference between the two below examples and why I would use one over the...

Select Query Assistance (Comparison) - I am modifying a Query I have.  (I am using dummy data below as the set i have is quite...

Algorithm to combine 2 integer into one value and then disintegrate back into two - Hi, I may not be the first one to ask this, but I'm looking to find a way to combine two...

Importing from XML to SQL 2014 - This is my 4th day on SQL so I am a newbie. I created a query to import from an...


SQL Server 2014 : Development - SQL Server 2014

Showing all records that appear more than once - Hi everyone I'm trying to write some SQL that will allow me to pull all records where the NUMBER column appears...

How to flip tables in sql server 2014 - Hello, I have a requirement wherein there are 2 tables (Staging & Target) in the same database. Everytime data is first loaded in...


SQL Server 2012 : SQL 2012 - General

Database Management vs Database Administration Meaning - I have been thinking about this terminology and just wanted to get some opinions. My first effort is below:  Database Management...


SQL Server 2012 : SQL Server 2012 - T-SQL

HOW TO GET FIRST DATE - Dear all, I have data like:      NAME                             SPONSOR_NAME     DATE_OF_CALL        A                                    FI


SQL Server 2008 : SQL Server 2008 - General

Information from ALL SQL Servers - Hi   I would need to grab all in


Career : Employers and Employees

Evolution of career from perm/consulting to freelancing? - Hi,    I am working as DBA for 17+years.   permanent -> consultant->  permanent -> consultant ...   at some point I start thinking about...

This email has been sent to newsletter@newslettercollector.com. To be removed from this list, please click here.
If you have any problems leaving the list, please contact the webmaster@sqlservercentral.com.
This newsletter was sent to you because you signed up at SQLServerCentral.com.
Feel free to forward this to any colleagues that you think might be interested.
If you have received this email from a colleague, you can register to receive it here.
This transmission is ©2018 Redgate Software Ltd, Newnham House, Cambridge Business Park, Cambridge, CB4 0WZ, United Kingdom. All rights reserved.
Contact: webmaster@sqlservercentral.com