SQLServerCentral - www.sqlservercentral.com

A community of more than 1,600,000 database professionals and growing

Featured Contents

The Voice of the DBA

Unstructured Data

Is unstructured data a bad term? I saw some data professionals complaining about this, saying all data is structured. That's usually true. A CSV, even a ragged one has structure. XML and JSON have structure, even if it might vary node to node. Certainly our relational tables are structured and some formats can be rigidly mandated between organizations (like EDI). Even data in PDF, Word, MP3, MP4 or other audio/video mediums is structured in that we know the format.

Given that, is it a misnomer to use the term, unstructured, when describing flexible formats, such as XML? Is it OK for a PDF? I have had a presentation called Unstructured Data in SQL Server. This is primarily about FileStream, FileTable, and searching those objects. In the talk, I classify data in known formats as structured. These would be SQL Server tables and similar objects. At any point in time, we know what all data in the table looks like, even though we can have NULLs or missing data in rows.

I call XML and JSON semi-structured formats. We can certainly determine the format for any node or section, but we wouldn't know without querying or examining the data. It's semi-structured in that there is a hierarchy, but the structured from section to section (essentially row to row) can vary. There can even be depths to hierarchies that vary. In many ways, that makes these great formats for flexibility in data exchange.

I tend to view data in Word, PDF, MP4s, as unstructured. We don't necessarily know where the data is, or how to separate it. We can get pages in Word or PDF, but those can vary and don't necessarily help us extract information. They are XML, but the XML tags don't relate to the content, unlike many other XML documents. Scenes or tracks in audio/video files might be separators, but those aren't necessarily helpful in gathering information. Instead, we need other tools that can help deal with that data, finding words, concepts, or more inside of the binary stream.

I like the term unstructured data because it helps me understand where the information is. While the tables in a database might be full of nonsensical information in some rows, or be poorly designed with data combined into text fields, at least I know where the fields are. Actually, in that case, I'd argue the data in varchar(max) text fields is really unstructured. You might disagree, but give me a better term to describe there the information is stored in a data format.

Steve Jones from SQLServerCentral.com

Join the debate, and respond to today's editorial on the forums


The Voice of the DBA Podcast

Listen to the MP3 Audio ( 3.7MB) podcast or subscribe to the feed at iTunes and Libsyn. feed

The Voice of the DBA podcast features music by Everyday Jones. No relation, but I stumbled on to them and really like the music.

ADVERTISEMENT
SQL Clone

SQL Clone: Now supporting databases up to 64TB

Create copies of production databases and SQL backups in seconds and save up to 99% of disk space using SQL Clone. Redgate’s new tool removes much of the time and resource needed to create and manage database copies, allowing teams to work on local environments to develop, test and diagnose issues faster. Try it free

SQL Compare

The industry standard for comparing and deploying SQL Server database schemas

Trusted by 71% of Fortune 100 companies, SQL Compare is the fastest way to compare changes, and create and deploy error-free scripts in minutes. Plus you can easily find and fix errors caused by database differences. Download your free trial

Featured Contents

 

Distributed Computing Principles & SQL-on-Hadoop Systems

Frank A. Banin from SQLServerCentral.com

A look at SQL-On-Hadoop systems like PolyBase, Hive, Spark SQL in the context Distributed Computing Principles and new Big Data system design approach like the Lambda Architecture. More »


 

How to Hire a Junior DBA

Additional Articles from Brent Ozar Unlimited Blog

Brent Ozar covers some questions you need to ask yourself before hiring a Junior DBA. More »


 

SQL Monitor Quick Tip: Using the SQL static code analysis performance rules

This quick tip will review how SQL Monitor 7 has incorporated SQL Code Guard’s built-in set of Performance Rules for static code analysis. These rules are designed to highlight SQL syntax that could potential cause performance problems, and so indicate ways to improve the overall quality and performance of the workload, over time. More »


 

From the SQLServerCentral Blogs - Permissions required for developing with Temporal Tables

Kenneth Fisher from SQLServerCentral Blogs

Temporal tables are one of those new (2016+) cool features that recently came across my desk. Basically, a temporal table... More »


 

From the SQLServerCentral Blogs - And YOU Get a Deadlock and YOU Get a Deadlock and EVERYBODY GETS A DEADLOCK!

Andy Galbraith from SQLServerCentral Blogs

https://memegenerator.net/img/instances/400x/59507872/you-get-a-deadlock-everybody-gets-a-deadlock.jpgWe all get them at one point or another - deadlocks. I have Object 1 and need Object 2, you have Object... More »

Question of the Day

Today's Question (by Steve Jones):

In CosmosDB (as of Oct 2017), I have the chocie of a few different types of data APIs. Which of these is not available in CosmosDB?

Think you know the answer? Click here, and find out if you are right.


We keep track of your score to give you bragging rights against your peers.
This question is worth 1 point in this category: CosmosDB.

We'd love to give you credit for your own question and answer.
To submit a QOTD, simply log in to the Contribution Center.

ADVERTISEMENT

Design and configure SQL Server instances and databases in support of high-throughput applications that are mission-critical and provide consistent response times in the face of variations in user numbers and query volumes. Learn to configure SQL Server and design your databases to support a given instance and workload.

Pick up your copy of this great book today at Amazon today.

Yesterday's Question of the Day

Yesterday's Question (by Steve Jones):

I have a file containing some data for the top NFL Quarterbacks from 2016.  The data is stored in a tab separated format in the file 2016QB.tsv. How can I read this in?

Answer: read.table("2016QB.tsv", sep="\t")

Explanation:

The read.table function will read data from a file. The SEP column is used to list the type of separator. The \t is a tab separator.

Ref: Reading a tab separated file - click here


» Discuss this question and answer on the forums

Database Pros Who Need Your Help

Here's a few of the new posts today on the forums. To see more, visit the forums.

SQL Server 2016 : SQL Server 2016 - Administration

script to verify index fragmentation - Dear all, Can someone help me with t-sql code on how can I see the fragmentation of my indexes (relevant ones),...

Create maintenance Jobs (Data Warehouse) - Dear all, I was requested to create a maintance plan keep our data warehouses working properly. This means that I will create...


SQL Server 2016 : SQL Server 2016 - Development and T-SQL

Nested insert statements problem - Hello all. fairly new to SQL. but not that new. I have 3 stored procedures. The first one has a statement like...

Help with date ranges - Hi all, I'd like to know if anyone can help me with a good suggestion on how to achieve this. I have...


SQL Server 2014 : Development - SQL Server 2014

How to only select 11pm to 7am in DateTime (many months of data) DDM&DDL included :) - Looking to run a report for a few years but only between 11pm and 7am. CREATE TABLE mytable(  i_ticket_id  INTEGER NOT...

Set up job to convert word document to pdf - I'm working on a SQL job to transfer some data along with some word documents. After copying the word documents...

Trying to speed up this 'GetWorkingDays' function. - I'm trying to redesign one of our working days functions. The idea is to calculate weekends (easy enough) but holidays...


SQL Server 2012 : SQL 2012 - General

Out of memory exception - If I run DBCC MEMORYSTATUS, it fails to finish and throws this error. An error occurred while executing batch. Error message...

User Not able to access Schema owned tables - Hi Experts, Need a Small information about user access. one of our users is in a group and the group is...


SQL Server 2012 : SQL Server 2012 - T-SQL

SSIS : "The value could not be converted because of a potential loss of data.". - Dear all, I have a data conversion task and I am getting the following error message: while converting column "VALUE_IN_FUND_CURR" (488) to...

Compare 2 rows using tsql code - Hi All, A questions which is there with me a long long time. Comparing 2 rows in a table. How...


SQL Server 2008 : SQL Server 2008 - General

Select vaue column for one date and another for another date - Hi Masters, how can I select a value from a column for a date and another value from another column...

Parameter in sp_delete_jobsteplog don't work - Hi, in our organisation we normally set up maintenance plans for backups and so on. As we cannot deploy packages...


Cloud Computing : SQL Azure - Administration

How to continue our SQL agent scheduled daily backup in Azure? - Hi, we just migrated our SQL database to Azure. However, I have no idea how to continue our SQL agent scheduled...


Programming : Powershell

Getting started with Powershell - Hi all I'm just getting started with PowerShell and following the series by Ben Miller here.  I'm trying to do things...


Data Warehousing : Integration Services

Execute Process Task: How SSIS loads executable files at runtime - Please help me understand how SSIS loads executable files at runtime. Does it copy the file to RAM at runtime...


Data Warehousing : Analysis Services

Design considerations for a tabular model - I have a star schema already in existence (SQL Server fact table and numerous dimension tables). For a given tabular...


Database Design : Design Ideas and Questions

Table Index - I have a potential table CREATE TABLE dbo.Preferences  (  EmailAddress VARCHAR(100) NOT NULL,  ChannelID INTEGER NOT NULL,  ContactTypeID INTEGER NOT NULL,  AllowContact BIT NOT NULL,  GDPR...


SQL Server 2005 : SQL Server 2005 Performance Tuning

Query taking long time - Hi, The below query is taking around 1000 Seconds. SELECT ar.MOVE_QTY,ar.DM12,ar.AL_OR,ar.OR_CODE,ar.CAI_CODE,ar.CAI_DESCRIPTION,ar.CST_CODE,ar.AL_NO AL_NO,ar.OPEN_QTY,ar.WH_CODE,ar.ETA_DATE, ar.CATEGORY,ar.AL_DATE,ar.OD_CODE, ar.SHIPTO_CODE,ar.AL_LINE AL_LINE,ar.ORIG_QTY,ar.CNCL_QTY,b.SHIP_QTY SHIP_QTY,ar.LOCAL_CODE,0 AL_TXT_LINE,b.AVIEXP_NO, b.UC_NUMBER, b.AVIEXP_DAT


SQL Server 7,2000 : In The Enterprise

Project Plan - I'm looking for a decent Project Plan (in any format) for moving from SQL Server 2000 to a SQL Server...

This email has been sent to newsletter@newslettercollector.com. To be removed from this list, please click here.
If you have any problems leaving the list, please contact the webmaster@sqlservercentral.com.
This newsletter was sent to you because you signed up at SQLServerCentral.com.
Feel free to forward this to any colleagues that you think might be interested.
If you have received this email from a colleague, you can register to receive it here.
This transmission is ©2017 Redgate Software Ltd, Newnham House, Cambridge Business Park, Cambridge, CB4 0WZ, United Kingdom. All rights reserved.
Contact: webmaster@sqlservercentral.com