In this special newsletter we bring you up to date on all the new content and news related to Data Engineering on InfoQ. We are also maintaining a portal page for this content on InfoQ at: https://www.infoq.com/ai-ml-data-eng.
What Machine Learning Can Learn from DevOps (article, Dec 15, 2018)
Microsoft Announces AI-Assisted IntelliCode for TypeScript and JavaScript in VS Code (news, Dec 10, 2018)
TensorSpace.js Delivers Neural Network 3D Visualization Framework (news, Dec 06, 2018)
Amazon Introduces Intelligent-Tiering for S3 Storage to Automatically Optimize Costs (news, Dec 05, 2018)
Azure Machine Learning Services Now Generally Available (news, Dec 05, 2018)

Apache Kafka: Ten Best Practices to Optimize Your Deployment (article, Oct 19, 2018)
Back to the Future with Relational NoSQL (article, Dec 04, 2018)
The Evolution of Uber's 100+ Petabyte Big Data Platform (news, Nov 10, 2018)
Scaling Apache Kafka at Pinterest (news, Dec 09, 2018)
Amazon Announces Managed Streaming for Kafka in Public Preview (news, Dec 06, 2018)

Face-api.js: JavaScript Face Recognition Leveraging TensorFlow.js

Face-api.js is a JavaScript API for face detection and face recognition in the browser, implemented on top of the TensorFlow.js core API. It implements a series of convolutional neural networks (CNNs) optimized for the web and for mobile devices.

Google Open-Sources BERT: A Natural Language Processing Training Technique

In a recent blog post, Google announced they have open-sourced BERT, their state-of-the-art training technique for Natural Language Processing (NLP). Google decided to do this in part because of a lack of public data sets available to developers. In addition, optimizations have been made to Cloud TPUs to reduce the time required to train NLP models.

Netflix Keystone Real-Time Stream Processing Platform

Netflix recently published a post on their tech blog discussing the design considerations and insights behind Keystone, their real-time stream processing platform. Keystone has been operational since December 2015 and has grown significantly as Netflix subscribers have grown from 65 million to over 130 million in the past 3 years. This article covers the latest state of the Keystone platform.

Redis 5.0 Released with New Streams Data Type

Redis recently announced version 5 of its popular database, 15 months after the release of Redis 4. Probably the most important feature of this version is support for a new data type, Streams. Sorted set functionality has improved, and Redis modules have been expanded with the introduction of the Cluster and Timers APIs. LOLWUT and other improvements are reviewed in the article.

Concept and Object Modeling Notation for Data Modeling NoSQL Databases

Ted Hills hosted a workshop at the recent Data Architecture Summit 2018 Conference about data modeling for relational and NoSQL databases. He said that the NoSQL movement helped the database community realize two things. First, not every application needs ACID properties. Second, tabular data organization is still a good choice for much data, although not for every dataset.

Spark Application Performance Monitoring Using Uber JVM Profiler, InfluxDB and Grafana

In this article, author Amit Baghel discusses how to monitor the performance of Apache Spark-based applications using Uber JVM Profiler, InfluxDB, and the Grafana data visualization tool.

Natural Language Processing with Java - Second Edition: Book Review and Interview

The book Natural Language Processing with Java - Second Edition covers NLP topics and various tools developers can use in their applications. InfoQ spoke with co-author Richard Reese about the book.

Sentiment Analysis: What's with the Tone?

In this article, the authors discuss NLP-based sentiment analysis using machine learning (ML) and lexicon-based approaches with KNIME data analysis tools.

Analytics Zoo: Unified Analytics + AI Platform for Distributed TensorFlow and BigDL on Apache Spark

We describe how Analytics Zoo can help real-world users build end-to-end deep learning pipelines for big data, including unified pipelines for distributed TensorFlow and Keras on Apache Spark.

The New Kid on the Block: Spring Data JDBC

Jens Schauder describes the current state of Spring Data JDBC, its features and some of the underlying design decisions, especially its DDD-based API.

Big Data and Deep Learning: A Tale of Two Systems

Zhenxiao Luo explains how Uber tackles data caching in large-scale DL, details Uber’s ML architecture, discusses how Uber uses Big Data, and concludes by sharing AI use cases.

Reactive Relational Database Connectivity

Ben Hale discusses Reactive Relational Database Connectivity (R2DBC), explaining how the API works, the benefits of using it, and how it contrasts with ADBA, which has been proposed as a successor to JDBC.

Implementing AutoML Techniques at Salesforce Scale

Matthew Tovbin shows how to build ML models using AutoML (Salesforce), including techniques for automatic data processing, feature generation, model selection, hyperparameter tuning and evaluation.

