Recent blog posts

Insights, news and perspectives on data, analytics and Azure.

Entangled in strategies

Have you ever noticed that when you ask about data strategy, you sometimes receive a response about digitalization strategy instead? This can be confusing and frustrating, as it may not address your original question. Misunderstandings can easily happen when discussing complex concepts, even among professionals in the same field. Communication is challenging, and it's crucial to establish a shared language to ensure clarity. As data professionals, we often need to clarify terminology with business partners, even...

Enhanced data profiling with AutoML

Let's start right from the beginning with a disclaimer: artificial intelligence still doesn't know how to find data quality deficiencies. It can't even search for them very well on its own. You need to show it some love before it can manage. But machine learning, as it is today, can already bring plenty of benefits to the task at hand. Data profiling is one of my all-time favorite data development tools. A few years ago, I got to know the Pandas Profiling Python library, which does so much of the work...

Azure OpenAI GPT-3 model first impressions

I can't recall any single technology generating as much hype during my career as OpenAI's ChatGPT has. As a regular person, I'm just as caught up in that hype as the next person. But for it to revolutionize the data industry? Well, we might still have a way to go. On that path, however, Azure OpenAI is a hefty step in the right direction. Here is the story of my first impressions trying out the new service offering. I wanted to do a text analysis that would, instead of just picking up words, use...

Stroll in the Azure IIoT jungle - Cloud1

There are many different services on Azure that are used for IIoT solutions. Playfully, I call this versatile stack a jungle. And why not? We can see a few different layers there, both from a usage and from an architectural perspective. But before we go into the details, let's take a step back to see the forest for the trees, so to speak. Download our blueprint about IIoT.

I love Pandas, or how to generate profile reports from data - Cloud1

I 💙 pandas, I really do. And I also like to do data profiling. So I can't possibly go wrong with pandas profiling, right? Well, jokes aside, in my opinion data profiling should, at least to some extent, be standard practice in all data processing work. For me and many of my colleagues, though, our history with data development reaches back well before we had all these nice tools. The process of checking data for issues used to be mainly manual. Some of us who are lazy enough...
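The post goes on to use the Pandas Profiling library (now ydata-profiling). As a rough illustration of the kind of per-column statistics such a report automates, here is a minimal hand-rolled profile in plain pandas; the DataFrame and column names are made up for the example:

```python
import pandas as pd

# Toy data standing in for a real dataset (names are made up).
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 4],              # note the duplicate id
    "revenue": [120.0, None, 87.5, 43.0, 43.0],  # note the missing value
})

# A tiny slice of what a profiling report computes for each column:
profile = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "missing": df.isna().sum(),
    "distinct": df.nunique(),
})
print(profile)
print("duplicate rows:", df.duplicated().sum())
```

A full profiling report adds distributions, correlations, and warnings on top of this, but these basic counts alone already replace a lot of the manual checking the post describes.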

Discovering my custom expectations for Great Expectations

A week after I started writing the first blog post covering the Great Expectations framework, I am back at it again. I first managed to create a custom expectation (i.e., a custom data validation rule), after which I investigated the more formal way of using the framework. Here's how it went and what I learned.
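In Great Expectations itself, a custom expectation is built by subclassing the framework's expectation classes; as a back-of-the-envelope sketch of what such a validation rule checks, here is a hand-rolled stand-in in plain pandas (this is not the Great Expectations API, and all names here are made up):

```python
import pandas as pd

def expect_values_in_range(df, column, min_value, max_value):
    """Hand-rolled stand-in for a custom expectation: validate that
    every non-null value in `column` falls within [min_value, max_value]."""
    values = df[column].dropna()
    failures = values[(values < min_value) | (values > max_value)]
    # Mimic the shape of a validation result: a success flag plus
    # details about the values that broke the rule.
    return {
        "success": failures.empty,
        "unexpected_count": int(failures.size),
        "unexpected_values": failures.tolist(),
    }

df = pd.DataFrame({"age": [25, 34, -3, 61, 142]})
result = expect_values_in_range(df, "age", 0, 120)
print(result)  # success is False: -3 and 142 fall outside the range
```

The framework version adds the parts this sketch skips: declarative configuration, result documentation, and reusable validation suites.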

Rescue an abandoned ML algorithm from the developer's laptop: how to succeed in ML?

Machine learning (ML) applications are already used in many businesses, especially in data- or analytics-heavy industries. Whether small or large, every organization can have a lot to gain if it decides to utilize advanced analytics. For example, an optimized supply chain helps decrease costs. Clustering customers helps target the marketing department's efforts at the right customer group. Classifying customer churn helps focus efforts on customers who are about to leave, which results in...

Discovering my great expectations for data quality

So, here’s the deal. One weekend I found myself stuck at home in Covid quarantine, waiting for my kid's test results. Instead of watching TV the entire Sunday, I decided I might try to use my time doing something a bit more productive.

A Data Lakehouse with low-code, please

If you follow any trends in the data world, you have probably heard about the Data Lakehouse. It's a new architecture proposed by Databricks that uses Delta tables as the data lake storage format.

A Data Lakehouse with low-code - Implementation

In a previous blog post I talked about how a Databricks Data Lakehouse can be created with a low-code implementation only. That is almost true. The system needs to be set up, and some code is needed for that initial configuration. What this code does is create a mount to the storage account that will be used as storage for the Delta tables. Fortunately, this code is well documented, and there are multiple guides for accomplishing this, like this one:...
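The mount setup referenced above follows a common pattern. A rough sketch is below; the storage account, container, and service-principal details are placeholders, and the actual `dbutils.fs.mount` call only runs inside a Databricks notebook, so it is left commented out here:

```python
# Placeholder names -- substitute your own storage account and container.
storage_account = "mylakehousestorage"
container = "delta"

source = f"abfss://{container}@{storage_account}.dfs.core.windows.net/"
mount_point = f"/mnt/{container}"

# OAuth settings for a service principal; in a real notebook the client
# secret comes from a Databricks secret scope, never a literal string.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": "<secret-from-scope>",
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Inside Databricks, this is the one-time setup step the post refers to:
# dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=configs)
print(source, "->", mount_point)
```

Once the mount exists, Delta tables can be written to and read from `/mnt/delta` without any further code, which is what makes the rest of the implementation low-code.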


Our customers

Meet some of our customers. We help companies across various industries to turn data into a competitive advantage.

Subscribe to our blog