A day in the life of a Data Engineer at Amplify Analytix
Featuring Dimitar Kyuchukov
Dimitar Kyuchukov is a Senior Data Engineer (DE) at Amplify Analytix, and we asked him to walk us through a day in his life as a DE.
Although there is no such thing as a normal day, there are some patterns, typical kinds of days, and things you can expect as a data engineer.
How do you typically begin a workday?
I typically begin my day with a cup of coffee and checking my email. I examine status reports from our environments, which makes planning my priorities easier. Thus, I know if I have any emergencies to deal with or if I can go ahead with my work as planned from the previous day.
How often do you have meetings vs independent work?
It is challenging to give a definitive answer. I aim to spend up to about 30% of my usual workday in meetings or, in exceptional cases, no more than 50%. Engineers and developers should be able to spend more time working on technical assignments and tackling technical issues than participating in discussions.
What’s your favorite part about the platform you use for work? Which platform do you use?
The technologies we use for data engineering and data warehousing assignments are quite diverse. In a number of ongoing projects, we use Python and SQL. SQL is a well-established, reliable and convenient language for working with large datasets. Python stands in contrast to conventional ETL tools. It is capable and widely popular across diverse application types. It is object-oriented and helps create maintainable solutions. There are a lot of built-in and third-party data-related packages that make data processing easier to implement in code.
And what is your least favorite thing about the platform?
Python requires more coding in common situations where classic ETL platforms provide built-in capabilities and a different workflow that we don’t have. Implementing feature logic by writing code (and sometimes also integrating that with the rest of the solution) leads to a longer time to market. In contrast, a traditional ETL tool would provide a graphical environment where processes are represented by diagrams built from configurable blocks for each process step – enabling faster development of changes and fresh new data pipelines.
Could you tell me more about those ETL tools?
There are many examples on the market – Ab Initio, Informatica, Microsoft SSIS, Azure Data Factory, Talend, Pentaho and many others. They vary in price, concept, and capabilities; some are designed to run in on-premises server environments, others are part of a cloud platform, some are more performant than others, and some need additional tools for certain tasks, while others are more self-sufficient. In data engineering and data warehousing, we encounter many repetitive patterns of data transport and processing – retrieving and copying files, parsing their content (e.g., JSON, CSV, XML), formatting fields, merging data from several sources, change data capture, historisation and many others. ETL tools provide a selection of graphical diagram blocks that you set up with basic configuration values, writing little or no code, and then connect into a chain of processing steps that the data needs to undergo. The amount of manual code writing increases significantly when using a general-purpose programming language, such as Python, for the same purpose.
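To make the contrast concrete, here is a minimal sketch of what a few of those patterns (parsing, formatting fields, merging two sources) look like when written by hand in Python. The sample data, field names and function names are all hypothetical and chosen only for illustration; each function plays the role that a configurable block would play in an ETL tool's diagram.

```python
import csv
import io
import json

# Hypothetical sample inputs standing in for files from two sources.
ORDERS_CSV = "order_id,customer_id,amount\n1,10, 19.90 \n2,11,5\n"
CUSTOMERS_JSON = '[{"id": 10, "name": "Ada"}, {"id": 11, "name": "Grace"}]'


def parse_orders(text):
    """Parse step: turn CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def format_orders(rows):
    """Format step: strip whitespace and cast fields to proper types."""
    return [
        {
            "order_id": int(row["order_id"]),
            "customer_id": int(row["customer_id"]),
            "amount": float(row["amount"].strip()),
        }
        for row in rows
    ]


def merge_customers(rows, customers_text):
    """Merge step: enrich each order with the customer name from a second source."""
    names = {c["id"]: c["name"] for c in json.loads(customers_text)}
    return [{**row, "customer_name": names.get(row["customer_id"])} for row in rows]


def run_pipeline(orders_text, customers_text):
    """Chain the steps in order, as connected blocks in an ETL diagram would be."""
    rows = parse_orders(orders_text)
    rows = format_orders(rows)
    return merge_customers(rows, customers_text)


result = run_pipeline(ORDERS_CSV, CUSTOMERS_JSON)
print(result)
```

Even in this toy case every step is explicit code that has to be written, tested and maintained, which is the trade-off described above.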
Tell me more about a success story.
I will share a very recent one. Amplify Analytix has begun working on a new, internally built product that is really exciting and deals with marketing spend optimization – we have finished an initial version of the ETL pipeline delivering the data to a historized dimensional model. We managed to create a flexible, cost-effective solution in the cloud, which is almost entirely metadata-driven and quite modular.
Starting a product is quite an investment with a delayed return, and the cost of infrastructure can be a challenge. Meanwhile, scalability is a must in order to be able to host your clients when starting from zero. It is crazy how efficient we can be with cloud solutions nowadays. Cloud-native AWS services allow us to pay only for the resources our software actually uses – that kills two birds with one stone. We have a ridiculously cheap PoC and development environment, while capacity is practically unlimited. I still remember the days of sizing virtual machines or even physical servers that were never fully utilized, then designing apps to spread load across time to avoid high peak loads – until you reach the infrastructure limit and then spend more money and time on upgrades and migrations.
Another challenge of building the data pipelines for a brand-new product is that you need to be prepared for very volatile business requirements. The system needs to be flexible and maintainable enough to respond quickly to evolving business needs. We discovered that there is a lack of useful online resources for building metadata-driven systems. Therefore, we intend to contribute back to the community by creating some dedicated blog posts and articles describing the immediate benefits of a data-driven strategy and providing practical, simple code examples to help developers understand and more easily build such systems. This approach is not a new hype; it has been proven over time by large enterprises. Nowadays, however, complex data platforms are far from being the exclusive domain of big corporations – with smartphone apps, IoT devices, machine learning, various web service integration possibilities, etc., data processing is a common activity for businesses of any size. So metadata-driven solutions will be the way to go for many smaller businesses dealing with the issues that corporations have been tackling for a while already.
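As an illustration of the general idea only, and not of the actual Amplify Analytix implementation, a metadata-driven step can be sketched in a few lines of Python. The dataset name, column names and the `TRANSFORMS` registry below are hypothetical; the point is that the processing code is generic and the behaviour is declared as data.

```python
# Hypothetical metadata: in a real system this would typically live in a
# database table or a configuration file rather than in the code itself.
PIPELINE_METADATA = {
    "campaign_spend": {
        "columns": {
            "campaign": {"source": "Campaign Name", "transform": "strip"},
            "spend": {"source": "Spend (EUR)", "transform": "to_float"},
        },
    },
}

# Registry of reusable transformations that the metadata refers to by name.
TRANSFORMS = {
    "strip": lambda value: value.strip(),
    "to_float": lambda value: float(value),
    "to_int": lambda value: int(value),
}


def process(dataset, records, metadata=PIPELINE_METADATA):
    """Apply the column mappings declared in the metadata for one dataset."""
    columns = metadata[dataset]["columns"]
    output = []
    for record in records:
        row = {}
        for target, rule in columns.items():
            transform = TRANSFORMS[rule["transform"]]
            row[target] = transform(record[rule["source"]])
        output.append(row)
    return output


raw = [{"Campaign Name": " Spring Sale ", "Spend (EUR)": "1200.50"}]
clean = process("campaign_spend", raw)
print(clean)
```

With this shape, supporting a new column or a new dataset means adding a metadata entry rather than writing new processing code, which is what makes such a system quick to adapt when requirements change.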
What is your favorite aspect of being a senior data engineer?
I have always found collecting, merging, and processing information in an automated fashion, and getting value from it, somewhat satisfying. I see it as solving a crossword or a puzzle – and then it becomes even more challenging when a business needs this implemented in a scalable, maintainable, efficient and performant system. Data and its processing are tightly related to the business, and you often need an overall understanding of the business perspective as well – it is rarely a purely technical implementation exercise.
And what tools do you use to stay on track with all your tasks?
Initiatives involving data engineering resemble other application development projects in several ways. Therefore, to manage an Agile or Scrum process and keep track of tasks, features, user stories, epics, etc., we typically utilize tools like JIRA or Azure DevOps. However, I don’t use anything particularly fancy when planning and managing my own time; instead, I typically use a simple text editor to jot down some notes or even scratch them on a piece of paper.
What do you often assist your team members with?
Well, it depends on the particular project; it depends on the challenges and the entire context. Sometimes it boils down to helping with requirements and analysis and client communication. I also sometimes help by participating in brainstorming on challenging tasks, making sure that, first, we all see and understand the problem in its entirety – and then providing different alternatives based on my knowledge and experience that the team could potentially consider. I also try to help by bringing in the bigger picture and helping to understand the entire context of a requirement or solution. With junior team members, I also try helping to understand the approach to a particular type of challenge – instead of immediately giving a ready practical solution when they have a difficulty. That allows them to try and learn and grow and find their way of acting independently tomorrow – and thus progress with their careers.
How do you keep a healthy work-life balance?
Nowadays, it’s important to realize that working time and free time are divided by a very thin line. It’s very easy to blur that line and end up with stress and burnout because we cannot clear our minds of work-related problems. Thanks to Amplify’s one-of-a-kind culture, I not only have the opportunity to learn and bring out my best, but I also have the work-life balance that keeps me happy and fulfilled. I always try to have activities outside work, hobbies or really anything that keeps my focus away from work-related tasks and from the worries or thoughts that can bring me unnecessary stress after work hours.
What are some of the skills you need to have to be a successful data engineer?
If I have to say it in just one sentence – thinking about data and data processing at scale. There is something that my father used to say, in a very different context, though, which applies very well here. With the amount of data our platforms are processing, with millions of records flowing through our systems daily, whatever issue can happen in theory will happen. Given how many times each piece of logic runs, the probability is just too high. So, whatever bad outcome or scenario comes to your mind needs to be covered in your code. That needs to be part of your thinking. Another very important thing is that it’s not enough just to make something work. It needs to work efficiently, so that we get good performance at an acceptable price, and it should allow future modifications to be implemented quickly, for a short time to market.
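One common way to put this mindset into code, shown here as a minimal sketch with hypothetical record fields and validation rules, is to validate every record and route failures to a reject list instead of letting one bad record stop the whole run.

```python
def parse_amount(record):
    """Validate one record; raise ValueError on anything unexpected."""
    if "amount" not in record or record["amount"] is None:
        raise ValueError("missing amount")
    amount = float(record["amount"])  # raises ValueError on non-numeric text
    if amount < 0:
        raise ValueError("negative amount")
    return amount


def process_batch(records):
    """Process every record, collecting failures instead of stopping the run."""
    accepted, rejected = [], []
    for record in records:
        try:
            accepted.append({"id": record.get("id"), "amount": parse_amount(record)})
        except (ValueError, TypeError) as error:
            # Keep the original record and the reason so it can be inspected later.
            rejected.append({"record": record, "reason": str(error)})
    return accepted, rejected


batch = [
    {"id": 1, "amount": "10.5"},
    {"id": 2, "amount": "n/a"},
    {"id": 3},
    {"id": 4, "amount": "-3"},
]
accepted, rejected = process_batch(batch)
print(len(accepted), "accepted,", len(rejected), "rejected")
```

The rejected records keep their reason, so the rare cases that only show up at millions of rows per day can be analyzed and reprocessed rather than silently lost.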
And what is your advice to someone who would like to become a data engineer?
Essentially, I think one should develop the right mindset to tackle the challenges we mentioned previously. On the practical side – the first thing is to learn SQL. Many people have tried to declare SQL dead from time to time over the years – but it is not, and that is for a reason. The reality is that this technology (note: relational databases) is backed by a lot of math, and it is just here to stay. Newer platforms like Amazon Redshift, Snowflake etc., are still incorporating SQL as a way to interact with completely new cloud-based systems. So, we can clearly see that every data engineer or data warehouse developer will have to work with SQL. And when you learn, learn it on a deeper conceptual level – try to understand that there is a machine behind the scenes that is executing your code. Try to imagine what activities the machine must do, because this will allow you to write better code. And that applies to anything additional you decide to learn – because everybody will need additional technologies other than SQL. SQL alone is usually not sufficient nowadays.
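A simple way to start seeing the machine behind the scenes is to ask the database how it plans to run a query. The sketch below uses SQLite from Python's standard library purely as a convenient stand-in (it is not one of the platforms mentioned above), with a hypothetical `orders` table; the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3


def query_plan(connection, sql):
    """Return the planner's description of how it would execute the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # Each row is (id, parent, notused, detail); detail is the readable part.
    return " | ".join(row[3] for row in rows)


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
connection.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

sql = "SELECT * FROM orders WHERE customer_id = 42"

# Without an index the engine has to read every row of the table.
plan_before = query_plan(connection, sql)

# With an index on the filter column it can look up matching rows directly.
connection.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(connection, sql)

print(plan_before)
print(plan_after)
```

The query text is identical in both cases; only the work the engine has to do changes, from a full table scan to an index search. Most database platforms offer a comparable way to inspect execution plans.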
Whatever technology you decide to specialize in, let’s say Python, try not to just learn the basic and primitive syntax. Try to understand how that language works and how you build optimal code structures in that language – how to structure your programs in a maintainable way. Because, when you go to a job interview, nobody will really ask you about “if” statements or “for” loops or something like that. You really shine and stand out if you show that you are aware of how object-oriented programming works, and how you use its principles and concepts to write maintainable code that is easy to extend. On the topic of databases, it helps to understand how the entire database works, what the different types of databases are and how they differ. Familiarizing yourself with how data is structured behind the scenes and how the engine works really helps you to answer questions on performance. When you are given a more complex question, you will be able to think in the right direction and show the capacity to analyze problems on the fly, even when you don’t have all the information in front of you.
So, in a nutshell, that is the advice that I would give – learn SQL, choose another technology as well and learn the details, not just the pure syntax.