Introduction
Hi, I am a data practitioner with experience in various data-related areas (data engineering, data analysis, data science, machine learning) and business domains (customer support, customer success, product, marketing, personalization, and sales).
I am an independent Python user who is also proficient in multiple database query languages/dialects (PostgreSQL, MySQL, Snowflake, Standard SQL in BigQuery), with experience ranging from data collection through wrangling, modeling, description, and visualisation to machine learning techniques and models (XGBoost, KMs, Word2vec).
Lately, I have gained experience with data science techniques (like segmentation or recommendation engines) and LLMs in the industry context and data engineering areas (ETL, data models, data architecture).
Data Engineer
Make, Prague, Czech Republic, 12/2022-present
- Data engineering. Development and implementation of
the CI/CD pipeline for Keboola using kb-cli, GitHub, and GitHub Actions.
Building ELT pipelines and data modelling. Building custom Extract/Load
components for various Sales/CRM platforms (Celonis, ZoomInfo) using the
requests library. Addressing data quality. Code reviewing. DataOps and
FinOps. Airflow, Amazon Managed Workflows for Apache Airflow. Data
Quality using Soda.
- Machine Learning. Deploying data science projects
(LTV or personas) into production. MLOps. Strategy for the ML
infrastructure development. Code reviewing.
- Generally used tools. Python 3.11+, Snowflake, Airflow, AWS, GitHub,
Make, Confluence, Slack, Jira
Machine Learning Engineer
Ataccama, Prague, Czech Republic, 07/2022-10/2022
- As a part of the Data Stories team, I contributed to the back-end
(idiomatic and pydantic-driven Python, FastAPI) and front-end
(TypeScript, Vue.js) codebases. The use cases included user-defined
filters or charts and attribute recommendation.
- Unit testing via pytest (including contribution to the CI/CD
pipeline). Git using GitLab. Linting using black, isort, and flake8.
Docker, Kubernetes, kustomize and Helm.
- Data analytics and data science - prototyping via Jupyter
Notebooks.
- Generally used tools. Python 3.10, Snowflake, MySQL, REST API, Jira,
Notion.
Data & Machine Learning Engineer
CloudTalk, Prague, Czech Republic, 04/2022-06/2022
- Data engineering. Development and implementation of
the CI/CD pipeline for Keboola using kb-cli, GitHub, and GitHub Actions.
Building ELT pipelines and data modelling. Building custom Extract/Load
components for various Sales/CRM platforms (HubSpot, Crunchbase, Apollo)
using the requests library. Addressing data quality topics like entity
profiling. Code reviewing.
- Machine Learning. Developing anti-fraud prediction
models. Defining business logic with stakeholders and setting up
alerting. Implementing the cookiecutter framework for developing data
science projects. Preparing a strategy for the ML infrastructure
development. Code reviewing.
- Data analytics. Ad hoc reports using Jupyter
Notebooks and Redash.
- Generally used tools. Python 3, Snowflake, MLflow,
MySQL, REST API, Jira, Outline.
Interim Product Owner of Personalization
Rohlik Group, Prague, Czech Republic, 09/2021-03/2022
- Distributed leadership of a 4-member team (a
back-end developer, a tester, a data analyst, a machine-learning
engineer). Setting a roadmap, agile ceremonies (planning, grooming,
retrospectives, etc.).
- Feature Lifecycle Management - collecting and
prioritising business requirements based on their alignment with the
company’s business goals. Alignment with other relevant stakeholders
across the company (including preemptive identification of dependencies,
synergies, and blockers). Development and deployment of personalised
features to production, including evaluating the business impact.
- Setting up objectives, key
results, and key performance indicators of the Personalisation
squad based on the company’s OKRs (GTMHub). Reporting to the C-level
management and the board of directors.
Machine Learning Engineer
Rohlik Group, Prague, Czech Republic, 02/2021-03/2022
- Covered domains: stakeholder management in the
agile set-up (using Jira), data engineering, ML model development and
deployment in production using Keboola and AWS (both batch and real-time
predictions) and performance evaluation. Focus on the personalization of
the product and CRM, solving various classification and regression tasks
in the international context.
- Data engineering. Data models and ETL self-service
in Keboola (including REST APIs via Postman) and AWS (S3, Redshift,
MySQL).
- MLOps using AWS - SageMaker, Lambda, CloudWatch,
Glue, API Gateway.
- A/B testing - design, execution, and evaluation.
Dashboards in Tableau, ad hoc reports using Jupyter Notebooks. Git using
GitLab and AWS CodeCommit.
- Other tools - Python (including various packages like pandas or
seaborn), Snowflake, PySpark, TensorFlow.
Data Analyst
Meiro, Brno, Czech Republic, 08/2020–11/2020
- Covered domains: Data Engineering, Data Analysis, Data
Science.
- ETL pipelines management, Design of data
models.
- Integrating PostgreSQL and MySQL with R and Python. Work ranging
from data wrangling, description, and visualization to advanced
statistical and data science techniques (e.g. recommendation engine
and customer segmentation) in Python using VS Code. Fundamentals of
Spark.
- Linux desktop (Ubuntu), Bash, Git. Fundamentals of
Docker and CI/CD.
Data Analyst
Kiwi.com, Brno, Czech Republic, 04/2019–07/2020
- Covered domains: Product, Customer Experience, Customer
Support.
- Reports. Integrating PostgreSQL, Snowflake, and BigQuery with R and
Python. Work ranging from data wrangling, description, and
visualization to advanced statistical techniques (regression models,
structural equation modelling) in R. Reporting in Markdown (R
notebooks, Jupyter Notebooks) and in oral presentations.
- Dashboards. I’m proficient in working with tools like Looker,
GoodData, Google Data Studio, or Retool (Plotly).
- Data engineering. Understanding of data models and ETL self-service
(Keboola, dbt).
- Linux desktop (Ubuntu), Bash,
Git.
- Jira, Scrum.
Lecturer
Masaryk University, Brno, Czech Republic,
09/2015–09/2020
- Demonstrations of analyses and their interpretations, providing oral
and written feedback in the following courses:
  - Statistical Data Analysis I.
    - Correlation, contingency tables, t-test, an introduction to
      non-parametric tests (e.g. chi-square).
  - Statistical Data Analysis II.
    - Multiple linear regression, ANOVA, logistic regression, factor
      analysis, mixed effect models.
  - An Introduction to R.
    - An introduction to the R language, data cleaning, wrangling,
      description and visualization, multivariate analyses in the R
      language.
Researcher
Transport Research Centre, Brno, Czech Republic,
06/2015–03/2019
- Travel behaviour analyses, coordination of research activities,
communication with the lay public as well as the community
of experts.
- Investigator in the project Česko v pohybu. Development of
the research tool; development and implementation of the probabilistic
sampling procedure and of algorithms for automated data quality
control.
- Methodological and analytical supervision of the research project
for the Czech Ministry of Transport related to the Public
opinion on autonomous vehicles.
- Project management of small project teams, e.g. program section for
the conference Dopravní
chování v datech.
Internships
Danmarks Tekniske Universitet, Management Engineering, Lyngby,
Denmark. 08-09/2017
Corpus Christi College, University of Cambridge, Cambridge, UK.
07-09/2015
