Job details
Senior Data Engineer - Data Engineering
- Node.js
- AWS
- TypeScript
- SQL
- React.js
- Go
- Scala
- Rest API
- Azure
Grab Data Tech enables everyday opportunities across Southeast Asia through big data innovation. Caspian, Grab’s Data Engineering team, aims to make the Data Lake easy to use by developing efficient tools and technologies that help data producers and consumers identify those opportunities.
Caspian runs the code, pipelines and infrastructure that extract, process and prepare every piece of data generated or consumed by Grab’s systems. We are a diverse team of software engineers (DevOps, data and full-stack engineers) that not only solves all kinds of data-related problems faced by teams from all corners of Grab, but also acts as a bridge that ties everyone together through data. As data in Grab never stops growing, this team never stops learning, innovating and expanding, so that we can bring in or build the latest and best tools and technology to ensure the company’s continued success.
Get to know the Role
Data Engineers at Grab work on one of the largest and fastest-growing datasets of any company in Southeast Asia. We operate in a challenging, fast-paced and ever-changing environment that will push you to grow and learn. You will be involved in various areas of Grab’s Data Ecosystem, including reporting & analytics, data infrastructure, and various other data services that are integral parts of Grab’s overall technical stack.
The Day-to-Day Activities
- Build, deploy and manage big data tools with solid DevOps practices; manage CI/CD pipelines, Terraform and cloud infrastructure.
- Deep understanding of the different data formats and table formats in distributed data processing and data storage:
○ Data Formats: Parquet, Avro, ORC, Arrow;
○ Open Table Formats: Delta Lake, Iceberg, Hudi and Hive;
- Streamline data access and security to enable data scientists, analysts and backend engineers to easily access data whenever they need it.
- Develop automation frameworks in programming languages such as Python to automate big data workflows such as ingestion, aggregation and ETL processing (see the PySpark sketch after this list).
- Maintain and optimize the performance of our data analytics infrastructure to ensure accurate, reliable and timely delivery of key insights for decision making.
- Run modern, high-performance analytical databases with a solid understanding of distributed computing, and build scalable, reliable ETL pipelines and processes that ingest data from a large number and variety of sources into high-performance analytical databases and computation engines like Spark, Flink, Presto, Synapse, BigQuery, Greenplum and others.
- Understand SQL interfaces to tabular, relational datasets. Some distributed analytic engines, such as Trino (formerly Presto), Druid, ClickHouse, Redshift, Snowflake, Synapse, BigQuery and Greenplum (and other tools commonly referred to as “data warehouses”), integrate proprietary storage services with the analytics engine, creating self-contained data-lake functionality.
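A minimal sketch of the kind of Python automation described above, assuming a PySpark environment: it ingests raw JSON events, aggregates them per city and day, and writes date-partitioned Parquet. The bucket paths and column names (event_time, city, fare) are hypothetical placeholders, not part of Grab’s actual stack.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hypothetical paths -- stand-ins for whatever the real pipeline would use.
    RAW_PATH = "s3://example-bucket/raw/bookings/"                 # assumed JSON input
    CURATED_PATH = "s3://example-bucket/curated/bookings_daily/"   # Parquet output

    spark = SparkSession.builder.appName("bookings-daily-aggregation").getOrCreate()

    # Ingest: read raw JSON events produced upstream.
    raw = spark.read.json(RAW_PATH)

    # Aggregate: count bookings and sum fares per city per day.
    daily = (
        raw.withColumn("event_date", F.to_date("event_time"))
           .groupBy("event_date", "city")
           .agg(
               F.count("*").alias("num_bookings"),
               F.sum("fare").alias("total_fare"),
           )
    )

    # Write the result as date-partitioned Parquet for downstream consumers.
    daily.write.mode("overwrite").partitionBy("event_date").parquet(CURATED_PATH)

    spark.stop()

In practice, a job like this would typically be scheduled by a workflow orchestrator and deployed through the CI/CD pipelines mentioned above.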
The Must-Haves
- A degree or higher in Computer Science, Electronics or Electrical Engineering, Software Engineering, Information Technology or other related technical disciplines.
- Experience designing high-performance, scalable infrastructure stacks for big data analytics.
- Write unit, functional and end-to-end tests.
- Real passion for data, new data technologies, and discovering new and interesting solutions to the company’s data needs.
- Excellent communication skills to coordinate with product development engineers on the development of data pipelines and/or any new product features built on top of the results of data analysis.
The Nice-to-Haves
- Experience in handling large data sets (multiple PBs) and working with structured, unstructured and geographical datasets.
- Good experience in handling big data within a distributed system and knowledge of SQL in distributed OLAP environments.
- Knowledgeable about cloud platforms such as AWS, Azure, or Google Cloud Platform.
- Familiar with tools within the Hadoop ecosystem, especially Presto and Spark (see the Trino query sketch after this list).
- Good experience with programming languages like Python, Go, Scala, Java, or scripting languages like Bash.
- Experience designing and implementing RESTful APIs, and building and deploying performant, modern web applications in React, Node.js and TypeScript.
- Deep understanding of databases and engineering best practices, including handling and logging errors, monitoring the system, building human-fault-tolerant pipelines, understanding how to scale up, addressing continuous integration, database administration, maintaining data cleanliness and ensuring deterministic pipelines.
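A minimal sketch of querying a distributed OLAP engine from Python, assuming the trino client library is available; the coordinator host, catalog, schema and table name are hypothetical placeholders.

    from trino.dbapi import connect  # pip install trino

    # Hypothetical connection details -- replace with the real coordinator and catalog.
    conn = connect(
        host="trino.example.internal",
        port=8080,
        user="data-engineer",
        catalog="hive",
        schema="curated",
    )

    cur = conn.cursor()
    # Aggregate bookings per city from a (hypothetical) Parquet-backed table.
    cur.execute(
        """
        SELECT city, count(*) AS num_bookings
        FROM bookings_daily
        GROUP BY city
        ORDER BY num_bookings DESC
        LIMIT 10
        """
    )
    for city, num_bookings in cur.fetchall():
        print(city, num_bookings)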
Our Commitment
We are committed to building diverse teams and creating an inclusive workplace that enables all Grabbers to perform at their best, regardless of nationality, ethnicity, religion, age, gender identity or sexual orientation and other attributes that make each Grabber unique.