Data Engineer

Data Engineers play a critical role at Columb Labs: they are responsible for designing and building data solutions and applications. They are known for their clean, production-ready code and deep expertise in building data pipelines and data storage platforms.

We are constantly striving to boost our clients' business KPIs. You will design and build production data pipelines in Python, from ingestion to consumption, within a big data architecture. Responsibilities include designing data storage systems, analyzing and re-architecting existing data platforms, and communicating with both machine learning engineers and backend developers.
Requirements:
  • 4+ years working in back-end web development with Python
  • Excellent Python knowledge for writing scalable and effective production code
  • Extensive programming experience with PySpark, MapReduce, HDFS, NoSQL and SQL databases
  • Excellent software engineering skills: CI/CD, Docker, pytest, code documentation
  • Experience with AWS: Redshift, S3, DynamoDB, Athena, Lambda
  • Experience with data processing and orchestration tools: Dask, Airflow
  • Good knowledge of computer science, data structures, and algorithms
  • Exceptional problem-solving skills and the ability to work independently
  • Fluency in both oral and written English
Nice to have:
  • Basic knowledge of machine learning and deep learning algorithms
  • Knowledge of Kafka or other stream-processing frameworks
  • Experience with Scala, C++, Go
  • Contribution to open-source projects
Why you should work here:
  • Work with a strong team (alumni of Yandex, CERN, and Columbia University)
  • No legacy code; a contemporary tech stack
  • Competitive salaries
  • Remote work