Keywords: Data, Distributed systems, Python, Hadoop, Spark, Hive, English
Responsibilities:
• Design, build, and maintain efficient and reliable data pipelines to move data across systems
• Implement data workflows that let users aggregate and structure data for high-performance analytics
• Collaborate with multiple teams in high-visibility roles to implement new data workflows and lead solutions end-to-end
• Design and develop new third-party integrations in a variety of languages to facilitate effective data consumption
• Work with the Analytics team to evaluate, benchmark, and integrate state-of-the-art open-source data tools and technologies
• Support the Growth team in continuously evaluating and deploying data infrastructure for detailed analysis
• Monitor data pipeline and workflow performance and optimize as needed
Requirements:
• At least 5 years of software development experience
• At least 2 years of experience working with data at scale, especially with distributed systems
• B.S./M.S. in Computer Science or a related field, or equivalent experience
• Experience with at least one scripting language (e.g., Python) and one object-oriented language (e.g., Java)
• Experience with network programming and relational databases
• Experience with Hadoop, Spark, Hive, and Presto
• Familiarity with data engineering and analytics in cloud environments
• Experience with Airflow is a plus
• Experience with Tencent QCloud is a plus
• Ability to communicate complex concepts clearly and accurately
• Willingness to learn new technologies, tools, and approaches to problem solving
• Sharp troubleshooting skills to identify and fix issues quickly
• Fluent in English