Job openings/Systems Engineer - Data Analytics

Revision as of 20:06, 9 February 2012 by Sthommen

YOU ARE ...

... a person who loves building scalable systems for handling large amounts of data. You enjoy working with a team that uses data to understand and improve complex social systems. Wikipedia and its sister projects seem like a very exciting potential playground to you, and you enjoy the rewards of discovery that come from hard work.

You are resourceful, inquisitive, and collaborative. You understand large-scale systems and how to turn terabytes of unprocessed log files into a sensible data architecture. You are on top of the latest trends in database technology and love to debate the pros and cons of running Hadoop vs. MongoDB vs. MySQL. You know the most effective techniques for giving the Wikimedia Foundation the tools to measure the success (or failure) of key initiatives. You are self-sufficient, self-driven, focused, and organized. You are a brilliant coder and dream of writing robust, maintainable open source statistics and analytics software. You can lead a new team of data analysts in building a practice of safe, sane, and useful model development.

Above all, you are excited by the opportunity to support Wikimedia's free knowledge projects with data.

JOB TITLE

Systems Engineer - Data Analytics

JOB PURPOSE

The Systems Engineer - Data Analytics will be responsible for developing systems that gather the key metrics critical to the operation of the Wikimedia Foundation. You will work with the development staff to ensure proper (but unobtrusive) instrumentation of critical infrastructure.

JOB SUMMARY

Duties include, but are not limited to, the following:

  • Develop, test, and deploy new features, improvements, and upgrades to Wikimedia's statistics and analytics infrastructure (e.g., Wikistats) in cooperation with other Wikimedia Foundation engineering staff
  • Work with Wikimedia volunteers, Wikimedia Foundation research staff, and the research community to articulate and answer key strategic research questions
  • Define code and architecture standards for data analytics tools, and review and approve code written by data analysts
  • Work with volunteers to augment our analytics capabilities
  • Recommend new methods for collecting and documenting data, and establish procedures for data procurement
  • Coordinate and participate in preparing and presenting reports and analyses that capture progress, trends, and appropriate recommendations or conclusions
  • Assist end users and other developers in identifying and resolving issues with the software and configuration of Wikimedia's analytics infrastructure
  • Provide answers to ad hoc queries and one-time statistics generation requests
  • Configure, customize, and develop other web-based and server-side software used to support analytics operations
  • Interface with the open source web analytics development community and other outside developers

REQUIRED QUALIFICATIONS

  • 2+ years of experience working in a Linux/Unix server environment.
  • 2+ years of experience with scripting languages, such as PHP, Perl, Ruby, Python, or shell scripting. Experience with low-level programming languages is a plus.
  • 2+ years of experience with large database storage systems, including experience with MySQL or a similar database in a production environment.
  • You are willing to work collaboratively and discuss methodology and conclusions with the technology and project teams.
  • You are willing to solicit ideas and discuss methodology and conclusions beyond your direct contacts, notably with the Wikimedia community.
  • You are able to learn quickly. Relevant hands-on experience and eagerness to learn and try new concepts are more important than holding certifications.
  • The ideal candidate will be creative, highly motivated, and able to operate effectively in multiple cultural and technical contexts.
  • You are able to work independently where needed, and can work remotely as part of a globally distributed team.
  • You are comfortable using a wide variety of communication and collaboration tools, including wikis, mailing lists, and IRC.
  • You must be comfortable in a highly collaborative, consensus-oriented environment.
  • You are a proficient English speaker.

ADDITIONAL QUALIFICATIONS

  • Previous experience with the PHP scripting language is a plus
  • Experience with Hadoop, Cassandra, other NoSQL systems, and/or other distributed computing technologies is a major plus
  • Experience with Open Web Analytics, Piwik, Jaspersoft, or other open source analytics dashboard software is a major plus
  • Experience with Flot, jqPlot or other open source charting software is a major plus
  • Any other free/open source software development experience is highly welcome
  • Experience with high traffic web site architectures and operations is a plus
  • Experience with wikis and participatory production environments is a plus
  • Understanding of the free culture movement is a plus
  • Active participation as a Wikimedia volunteer would be an asset, though not a prerequisite

In your cover letter, please provide URLs to any existing open source software work you have done (your own software or patches to other packages), if possible. We'd love to see what you can do! Please also include the URL of your technical blog, if you have one, and let us know why this position interests you.
