RFP/Systems & Operations Engineer

From Wikimedia Foundation Governance Wiki
Revision as of 21:46, 24 October 2011 by Jtud (talk | contribs)


Statement of Purpose

Wikimedia is looking for a contractor who will support the Operations team in maintaining the Wikimedia system infrastructure. The position is based in Europe, preferably in the Amsterdam area. As Systems & Operations Engineer, your job is to assist in the day-to-day technical operations of the Wikimedia Foundation web properties and their backend infrastructure, as well as to develop tools that help manage the system administration of the infrastructure. Occasionally, you will be required to go into the Amsterdam colocation data center to work on the servers there.

Background Information

The Wikimedia Foundation, Inc. is a nonprofit charitable organization dedicated to encouraging the growth, development and distribution of free, multilingual content, and to providing the full content of these wiki-based projects to the public free of charge. The Wikimedia Foundation operates some of the largest collaboratively edited reference projects in the world, including Wikipedia, a top-ten internet property.

Scope of Work

Duties include, but are not limited to, the following:

  • Maintain the Lab testing server infrastructure (part of the Test Virtualization Engine)
  • Assist in the design and enhancement of the Labs Virtualization Project using OpenStack technology. You will also be integrating with our backend infrastructure such as LDAP and MediaWiki.
  • Infrastructure operations and LAMP-stack administration of more than 800 servers, located at Equinix Data Centers in Ashburn (Virginia), Tampa (Florida) and Amsterdam (Holland):
    • Install/Maintain Ubuntu OS with Apache and PHP
    • Install/Maintain caching servers with Memcached, Varnish and Squid software
    • Install/Maintain MySQL servers and enable database replication
    • Install/Maintain Wikimedia Search servers
    • Implement file and database backups to the 2 NetApp servers
    • Set up Puppet Master and the configurations for OS, Varnish, Squid, Memcached and ACL
  • Monitor and resolve user-reported issues in Request Tracker and Bugzilla, as they arise
  • Script new monitoring modules into Nagios and Ganglia to provide application monitoring for the MediaWiki, Fundraising and Community Management applications
  • Troubleshoot and provide emergency response to system outages
  • Update and/or document the Operations PlayBook for OS, Caching, MySQL Database and MediaWiki Applications

Outcome and Performance Standards

You are expected to work between 30 and 40 hours a week on average. During these (flexible) hours you are required to be available online for collaboration with the (international) Foundation team. Outside these hours, you may occasionally be contacted for emergencies (e.g. during system outages). Besides maintaining regular communication with your point of contact, you may need to participate in bi-weekly online Operations meetings with the rest of the team. There will be milestone check-ins with the Foundation to discuss progress and activities. You must be willing to travel occasionally for international meetings, as well as to perform your duties.

Term of Contract

Your initial contract will be for a duration of 12 months, and will commence as soon as possible. Renegotiation at the termination of the contract is optional.

Payments, Incentives, and Penalties

Rate will be determined by level of experience and expertise.

Contractual Terms and Conditions

Required qualifications

Respondent parties are expected to:

  • have strong knowledge of, and 2+ years of experience with, scripting and programming languages such as Python and C
  • have strong knowledge of and 2+ years of hands-on experience with LAMP-stack system administration, including Nagios, Puppet, MySQL administration and data backup software (e.g. Amanda)
  • be able to work independently where needed, and to work remotely as part of a globally distributed team.
  • be able to learn quickly, with relevant hands-on experience and an eagerness to learn and try new concepts.
  • be comfortable in a highly collaborative, consensus-oriented environment.
  • be proficient speakers of English.

Furthermore:

  • Experience with virtualization technologies such as OpenStack or Ganeti is a plus.
  • Experience with clustered filesystems such as GlusterFS or Swift is a plus.
  • Experience with high traffic web site operations is a plus.
  • Experience with on-site data center operations is a plus.
  • Understanding of the free culture movement is a plus.

The ideal candidate will be creative, highly motivated, and able to operate effectively in multiple cultural contexts.

Points of contact for future correspondence

CT Woo, Director of Technical Operations
rfp@wikimedia.org