Director Of Technical Operations [CLOSED]
The Wikimedia Foundation, Inc. is a nonprofit charitable organization committed to building a world in which every single human being can freely share in the sum of all knowledge. Wikimedia's flagship project, Wikipedia, has become the largest general reference work ever compiled in human history. Hundreds of thousands of volunteers have contributed more than 15 million encyclopedia articles in 265 languages, all of which can be freely shared and used for any purpose. It is consulted by more than 370 million people every month, making it the fifth most popular web property world-wide. The organization operates several other free knowledge projects, and is principally responsible for the development of the open source MediaWiki software powering its projects.
The organization currently relies on data centers located in Tampa, Florida and Amsterdam in the Netherlands, with the intention of building out a new data center in Ashburn, Virginia, eventually shutting down the Tampa hosting, and possibly opening another data center in Asia. The software architecture is a mix of Squid and Varnish, Apache, Memcached, MySQL, PHP, Perl and Python components running on load-balanced Ubuntu/Linux servers. The Wikimedia Projects as a group represent the highest-traffic web properties running on Free and Open Source software and, as such, part of this role would be to continue to make choices that preserve this functional and important cultural characteristic. The hardware architecture is heterogeneous, with hardware upgraded on a regular basis. The site’s operations tools include a broad range of server, service and network monitoring tools, with many statistics published online in real time. Documentation is maintained on an off-site wiki.
The Wikimedia Foundation provides the operating and managerial infrastructure required to sustain Wikimedia’s free knowledge projects, and pursues a permanent agenda of technical innovation. The Foundation also supports the volunteer communities, including an internationally active community of engineers and developers that contribute ideas and work to continually upgrade and evolve the organization’s infrastructure. The Wikimedia Foundation is one of the world’s most significant developers of open source software components and is an active contributor to and beneficiary of open source initiatives.
A young organization, the Wikimedia Foundation is currently building out its management team. It is on the cusp of completing its first-ever strategic plan, which will provide high-level direction for the coming five years, including a goal of doubling daily traffic by 2015.
Currently, the Wikimedia Foundation has approximately 40 paid staff. That number is expected to grow in coming years, consistent with the Wikimedia Foundation's scope and impact. The Operations Department benefits from access to substantial volunteer resources, and currently also has 4.5 full-time-equivalent (FTE) staff, a number that is expected to grow to 9 FTE by year’s end. With the Chief Technical Officer, the Director will hire, supervise, motivate, mentor, develop and evaluate staff, ensuring that staff skills are appropriate to meet the organization’s goals.
It is very important that the Director be able to strike an appropriate balance in engaging the talents of both volunteers and full-time professional staff. It is this balance that ensures that Wikimedia retains the vital support of the wiki-community, while also ensuring that Wikimedia projects are a critical inventive platform for rapidly evolving open source technologies and approaches.
The primary responsibility of the Director of Technical Operations is to provide for a stable, secure, documented, scalable and responsive systems environment. Located in San Francisco, the Director of Technical Operations will report to the Chief Technical Officer and will work closely with other senior staff, volunteers and vendors (including data center and network providers, hardware and software vendors, and site caching vendors) that work for and with the technology group.
While the objective of the Wikimedia Foundation is not necessarily to be on the bleeding edge of technology, the organization pursues a permanent agenda of technical innovation driven by its mission to make all knowledge available to every single human being. Important elements of this mission are: ensuring reliable, high-performance operation of all its websites and other technical services, globally; ensuring data security and safety; reliance on, and continued development of, free open source software; and reliance on interchangeable hardware components that are manufactured to common standards by a range of vendors. The Director must work with the professional staff and other advisors to strike a balance between ideals, practicality and cost.
As the steward of the fifth most accessed site on the internet, the Wikimedia Foundation has a special obligation to its audience to ensure the integrity of all data and to minimize the risks of security exploits of any kind. The Director will also ensure that all key Wikimedia data and hardware components remain physically secured and that physical access is appropriately controlled.
The overall technical architecture of the Foundation’s systems will include multiple failsafe mechanisms and redundancies. Standards will be established for seamless continued operation in the event of catastrophic failure of individual components (servers, disks, network controllers, firewalls, networks, etc.), clusters of components, and data centers. To this end, the ability to monitor and accommodate network, system and component failure, as well as peaks in network traffic, CPU and disk utilization, is essential, as is the implementation of standard escalation procedures.
All data and infrastructure will have on-site and off-site back-ups. A complete image of various systems components will be taken regularly and retained off-site. All systems, redundancies, and newly introduced components will be monitored and tested, and appropriate test protocols will be established. Failsafe procedures for various types of failures will be tested on a regular schedule, with a full-scale test of a catastrophic data center failure conducted at least once every year.
The purpose of Wikimedia projects is to serve a global audience of readers and volunteers, and technical and non-technical users alike. To ensure that the community is adequately served, the Director will establish standards and measurement systems for responsiveness. Given the global reach and utilization of Wikimedia projects, and the mission to make all knowledge available to all people, it will be important to be able to compare systems responsiveness to users located in cities large and small, and in towns and rural areas throughout the world.
The technical infrastructure serves and is made appropriately accessible to various types of staff and volunteer users, some highly technical and some with no technical expertise. All systems will be architected and managed with an eye toward striking an appropriate balance between accessibility, security and transparency.
All parts of the work of the Technical Operations Department will be appropriately documented, including: an inventory of all hardware and software, license and warranty information, passwords, hardware and software architecture, documentation of software code, and processes and procedures. As much of this documentation as possible will be made openly accessible.
This is a roll-up-your-sleeves, participatory organization, and staff and volunteer teams tend to form flexibly around projects. The Director of Technical Operations should be willing and able to act as an engineer and architect, collaborator, manager, strategist and tactical implementer.
- Thoroughly understand the Wikimedia Foundation – its mission, values, history, culture, traditions, projects, activities, personalities and constituencies; actively engage in understanding the strategic plan and interpreting it within the context of the technical operations;
- Develop strong relationships with, and secure the trust and confidence of the Chief Technical Officer, colleagues on the management team, key volunteers, vendors and partners that provide varying kinds of technical operations infrastructure and support to the Wikimedia Foundation;
- In the first 90 days, review and enhance Technical Operations plans, including at least the following sections: hardware architecture, software architecture, network architecture, security architecture and procedures, deployment procedures, standards & metrics, load balancing processes & procedures, backup and recovery (utilities, infrastructure and procedures), staffing plan, budget;
- Immediately assess the state of play of in-process technical projects, assess short-term risk to any part of the Foundation’s technical infrastructure, and review in-process changes/upgrades to the operating infrastructure; review the priorities of in-process projects and, in consultation with the CTO and staff, shift priorities as appropriate and manage delivery to time and budgetary targets.
Candidates should have the following type of experience and qualifications:
- Multiple years of experience in leading technical operations of a high traffic web property, ideally built using open source software components;
- Familiarity with a broad range of systems development, infrastructure transformation, and technical operations management tools in Unix/Linux (particularly Ubuntu) environments, including: Squid and Varnish, Apache, Memcached, MySQL, PHP, Perl, Python and a broad range of open source components. Experience with various widely used proprietary hardware and software components is advantageous, especially when deployed in mixed environments; an understanding of network architectures and experience in identifying and responding to load balancing issues;
- An understanding of various methodologies for deploying, changing, documenting and managing projects for technical operations, including various structured, standards-based and agile development methods; a deep understanding of, and experience with, setting standards and developing procedures that deliver an end-to-end, tightly monitored systems infrastructure;
- Experience in developing and deploying backup and recovery protocols, software and physical security elements and procedures, and in architecting fault-tolerant solutions;
- A track record as an exceptional communicator who is able to convey complex concepts to people of differing levels of knowledge and experience, in writing as well as verbally; experience preparing and making effective presentations to diverse groups large and small with different interests and priorities;
- A strong background in the development of efficient, reusable and redeployable processes, systems and organization structures;
- Experience in distributed, multi-site environments, preferably with international experience; experience in negotiating formal contracts and informal agreements; comfort doing business with people of diverse cultures, linguistic groups, political/religious affiliations and philosophies is desirable;
- Demonstrated skills in engaging, motivating, coordinating and supporting communities, and in managing the sometimes chaotic and quasi-anarchic nature of free-thinking communities, are important;
- A BS degree in computer, electrical, industrial, or systems engineering or a closely related engineering field, or equivalent experience; an advanced (MS, MBA) degree is preferred; strong structured project management experience, preferably in high-transaction environments with heterogeneous hardware configurations; a minimum of 10 years of experience managing complicated projects; experience in working within data center environments is a prerequisite for this position.
The successful candidate should be:
- A mission-driven individual with an understanding of, belief in and commitment to the societal benefits of freely-available information and of the free open source software movement; a passion for how these principles are important in the development of civil societies; an ability to communicate that passion;
- An independent and open-minded individual who values and appreciates diversity, input and collaboration from various constituencies; has the ability to make unpopular decisions when necessary and explain them;
- An inveterate listener and explainer; comfortable receiving input from many sources, and able to act on information to develop strong, stable and well-functioning system infrastructure solutions;
- A strong manager who will advocate for the needs of the technical operations department and the organization’s infrastructure; a practical person who will deviate from ‘ideal’ solutions in order to gain some other benefit (lower cost, increased stability, increased functionality, greater flexibility, etc.); a hard worker with a high energy level; a “doer” with a willingness to work hands-on with other staff, community members and members of the management team;
- Emotionally mature and self-reliant; someone who will thrive working in a small but growing team; an ability to tolerate a high degree of ambiguity, and to negotiate with people having sharply defined opinions while maintaining positive, respectful relationships;
- A sense of humor.