What You'll Do
Design and implement solutions that enhance application reliability, performance, scalability, and resilience.
Build and maintain monitoring, alerting, observability, and telemetry to drive proactive detection and incident analysis.
Lead incident management efforts, perform root cause analysis, and implement the resulting corrective actions.
Implement operational workflows using scripting, Infrastructure as Code (IaC), and configuration management tools.
Manage capacity planning, performance, and scaling to forecast demand and optimize infrastructure.
Collaborate with engineering teams to embed operability, resilience, and security into application architectures.
Build and automate reliable deployments through CI/CD pipelines, release governance, and version control systems.
Maintain clear runbooks, architecture diagrams, and operational documentation that enable efficient production support.
Experience Required
Managing Kubernetes and containerized workloads (EKS, AKS, GKE), including scaling, networking, upgrades, and monitoring.
Working with cloud platforms (AWS, Azure, or Google Cloud Platform) across compute, storage, networking, IAM, and cost governance.
Using observability and APM tools such as Dynatrace, Splunk, Prometheus, Grafana, Datadog, or Elastic/ELK.
Strengthening security and compliance controls in regulated environments (e.g., PCI DSS, SOC 2), including secure management of workloads.
Automating infrastructure with Terraform, CloudFormation, Ansible, or similar tools.
Designing and maintaining CI/CD pipelines using Jenkins, GitLab CI, GitHub Actions, or Azure DevOps.
Scripting and automation using Bash, PowerShell, or Python.
Background in electrical, engineering, or military-related environments (preferred).
Good to Have
Certifications such as AWS SysOps Administrator, AWS DevOps Engineer, Google Cloud Professional Cloud DevOps Engineer, or Certified Kubernetes Administrator (CKA).
Experience with legacy applications, IBM iSeries, and/or library systems.
Hands-on database operations and performance tuning (Oracle, SQL Server, PostgreSQL).
Prior experience as a major incident commander, stakeholder communicator, or ops lead/coordinator.
Experience with ITIL and ServiceNow (change, incident, and configuration management).