2. What are the roles and responsibilities?
Answer: In a DevOps role, responsibilities include collaborating with development and operations teams, automating processes, managing infrastructure as code, ensuring continuous integration/continuous deployment (CI/CD), monitoring systems, and resolving incidents promptly.
3. What is an RCA report? How do you prepare it? What things do we need to consider to create an RCA report?
Answer: RCA (Root Cause Analysis) Report: An RCA report identifies the core cause of an issue or incident. To prepare it, follow these steps:
Collect Data: Gather information related to the incident.
Identify Causes: Analyze data to determine the root cause.
Documentation: Clearly document the identified cause and contributing factors.
Recommendations: Suggest preventive measures to avoid recurrence.
Feedback Loop: Implement feedback mechanisms for continuous improvement.
4. You are a team leader and receive an escalation from a client about a service issue. How do you handle it?
Answer:
Acknowledge: Acknowledge the escalation promptly.
Investigate: Gather information about the issue.
Communication: Keep the client informed about the investigation progress.
Resolution Plan: Develop a plan to resolve the issue.
Implement Solution: Execute the plan, ensuring minimal impact.
Post-Incident Review: Conduct a review to prevent future occurrences.
Client Communication: Update the client on the resolution and preventive measures.
5. What major challenge have you faced in your role, and how did you handle it?
Answer: Scenario: Scaling Infrastructure
Challenge: A sudden spike in traffic overloaded the existing infrastructure.
Solution:
Implemented auto-scaling to dynamically adjust resources.
Utilized CDN to distribute content globally.
Conducted load testing regularly for capacity planning.
6. Why are organizations implementing DevOps?
Answer: Organizations adopt DevOps to:
Accelerate delivery cycles.
Enhance collaboration between development and operations.
Improve deployment frequency and success rates.
Automate manual processes for efficiency.
Enhance system reliability and scalability.
7. You received a task in Jira. What information do you need, and from where can you collect it before implementing the task?
Answer: Before implementing a Jira task:
Task Details: Understand the task requirements and expected outcomes.
Dependencies: Identify any dependencies on other teams or services.
Testing Criteria: Define criteria for successful testing.
Documentation: Refer to project documentation or collaborate with stakeholders.
Environment Information: Confirm details about the deployment environment.
8. Explain the Production release process.
Answer: Production Release Process:
Planning: Schedule release window and coordinate with teams.
Code Freeze: Restrict code changes to stabilize the codebase.
Testing: Conduct thorough testing, including regression testing.
Deployment: Deploy the release to production servers.
Monitoring: Monitor systems for any anomalies or issues.
Rollback Plan: Have a rollback plan in case of issues.
Communication: Keep stakeholders informed throughout the process.
9. If my changes work fine on dev and test environments but not on prod, what could be the issues, and how do you fix it?
Answer: Issues and Solutions:
Environment Differences: Identify and replicate prod-like conditions in the lower environments.
Configuration Mismatch: Ensure configurations are consistent across environments.
Data Discrepancy: Check for data variations between environments.
Dependency Issues: Validate dependencies and versions in each environment.
Rollback or Hotfix: If critical, consider rolling back changes or applying a hotfix.
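A configuration mismatch is often easiest to spot by diffing the effective settings of two environments. The following is a minimal sketch, assuming each environment's configuration has already been loaded into a flat dictionary; the keys and values are hypothetical.

```python
MISSING = "<missing>"  # placeholder shown when a key is absent in one environment


def diff_configs(lower: dict, prod: dict) -> dict:
    """Return keys whose values differ or that exist in only one environment."""
    differences = {}
    for key in sorted(set(lower) | set(prod)):
        lower_value = lower.get(key, MISSING)
        prod_value = prod.get(key, MISSING)
        if lower_value != prod_value:
            differences[key] = (lower_value, prod_value)
    return differences


# Hypothetical settings for illustration only.
test_env = {"DB_POOL_SIZE": 10, "CACHE_TTL": 60, "FEATURE_X": True}
prod_env = {"DB_POOL_SIZE": 50, "CACHE_TTL": 60, "TLS_ONLY": True}

for key, (test_value, prod_value) in diff_configs(test_env, prod_env).items():
    print(f"{key}: test={test_value!r} prod={prod_value!r}")
```

The same comparison can be applied to dependency versions by feeding it package-to-version mappings instead of settings.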
10. What is the best way to design a 3-tier architecture? Which services are included, and how do you select services to design it?
Answer: 3-Tier Architecture:
Presentation Tier: Front-end services, user interfaces.
Application Tier: Business logic, server-side processing.
Data Tier: Database storage, data management.
Service Selection:
Choose services based on scalability, performance, and security requirements.
Use load balancers for distribution and reliability.
Implement caching mechanisms for performance optimization.
Apply security groups and network ACLs to control traffic.
11. What are the strategies for infra cost optimization, and what actions will you take to reduce infra cost?
Answer: Cost Optimization Strategies:
Reserved Instances: Utilize reserved instances for predictable workloads.
Spot Instances: Leverage spot instances for temporary and flexible workloads.
Rightsizing: Match instance types to actual resource needs.
Auto Scaling: Automatically adjust resources based on demand.
Tagging: Implement tagging for resource categorization and cost allocation.
Actions:
Regularly analyze AWS Cost Explorer for cost breakdown.
Implement automated shutdown policies for non-production environments.
Optimize storage costs by selecting appropriate storage classes.
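The effect of an automated shutdown policy can be estimated with simple arithmetic. This sketch assumes a flat hourly compute rate and a non-production instance that only needs to run during working hours; the rate is a hypothetical figure, not a real AWS price.

```python
HOURS_PER_WEEK = 24 * 7  # 168


def weekly_cost(hourly_rate: float, hours_running: float) -> float:
    """Compute cost of one instance for a week at a flat hourly rate."""
    return hourly_rate * hours_running


def shutdown_savings(hourly_rate: float, hours_per_day: float, days_per_week: int) -> float:
    """Fraction of the always-on compute cost saved by running on a schedule."""
    always_on = weekly_cost(hourly_rate, HOURS_PER_WEEK)
    scheduled = weekly_cost(hourly_rate, hours_per_day * days_per_week)
    return 1 - scheduled / always_on


# Hypothetical rate; real prices depend on instance type and region.
rate = 0.10
saving = shutdown_savings(rate, hours_per_day=10, days_per_week=5)
print(f"Saved by stopping outside working hours: {saving:.0%}")
```

The estimate covers compute only: attached storage such as EBS volumes is still billed while an instance is stopped.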
12. How does auto-scaling work? Is it possible to change AMI in an auto-scaling group?
Answer: Auto-Scaling Workflow:
Scaling Policies: Define policies based on metrics like CPU utilization.
Trigger Events: Events trigger scaling actions (e.g., launch or terminate instances).
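Launch Templates or Launch Configurations: Pre-defined settings (AMI, instance type, key pair, security groups) for launched instances. AWS recommends launch templates; launch configurations are the legacy option.
Changing AMI in Auto-Scaling Group:
Yes, it is possible. Create a new launch template version (or a new launch configuration) with the desired AMI.
Update the auto-scaling group to use it.
Instances launched thereafter will use the updated AMI. Existing instances keep the old AMI until they are replaced, for example through an instance refresh.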
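The policy-and-trigger loop can be illustrated with a toy model. This is a simplified threshold rule, not the algorithm AWS Auto Scaling actually uses; the thresholds and group limits are hypothetical.

```python
def desired_capacity(current: int, cpu_percent: float,
                     scale_out_at: float = 70.0, scale_in_at: float = 30.0,
                     min_size: int = 1, max_size: int = 10) -> int:
    """Add one instance above the upper threshold, remove one below the lower."""
    if cpu_percent > scale_out_at:
        target = current + 1
    elif cpu_percent < scale_in_at:
        target = current - 1
    else:
        target = current
    # The group never leaves its configured minimum and maximum size.
    return max(min_size, min(max_size, target))


capacity = 2
for cpu in [85.0, 90.0, 50.0, 20.0, 10.0, 5.0]:
    capacity = desired_capacity(capacity, cpu)
    print(f"cpu={cpu:>5.1f}% -> desired capacity {capacity}")
```

The gap between the two thresholds keeps the group from flapping when the metric hovers around a single value.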
13. If someone created resources in AWS or deleted something, how do you get those details? Which AWS service can help find out these details?
Answer: AWS CloudTrail:
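CloudTrail records API calls and account activity as events. Management events are logged by default; data events, such as S3 object-level operations, must be enabled explicitly.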
Enables tracking of resource creation, deletion, and modification.
Provides detailed information, including the identity of the entity making the call and the time of the call.
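Once CloudTrail log files are available, finding who created or deleted something is a filtering exercise. The sketch below assumes a log file already downloaded as JSON text; the field names follow the CloudTrail record format, while the records themselves are made up for illustration.

```python
import json

# Hypothetical records shaped like entries in a CloudTrail log file.
SAMPLE_LOG = json.dumps({
    "Records": [
        {"eventTime": "2024-01-05T10:15:00Z", "eventSource": "ec2.amazonaws.com",
         "eventName": "RunInstances",
         "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"}},
        {"eventTime": "2024-01-05T11:02:00Z", "eventSource": "s3.amazonaws.com",
         "eventName": "DeleteBucket",
         "userIdentity": {"arn": "arn:aws:iam::111122223333:user/bob"}},
        {"eventTime": "2024-01-05T11:30:00Z", "eventSource": "ec2.amazonaws.com",
         "eventName": "DescribeInstances",
         "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"}},
    ]
})


def find_events(log_text: str, prefixes: tuple) -> list:
    """Return (time, caller, action) for events whose name starts with a prefix."""
    records = json.loads(log_text)["Records"]
    return [
        (r["eventTime"], r["userIdentity"].get("arn", "unknown"), r["eventName"])
        for r in records
        if r["eventName"].startswith(prefixes)
    ]


for when, who, action in find_events(SAMPLE_LOG, ("Create", "Run", "Delete")):
    print(when, who, action)
```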
14. What is the difference between Latency-Based Routing and Geo DNS?
Answer:
Latency-Based Routing: Routes traffic based on the lowest network latency to improve response times.
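Geo DNS: Routes traffic based on the geographic location of the user, regardless of measured latency. It is typically used to serve localized content or meet data-residency rules, and the mapped server is not always the fastest one.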
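The difference shows up when the two policies disagree for the same client. A toy sketch, with hypothetical latency measurements and geolocation rules:

```python
def latency_based(latencies_ms: dict) -> str:
    """Pick the region with the lowest measured latency for this client."""
    return min(latencies_ms, key=latencies_ms.get)


def geolocation_based(country: str, rules: dict, default: str) -> str:
    """Pick the region mapped to the client's country, else the default."""
    return rules.get(country, default)


# Hypothetical measurements and rules for one client located in India.
latencies = {"ap-south-1": 95, "ap-southeast-1": 60, "eu-west-1": 140}
geo_rules = {"IN": "ap-south-1", "DE": "eu-west-1"}

print(latency_based(latencies))
print(geolocation_based("IN", geo_rules, default="us-east-1"))
```

Here latency-based routing picks ap-southeast-1 because it measured fastest, while the geolocation rule pins the same client to ap-south-1.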
15. What is the difference between a Domain and a Hosted Zone?
Answer:
Domain: Represents a website or application’s address (e.g., example.com).
Hosted Zone: A container in Route 53 for the DNS records of a domain and its subdomains, for example records that map domain names to IP addresses.
16. When we create a VPC, what components are created by default?
Answer: By default, creating a VPC in AWS includes:
Main route table.
Default security group.
Default network access control list (ACL).
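No subnets are created automatically. Default subnets in each Availability Zone exist only in the default VPC that AWS provides for the account.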
17. How to recover a CloudFormation stack if it’s stuck in ‘create in progress’ or ‘failed’ status?
Answer:
Check Dependencies: Ensure all dependencies are available.
Review Events: Analyze stack events for insights.
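Cancel or Wait: A stack in ‘create in progress’ cannot be edited. Wait for it to finish or time out, or delete the stack to cancel the creation.
Troubleshoot Errors: Address the issue reported for the first failed resource.
Recreate Stack: A stack whose creation failed and rolled back cannot be updated. Delete it and create it again after fixing the cause.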
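The recovery steps depend on the exact stack status. The lookup below is a reading aid rather than an official AWS table; the status names are real CloudFormation statuses, and the suggested actions summarize common practice.

```python
# Suggested first action per stack status; a reading aid, not an official AWS table.
NEXT_STEP = {
    "CREATE_IN_PROGRESS": "Wait and watch the stack events; delete the stack to cancel if it hangs.",
    "CREATE_FAILED": "Read the failure reason in the events, fix it, then delete and recreate the stack.",
    "ROLLBACK_COMPLETE": "The stack can only be deleted; delete it and create it again.",
    "ROLLBACK_FAILED": "Resolve the resource that blocked the rollback, then delete the stack.",
    "UPDATE_ROLLBACK_FAILED": "Fix the blocking resource, then run continue-update-rollback.",
    "DELETE_FAILED": "Retry the delete, retaining resources that cannot be removed.",
}


def next_step(status: str) -> str:
    """Look up a suggested action for a CloudFormation stack status."""
    return NEXT_STEP.get(status, "Review the stack events for the first failed resource.")


for status in ("CREATE_IN_PROGRESS", "ROLLBACK_COMPLETE", "UPDATE_ROLLBACK_FAILED"):
    print(f"{status}: {next_step(status)}")
```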
18. What is the use case of AWS Config service?
Answer: AWS Config Service:
Tracks changes to AWS resources.
Provides a detailed inventory of configurations.
Enables compliance checking and security analysis.
19. How to track AWS cloud service changes?
Answer:
Use AWS Config to track configuration changes.
Set up AWS Config rules to evaluate specific changes for compliance, and send notifications through Amazon SNS or Amazon EventBridge.
Utilize AWS CloudTrail for detailed API call logging.
20. What is the use of DynamoDB?
Answer: DynamoDB:
Fully managed NoSQL database service by AWS.
Scales seamlessly and provides low-latency access to data.
Suitable for applications with variable and high read/write workloads.
21. If you want to give someone temporary access for like 1 hour, how do you give it? How do you configure that?
Answer: Temporary Access:
Create an AWS Identity and Access Management (IAM) role with a policy that grants the necessary permissions.
Have the user assume the role through AWS Security Token Service (STS), which issues temporary credentials.
Set a time-limited session duration (e.g., 1 hour) when issuing the temporary credentials. They expire automatically when the session ends.
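One common way to issue such credentials is the STS AssumeRole call. The sketch below assumes boto3 is installed and that a role already exists; the role ARN and session name are hypothetical. Building the request is separated from the network call so the duration check can be exercised without AWS access.

```python
def build_assume_role_request(role_arn: str, session_name: str, hours: float = 1.0) -> dict:
    """Build AssumeRole parameters with the session length expressed in seconds."""
    seconds = int(hours * 3600)
    if not 900 <= seconds <= 43200:
        raise ValueError("AssumeRole accepts durations from 900 to 43200 seconds")
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "DurationSeconds": seconds,
    }


def issue_temporary_credentials(request: dict) -> dict:
    """Call STS; needs boto3 and credentials that are allowed to assume the role."""
    import boto3  # imported lazily so the request builder works without it

    response = boto3.client("sts").assume_role(**request)
    # Contains AccessKeyId, SecretAccessKey, SessionToken and Expiration.
    return response["Credentials"]


request = build_assume_role_request(
    "arn:aws:iam::111122223333:role/TemporaryAccess",  # hypothetical role
    "one-hour-session",                                # hypothetical session name
)
print(request)
```

The requested duration cannot exceed the maximum session duration configured on the role, which defaults to one hour.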
22. What is the difference between AWS managed policy and customer-managed policy?
Answer:
AWS Managed Policy: Created and managed by AWS, provides pre-defined permissions.
Customer-Managed Policy: Customized and managed by the customer, offering more flexibility and control over permissions.
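A customer-managed policy is a JSON document that you author and attach yourself. As an illustration, this sketch builds a read-only policy for a single S3 bucket; the bucket name is hypothetical.

```python
import json

# Hypothetical bucket name; replace with a real resource.
BUCKET = "example-reports-bucket"

read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ListBucket",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
        {
            "Sid": "ReadObjects",
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
    ],
}

policy_json = json.dumps(read_only_policy, indent=2)
print(policy_json)
```

An AWS managed policy such as AmazonS3ReadOnlyAccess grants similar read access across all buckets; writing your own is what lets you narrow it to one.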
23. How to handle large traffic in Application Load Balancer?
Answer:
Auto Scaling: Dynamically adjust the number of instances based on traffic.
Connection Draining (Deregistration Delay): Lets in-flight requests complete before a target is removed from service, so scale-in events and deployments do not drop active connections.
Use of CDN: Distribute static content globally for reduced load on the ALB.
24. You received a notification from AWS about a potential security breach. What immediate actions will you take to secure the account?
Answer:
Change Credentials: Rotate compromised access keys and passwords.
Review Activity: Check AWS CloudTrail and AWS Config for unauthorized API calls and resource changes.
Disable Compromised Access: Temporarily disable compromised access points.
Investigate and Mitigate: Investigate the root cause and take actions to prevent further breaches.
Enable Multi-Factor Authentication (MFA): Enhance security by enabling MFA.
25. If a database administrator mistakenly deleted chunks of data records from the database, how do you recover that?
Answer: Data Recovery Steps:
Backup: Restore from the latest backup.
Point-in-Time Recovery: Use database features for point-in-time recovery.
Transaction Logs: Utilize transaction logs if available.
Database Replication: If applicable, consider replication from a healthy database.
26. How to migrate large data from one S3 bucket to another S3 bucket?
Answer:
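AWS CLI: Use the aws s3 sync command for efficient data transfer.
AWS DataSync: For large-scale, fast, and secure data transfer.
AWS Snowball: Physical device for large-scale data transfer, mainly relevant when moving data between on-premises storage and AWS rather than between two buckets.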
27. I need one EC2 instance only for 1 hour daily. Which instance types or options will you choose for daily use?
Answer:
Use an On-Demand instance that is started and stopped on a schedule, so you pay only for the hour it runs. If the workload tolerates interruption, a Spot Instance can reduce the cost further.
28. How does a load balancer work? What are the algorithms?
Answer: Load Balancer Workflow:
Distributes incoming traffic across multiple servers.
Ensures even distribution to optimize resource utilization.
Monitors server health and redirects traffic away from unhealthy instances.
Algorithms:
Round Robin: Sends each new request to the next server in turn, so traffic is spread equally.
Least Connections: Sends traffic to the server with the fewest active connections.
IP Hash: Bases distribution on the client’s IP address.
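The three algorithms can be sketched in a few lines each. These are minimal illustrations rather than how any particular load balancer implements them; the server names are hypothetical, and health checks are left out.

```python
import hashlib
import itertools


class RoundRobin:
    """Hand out servers in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Choose the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1  # the caller calls release() when the request ends
        return server

    def release(self, server):
        self.active[server] -= 1


class IPHash:
    """Map a client IP to the same server on every request."""

    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        return self.servers[int.from_bytes(digest[:8], "big") % len(self.servers)]


servers = ["app-1", "app-2", "app-3"]  # hypothetical backend names

rr = RoundRobin(servers)
print([rr.pick() for _ in range(4)])

lc = LeastConnections(servers)
first, second = lc.pick(), lc.pick()
lc.release(first)  # the first request finishes, so its server is idle again
print(lc.pick())

ih = IPHash(servers)
print(ih.pick("203.0.113.7") == ih.pick("203.0.113.7"))
```

IP hash gives session stickiness without shared state, but adding or removing a server changes the modulus and remaps most clients.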
29. What is Control Tower and Landing Zone?
Answer:
AWS Control Tower: Service for setting up and governing a secure, multi-account AWS environment.
Landing Zone: A well-architected, secure, and scalable multi-account AWS environment. AWS Control Tower automates the setup of a landing zone.
30. How to check server logs?
Answer:
Use tail -f to follow a log in real time, or cat to print a whole file.
Explore log directories like /var/log/ for specific logs.
Check application-specific log locations.
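When logs need to be inspected from a script rather than a shell, the behavior of tail -n is easy to reproduce. A minimal sketch using only the standard library; it runs against a throwaway file instead of a real log so that it works anywhere.

```python
import os
import tempfile
from collections import deque


def tail(path: str, lines: int = 10) -> list:
    """Return the last `lines` lines of a text file, like `tail -n`."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        # deque with maxlen keeps only the most recent lines while streaming the file.
        return [line.rstrip("\n") for line in deque(handle, maxlen=lines)]


# Demonstrate on a temporary file rather than a real log under /var/log/.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("\n".join(f"line {n}" for n in range(1, 101)) + "\n")

last_three = tail(tmp.name, 3)
print(last_three)
os.remove(tmp.name)
```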
31. If server performance suddenly slows down, what steps or actions do we need to follow to resolve this issue?
Answer:
Monitor Metrics: Identify performance bottlenecks.
Analyze Logs: Check system and application logs for errors.
Resource Scaling: Adjust resources based on demand.
Optimization: Optimize queries, code, or configurations.
Patch and Updates: Ensure systems are up to date.
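The first step, identifying the bottleneck, usually starts with a quick look at load and disk usage. This sketch collects a few coarse indicators with the standard library only; it is a starting point under the assumption of a Unix-like host, not a replacement for a monitoring system.

```python
import os
import shutil


def snapshot(path: str = "/") -> dict:
    """Collect a few coarse health indicators for the local host."""
    disk = shutil.disk_usage(path)
    info = {
        "cpu_count": os.cpu_count(),
        "disk_used_percent": round(100 * disk.used / disk.total, 1),
    }
    # Load averages are only available on Unix-like systems.
    if hasattr(os, "getloadavg"):
        info["load_1m"], info["load_5m"], info["load_15m"] = os.getloadavg()
    return info


metrics = snapshot()
print(metrics)

cpus = metrics["cpu_count"] or 1
if metrics.get("load_1m", 0) > cpus:
    print("1-minute load exceeds CPU count: the host may be CPU-bound or waiting on I/O")
```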
32. How to check which installed services are running on which port using a command?
Answer: The netstat command can be used to check which services are running on which ports. For example:
netstat -tulpn
33. If I’m unable to log in to my EC2 machine, how do I check the reason and fix it?
Answer: To diagnose login issues:
Check security group settings to ensure SSH access.
Review key pair settings.
Examine system logs such as /var/log/cloud-init-output.log or /var/log/auth.log on Linux instances. If you cannot log in at all, view the instance system log from the EC2 console.
For Windows instances, check the Event Viewer.
34. What is the best Git branching strategy?
Answer: A popular branching strategy is Gitflow, which involves master, develop, feature, release, and hotfix branches. It ensures a structured approach to feature development, release management, and bug fixing.
35. Explain Git Commands.
Answer: Essential Git commands include:
git clone: Clone a repository.
git add: Stage changes for commit.
git commit: Commit changes.
git pull: Fetch changes from a remote repository and merge them into the current branch.
git push: Push changes to a remote repository.
git branch: Create, list, or delete branches.
36. How to resolve Git merge conflicts?
Answer: Resolving Git merge conflicts involves:
Identifying conflicted files using git status.
Opening conflicted files and manually resolving conflicts.
Marking conflicts as resolved using git add.
Completing the merge with git commit.
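Before marking a file as resolved, it helps to confirm that no conflict markers are left in it. A minimal sketch that scans file content for the markers Git writes during a conflicted merge; the sample content and branch name are hypothetical.

```python
def find_conflicts(text: str) -> list:
    """Return (start_line, end_line) pairs for each unresolved conflict block."""
    conflicts = []
    start = None
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith("<<<<<<< "):
            start = number
        elif line.startswith(">>>>>>> ") and start is not None:
            conflicts.append((start, number))
            start = None
    return conflicts


# Hypothetical file content left behind by a conflicted merge.
conflicted = """\
name = "service"
<<<<<<< HEAD
replicas = 3
=======
replicas = 5
>>>>>>> feature/scale-up
port = 8080
"""

print(find_conflicts(conflicted))
```

An empty result means the file is free of markers and can be staged.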
37. Explain Git troubleshooting.
Answer: Git troubleshooting involves:
Checking for network issues.
Verifying repository permissions.
Examining local configurations using git config --list.
Debugging using git log and git reflog.


