<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Karthik Adari]]></title><description><![CDATA[Karthik Adari]]></description><link>https://karthiktechdairy.substack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!UMJq!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ba28ade-da8d-46d1-89b3-fb1b8a34eaf7_800x800.jpeg</url><title>Karthik Adari</title><link>https://karthiktechdairy.substack.com</link></image><generator>Substack</generator><lastBuildDate>Fri, 12 Jun 2026 03:36:42 GMT</lastBuildDate><atom:link href="https://karthiktechdairy.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Karthik Adari]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[karthiktechdairy@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[karthiktechdairy@substack.com]]></itunes:email><itunes:name><![CDATA[Karthik Adari]]></itunes:name></itunes:owner><itunes:author><![CDATA[Karthik Adari]]></itunes:author><googleplay:owner><![CDATA[karthiktechdairy@substack.com]]></googleplay:owner><googleplay:email><![CDATA[karthiktechdairy@substack.com]]></googleplay:email><googleplay:author><![CDATA[Karthik Adari]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[10 Data Analyst Projects from GitHub That Can Make Your Resume Stand Out]]></title><description><![CDATA[A practical project list for SQL, Python, Excel, Tableau, Power BI, finance, HR, customer analytics, and business intelligence roles]]></description><link>https://karthiktechdairy.substack.com/p/10-data-analyst-projects-from-github</link><guid 
isPermaLink="false">https://karthiktechdairy.substack.com/p/10-data-analyst-projects-from-github</guid><dc:creator><![CDATA[Karthik Adari]]></dc:creator><pubDate>Wed, 29 Apr 2026 22:01:04 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!FDZ_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384b3f89-df6b-43e2-8cb9-9e354e6430f0_1275x1650.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most data analyst resumes look the same because they only mention tools.</p><p>But strong resumes show <strong>business problems, KPIs, dashboards, SQL analysis, insights, and measurable impact</strong>.</p><p>So here are 10 GitHub projects that are worth studying, rebuilding, and customizing for your own resume. These projects cover different analyst skills like <strong>Excel, SQL, Python, Tableau, Power BI, finance analytics, HR analytics, customer behavior, data warehousing, and dashboard storytelling</strong>.</p><div><hr></div><h2>1. Pizza Sales Analysis Project</h2><p><strong>Github Link -</strong> <a href="https://github.com/ekaterinakham/SQL-Tableau-PowerBI-Excel-Pizza-Sales-Analysis-Project">https://github.com/ekaterinakham/SQL-Tableau-PowerBI-Excel-Pizza-Sales-Analysis-Project</a></p><p><strong>Skills Covered -</strong> SQL, Excel, Power BI, Tableau, KPI dashboarding, sales analysis, revenue analysis</p><p><strong>Why it&#8217;s strong for resume -</strong><br>This is one of the best all-in-one beginner-to-intermediate projects because it covers SQL queries, Excel dashboarding, Power BI dashboarding, Tableau dashboarding, sales KPIs, best/worst sellers, revenue trends, and business performance analysis. It shows that you can analyze the same business problem across multiple tools.</p><p><strong>Resume points -</strong><br>i. 
Built an end-to-end pizza sales analytics project using SQL, Excel, Power BI, and Tableau to analyze revenue, order volume, product performance, and customer demand trends.<br>ii. Created interactive dashboards tracking 10+ KPIs including total revenue, average order value, total pizzas sold, daily trends, monthly trends, and category-level performance.<br>iii. Wrote SQL queries to identify top-selling and low-performing pizza categories, enabling data-driven recommendations for menu optimization and sales strategy.<br>iv. Designed multi-tool dashboard reports that reduced manual sales review effort by organizing key business metrics into one clear reporting workflow.</p><div><hr></div><h2>2. HR Analytics Project</h2><p><strong>Github Link -</strong> <a href="https://github.com/ekaterinakham/PowerBI-Tableau-SQL-Excel-HR-Analytics-Project">https://github.com/ekaterinakham/PowerBI-Tableau-SQL-Excel-HR-Analytics-Project</a></p><p><strong>Skills Covered -</strong> HR analytics, SQL, Excel, Power BI, Tableau, workforce reporting, attrition analysis</p><p><strong>Why it&#8217;s strong for resume -</strong><br>This is a great project for people analytics and HR reporting roles. It covers employee metrics, attrition-style workforce analysis, Excel reporting, SQL documentation, Power BI dashboards, and Tableau dashboards. It is especially strong because HR analytics is used in almost every company.</p><p><strong>Resume points -</strong><br>i. Developed an HR analytics dashboard using SQL, Excel, Power BI, and Tableau to monitor employee count, workforce distribution, attrition patterns, and department-level trends.<br>ii. Analyzed employee data across multiple dimensions such as gender, department, job role, education field, and age group to identify workforce risk areas.<br>iii. Built 8+ HR KPIs and visual reports to help stakeholders understand attrition drivers, employee demographics, and retention opportunities.<br>iv. 
Transformed raw HR data into executive-ready dashboards, improving visibility into workforce health and supporting data-backed HR decisions.</p><div><hr></div><h2>3. Bank Loan Lending Data Analytics</h2><p><strong>Github Link -</strong> <a href="https://github.com/arnavchaturvedi17/Data-Analysis-Bank-Loan-Lending_Data_Analytics">https://github.com/arnavchaturvedi17/Data-Analysis-Bank-Loan-Lending_Data_Analytics</a></p><p><strong>Skills Covered -</strong> Finance analytics, SQL, Tableau, Excel, loan analysis, KPI tracking, risk reporting</p><p><strong>Why it&#8217;s strong for resume -</strong><br>This is a strong finance-domain project. It includes loan applications, funded amount, amount received, interest rate, loan status, loan purpose, SQL ETL, Tableau dashboarding, and Excel validation. This is excellent for data analyst, financial analyst, risk analyst, and business analyst resumes.</p><p><strong>Resume points -</strong><br>i. Analyzed bank loan lending data using SQL, Excel, and Tableau to evaluate loan applications, funded amounts, repayment performance, and borrower risk segments.<br>ii. Built financial KPI dashboards tracking 10+ metrics including total loan applications, funded amount, amount received, interest rate, debt-to-income ratio, and loan status.<br>iii. Used SQL to segment good loans versus bad loans and identify patterns across loan purpose, term, grade, employment length, and borrower profile.<br>iv. Created Tableau dashboards to support lending performance review and risk monitoring, helping translate raw loan records into clear business insights.</p><div><hr></div><h2>4. 
OLA Data Analyst Project</h2><p><strong>Github Link -</strong> <a href="https://github.com/PrajwalGpy/OLA-Data-Analyst-Project-Power-BI-And-SQL">https://github.com/PrajwalGpy/OLA-Data-Analyst-Project-Power-BI-And-SQL</a></p><p><strong>Skills Covered -</strong> SQL, Power BI, ride-booking analytics, customer analytics, driver performance, revenue reporting</p><p><strong>Why it&#8217;s strong for resume -</strong><br>This is a solid business analytics project because it analyzes ride volumes, booking status, revenue by payment method, customer behavior, driver ratings, vehicle performance, and cancellation trends. It feels like a real analytics project from a marketplace or transportation company.</p><p><strong>Resume points -</strong><br>i. Built an OLA ride-booking analytics project using SQL and Power BI to analyze booking volume, revenue trends, cancellation patterns, and ride completion performance.<br>ii. Created dashboard views for 10+ operational KPIs including total bookings, successful rides, cancelled rides, revenue by payment method, vehicle type performance, and customer ratings.<br>iii. Used SQL queries to identify customer and driver behavior trends, including cancellation reasons, ride frequency, and revenue contribution by ride category.<br>iv. Delivered Power BI insights to support marketplace operations, customer experience improvement, and driver performance monitoring.</p><div><hr></div><h2>5. 
Customer Shopping Behavior Analytics</h2><p><strong>Github Link -</strong> <a href="https://github.com/amlanmohanty1/customer-trends-data-analysis-SQL-Python-PowerBI">https://github.com/amlanmohanty1/customer-trends-data-analysis-SQL-Python-PowerBI</a></p><p><strong>Skills Covered -</strong> Python, SQL, Power BI, customer analytics, EDA, reporting, business presentation</p><p><strong>Why it&#8217;s strong for resume -</strong><br>This is one of the best end-to-end projects because it includes data import, exploratory analysis, cleaning, SQL loading, business question analysis, Power BI dashboarding, and reporting. It shows a full analyst workflow rather than only a dashboard.</p><p><strong>Resume points -</strong><br>i. Completed an end-to-end customer shopping behavior analysis using Python, SQL, and Power BI to uncover purchasing patterns, customer segments, and product trends.<br>ii. Cleaned and prepared customer transaction data using Python, improving dataset consistency before loading structured tables into SQL for analysis.<br>iii. Answered 15+ business questions using SQL, including sales trends, customer demographics, purchase frequency, product preferences, and revenue drivers.<br>iv. Built a Power BI dashboard and final business report to summarize key insights, helping connect customer behavior patterns with actionable retail recommendations.</p><div><hr></div><h2>6. Cyclistic Bike Share Case Study</h2><p><strong>Github Link -</strong> <a href="https://github.com/SomiaNasir/Google-Data-Analytics-Capstone-Cyclistic-Case-Study">https://github.com/SomiaNasir/Google-Data-Analytics-Capstone-Cyclistic-Case-Study</a></p><p><strong>Skills Covered -</strong> SQL, BigQuery, Tableau, business case study, customer behavior analysis, data storytelling</p><p><strong>Why it&#8217;s strong for resume -</strong><br>This project is strong because it follows a structured analyst process: Ask, Prepare, Process, Analyze, Share, and Act. 
It also includes SQL queries and Tableau visualizations. It is a good project for entry-level data analyst resumes because it shows business thinking, not just technical skills.</p><p><strong>Resume points -</strong><br>i. Conducted a Cyclistic bike-share case study using SQL and Tableau to compare usage behavior between casual riders and annual members.<br>ii. Processed and analyzed 12 months of trip data to identify patterns in ride duration, weekday usage, seasonal demand, and customer segment behavior.<br>iii. Built Tableau dashboards to visualize member conversion opportunities, peak usage periods, and differences in riding habits across customer groups.<br>iv. Recommended data-backed marketing strategies to increase annual memberships by targeting high-frequency casual riders and weekend-heavy users.</p><div><hr></div><h2>7. SQL Data Warehouse and Analytics Project</h2><p><strong>Github Link -</strong> <a href="https://github.com/DataWithBaraa/sql-data-warehouse-project">https://github.com/DataWithBaraa/sql-data-warehouse-project</a></p><p><strong>Skills Covered -</strong> SQL Server, ETL, data warehousing, star schema, data modeling, reporting, business analytics</p><p><strong>Why it&#8217;s strong for resume -</strong><br>This project can make a resume stand out because it goes beyond regular dashboarding. It covers data warehouse architecture, ETL, bronze/silver/gold layers, fact and dimension modeling, data quality checks, and SQL-based reporting. This is especially useful for data analyst, BI analyst, and analytics engineer roles.</p><p><strong>Resume points -</strong><br>i. Designed a SQL-based data warehouse using bronze, silver, and gold layers to transform raw sales data into structured analytics-ready tables.<br>ii. Built ETL workflows and data quality checks to clean, standardize, and validate customer, product, and sales datasets before reporting.<br>iii. 
Created fact and dimension tables using star schema modeling to support scalable reporting across customer behavior, product performance, and sales trends.<br>iv. Developed SQL analytics reports covering revenue trends, customer segmentation, product performance, and business growth metrics for BI use cases.</p><div><hr></div><h2>8. Data Analysis Portfolio by Rebekah</h2><p><strong>Github Link -</strong> <a href="https://github.com/rebekah999/Data-Analysis-Portfolio">https://github.com/rebekah999/Data-Analysis-Portfolio</a></p><p><strong>Skills Covered -</strong> PostgreSQL, Excel, Python, EDA, sales analysis, inventory analysis, churn analysis</p><p><strong>Why it&#8217;s strong for resume -</strong><br>This is a strong reference portfolio because it includes multiple analyst-style projects. It covers SQL analysis, Excel exploration, property sales dashboards, S&amp;P 500 data pipeline work, and employee churn analysis. It is helpful if you want to understand how to organize several projects in one GitHub portfolio.</p><p><strong>Resume points -</strong><br>i. Built a multi-project data analysis portfolio covering SQL, Excel, Python, sales analytics, inventory analysis, employee churn, and financial market data.<br>ii. Used PostgreSQL to analyze business datasets across orders, revenue, customers, inventory, and employee performance, answering 20+ analytical questions.<br>iii. Created Excel and dashboard-based reports to summarize sales trends, product performance, customer behavior, and operational efficiency.<br>iv. Organized multiple analysis projects into a clean GitHub portfolio structure, improving project readability for recruiters and hiring managers.</p><div><hr></div><h2>9. 
Maven Toys Sales Project Analysis</h2><p><strong>Github Link -</strong> <a href="https://github.com/Yash-Yennewar/Maven_Toys_Sales_Project_Analysis">https://github.com/Yash-Yennewar/Maven_Toys_Sales_Project_Analysis</a></p><p><strong>Skills Covered -</strong> Power BI, DAX, data modeling, retail analytics, inventory analysis, sales performance</p><p><strong>Why it&#8217;s strong for resume -</strong><br>This is a good Power BI portfolio project. It uses a realistic retail dataset and focuses on revenue, profit, inventory efficiency, store performance, DAX calculations, relationships, maps, slicers, and business storytelling. This is a strong choice for anyone targeting BI analyst or Power BI analyst roles.</p><p><strong>Resume points -</strong><br>i. Developed a Power BI sales analytics dashboard for Maven Toys to monitor revenue, profit, store performance, product demand, and inventory efficiency.<br>ii. Built DAX measures and data model relationships to calculate 10+ business KPIs including total sales, profit margin, units sold, stock levels, and store-level performance.<br>iii. Analyzed product and location-level trends to identify high-performing stores, slow-moving products, and inventory optimization opportunities.<br>iv. Designed an interactive retail dashboard with slicers, maps, and category-level drilldowns to support faster business performance review.</p><div><hr></div><h2>10. Alex The Analyst Portfolio Projects</h2><p><strong>Github Link -</strong> <a href="https://github.com/AlexTheAnalyst/PortfolioProjects">https://github.com/AlexTheAnalyst/PortfolioProjects</a></p><p><strong>Skills Covered -</strong> SQL, Python, data cleaning, Tableau, web scraping, API extraction, EDA</p><p><strong>Why it&#8217;s strong for resume -</strong><br>This repo is popular and useful for learning project structure. It includes SQL exploration, Nashville housing data cleaning, Tableau SQL queries, Python notebooks, web scraping, and API extraction. 
Use it as a reference, but customize your own version because many candidates already use this repo.</p><p><strong>Resume points -</strong><br>i. Completed multiple portfolio projects using SQL, Python, Tableau, web scraping, and API extraction to demonstrate end-to-end data analysis skills.<br>ii. Cleaned and transformed real-world datasets using SQL, including handling missing values, standardizing fields, removing duplicates, and preparing data for visualization.<br>iii. Performed exploratory analysis using SQL and Python to identify trends, patterns, and business insights across housing, COVID, and public datasets.<br>iv. Built Tableau-ready datasets and dashboards to communicate findings clearly through visual storytelling and stakeholder-friendly reporting.</p><div><hr></div><h4>DEMO RESUME - <a href="https://www.overleaf.com/read/zyrbvfgcxbkz#e25c44">Link</a></h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FDZ_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384b3f89-df6b-43e2-8cb9-9e354e6430f0_1275x1650.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FDZ_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384b3f89-df6b-43e2-8cb9-9e354e6430f0_1275x1650.jpeg 424w, https://substackcdn.com/image/fetch/$s_!FDZ_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384b3f89-df6b-43e2-8cb9-9e354e6430f0_1275x1650.jpeg 848w, https://substackcdn.com/image/fetch/$s_!FDZ_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384b3f89-df6b-43e2-8cb9-9e354e6430f0_1275x1650.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!FDZ_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384b3f89-df6b-43e2-8cb9-9e354e6430f0_1275x1650.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FDZ_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384b3f89-df6b-43e2-8cb9-9e354e6430f0_1275x1650.jpeg" width="1275" height="1650" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/384b3f89-df6b-43e2-8cb9-9e354e6430f0_1275x1650.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1650,&quot;width&quot;:1275,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:867159,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://karthiktechdairy.substack.com/i/195921263?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384b3f89-df6b-43e2-8cb9-9e354e6430f0_1275x1650.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FDZ_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384b3f89-df6b-43e2-8cb9-9e354e6430f0_1275x1650.jpeg 424w, https://substackcdn.com/image/fetch/$s_!FDZ_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384b3f89-df6b-43e2-8cb9-9e354e6430f0_1275x1650.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!FDZ_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384b3f89-df6b-43e2-8cb9-9e354e6430f0_1275x1650.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!FDZ_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384b3f89-df6b-43e2-8cb9-9e354e6430f0_1275x1650.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>Ending Note</h2><p>A strong data analyst resume does not need 20 projects.</p><p>It needs <strong>3 solid projects</strong> that 
show:</p><p>SQL thinking,<br>dashboard storytelling,<br>business understanding,<br>clean data work,<br>and measurable insights.</p><p>My recommendation:</p><p>Build one project with <strong>SQL + Python</strong>,<br>one project with <strong>Power BI or Tableau</strong>,<br>and one project from a real business domain like <strong>finance, HR, retail, healthcare, or customer analytics</strong>.</p><p>Don&#8217;t just copy these GitHub projects. Rebuild them, improve the dashboards, add your own insights, and write stronger resume bullets around the business impact.</p>]]></content:encoded></item><item><title><![CDATA[Netflix Data Science Interview Questions: 18 Problems That Test How You Think]]></title><description><![CDATA[A practical breakdown of messy interview questions across experimentation, SQL, product analytics, statistics, and business strategy.]]></description><link>https://karthiktechdairy.substack.com/p/netflix-data-science-interview-questions</link><guid isPermaLink="false">https://karthiktechdairy.substack.com/p/netflix-data-science-interview-questions</guid><dc:creator><![CDATA[Karthik Adari]]></dc:creator><pubDate>Wed, 29 Apr 2026 15:54:41 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!UMJq!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ba28ade-da8d-46d1-89b3-fb1b8a34eaf7_800x800.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most analytics and data science interviews are not just testing formulas.</p><p>They are testing whether you can take a messy business problem, slow it down, structure it, make reasonable assumptions, and explain your thinking without getting lost.</p><p>Here are detailed, human-friendly answers to 18 common interview questions across fraud, A/B testing, regression, SQL, product analytics, experimentation, and Netflix-style business cases.</p><div><hr></div><h2>&#9312; Given a month&#8217;s worth of login data 
with account_id, device_id, and payment-related metadata, how would you detect fraud?</h2><p>I would not start by building a model immediately.</p><p>I would first ask: <strong>What type of fraud are we trying to catch?</strong></p><p>For example:</p><ul><li><p>Account takeover</p></li><li><p>Fake accounts</p></li><li><p>Shared or resold accounts</p></li><li><p>Payment abuse</p></li><li><p>Stolen card usage</p></li><li><p>Promo abuse</p></li><li><p>Refund abuse</p></li><li><p>Bot-driven login behavior</p></li></ul><p>Then I would structure the data around three main entities:</p><p><strong>Account level</strong></p><ul><li><p>How many devices used the same account?</p></li><li><p>How many payment methods were attached?</p></li><li><p>How many failed payments happened?</p></li><li><p>Did login location or device suddenly change?</p></li><li><p>Was there a sudden spike in activity?</p></li></ul><p><strong>Device level</strong></p><ul><li><p>How many accounts used the same device?</p></li><li><p>Did one device create or access many accounts?</p></li><li><p>Does the device switch between many payment methods?</p></li><li><p>Is the device linked to accounts with failed payments or chargebacks?</p></li></ul><p><strong>Payment level</strong></p><ul><li><p>Are multiple accounts using the same card?</p></li><li><p>Are there many failed payments before success?</p></li><li><p>Are billing country and login country very different?</p></li><li><p>Are disposable cards or suspicious BIN patterns involved?</p></li><li><p>Is the same payment method reused across unrelated accounts?</p></li></ul><p>A few useful 
features:</p><ul><li><p><code>accounts_per_device</code></p></li><li><p><code>devices_per_account</code></p></li><li><p><code>payment_methods_per_account</code></p></li><li><p><code>accounts_per_payment_method</code></p></li><li><p><code>failed_payment_count</code></p></li><li><p><code>chargeback_count</code></p></li><li><p><code>login_country_count</code></p></li><li><p><code>new_device_login_flag</code></p></li><li><p><code>velocity_of_logins</code></p></li><li><p><code>time_between_account_creation_and_payment</code></p></li><li><p><code>payment_failure_rate</code></p></li></ul><p>I would also use graph thinking.</p><p>For example:</p><ul><li><p>Account A uses Device X</p></li><li><p>Device X is also used by Accounts B, C, D, and E</p></li><li><p>Accounts B and C both used the same payment method</p></li><li><p>Account D had a chargeback</p></li></ul><p>That cluster is more suspicious than any one login row would look on its own.</p><p>A simple first pass at fraud flags could look like this:</p><pre><code><code>WITH device_usage AS (
    SELECT
        device_id,
        COUNT(DISTINCT account_id) AS accounts_per_device
    FROM logins
    GROUP BY device_id
),

account_usage AS (
    SELECT
        account_id,
        COUNT(DISTINCT device_id) AS devices_per_account
    FROM logins
    GROUP BY account_id
),

payment_usage AS (
    SELECT
        payment_method_id,
        COUNT(DISTINCT account_id) AS accounts_per_payment
    FROM payments
    GROUP BY payment_method_id
),

account_payment AS (
    SELECT
        account_id,
        COUNT(*) AS total_payments,
        SUM(CASE WHEN payment_status = 'failed' THEN 1 ELSE 0 END) AS failed_payments,
        SUM(CASE WHEN payment_status = 'chargeback' THEN 1 ELSE 0 END) AS chargebacks
    FROM payments
    GROUP BY account_id
)

SELECT
    l.account_id,
    MAX(d.accounts_per_device) AS max_accounts_per_device,
    a.devices_per_account,
    MAX(pu.accounts_per_payment) AS max_accounts_per_payment,
    ap.failed_payments,
    ap.chargebacks,
    CASE
        WHEN MAX(d.accounts_per_device) &gt;= 5 THEN 1 ELSE 0
    END AS shared_device_flag,
    CASE
        WHEN a.devices_per_account &gt;= 4 THEN 1 ELSE 0
    END AS many_devices_flag,
    CASE
        WHEN ap.failed_payments &gt;= 3 THEN 1 ELSE 0
    END AS payment_failure_flag
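-- Join on distinct pairs rather than raw rows, so accounts with many logins
-- and many payments do not multiply into logins x payments rows.
FROM (SELECT DISTINCT account_id, device_id FROM logins) l
LEFT JOIN device_usage d
    ON l.device_id = d.device_id
LEFT JOIN account_usage a
    ON l.account_id = a.account_id
LEFT JOIN (SELECT DISTINCT account_id, payment_method_id FROM payments) p
    ON l.account_id = p.account_id
LEFT JOIN payment_usage pu
    ON p.payment_method_id = pu.payment_method_id
LEFT JOIN account_payment ap
    ON l.account_id = ap.account_id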
GROUP BY
    l.account_id,
    a.devices_per_account,
    ap.failed_payments,
    ap.chargebacks;
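
-- A possible next step, sketched under assumptions: if the result of the query
-- above were saved as a table or view (hypothetically named account_flags here;
-- that name is not from the original), the three flags could be summed into a
-- simple rule-based score for ranking accounts for manual review.
SELECT
    account_id,
    shared_device_flag + many_devices_flag + payment_failure_flag AS fraud_score
FROM account_flags  -- hypothetical table holding the output of the query above
ORDER BY fraud_score DESC;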
</code></code></pre><p>After that, I would decide whether to use:</p><ul><li><p>Rule-based detection for obvious fraud</p></li><li><p>Anomaly detection for unknown patterns</p></li><li><p>Supervised ML if we have labels like confirmed fraud, chargeback, banned account</p></li><li><p>Graph-based detection if fraud happens in connected groups</p></li></ul><p>Most importantly, I would measure precision and recall.</p><p>In fraud, catching everything sounds good, but false positives can hurt real customers. So I would not only ask, &#8220;How much fraud did we catch?&#8221; I would also ask, &#8220;How many good users did we block?&#8221;</p><div><hr></div><h2>&#9313; What are the assumptions of A/B testing?</h2><p>A/B testing looks simple, but it depends on many assumptions.</p><p>The main assumptions are:</p><p><strong>1. Random assignment</strong></p><p>Users should be randomly assigned to control and treatment. If one group gets more new users or more high-value users, the test becomes biased.</p><p><strong>2. Independence</strong></p><p>One user&#8217;s treatment should not affect another user&#8217;s outcome. This is also called no interference.</p><p>For example, if testing a social feature, one user seeing the feature may affect their friends. That breaks independence.</p><p><strong>3. Stable experience</strong></p><p>Control users should actually see control. Treatment users should actually see treatment. Bugs, logging issues, or delayed feature exposure can ruin the test.</p><p><strong>4. Same measurement logic</strong></p><p>Metrics should be calculated the same way for both groups.</p><p>If revenue is logged differently in treatment, the result may look significant even if the product did not improve.</p><p><strong>5. No sample ratio mismatch</strong></p><p>If the test is supposed to be 50 percent control and 50 percent treatment, the actual split should be close to that. If it becomes 70 and 30, something may be wrong.</p><p><strong>6. 
Enough time to capture behavior</strong></p><p>Some metrics need time.</p><p>A button click may show up quickly. Retention, refunds, churn, or subscription renewal needs more time.</p><p><strong>7. Metrics should be decided before the test</strong></p><p>If we keep checking 30 metrics and only report the one that looks good, we may fool ourselves.</p><p><strong>8. No repeated peeking without correction</strong></p><p>Checking results every hour and stopping when the p-value looks good increases false positives.</p><p><strong>9. The test population should match the decision</strong></p><p>If we only test on new users, we should be careful before applying the result to all users.</p><p>A/B testing is not just about splitting traffic. It is about making sure the comparison is fair.</p><div><hr></div><h2>&#9314; If you have one day of experiment data, large sample size, and significant results, would you stop the experiment?</h2><p>Usually, no.</p><p>A large sample size and a significant result after one day are not enough.</p><p>I would first check:</p><ul><li><p>Was the experiment planned to run only one day?</p></li><li><p>Is the metric immediate or delayed?</p></li><li><p>Are there weekday or weekend effects?</p></li><li><p>Is there a novelty effect?</p></li><li><p>Is there a sample ratio mismatch?</p></li><li><p>Are guardrail metrics healthy?</p></li><li><p>Did logging work correctly?</p></li><li><p>Are new and returning users reacting differently?</p></li><li><p>Did the result hold across important segments?</p></li><li><p>Did we account for repeated peeking?</p></li></ul><p>One day can be misleading.</p><p>For example, a new homepage design may increase clicks on day one because users are curious. 
But after a week, the effect may disappear.</p><p>Or a pricing experiment may look good after one day because revenue per visitor increased, but refunds or cancellations may show up later.</p><p>I would stop early only if:</p><ul><li><p>The test had a pre-defined early stopping rule</p></li><li><p>There is a strong harm signal</p></li><li><p>There is a clear business reason to stop</p></li><li><p>Sequential testing methods were used</p></li><li><p>The effect is extremely large and stable across checks</p></li></ul><p>My interview answer would be:</p><p>&#8220;I would not stop just because the p-value is significant after one day. I would check the experiment design, guardrail metrics, sample ratio, novelty effects, and whether the metric needs more time. Unless early stopping was planned, I would continue until the pre-defined duration or decision rule is reached.&#8221;</p><div><hr></div><h2>&#9315; How do you know if one algorithm is better than another?</h2><p>An algorithm is not &#8220;better&#8221; in general. It is better for a specific goal.</p><p>I would compare algorithms using four layers.</p><p><strong>1. Offline model performance</strong></p><p>For classification:</p><ul><li><p>Accuracy</p></li><li><p>Precision</p></li><li><p>Recall</p></li><li><p>F1 score</p></li><li><p>AUC</p></li><li><p>Log loss</p></li><li><p>Calibration</p></li></ul><p>For regression:</p><ul><li><p>MAE</p></li><li><p>RMSE</p></li><li><p>MAPE</p></li><li><p>R&#178;</p></li><li><p>Residual behavior</p></li></ul><p>But the metric should match the business problem.</p><p>For fraud detection, recall may matter because missing fraud is expensive.</p><p>For spam detection, precision may matter because blocking real messages is painful.</p><p><strong>2. 
Validation setup</strong></p><p>I would make sure both algorithms are trained and tested fairly.</p><p>That means:</p><ul><li><p>Same train and test split</p></li><li><p>No data leakage</p></li><li><p>Same features</p></li><li><p>Same evaluation period</p></li><li><p>Cross-validation if needed</p></li><li><p>Time-based split for time-sensitive data</p></li></ul><p><strong>3. Business impact</strong></p><p>A model can have better AUC but worse business value.</p><p>For example, one model may improve fraud detection by 2 percent but create too many false positives. Another may have slightly lower AUC but better customer experience.</p><p>So I would ask:</p><ul><li><p>Does it improve revenue?</p></li><li><p>Does it reduce risk?</p></li><li><p>Does it reduce manual review?</p></li><li><p>Does it improve user experience?</p></li><li><p>Does it meet latency requirements?</p></li></ul><p><strong>4. Practical constraints</strong></p><p>I would also compare:</p><ul><li><p>Training time</p></li><li><p>Prediction speed</p></li><li><p>Interpretability</p></li><li><p>Maintenance cost</p></li><li><p>Stability over time</p></li><li><p>Fairness across user groups</p></li><li><p>Ease of deployment</p></li></ul><p>The best answer is not always the fanciest model.</p><p>Sometimes logistic regression is better than a deep model because it is fast, stable, explainable, and good enough.</p><div><hr></div><h2>&#9316; How do you interpret R&#178; in regression?</h2><p>R&#178; tells us how much of the variation in the target variable is explained by the model.</p><p>For example, if R&#178; is 0.72, I would say:</p><p>&#8220;The model explains 72 percent of the variation in the outcome using the features included in the model.&#8221;</p><p>It does not mean:</p><ul><li><p>The model is 72 percent accurate</p></li><li><p>The model is correct 72 percent of the time</p></li><li><p>The features cause 72 percent of the outcome</p></li><li><p>The model will perform well on new data 
automatically</p></li></ul><p>A high R&#178; can still be bad if:</p><ul><li><p>There is data leakage</p></li><li><p>The model overfits</p></li><li><p>Important assumptions are broken</p></li><li><p>Residuals show patterns</p></li><li><p>The model performs poorly on new data</p></li></ul><p>A low R&#178; is not always useless either.</p><p>In human behavior, marketing, finance, and social data, outcomes are noisy. A lower R&#178; can still support useful decisions if the model improves forecasting or ranking.</p><p>I would also look at:</p><ul><li><p>Adjusted R&#178;</p></li><li><p>RMSE or MAE</p></li><li><p>Residual plots</p></li><li><p>Out-of-sample performance</p></li><li><p>Whether the model makes business sense</p></li></ul><p>Simple interview line:</p><p>&#8220;R&#178; measures explained variance, not accuracy or causality. I would use it with error metrics and validation performance before trusting the model.&#8221;</p><div><hr></div><h2>&#9317; What is FDR? What are the pitfalls in multiple testing?</h2><p>FDR means <strong>False Discovery Rate</strong>.</p><p>It is the expected proportion of false positives among the results we call significant.</p><p>Example:</p><p>If we test 100 features and declare 20 significant, an FDR of 10 percent means we expect around 2 of those 20 discoveries to be false positives.</p><p>This matters because when we run many tests, some will look significant just by chance.</p><p>If we use p &lt; 0.05 and run 100 independent tests, we may get around 5 false positives even if nothing real is happening.</p><p>Common pitfalls:</p><p><strong>1. Testing too many metrics</strong></p><p>If we check revenue, clicks, watch time, retention, churn, refund rate, search usage, and 50 segments, something will look significant.</p><p><strong>2. Looking at results repeatedly</strong></p><p>If we keep checking daily and stop when p &lt; 0.05, we increase the chance of a false positive.</p><p><strong>3. 
Changing hypotheses after seeing results</strong></p><p>This is common in real life. People look at the data, find a nice story, then act like that was the original hypothesis.</p><p><strong>4. Segment fishing</strong></p><p>The overall result may be neutral, but one tiny segment looks amazing. That could be noise.</p><p><strong>5. Ignoring dependency between tests</strong></p><p>Metrics are often related. Clicks, sessions, engagement, and retention are not fully independent.</p><p>Ways to handle it:</p><ul><li><p>Pre-define primary and secondary metrics</p></li><li><p>Use Benjamini-Hochberg correction for FDR</p></li><li><p>Use Bonferroni when you want stricter, family-wise error control</p></li><li><p>Avoid making big decisions from small segments</p></li><li><p>Validate surprising findings in a follow-up experiment</p></li><li><p>Separate exploratory analysis from confirmatory analysis</p></li></ul><p>Simple interview answer:</p><p>&#8220;FDR controls the expected proportion of false discoveries among significant findings. 
The main pitfall in multiple testing is that the more tests we run, the more likely we are to find something significant by luck.&#8221;</p><div><hr></div><h2>&#9318; Explain regression coefficients, R&#178;, Type I error, and Type II error.</h2><p>Let&#8217;s take them one by one.</p><h3>Regression coefficients</h3><p>A regression coefficient tells us the expected change in the target variable when one feature increases by one unit, holding other variables constant.</p><p>Example:</p><p>If we predict monthly spend and the coefficient for <code>sessions_per_month</code> is 3.5, then one extra session is associated with 3.5 more dollars in monthly spend, assuming other variables stay the same.</p><p>For a binary variable:</p><p>If <code>premium_user = 1</code> has a coefficient of 20, premium users spend 20 dollars more on average than non-premium users, holding other features constant.</p><p>Important note:</p><p>Regression coefficients show association, not automatically causation.</p><h3>R&#178;</h3><p>R&#178; tells us how much variation in the target variable is explained by the model.</p><p>If R&#178; is 0.60, the model explains 60 percent of the variation in the outcome.</p><p>It does not mean the model is 60 percent accurate.</p><h3>Type I error</h3><p>Type I error means false positive.</p><p>We say there is an effect when there is actually no effect.</p><p>Example:</p><p>We say a new checkout page increases purchases, but in reality, it does not.</p><h3>Type II error</h3><p>Type II error means false negative.</p><p>We fail to detect an effect even though there is a real effect.</p><p>Example:</p><p>A new recommendation model actually improves retention, but our test does not detect it because the sample size was too small.</p><p>Simple memory:</p><ul><li><p>Type I error: false alarm</p></li><li><p>Type II error: missed signal</p></li></ul><p>In business, both matter.</p><p>A Type I error can make us launch a bad feature.</p><p>A Type II error can make us 
reject a good feature.</p><div><hr></div><h2>&#9319; Explain ETL considerations for Big Data.</h2><p>For Big Data ETL, I would think beyond just moving data from one place to another.</p><p>I would consider the full lifecycle: ingestion, transformation, storage, quality, cost, security, and monitoring.</p><h3>1. Data volume</h3><p>How much data are we processing?</p><ul><li><p>Millions of rows?</p></li><li><p>Billions of rows?</p></li><li><p>Terabytes per day?</p></li><li><p>Streaming or batch?</p></li></ul><p>This affects whether we use tools like Spark, Flink, BigQuery, Snowflake, Databricks, Kafka, or cloud storage.</p><h3>2. Data velocity</h3><p>Is the data coming in real time or once per day?</p><p>For example:</p><ul><li><p>Login events may need near real-time processing</p></li><li><p>Finance reports may be fine as daily batch jobs</p></li></ul><h3>3. Data variety</h3><p>Data can come from many formats:</p><ul><li><p>CSV</p></li><li><p>JSON</p></li><li><p>Parquet</p></li><li><p>Avro</p></li><li><p>Logs</p></li><li><p>APIs</p></li><li><p>Database tables</p></li><li><p>Event streams</p></li></ul><p>The ETL design should handle schema changes and messy fields.</p><h3>4. Partitioning</h3><p>Good partitioning improves speed and lowers cost.</p><p>Common partitions:</p><ul><li><p>Date</p></li><li><p>Region</p></li><li><p>Product</p></li><li><p>Event type</p></li></ul><p>Bad partitioning can create slow queries or too many small files.</p><h3>5. Incremental processing</h3><p>We should avoid reprocessing everything daily if only new records changed.</p><p>Useful approaches:</p><ul><li><p>Incremental loads</p></li><li><p>Change Data Capture</p></li><li><p>Watermarks</p></li><li><p>Last updated timestamps</p></li></ul><h3>6. Late-arriving data</h3><p>In Big Data systems, events do not always arrive on time.</p><p>For example, a purchase event may arrive today but belong to yesterday.</p><p>The pipeline should support late data correction.</p><h3>7. 
Data quality checks</h3><p>I would add checks for:</p><ul><li><p>Null values</p></li><li><p>Duplicate rows</p></li><li><p>Invalid IDs</p></li><li><p>Negative revenue</p></li><li><p>Future timestamps</p></li><li><p>Broken joins</p></li><li><p>Unexpected row count changes</p></li></ul><h3>8. Idempotency</h3><p>If a job runs twice, it should not double-count data.</p><p>This is very important.</p><p>A pipeline should be safe to rerun.</p><h3>9. Monitoring</h3><p>I would monitor:</p><ul><li><p>Job failures</p></li><li><p>Runtime</p></li><li><p>Row counts</p></li><li><p>Freshness</p></li><li><p>Data drift</p></li><li><p>Cost spikes</p></li><li><p>Schema changes</p></li></ul><h3>10. Security and privacy</h3><p>Payment, user, and personal data need careful handling.</p><p>I would consider:</p><ul><li><p>Encryption</p></li><li><p>Access control</p></li><li><p>PII masking</p></li><li><p>Audit logs</p></li><li><p>Retention policies</p></li></ul><p>Simple interview answer:</p><p>&#8220;Big Data ETL is not just about extracting and loading data. I would think about scale, partitioning, incremental loads, late data, data quality, idempotency, monitoring, cost, and security.&#8221;</p><div><hr></div><h2>&#9320; What is a p-value? 
What does 0.05 actually mean?</h2><p>A p-value tells us how surprising the observed result is if the null hypothesis were true.</p><p>The null hypothesis usually says there is no effect.</p><p>So if p-value = 0.05, it means:</p><p>&#8220;If there were truly no effect, there is a 5 percent chance of seeing a result this extreme or more extreme due to random chance.&#8221;</p><p>It does not mean:</p><ul><li><p>There is a 95 percent chance the treatment works</p></li><li><p>There is a 5 percent chance the result is false</p></li><li><p>The effect is important</p></li><li><p>The result is practically meaningful</p></li><li><p>The experiment was designed correctly</p></li></ul><p>A tiny p-value can happen with a huge sample size even when the effect is very small.</p><p>For example, a button color change may increase click rate from 10.00 percent to 10.03 percent. With millions of users, that may be statistically significant, but maybe not worth launching.</p><p>I would always look at:</p><ul><li><p>Effect size</p></li><li><p>Confidence interval</p></li><li><p>Business impact</p></li><li><p>Experiment design</p></li><li><p>Guardrail metrics</p></li><li><p>Whether the test was pre-planned</p></li><li><p>Whether multiple testing was handled</p></li></ul><p>Simple interview answer:</p><p>&#8220;A p-value of 0.05 means the observed result would be rare under the no-effect assumption. It does not prove the treatment works. I would also look at effect size, confidence intervals, and business value.&#8221;</p><div><hr></div><h2>&#9321; Solve a SQL problem with multiple constraints, joins, and averages.</h2><p>Let&#8217;s use a realistic example.</p><h3>Problem</h3><p>You have three tables:</p><p><code>users</code></p><pre><code><code>user_id
signup_date
country
</code></code></pre><p><code>orders</code></p><pre><code><code>order_id
user_id
order_date
order_amount
status
</code></code></pre><p><code>sessions</code></p><pre><code><code>session_id
user_id
session_date
device_type
</code></code></pre><p>Question:</p><p>Find the average completed order amount per active user in March 2026, by country, only for countries with at least 100 active users.</p><p>An active user is a user with at least one session in March 2026.</p><p>Only include completed orders from March 2026.</p><h3>SQL</h3><pre><code><code>WITH march_active_users AS (
    SELECT DISTINCT
        user_id
    FROM sessions
    -- Half-open date range: covers all of March even if session_date is a timestamp
    WHERE session_date &gt;= DATE '2026-03-01'
      AND session_date &lt; DATE '2026-04-01'
),

march_completed_orders AS (
    SELECT
        user_id,
        SUM(order_amount) AS total_order_amount,
        COUNT(order_id) AS completed_order_count
    FROM orders
    WHERE order_date &gt;= DATE '2026-03-01'
      AND order_date &lt; DATE '2026-04-01'
      AND status = 'completed'
    GROUP BY user_id
),

user_level AS (
    SELECT
        u.user_id,
        u.country,
        COALESCE(o.total_order_amount, 0) AS total_order_amount,
        COALESCE(o.completed_order_count, 0) AS completed_order_count
    FROM users u
    JOIN march_active_users a
        ON u.user_id = a.user_id
    -- LEFT JOIN keeps active users with no completed orders; they count as 0 in the average
    LEFT JOIN march_completed_orders o
        ON u.user_id = o.user_id
)

SELECT
    country,
    COUNT(DISTINCT user_id) AS active_users,
    SUM(total_order_amount) AS total_revenue,
    SUM(completed_order_count) AS completed_orders,
    AVG(total_order_amount) AS avg_order_amount_per_active_user
FROM user_level
GROUP BY country
HAVING COUNT(DISTINCT user_id) &gt;= 100
ORDER BY avg_order_amount_per_active_user DESC;
</code></code></pre><h3>How I would explain it</h3><p>I first identify active users from the sessions table.</p><p>Then I calculate completed orders for March at the user level.</p><p>Then I join users to active users and left join orders, because active users with zero orders should still be included.</p><p>Finally, I aggregate by country and filter to countries with at least 100 active users.</p><p>The key detail is the left join. If I used an inner join to orders, I would only include users who purchased, which would inflate the average.</p><div><hr></div><h2>&#9322; Work through a product analytics case study.</h2><p>Let&#8217;s say the case is:</p><p>&#8220;Netflix sees a drop in weekly watch time. How would you investigate?&#8221;</p><p>I would start by breaking the metric down.</p><p>Weekly watch time can drop because:</p><ul><li><p>Fewer users are active</p></li><li><p>Active users are watching less</p></li><li><p>Sessions are shorter</p></li><li><p>Fewer sessions per user</p></li><li><p>Content discovery is worse</p></li><li><p>Playback issues increased</p></li><li><p>New content quality is weaker</p></li><li><p>Pricing or account changes affected behavior</p></li><li><p>A specific region or device is causing the drop</p></li></ul><p>I would decompose it like this:</p><pre><code><code>Total Watch Time =
Active Users
x Sessions per Active User
x Plays per Session
x Minutes Watched per Play
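</code></code></pre><p>As a rough illustration of this decomposition, here is a minimal Python sketch with made-up numbers. The weekly values and the helper names <code>total_watch_time</code> and <code>log_contributions</code> are assumptions for illustration, not real Netflix data. Because the metric is a product, the log change in total watch time equals the sum of the log changes of the four factors, which gives one simple way to see which factor moved the most.</p><pre><code><code>import math

# Hypothetical weekly values for each factor in the decomposition (not real data).
last_week = {
    "active_users": 1000,
    "sessions_per_active_user": 5.0,
    "plays_per_session": 2.0,
    "minutes_watched_per_play": 30.0,
}
this_week = {
    "active_users": 990,
    "sessions_per_active_user": 4.5,
    "plays_per_session": 2.1,
    "minutes_watched_per_play": 29.0,
}

def total_watch_time(factors):
    # Total watch time is the product of all four factors.
    return math.prod(factors.values())

def log_contributions(before, after):
    # For a product metric, the log changes of the factors add up
    # to the log change of the total.
    return {name: math.log(after[name] / before[name]) for name in before}

contrib = log_contributions(last_week, this_week)
total_change = math.log(total_watch_time(this_week) / total_watch_time(last_week))

for name, value in contrib.items():
    # Share of the total log change attributed to this factor.
    share = value / total_change if total_change != 0 else 0.0
    print(f"{name}: log change {value:+.4f}, share of total change {share:.0%}")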
</code></code></pre><p>Then I would check:</p><p><strong>1. Is this a real drop or a data issue?</strong></p><ul><li><p>Did event logging change?</p></li><li><p>Did a pipeline fail?</p></li><li><p>Is watch time missing for some devices?</p></li><li><p>Did the definition of watch time change?</p></li></ul><p><strong>2. Where is the drop happening?</strong></p><p>Break down by:</p><ul><li><p>Country</p></li><li><p>Device type</p></li><li><p>New vs returning users</p></li><li><p>Plan type</p></li><li><p>App version</p></li><li><p>Content category</p></li><li><p>Acquisition channel</p></li></ul><p><strong>3. When did it start?</strong></p><p>Check if the drop aligns with:</p><ul><li><p>Product release</p></li><li><p>Pricing change</p></li><li><p>Content release schedule</p></li><li><p>Competitor event</p></li><li><p>Holiday</p></li><li><p>Sports event</p></li><li><p>App outage</p></li></ul><p><strong>4. Which part of the funnel changed?</strong></p><p>For streaming:</p><ul><li><p>App open rate</p></li><li><p>Homepage impressions</p></li><li><p>Title clicks</p></li><li><p>Play starts</p></li><li><p>Playback errors</p></li><li><p>Completion rate</p></li><li><p>Search usage</p></li><li><p>Recommendation CTR</p></li></ul><p><strong>5. 
What actions would I recommend?</strong></p><p>If discovery dropped, I would look at recommendations and homepage ranking.</p><p>If playback errors increased, I would escalate to engineering.</p><p>If content engagement dropped, I would analyze catalog freshness and title-level performance.</p><p>If only new users dropped, I would inspect onboarding.</p><p>A strong product analytics answer always moves from metric to diagnosis to action.</p><div><hr></div><h2>&#9323; Explain deeper A/B testing ideas like variance, bootstrap, covariate adjustment, and treatment effects.</h2><h3>Variance</h3><p>Variance tells us how noisy a metric is.</p><p>High variance makes it harder to detect a real effect.</p><p>For example, revenue per user usually has high variance because a few users spend a lot and many spend nothing.</p><p>If variance is high, we may need:</p><ul><li><p>Larger sample size</p></li><li><p>Longer experiment duration</p></li><li><p>Better metric design</p></li><li><p>Winsorization for extreme outliers</p></li><li><p>Covariate adjustment</p></li></ul><h3>Bootstrap</h3><p>Bootstrap is a resampling method.</p><p>Instead of relying only on formulas, we repeatedly sample from the data with replacement and estimate the metric many times.</p><p>This gives us an empirical distribution of the metric.</p><p>Bootstrap is useful when:</p><ul><li><p>The metric is not normally distributed</p></li><li><p>The formula for standard error is messy</p></li><li><p>The metric is a ratio</p></li><li><p>We want confidence intervals in a practical way</p></li></ul><p>Example:</p><p>If we want confidence intervals for revenue per user, bootstrap can help because revenue data is often skewed.</p><h3>Covariate adjustment</h3><p>Covariate adjustment uses pre-experiment information to reduce noise.</p><p>For example, if we know each user&#8217;s watch time before the experiment, we can adjust for it.</p><p>This helps because users are naturally different.</p><p>Some users watch a lot. 
Some barely watch. If we control for past behavior, the treatment effect estimate can become more precise.</p><p>Common examples:</p><ul><li><p>Pre-period revenue</p></li><li><p>Pre-period engagement</p></li><li><p>User tenure</p></li><li><p>Country</p></li><li><p>Device type</p></li><li><p>Plan type</p></li></ul><p>One popular method is CUPED, which adjusts the outcome using pre-experiment behavior.</p><h3>Treatment effects</h3><p>The treatment effect is the difference between treatment and control.</p><p>Basic version:</p><pre><code><code>Treatment Effect = Average outcome in treatment - Average outcome in control
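</code></code></pre><p>To make the basic version concrete, the sketch below computes the difference in means and a percentile bootstrap confidence interval in plain Python. The sample values, the function names <code>treatment_effect</code> and <code>bootstrap_ci</code>, and the choice of 2,000 resamples are illustrative assumptions. A real analysis would use user-level experiment data and probably a vetted statistics library.</p><pre><code><code>import random
import statistics

def treatment_effect(treatment, control):
    # Difference in average outcome between the two groups.
    return statistics.mean(treatment) - statistics.mean(control)

def bootstrap_ci(treatment, control, n_resamples=2000, alpha=0.05, seed=0):
    # Resample each group with replacement and recompute the effect each time.
    rng = random.Random(seed)
    effects = []
    for _ in range(n_resamples):
        t_sample = rng.choices(treatment, k=len(treatment))
        c_sample = rng.choices(control, k=len(control))
        effects.append(treatment_effect(t_sample, c_sample))
    effects.sort()
    # Percentile interval: read off the alpha/2 and 1 - alpha/2 positions.
    lower = effects[int(n_resamples * alpha / 2)]
    upper = effects[int(n_resamples * (1 - alpha / 2)) - 1]
    return lower, upper

# Hypothetical revenue-per-user samples (skewed: many zeros, a few big spenders).
control = [0, 0, 0, 5, 0, 12, 0, 0, 40, 0, 3, 0]
treatment = [0, 4, 0, 6, 0, 15, 0, 2, 55, 0, 5, 0]

effect = treatment_effect(treatment, control)
low, high = bootstrap_ci(treatment, control)
print(f"effect={effect:.2f}, 95% bootstrap CI=({low:.2f}, {high:.2f})")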
</code></code></pre><p>But we may also care about:</p><p><strong>Average Treatment Effect</strong></p><p>The overall average impact across all users.</p><p><strong>Heterogeneous Treatment Effect</strong></p><p>The effect is different across groups.</p><p>For example:</p><ul><li><p>New users benefit</p></li><li><p>Existing users do not</p></li><li><p>Mobile users benefit</p></li><li><p>TV users do not</p></li></ul><p><strong>Intent-to-treat effect</strong></p><p>This compares users by their assigned group, even if they did not fully experience the treatment.</p><p>This preserves randomization.</p><p><strong>Treatment-on-treated effect</strong></p><p>This measures the effect only among users who actually received or used the treatment.</p><p>This can be useful, but it may introduce selection bias, because users who take up the treatment are not a random subset of the treatment group.</p><div><hr></div><h2>&#9324; Solve a live SQL + metrics case around a gift card program.</h2><h3>Problem</h3><p>A company launched a gift card program.</p><p>Tables:</p><p><code>gift_cards</code></p><pre><code><code>gift_card_id
buyer_user_id
recipient_user_id
purchase_date
gift_card_amount
</code></code></pre><p><code>redemptions</code></p><pre><code><code>redemption_id
gift_card_id
redeem_date
redeemed_amount
</code></code></pre><p><code>orders</code></p><pre><code><code>order_id
user_id
order_date
order_amount
payment_type
</code></code></pre><p>Questions:</p><ol><li><p>What percent of gift cards are redeemed within 30 days?</p></li><li><p>What is the average redeemed amount?</p></li><li><p>Do recipients spend more than the gift card amount?</p></li></ol><h3>SQL</h3><pre><code><code>WITH gift_card_base AS (
    SELECT
        gift_card_id,
        buyer_user_id,
        recipient_user_id,
        purchase_date,
        gift_card_amount
    FROM gift_cards
),

redemption_summary AS (
    SELECT
        gift_card_id,
        MIN(redeem_date) AS first_redeem_date,
        SUM(redeemed_amount) AS total_redeemed_amount
    FROM redemptions
    GROUP BY gift_card_id
),

recipient_orders_after_gift AS (
    -- Caveat: a recipient with several gift cards has the same orders counted once per card
    SELECT
        g.gift_card_id,
        SUM(o.order_amount) AS recipient_total_spend_after_gift
    FROM gift_card_base g
    JOIN orders o
        ON g.recipient_user_id = o.user_id
       AND o.order_date &gt;= g.purchase_date
       AND o.order_date &lt; g.purchase_date + INTERVAL '30 days'
    GROUP BY g.gift_card_id
)

SELECT
    COUNT(*) AS total_gift_cards,

    AVG(
        CASE
            WHEN r.first_redeem_date IS NOT NULL
             AND r.first_redeem_date &lt; g.purchase_date + INTERVAL '30 days'
            THEN 1.0 ELSE 0.0
        END
    ) AS redemption_rate_30d,

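    -- Averages over all gift cards, with unredeemed cards counted as 0;
    -- use AVG(r.total_redeemed_amount) instead to average over redeemed cards only
    AVG(COALESCE(r.total_redeemed_amount, 0)) AS avg_redeemed_amount,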

    AVG(COALESCE(o.recipient_total_spend_after_gift, 0)) AS avg_recipient_spend_30d,

    AVG(
        COALESCE(o.recipient_total_spend_after_gift, 0) - g.gift_card_amount
    ) AS avg_incremental_spend_above_gift_card

FROM gift_card_base g
LEFT JOIN redemption_summary r
    ON g.gift_card_id = r.gift_card_id
LEFT JOIN recipient_orders_after_gift o
    ON g.gift_card_id = o.gift_card_id;
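</code></code></pre><p>A quick way to sanity-check the 30-day window in the query is to recompute the redemption rate by hand on a few made-up rows. The Python sketch below uses hypothetical cards and dates (none of them come from a real dataset) and the assumed helper name <code>redeemed_within_30_days</code>. It treats a card as redeemed within 30 days only when the first redemption falls on day 0 through day 29 after purchase. Unlike the SQL, it also requires the redemption not to precede the purchase.</p><pre><code><code>from datetime import date

# Hypothetical rows mirroring gift_cards (only the columns needed here).
gift_cards = [
    {"gift_card_id": 1, "purchase_date": date(2026, 3, 1)},
    {"gift_card_id": 2, "purchase_date": date(2026, 3, 5)},
    {"gift_card_id": 3, "purchase_date": date(2026, 3, 10)},
    {"gift_card_id": 4, "purchase_date": date(2026, 3, 15)},
]

# Hypothetical first redemption date per card, like MIN(redeem_date) in the query.
first_redeem_date = {
    1: date(2026, 3, 10),  # 9 days after purchase: inside the window
    2: date(2026, 4, 4),   # exactly 30 days after purchase: outside the window
    3: date(2026, 4, 8),   # 29 days after purchase: inside the window
    # card 4 was never redeemed
}

def redeemed_within_30_days(card):
    redeemed_on = first_redeem_date.get(card["gift_card_id"])
    if redeemed_on is None:
        return False
    days = (redeemed_on - card["purchase_date"]).days
    # Half-open window: day 0 up to, but not including, day 30.
    return days in range(30)

# Every gift card stays in the denominator, like the LEFT JOIN in the query.
rate = sum(redeemed_within_30_days(card) for card in gift_cards) / len(gift_cards)
print(rate)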
</code></code></pre><h3>Metrics I would track</h3><p>For a gift card program, I would not only track sales.</p><p>I would track:</p><ul><li><p>Gift card purchase volume</p></li><li><p>Redemption rate</p></li><li><p>Time to redemption</p></li><li><p>Breakage rate, meaning the share of gift card value that is never redeemed</p></li><li><p>Recipient activation rate</p></li><li><p>New user recipients</p></li><li><p>Repeat purchase rate after redemption</p></li><li><p>Incremental spend above gift amount</p></li><li><p>Buyer repeat gift purchase rate</p></li><li><p>Fraud or abuse rate</p></li></ul><p>The most important business question is:</p><p>&#8220;Is the gift card program creating incremental customer value, or just shifting existing spend into gift card form?&#8221;</p><p>That means we should compare recipients against a similar group of non-recipients, or run an experiment if possible.</p><div><hr></div><h2>&#9325; Design an A/B test and metric framework for hiring linguists for subtitles.</h2><p>Let&#8217;s say a streaming platform wants to improve subtitle quality by hiring more professional linguists.</p><p>The product question:</p><p>&#8220;Does using professional linguists for subtitles improve viewer experience and business outcomes?&#8221;</p><h3>Hypothesis</h3><p>Better subtitle quality will lead to:</p><ul><li><p>Higher completion rate</p></li><li><p>More watch time</p></li><li><p>Higher viewer satisfaction</p></li><li><p>Fewer subtitle-related complaints</p></li><li><p>Better engagement with non-native language content</p></li></ul><h3>Experiment design</h3><p>We can randomize content titles, regions, or users depending on the risk.</p><p>A clean setup:</p><ul><li><p>Control: Existing subtitle process</p></li><li><p>Treatment: Subtitles created or reviewed by professional linguists</p></li></ul><p>But we need to be careful.</p><p>If users talk to each other or content quality changes at the title level, user-level randomization may be messy.</p><p>For subtitle quality, title-level or region-level testing may be 
better.</p><h3>Primary metric</h3><p>I would choose one primary metric based on the goal.</p><p>For example:</p><ul><li><p>Completion rate for subtitle-enabled viewing sessions</p></li></ul><p>Why completion rate?</p><p>Because if subtitles are better, users may be more likely to finish the content.</p><h3>Secondary metrics</h3><ul><li><p>Watch time per subtitle-enabled session</p></li><li><p>Subtitle toggle-on rate</p></li><li><p>Rewatch rate</p></li><li><p>Thumbs up or rating</p></li><li><p>Search exits</p></li><li><p>Customer support complaints</p></li><li><p>Subtitle correction reports</p></li><li><p>Engagement with foreign-language titles</p></li></ul><h3>Guardrail metrics</h3><ul><li><p>Subtitle delivery time</p></li><li><p>Subtitle production cost</p></li><li><p>Content launch delay</p></li><li><p>Error rate</p></li><li><p>User complaints</p></li><li><p>Cancellation rate</p></li></ul><h3>Segments</h3><p>I would analyze by:</p><ul><li><p>Country</p></li><li><p>Language pair</p></li><li><p>Device</p></li><li><p>Content genre</p></li><li><p>New vs returning users</p></li><li><p>Native vs non-native language viewers</p></li><li><p>High subtitle usage users</p></li></ul><h3>Decision framework</h3><p>I would launch if:</p><ul><li><p>Completion rate improves</p></li><li><p>Complaints decrease</p></li><li><p>Cost increase is justified</p></li><li><p>No major delay in content availability</p></li><li><p>Results are consistent across important languages or regions</p></li></ul><p>This case is good because it shows that experiment design is not just math. It also needs product judgment.</p><div><hr></div><h2>&#9326; How would you value a piece of content?</h2><p>I would value content based on the business value it creates over time.</p><p>For streaming, a piece of content can create value in several ways.</p><h3>1. 
Acquisition value</h3><p>Does the content bring in new subscribers?</p><p>For example, a popular show may convince people to sign up.</p><p>Metrics:</p><ul><li><p>New subscriptions after release</p></li><li><p>Trial starts</p></li><li><p>Signup conversion</p></li><li><p>Marketing campaign attribution</p></li></ul><h3>2. Retention value</h3><p>Does the content keep existing users from canceling?</p><p>This is often more important than acquisition.</p><p>Metrics:</p><ul><li><p>Churn reduction</p></li><li><p>Renewal rate</p></li><li><p>Watch frequency</p></li><li><p>Return visits</p></li><li><p>Completion rate</p></li></ul><h3>3. Engagement value</h3><p>Does the content increase platform usage?</p><p>Metrics:</p><ul><li><p>Total hours watched</p></li><li><p>Unique viewers</p></li><li><p>Completion rate</p></li><li><p>Episodes watched per user</p></li><li><p>Repeat viewing</p></li><li><p>Recommendation impact</p></li></ul><h3>4. Brand value</h3><p>Some content makes the platform feel premium.</p><p>It may not have the highest watch hours, but it may improve brand perception.</p><p>Examples:</p><ul><li><p>Award-winning content</p></li><li><p>Prestige shows</p></li><li><p>Culturally important titles</p></li><li><p>Strong niche content</p></li></ul><h3>5. Portfolio value</h3><p>Content may fill a gap in the catalog.</p><p>For example:</p><ul><li><p>Kids content</p></li><li><p>Regional content</p></li><li><p>Anime</p></li><li><p>Sports documentaries</p></li><li><p>Local language content</p></li></ul><p>A title may be valuable because it serves a specific audience very well.</p><h3>6. Long-term library value</h3><p>Some content keeps getting watched for years.</p><p>Metrics:</p><ul><li><p>Evergreen watch time</p></li><li><p>Long-tail engagement</p></li><li><p>Rewatch rate</p></li><li><p>Search demand</p></li><li><p>Recommendation performance</p></li></ul><h3>Simple valuation formula</h3><pre><code><code>Content Value =
Incremental Acquisition Value
+ Incremental Retention Value
+ Incremental Engagement Value
+ Brand Value
+ Long-Term Library Value
- Content Cost
- Marketing Cost
</code></code></pre><p>I would be careful not to give all credit to one title.</p><p>A user may join after seeing an ad for one show but stay because of the full catalog.</p><p>So attribution should be handled carefully.</p><div><hr></div><h2>&#9327; What are the value drivers for Netflix?</h2><p>I would break Netflix value drivers into customer, content, monetization, and operating drivers.</p><h3>Customer drivers</h3><ul><li><p>Subscriber growth</p></li><li><p>Retention</p></li><li><p>Churn reduction</p></li><li><p>User engagement</p></li><li><p>Household penetration</p></li><li><p>International growth</p></li><li><p>Paid sharing conversion</p></li></ul><p>Netflix becomes more valuable when it can attract and retain users profitably.</p><h3>Content drivers</h3><ul><li><p>Quality of original content</p></li><li><p>Depth of content library</p></li><li><p>Local language content</p></li><li><p>Exclusive rights</p></li><li><p>Franchise potential</p></li><li><p>Content freshness</p></li><li><p>Hit rate of new releases</p></li></ul><p>The content engine matters because users stay when they believe there is always something worth watching.</p><h3>Monetization drivers</h3><ul><li><p>Subscription pricing</p></li><li><p>Plan mix</p></li><li><p>Ad-supported plan growth</p></li><li><p>Revenue per user</p></li><li><p>Upsell opportunities</p></li><li><p>Regional pricing strategy</p></li></ul><p>A company can grow not only by adding users, but also by increasing revenue per user.</p><h3>Engagement drivers</h3><ul><li><p>Watch time</p></li><li><p>Completion rate</p></li><li><p>Search success</p></li><li><p>Recommendation quality</p></li><li><p>App experience</p></li><li><p>Content discovery</p></li></ul><p>Engagement matters because high engagement usually supports retention.</p><h3>Cost drivers</h3><ul><li><p>Content production cost</p></li><li><p>Licensing cost</p></li><li><p>Marketing efficiency</p></li><li><p>Technology infrastructure</p></li><li><p>Customer support 
cost</p></li></ul><p>A platform can grow revenue and still struggle if content costs rise too quickly.</p><h3>Strategic drivers</h3><ul><li><p>Global distribution</p></li><li><p>Brand strength</p></li><li><p>Data advantage</p></li><li><p>Personalization</p></li><li><p>Partnerships</p></li><li><p>Live events or special programming</p></li><li><p>Gaming or newer entertainment formats</p></li></ul><p>In an interview, I would say:</p><p>&#8220;Netflix value is driven by its ability to acquire users, keep them engaged, reduce churn, monetize through pricing and ads, and manage content costs while continuing to produce shows people care about.&#8221;</p><div><hr></div><h2>&#9328; What would you consider when valuing a Netflix deal?</h2><p>First, I would clarify what kind of deal it is.</p><p>Is it:</p><ul><li><p>Licensing a show?</p></li><li><p>Producing an original series?</p></li><li><p>Buying exclusive streaming rights?</p></li><li><p>Partnering with a studio?</p></li><li><p>Sports or live event rights?</p></li><li><p>Talent deal?</p></li><li><p>Regional content deal?</p></li></ul><p>Then I would evaluate both value and risk.</p><h3>Revenue impact</h3><ul><li><p>Will this deal bring new subscribers?</p></li><li><p>Will it reduce churn?</p></li><li><p>Will it increase engagement?</p></li><li><p>Will it support ad revenue?</p></li><li><p>Will it help pricing power?</p></li></ul><h3>Audience fit</h3><ul><li><p>Which audience does it serve?</p></li><li><p>Is the audience large enough?</p></li><li><p>Is it global or regional?</p></li><li><p>Does it attract a hard-to-reach segment?</p></li><li><p>Does it strengthen a weak catalog area?</p></li></ul><h3>Content performance</h3><p>I would estimate:</p><ul><li><p>Expected viewers</p></li><li><p>Completion rate</p></li><li><p>Watch hours</p></li><li><p>Rewatch potential</p></li><li><p>Social buzz</p></li><li><p>Search demand</p></li><li><p>Similar title performance</p></li></ul><h3>Cost</h3><p>I would 
include:</p><ul><li><p>Licensing fee</p></li><li><p>Production cost</p></li><li><p>Marketing cost</p></li><li><p>Localization cost</p></li><li><p>Legal and rights cost</p></li><li><p>Opportunity cost</p></li></ul><h3>Exclusivity</h3><p>Exclusive content is usually more valuable than non-exclusive content.</p><p>But exclusivity costs more, so I would ask whether exclusivity is worth the premium.</p><h3>Time value</h3><p>Some deals create short-term spikes.</p><p>Others build long-term library value.</p><p>A sports event may drive immediate engagement, while a strong series may create long-tail value for years.</p><h3>Risk</h3><p>Risks include:</p><ul><li><p>Content underperformance</p></li><li><p>Production delay</p></li><li><p>Audience mismatch</p></li><li><p>Regional rights complexity</p></li><li><p>Reputation risk</p></li><li><p>Cost overruns</p></li><li><p>Weak retention impact</p></li></ul><h3>Decision</h3><p>I would compare expected incremental value against total cost.</p><pre><code><code>Deal Value =
Incremental Subscriber Value
+ Retention Value
+ Engagement Value
+ Ad Revenue Value
+ Brand Value
+ Long-Term Library Value
- Total Deal Cost
</code></code></pre><p>I would recommend the deal only if the expected value is higher than the cost and the strategic fit is strong.</p><div><hr></div><h2>&#9329; Design a full experiment end to end and explain your choices.</h2><p>Let&#8217;s design an experiment for a streaming platform.</p><h3>Product idea</h3><p>We want to test a new personalized homepage ranking model.</p><p>The new model is expected to help users find something to watch faster.</p><h3>Step 1: Define the goal</h3><p>The goal is to improve content discovery.</p><p>Business goal:</p><ul><li><p>Increase user engagement</p></li><li><p>Improve retention</p></li><li><p>Reduce browsing frustration</p></li></ul><h3>Step 2: Define the hypothesis</h3><p>Hypothesis:</p><p>&#8220;If we improve homepage ranking, users will start watching content faster and watch more content per session.&#8221;</p><h3>Step 3: Choose the unit of randomization</h3><p>I would randomize at the user level.</p><p>Each user sees either:</p><ul><li><p>Control: Current homepage ranking</p></li><li><p>Treatment: New ranking model</p></li></ul><p>User-level randomization works because homepage experience is personal.</p><h3>Step 4: Define metrics</h3><p>Primary metric:</p><ul><li><p>Play start rate per session</p></li></ul><p>This tells us whether users are more likely to find something to watch.</p><p>Secondary metrics:</p><ul><li><p>Watch time per user</p></li><li><p>Time to first play</p></li><li><p>Homepage click-through rate</p></li><li><p>Completion rate</p></li><li><p>Search usage</p></li><li><p>Return rate next day or next week</p></li></ul><p>Guardrail metrics:</p><ul><li><p>App crashes</p></li><li><p>Playback errors</p></li><li><p>Churn</p></li><li><p>Customer complaints</p></li><li><p>Diversity of content watched</p></li><li><p>Latency of homepage loading</p></li></ul><p>I would not rely on only one engagement metric because a ranking model can increase clicks but hurt satisfaction.</p><h3>Step 5: Sample size and 
duration</h3><p>Before launching, I would estimate sample size using:</p><ul><li><p>Baseline play start rate</p></li><li><p>Minimum detectable effect</p></li><li><p>Power, usually 80 percent or 90 percent</p></li><li><p>Significance level, often 0.05</p></li><li><p>Expected variance</p></li></ul><p>I would run the test long enough to cover normal user behavior.</p><p>For entertainment products, I would usually want at least one full weekly cycle because weekday and weekend behavior can be very different.</p><h3>Step 6: Data quality checks</h3><p>Before trusting results, I would check:</p><ul><li><p>Sample ratio mismatch</p></li><li><p>Event logging</p></li><li><p>Exposure logging</p></li><li><p>Missing data</p></li><li><p>Duplicate users</p></li><li><p>Bot activity</p></li><li><p>Whether treatment users actually saw the new ranking</p></li></ul><h3>Step 7: Launch plan</h3><p>I would start with a small ramp:</p><ul><li><p>1 percent traffic</p></li><li><p>Check guardrails</p></li><li><p>Move to 10 percent</p></li><li><p>Then 50 percent if stable</p></li></ul><p>This reduces risk.</p><h3>Step 8: Analyze results</h3><p>At the end, I would compare treatment vs control.</p><p>I would look at:</p><ul><li><p>Difference in primary metric</p></li><li><p>Confidence interval</p></li><li><p>p-value</p></li><li><p>Effect size</p></li><li><p>Guardrails</p></li><li><p>Segment-level performance</p></li></ul><p>Important segments:</p><ul><li><p>New users</p></li><li><p>Returning users</p></li><li><p>Heavy users</p></li><li><p>Light users</p></li><li><p>Mobile users</p></li><li><p>TV users</p></li><li><p>Different countries</p></li></ul><h3>Step 9: Decision</h3><p>I would launch if:</p><ul><li><p>Primary metric improves</p></li><li><p>Guardrails are healthy</p></li><li><p>Effect is practically meaningful</p></li><li><p>No major segment is harmed</p></li><li><p>Technical performance is stable</p></li></ul><p>I would not launch if:</p><ul><li><p>The result is statistically significant 
but too small to matter</p></li><li><p>Watch time improves but complaints increase</p></li><li><p>Clicks improve but completion drops</p></li><li><p>One major user segment is harmed</p></li></ul><h3>Step 10: Follow-up</h3><p>After launch, I would keep monitoring:</p><ul><li><p>Long-term retention</p></li><li><p>Content diversity</p></li><li><p>User complaints</p></li><li><p>Model drift</p></li><li><p>Recommendation freshness</p></li></ul><p>A/B testing does not end the moment we launch. Real users keep changing, and the system needs monitoring.</p><div><hr></div><h1>Final Interview Tip</h1><p>For messy analytics questions, do not rush into formulas.</p><p>A strong answer usually follows this structure:</p><ol><li><p>Clarify the business goal</p></li><li><p>Define the metric</p></li><li><p>State assumptions</p></li><li><p>Break the problem into parts</p></li><li><p>Choose the right method</p></li><li><p>Mention risks and edge cases</p></li><li><p>Tie the answer back to business impact</p></li></ol><p>That is what interviewers are really looking for.</p><p>Not memorized answers.</p><p>Clear thinking.</p>]]></content:encoded></item><item><title><![CDATA[35 Series A Startups Hiring in 2026]]></title><description><![CDATA[USA-based startups that raised Series A in Q1 2026 and are worth tracking for job openings]]></description><link>https://karthiktechdairy.substack.com/p/35-series-a-startups-hiring-in-2026</link><guid isPermaLink="false">https://karthiktechdairy.substack.com/p/35-series-a-startups-hiring-in-2026</guid><dc:creator><![CDATA[Karthik Adari]]></dc:creator><pubDate>Mon, 27 Apr 2026 00:28:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!UMJq!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ba28ade-da8d-46d1-89b3-fb1b8a34eaf7_800x800.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you are job hunting in 2026, one of the smartest places to 
look is recently funded startups.</p><p>Why?</p><p>Because after a Series A round, many startups start expanding their engineering, product, sales, operations, data, customer success, and GTM teams. These companies may not always be as crowded as big tech, but they can offer strong learning, faster growth, and early-career opportunities.</p><p>Below are 35 USA-based startups from the Q1 2026 Series A list that showed strong active-hiring signals.</p><div><hr></div><h2>Now coming to the list</h2><h3>1. depthfirst</h3><p><strong>Domain:</strong> Cybersecurity / AI Security<br><strong>Funds raised:</strong> $40M Series A<br><strong>Venture funded:</strong> Accel, Alt Capital, BoxGroup, Liquid 2 Ventures, SV Angel<br><strong>Company Link:</strong> https://depthfirst.com<br><strong>Company career site:</strong> <a href="https://depthfirst.com/careers">https://depthfirst.com/careers</a><br><strong>Small Blurb:</strong> depthfirst is building AI-native security tools for code, infrastructure, and business logic vulnerabilities. A strong company to watch for applied AI, security engineering, product, and GTM roles.</p><div><hr></div><h3>2. Renterra</h3><p><strong>Domain:</strong> Construction Tech / Equipment Rental SaaS<br><strong>Funds raised:</strong> $9M Series A<br><strong>Venture funded:</strong> Avenue Growth Partners<br><strong>Company Link:</strong> https://getrenterra.com<br><strong>Company career site:</strong> <a href="https://getrenterra.com/careers">https://getrenterra.com/careers</a><br><strong>Small Blurb:</strong> Renterra builds rental management software for heavy equipment companies. Good fit for candidates interested in SaaS, operations, customer success, product, and engineering roles.</p><div><hr></div><h3>3. 
Neurophos</h3><p><strong>Domain:</strong> Semiconductors / Photonic AI Chips<br><strong>Funds raised:</strong> $110M Series A<br><strong>Venture funded:</strong> Gates Frontier, M12, Carbon Direct Capital, Aramco Ventures, Bosch Ventures<br><strong>Company Link:</strong> https://www.neurophos.com<br><strong>Company career site:</strong> <a href="https://www.neurophos.com/careers">https://www.neurophos.com/careers</a><br><strong>Small Blurb:</strong> Neurophos is working on photonic AI inference chips. This is a strong company to follow for hardware, AI infrastructure, chip design, systems, and research roles.</p><div><hr></div><h3>4. Artie</h3><p><strong>Domain:</strong> Data Infrastructure / Real-Time Streaming<br><strong>Funds raised:</strong> $12M Series A<br><strong>Venture funded:</strong> Standard Capital, Y Combinator, Pathlight Ventures<br><strong>Company Link:</strong> https://www.artie.com<br><strong>Company career site:</strong> <a href="https://www.artie.com/careers">https://www.artie.com/careers</a><br><strong>Small Blurb:</strong> Artie builds real-time data streaming infrastructure for fraud, inventory, analytics, and AI workloads. Great fit for data engineering, backend, infrastructure, and developer tools candidates.</p><div><hr></div><h3>5. Linq</h3><p><strong>Domain:</strong> AI Communication Infrastructure<br><strong>Funds raised:</strong> $20M Series A<br><strong>Venture funded:</strong> TQ Ventures, Mucker Capital, angel investors<br><strong>Company Link:</strong> https://www.linq.ai<br><strong>Company career site:</strong> <a href="https://www.linq.ai/careers">https://www.linq.ai/careers</a><br><strong>Small Blurb:</strong> Linq is building a communication layer for AI agents across SMS, iMessage, RCS, and voice. Good company to track for AI, backend, product, and communication platform roles.</p><div><hr></div><h3>6. 
Fundamental</h3><p><strong>Domain:</strong> Enterprise AI / Tabular AI<br><strong>Funds raised:</strong> $255M Series A<br><strong>Venture funded:</strong> Oak HC/FT, Valor Equity Partners, Battery Ventures, Salesforce Ventures<br><strong>Company Link:</strong> https://www.fundamental.ai<br><strong>Company career site:</strong> <a href="https://www.fundamental.ai/careers">https://www.fundamental.ai/careers</a><br><strong>Small Blurb:</strong> Fundamental builds Large Tabular Models for enterprise decision-making. Strong fit for ML engineers, data scientists, research engineers, enterprise AI, and GTM roles.</p><div><hr></div><h3>7. Rowspace</h3><p><strong>Domain:</strong> FinTech / Institutional AI<br><strong>Funds raised:</strong> $50M Seed + Series A<br><strong>Venture funded:</strong> Sequoia, Emergence Capital, Stripe, Conviction, Basis Set<br><strong>Company Link:</strong> https://www.rowspace.com<br><strong>Company career site:</strong> <a href="https://www.rowspace.com/careers">https://www.rowspace.com/careers</a><br><strong>Small Blurb:</strong> Rowspace is building AI tools for institutional finance and portfolio decision-making. A strong startup to watch for fintech, AI, data, and backend roles.</p><div><hr></div><h3>8. Corridor</h3><p><strong>Domain:</strong> Cybersecurity / AI Coding Security<br><strong>Funds raised:</strong> $25M Series A<br><strong>Venture funded:</strong> Felicis, Conviction, Timeless, Lux Capital, Datadog, SV Angel<br><strong>Company Link:</strong> https://www.corridor.dev<br><strong>Company career site:</strong> <a href="https://www.corridor.dev/jobs">https://www.corridor.dev/jobs</a><br><strong>Small Blurb:</strong> Corridor focuses on security for AI-native software development. Good fit for candidates interested in AppSec, AI security, developer tools, and infrastructure.</p><div><hr></div><h3>9. 
Gimlet Labs</h3><p><strong>Domain:</strong> AI Infrastructure / Serverless Inference<br><strong>Funds raised:</strong> $80M Series A<br><strong>Venture funded:</strong> Menlo Ventures, Eclipse Ventures, Prosperity7, Triatomic, Factory<br><strong>Company Link:</strong> https://www.gimletlabs.ai<br><strong>Company career site:</strong> <a href="https://www.gimletlabs.ai/careers">https://www.gimletlabs.ai/careers</a><br><strong>Small Blurb:</strong> Gimlet Labs is building serverless inference infrastructure for AI agents and multi-agent systems. Strong fit for systems, distributed computing, ML infrastructure, and backend roles.</p><div><hr></div><h3>10. Cloudforce</h3><p><strong>Domain:</strong> AI / Healthcare / Public Sector<br><strong>Funds raised:</strong> $10M Series A<br><strong>Venture funded:</strong> Owl Ventures, M12<br><strong>Company Link:</strong> https://www.gocloudforce.com<br><strong>Company career site:</strong> <a href="https://www.gocloudforce.com/careers">https://www.gocloudforce.com/careers</a><br><strong>Small Blurb:</strong> Cloudforce builds AI solutions for regulated sectors like healthcare and the public sector. Good startup to watch for AI, cloud, compliance, and implementation roles.</p><div><hr></div><h3>11. Converge Bio</h3><p><strong>Domain:</strong> AI Drug Discovery / Biotech<br><strong>Funds raised:</strong> $25M Series A<br><strong>Venture funded:</strong> Bessemer Venture Partners, TLV Partners, Vintage Investment Partners, Saras Capital<br><strong>Company Link:</strong> https://converge-bio.com<br><strong>Company career site:</strong> <a href="https://converge-bio.com/careers">https://converge-bio.com/careers</a><br><strong>Small Blurb:</strong> Converge Bio uses AI to support drug discovery and development. Strong fit for computational biology, bioinformatics, ML, and data science candidates.</p><div><hr></div><h3>12. 
SkyFi</h3><p><strong>Domain:</strong> Earth Intelligence / Geospatial AI<br><strong>Funds raised:</strong> $12.7M Series A<br><strong>Venture funded:</strong> Buoyant Ventures, IronGate Capital Advisors, DNV Ventures, TFX Ventures, J2 Ventures<br><strong>Company Link:</strong> https://skyfi.com<br><strong>Company career site:</strong> <a href="https://skyfi.com/careers">https://skyfi.com/careers</a><br><strong>Small Blurb:</strong> SkyFi is an Earth intelligence platform built around satellite imagery and geospatial analytics. Good fit for GIS, computer vision, data science, and defense-tech roles.</p><div><hr></div><h3>13. Zarminali Pediatrics</h3><p><strong>Domain:</strong> Healthcare / Pediatrics<br><strong>Funds raised:</strong> $110M Series A<br><strong>Venture funded:</strong> General Catalyst, Healthier Capital, K2 HealthVentures<br><strong>Company Link:</strong> https://zarminali.com<br><strong>Company career site:</strong> <a href="https://zarminali.com/careers">https://zarminali.com/careers</a><br><strong>Small Blurb:</strong> Zarminali Pediatrics is building a tech-focused pediatric care group. Strong company to watch for healthcare operations, product, data, clinical, and support roles.</p><div><hr></div><h3>14. Cubby</h3><p><strong>Domain:</strong> PropTech / Storage Management SaaS<br><strong>Funds raised:</strong> $63M Series A<br><strong>Venture funded:</strong> Goldman Sachs Alternatives Growth Equity<br><strong>Company Link:</strong> https://www.cubbystorage.com<br><strong>Company career site:</strong> <a href="https://www.cubbystorage.com/careers">https://www.cubbystorage.com/careers</a><br><strong>Small Blurb:</strong> Cubby builds AI-native property management software for self-storage operators. Good fit for SaaS, operations, engineering, support, and product roles.</p><div><hr></div><h3>15. 
Cambio</h3><p><strong>Domain:</strong> Commercial Real Estate AI / Climate Tech<br><strong>Funds raised:</strong> $18M Series A<br><strong>Venture funded:</strong> Maverick Ventures, Y Combinator, Adverb Ventures, Peterson Ventures<br><strong>Company Link:</strong> https://cambio.ai<br><strong>Company career site:</strong> <a href="https://cambio.ai/careers">https://cambio.ai/careers</a><br><strong>Small Blurb:</strong> Cambio helps commercial real estate teams improve building performance and retrofit planning. Strong fit for climate tech, data, product, and real estate operations roles.</p><div><hr></div><h3>16. Mia Labs</h3><p><strong>Domain:</strong> Automotive AI / Voice AI<br><strong>Funds raised:</strong> $20M Series A<br><strong>Venture funded:</strong> Permanent Capital Ventures, Norwest, Eniac Ventures, Vine Ventures<br><strong>Company Link:</strong> https://www.mia.inc<br><strong>Company career site:</strong> <a href="https://www.mia.inc/careers">https://www.mia.inc/careers</a><br><strong>Small Blurb:</strong> Mia Labs builds conversational AI tools for automotive dealerships. Good fit for AI, voice systems, customer success, sales engineering, and backend roles.</p><div><hr></div><h3>17. Tradespace</h3><p><strong>Domain:</strong> LegalTech / IP Management AI<br><strong>Funds raised:</strong> $15M Series A<br><strong>Venture funded:</strong> AVP, Eniac Ventures, Amplo VC, Scrum Ventures<br><strong>Company Link:</strong> https://tradespace.io<br><strong>Company career site:</strong> <a href="https://tradespace.io/careers">https://tradespace.io/careers</a><br><strong>Small Blurb:</strong> Tradespace builds AI-native tools for invention disclosure, patents, and IP workflows. Good fit for legaltech, AI, product, data, and enterprise SaaS roles.</p><div><hr></div><h3>18. 
Concourse</h3><p><strong>Domain:</strong> FinTech / Finance AI Agents<br><strong>Funds raised:</strong> $12M Series A<br><strong>Venture funded:</strong> Standard Capital, Andreessen Horowitz, CRV, Y Combinator<br><strong>Company Link:</strong> https://www.concourse.co<br><strong>Company career site:</strong> <a href="https://www.concourse.co/careers">https://www.concourse.co/careers</a><br><strong>Small Blurb:</strong> Concourse builds AI agents for corporate finance teams. Strong fit for candidates interested in finance automation, AI agents, backend, and product roles.</p><div><hr></div><h3>19. Datatruck</h3><p><strong>Domain:</strong> Logistics SaaS / Trucking AI<br><strong>Funds raised:</strong> $12M Series A<br><strong>Venture funded:</strong> Avenue Growth Partners<br><strong>Company Link:</strong> https://www.datatruck.io<br><strong>Company career site:</strong> <a href="https://www.datatruck.io/careers">https://www.datatruck.io/careers</a><br><strong>Small Blurb:</strong> Datatruck builds an AI-native operating system for trucking companies. Good fit for logistics, SaaS, operations, product, data, and customer success roles.</p><div><hr></div><h3>20. Checkbox</h3><p><strong>Domain:</strong> LegalTech / AI Agents<br><strong>Funds raised:</strong> $23M Series A<br><strong>Venture funded:</strong> Touring Capital, Peak XV, Conductive Ventures, Tidal Ventures<br><strong>Company Link:</strong> https://www.checkbox.ai<br><strong>Company career site:</strong> <a href="https://www.checkbox.ai/careers">https://www.checkbox.ai/careers</a><br><strong>Small Blurb:</strong> Checkbox builds AI agent solutions for in-house legal teams. Strong fit for candidates interested in legal automation, workflow tools, product, and enterprise AI.</p><div><hr></div><h3>21. 
XBuild</h3><p><strong>Domain:</strong> Construction AI / Estimating<br><strong>Funds raised:</strong> $19M Series A<br><strong>Venture funded:</strong> N47, Rackhouse Ventures, Andreessen Horowitz<br><strong>Company Link:</strong> https://www.xbuild.ai<br><strong>Company career site:</strong> <a href="https://www.xbuild.ai/careers">https://www.xbuild.ai/careers</a><br><strong>Small Blurb:</strong> XBuild uses AI to support construction estimating and proposal generation. Good fit for AI, SaaS, construction tech, product, and GTM roles.</p><div><hr></div><h3>22. Resolve AI</h3><p><strong>Domain:</strong> SRE / Engineering AI Agents<br><strong>Funds raised:</strong> $125M Series A<br><strong>Venture funded:</strong> Publicly reported Series A investors<br><strong>Company Link:</strong> https://resolve.ai<br><strong>Company career site:</strong> <a href="https://resolve.ai/careers">https://resolve.ai/careers</a><br><strong>Small Blurb:</strong> Resolve AI helps engineering teams automate incident response and reliability workflows. Strong fit for SRE, DevOps, backend, AI agents, and platform roles.</p><div><hr></div><h3>23. Didero</h3><p><strong>Domain:</strong> Procurement AI / Enterprise SaaS<br><strong>Funds raised:</strong> $30M Series A<br><strong>Venture funded:</strong> Chemistry, Headline, M12<br><strong>Company Link:</strong> https://www.didero.ai<br><strong>Company career site:</strong> <a href="https://www.didero.ai/careers">https://www.didero.ai/careers</a><br><strong>Small Blurb:</strong> Didero builds AI agents for procurement teams, manufacturers, and distributors. Good fit for enterprise AI, product, supply chain, and SaaS roles.</p><div><hr></div><h3>24. 
Take2</h3><p><strong>Domain:</strong> Healthcare Recruiting AI / HRTech<br><strong>Funds raised:</strong> $14M Series A<br><strong>Venture funded:</strong> Human Capital, Bertelsmann Healthcare Investments, Reach Capital<br><strong>Company Link:</strong> https://www.take2.ai<br><strong>Company career site:</strong> <a href="https://www.take2.ai/careers">https://www.take2.ai/careers</a><br><strong>Small Blurb:</strong> Take2 builds AI agents for healthcare recruiting, credentialing, scheduling, and onboarding. Good fit for HRTech, healthcare operations, AI, and customer success roles.</p><div><hr></div><h3>25. Integrate</h3><p><strong>Domain:</strong> DefenseTech / Project Management SaaS<br><strong>Funds raised:</strong> $17M Series A<br><strong>Venture funded:</strong> FPV Ventures, Fuse VC, Rsquared VC<br><strong>Company Link:</strong> https://www.integrate.co<br><strong>Company career site:</strong> <a href="https://www.integrate.co/careers">https://www.integrate.co/careers</a><br><strong>Small Blurb:</strong> Integrate builds project management software for defense, space, cyber, maritime, and aerospace programs. Strong fit for defense-tech, product, engineering, and program operations roles.</p><div><hr></div><h3>26. Zero Homes</h3><p><strong>Domain:</strong> ClimateTech / Home Electrification<br><strong>Funds raised:</strong> $16.8M Series A<br><strong>Venture funded:</strong> Prelude Ventures, SJF Ventures, Watsco Ventures, VoLo Earth Ventures<br><strong>Company Link:</strong> https://www.zerohomes.io<br><strong>Company career site:</strong> <a href="https://www.zerohomes.io/careers">https://www.zerohomes.io/careers</a><br><strong>Small Blurb:</strong> Zero Homes helps homeowners electrify through heat pumps, insulation, EV chargers, and related projects. Good fit for climate tech, operations, data, and product roles.</p><div><hr></div><h3>27. 
Humand</h3><p><strong>Domain:</strong> HRTech / Deskless Workforce AI<br><strong>Funds raised:</strong> $66M Series A<br><strong>Venture funded:</strong> Kaszek, Goodwater Capital, Y Combinator, angel investors<br><strong>Company Link:</strong> https://humand.co<br><strong>Company career site:</strong> <a href="https://humand.co/careers">https://humand.co/careers</a><br><strong>Small Blurb:</strong> Humand builds an AI-powered operating system for deskless workforces. Strong fit for HRTech, SaaS, product, implementation, and customer success roles.</p><div><hr></div><h3>28. Coral Care</h3><p><strong>Domain:</strong> Pediatric Healthcare / Therapy Marketplace<br><strong>Funds raised:</strong> $13M Series A<br><strong>Venture funded:</strong> Haymaker Ventures, FCA Ventures, Peterson Ventures, AlleyCorp, Reach Capital<br><strong>Company Link:</strong> https://www.joincoralcare.com<br><strong>Company career site:</strong> <a href="https://www.joincoralcare.com/careers">https://www.joincoralcare.com/careers</a><br><strong>Small Blurb:</strong> Coral Care expands access to in-home pediatric speech, occupational, and physical therapy. Good fit for healthcare operations, product, support, and marketplace roles.</p><div><hr></div><h3>29. Third Way Health</h3><p><strong>Domain:</strong> Healthcare Services / Automation<br><strong>Funds raised:</strong> $15M Series A<br><strong>Venture funded:</strong> Health Velocity Capital<br><strong>Company Link:</strong> https://www.thirdway.health<br><strong>Company career site:</strong> <a href="https://www.thirdway.health/careers">https://www.thirdway.health/careers</a><br><strong>Small Blurb:</strong> Third Way Health supports healthcare organizations with front-office services like scheduling and prior authorization. Good fit for healthcare operations, automation, and customer success roles.</p><div><hr></div><h3>30. 
Halcyon</h3><p><strong>Domain:</strong> Energy AI / Data Infrastructure<br><strong>Funds raised:</strong> $21M Series A<br><strong>Venture funded:</strong> Energize Capital, Zero Infinity Partners, Congruent Ventures, Obvious Ventures<br><strong>Company Link:</strong> https://www.halcyon.eco<br><strong>Company career site:</strong> <a href="https://www.halcyon.eco/careers">https://www.halcyon.eco/careers</a><br><strong>Small Blurb:</strong> Halcyon builds AI tools for energy professionals, power-market intelligence, and data center siting. Strong fit for energy, data, AI, climate, and infrastructure roles.</p><div><hr></div><h3>31. Conduit Health</h3><p><strong>Domain:</strong> Healthcare / Medicare &amp; Medicaid Services<br><strong>Funds raised:</strong> $17M Series A<br><strong>Venture funded:</strong> Drive Capital, XYZ Ventures, Twelve Below, Eniac Ventures<br><strong>Company Link:</strong> https://www.conduithealth.com<br><strong>Company career site:</strong> <a href="https://www.conduithealth.com/careers">https://www.conduithealth.com/careers</a><br><strong>Small Blurb:</strong> Conduit Health provides insurance-covered medical supplies and services for Medicare and Medicaid patients. Good fit for healthcare operations, support, data, and growth roles.</p><div><hr></div><h3>32. Deeptune</h3><p><strong>Domain:</strong> AI Simulation / Agent Training<br><strong>Funds raised:</strong> $43M Series A<br><strong>Venture funded:</strong> Andreessen Horowitz, 776, Abstract Ventures, Inspired Capital<br><strong>Company Link:</strong> https://www.deeptune.ai<br><strong>Company career site:</strong> <a href="https://www.deeptune.ai/careers">https://www.deeptune.ai/careers</a><br><strong>Small Blurb:</strong> Deeptune builds simulation environments where AI agents can practice complex tasks. Strong fit for AI research, simulation, ML engineering, and infrastructure roles.</p><div><hr></div><h3>33. 
Edra</h3><p><strong>Domain:</strong> Workflow Automation / AI Agents<br><strong>Funds raised:</strong> $30M Series A<br><strong>Venture funded:</strong> Sequoia Capital, A*, 8VC<br><strong>Company Link:</strong> https://edra.com<br><strong>Company career site:</strong> <a href="https://edra.com/careers">https://edra.com/careers</a><br><strong>Small Blurb:</strong> Edra builds AI agents that learn business operations and automate repetitive workflows. Good fit for AI agents, workflow automation, backend, and product roles.</p><div><hr></div><h3>34. BlueFlag Security</h3><p><strong>Domain:</strong> Cybersecurity / Developer Identity Governance<br><strong>Funds raised:</strong> $16.5M Series A<br><strong>Venture funded:</strong> Maverick Ventures, Ten Eleven Ventures<br><strong>Company Link:</strong> https://www.blueflagsecurity.com<br><strong>Company career site:</strong> <a href="https://www.blueflagsecurity.com/careers">https://www.blueflagsecurity.com/careers</a><br><strong>Small Blurb:</strong> BlueFlag Security focuses on identity-centric security for developers, contractors, non-human identities, and AI agents. Strong fit for cybersecurity, identity, DevSecOps, and platform roles.</p><div><hr></div><h3>35. Starcloud</h3><p><strong>Domain:</strong> SpaceTech / Data Centers in Space<br><strong>Funds raised:</strong> $170M Series A<br><strong>Venture funded:</strong> Publicly reported Series A investors<br><strong>Company Link:</strong> https://www.starcloud.com<br><strong>Company career site:</strong> <a href="https://www.starcloud.com/careers">https://www.starcloud.com/careers</a><br><strong>Small Blurb:</strong> Starcloud is building data centers in space. 
A very interesting company to follow for aerospace, distributed systems, AI infrastructure, thermal engineering, and hardware roles.</p><div><hr></div><h2>Final note for job seekers</h2><p>Recently funded startups are not always easy to discover through regular job boards.</p><p>That is exactly why they are worth tracking.</p><p>Do not only apply to the same 10 big companies everyone is applying to. Build a startup watchlist, check their career pages every week, connect with team members, and apply early when roles open.</p><p>These 35 companies are a good starting point if you are exploring roles in AI, cybersecurity, healthcare, climate tech, fintech, defense tech, data infrastructure, and space tech.</p>]]></content:encoded></item><item><title><![CDATA[50 Data Center Projects for MS BA, MS DS, and Tech Students]]></title><description><![CDATA[If you are trying to get into data center roles, infrastructure roles, cloud operations, NOC, SRE, network engineering, or infrastructure analytics, projects can help a lot.]]></description><link>https://karthiktechdairy.substack.com/p/50-data-center-projects-for-ms-ba</link><guid isPermaLink="false">https://karthiktechdairy.substack.com/p/50-data-center-projects-for-ms-ba</guid><dc:creator><![CDATA[Karthik Adari]]></dc:creator><pubDate>Sat, 25 Apr 2026 23:07:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!81vR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad55a90-e956-4194-8779-0a9cacbe7a7f_1700x2200.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>But here is the important part.</p><p>These projects are not only for computer science students. 
Many <strong>MS Business Analytics, MS Data Science, MS Information Technology, and Cybersecurity</strong> students can also work on these because modern data centers generate a lot of data: server metrics, logs, power usage, ticket trends, uptime, capacity usage, network traffic, cloud cost, and security alerts.</p><p>Before using the resume points below, please remember:</p><p><strong>Resume bullet points are only for reference. Adjust them based on what you actually build, measure, and customize for your own profile.<br><br>Resume for Reference - https://www.overleaf.com/read/szjhqgsvcdyw#e6311a</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!81vR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad55a90-e956-4194-8779-0a9cacbe7a7f_1700x2200.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!81vR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad55a90-e956-4194-8779-0a9cacbe7a7f_1700x2200.jpeg 424w, https://substackcdn.com/image/fetch/$s_!81vR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad55a90-e956-4194-8779-0a9cacbe7a7f_1700x2200.jpeg 848w, https://substackcdn.com/image/fetch/$s_!81vR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad55a90-e956-4194-8779-0a9cacbe7a7f_1700x2200.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!81vR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad55a90-e956-4194-8779-0a9cacbe7a7f_1700x2200.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!81vR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad55a90-e956-4194-8779-0a9cacbe7a7f_1700x2200.jpeg" width="1456" height="1884" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dad55a90-e956-4194-8779-0a9cacbe7a7f_1700x2200.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1884,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:549058,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://karthiktechdairy.substack.com/i/195482020?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad55a90-e956-4194-8779-0a9cacbe7a7f_1700x2200.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!81vR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad55a90-e956-4194-8779-0a9cacbe7a7f_1700x2200.jpeg 424w, https://substackcdn.com/image/fetch/$s_!81vR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad55a90-e956-4194-8779-0a9cacbe7a7f_1700x2200.jpeg 848w, https://substackcdn.com/image/fetch/$s_!81vR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad55a90-e956-4194-8779-0a9cacbe7a7f_1700x2200.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!81vR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad55a90-e956-4194-8779-0a9cacbe7a7f_1700x2200.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div><hr></div><h2>Project 1. 
NetBox DCIM/IPAM Lab</h2><p>Github link - <a href="https://github.com/netbox-community/netbox?utm_source=chatgpt.com">https://github.com/netbox-community/netbox</a></p><p>Best fit MS Backgrounds - MS IT, MS CS, MS BA, MS DS</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Built a data center inventory system using NetBox to track <strong>rack utilization</strong>, <strong>device inventory</strong>, <strong>IP addresses</strong>, and <strong>circuits</strong>.</p></li><li><p>Created structured infrastructure datasets for asset reporting, capacity planning, and operational analysis.</p></li><li><p>Automated device and IP documentation using NetBox APIs to reduce manual tracking effort by <strong>X%</strong>.</p></li></ul><div><hr></div><h2>Project 2. openDCIM Data Center Inventory</h2><p>Github link - <a href="https://github.com/opendcim/openDCIM?utm_source=chatgpt.com">https://github.com/opendcim/openDCIM</a></p><p>Best fit MS Backgrounds - MS IT, MS BA, MS DS</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Implemented openDCIM to manage <strong>rack-level assets</strong>, <strong>cabinet usage</strong>, <strong>power allocation</strong>, and <strong>device inventory</strong>.</p></li><li><p>Built utilization reports to identify unused rack capacity and infrastructure gaps across <strong>X cabinets</strong>.</p></li><li><p>Created a data center asset-tracking workflow to improve visibility into <strong>hardware location</strong>, <strong>ownership</strong>, and <strong>capacity usage</strong>.</p></li></ul><div><hr></div><h2>Project 3. 
Nautobot Network Source of Truth</h2><p>Github link - <a href="https://github.com/nautobot/nautobot?utm_source=chatgpt.com">https://github.com/nautobot/nautobot</a></p><p>Best fit MS Backgrounds - MS CS, MS IT, MS DS</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Built a network source-of-truth system using Nautobot to manage <strong>devices</strong>, <strong>sites</strong>, <strong>IP addresses</strong>, and <strong>circuits</strong>.</p></li><li><p>Created API-based workflows to validate network inventory accuracy across <strong>X network assets</strong>.</p></li><li><p>Designed reports and dashboards to improve visibility into <strong>network documentation quality</strong> and <strong>infrastructure readiness</strong>.</p></li></ul><div><hr></div><h2>Project 4. Data Center Asset Analytics Dashboard</h2><p>Github link - <a href="https://github.com/netbox-community/netbox?utm_source=chatgpt.com">https://github.com/netbox-community/netbox</a></p><p>Best fit MS Backgrounds - MS BA, MS DS, MS IT</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Analyzed NetBox asset data to identify <strong>rack utilization</strong>, <strong>device density</strong>, and <strong>capacity trends</strong>.</p></li><li><p>Built a dashboard summarizing infrastructure usage by <strong>rack</strong>, <strong>site</strong>, <strong>device type</strong>, and <strong>IP allocation</strong>.</p></li><li><p>Generated capacity planning insights to support decisions around <strong>hardware expansion</strong>, <strong>rack space</strong>, and <strong>network resources</strong>.</p></li></ul><div><hr></div><h2>Project 5. 
IP Address Management Analytics</h2><p>Github link - <a href="https://github.com/SpriteLink/NIPAP">https://github.com/SpriteLink/NIPAP</a></p><p>Best fit MS Backgrounds - MS BA, MS DS, MS IT</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Built an IP address utilization dashboard to track <strong>subnet usage</strong>, <strong>available IPs</strong>, and <strong>allocation efficiency</strong>.</p></li><li><p>Analyzed IP allocation patterns to reduce unused blocks and improve network planning.</p></li><li><p>Created IP capacity reports showing <strong>X% subnet utilization</strong> and future availability risk.</p></li></ul><div><hr></div><h2>Project 6. MAAS Bare-Metal Provisioning</h2><p>Github link - <a href="https://github.com/canonical/maas?utm_source=chatgpt.com">https://github.com/canonical/maas</a></p><p>Best fit MS Backgrounds - MS CS, MS IT, MS DS</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Deployed MAAS to automate <strong>bare-metal server discovery</strong>, <strong>PXE boot</strong>, and <strong>OS provisioning</strong>.</p></li><li><p>Configured repeatable server deployment workflows across <strong>X nodes</strong>.</p></li><li><p>Tracked provisioning time and documented improvements in server onboarding speed by <strong>X%</strong>.</p></li></ul><div><hr></div><h2>Project 7. 
Foreman Server Lifecycle Management</h2><p>Github link - <a href="https://github.com/theforeman/foreman?utm_source=chatgpt.com">https://github.com/theforeman/foreman</a></p><p>Best fit MS Backgrounds - MS IT, MS CS, MS BA</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Built a server lifecycle management workflow using Foreman for <strong>provisioning</strong>, <strong>patch tracking</strong>, and <strong>host inventory</strong>.</p></li><li><p>Created infrastructure reports for <strong>patch compliance</strong>, <strong>host status</strong>, and <strong>operational visibility</strong>.</p></li><li><p>Improved server documentation by centralizing records for <strong>X systems</strong>.</p></li></ul><div><hr></div><h2>Project 8. Cobbler PXE Deployment Lab</h2><p>Github link - <a href="https://github.com/cobbler/cobbler?utm_source=chatgpt.com">https://github.com/cobbler/cobbler</a></p><p>Best fit MS Backgrounds - MS IT, MS CS</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Built a PXE-based Linux provisioning lab using Cobbler for automated OS installation.</p></li><li><p>Configured <strong>DHCP</strong>, <strong>DNS</strong>, <strong>Kickstart</strong>, and boot profiles for repeatable server deployments.</p></li><li><p>Reduced manual installation steps by automating server setup across <strong>X Linux machines</strong>.</p></li></ul><div><hr></div><h2>Project 9. FOG Imaging Project</h2><p>Github link - <a href="https://github.com/FOGProject/fogproject?utm_source=chatgpt.com">https://github.com/FOGProject/fogproject</a></p><p>Best fit MS Backgrounds - MS IT, MS CS, MS BA</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Implemented an imaging workflow to deploy and restore systems using FOG Project.</p></li><li><p>Created device inventory and imaging status reports for <strong>X endpoints/servers</strong>.</p></li><li><p>Improved system recovery readiness by standardizing image deployment and backup workflows.</p></li></ul><div><hr></div><h2>Project 10. 
Ansible Data Center Automation</h2><p>Github link - <a href="https://github.com/ansible/ansible">https://github.com/ansible/ansible</a></p><p>Best fit MS Backgrounds - MS IT, MS CS, MS DS</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Automated Linux server configuration using Ansible playbooks for <strong>users</strong>, <strong>packages</strong>, <strong>services</strong>, and <strong>security settings</strong>.</p></li><li><p>Built reusable roles for monitoring agent installation and baseline server hardening.</p></li><li><p>Reduced repetitive administration tasks by automating <strong>X infrastructure workflows</strong>.</p></li></ul><div><hr></div><h2>Project 11. Terraform Infrastructure as Code</h2><p>Github link - <a href="https://github.com/hashicorp/terraform">https://github.com/hashicorp/terraform</a></p><p>Best fit MS Backgrounds - MS CS, MS IT, MS BA</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Built reusable Terraform modules to provision infrastructure resources consistently across environments.</p></li><li><p>Managed infrastructure configuration using <strong>variables</strong>, <strong>state files</strong>, and <strong>modular design</strong>.</p></li><li><p>Improved deployment repeatability by defining <strong>X infrastructure resources</strong> as code.</p></li></ul><div><hr></div><h2>Project 12. OpenTofu IaC Lab</h2><p>Github link - <a href="https://github.com/opentofu/opentofu">https://github.com/opentofu/opentofu</a></p><p>Best fit MS Backgrounds - MS CS, MS IT, MS BA</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Created OpenTofu modules to automate infrastructure deployment and environment setup.</p></li><li><p>Designed reusable templates for <strong>development</strong>, <strong>testing</strong>, and <strong>production-like</strong> infrastructure.</p></li><li><p>Practiced version-controlled infrastructure management across <strong>X environments</strong>.</p></li></ul><div><hr></div><h2>Project 13. 
Packer Golden Image Builder</h2><p>Github link - <a href="https://github.com/hashicorp/packer">https://github.com/hashicorp/packer</a></p><p>Best fit MS Backgrounds - MS IT, MS CS</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Built repeatable Linux golden images using Packer for faster server provisioning.</p></li><li><p>Automated baseline package installation, security configuration, and image validation.</p></li><li><p>Reduced manual server setup time by creating reusable VM images with <strong>X standard configurations</strong>.</p></li></ul><div><hr></div><h2>Project 14. Kubernetes Cluster Operations</h2><p>Github link - <a href="https://github.com/kubernetes/kubernetes">https://github.com/kubernetes/kubernetes</a></p><p>Best fit MS Backgrounds - MS CS, MS IT, MS DS</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Deployed and managed a Kubernetes cluster to understand <strong>workloads</strong>, <strong>services</strong>, <strong>scheduling</strong>, and <strong>scaling</strong>.</p></li><li><p>Monitored pod health, resource usage, and cluster availability using operational metrics.</p></li><li><p>Documented troubleshooting steps for failed deployments, scaling issues, and service downtime.</p></li></ul><div><hr></div><h2>Project 15. Kubespray Bare-Metal Kubernetes</h2><p>Github link - <a href="https://github.com/kubernetes-sigs/kubespray?utm_source=chatgpt.com">https://github.com/kubernetes-sigs/kubespray</a></p><p>Best fit MS Backgrounds - MS CS, MS IT, MS DS</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Built a multi-node Kubernetes cluster using Kubespray and Ansible automation.</p></li><li><p>Configured <strong>networking</strong>, <strong>storage</strong>, <strong>node roles</strong>, and <strong>cluster validation</strong> workflows.</p></li><li><p>Deployed Kubernetes on <strong>X nodes</strong> and documented high-availability setup steps.</p></li></ul><div><hr></div><h2>Project 16. 
K3s Edge/Data Center Mini Cluster</h2><p>Github link - <a href="https://github.com/k3s-io/k3s">https://github.com/k3s-io/k3s</a></p><p>Best fit MS Backgrounds - MS CS, MS IT, MS DS</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Built a lightweight Kubernetes cluster using K3s to simulate edge and small data center environments.</p></li><li><p>Deployed containerized workloads and monitored <strong>CPU</strong>, <strong>memory</strong>, <strong>disk</strong>, and <strong>network usage</strong>.</p></li><li><p>Created a compact infrastructure lab for testing automation, monitoring, and workload deployment.</p></li></ul><div><hr></div><h2>Project 17. MetalLB Bare-Metal Load Balancing</h2><p>Github link - <a href="https://github.com/metallb/metallb">https://github.com/metallb/metallb</a></p><p>Best fit MS Backgrounds - MS CS, MS IT, MS DS</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Configured MetalLB to provide load balancing for a bare-metal Kubernetes cluster.</p></li><li><p>Tested <strong>Layer 2</strong> and <strong>BGP-based</strong> service exposure for internal applications.</p></li><li><p>Improved service availability by enabling load-balanced access across <strong>X Kubernetes services</strong>.</p></li></ul><div><hr></div><h2>Project 18. KubeVirt VM on Kubernetes</h2><p>Github link - <a href="https://github.com/kubevirt/kubevirt?utm_source=chatgpt.com">https://github.com/kubevirt/kubevirt</a></p><p>Best fit MS Backgrounds - MS CS, MS IT</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Deployed virtual machines inside Kubernetes using KubeVirt.</p></li><li><p>Compared VM-based and container-based workloads across <strong>resource usage</strong>, <strong>deployment speed</strong>, and <strong>management complexity</strong>.</p></li><li><p>Documented how traditional virtualization and Kubernetes can operate in the same infrastructure environment.</p></li></ul><div><hr></div><h2>Project 19. 
Harvester Hyperconverged Infrastructure</h2><p>Github link - <a href="https://github.com/harvester/harvester">https://github.com/harvester/harvester</a></p><p>Best fit MS Backgrounds - MS IT, MS CS, MS BA</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Built a hyperconverged infrastructure lab using Harvester for <strong>VMs</strong>, <strong>storage</strong>, and <strong>cluster resource management</strong>.</p></li><li><p>Created infrastructure capacity notes covering <strong>compute</strong>, <strong>memory</strong>, <strong>storage</strong>, and <strong>workload usage</strong>.</p></li><li><p>Managed virtual workloads through a unified private cloud-style platform.</p></li></ul><div><hr></div><h2>Project 20. OpenStack-Ansible Private Cloud</h2><p>Github link - <a href="https://github.com/openstack/openstack-ansible">https://github.com/openstack/openstack-ansible</a></p><p>Best fit MS Backgrounds - MS CS, MS IT, MS BA</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Deployed a private cloud environment using OpenStack-Ansible.</p></li><li><p>Configured <strong>compute</strong>, <strong>networking</strong>, and <strong>storage</strong> services for VM provisioning.</p></li><li><p>Documented private cloud operations including <strong>instance creation</strong>, <strong>resource allocation</strong>, and <strong>service validation</strong>.</p></li></ul><div><hr></div><h2>Project 21. 
Apache CloudStack Private Cloud</h2><p>Github link - <a href="https://github.com/apache/cloudstack?utm_source=chatgpt.com">https://github.com/apache/cloudstack</a></p><p>Best fit MS Backgrounds - MS CS, MS IT, MS BA</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Built a private cloud lab using Apache CloudStack to manage virtual infrastructure.</p></li><li><p>Monitored <strong>compute pools</strong>, <strong>VM usage</strong>, <strong>network resources</strong>, and <strong>storage allocation</strong>.</p></li><li><p>Created infrastructure reports showing VM capacity, usage trends, and operational status.</p></li></ul><div><hr></div><h2>Project 22. OpenNebula Private/Edge Cloud</h2><p>Github link - <a href="https://github.com/OpenNebula/one">https://github.com/OpenNebula/one</a></p><p>Best fit MS Backgrounds - MS CS, MS IT, MS BA</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Deployed OpenNebula to manage virtualized infrastructure resources.</p></li><li><p>Created VM templates and automated workload deployment across <strong>X virtual machines</strong>.</p></li><li><p>Analyzed resource allocation across <strong>compute</strong>, <strong>storage</strong>, and <strong>network pools</strong>.</p></li></ul><div><hr></div><h2>Project 23. Prometheus Metrics Monitoring</h2><p>Github link - <a href="https://github.com/prometheus/prometheus">https://github.com/prometheus/prometheus</a></p><p>Best fit MS Backgrounds - MS DS, MS BA, MS IT, MS CS</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Built a metrics monitoring system using Prometheus for servers, applications, and infrastructure services.</p></li><li><p>Wrote PromQL queries to analyze <strong>CPU usage</strong>, <strong>memory consumption</strong>, <strong>disk I/O</strong>, and <strong>network traffic</strong>.</p></li><li><p>Configured alerts for infrastructure health thresholds, reducing manual monitoring effort by <strong>X%</strong>.</p></li></ul><div><hr></div><h2>Project 24. 
Grafana Infrastructure Dashboard</h2><p>Github link - <a href="https://github.com/grafana/grafana?utm_source=chatgpt.com">https://github.com/grafana/grafana</a></p><p>Best fit MS Backgrounds - MS BA, MS DS, MS IT</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Designed Grafana dashboards for <strong>server health</strong>, <strong>network traffic</strong>, <strong>uptime</strong>, and <strong>capacity usage</strong>.</p></li><li><p>Converted raw infrastructure metrics into visual reports for operational and business decision-making.</p></li><li><p>Created alert panels to track <strong>SLA performance</strong>, <strong>downtime risk</strong>, and <strong>resource saturation</strong>.</p></li></ul><div><hr></div><h2>Project 25. Grafana Loki Log Analytics</h2><p>Github link - <a href="https://github.com/grafana/loki?utm_source=chatgpt.com">https://github.com/grafana/loki</a></p><p>Best fit MS Backgrounds - MS DS, MS BA, MS IT, MS Cybersecurity</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Built a centralized log analytics system using Loki and Grafana.</p></li><li><p>Created queries to identify <strong>repeated errors</strong>, <strong>service failures</strong>, <strong>latency spikes</strong>, and <strong>incident patterns</strong>.</p></li><li><p>Developed dashboards for operational troubleshooting and reduced log investigation time by <strong>X%</strong>.</p></li></ul><div><hr></div><h2>Project 26. 
OpenTelemetry Collector Pipeline</h2><p>Github link - <a href="https://github.com/open-telemetry/opentelemetry-collector">https://github.com/open-telemetry/opentelemetry-collector</a></p><p>Best fit MS Backgrounds - MS DS, MS CS, MS IT</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Built a telemetry collection pipeline using OpenTelemetry Collector.</p></li><li><p>Routed <strong>metrics</strong>, <strong>logs</strong>, and <strong>traces</strong> from services into observability tools.</p></li><li><p>Documented end-to-end data flow architecture for infrastructure monitoring and incident analysis.</p></li></ul><div><hr></div><h2>Project 27. Netdata Real-Time Server Monitoring</h2><p>Github link - <a href="https://github.com/netdata/netdata?utm_source=chatgpt.com">https://github.com/netdata/netdata</a></p><p>Best fit MS Backgrounds - MS BA, MS DS, MS IT</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Deployed Netdata to monitor <strong>CPU</strong>, <strong>memory</strong>, <strong>disk</strong>, <strong>network</strong>, and <strong>service health</strong> in real time.</p></li><li><p>Created operational dashboards to visualize server performance and infrastructure behavior.</p></li><li><p>Analyzed resource spikes during workload testing and identified <strong>X performance bottlenecks</strong>.</p></li></ul><div><hr></div><h2>Project 28. 
Zabbix Enterprise Monitoring</h2><p>Github link - <a href="https://github.com/zabbix/zabbix">https://github.com/zabbix/zabbix</a></p><p>Best fit MS Backgrounds - MS IT, MS BA, MS DS</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Implemented Zabbix monitoring for servers, services, and network devices.</p></li><li><p>Configured alerts for <strong>downtime</strong>, <strong>resource saturation</strong>, <strong>disk usage</strong>, and <strong>service failures</strong>.</p></li><li><p>Created infrastructure availability reports showing <strong>uptime</strong>, <strong>incident frequency</strong>, and <strong>response patterns</strong>.</p></li></ul><div><hr></div><h2>Project 29. LibreNMS Network Monitoring</h2><p>Github link - <a href="https://github.com/librenms/librenms">https://github.com/librenms/librenms</a></p><p>Best fit MS Backgrounds - MS IT, MS BA, MS DS</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Deployed LibreNMS to monitor routers, switches, and server interfaces using SNMP.</p></li><li><p>Built dashboards for <strong>bandwidth usage</strong>, <strong>device health</strong>, <strong>interface status</strong>, and <strong>network utilization</strong>.</p></li><li><p>Created alerting rules for interface downtime and unusual traffic patterns across <strong>X devices</strong>.</p></li></ul><div><hr></div><h2>Project 30. Icinga2 Availability Monitoring</h2><p>Github link - <a href="https://github.com/Icinga/icinga2">https://github.com/Icinga/icinga2</a></p><p>Best fit MS Backgrounds - MS IT, MS BA</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Configured Icinga2 checks for server uptime, services, and infrastructure availability.</p></li><li><p>Designed alert rules for <strong>failed services</strong>, <strong>degraded performance</strong>, and <strong>availability drops</strong>.</p></li><li><p>Documented incident response workflows for service outages and monitoring escalations.</p></li></ul><div><hr></div><h2>Project 31. 
OpenSearch Log Search &amp; Analytics</h2><p>Github link - <a href="https://github.com/opensearch-project/OpenSearch">https://github.com/opensearch-project/OpenSearch</a></p><p>Best fit MS Backgrounds - MS DS, MS BA, MS Cybersecurity, MS IT</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Built an OpenSearch-based log analytics system for infrastructure and security events.</p></li><li><p>Created searchable indexes for <strong>server logs</strong>, <strong>application logs</strong>, and <strong>security alerts</strong>.</p></li><li><p>Designed dashboards to analyze <strong>incident trends</strong>, <strong>error frequency</strong>, and <strong>operational patterns</strong>.</p></li></ul><div><hr></div><h2>Project 32. OpenSearch Dashboards BI Project</h2><p>Github link - <a href="https://github.com/opensearch-project/OpenSearch-Dashboards">https://github.com/opensearch-project/OpenSearch-Dashboards</a></p><p>Best fit MS Backgrounds - MS BA, MS DS, MS IT</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Created OpenSearch dashboards for <strong>infrastructure KPIs</strong>, <strong>incident counts</strong>, and <strong>log trends</strong>.</p></li><li><p>Built visual reports to support operational decision-making and reliability reviews.</p></li><li><p>Connected log data to business-style metrics such as <strong>SLA performance</strong>, <strong>MTTR</strong>, and <strong>downtime frequency</strong>.</p></li></ul><div><hr></div><h2>Project 33. 
OpenCost Kubernetes Cost Monitoring</h2><p>Github link - <a href="https://github.com/opencost/opencost?utm_source=chatgpt.com">https://github.com/opencost/opencost</a></p><p>Best fit MS Backgrounds - MS BA, MS DS, MS CS, MS IT</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Deployed OpenCost to track Kubernetes resource cost by <strong>namespace</strong>, <strong>workload</strong>, and <strong>service</strong>.</p></li><li><p>Built cost allocation reports for infrastructure usage analysis and chargeback-style reporting.</p></li><li><p>Identified cost optimization opportunities from underutilized workloads, targeting <strong>X% cost reduction</strong>.</p></li></ul><div><hr></div><h2>Project 34. Cloud Carbon Footprint</h2><p>Github link - <a href="https://github.com/cloud-carbon-footprint/cloud-carbon-footprint?utm_source=chatgpt.com">https://github.com/cloud-carbon-footprint/cloud-carbon-footprint</a></p><p>Best fit MS Backgrounds - MS BA, MS DS, MS Sustainability, MS IT</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Built a cloud sustainability dashboard to estimate <strong>energy usage</strong>, <strong>carbon impact</strong>, and <strong>cloud cost trends</strong>.</p></li><li><p>Analyzed infrastructure usage patterns to identify high-impact workloads.</p></li><li><p>Created recommendations to reduce cloud waste and improve sustainability reporting by <strong>X%</strong>.</p></li></ul><div><hr></div><h2>Project 35. 
Kepler Kubernetes Energy Monitoring</h2><p>Github link - <a href="https://github.com/sustainable-computing-io/kepler?utm_source=chatgpt.com">https://github.com/sustainable-computing-io/kepler</a></p><p>Best fit MS Backgrounds - MS DS, MS BA, MS CS, MS IT</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Deployed Kepler to collect Kubernetes energy metrics at <strong>node</strong>, <strong>pod</strong>, and <strong>container</strong> levels.</p></li><li><p>Built dashboards to analyze <strong>energy consumption</strong>, <strong>workload efficiency</strong>, and <strong>resource usage patterns</strong>.</p></li><li><p>Created insights for greener infrastructure and workload optimization across <strong>X services</strong>.</p></li></ul><div><hr></div><h2>Project 36. kube-green Workload Energy Optimization</h2><p>Github link - <a href="https://github.com/kube-green/kube-green">https://github.com/kube-green/kube-green</a></p><p>Best fit MS Backgrounds - MS BA, MS DS, MS CS</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Configured kube-green to reduce Kubernetes resource usage during non-working hours.</p></li><li><p>Measured workload sleep/wake behavior and estimated <strong>compute savings</strong>, <strong>energy savings</strong>, and <strong>cost reduction</strong>.</p></li><li><p>Documented sustainability and cost optimization outcomes across <strong>X workloads</strong>.</p></li></ul><div><hr></div><h2>Project 37. 
Network UPS Tools Power Monitoring</h2><p>Github link - <a href="https://github.com/networkupstools/nut">https://github.com/networkupstools/nut</a></p><p>Best fit MS Backgrounds - MS IT, MS BA, MS DS</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Configured UPS monitoring to track <strong>power status</strong>, <strong>battery health</strong>, and <strong>outage events</strong>.</p></li><li><p>Built reports for backup power availability, power incidents, and battery performance trends.</p></li><li><p>Automated graceful shutdown logic to protect systems during power failure scenarios.</p></li></ul><div><hr></div><h2>Project 38. Ceph Distributed Storage</h2><p>Github link - <a href="https://github.com/ceph/ceph">https://github.com/ceph/ceph</a></p><p>Best fit MS Backgrounds - MS CS, MS IT, MS DS</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Deployed a Ceph storage cluster to understand <strong>distributed storage</strong>, <strong>replication</strong>, and <strong>fault tolerance</strong>.</p></li><li><p>Monitored <strong>storage capacity</strong>, <strong>disk health</strong>, <strong>cluster status</strong>, and <strong>recovery behavior</strong>.</p></li><li><p>Documented storage failure scenarios and recovery workflows across <strong>X storage nodes</strong>.</p></li></ul><div><hr></div><h2>Project 39. Rook Ceph on Kubernetes</h2><p>Github link - <a href="https://github.com/rook/rook">https://github.com/rook/rook</a></p><p>Best fit MS Backgrounds - MS CS, MS IT, MS DS</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Deployed Ceph storage on Kubernetes using Rook.</p></li><li><p>Created persistent storage for container workloads and tested failure recovery.</p></li><li><p>Monitored <strong>storage utilization</strong>, <strong>volume health</strong>, and <strong>availability</strong> across the Kubernetes cluster.</p></li></ul><div><hr></div><h2>Project 40. 
Longhorn Kubernetes Storage</h2><p>Github link - <a href="https://github.com/longhorn/longhorn">https://github.com/longhorn/longhorn</a></p><p>Best fit MS Backgrounds - MS CS, MS IT, MS DS</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Implemented Longhorn for persistent Kubernetes storage.</p></li><li><p>Configured <strong>volume replication</strong>, <strong>backup</strong>, and <strong>restore</strong> workflows.</p></li><li><p>Tested application recovery after simulated storage failure and measured <strong>recovery time</strong>.</p></li></ul><div><hr></div><h2>Project 41. MinIO Object Storage</h2><p>Github link - <a href="https://github.com/minio/minio">https://github.com/minio/minio</a></p><p>Best fit MS Backgrounds - MS DS, MS BA, MS CS, MS IT</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Built an S3-compatible object storage system using MinIO.</p></li><li><p>Stored <strong>logs</strong>, <strong>backup files</strong>, and <strong>analytics datasets</strong> in a self-hosted object storage layer.</p></li><li><p>Designed a storage workflow to support data pipelines, infrastructure logs, and backup use cases.</p></li></ul><div><hr></div><h2>Project 42. Velero Kubernetes Backup &amp; Disaster Recovery</h2><p>Github link - <a href="https://github.com/vmware-tanzu/velero">https://github.com/vmware-tanzu/velero</a></p><p>Best fit MS Backgrounds - MS IT, MS CS, MS BA</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Configured Velero to back up and restore Kubernetes workloads.</p></li><li><p>Tested disaster recovery scenarios for <strong>namespaces</strong>, <strong>deployments</strong>, and <strong>persistent volumes</strong>.</p></li><li><p>Documented <strong>recovery time</strong>, <strong>backup success rate</strong>, and validation steps for workload restoration.</p></li></ul><div><hr></div><h2>Project 43. 
TrueNAS Middleware Storage Project</h2><p>Github link - <a href="https://github.com/truenas/middleware">https://github.com/truenas/middleware</a></p><p>Best fit MS Backgrounds - MS IT, MS CS, MS DS</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Explored NAS storage workflows using TrueNAS middleware concepts.</p></li><li><p>Analyzed <strong>storage pool usage</strong>, <strong>snapshots</strong>, <strong>datasets</strong>, and <strong>storage API behavior</strong>.</p></li><li><p>Built documentation for ZFS-based storage operations and storage health monitoring.</p></li></ul><div><hr></div><h2>Project 44. Batfish Network Validation</h2><p>Github link - <a href="https://github.com/batfish/batfish">https://github.com/batfish/batfish</a></p><p>Best fit MS Backgrounds - MS DS, MS CS, MS IT</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Used Batfish to analyze network configurations and detect routing or policy issues.</p></li><li><p>Built validation checks to identify misconfigurations before deployment.</p></li><li><p>Created reports summarizing <strong>network risk</strong>, <strong>configuration errors</strong>, and <strong>routing validation results</strong>.</p></li></ul><div><hr></div><h2>Project 45. Nornir Network Automation</h2><p>Github link - <a href="https://github.com/nornir-automation/nornir">https://github.com/nornir-automation/nornir</a></p><p>Best fit MS Backgrounds - MS CS, MS IT, MS DS</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Built Python-based network automation tasks using Nornir.</p></li><li><p>Automated <strong>device inventory checks</strong>, <strong>configuration backups</strong>, and <strong>status collection</strong>.</p></li><li><p>Generated structured network data for analytics and reporting across <strong>X devices</strong>.</p></li></ul><div><hr></div><h2>Project 46. 
NAPALM Multi-Vendor Network Automation</h2><p>Github link - <a href="https://github.com/napalm-automation/napalm">https://github.com/napalm-automation/napalm</a></p><p>Best fit MS Backgrounds - MS CS, MS IT, MS DS</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Automated multi-vendor network device checks using NAPALM.</p></li><li><p>Collected <strong>configuration</strong>, <strong>interface</strong>, and <strong>device state</strong> data for analysis.</p></li><li><p>Built validation scripts to compare expected vs actual network state across <strong>X devices</strong>.</p></li></ul><div><hr></div><h2>Project 47. FRRouting Data Center Routing Lab</h2><p>Github link - <a href="https://github.com/FRRouting/frr">https://github.com/FRRouting/frr</a></p><p>Best fit MS Backgrounds - MS CS, MS IT, MS DS</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Built a routing lab using FRRouting to practice <strong>BGP</strong>, <strong>OSPF</strong>, <strong>EVPN</strong>, and Linux networking.</p></li><li><p>Simulated data center routing scenarios and tested failover behavior.</p></li><li><p>Documented <strong>routing table changes</strong>, <strong>network convergence</strong>, and <strong>failover response time</strong>.</p></li></ul><div><hr></div><h2>Project 48. SONiC Network OS Study Lab</h2><p>Github link - <a href="https://github.com/sonic-net/SONiC">https://github.com/sonic-net/SONiC</a></p><p>Best fit MS Backgrounds - MS CS, MS IT</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Explored SONiC architecture for cloud-scale data center switching.</p></li><li><p>Studied switch OS components, routing behavior, and network operations workflows.</p></li><li><p>Documented key concepts around <strong>BGP</strong>, <strong>switch management</strong>, <strong>network OS design</strong>, and <strong>data center switching</strong>.</p></li></ul><div><hr></div><h2>Project 49. 
Wazuh Security Monitoring</h2><p>Github link - <a href="https://github.com/wazuh/wazuh">https://github.com/wazuh/wazuh</a></p><p>Best fit MS Backgrounds - MS Cybersecurity, MS DS, MS BA, MS IT</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Deployed Wazuh to collect and analyze security events from servers.</p></li><li><p>Built dashboards for <strong>security alerts</strong>, <strong>compliance checks</strong>, <strong>endpoint activity</strong>, and <strong>incident trends</strong>.</p></li><li><p>Investigated log patterns to identify suspicious activity and improve infrastructure security visibility.</p></li></ul><div><hr></div><h2>Project 50. Zeek Network Security Analytics</h2><p>Github link - <a href="https://github.com/zeek/zeek">https://github.com/zeek/zeek</a></p><p>Best fit MS Backgrounds - MS Cybersecurity, MS DS, MS BA, MS IT</p><p>Resume-Ready Bullet Points -</p><ul><li><p>Used Zeek to generate structured network security logs from traffic.</p></li><li><p>Analyzed <strong>connection logs</strong>, <strong>DNS logs</strong>, <strong>HTTP logs</strong>, and <strong>traffic patterns</strong> to identify suspicious behavior.</p></li><li><p>Built a network security analytics dashboard for incident investigation and anomaly detection.</p></li></ul><div><hr></div><h1>Final Note</h1><p>You do not need to complete all 50 projects.</p><p>Pick <strong>3 to 5 projects</strong> based on your target role.</p><p>For MS BA students, start with:</p><ul><li><p>Grafana Infrastructure Dashboard</p></li><li><p>OpenCost Kubernetes Cost Monitoring</p></li><li><p>Cloud Carbon Footprint</p></li><li><p>NetBox Asset Analytics Dashboard</p></li><li><p>OpenSearch Dashboards</p></li></ul><p>For MS DS students, start with:</p><ul><li><p>Prometheus Metrics Monitoring</p></li><li><p>Grafana Loki Log Analytics</p></li><li><p>OpenTelemetry Collector Pipeline</p></li><li><p>Kepler Kubernetes Energy Monitoring</p></li><li><p>Zeek Network Security
Analytics</p></li></ul><p>For infrastructure and data center roles, start with:</p><ul><li><p>NetBox</p></li><li><p>MAAS</p></li><li><p>Ansible</p></li><li><p>Kubespray</p></li><li><p>Prometheus</p></li><li><p>LibreNMS</p></li><li><p>Ceph</p></li><li><p>FRRouting</p></li></ul><p>Again, one important reminder:</p><p><strong>Resume bullet points are only for reference. Adjust them based on your actual implementation, your target role, your results, and the metrics you personally achieve.</strong></p><p></p>]]></content:encoded></item><item><title><![CDATA[Top 25 SQL questions that cover any interview]]></title><description><![CDATA[FAANG company top SQL questions!]]></description><link>https://karthiktechdairy.substack.com/p/top-25-sql-questions-that-cover-any</link><guid isPermaLink="false">https://karthiktechdairy.substack.com/p/top-25-sql-questions-that-cover-any</guid><dc:creator><![CDATA[Karthik Adari]]></dc:creator><pubDate>Sun, 05 Apr 2026 21:12:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!UMJq!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ba28ade-da8d-46d1-89b3-fb1b8a34eaf7_800x800.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div><hr></div><h2>1) Top 3 products per region by total revenue, including ties</h2><pre><code><code>WITH product_revenue AS (
    SELECT
        region,
        product_id,
        SUM(revenue) AS total_revenue
    FROM sales
    GROUP BY region, product_id
),
ranked AS (
    SELECT
        region,
        product_id,
        total_revenue,
        DENSE_RANK() OVER (
            PARTITION BY region
            ORDER BY total_revenue DESC
        ) AS rnk
    FROM product_revenue
)
SELECT
    region,
    product_id,
    total_revenue
FROM ranked
WHERE rnk &lt;= 3
ORDER BY region, total_revenue DESC, product_id;
</code></code></pre><div><hr></div><h2>2) Latest event per user per event_type, tie-break by highest event_id</h2><pre><code><code>WITH ranked AS (
    SELECT
        user_id,
        event_type,
        event_time,
        event_id,
        ROW_NUMBER() OVER (
            PARTITION BY user_id, event_type
            ORDER BY event_time DESC, event_id DESC
        ) AS rn
    FROM events
)
SELECT
    user_id,
    event_type,
    event_time,
    event_id
FROM ranked
WHERE rn = 1;
</code></code></pre><div><hr></div><h2>3) Monthly revenue in 2025, fill missing months with 0, MoM growth %</h2><pre><code><code>WITH months AS (
    SELECT generate_series(
        DATE '2025-01-01',
        DATE '2025-12-01',
        INTERVAL '1 month'
    )::date AS month_start
),
monthly_revenue AS (
    SELECT
        DATE_TRUNC('month', order_date)::date AS month_start,
        SUM(amount) AS revenue
    FROM orders
    WHERE order_date &gt;= DATE '2025-01-01'
      AND order_date &lt; DATE '2026-01-01'
    GROUP BY 1
),
filled AS (
    SELECT
        m.month_start,
        COALESCE(r.revenue, 0) AS revenue
    FROM months m
    LEFT JOIN monthly_revenue r
        ON m.month_start = r.month_start
)
SELECT
    month_start,
    revenue,
    ROUND(
        100.0 * (revenue - LAG(revenue) OVER (ORDER BY month_start))
        / NULLIF(LAG(revenue) OVER (ORDER BY month_start), 0),
        2
    ) AS mom_growth_pct
FROM filled
ORDER BY month_start;
</code></code></pre><div><hr></div><h2>4) 7-day rolling average revenue</h2><pre><code><code>SELECT
    dt,
    revenue,
    ROUND(
        AVG(revenue) OVER (
            ORDER BY dt
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ),
        2
    ) AS rolling_7_day_avg
FROM daily_sales
ORDER BY dt;
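-- Note on the query above: a ROWS frame counts rows, not days, so it only
-- gives a true 7-day window when daily_sales has exactly one row per day.
-- A sketch for data with missing days, assuming PostgreSQL 11+ and that dt
-- is a DATE column (both are assumptions, not stated in the question).
-- Days with no row are skipped in the average rather than counted as 0.
SELECT
    dt,
    revenue,
    ROUND(
        AVG(revenue) OVER (
            ORDER BY dt
            RANGE BETWEEN INTERVAL '6 day' PRECEDING AND CURRENT ROW
        ),
        2
    ) AS rolling_7_day_avg
FROM daily_sales
ORDER BY dt;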
</code></code></pre><div><hr></div><h2>5) Users whose second purchase happened within 7 days of the first</h2><pre><code><code>WITH distinct_purchases AS (
    SELECT DISTINCT
        user_id,
        order_time
    FROM orders
),
ranked AS (
    SELECT
        user_id,
        order_time,
        ROW_NUMBER() OVER (
            PARTITION BY user_id
            ORDER BY order_time
        ) AS purchase_num
    FROM distinct_purchases
),
first_second AS (
    SELECT
        user_id,
        MAX(CASE WHEN purchase_num = 1 THEN order_time END) AS first_purchase,
        MAX(CASE WHEN purchase_num = 2 THEN order_time END) AS second_purchase
    FROM ranked
    WHERE purchase_num &lt;= 2
    GROUP BY user_id
)
SELECT
    user_id,
    first_purchase,
    second_purchase
FROM first_second
WHERE second_purchase IS NOT NULL
  AND second_purchase &lt;= first_purchase + INTERVAL '7 day';
</code></code></pre><div><hr></div><h2>6) Day-7 retention by signup cohort</h2><pre><code><code>WITH retained_users AS (
    SELECT DISTINCT
        u.user_id,
        u.signup_date
    FROM users u
    JOIN events e
        ON e.user_id = u.user_id
       AND e.event_date = u.signup_date + INTERVAL '7 day'
)
SELECT
    u.signup_date,
    COUNT(*) AS signed_up_users,
    COUNT(r.user_id) AS retained_users_day_7,
    ROUND(100.0 * COUNT(r.user_id) / COUNT(*), 2) AS day_7_retention_pct
FROM users u
LEFT JOIN retained_users r
    ON u.user_id = r.user_id
   AND u.signup_date = r.signup_date
GROUP BY u.signup_date
ORDER BY u.signup_date;
</code></code></pre><div><hr></div><h2>7) Funnel: visit &#8594; signup &#8594; purchase in strict order within 24 hours of first visit</h2><pre><code><code>WITH first_visit AS (
    SELECT
        user_id,
        MIN(event_time) AS visit_time
    FROM events
    WHERE event_name = 'visit'
    GROUP BY user_id
),
first_signup AS (
    SELECT
        v.user_id,
        MIN(e.event_time) AS signup_time
    FROM first_visit v
    JOIN events e
        ON e.user_id = v.user_id
       AND e.event_name = 'signup'
       AND e.event_time &gt; v.visit_time
       AND e.event_time &lt;= v.visit_time + INTERVAL '24 hour'
    GROUP BY v.user_id
),
first_purchase AS (
    SELECT
        s.user_id,
        MIN(e.event_time) AS purchase_time
    FROM first_signup s
    JOIN first_visit v
        ON v.user_id = s.user_id
    JOIN events e
        ON e.user_id = s.user_id
       AND e.event_name = 'purchase'
       AND e.event_time &gt; s.signup_time
       AND e.event_time &lt;= v.visit_time + INTERVAL '24 hour'
    GROUP BY s.user_id
)
SELECT
    COUNT(v.user_id) AS visitors,
    COUNT(s.user_id) AS signed_up,
    COUNT(p.user_id) AS purchased,
    ROUND(100.0 * COUNT(s.user_id) / NULLIF(COUNT(v.user_id), 0), 2) AS visit_to_signup_pct,
    ROUND(100.0 * COUNT(p.user_id) / NULLIF(COUNT(v.user_id), 0), 2) AS visit_to_purchase_pct
FROM first_visit v
LEFT JOIN first_signup s
    ON v.user_id = s.user_id
LEFT JOIN first_purchase p
    ON v.user_id = p.user_id;
</code></code></pre><div><hr></div><h2>8) Sessionization with 30-minute inactivity rule</h2><pre><code><code>WITH flagged AS (
    SELECT
        user_id,
        event_time,
        CASE
            WHEN LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) IS NULL THEN 1
            WHEN event_time - LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) &gt; INTERVAL '30 minute' THEN 1
            ELSE 0
        END AS new_session_flag
    FROM events
),
sessioned AS (
    SELECT
        user_id,
        event_time,
        SUM(new_session_flag) OVER (
            PARTITION BY user_id
            ORDER BY event_time
            ROWS UNBOUNDED PRECEDING
        ) AS session_id
    FROM flagged
)
SELECT
    user_id,
    session_id,
    MIN(event_time) AS session_start,
    MAX(event_time) AS session_end,
    COUNT(*) AS event_count
FROM sessioned
GROUP BY user_id, session_id
ORDER BY user_id, session_start;
</code></code></pre><div><hr></div><h2>9) Longest consecutive daily login streak</h2><pre><code><code>WITH distinct_logins AS (
    SELECT DISTINCT
        user_id,
        login_date::date AS login_date
    FROM logins
),
grouped AS (
    SELECT
        user_id,
        login_date,
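        -- Gaps-and-islands: within a run of consecutive dates, date minus
        -- row number is constant, so grp identifies each streak.
        login_date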
        - (ROW_NUMBER() OVER (
            PARTITION BY user_id
            ORDER BY login_date
        )::int) * INTERVAL '1 day' AS grp
    FROM distinct_logins
),
streaks AS (
    SELECT
        user_id,
        MIN(login_date) AS streak_start,
        MAX(login_date) AS streak_end,
        COUNT(*) AS streak_length
    FROM grouped
    GROUP BY user_id, grp
)
SELECT *
FROM (
    SELECT
        user_id,
        streak_start,
        streak_end,
        streak_length,
        ROW_NUMBER() OVER (
            PARTITION BY user_id
            ORDER BY streak_length DESC, streak_end DESC
        ) AS rn
    FROM streaks
) x
WHERE rn = 1
ORDER BY user_id;
</code></code></pre><div><hr></div><h2>10) Products with at least 4 consecutive months above $100,000</h2><pre><code><code>WITH filtered AS (
    SELECT
        product_id,
        month_start::date AS month_start
    FROM monthly_sales
    WHERE revenue &gt; 100000
),
grouped AS (
    SELECT
        product_id,
        month_start,
        month_start
        - (ROW_NUMBER() OVER (
            PARTITION BY product_id
            ORDER BY month_start
        )::int) * INTERVAL '1 month' AS grp
    FROM filtered
),
streaks AS (
    SELECT
        product_id,
        MIN(month_start) AS streak_start,
        MAX(month_start) AS streak_end,
        COUNT(*) AS months_in_streak
    FROM grouped
    GROUP BY product_id, grp
)
SELECT
    product_id,
    streak_start,
    streak_end,
    months_in_streak
FROM streaks
WHERE months_in_streak &gt;= 4
ORDER BY product_id, streak_start;
</code></code></pre><div><hr></div><h2>11) Running inventory and first stockout timestamp</h2><pre><code><code>WITH running AS (
    SELECT
        sku,
        txn_time,
        qty_change,
        SUM(qty_change) OVER (
            PARTITION BY sku
            ORDER BY txn_time
            ROWS UNBOUNDED PRECEDING
        ) AS running_qty
    FROM inventory
),
first_stockout AS (
    SELECT
        sku,
        MIN(txn_time) AS first_below_zero_time
    FROM running
    WHERE running_qty &lt; 0
    GROUP BY sku
)
SELECT
    r.sku,
    r.txn_time,
    r.qty_change,
    r.running_qty,
    f.first_below_zero_time
FROM running r
LEFT JOIN first_stockout f
    ON r.sku = f.sku
ORDER BY r.sku, r.txn_time;
</code></code></pre><div><hr></div><h2>12) Median salary per department</h2><pre><code><code>SELECT
    department_id,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary
FROM salaries
GROUP BY department_id
ORDER BY department_id;
</code></code></pre><div><hr></div><h2>13) 3rd highest distinct salary per department</h2><pre><code><code>WITH distinct_salaries AS (
    SELECT DISTINCT
        department_id,
        salary
    FROM salaries
),
ranked AS (
    SELECT
        department_id,
        salary,
        DENSE_RANK() OVER (
            PARTITION BY department_id
            ORDER BY salary DESC
        ) AS rnk
    FROM distinct_salaries
)
SELECT
    department_id,
    salary AS third_highest_salary
FROM ranked
WHERE rnk = 3
ORDER BY department_id;
</code></code></pre><div><hr></div><h2>14) Customers who opened a support ticket and never placed any order after that ticket</h2><pre><code><code>SELECT DISTINCT
    t.customer_id
FROM support_tickets t
WHERE NOT EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = t.customer_id
      AND o.order_date &gt; t.ticket_date
);
</code></code></pre><div><hr></div><h2>15) Last-touch attribution</h2><pre><code><code>WITH attributed AS (
    SELECT
        p.user_id,
        p.purchase_time,
        p.revenue,
        t.channel
    FROM purchases p
    LEFT JOIN LATERAL (
        SELECT channel
        FROM touches t
        WHERE t.user_id = p.user_id
          AND t.touch_time &lt;= p.purchase_time
        ORDER BY t.touch_time DESC
        LIMIT 1
    ) t ON TRUE
)
SELECT
    channel,
    SUM(revenue) AS attributed_revenue
FROM attributed
GROUP BY channel
ORDER BY attributed_revenue DESC;
</code></code></pre><div><hr></div><h2>16) Opening price, closing price, and monthly change per product</h2><pre><code><code>WITH monthly_prices AS (
    SELECT
        product_id,
        DATE_TRUNC('month', price_date)::date AS month_start,
        price_date,
        price,
        FIRST_VALUE(price) OVER (
            PARTITION BY product_id, DATE_TRUNC('month', price_date)
            ORDER BY price_date
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) AS opening_price,
        LAST_VALUE(price) OVER (
            PARTITION BY product_id, DATE_TRUNC('month', price_date)
            ORDER BY price_date
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) AS closing_price
    FROM prices
)
SELECT DISTINCT
    product_id,
    month_start,
    opening_price,
    closing_price,
    closing_price - opening_price AS monthly_price_change
FROM monthly_prices
ORDER BY product_id, month_start;
</code></code></pre><div><hr></div><h2>17) Share-of-wallet per user-category</h2><pre><code><code>WITH category_spend AS (
    SELECT
        user_id,
        category,
        SUM(amount) AS category_amount
    FROM transactions
    GROUP BY user_id, category
),
total_spend AS (
    SELECT
        user_id,
        SUM(amount) AS total_amount
    FROM transactions
    GROUP BY user_id
)
SELECT
    c.user_id,
    c.category,
    c.category_amount,
    ROUND(100.0 * c.category_amount / NULLIF(t.total_amount, 0), 2) AS pct_of_wallet
FROM category_spend c
JOIN total_spend t
    ON c.user_id = t.user_id
ORDER BY c.user_id, c.category;
</code></code></pre><div><hr></div><h2>18) Users active in all 3 months of Q1 2025</h2><pre><code><code>SELECT
    user_id
FROM events
WHERE event_date &gt;= DATE '2025-01-01'
  AND event_date &lt; DATE '2025-04-01'
GROUP BY user_id
HAVING COUNT(DISTINCT DATE_TRUNC('month', event_date)) = 3
ORDER BY user_id;
</code></code></pre><div><hr></div><h2>19) Retained users vs reactivated users by month</h2><pre><code><code>WITH monthly_active AS (
    SELECT DISTINCT
        user_id,
        DATE_TRUNC('month', event_date)::date AS month_start
    FROM events
),
classified AS (
    SELECT
        a.user_id,
        a.month_start,
        CASE
            WHEN EXISTS (
                SELECT 1
                FROM monthly_active p
                WHERE p.user_id = a.user_id
                  AND p.month_start = a.month_start - INTERVAL '1 month'
            ) THEN 'retained'
            WHEN EXISTS (
                SELECT 1
                FROM monthly_active e
                WHERE e.user_id = a.user_id
                  AND e.month_start &lt; a.month_start - INTERVAL '1 month'
            ) THEN 'reactivated'
            ELSE 'new'
        END AS user_status
    FROM monthly_active a
)
SELECT
    month_start,
    COUNT(DISTINCT CASE WHEN user_status = 'retained' THEN user_id END) AS retained_users,
    COUNT(DISTINCT CASE WHEN user_status = 'reactivated' THEN user_id END) AS reactivated_users
FROM classified
GROUP BY month_start
ORDER BY month_start;
</code></code></pre><div><hr></div><h2>20) Assign each order the correct customer tier at purchase time</h2><pre><code><code>SELECT
    o.order_id,
    o.customer_id,
    o.order_time,
    t.tier
FROM orders o
JOIN customer_tiers t
    ON o.customer_id = t.customer_id
   AND o.order_time &gt;= t.valid_from
   AND (
        o.order_time &lt; t.valid_to
        OR t.valid_to IS NULL
   );
</code></code></pre><div><hr></div><h2>21) Recursive hierarchy: depth and top-most manager</h2><pre><code><code>WITH RECURSIVE org AS (
    SELECT
        emp_id AS root_emp_id,
        emp_id,
        manager_id,
        0 AS depth
    FROM employees

    UNION ALL

    SELECT
        o.root_emp_id,
        e.emp_id,
        e.manager_id,
        o.depth + 1
    FROM org o
    JOIN employees e
        ON o.manager_id = e.emp_id
),
final_chain AS (
    SELECT
        root_emp_id AS emp_id,
        emp_id AS top_most_manager,
        depth,
        ROW_NUMBER() OVER (
            PARTITION BY root_emp_id
            ORDER BY depth DESC
        ) AS rn
    FROM org
)
SELECT
    emp_id,
    top_most_manager,
    depth AS management_chain_depth
FROM final_chain
WHERE rn = 1
ORDER BY emp_id;
</code></code></pre><div><hr></div><h2>22) Recursive hierarchy with cycle detection</h2><pre><code><code>WITH RECURSIVE org AS (
    SELECT
        emp_id AS root_emp_id,
        emp_id,
        manager_id,
        ARRAY[emp_id] AS path,
        FALSE AS has_cycle
    FROM employees

    UNION ALL

    SELECT
        o.root_emp_id,
        e.emp_id,
        e.manager_id,
        o.path || e.emp_id,
        e.emp_id = ANY(o.path) AS has_cycle
    FROM org o
    JOIN employees e
        ON o.manager_id = e.emp_id
    WHERE NOT o.has_cycle
)
SELECT
    root_emp_id AS emp_id,
    path,
    has_cycle
FROM org
WHERE has_cycle = TRUE
ORDER BY emp_id;
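-- Alternative sketch, assuming PostgreSQL 14 or later: the built-in CYCLE
-- clause tracks the visited path and stops recursing once emp_id repeats.
-- is_cycle and visited are column names chosen here for illustration.
-- It lists employees whose management chain runs into a cycle.
WITH RECURSIVE org AS (
    SELECT
        emp_id AS root_emp_id,
        emp_id,
        manager_id
    FROM employees

    UNION ALL

    SELECT
        o.root_emp_id,
        e.emp_id,
        e.manager_id
    FROM org o
    JOIN employees e
        ON o.manager_id = e.emp_id
) CYCLE emp_id SET is_cycle USING visited
SELECT DISTINCT
    root_emp_id AS emp_id
FROM org
WHERE is_cycle
ORDER BY emp_id;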
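</code></code></pre><p>The <code>has_cycle</code> flag marks cycles in the reporting chain, and the <code>WHERE NOT o.has_cycle</code> filter stops a path as soon as an employee repeats, so the recursion cannot run forever.</p><div><hr></div><h2>23) Orders above the 95th percentile each month</h2><pre><code><code>WITH monthly_p95 AS (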
    SELECT
        DATE_TRUNC('month', order_date)::date AS month_start,
        PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY order_amount) AS p95_amount
    FROM orders
    GROUP BY 1
)
SELECT
    o.order_id,
    o.customer_id,
    o.order_date,
    o.order_amount,
    p.p95_amount
FROM orders o
JOIN monthly_p95 p
    ON DATE_TRUNC('month', o.order_date)::date = p.month_start
WHERE o.order_amount &gt; p.p95_amount
ORDER BY o.order_date, o.order_amount DESC;
</code></code></pre><div><hr></div><h2>24) Canonical user record and duplicates by normalized email</h2><pre><code><code>WITH normalized AS (
    SELECT
        user_id,
        email,
        created_at,
        LOWER(TRIM(email)) AS normalized_email
    FROM users
),
ranked AS (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY normalized_email
            ORDER BY created_at, user_id
        ) AS rn,
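        -- Use the same ordering as rn so the canonical id is the rn = 1 row
        FIRST_VALUE(user_id) OVER (
            PARTITION BY normalized_email
            ORDER BY created_at, user_id
        ) AS canonical_user_id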
    FROM normalized
)
SELECT
    user_id,
    email,
    created_at,
    normalized_email,
    canonical_user_id,
    CASE
        WHEN rn = 1 THEN 'canonical'
        ELSE 'duplicate'
    END AS record_type
FROM ranked
ORDER BY normalized_email, created_at, user_id;
</code></code></pre><div><hr></div><h2>25) Performance/debugging answer</h2><p>This one is more about <strong>how you think</strong> than just writing SQL.</p><h3>Step 1: check row counts at each stage</h3><p>If the final row count is much larger than expected, the first suspect is a <strong>join explosion</strong>.</p><pre><code><code>SELECT COUNT(*) FROM events;
SELECT COUNT(*) FROM orders;

SELECT COUNT(*)
FROM events e
JOIN orders o
    ON e.user_id = o.user_id;
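
-- A sketch for finding which keys drive the explosion, using only the
-- events and orders tables assumed above. Each user_id contributes
-- (event rows * order rows) rows to the join, so the largest products
-- show where the row count blows up.
SELECT
    e.user_id,
    e.cnt AS event_rows,
    o.cnt AS order_rows,
    e.cnt * o.cnt AS joined_rows
FROM (SELECT user_id, COUNT(*) AS cnt FROM events GROUP BY user_id) e
JOIN (SELECT user_id, COUNT(*) AS cnt FROM orders GROUP BY user_id) o
    ON e.user_id = o.user_id
ORDER BY joined_rows DESC
LIMIT 10;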
</code></code></pre><p>If this jumps massively, you probably have a many-to-many join.</p><div><hr></div><h3>Step 2: test uniqueness on join keys</h3><pre><code><code>SELECT user_id, COUNT(*)
FROM events
GROUP BY user_id
HAVING COUNT(*) &gt; 1;

SELECT user_id, COUNT(*)
FROM orders
GROUP BY user_id
HAVING COUNT(*) &gt; 1;
</code></code></pre><p>If both sides have repeated <code>user_id</code>, joining directly multiplies rows.</p><div><hr></div><h3>Step 3: pre-aggregate before joining</h3><p>If the business logic is really user-level, reduce both sides first.</p><pre><code><code>WITH event_users AS (
    SELECT user_id, MIN(event_time) AS first_event_time
    FROM events
    GROUP BY user_id
),
order_users AS (
    SELECT user_id, MIN(order_time) AS first_order_time
    FROM orders
    GROUP BY user_id
)
SELECT *
FROM event_users e
JOIN order_users o
    ON e.user_id = o.user_id;
</code></code></pre><div><hr></div><h3>Step 4: push filters early</h3><p>Filtering after the join can force the engine to join and scan far more rows than the result needs, so filter each table before joining.</p><pre><code><code>WITH filtered_events AS (
    SELECT *
    FROM events
    WHERE event_time &gt;= DATE '2025-01-01'
),
filtered_orders AS (
    SELECT *
    FROM orders
    WHERE order_time &gt;= DATE '2025-01-01'
)
SELECT *
FROM filtered_events e
JOIN filtered_orders o
    ON e.user_id = o.user_id;
</code></code></pre><div><hr></div><h3>Step 5: inspect the execution plan</h3><pre><code><code>EXPLAIN ANALYZE
SELECT ...
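
-- A more concrete sketch, assuming PostgreSQL and the events/orders tables
-- used in the earlier steps. BUFFERS adds I/O counts to the plan output.
-- Note that EXPLAIN ANALYZE actually executes the statement it explains.
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*)
FROM events e
JOIN orders o
    ON e.user_id = o.user_id;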
</code></code></pre><p>Look for:</p><ul><li><p>large gaps between estimated and actual row counts</p></li><li><p>nested loops on large tables</p></li><li><p>full table scans where an index should be used</p></li><li><p>hash joins building giant intermediate tables</p></li></ul><div><hr></div><h3>What interviewers want to hear</h3><p>A strong answer is:</p><blockquote><p>&#8220;First I&#8217;d validate whether the slowdown is from row multiplication by checking join cardinality and uniqueness of keys. Then I&#8217;d test whether pre-aggregation is missing, push filters earlier, and inspect the execution plan with <code>EXPLAIN ANALYZE</code> to see where the cost blows up.&#8221;</p></blockquote><div><hr></div><h2>Why these answers matter</h2><p>Because if you can solve these cleanly, you&#8217;re covering the hardest buckets interviewers actually care about:</p><ul><li><p>ranking</p></li><li><p>deduplication</p></li><li><p>cohorts and retention</p></li><li><p>funnels</p></li><li><p>sessionization</p></li><li><p>streaks</p></li><li><p>attribution</p></li><li><p>temporal joins</p></li><li><p>recursive CTEs</p></li><li><p>percentiles</p></li><li><p>query debugging</p></li></ul><p>This is the real core.</p>]]></content:encoded></item><item><title><![CDATA[How I’d Actually Become a Data Analyst in 2026]]></title><description><![CDATA[From zero to advanced, without wasting months learning random tools]]></description><link>https://karthiktechdairy.substack.com/p/how-id-actually-become-a-data-analyst</link><guid 
isPermaLink="false">https://karthiktechdairy.substack.com/p/how-id-actually-become-a-data-analyst</guid><dc:creator><![CDATA[Karthik Adari]]></dc:creator><pubDate>Sat, 04 Apr 2026 15:23:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!UMJq!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ba28ade-da8d-46d1-89b3-fb1b8a34eaf7_800x800.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>From zero to advanced, without wasting months learning random tools</em></p><p>Every week, I see people asking the same question:</p><p><strong>&#8220;How do I become a data analyst?&#8221;</strong></p><p>And most of the time, the answers are either too vague or too overwhelming.</p><p>Some people say, &#8220;Just learn SQL and Excel.&#8221;<br>Some say, &#8220;Do Python, Tableau, Power BI, statistics, machine learning, cloud, and AI.&#8221;<br>And then beginners end up doing a little bit of everything&#8230; and mastering nothing.</p><p>That&#8217;s the real problem.</p><p>The goal is not to collect tools.<br>The goal is to become someone who can look at messy data, figure out what matters, and explain it in a way that helps a business take action.</p><p>That&#8217;s what a good data analyst actually does.</p><p>So if I had to start again today, this is the roadmap I&#8217;d follow.</p><div><hr></div><h2>1) Start with spreadsheets first</h2><p>I know spreadsheets don&#8217;t sound exciting.<br>But this is where most real business data still lives.</p><p>Before jumping into dashboards or Python, get genuinely comfortable with <strong>Excel or Google Sheets</strong>.</p><p>Learn:</p><ul><li><p>formulas like <code>SUM</code>, <code>AVERAGE</code>, <code>IF</code>, <code>COUNTIF</code>, <code>XLOOKUP</code></p></li><li><p>sorting and filtering</p></li><li><p>conditional formatting</p></li><li><p>pivot tables</p></li><li><p>basic charts</p></li><li><p>simple 
dashboards</p></li><li><p>manual data cleaning</p></li></ul><p>This stage matters more than people think.</p><p>Because if you cannot take a messy spreadsheet and make it readable, advanced tools won&#8217;t magically fix that.</p><p>Your first goal should be simple:</p><p><strong>Take raw data and turn it into something a non-technical person can understand in two minutes.</strong></p><div><hr></div><h2>2) Learn basic statistics without overcomplicating it</h2><p>A lot of beginners get scared when they hear &#8220;statistics,&#8221; but honestly, you do not need to become a statistician.</p><p>You just need enough to avoid making bad conclusions.</p><p>Focus on:</p><ul><li><p>mean, median, mode</p></li><li><p>percentages and growth rates</p></li><li><p>variance and standard deviation</p></li><li><p>correlation vs causation</p></li><li><p>probability basics</p></li><li><p>distributions</p></li><li><p>confidence intervals</p></li><li><p>basic hypothesis testing intuition</p></li></ul><p>Why this matters:</p><p>Because a lot of people can build charts.<br>Very few can tell whether the pattern in that chart is actually meaningful.</p><p>A data analyst should be able to answer questions like:</p><ul><li><p>Is this change normal?</p></li><li><p>Is this trend real?</p></li><li><p>Is this just random noise?</p></li><li><p>Should the business care about this?</p></li></ul><p>If you can answer those properly, you&#8217;re already ahead of a lot of people.</p><div><hr></div><h2>3) Understand the business, not just the data</h2><p>This is where many people get stuck.</p><p>They become &#8220;tool people&#8221; instead of &#8220;problem solvers.&#8221;</p><p>A company is not hiring you because you know how to use a dashboard tool.<br>They&#8217;re hiring you because they want someone who can help them understand:</p><ul><li><p>why sales dropped</p></li><li><p>why users stopped converting</p></li><li><p>which channel is wasting money</p></li><li><p>what product change is 
actually working</p></li></ul><p>So start learning business concepts early:</p><ul><li><p>revenue</p></li><li><p>profit</p></li><li><p>margin</p></li><li><p>conversion rate</p></li><li><p>churn</p></li><li><p>retention</p></li><li><p>funnel analysis</p></li><li><p>cohort analysis</p></li><li><p>segmentation</p></li></ul><p>The habit you want to build is this:</p><p><strong>What is happening? Why is it happening? What should be done next?</strong></p><p>That thinking is what makes an analyst valuable.</p><div><hr></div><h2>4) Learn SQL properly</h2><p>If spreadsheets are your starting point, SQL is your real entry ticket into data analytics.</p><p>This is one of the most important skills in the entire roadmap.</p><p>Start with:</p><ul><li><p><code>SELECT</code></p></li><li><p><code>WHERE</code></p></li><li><p><code>ORDER BY</code></p></li><li><p><code>GROUP BY</code></p></li><li><p><code>HAVING</code></p></li><li><p>joins</p></li><li><p>subqueries</p></li><li><p>CTEs</p></li><li><p><code>CASE WHEN</code></p></li><li><p>date functions</p></li><li><p>window functions</p></li></ul><p>Then push into more business-style use cases:</p><ul><li><p>monthly revenue trends</p></li><li><p>retention analysis</p></li><li><p>ranking top customers</p></li><li><p>finding duplicate records</p></li><li><p>identifying inactive users</p></li><li><p>comparing categories across time</p></li></ul><p>The mistake many people make is learning SQL like it&#8217;s a syntax exercise.</p><p>Don&#8217;t do that.</p><p>Learn SQL like you&#8217;re answering business questions.</p><p>That&#8217;s when it starts becoming useful.</p><div><hr></div><h2>5) Learn visualization after you learn how to think</h2><p>This is the stage where people start feeling like a &#8220;real analyst,&#8221; because now the data becomes visible.</p><p>Pick <strong>Tableau</strong> or <strong>Power BI</strong> and get comfortable with:</p><ul><li><p>bar charts</p></li><li><p>line charts</p></li><li><p>scatter 
plots</p></li><li><p>heatmaps</p></li><li><p>maps</p></li><li><p>dashboard layout</p></li><li><p>filters</p></li><li><p>drill-downs</p></li><li><p>storytelling with data</p></li></ul><p>But here&#8217;s the important part:</p><p>A dashboard is not supposed to look impressive.<br>It is supposed to make the answer obvious.</p><p>That&#8217;s the real standard.</p><p>A strong dashboard helps someone instantly see:</p><ul><li><p>what changed</p></li><li><p>where the problem is</p></li><li><p>what needs attention</p></li><li><p>what decision should be taken next</p></li></ul><p>That&#8217;s how you should build.</p><div><hr></div><h2>6) Learn Python when you&#8217;re ready to go beyond manual work</h2><p>Once you&#8217;re comfortable with spreadsheets, SQL, and basic visualization, then Python starts making a lot more sense.</p><p>Because now you know why you&#8217;re using it.</p><p>Focus on:</p><ul><li><p><code>pandas</code></p></li><li><p><code>numpy</code></p></li><li><p><code>matplotlib</code></p></li><li><p>basic EDA</p></li><li><p>cleaning missing values</p></li><li><p>merging datasets</p></li><li><p>grouping and aggregation</p></li><li><p>reading CSV and Excel files</p></li><li><p>automating repetitive analysis tasks</p></li></ul><p>Python helps when:</p><ul><li><p>the data is bigger</p></li><li><p>the cleaning is messier</p></li><li><p>the analysis needs repetition</p></li><li><p>you want more flexibility than spreadsheets can give</p></li></ul><p>You do not need to become a software engineer here.</p><p>You just need to become the kind of analyst who can take messy data, clean it, analyze it, and explain what matters.</p><p>That alone is powerful.</p><div><hr></div><h2>7) Get very good at data cleaning</h2><p>This is probably the least glamorous skill in analytics.</p><p>And also one of the most important.</p><p>Real data is messy.</p><p>It has:</p><ul><li><p>missing values</p></li><li><p>inconsistent names</p></li><li><p>wrong 
formats</p></li><li><p>duplicates</p></li><li><p>broken dates</p></li><li><p>weird text</p></li><li><p>bad joins</p></li><li><p>incomplete records</p></li></ul><p>This is where a lot of analysis goes wrong.</p><p>Not because the model was bad.<br>Not because the dashboard was bad.<br>But because the data itself was never questioned.</p><p>A good analyst keeps asking:</p><ul><li><p>Can I trust this dataset?</p></li><li><p>Is something missing?</p></li><li><p>Are these numbers even reasonable?</p></li><li><p>Are we comparing the right things?</p></li><li><p>Is the business logic aligned with the data logic?</p></li></ul><p>That mindset matters more than any tool.</p><div><hr></div><h2>8) Move into advanced analytics</h2><p>Once your basics are strong, then it&#8217;s time to go beyond &#8220;what happened?&#8221;</p><p>Now you can start exploring:</p><ul><li><p>A/B testing</p></li><li><p>regression basics</p></li><li><p>forecasting basics</p></li><li><p>retention analysis</p></li><li><p>cohort analysis</p></li><li><p>funnel analysis</p></li><li><p>segmentation</p></li><li><p>anomaly detection</p></li><li><p>time series thinking</p></li></ul><p>This is the stage where your work becomes more strategic.</p><p>You move from:</p><ul><li><p>describing the past</p></li></ul><p>to:</p><ul><li><p>explaining the present</p></li><li><p>testing ideas</p></li><li><p>predicting what may happen next</p></li></ul><p>That shift is a big one.</p><p>And it&#8217;s also where analysts become much more valuable.</p><div><hr></div><h2>9) Learn how data systems actually work</h2><p>A strong analyst does not just query tables blindly.</p><p>They understand where the data comes from.</p><p>That means learning:</p><ul><li><p>relational databases</p></li><li><p>primary keys and foreign keys</p></li><li><p>data warehouses</p></li><li><p>ETL / ELT basics</p></li><li><p>fact and dimension tables</p></li><li><p>star schema basics</p></li><li><p>how pipelines move data from source to 
dashboard</p></li></ul><p>Later, it also helps to know tools like:</p><ul><li><p>BigQuery</p></li><li><p>Snowflake</p></li><li><p>Redshift</p></li><li><p>dbt</p></li></ul><p>You do not need to become a data engineer.</p><p>But you should absolutely understand the flow of data.</p><p>Because once you understand the system, your analysis becomes more reliable.</p><div><hr></div><h2>10) Communication is not optional</h2><p>This is where average analysts and strong analysts start separating.</p><p>You can do excellent analysis and still get ignored if you cannot communicate it well.</p><p>So learn how to:</p><ul><li><p>write clear summaries</p></li><li><p>explain insights simply</p></li><li><p>present recommendations</p></li><li><p>adapt your language for stakeholders</p></li><li><p>connect numbers to action</p></li></ul><p>For example:</p><p>Instead of saying:</p><p><strong>&#8220;Revenue increased 12% month-over-month.&#8221;</strong></p><p>Say:</p><p><strong>&#8220;Revenue grew 12% compared to last month, mainly driven by repeat customers in the top-performing category.&#8221;</strong></p><p>That second version is better because it adds meaning.</p><p>That&#8217;s what people remember.<br>That&#8217;s what gets trusted.</p><div><hr></div><h2>11) Build projects that show business thinking</h2><p>Projects are where all of this starts coming together.</p><p>Not random projects.<br>Not &#8220;I made a chart because I had a CSV.&#8221;<br>Real projects with a question, a process, and an outcome.</p><p>Good project ideas:</p><ul><li><p>sales dashboard</p></li><li><p>customer churn analysis</p></li><li><p>marketing campaign performance analysis</p></li><li><p>retention or cohort analysis</p></li><li><p>SQL case study</p></li><li><p>Python EDA project</p></li><li><p>A/B testing case study</p></li></ul><p>For every project, include:</p><ul><li><p>the problem</p></li><li><p>the dataset</p></li><li><p>your cleaning process</p></li><li><p>your analysis</p></li><li><p>the 
insight</p></li><li><p>the recommendation</p></li><li><p>the final output</p></li></ul><p>The best portfolios do not just show code.</p><p>They show judgment.</p><div><hr></div><h2>12) Prepare for interviews like an analyst, not a student</h2><p>Interview prep is not just memorizing SQL questions.</p><p>It&#8217;s learning how to explain your work clearly.</p><p>Be ready to answer:</p><ul><li><p>What problem were you solving?</p></li><li><p>What was messy in the data?</p></li><li><p>How did you clean it?</p></li><li><p>What did you find?</p></li><li><p>What recommendation did you make?</p></li><li><p>What would you improve if you had more time?</p></li></ul><p>Also practice:</p><ul><li><p>SQL interview questions</p></li><li><p>Excel case studies</p></li><li><p>dashboard walkthroughs</p></li><li><p>statistics basics</p></li><li><p>business case questions</p></li></ul><p>A lot of people know enough to do the work.</p><p>But they struggle to explain it.</p><p>That&#8217;s why practice matters.</p><div><hr></div><h2>13) Learn to use AI as a tool, not a shortcut</h2><p>This part matters even more now.</p><p>The strongest analysts today are not just &#8220;Excel + SQL&#8221; people.</p><p>They know how to use AI to work faster and think better.</p><p>That does <strong>not</strong> mean letting AI do everything.</p><p>It means using it well.</p><p>For example, AI can help you:</p><ul><li><p>draft SQL queries faster</p></li><li><p>debug broken code</p></li><li><p>summarize large datasets</p></li><li><p>brainstorm KPIs</p></li><li><p>turn dashboard findings into first-draft business summaries</p></li><li><p>explore anomalies quickly</p></li></ul><p>But the real job is still yours.</p><p>You still need to decide:</p><ul><li><p>what question matters</p></li><li><p>whether the data is trustworthy</p></li><li><p>whether the output makes business sense</p></li><li><p>what action should be taken</p></li></ul><p>So yes, learn AI tools.</p><p>But use them like an accelerator, not a 
crutch.</p><div><hr></div><h2>The order I&#8217;d follow</h2><p>If I were guiding someone from scratch, I&#8217;d keep it simple:</p><p><strong>Stage 1:</strong> Excel / Google Sheets + basic statistics + business understanding<br><strong>Stage 2:</strong> SQL from beginner to advanced<br><strong>Stage 3:</strong> Tableau or Power BI<br><strong>Stage 4:</strong> Python for analytics<br><strong>Stage 5:</strong> Data cleaning and real-world case studies<br><strong>Stage 6:</strong> Advanced analytics<br><strong>Stage 7:</strong> Data systems and warehouses<br><strong>Stage 8:</strong> Portfolio + interview preparation</p><p>And the best way to learn is not:</p><p>&#8220;Finish one giant syllabus and then start building.&#8221;</p><p>It&#8217;s this:</p><ul><li><p>learn a topic</p></li><li><p>do a small project</p></li><li><p>learn the next topic</p></li><li><p>improve the project</p></li><li><p>repeat</p></li></ul><p>That loop works.</p><div><hr></div><h2>A few YouTube resources that are actually useful</h2><p>Here are some solid starting points:</p><p><strong>1. Alex The Analyst &#8211; Data Analyst Bootcamp</strong><br>A practical full playlist that covers the core analyst stack.<br><a href="https://www.youtube.com/playlist?list=PLUaB-1hjhk8FE_XZ87vPPSfHqb6OcM0cF">Watch here</a></p><p><strong>2. Luke Barousse &#8211; SQL for Data Analytics</strong><br>Very helpful if you want SQL taught in a clean, job-relevant way.<br><a href="https://www.youtube.com/watch?v=7mz73uXD9DA">Watch here</a></p><p><strong>3. freeCodeCamp &#8211; Data Analysis with Python</strong><br>A good beginner-friendly Python course for analysis.<br><a href="https://www.youtube.com/watch?v=r-uOLxNrNk8">Watch here</a></p><p><strong>4. 
Alex The Analyst &#8211; Excel Tutorials for Data Analysts</strong><br>Useful for spreadsheet foundations, pivot tables, and cleaning.<br><a href="https://www.youtube.com/playlist?list=PLUaB-1hjhk8Hyd5NiPQ9CND82vNodlFF5">Watch here</a></p><div><hr></div><h2>GitHub projects and repos to explore</h2><p>If you want hands-on practice, these are good places to start:</p><p><strong>1. AlexTheAnalyst / PortfolioProjects</strong><br><a href="https://github.com/AlexTheAnalyst/PortfolioProjects">GitHub repo</a></p><p><strong>2. emily1618 / Data-Portfolio</strong><br><a href="https://github.com/emily1618/Data-Portfolio">GitHub repo</a></p><p><strong>3. DeviSuhithaChundru / Retail-Data-Analytics-Project-Python-SQL-Integration</strong><br><a href="https://github.com/DeviSuhithaChundru/Retail-Data-Analytics-Project-Python-SQL-Integration">GitHub repo</a></p><p><strong>4. lukebarousse / Int_SQL_Data_Analytics_Course</strong><br><a href="https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course">GitHub repo</a></p><p><strong>5. jordanlue / DataQuest-Guided-Projects</strong><br><a href="https://github.com/jordanlue/DataQuest-Guided-Projects">GitHub repo</a></p><p><strong>6. amlanmohanty1 / customer-trends-data-analysis-SQL-Python-PowerBI</strong><br><a href="https://github.com/amlanmohanty1/customer-trends-data-analysis-SQL-Python-PowerBI">GitHub repo</a></p><div><hr></div>
<h2>Final thought</h2><p>A lot of people spend months asking:</p><p><strong>&#8220;Which tool should I learn next?&#8221;</strong></p><p>A better question is:</p><p><strong>&#8220;Can I take messy data, find something meaningful, and explain what should happen next?&#8221;</strong></p><p>Because that is the real job.</p><p>Tools matter, yes.</p><p>But clear thinking, clean analysis, and strong communication matter more.</p><p>That&#8217;s what turns someone from &#8220;learning analytics&#8221; into actually becoming an analyst.</p>]]></content:encoded></item><item><title><![CDATA[F1 Visa Slots Are Opening: Here’s the Full Legit Process From DS-160 to USTravelDocs]]></title><description><![CDATA[If you&#8217;re trying to book your F1 visa right now, don&#8217;t rush blindly.]]></description><link>https://karthiktechdairy.substack.com/p/f1-visa-slots-are-opening-heres-the</link><guid isPermaLink="false">https://karthiktechdairy.substack.com/p/f1-visa-slots-are-opening-heres-the</guid><dc:creator><![CDATA[Karthik Adari]]></dc:creator><pubDate>Mon, 30 Mar 2026 16:08:34 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!UMJq!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ba28ade-da8d-46d1-89b3-fb1b8a34eaf7_800x800.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>F1 Visa Slots Are Opening: Here&#8217;s the Full Legit Process From DS-160 to USTravelDocs</h1><p>If you&#8217;re trying to book your F1 visa right now, don&#8217;t rush blindly.</p><p>A lot of students 
think the process is just:<br>fill DS-160 &#8594; book slot &#8594; attend interview.</p><p>But that is not the full picture.</p><p>There is a proper order, and if you do things in the wrong sequence, you can delay your application, enter incorrect details, or show up without the right documents.</p><p>So here is the full process in a simple, clear way.</p><h2>Official links you should keep open</h2><pre><code>Student visa information:
https://travel.state.gov/content/travel/en/us-visas/study/student-visa.html

DS-160 form:
https://ceac.state.gov/genniv/

DS-160 FAQs:
https://travel.state.gov/content/travel/en/us-visas/visa-information-resources/forms/ds-160-online-nonimmigrant-visa-application/ds-160-faqs.html

SEVIS I-901 fee:
https://www.ice.gov/sevis/i901

SEVIS student information:
https://www.ice.gov/sevis/students

USTravelDocs:
https://www.ustraveldocs.com/

Visa wait times:
https://travel.state.gov/content/travel/en/us-visas/visa-information-resources/global-visa-wait-times.html

Visa status check:
https://ceac.state.gov/ceacstattracker/status.aspx

U.S. visas main page / embassy access:
https://travel.state.gov/content/travel/en/us-visas.html</code></pre><h2>First, understand the correct order</h2><p>For most new F1 students, the process goes like this:</p><p>Get admitted to a SEVP-approved school &#8594; receive Form I-20 &#8594; pay SEVIS I-901 fee &#8594; complete DS-160 &#8594; create your visa profile and follow your embassy/USTravelDocs steps &#8594; schedule appointment &#8594; attend biometrics/interview if required &#8594; wait for passport return and visa decision.</p><p>The exact appointment and payment flow can vary by country, so always follow the instructions for your specific embassy or consulate.</p><h2>Step 1: Get admitted and receive your Form I-20</h2><p>Before you can apply for an F1 visa, you need to be accepted by a SEVP-approved school.</p><p>Once admitted, your school registers you in SEVIS and issues your Form I-20.</p><p>This document is one of the most important parts of the entire process.</p><p>Make sure:</p><ul><li><p>your name matches your passport</p></li><li><p>your program details are correct</p></li><li><p>the start date is correct</p></li><li><p>the I-20 is signed where needed<br></p></li></ul><p>If dependents are traveling with you, they need their own I-20s.</p><h2>Step 2: Pay the SEVIS I-901 fee</h2><p>After receiving the I-20, pay the SEVIS I-901 fee through the official SEVIS website.</p><p>Save the payment receipt immediately.</p><p>I strongly recommend keeping:</p><ul><li><p>one PDF copy</p></li><li><p>one screenshot</p></li><li><p>one printed copy<br></p></li></ul><p>You may need this during the visa process and interview preparation.</p><h2>Step 3: Fill out the DS-160 carefully</h2><p>Next, complete the DS-160 online.</p><p>This is your official nonimmigrant visa application form.</p><p>While filling it out, keep these ready:</p><ul><li><p>passport</p></li><li><p>Form I-20</p></li><li><p>SEVIS ID from the I-20</p></li><li><p>school address</p></li><li><p>travel history if 
applicable</p></li><li><p>education/work details<br></p></li></ul><p>Take your time here.</p><p>A lot of people make avoidable mistakes in:</p><ul><li><p>passport number</p></li><li><p>SEVIS ID</p></li><li><p>university name</p></li><li><p>personal details</p></li><li><p>travel history</p></li><li><p>photo upload<br></p></li></ul><p>After submitting the DS-160, download and print the confirmation page with the barcode.</p><p>That confirmation page is extremely important.</p><p>Also save your DS-160 application ID somewhere safe in case you need to retrieve the form later.</p><h2>Step 4: Upload your visa photo properly</h2><p>During the DS-160 process, you will usually upload a visa photo.</p><p>Make sure the photo follows the official U.S. visa photo requirements.</p><p>If the digital upload fails, you may need to carry a printed photo in the required format.</p><p>Do not ignore this part. Even small issues with the photo can create unnecessary problems.</p><h2>Step 5: Create your visa profile and follow your embassy/USTravelDocs process</h2><p>This is where many students get confused.</p><p>After DS-160, go to your specific U.S. embassy or consulate instructions and check whether your country uses USTravelDocs or another scheduling platform.</p><p>In many countries, USTravelDocs handles:</p><ul><li><p>profile creation</p></li><li><p>fee instructions</p></li><li><p>appointment scheduling</p></li><li><p>passport tracking</p></li><li><p>pickup or delivery information<br></p></li></ul><p>At this stage, do the following:</p><ol><li><p>Open your embassy or consulate&#8217;s visa instructions</p></li><li><p>Create your visa profile</p></li><li><p>Enter the correct passport and DS-160 details</p></li><li><p>Follow the fee payment method shown for your location</p></li><li><p>Schedule your appointment(s)<br></p></li></ol><p>Do not blindly follow random videos from another country. 
The process can differ by location.</p><h2>Step 6: Check visa slot availability and book early</h2><p>Visa slot availability changes by location, season, and demand.</p><p>That is why students should check early and be ready with all documents before trying to book.</p><p>Also remember this important point:</p><p>You may be able to get the visa well before your course starts, but new F1 students generally cannot enter the U.S. more than 30 days before the program start date on the I-20.</p><p>So plan your travel carefully.</p><h2>Step 7: Be ready for biometrics or multiple appointments</h2><p>Depending on your location, the process may involve:</p><ul><li><p>biometrics</p></li><li><p>fingerprints</p></li><li><p>one appointment</p></li><li><p>two appointments</p></li><li><p>VAC plus interview</p></li><li><p>direct interview flow<br></p></li></ul><p>This is exactly why country-specific instructions matter.</p><p>Before your appointment day, double-check:</p><ul><li><p>location</p></li><li><p>appointment date</p></li><li><p>reporting time</p></li><li><p>document rules</p></li><li><p>whether electronics or bags are restricted<br></p></li></ul><h2>Step 8: Prepare your complete document folder</h2><p>Your baseline folder should include:</p><ul><li><p>valid passport</p></li><li><p>DS-160 confirmation page</p></li><li><p>visa fee receipt if applicable</p></li><li><p>printed visa photo if needed</p></li><li><p>signed Form I-20</p></li><li><p>SEVIS fee receipt</p></li><li><p>appointment confirmation page<br></p></li></ul><p>You should also keep supporting documents ready, such as:</p><ul><li><p>admission letter</p></li><li><p>transcripts</p></li><li><p>degree certificates</p></li><li><p>test scores if relevant</p></li><li><p>financial documents</p></li><li><p>scholarship or assistantship letters if any</p></li><li><p>sponsor documents if someone else is funding you<br></p></li></ul><p>Even if every document is not always requested, it is much better to be overprepared than 
underprepared.</p><h2>Step 9: Prepare for the visa interview properly</h2><p>Your interview is not just about documents.</p><p>You should be able to clearly explain:</p><ul><li><p>what you are going to study</p></li><li><p>why you chose that university</p></li><li><p>why you chose that program</p></li><li><p>how your education will be funded</p></li><li><p>what your academic background is</p></li><li><p>what your plans are after completing your studies<br></p></li></ul><p>Your answers should be clear, honest, and direct.</p><p>Do not memorize robotic lines.</p><p>Know your own profile well.</p><h2>Step 10: After the interview</h2><p>After your interview, keep track of two things:</p><ol><li><p>Your visa application status</p></li><li><p>Your passport return or pickup status<br></p></li></ol><p>Depending on your location, USTravelDocs may help with passport tracking and delivery details.</p><p>Do not make irreversible travel plans until your passport is returned and your visa is actually issued.</p><h2>Important travel rule</h2><p>Even if your visa is approved, the visa itself does not guarantee entry.</p><p>Final admission into the United States is decided at the port of entry.</p><p>Also remember again:<br><br>new F1 students usually cannot enter the U.S. 
more than 30 days before the program start date listed on the I-20.</p><h2>Biggest mistakes students should avoid</h2><p>Here are some of the most common mistakes:</p><ul><li><p>trying to start without the I-20</p></li><li><p>entering wrong DS-160 details</p></li><li><p>forgetting to save the DS-160 confirmation page</p></li><li><p>ignoring embassy-specific instructions</p></li><li><p>not saving SEVIS payment proof</p></li><li><p>showing up without financial documents</p></li><li><p>assuming one country&#8217;s process is the same everywhere</p></li><li><p>booking travel too early</p></li><li><p>not checking whether biometrics are separate</p></li><li><p>carrying incomplete or inconsistent documents<br></p></li></ul><h2>My practical checklist before booking a slot</h2><p>Before trying to book a visa slot, make sure you have:</p><ul><li><p>passport ready</p></li><li><p>Form I-20 ready</p></li><li><p>SEVIS fee paid</p></li><li><p>DS-160 submitted</p></li><li><p>DS-160 confirmation page saved</p></li><li><p>photo ready</p></li><li><p>embassy instructions open</p></li><li><p>visa profile created</p></li><li><p>financial documents organized</p></li><li><p>academic documents organized<br></p></li></ul><p>If these are ready, your process becomes much smoother.</p><h2>Final note</h2><p>The safest rule in the entire F1 visa process is this:</p><p>Always follow the official instructions for your specific U.S. 
embassy or consulate.</p>]]></content:encoded></item><item><title><![CDATA[20 SQL Interview Questions I’d Practice Before Any Data Interview]]></title><description><![CDATA[20 SQL Interview Questions]]></description><link>https://karthiktechdairy.substack.com/p/20-sql-interview-questions-id-practice</link><guid isPermaLink="false">https://karthiktechdairy.substack.com/p/20-sql-interview-questions-id-practice</guid><dc:creator><![CDATA[Karthik Adari]]></dc:creator><pubDate>Fri, 27 Mar 2026 15:03:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!UMJq!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ba28ade-da8d-46d1-89b3-fb1b8a34eaf7_800x800.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Whenever I look at SQL interview prep, I notice the same mistake again and again:</p><p>A lot of people keep collecting random questions, but they never really master the core patterns.</p><p>So in this post, I&#8217;m not giving fluff.<br>I&#8217;m sharing <strong>20 standard SQL questions</strong> that I believe cover a huge part of what usually gets asked in data analyst, business analyst, BI, and SQL-heavy interview rounds.</p><p>I wrote the answers in a very clear way, so this can work both as a practice guide and as a quick revision post before interviews.</p><p>Let&#8217;s get into it.</p><div><hr></div><h2>1) What is the difference between <code>INNER JOIN</code> and <code>LEFT JOIN</code>?</h2><h3>My answer:</h3><p>I think about it like this:</p><ul><li><p><code>INNER JOIN</code> only returns the matching rows from both tables.</p></li><li><p><code>LEFT JOIN</code> returns all rows from the left table, and only the matching rows from the right table. 
If there is no match, I get <code>NULL</code> values from the right side.</p></li></ul><h3>Example:</h3><p>If I have a <code>customers</code> table and an <code>orders</code> table:</p><ul><li><p>With <code>INNER JOIN</code>, I only see customers who placed orders.</p></li><li><p>With <code>LEFT JOIN</code>, I see all customers, even the ones who never placed an order.</p></li></ul><pre><code><code>SELECT c.customer_id, c.customer_name, o.order_id
FROM customers c
INNER JOIN orders o
  ON c.customer_id = o.customer_id;
</code></code></pre><pre><code><code>SELECT c.customer_id, c.customer_name, o.order_id
FROM customers c
LEFT JOIN orders o
  ON c.customer_id = o.customer_id;
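
-- A related sketch on the same two tables, assuming order_id is never NULL
-- in orders (for example because it is the primary key): a LEFT JOIN plus an
-- IS NULL check keeps only the customers who never placed an order.
SELECT c.customer_id, c.customer_name
FROM customers c
LEFT JOIN orders o
  ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;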
</code></code></pre><h3>Important interview point:</h3><p>A <code>LEFT JOIN</code> can accidentally behave like an <code>INNER JOIN</code> if I filter the right table inside the <code>WHERE</code> clause.</p><p>Bad example:</p><pre><code><code>SELECT c.customer_id, o.order_id
FROM customers c
LEFT JOIN orders o
  ON c.customer_id = o.customer_id
WHERE o.order_id IS NOT NULL;
</code></code></pre><p>That removes the <code>NULL</code> rows, so now I&#8217;ve basically turned it into an inner join.</p><div><hr></div><h2>2) What is the difference between <code>WHERE</code> and <code>HAVING</code>?</h2><h3>My answer:</h3><p>I use:</p><ul><li><p><code>WHERE</code> to filter rows <strong>before</strong> grouping</p></li><li><p><code>HAVING</code> to filter groups <strong>after</strong> aggregation</p></li></ul><h3>Example:</h3><p>If I want customers who placed more than 2 orders:</p><pre><code><code>SELECT customer_id, COUNT(*) AS total_orders
FROM orders
GROUP BY customer_id
HAVING COUNT(*) &gt; 2;
</code></code></pre><p>If I want to first look only at completed orders, I use <code>WHERE</code> before grouping:</p><pre><code><code>SELECT customer_id, COUNT(*) AS total_orders
FROM orders
WHERE order_status = 'Completed'
GROUP BY customer_id
HAVING COUNT(*) &gt; 2;
</code></code></pre><h3>Simple memory trick:</h3><ul><li><p><code>WHERE</code> filters raw rows</p></li><li><p><code>HAVING</code> filters aggregated results</p></li></ul><div><hr></div><h2>3) What is the difference between <code>COUNT(*)</code>, <code>COUNT(column)</code>, and <code>COUNT(DISTINCT column)</code>?</h2><h3>My answer:</h3><p>This is one of those questions that sounds simple, but interviewers love it because many people answer it loosely.</p><ul><li><p><code>COUNT(*)</code> counts all rows</p></li><li><p><code>COUNT(column)</code> counts only non-null values in that column</p></li><li><p><code>COUNT(DISTINCT column)</code> counts unique non-null values</p></li></ul><h3>Example:</h3><p>Suppose I have this data in <code>employees</code>:</p><table><thead><tr><th>id</th><th>department</th></tr></thead><tbody><tr><td>1</td><td>Sales</td></tr><tr><td>2</td><td>Sales</td></tr><tr><td>3</td><td>NULL</td></tr><tr><td>4</td><td>HR</td></tr></tbody></table><p>Then:</p><pre><code><code>SELECT COUNT(*) FROM employees;              -- 4
SELECT COUNT(department) FROM employees;     -- 3
SELECT COUNT(DISTINCT department) FROM employees; -- 2
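
-- A small extension of the same idea, using the four sample rows above:
-- the gap between COUNT(*) and COUNT(column) is the number of NULLs.
SELECT COUNT(*) - COUNT(department) AS null_departments
FROM employees;                              -- 1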
</code></code></pre><h3>Interview tip:</h3><p>I always mention null handling here, because that&#8217;s usually what they want to test.</p><div><hr></div><h2>4) How do I find the 2nd highest salary?</h2><h3>My answer:</h3><p>There are multiple ways. The best answer depends on whether ties matter.</p><p>If I want the second <strong>distinct</strong> highest salary:</p><pre><code><code>SELECT MAX(salary) AS second_highest_salary
FROM employees
WHERE salary &lt; (
  SELECT MAX(salary)
  FROM employees
);
</code></code></pre><h3>Better scalable version using <code>DENSE_RANK()</code>:</h3><pre><code><code>WITH ranked_salaries AS (
  SELECT salary,
         DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
  FROM employees
)
SELECT salary
FROM ranked_salaries
WHERE rnk = 2;
</code></code></pre><h3>Why I like this answer:</h3><p>It handles ties better and is easier to extend to 3rd, 4th, or nth highest salary.</p><div><hr></div><h2>5) What is the difference between <code>ROW_NUMBER()</code>, <code>RANK()</code>, and <code>DENSE_RANK()</code>?</h2><h3>My answer:</h3><p>All three are window functions used for ranking, but they behave differently with ties.</p><h3>Example data:</h3><table><thead><tr><th>name</th><th>score</th></tr></thead><tbody><tr><td>A</td><td>95</td></tr><tr><td>B</td><td>95</td></tr><tr><td>C</td><td>90</td></tr></tbody></table><h3>Behavior:</h3><ul><li><p><code>ROW_NUMBER()</code> gives unique numbers no matter what<br>&#8594; 1, 2, 3</p></li><li><p><code>RANK()</code> gives the same rank to ties, but skips the next rank<br>&#8594; 1, 1, 3</p></li><li><p><code>DENSE_RANK()</code> gives the same rank to ties, but does not skip ranks<br>&#8594; 1, 1, 2</p></li></ul><h3>Query:</h3><pre><code><code>SELECT name,
       score,
       ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,
       RANK() OVER (ORDER BY score DESC) AS rank_num,
       DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rank_num
FROM scores;
</code></code></pre><h3>When I use each:</h3><ul><li><p><code>ROW_NUMBER()</code> when I need exactly one row per group</p></li><li><p><code>RANK()</code> when ranking competition-style positions</p></li><li><p><code>DENSE_RANK()</code> when I care about distinct ranking levels</p></li></ul><div><hr></div><h2>6) Write a query using <code>GROUP BY</code> and <code>HAVING</code></h2><h3>My answer:</h3><p>A very common example is finding customers with at least 2 orders.</p><pre><code><code>SELECT customer_id,
       COUNT(*) AS total_orders
FROM orders
GROUP BY customer_id
HAVING COUNT(*) &gt;= 2;
</code></code></pre><p>If I want customers whose revenue is above 1000:</p><pre><code><code>SELECT customer_id,
       SUM(order_amount) AS total_revenue
FROM orders
GROUP BY customer_id
HAVING SUM(order_amount) &gt; 1000;
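
-- Sketch: WHERE filters rows before grouping, HAVING filters groups after.
-- Assumes orders also has an order_date column; the cutoff date is illustrative.
SELECT customer_id,
       SUM(order_amount) AS total_revenue
FROM orders
WHERE order_date &gt;= '2025-01-01'
GROUP BY customer_id
HAVING SUM(order_amount) &gt; 1000;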
</code></code></pre><h3>What I keep in mind:</h3><p>Every non-aggregated column in the <code>SELECT</code> clause usually needs to appear in <code>GROUP BY</code>.</p><div><hr></div><h2>7) Why do joins create duplicate rows? How do I fix it?</h2><h3>My answer:</h3><p>Joins create duplicate rows when the relationship is not one-to-one.</p><p>For example:</p><ul><li><p>one customer can have many orders</p></li><li><p>one order can have many items</p></li></ul><p>So when I join tables, rows multiply.</p><h3>Example:</h3><p>If one customer has 3 orders, joining <code>customers</code> and <code>orders</code> gives 3 rows for that customer.</p><h3>How I fix it:</h3><p>I don&#8217;t start with <code>DISTINCT</code> blindly. I first check:</p><ol><li><p>What is the grain of each table?</p></li><li><p>Is the join one-to-many or many-to-many?</p></li><li><p>Am I missing a join condition?</p></li><li><p>Do I actually need aggregation before joining?</p></li></ol><h3>Common fixes:</h3><ul><li><p>aggregate first</p></li><li><p>use the correct join keys</p></li><li><p>deduplicate source rows</p></li><li><p>use <code>ROW_NUMBER()</code> to keep only the latest row if needed</p></li></ul><p>Example:</p><pre><code><code>WITH latest_orders AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rn
  FROM orders
)
SELECT customer_id, order_id, order_date
FROM latest_orders
WHERE rn = 1;
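
-- Sketch of the "aggregate first" fix listed above. Assumes a customers table
-- with customer_id and customer_name columns (names are illustrative).
-- Orders are collapsed to one row per customer before the join,
-- so the join cannot multiply customer rows.
WITH order_totals AS (
  SELECT customer_id,
         COUNT(*) AS total_orders
  FROM orders
  GROUP BY customer_id
)
SELECT c.customer_id,
       c.customer_name,
       COALESCE(t.total_orders, 0) AS total_orders
FROM customers c
LEFT JOIN order_totals t
  ON t.customer_id = c.customer_id;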
</code></code></pre><div><hr></div><h2>8) What is the difference between a CTE and a subquery?</h2><h3>My answer:</h3><p>Both help me break down complex queries.</p><ul><li><p>A <strong>subquery</strong> is written inside another query</p></li><li><p>A <strong>CTE</strong> (<code>WITH</code> clause) is like a temporary named result set that makes the logic easier to read</p></li></ul><h3>Subquery example:</h3><pre><code><code>SELECT employee_name, salary
FROM employees
WHERE salary &gt; (
  SELECT AVG(salary)
  FROM employees
);
</code></code></pre><h3>CTE version:</h3><pre><code><code>WITH avg_salary_cte AS (
  SELECT AVG(salary) AS avg_salary
  FROM employees
)
SELECT employee_name, salary
FROM employees
CROSS JOIN avg_salary_cte  -- single-row CTE, so every employee is compared to the same average
WHERE employees.salary &gt; avg_salary_cte.avg_salary;
</code></code></pre><h3>When I use each:</h3><ul><li><p>I use subqueries for smaller, simpler logic</p></li><li><p>I use CTEs when the query gets longer and I want readability or multi-step logic</p></li></ul><div><hr></div><h2>9) How do I calculate a running total in SQL?</h2><h3>My answer:</h3><p>I use a window function with <code>SUM()</code>.</p><pre><code><code>SELECT order_date,
       sales_amount,
       SUM(sales_amount) OVER (
         ORDER BY order_date
       ) AS running_total
FROM sales;
</code></code></pre><h3>Why this works:</h3><p>The window function keeps adding values in sorted order.</p><h3>If I need a running total by customer:</h3><pre><code><code>SELECT customer_id,
       order_date,
       sales_amount,
       SUM(sales_amount) OVER (
         PARTITION BY customer_id
         ORDER BY order_date
       ) AS customer_running_total
FROM sales;
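
-- Sketch: with only ORDER BY, the default window frame is RANGE-based in most
-- dialects, so rows sharing the same order_date receive the same running total.
-- An explicit ROWS frame accumulates strictly row by row. Assumes order_id
-- exists in sales and is used here only as a tie-breaker.
SELECT customer_id,
       order_date,
       sales_amount,
       SUM(sales_amount) OVER (
         PARTITION BY customer_id
         ORDER BY order_date, order_id
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS customer_running_total
FROM sales;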
</code></code></pre><h3>Interview point:</h3><p>I like this answer because it shows I understand window functions, not just basic aggregation.</p><div><hr></div><h2>10) How do I get the latest record for each user?</h2><h3>My answer:</h3><p>This is one of the most common real interview questions.</p><p>I usually solve it with <code>ROW_NUMBER()</code>.</p><pre><code><code>WITH ranked_records AS (
  SELECT user_id,
         status,
         updated_at,
         ROW_NUMBER() OVER (
           PARTITION BY user_id
           ORDER BY updated_at DESC
         ) AS rn
  FROM user_status
)
SELECT user_id, status, updated_at
FROM ranked_records
WHERE rn = 1;
</code></code></pre><h3>Why I like this:</h3><p>It&#8217;s clear, scalable, and works well in most SQL dialects.</p><div><hr></div><h2>11) How do I write conditional aggregation?</h2><h3>My answer:</h3><p>I use <code>CASE WHEN</code> inside aggregation functions.</p><h3>Example:</h3><p>Suppose I want total orders, completed orders, and canceled orders by month.</p><pre><code><code>SELECT DATE_TRUNC('month', order_date) AS month,
       COUNT(*) AS total_orders,
       SUM(CASE WHEN order_status = 'Completed' THEN 1 ELSE 0 END) AS completed_orders,
       SUM(CASE WHEN order_status = 'Canceled' THEN 1 ELSE 0 END) AS canceled_orders
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;
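
-- Sketch: the same counts written with the FILTER clause, for dialects that
-- support it (PostgreSQL does), plus a completion rate. The 1.0 keeps the
-- average from being computed with integer arithmetic.
SELECT DATE_TRUNC('month', order_date) AS month,
       COUNT(*) AS total_orders,
       COUNT(*) FILTER (WHERE order_status = 'Completed') AS completed_orders,
       AVG(CASE WHEN order_status = 'Completed' THEN 1.0 ELSE 0 END) AS completion_rate
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;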
</code></code></pre><h3>Why this matters:</h3><p>A lot of dashboard-style metrics come from conditional aggregation.</p><div><hr></div><h2>12) How do I use <code>LAG()</code> and <code>LEAD()</code>?</h2><h3>My answer:</h3><p>I use them when I want to compare the current row with a previous or next row.</p><ul><li><p><code>LAG()</code> looks backward</p></li><li><p><code>LEAD()</code> looks forward</p></li></ul><h3>Example:</h3><p>Day-over-day sales change:</p><pre><code><code>SELECT order_date,
       sales_amount,
       LAG(sales_amount) OVER (ORDER BY order_date) AS previous_day_sales,
       sales_amount - LAG(sales_amount) OVER (ORDER BY order_date) AS sales_change
FROM daily_sales;
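
-- Sketch: LEAD() mirrors LAG() and reads the next row instead of the previous one.
-- The percentage change uses NULLIF so a zero previous value yields NULL
-- rather than a divide-by-zero error. Column aliases are illustrative.
SELECT order_date,
       sales_amount,
       LEAD(sales_amount) OVER (ORDER BY order_date) AS next_day_sales,
       (sales_amount - LAG(sales_amount) OVER (ORDER BY order_date)) * 100.0
         / NULLIF(LAG(sales_amount) OVER (ORDER BY order_date), 0) AS pct_change
FROM daily_sales;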
</code></code></pre><h3>Where this shows up:</h3><ul><li><p>month-over-month growth</p></li><li><p>previous login date</p></li><li><p>churn or reactivation analysis</p></li><li><p>comparing current vs prior state</p></li></ul><div><hr></div><h2>13) How do I find customers who placed orders on consecutive days?</h2><h3>My answer:</h3><p>There are a few ways, but the cleanest approach often uses window functions.</p><pre><code><code>WITH ordered_dates AS (
  SELECT customer_id,
         order_date,
         LAG(order_date) OVER (
           PARTITION BY customer_id
           ORDER BY order_date
         ) AS prev_order_date
  FROM orders
)
SELECT customer_id, order_date, prev_order_date
FROM ordered_dates
WHERE order_date = prev_order_date + INTERVAL '1 day';
</code></code></pre><h3>What this shows:</h3><p>I can compare one event with the previous event for the same customer.</p><div><hr></div><h2>14) How do I find duplicate records?</h2><h3>My answer:</h3><p>I first define what &#8220;duplicate&#8221; means from a business perspective.</p><p>For example, if duplicate orders mean same <code>customer_id</code>, <code>product_id</code>, and <code>order_date</code>, then:</p><pre><code><code>SELECT customer_id,
       product_id,
       order_date,
       COUNT(*) AS duplicate_count
FROM orders
GROUP BY customer_id, product_id, order_date
HAVING COUNT(*) &gt; 1;
</code></code></pre><h3>If I need to keep only one row:</h3><p>I use <code>ROW_NUMBER()</code>:</p><pre><code><code>WITH deduped AS (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY customer_id, product_id, order_date
           ORDER BY created_at DESC
         ) AS rn
  FROM orders
)
SELECT *
FROM deduped
WHERE rn = 1;
</code></code></pre><div><hr></div><h2>15) How do I return the top 3 highest-paid employees in each department?</h2><h3>My answer:</h3><p>This is a classic &#8220;top N per group&#8221; problem.</p><pre><code><code>WITH ranked_employees AS (
  SELECT employee_name,
         department,
         salary,
         DENSE_RANK() OVER (
           PARTITION BY department
           ORDER BY salary DESC
         ) AS rnk
  FROM employees
)
SELECT employee_name, department, salary
FROM ranked_employees
WHERE rnk &lt;= 3;
</code></code></pre><h3>Why I use <code>DENSE_RANK()</code>:</h3><p>It handles ties better than <code>ROW_NUMBER()</code> if I want all employees tied within the top 3 salary levels.</p><div><hr></div><h2>16) Why can date filtering go wrong with timestamps?</h2><h3>My answer:</h3><p>A lot of people use <code>BETWEEN</code> carelessly and miss rows.</p><p>For example, this can be risky:</p><pre><code><code>WHERE order_timestamp BETWEEN '2025-01-01' AND '2025-01-31'
</code></code></pre><p>Many databases read <code>'2025-01-31'</code> as midnight at the start of January 31 (<code>2025-01-31 00:00:00</code>), so any row timestamped later that day gets excluded.</p><h3>Safer version:</h3><pre><code><code>WHERE order_timestamp &gt;= '2025-01-01'
  AND order_timestamp &lt; '2025-02-01'
</code></code></pre><h3>Why I prefer this:</h3><p>It makes the boundary condition much cleaner.</p><div><hr></div><h2>17) How do I handle <code>NULL</code> values in SQL?</h2><h3>My answer:</h3><p>I always remember that <code>NULL</code> means unknown or missing, and it behaves differently from regular values.</p><h3>Important points:</h3><ul><li><p><code>COUNT(column)</code> ignores nulls</p></li><li><p><code>COUNT(*)</code> does not</p></li><li><p><code>NULL = NULL</code> is not true</p></li><li><p>I need <code>IS NULL</code> or <code>IS NOT NULL</code></p></li></ul><h3>Example:</h3><pre><code><code>SELECT *
FROM employees
WHERE manager_id IS NULL;
</code></code></pre><h3>Using <code>COALESCE()</code>:</h3><p>If I want a fallback value:</p><pre><code><code>SELECT employee_name,
       COALESCE(bonus, 0) AS bonus_amount
FROM employees;
</code></code></pre><h3>Interview point:</h3><p>I make sure not to say null is equal to zero or blank. That&#8217;s a common mistake.</p><div><hr></div><h2>18) How do I find median salary by department?</h2><h3>My answer:</h3><p>This depends on SQL dialect.</p><p>If the database supports percentile functions:</p><pre><code><code>SELECT department,
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary
FROM employees
GROUP BY department;
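
-- Sketch of a manual median for dialects without PERCENTILE_CONT.
-- Assumes window functions are available. FLOOR keeps the row positions
-- integral regardless of how the dialect divides integers.
-- Odd counts select one middle row; even counts select two, which AVG combines.
WITH ordered AS (
  SELECT department,
         salary,
         ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary) AS rn,
         COUNT(*) OVER (PARTITION BY department) AS cnt
  FROM employees
  WHERE salary IS NOT NULL
)
SELECT department,
       AVG(salary * 1.0) AS median_salary
FROM ordered
WHERE rn IN (FLOOR((cnt + 1) / 2.0), FLOOR((cnt + 2) / 2.0))
GROUP BY department;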
</code></code></pre><h3>If not:</h3><p>I explain the manual approach:</p><ul><li><p>sort salaries within each department</p></li><li><p>assign row numbers</p></li><li><p>pick the middle row for odd counts</p></li><li><p>average the two middle rows for even counts</p></li></ul><h3>What the interviewer usually wants:</h3><p>They want to see whether I understand the logic, even if the exact syntax changes by database.</p><div><hr></div><h2>19) How do I optimize a slow SQL query?</h2><h3>My answer:</h3><p>I usually talk through a structured process instead of guessing.</p><h3>My checklist:</h3><ol><li><p>Check the execution plan</p></li><li><p>See which table is scanning too many rows</p></li><li><p>Confirm indexes on join and filter columns</p></li><li><p>Filter early when possible</p></li><li><p>Avoid selecting unnecessary columns</p></li><li><p>Check for expensive joins or many-to-many joins</p></li><li><p>Aggregate before joining if it reduces data</p></li><li><p>Remove unnecessary nested logic or repeated calculations</p></li></ol><h3>Example answer in interviews:</h3><p>If a query is slow, I don&#8217;t jump straight to rewriting it. 
I first check where the cost is coming from, then I look at indexes, joins, filters, and row explosion.</p><p>That usually sounds much stronger than giving random optimization buzzwords.</p><div><hr></div><h2>20) &#8220;Sales dropped in one region.&#8221; How would I investigate it using SQL?</h2><h3>My answer:</h3><p>I would break the problem step by step instead of jumping to a conclusion.</p><h3>My approach:</h3><ol><li><p>Confirm the time period of the drop</p></li><li><p>Compare the affected region with other regions</p></li><li><p>Break sales into:</p><ul><li><p>number of orders</p></li><li><p>number of customers</p></li><li><p>average order value</p></li><li><p>conversion rate</p></li></ul></li><li><p>Drill down by:</p><ul><li><p>product category</p></li><li><p>channel</p></li><li><p>customer segment</p></li><li><p>city or store</p></li><li><p>date or week</p></li></ul></li><li><p>Check whether:</p><ul><li><p>traffic dropped</p></li><li><p>conversion dropped</p></li><li><p>pricing changed</p></li><li><p>cancellations increased</p></li><li><p>one major product underperformed</p></li></ul></li></ol><h3>Example SQL direction:</h3><pre><code><code>SELECT region,
       DATE_TRUNC('week', order_date) AS week,
       COUNT(DISTINCT order_id) AS total_orders,
       SUM(sales_amount) AS total_sales,
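       SUM(sales_amount) * 1.0 / NULLIF(COUNT(DISTINCT order_id), 0) AS avg_order_value  -- sales per order, even if the table holds several rows per order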
FROM sales
GROUP BY region, DATE_TRUNC('week', order_date)
ORDER BY week, region;
</code></code></pre><p>Then I would keep slicing until I find the main driver.</p><h3>Why this is a strong answer:</h3><p>Because business SQL interviews are often less about one fancy query and more about whether I can investigate in a structured way.</p><div><hr></div><h1>How I&#8217;d Actually Practice These</h1><p>If I were preparing seriously, I wouldn&#8217;t just read these once.</p><p>I&#8217;d do this:</p><ul><li><p>write each query from memory</p></li><li><p>explain each answer out loud in simple words</p></li><li><p>practice with small sample tables</p></li><li><p>then solve variations of the same pattern</p></li></ul><p>That&#8217;s what usually builds real interview confidence.</p><p>Not random memorization.<br>Pattern recognition.</p><div><hr></div><h1>Final Thought</h1><p>Whenever I look at SQL interviews, I keep coming back to one thing:</p><p>The questions may look different on the surface, but the underlying patterns are often the same.</p><p>That&#8217;s why I&#8217;d rather master these 20 really well than try to collect 200 random questions without depth.</p><p>If I can confidently handle joins, grouping, ranking, window functions, duplicates, nulls, and business investigation questions, I&#8217;m already in a much stronger position for most SQL-heavy interviews.</p><div><hr></div>]]></content:encoded></item><item><title><![CDATA[Zero to AI Engineer in 90 Days]]></title><description><![CDATA[If I were starting from scratch today, this is the roadmap I&#8217;d follow]]></description><link>https://karthiktechdairy.substack.com/p/zero-to-ai-engineer-in-90-days</link><guid isPermaLink="false">https://karthiktechdairy.substack.com/p/zero-to-ai-engineer-in-90-days</guid><dc:creator><![CDATA[Karthik Adari]]></dc:creator><pubDate>Tue, 10 Mar 2026 23:05:23 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!UMJq!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ba28ade-da8d-46d1-89b3-fb1b8a34eaf7_800x800.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>If I were starting from scratch today, this is the roadmap I&#8217;d follow</h3><p>If I had to start from zero today and aim for an <strong>AI Engineer</strong> role, I would not begin with random AI tools or only prompt engineering.</p><p>I would build around the skills that keep showing up in real roles: <strong>Python, SQL, machine learning, data prep, evaluation, deployment, inference, and production systems</strong>. Current openings from Google, Amazon, and OpenAI still point strongly in that direction, with repeated emphasis on Python/SQL, ML modeling, evaluation, deployment, scalability, and getting systems into production.</p><p>So if I were creating a roadmap today, my focus would be simple:</p><p><strong>Don&#8217;t just learn AI. Learn how to build and ship it.</strong></p><p>One honest note: <strong>90 days is enough to become strong and portfolio-ready, not enough to master everything</strong>. 
The goal is to build momentum fast, create real projects, and become someone who can actually solve problems with AI.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://karthiktechdairy.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://karthiktechdairy.substack.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>Days 1&#8211;14: I would build the foundation first</h2><p>Before touching deep learning, LLMs, or agents, I&#8217;d make sure I can code properly and work with data.</p><p>In the first two weeks, I would focus on:</p><ul><li><p>Python</p></li><li><p>Git and GitHub</p></li><li><p>SQL</p></li><li><p>Pandas / NumPy basics</p></li><li><p>math intuition for ML, especially linear algebra and statistics</p></li></ul><p>For Python, I&#8217;d start with <strong>Kaggle Learn: Python</strong> if I wanted something short and interactive, and pair it with a <strong>freeCodeCamp Python full course on YouTube</strong> if I wanted a longer video-based walkthrough. Kaggle&#8217;s Learn portal also offers free tracks for Pandas and other data skills. (<a href="https://www.kaggle.com/learn?utm_source=chatgpt.com">Kaggle</a>)</p><p>For SQL, I&#8217;d use <strong>Kaggle Learn: Intro to SQL</strong> for a quick practical start, and then a <strong>freeCodeCamp SQL full course on YouTube</strong> for a longer video format. (<a href="https://www.kaggle.com/learn?utm_source=chatgpt.com">Kaggle</a>)</p><p>For math, I&#8217;d use <strong>Khan Academy&#8217;s free Linear Algebra</strong> and <strong>Statistics &amp; Probability</strong> tracks, because they&#8217;re fully free and beginner-friendly. 
(<a href="https://developers.google.com/machine-learning/crash-course?utm_source=chatgpt.com">Google for Developers</a>)</p><p>By the end of these 14 days, I&#8217;d want to be comfortable:</p><ul><li><p>writing small Python programs on my own</p></li><li><p>reading and cleaning data</p></li><li><p>pushing code to GitHub</p></li><li><p>writing basic SQL queries</p></li><li><p>understanding concepts like vectors, matrices, averages, variance, and probability</p></li></ul><p>That foundation matters a lot more than people think.</p><div><hr></div><h2>Days 15&#8211;30: I would learn machine learning properly</h2><p>Once the basics are in place, I&#8217;d move into classical machine learning.</p><p>This is where I&#8217;d learn:</p><ul><li><p>supervised vs unsupervised learning</p></li><li><p>regression vs classification</p></li><li><p>train / validation / test split</p></li><li><p>overfitting and underfitting</p></li><li><p>feature engineering</p></li><li><p>model evaluation</p></li><li><p>error analysis</p></li><li><p>model comparison</p></li></ul><p>My main free resource here would be <strong>Google&#8217;s Machine Learning Crash Course</strong>, which Google describes as a fast-paced, practical introduction with videos, visualizations, and hands-on exercises. I&#8217;d pair that with <strong>Kaggle&#8217;s Intro to Machine Learning</strong> and <strong>Intermediate Machine Learning</strong> for practice. (<a href="https://developers.google.com/machine-learning/crash-course?utm_source=chatgpt.com">Google for Developers</a>)</p><p>For conceptual clarity, I&#8217;d also use <strong>StatQuest on YouTube</strong>, especially for confusion matrix, bias-variance, metrics, cross-validation, and model intuition. 
(<a href="https://developers.google.com/machine-learning/crash-course?utm_source=chatgpt.com">Google for Developers</a>)</p><p>This is also the stage where I would build my <strong>first ML project</strong>:</p><ul><li><p>spam detection</p></li><li><p>customer churn prediction</p></li><li><p>house price prediction</p></li><li><p>loan default prediction</p></li></ul><p>Not just the notebook. I&#8217;d also explain:</p><ul><li><p>what problem I solved</p></li><li><p>what data I used</p></li><li><p>which models I tried</p></li><li><p>what metric I chose</p></li><li><p>what mistakes the model still makes</p></li></ul><p>That habit is what starts turning learning into engineering.</p><div><hr></div><h2>Days 31&#8211;45: I would learn deep learning with PyTorch</h2><p>Once I understand classical ML, I&#8217;d move into deep learning.</p><p>Here I&#8217;d focus on:</p><ul><li><p>tensors</p></li><li><p>datasets and dataloaders</p></li><li><p>neural networks</p></li><li><p>loss functions</p></li><li><p>optimizers</p></li><li><p>training loops</p></li><li><p>transfer learning</p></li></ul><p>For this phase, I&#8217;d start with the <strong>PyTorch official beginner tutorials</strong>, especially <strong>Learn the Basics</strong>, because they walk through a complete ML workflow in PyTorch. I&#8217;d pair that with a <strong>freeCodeCamp PyTorch course on YouTube</strong> if I wanted a longer guided walkthrough. If I wanted an extra mini-course, <strong>Kaggle&#8217;s Intro to Deep Learning</strong> is also free. (<a href="https://docs.pytorch.org/tutorials/beginner/basics/intro.html?utm_source=chatgpt.com">PyTorch Docs</a>)</p><p>By the end of this phase, I&#8217;d build one deep learning project like:</p><ul><li><p>image classifier</p></li><li><p>sentiment classifier</p></li><li><p>document classifier</p></li><li><p>resume classifier</p></li></ul><p>The goal here wouldn&#8217;t be fancy research. 
It would be understanding how a model trains, how loss changes, how to detect overfitting, and how to evaluate results properly.</p><div><hr></div><h2>Days 46&#8211;60: I would learn LLMs, transformers, and RAG</h2><p>This is where the roadmap becomes modern AI engineering.</p><p>At this stage, I&#8217;d focus on:</p><ul><li><p>transformers</p></li><li><p>tokenization</p></li><li><p>embeddings</p></li><li><p>prompting</p></li><li><p>vector search</p></li><li><p>retrieval-augmented generation</p></li><li><p>hallucination analysis</p></li><li><p>evaluation for LLM apps</p></li></ul><p>My main free resource here would be the <strong>Hugging Face LLM Course</strong>, which is fully free and teaches large language models and NLP using the Hugging Face ecosystem. (<a href="https://huggingface.co/learn/llm-course/en/chapter1/1?utm_source=chatgpt.com">Hugging Face</a>)</p><p>For RAG, I&#8217;d use <strong>freeCodeCamp&#8217;s RAG &amp; MCP Fundamentals</strong> or <strong>RAG Fundamentals and Advanced Techniques</strong>, both free video-based resources aimed at helping learners understand document embeddings, vector databases, and building retrieval systems. (<a href="https://www.freecodecamp.org/news/learn-rag-and-mcp-fundamentals/?utm_source=chatgpt.com">FreeCodeCamp</a>)</p><p>If I wanted to understand how LLMs work under the hood, I&#8217;d also use <strong>freeCodeCamp&#8217;s &#8220;Code an LLM From Scratch&#8221;</strong> or a long-form coding workshop that walks through implementing LLM ideas directly. 
(<a href="https://www.freecodecamp.org/news/code-an-llm-from-scratch-theory-to-rlhf/?utm_source=chatgpt.com">FreeCodeCamp</a>)</p><p>At this stage, I would build my first real AI application:</p><ul><li><p>PDF Q&amp;A assistant</p></li><li><p>resume review assistant</p></li><li><p>interview prep assistant</p></li><li><p>company research assistant</p></li></ul><p>And I would test it seriously:</p><ul><li><p>when retrieval fails</p></li><li><p>when answers become ungrounded</p></li><li><p>when hallucinations appear</p></li><li><p>how I can improve relevance and latency</p></li></ul><p>That mindset is much closer to real AI engineering than simply saying, &#8220;I built a chatbot.&#8221;</p><div><hr></div><h2>Days 61&#8211;75: I would learn APIs, deployment, agents, and MLOps</h2><p>This is the part many beginners skip, but this is often where the &#8220;engineer&#8221; part actually begins.</p><p>At this stage, I&#8217;d focus on:</p><ul><li><p>FastAPI</p></li><li><p>Docker</p></li><li><p>model serving</p></li><li><p>experiment tracking</p></li><li><p>monitoring basics</p></li><li><p>agent workflows</p></li></ul><p>For APIs, I&#8217;d use the <strong>FastAPI official tutorial</strong>, which the project itself describes as the official and recommended way to learn FastAPI. For Docker, I&#8217;d use <strong>Docker Get Started</strong>. Both are free. 
(<a href="https://fastapi.tiangolo.com/learn/?utm_source=chatgpt.com">FastAPI</a>)</p><p>For MLOps basics, I&#8217;d use <strong>freeCodeCamp&#8217;s MLflow + Databricks MLOps course</strong> and supplement it with the <strong>MLflow getting-started docs</strong> if I needed reference material. (<a href="https://www.freecodecamp.org/news/learn-mlops-with-mlflow-and-databricks/?utm_source=chatgpt.com">FreeCodeCamp</a>)</p><p>For agents, I&#8217;d use the <strong>Hugging Face Agents Course</strong>, which explicitly states that it is a free course for understanding, using, and building AI agents. (<a href="https://huggingface.co/agents-course?utm_source=chatgpt.com">Hugging Face</a>)</p><p>For production thinking, I&#8217;d also use <strong>Full Stack Deep Learning</strong> and <strong>Made With ML</strong>, both of which provide free materials focused on building production-grade ML/AI systems. (<a href="https://fullstackdeeplearning.com/?utm_source=chatgpt.com">Full Stack Deep Learning</a>)</p><p>By the end of this phase, I would take one earlier project and turn it into something more serious:</p><ul><li><p>expose it through an API</p></li><li><p>Dockerize it</p></li><li><p>add experiment tracking</p></li><li><p>write a clean README</p></li><li><p>document limitations and failure cases</p></li></ul><p>That alone teaches a huge amount.</p><div><hr></div><h2>Days 76&#8211;90: I would build one flagship capstone</h2><p>Now I would stop collecting tutorials and start shipping something real.</p><p>This last phase would be about building one project that brings everything together:</p><ul><li><p>data ingestion</p></li><li><p>model or LLM logic</p></li><li><p>evaluation</p></li><li><p>API serving</p></li><li><p>deployment</p></li><li><p>documentation</p></li><li><p>debugging</p></li></ul><p>I&#8217;d choose one path based on interest.</p><h3>If I wanted to become an LLM / GenAI Engineer</h3><p>I&#8217;d go deeper into:</p><ul><li><p>RAG pipelines</p></li><li><p>agent 
systems</p></li><li><p>evals</p></li><li><p>retrieval optimization</p></li><li><p>latency and cost trade-offs</p></li></ul><p>My free stack would be:</p><ul><li><p><strong>Hugging Face LLM Course</strong></p></li><li><p><strong>Hugging Face Agents Course</strong></p></li><li><p><strong>freeCodeCamp RAG courses</strong></p></li><li><p><strong>Full Stack Deep Learning</strong> (<a href="https://huggingface.co/learn/llm-course/en/chapter1/1?utm_source=chatgpt.com">Hugging Face</a>)</p></li></ul><h3>If I wanted to become a Computer Vision AI Engineer</h3><p>I&#8217;d go deeper into:</p><ul><li><p>CNNs</p></li><li><p>transfer learning</p></li><li><p>image classification</p></li><li><p>object detection basics</p></li><li><p>vision embeddings</p></li></ul><p>My free stack would be:</p><ul><li><p><strong>PyTorch tutorials</strong></p></li><li><p><strong>Kaggle Intro to Deep Learning</strong></p></li><li><p>additional <strong>freeCodeCamp deep learning / PyTorch videos</strong> (<a href="https://docs.pytorch.org/tutorials/index.html?utm_source=chatgpt.com">PyTorch Docs</a>)</p></li></ul><h3>If I wanted to become an Applied ML / MLOps Engineer</h3><p>I&#8217;d go deeper into:</p><ul><li><p>training pipelines</p></li><li><p>experiment tracking</p></li><li><p>deployment</p></li><li><p>monitoring</p></li><li><p>ML system design</p></li></ul><p>My free stack would be:</p><ul><li><p><strong>Google ML Crash Course</strong></p></li><li><p><strong>MLflow resources</strong></p></li><li><p><strong>Full Stack Deep Learning</strong></p></li><li><p><strong>Made With ML</strong> (<a href="https://developers.google.com/machine-learning/crash-course?utm_source=chatgpt.com">Google for Developers</a>)</p></li></ul><div><hr></div><h2>The three projects I&#8217;d want by Day 90</h2><p>If I really wanted to be job-ready, I would want these three finished:</p><ol><li><p><strong>One classical ML project</strong></p></li><li><p><strong>One deep learning project</strong></p></li><li><p><strong>One 
deployed AI application using LLMs, RAG, or agents</strong></p></li></ol><p>Why these three?</p><p>Because together they show:</p><ul><li><p>I understand ML fundamentals</p></li><li><p>I can work with deep learning tools</p></li><li><p>I can build and ship real AI systems</p></li></ul><p>And that lines up well with what current roles are asking for: scalable ML solutions, working with large datasets, evaluation, deployment, production readiness, and end-to-end execution.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://karthiktechdairy.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://karthiktechdairy.substack.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>The free resources I&#8217;d personally keep bookmarked</h2><p>These are the ones I&#8217;d keep open throughout the journey:</p><ul><li><p><strong>Kaggle Learn: Python</strong> (<a href="https://www.kaggle.com/learn/python?utm_source=chatgpt.com">Kaggle</a>)</p></li><li><p><strong>freeCodeCamp Python full course on YouTube</strong> (<a href="https://www.youtube.com/c/Freecodecamp?utm_source=chatgpt.com">YouTube</a>)</p></li><li><p><strong>Kaggle Learn: Intro to SQL</strong> (<a href="https://www.kaggle.com/learn?utm_source=chatgpt.com">Kaggle</a>)</p></li><li><p><strong>freeCodeCamp SQL full course on YouTube</strong> (<a href="https://www.youtube.com/watch?v=3CkLRGdPSR0&amp;utm_source=chatgpt.com">YouTube</a>)</p></li><li><p><strong>Khan Academy: Linear Algebra / Statistics</strong> (<a href="https://developers.google.com/machine-learning/crash-course?utm_source=chatgpt.com">Google for Developers</a>)</p></li><li><p><strong>Google Machine Learning Crash Course</strong> (<a href="https://developers.google.com/machine-learning/crash-course?utm_source=chatgpt.com">Google for 
Developers</a>)</p></li><li><p><strong>Kaggle: Intro to ML / Intermediate ML / Intro to Deep Learning</strong> (<a href="https://www.classcentral.com/provider/kaggle?utm_source=chatgpt.com">Class Central</a>)</p></li><li><p><strong>PyTorch Tutorials</strong> (<a href="https://docs.pytorch.org/tutorials/index.html?utm_source=chatgpt.com">PyTorch Docs</a>)</p></li><li><p><strong>Hugging Face LLM Course</strong> (<a href="https://huggingface.co/learn/llm-course/en/chapter1/1?utm_source=chatgpt.com">Hugging Face</a>)</p></li><li><p><strong>freeCodeCamp RAG courses</strong> (<a href="https://www.freecodecamp.org/news/learn-rag-and-mcp-fundamentals/?utm_source=chatgpt.com">FreeCodeCamp</a>)</p></li><li><p><strong>FastAPI official tutorial</strong> (<a href="https://fastapi.tiangolo.com/learn/?utm_source=chatgpt.com">FastAPI</a>)</p></li><li><p><strong>Docker Get Started</strong> (<a href="https://www.docker.com/get-started/?utm_source=chatgpt.com">Docker</a>)</p></li><li><p><strong>Hugging Face Agents Course</strong> (<a href="https://huggingface.co/learn/agents-course/en/unit0/introduction?utm_source=chatgpt.com">Hugging Face</a>)</p></li><li><p><strong>Full Stack Deep Learning</strong> (<a href="https://fullstackdeeplearning.com/?utm_source=chatgpt.com">Full Stack Deep Learning</a>)</p></li><li><p><strong>freeCodeCamp MLflow / MLOps course</strong> (<a href="https://www.freecodecamp.org/news/learn-mlops-with-mlflow-and-databricks/?utm_source=chatgpt.com">FreeCodeCamp</a>)</p></li></ul><p>If I were starting from zero today, I would not try to learn all of AI at once.</p><p>I would focus on <strong>coding, ML foundations, deep learning, LLM apps, deployment, and real projects</strong> - because that&#8217;s what actually moves me closer to becoming an AI Engineer.</p>]]></content:encoded></item><item><title><![CDATA[START Framework: The Structured Job Search Blueprint (2026 Edition)]]></title><description><![CDATA[When you commented START, you weren&#8217;t asking for 
motivation.]]></description><link>https://karthiktechdairy.substack.com/p/start-framework-the-structured-job</link><guid isPermaLink="false">https://karthiktechdairy.substack.com/p/start-framework-the-structured-job</guid><dc:creator><![CDATA[Karthik Adari]]></dc:creator><pubDate>Mon, 23 Feb 2026 15:38:48 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!UMJq!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ba28ade-da8d-46d1-89b3-fb1b8a34eaf7_800x800.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When you commented <strong>START</strong>, you weren&#8217;t asking for motivation.</p><p>You were asking for direction.</p><p>So here it is.</p><p>This is the exact framework I would follow if I had to restart my job search today with zero advantage.</p><p>No referrals.<br>No brand name company.<br>No shortcuts.</p><p>Just strategy.</p><div><hr></div><h1>S - Select One Role (60-Day Focus Rule)</h1><p>The biggest mistake I see?</p><p>People apply to 5 roles at once.</p><p>Data Analyst.<br>Data Engineer.<br>ML Engineer.<br>Business Analyst.<br>Product Analyst.</p><p>Each role has:</p><ul><li><p>Different resume keywords</p></li><li><p>Different project expectations</p></li><li><p>Different interview patterns</p></li></ul><p>Pick ONE role for 60 days.</p><p>Commit.</p><p>Clarity increases response rate more than volume ever will.</p><div><hr></div><h1>T - Track Market Signals (Reverse Engineer Demand)</h1><p>Open 20 recent job descriptions.</p><p>Create a simple sheet with three columns:</p><p>Skill | Frequency | Mandatory or Preferred</p><p>You&#8217;ll quickly notice:</p><p>For Data Analyst roles:</p><ul><li><p>SQL appears almost everywhere</p></li><li><p>Visualization tools matter</p></li><li><p>Stakeholder communication is often hidden but critical</p></li></ul><p>The market leaves clues.</p><p>Most people don&#8217;t collect them.</p><p>If you want to speed this up, tools like job 
aggregators (including FoxHunt AI) help filter active listings quickly instead of scrolling reposted jobs. But even manually, this step is non-negotiable.</p><div><hr></div><h1>A - Assemble 2 Targeted Projects</h1><p>Not 10 small projects.</p><p>Two strong, business-aligned ones.</p><p>Project 1: Revenue / Operations / Growth problem<br>Project 2: Automation or Efficiency problem</p><p>Each project should include:</p><ul><li><p>Clear problem statement</p></li><li><p>Data cleaning explanation</p></li><li><p>Metrics before vs after</p></li><li><p>Visual dashboard</p></li><li><p>Hosted demo link</p></li></ul><p>Recruiters don&#8217;t care about how many notebooks you have.</p><p>They care whether you can solve a business problem.</p><div><hr></div><h1>R - Refine Resume Around One Identity</h1><p>Your resume should answer one question:</p><p><strong>&#8220;What problem does this candidate solve?&#8221;</strong></p><p>Most resumes fail because they try to impress everyone.</p><p>Strong resumes position you clearly for one role.</p><p>Checklist:</p><ul><li><p>Remove unrelated skills</p></li><li><p>Add numbers (at least 60&#8211;70% quantified bullets)</p></li><li><p>Keep bullet length between 10&#8211;30 words</p></li><li><p>Use tools mentioned in job descriptions</p></li><li><p>Remove vague buzzwords</p></li><li><p>Avoid responsibility-based phrases like &#8220;Responsible for&#8221;</p></li></ul><p>Your resume should feel intentional, not crowded.</p><p>When I was actively applying, I used this structured resume framework for my own job hunt:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://foxhunt.ai/resume&quot;,&quot;text&quot;:&quot;Resume&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://foxhunt.ai/resume"><span>Resume</span></a></p><p></p><p>It&#8217;s built around the exact principles above:</p><ul><li><p>Role-specific alignment</p></li><li><p>Keyword 
optimization</p></li><li><p>Clean formatting</p></li><li><p>Strong quantified bullets</p></li></ul><p>But remember - tools only amplify clarity.<br>They don&#8217;t replace it.</p><p>Your story still matters more than any template.</p><div><hr></div><h1>T - Target Applications Strategically</h1><p>Instead of:<br>200 random applications</p><p>Do:<br>30 high-quality, early applications.</p><p>Apply within 24 hours of posting.</p><p>Why?</p><p>Because recruiters review in batches.<br>Late applications often get buried.</p><p>Speed + relevance &gt; mass applying.</p><div><hr></div><h1>Bonus Layer: Outreach That Doesn&#8217;t Annoy</h1><p>Bad message:<br>&#8220;Can you refer me?&#8221;</p><p>Better message:<br>&#8220;I noticed you&#8217;re a Data Analyst at X. I&#8217;m preparing for similar roles and curious what tools your team uses most daily.&#8221;</p><p>Curiosity creates conversation.<br>Conversation creates opportunity.</p><div><hr></div><h1>What We&#8217;re Building Here</h1><p>This Substack is not about motivation.</p><p>It&#8217;s about systems.</p><p>In the next issues, I&#8217;ll break down:</p><ul><li><p>Exact project templates by role</p></li><li><p>Resume bullet transformation examples</p></li><li><p>Outreach scripts that get responses</p></li><li><p>Interview question patterns by company type</p></li><li><p>How to track applications like a sales pipeline</p></li></ul><p>And occasionally, I&#8217;ll share tools and systems I&#8217;m building that make this process faster and more structured.</p><p>But the strategy always comes first.</p>]]></content:encoded></item><item><title><![CDATA[Capgemini Data Analyst (L1) – Complete Interview Questions & Answers]]></title><description><![CDATA[Last week, I spoke with someone who got selected at Capgemini as a Data Analyst (L1).]]></description><link>https://karthiktechdairy.substack.com/p/capgemini-data-analyst-l1-complete</link><guid 
isPermaLink="false">https://karthiktechdairy.substack.com/p/capgemini-data-analyst-l1-complete</guid><dc:creator><![CDATA[Karthik Adari]]></dc:creator><pubDate>Thu, 19 Feb 2026 17:09:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!UMJq!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ba28ade-da8d-46d1-89b3-fb1b8a34eaf7_800x800.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>He told me something interesting:</p><blockquote><p>&#8220;They didn&#8217;t test advanced stuff.<br>They tested whether I understand the basics clearly.&#8221;</p></blockquote><p>That&#8217;s the pattern.</p><p>If your fundamentals are strong and you can explain your thinking step by step, you&#8217;re already ahead.</p><p>Here are the <strong>exact questions + complete answers</strong> explained simply.</p><div><hr></div><h1>1. INNER JOIN vs LEFT JOIN</h1><h3>INNER JOIN</h3><p>Returns only matching records from both tables.</p><pre><code><code>SELECT e.name, d.department
FROM employees e
INNER JOIN departments d
ON e.dept_id = d.id;
</code></code></pre><p>Only employees whose dept_id matches a department id will appear.</p><h3>LEFT JOIN</h3><p>Returns all records from the left table plus the matching records from the right table.</p><pre><code><code>SELECT e.name, d.department
FROM employees e
LEFT JOIN departments d
ON e.dept_id = d.id;
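-- Follow-up sketch (same illustrative tables as above): keep only the
-- unmatched rows, i.e. employees whose dept_id has no department.
SELECT e.name
FROM employees e
LEFT JOIN departments d
ON e.dept_id = d.id
WHERE d.id IS NULL;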
</code></code></pre><p>Employees without a department will still appear (department = NULL).</p><p>&#128073; Interview Tip: Always explain with a real business example.</p><div><hr></div><h1>2. WHERE vs HAVING</h1><h3>WHERE</h3><p>Filters rows before grouping.</p><pre><code><code>SELECT * 
FROM sales
WHERE region = 'East';
</code></code></pre><h3>HAVING</h3><p>Filters after GROUP BY.</p><pre><code><code>SELECT region, SUM(revenue)
FROM sales
GROUP BY region
HAVING SUM(revenue) &gt; 10000;
</code></code></pre><p>&#128073; Rule:<br>WHERE &#8594; rows<br>HAVING &#8594; aggregated results</p><div><hr></div><h1>3. Find Duplicate Records</h1><pre><code><code>SELECT name, COUNT(*)
FROM customers
GROUP BY name
HAVING COUNT(*) &gt; 1;
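-- Sketch: when a duplicate means the same name AND email
-- (email is an assumed column, not part of the original example),
-- group by every column that defines "the same customer".
SELECT name, email, COUNT(*) AS copies
FROM customers
GROUP BY name, email
HAVING COUNT(*) &gt; 1;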
</code></code></pre><p>Each row in the result is a value that appears more than once, along with how many times it appears.</p><div><hr></div><h1>4. Remove Duplicates (Keep One)</h1><p>Using ROW_NUMBER():</p><pre><code><code>DELETE FROM customers
WHERE id IN (
  SELECT id FROM (
    SELECT id,
           ROW_NUMBER() OVER(PARTITION BY name ORDER BY id) AS rn
    FROM customers
  ) t
  WHERE rn &gt; 1
);
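-- Sketch of a safer first step: run the inner query on its own
-- to preview the rows that the DELETE would remove.
SELECT id, name
FROM (
  SELECT id, name,
         ROW_NUMBER() OVER(PARTITION BY name ORDER BY id) AS rn
  FROM customers
) t
WHERE rn &gt; 1;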
</code></code></pre><p>Rows with rn = 1 (the lowest id for each name) are kept; every row with rn &gt; 1 is deleted.</p><div><hr></div><h1>5. Second Highest Salary</h1><p>Basic approach:</p><pre><code><code>SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;
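-- LIMIT/OFFSET works in MySQL, PostgreSQL, and SQLite.
-- A more portable sketch: the highest salary below the maximum.
SELECT MAX(salary) AS second_highest
FROM employees
WHERE salary &lt; (SELECT MAX(salary) FROM employees);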
</code></code></pre><p>Using DENSE_RANK:</p><pre><code><code>-- DISTINCT returns one row even if several employees share the second-highest salary
SELECT DISTINCT salary
FROM (
  SELECT salary,
         DENSE_RANK() OVER (ORDER BY salary DESC) rnk
  FROM employees
) t
WHERE rnk = 2;
</code></code></pre><p>&#128073; DENSE_RANK handles ties properly.</p><div><hr></div><h1>6. COUNT(*) vs COUNT(column)</h1><ul><li><p>COUNT(*) &#8594; counts all rows</p></li><li><p>COUNT(column) &#8594; ignores NULL values</p></li></ul><p>Example:</p><p>If 5 rows exist and 2 salary values are NULL:</p><p>COUNT(*) = 5<br>COUNT(salary) = 3</p><div><hr></div><h1>7. GROUP BY Basics</h1><p>Used with aggregate functions.</p><pre><code><code>SELECT department, SUM(salary)
FROM employees
GROUP BY department;
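-- The common mistake, sketched with the name column from the earlier
-- employees examples: name is neither grouped nor aggregated, so most
-- databases reject the query below.
-- SELECT department, name, SUM(salary)
-- FROM employees
-- GROUP BY department;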
</code></code></pre><p>Common mistake:<br>Selecting a column not included in GROUP BY without aggregation.</p><div><hr></div><h1>8. Primary Key vs Foreign Key</h1><p>Primary Key:</p><ul><li><p>Unique</p></li><li><p>Cannot be NULL</p></li><li><p>Identifies a row</p></li></ul><p>Foreign Key:</p><ul><li><p>Links to primary key in another table</p></li><li><p>Maintains relationship</p></li></ul><p>Business Example:<br>Customer table &#8594; Orders table</p><div><hr></div><h1>9. Normalization (1NF, 2NF, 3NF)</h1><p>1NF:</p><ul><li><p>No repeating groups</p></li><li><p>Atomic values</p></li></ul><p>2NF:</p><ul><li><p>Remove partial dependency</p></li></ul><p>3NF:</p><ul><li><p>Remove transitive dependency</p></li></ul><p>Purpose:</p><ul><li><p>Avoid redundancy</p></li><li><p>Improve data consistency</p></li></ul><div><hr></div><h1>10. Excel: VLOOKUP vs XLOOKUP</h1><p>VLOOKUP:</p><ul><li><p>Searches left to right only</p></li><li><p>Needs column number</p></li></ul><p>XLOOKUP:</p><ul><li><p>More flexible</p></li><li><p>Works both directions</p></li><li><p>Handles errors better</p></li></ul><p>Example:</p><pre><code><code>=XLOOKUP(A2, A:A, B:B)
</code></code></pre><div><hr></div><h1>11. When to Use Pivot Table?</h1><p>To:</p><ul><li><p>Summarize large data</p></li><li><p>Calculate totals by category</p></li><li><p>Create quick KPI reports</p></li></ul><p>Example:<br>Sales by Region &#8594; Drag Region to Rows, Revenue to Values.</p><div><hr></div><h1>12. Power BI: Measure vs Calculated Column</h1><p>Calculated Column:</p><ul><li><p>Computed row by row</p></li><li><p>Stored in model</p></li></ul><p>Measure:</p><ul><li><p>Calculated dynamically</p></li><li><p>Based on filter context</p></li></ul><p>Example:</p><p>Measure:</p><pre><code><code>Total Sales = SUM(Sales[Amount])
</code></code></pre><p>Use Measure for KPIs.</p><div><hr></div><h1>13. Handling Missing Values</h1><p>Options:</p><ul><li><p>Remove rows</p></li><li><p>Replace with mean/median</p></li><li><p>Replace using business rule</p></li><li><p>Keep NULL (if meaningful)</p></li></ul><p>Always explain WHY you choose a method.</p><div><hr></div><h1>14. Detecting Outliers</h1><p>Methods:</p><ol><li><p>IQR Method</p></li><li><p>Z-score</p></li><li><p>Visual inspection (boxplot)</p></li></ol><p>IQR formula:</p><p>Lower Bound = Q1 - 1.5 * IQR<br>Upper Bound = Q3 + 1.5 * IQR</p><div><hr></div><h1>15. Scenario: &#8220;Sales Dropped Last Month&#8221;</h1><p>Steps:</p><ol><li><p>Check if data is correct</p></li><li><p>Compare month-over-month trends</p></li><li><p>Break down by:</p><ul><li><p>Region</p></li><li><p>Product</p></li><li><p>Customer segment</p></li></ul></li><li><p>Check pricing or discount changes</p></li><li><p>Validate external factors</p></li></ol><p>&#128073; Interviewers test thinking process, not just tools.</p><div><hr></div><h1>Final Advice for Capgemini L1</h1><p>They look for:</p><ul><li><p>Strong SQL basics</p></li><li><p>Clear explanation</p></li><li><p>Logical thinking</p></li><li><p>Structured approach</p></li><li><p>Confidence in fundamentals</p></li></ul><p>Not advanced AI.<br>Not complex ML.</p><p>Just clarity.</p><div><hr></div>]]></content:encoded></item><item><title><![CDATA[5 Free Data Certifications You Can Earn This Week (No Money Needed)]]></title><description><![CDATA[Most people delay certifications because they assume it costs money.]]></description><link>https://karthiktechdairy.substack.com/p/5-free-data-certifications-you-can</link><guid isPermaLink="false">https://karthiktechdairy.substack.com/p/5-free-data-certifications-you-can</guid><dc:creator><![CDATA[Karthik Adari]]></dc:creator><pubDate>Thu, 29 Jan 2026 04:37:45 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!UMJq!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ba28ade-da8d-46d1-89b3-fb1b8a34eaf7_800x800.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><div><hr></div><p>Most people delay certifications because they assume it costs money.</p><p>Not true.</p><p>Here are <strong>5 legit, industry-recognized credentials</strong> you can complete <strong>for $0</strong> &#8212; and each one strengthens your profile for <strong>Data Analyst / Data Engineer / BI</strong> roles.</p><p>If you&#8217;re job hunting, pick <strong>2</strong> and finish them fast.<br>If you&#8217;re building a strong portfolio, do all <strong>5</strong> over the next month.</p><div><hr></div><h2>1) IBM SkillsBuild &#8212; Data Analytics (Free Digital Credentials)</h2><p>&#9989; Best for: <strong>Beginners &#8594; Intermediate</strong>, structured learning + shareable credential<br>&#127919; What you&#8217;ll learn: data analysis basics, data literacy, reporting mindset, real-world analytics workflows<br>&#128204; Why it helps: IBM credentials look strong on LinkedIn and help you show &#8220;I&#8217;m learning consistently.&#8221;</p><p>&#128279; Link: <a href="https://skillsbuild.org/students/digital-credentials">https://skillsbuild.org/students/digital-credentials</a></p><p><strong>Tip:</strong> Add this to LinkedIn as:<br><strong>Licenses &amp; Certifications &#8594; IBM SkillsBuild &#8594; Data Analytics (Credential)</strong></p><div><hr></div><h2>2) Snowflake &#8212; Hands-On Essentials Track (Free Badges)</h2><p>&#9989; Best for: <strong>Data Engineers / Analytics Engineers</strong>, modern warehouse skills<br>&#127919; What you&#8217;ll learn: Snowflake concepts, warehouses/databases, loading data, querying, basics of performance<br>&#128204; Why it helps: Snowflake is widely used in analytics teams &#8212; this shows you can work with modern 
stacks.</p><p>&#128279; Link: <a href="https://learn.snowflake.com/en/pages/hands-on-essentials-track/">https://learn.snowflake.com/en/pages/hands-on-essentials-track/</a></p><p><strong>Tip:</strong> Pair this with a small project:<br>&#8220;Load a CSV into Snowflake &#8594; run SQL queries &#8594; build a simple dashboard summary.&#8221;</p><div><hr></div><h2>3) HackerRank &#8212; SQL (Basic) Skills Certification Test</h2><p>&#9989; Best for: <strong>Interview prep</strong>, proof of SQL fundamentals<br>&#127919; What you&#8217;ll be tested on: SELECT, WHERE, joins basics, aggregations, grouping, simple subqueries<br>&#128204; Why it helps: Recruiters love quick proof. This is a clean &#8220;pass/fail credential&#8221; you can show fast.</p><p>&#128279; Link: <a href="https://www.hackerrank.com/skills-verification/sql_basic">https://www.hackerrank.com/skills-verification/sql_basic</a></p><p><strong>Tip:</strong> Do it after practicing 30&#8211;50 problems. Your pass badge becomes a strong signal.</p><div><hr></div><h2>4) Alteryx &#8212; Designer Core (Certification Exam Listing)</h2><p>&#9989; Best for: <strong>Analytics + ETL automation</strong>, drag-and-drop workflows<br>&#127919; What you&#8217;ll learn: data prep, joins, unions, transformations, workflow logic<br>&#128204; Why it helps: Many companies use Alteryx for BI automation. 
This stands out in analyst roles.</p><p>&#128279; Link: <a href="https://community.alteryx.com/t5/Certification-Exams/bd-p/product-certification">https://community.alteryx.com/t5/Certification-Exams/bd-p/product-certification</a></p><p><strong>Tip:</strong> If you&#8217;re targeting analyst roles, this can be a &#8220;differentiator&#8221; when others only list Excel.</p><div><hr></div><h2>5) MongoDB &#8212; Skill Badges (Free, Shareable)</h2><p>&#9989; Best for: <strong>Data + Backend + NoSQL</strong>, modern document databases<br>&#127919; What you&#8217;ll learn: querying with MongoDB, filtering, aggregations, schema design basics<br>&#128204; Why it helps: A lot of startups (and even big companies) use MongoDB. Knowing it makes you versatile.</p><p>&#128279; Link: <a href="https://learn.mongodb.com/skills/">https://learn.mongodb.com/skills/</a></p><p><strong>Tip:</strong> Add a simple project to GitHub:<br>&#8220;Store job postings &#8594; query by location/skills &#8594; build a basic analytics summary.&#8221;</p><div><hr></div><h1>Quick Plan (So You Actually Finish)</h1><p>If your goal is <strong>Data Analyst</strong>:</p><ul><li><p>Start with <strong>HackerRank SQL (Basic)</strong></p></li><li><p>Then do <strong>IBM SkillsBuild</strong></p></li><li><p>Add <strong>Snowflake</strong> as a bonus if you want modern tools</p></li></ul><p>If your goal is <strong>Data Engineer</strong>:</p><ul><li><p><strong>Snowflake &#8594; MongoDB &#8594; HackerRank</strong></p></li><li><p>Then Alteryx if your target jobs mention it</p></li></ul><div><hr></div><p><a href="https://www.linkedin.com/in/akvkumar/">LinkedIn</a><br><a href="https://www.instagram.com/karthik.techdairy/">Instagram</a></p>]]></content:encoded></item><item><title><![CDATA[Accenture Data Analyst Interview: 15 Questions + Answers (With Short Explanations)]]></title><description><![CDATA[I spoke with a person who got selected at Accenture as a Data Analyst, and these were the questions asked in his 
interview.]]></description><link>https://karthiktechdairy.substack.com/p/accenture-data-analyst-interview</link><guid isPermaLink="false">https://karthiktechdairy.substack.com/p/accenture-data-analyst-interview</guid><dc:creator><![CDATA[Karthik Adari]]></dc:creator><pubDate>Wed, 28 Jan 2026 15:19:39 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!UMJq!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ba28ade-da8d-46d1-89b3-fb1b8a34eaf7_800x800.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div><hr></div><p>If you&#8217;re preparing, bookmark this and practice the same set.</p><div><hr></div><h2>1) INNER JOIN vs LEFT JOIN (real scenario)</h2><h3>&#9989; Answer</h3><ul><li><p><strong>INNER JOIN</strong> returns only matching rows from both tables.</p></li><li><p><strong>LEFT JOIN</strong> returns all rows from the left table + matches from the right table (unmatched becomes <code>NULL</code>).</p></li></ul><h3>Example (Customers + Orders)</h3><pre><code><code>-- Only customers who placed orders
SELECT c.customer_id, o.order_id
FROM customers c
INNER JOIN orders o
  ON c.customer_id = o.customer_id;

-- All customers (even if they placed no orders)
SELECT c.customer_id, o.order_id
FROM customers c
LEFT JOIN orders o
  ON c.customer_id = o.customer_id;
</code></code></pre><p><strong>When to use what</strong></p><ul><li><p>Use <strong>INNER JOIN</strong> when you only want &#8220;existing relationships.&#8221;</p></li><li><p>Use <strong>LEFT JOIN</strong> when you want a full list from the left table (like all customers, all products, all employees).</p></li></ul><div><hr></div><h2>2) WHERE vs HAVING (with example)</h2><h3>&#9989; Answer</h3><ul><li><p><strong>WHERE</strong> filters rows <em>before aggregation</em>.</p></li><li><p><strong>HAVING</strong> filters results <em>after aggregation</em>.</p></li></ul><pre><code><code>-- WHERE keeps only completed orders before grouping; HAVING then keeps customers with at least 2 of them
SELECT customer_id, COUNT(*) AS total_orders
FROM orders
WHERE status = 'Completed'
GROUP BY customer_id
HAVING COUNT(*) &gt;= 2;
</code></code></pre><p><strong>Rule of thumb</strong></p><ul><li><p>Use <strong>WHERE</strong> for conditions on individual row values.</p></li><li><p>Use <strong>HAVING</strong> for conditions on aggregates like <code>COUNT</code>, <code>SUM</code>, <code>AVG</code>.</p></li></ul><div><hr></div><h2>3) SQL: 2nd Highest Salary (handle ties)</h2><h3>&#9989; Solution (best way: DENSE_RANK)</h3><pre><code><code>-- DISTINCT returns one row even if several employees share the second-highest salary
SELECT DISTINCT salary
FROM (
  SELECT salary,
         DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
  FROM employees
) s
WHERE rnk = 2;
</code></code></pre><h3>Short explanation</h3><ul><li><p><code>DENSE_RANK()</code> assigns the same rank to tied salaries.</p></li><li><p>The second-highest distinct salary always has rank <code>2</code>.</p></li></ul><div><hr></div><h2>4) ROW_NUMBER vs RANK vs DENSE_RANK</h2><h3>&#9989; Answer</h3><ul><li><p><strong>ROW_NUMBER()</strong> gives every row a unique number, even when values tie (1,2,3).</p></li><li><p><strong>RANK()</strong> gives the same rank to ties but leaves gaps (1,1,3).</p></li><li><p><strong>DENSE_RANK()</strong> gives the same rank to ties without gaps (1,1,2).</p></li></ul><pre><code><code>SELECT employee_id, salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn,
       RANK() OVER (ORDER BY salary DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY salary DESC) AS drnk
FROM employees;
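-- Illustration for salaries 300, 300, 200 (made-up values):
--   salary | rn | rnk | drnk
--   300    | 1  | 1   | 1
--   300    | 2  | 1   | 1
--   200    | 3  | 3   | 2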
</code></code></pre><div><hr></div><h2>5) GROUP BY + HAVING (customers with &#8805;2 orders)</h2><pre><code><code>SELECT customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
HAVING COUNT(*) &gt;= 2;
</code></code></pre><h3>If revenue threshold is needed:</h3><pre><code><code>SELECT customer_id, SUM(amount) AS total_spend
FROM orders
GROUP BY customer_id
HAVING SUM(amount) &gt; 1000;
</code></code></pre><p><strong>Explanation:</strong> <code>HAVING</code> is used because we&#8217;re filtering aggregated values.</p><div><hr></div><h2>6) Joins create duplicate rows: Why? How to fix?</h2><h3>&#9989; Why duplicates happen</h3><p>When the relationship isn&#8217;t 1-to-1, joins can multiply rows.</p><p>Example:</p><ul><li><p>One customer has <strong>3 orders</strong></p></li><li><p>You join customers + orders &#8594; customer row appears <strong>3 times</strong></p></li></ul><h3>&#9989; Fix options</h3><p><strong>A) Use DISTINCT (quick fix, not always correct)</strong></p><pre><code><code>SELECT DISTINCT c.customer_id
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id;
</code></code></pre><p><strong>B) Aggregate before joining (best practice)</strong></p><pre><code><code>SELECT c.customer_id, o.total_orders
FROM customers c
LEFT JOIN (
  SELECT customer_id, COUNT(*) AS total_orders
  FROM orders
  GROUP BY customer_id
) o
ON c.customer_id = o.customer_id;
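-- Note: customers with no orders get NULL in total_orders above.
-- A sketch that reports 0 for them instead:
SELECT c.customer_id, COALESCE(o.total_orders, 0) AS total_orders
FROM customers c
LEFT JOIN (
  SELECT customer_id, COUNT(*) AS total_orders
  FROM orders
  GROUP BY customer_id
) o
ON c.customer_id = o.customer_id;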
</code></code></pre><div><hr></div><h2>7) COUNT(*) vs COUNT(column)</h2><h3>&#9989; Answer</h3><ul><li><p><code>COUNT(*)</code> counts all rows (including NULLs).</p></li><li><p><code>COUNT(column)</code> counts only non-NULL values in that column.</p></li></ul><pre><code><code>SELECT COUNT(*) AS total_rows,
       COUNT(email) AS non_null_emails
FROM users;
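-- Related sketch: COUNT(DISTINCT column) counts unique non-NULL values.
SELECT COUNT(DISTINCT email) AS unique_emails
FROM users;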
</code></code></pre><div><hr></div><h2>8) CTE vs Subquery (when to use each)</h2><h3>&#9989; Answer</h3><p>Both can express the same logic. A CTE gives the intermediate result a name, which improves readability and lets you reference it more than once in the same query.</p><p><strong>Subquery</strong></p><pre><code><code>SELECT *
FROM (
  SELECT customer_id, SUM(amount) AS total_spend
  FROM orders
  GROUP BY customer_id
) t
WHERE total_spend &gt; 1000;
</code></code></pre><p><strong>CTE (Cleaner)</strong></p><pre><code><code>WITH spend AS (
  SELECT customer_id, SUM(amount) AS total_spend
  FROM orders
  GROUP BY customer_id
)
SELECT *
FROM spend
WHERE total_spend &gt; 1000;
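-- Sketch of the "multiple steps" case: a second CTE can build on the first.
-- The names big_spenders and big_spender_count are illustrative only.
WITH spend AS (
  SELECT customer_id, SUM(amount) AS total_spend
  FROM orders
  GROUP BY customer_id
),
big_spenders AS (
  SELECT customer_id, total_spend
  FROM spend
  WHERE total_spend &gt; 1000
)
SELECT COUNT(*) AS big_spender_count
FROM big_spenders;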
</code></code></pre><p><strong>Use CTE when</strong></p><ul><li><p>logic is long</p></li><li><p>multiple steps are needed</p></li><li><p>you want clean debugging</p></li></ul><div><hr></div><h2>9) Excel: VLOOKUP vs XLOOKUP vs INDEX-MATCH</h2><h3>&#9989; Answer</h3><ul><li><p><strong>VLOOKUP</strong>: older, left-to-right only, breaks if columns move.</p></li><li><p><strong>XLOOKUP</strong>: modern (Excel 2021 and Microsoft 365), flexible, supports left lookup, and has a built-in &#8220;not found&#8221; value.</p></li><li><p><strong>INDEX-MATCH</strong>: powerful and stable, works in every Excel version.</p></li></ul><p><strong>XLOOKUP example</strong></p><pre><code><code>=XLOOKUP(A2, Customers!A:A, Customers!C:C, "Not Found")
</code></code></pre><div><hr></div><h2>10) Excel: Pivot Table for KPIs</h2><h3>&#9989; Steps (short and practical)</h3><ol><li><p>Insert &#8594; Pivot Table</p></li><li><p>Put <strong>Category/Region</strong> in Rows</p></li><li><p>Put <strong>Sales/Revenue</strong> in Values (SUM)</p></li><li><p>Add <strong>Month</strong> in Columns for trend</p></li><li><p>Add <strong>Filters</strong> like Product, Channel</p></li></ol><p><strong>Why it&#8217;s asked:</strong> Shows you can summarize data fast for business.</p><div><hr></div><h2>11) Handling Missing Values + Outliers (IQR/Z-score/business)</h2><h3>&#9989; Missing values strategies</h3><ul><li><p>Remove if small % and random</p></li><li><p>Impute:</p><ul><li><p>numeric: mean/median</p></li><li><p>categorical: mode</p></li></ul></li><li><p>Or fill with &#8220;Unknown&#8221; for business clarity</p></li></ul><h3>&#9989; Outlier strategies</h3><ul><li><p>Confirm if it&#8217;s data error or true extreme</p></li><li><p>Handle using:</p><ul><li><p><strong>IQR method</strong></p></li><li><p><strong>Z-score</strong></p></li><li><p>capping/winsorizing</p></li><li><p>log transform</p></li></ul></li></ul><p><strong>Best answer tip:</strong> Always mention &#8220;business context decides.&#8221;</p><div><hr></div><h2>12) Power BI: Measures vs Calculated Columns</h2><h3>&#9989; Answer</h3><ul><li><p><strong>Calculated Column</strong>: computed at refresh time, stored in model.</p></li><li><p><strong>Measure</strong>: computed at query time, depends on filter context.</p></li></ul><p><strong>Example</strong></p><ul><li><p>Column: <code>Profit = Sales - Cost</code> (stored per row)</p></li><li><p>Measure: <code>Total Profit = SUM(Sales) - SUM(Cost)</code> (changes with slicers)</p></li></ul><p><strong>In interviews:</strong> prefer measures for aggregations and dashboards.</p><div><hr></div><h2>13) DAX: What does CALCULATE() do?</h2><h3>&#9989; Answer</h3><p><code>CALCULATE()</code> changes the filter context and then evaluates 
an expression.</p><p><strong>Example: Sales for only &#8220;Online&#8221; channel</strong></p><pre><code><code>Online Sales =
CALCULATE(
  SUM(Sales[Amount]),
  Sales[Channel] = "Online"
)
</code></code></pre><p><strong>Short explanation</strong></p><ul><li><p>It&#8217;s the most powerful DAX function</p></li><li><p>Used to apply filters dynamically</p></li></ul><div><hr></div><h2>14) Make Power BI reports faster (huge data)</h2><h3>&#9989; Best optimization checklist</h3><ul><li><p>Reduce columns (remove unused fields)</p></li><li><p>Use <strong>Star Schema</strong> (fact + dimensions)</p></li><li><p>Avoid high-cardinality columns in visuals</p></li><li><p>Prefer measures over calculated columns</p></li><li><p>Use Aggregations and Incremental Refresh (if available)</p></li><li><p>Reduce visuals per page, avoid heavy custom visuals</p></li><li><p>Optimize DAX (avoid iterators when possible)</p></li></ul><p><strong>Interview punchline:</strong> Model first, DAX next, visuals last.</p><div><hr></div><h2>15) Case: &#8220;Sales dropped in one region&#8221; &#8212; How would you investigate?</h2><h3>&#9989; Strong structured approach (interview-ready)</h3><ol><li><p><strong>Confirm the drop</strong></p></li></ol><ul><li><p>Compare MoM, WoW, YoY</p></li><li><p>Check if it&#8217;s seasonality</p></li></ul><ol start="2"><li><p><strong>Break down</strong></p></li></ol><ul><li><p>Product, channel, customer segment</p></li><li><p>New vs returning customers</p></li></ul><ol start="3"><li><p><strong>Check operations</strong></p></li></ol><ul><li><p>Stockouts, delayed deliveries, pricing changes</p></li><li><p>Returns/refunds increased?</p></li></ul><ol start="4"><li><p><strong>Check marketing</strong></p></li></ol><ul><li><p>Campaign paused? CPC increased? 
traffic dropped?</p></li></ul><ol start="5"><li><p><strong>Check data issues</strong></p></li></ol><ul><li><p>ETL failure, missing transactions, time zone cutoffs</p></li></ul><ol start="6"><li><p><strong>Share recommendation</strong></p></li></ol><ul><li><p>Root cause + expected impact + next steps</p></li></ul><p><strong>Accenture loves:</strong> structured thinking + business storytelling.</p><p>#Accenture #DataAnalyst #SQL #PowerBI #Excel #DAX #InterviewPrep #DataAnalytics #BusinessAnalytics #JobSearch #Freshers #Analytics #CareerGrowth</p>]]></content:encoded></item><item><title><![CDATA[5 Real Data Science Projects You Can Copy From GitHub (and a Complete Roadmap to Become Job-Ready)]]></title><description><![CDATA[Most people say &#8220;I know scikit-learn&#8221; or &#8220;I know MLflow.&#8221;]]></description><link>https://karthiktechdairy.substack.com/p/5-real-data-science-projects-you</link><guid isPermaLink="false">https://karthiktechdairy.substack.com/p/5-real-data-science-projects-you</guid><dc:creator><![CDATA[Karthik Adari]]></dc:creator><pubDate>Sun, 25 Jan 2026 04:33:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!UMJq!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ba28ade-da8d-46d1-89b3-fb1b8a34eaf7_800x800.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most people say &#8220;I know scikit-learn&#8221; or &#8220;I know MLflow.&#8221;</p><p>Hiring managers don&#8217;t hire that.</p><p>They hire this:<br>A <strong>working system</strong> that goes from <strong>data &#8594; model &#8594; evaluation &#8594; deployment &#8594; UI/insights</strong>.</p><p>Below are <strong>5 real, end-to-end GitHub projects</strong> you can study, replicate, and then rebuild with your own twist. 
After that, I&#8217;m dropping a <strong>complete Data Scientist roadmap</strong> you can follow step-by-step.</p><div><hr></div><h2>Part 1: Top 5 Data Science Projects (End-to-End)</h2><h3>1) Customer Churn Prediction Pipeline (Production-style)</h3><p><strong>Repo:</strong> AWS Customer Churn Pipeline (<a href="https://github.com/awslabs/aws-customer-churn-pipeline?utm_source=chatgpt.com">GitHub</a>)<br><strong>Why it&#8217;s strong:</strong> It&#8217;s not just a notebook. It&#8217;s built like a real system with training + inference pipelines, validation, tuning, and explainability baked in. (<a href="https://github.com/awslabs/aws-customer-churn-pipeline?utm_source=chatgpt.com">GitHub</a>)</p><p><strong>What you learn</strong></p><ul><li><p>End-to-end ML pipeline thinking</p></li><li><p>Validation + feature processing at scale</p></li><li><p>Explainability inside production workflows</p></li></ul><p><strong>How to make it &#8220;yours&#8221;</strong></p><ul><li><p>Replace dataset with any SaaS churn dataset (or telecom churn)</p></li><li><p>Add a &#8220;top churn drivers&#8221; report for business users</p></li></ul><p><strong>Resume bullet example</strong></p><ul><li><p>Built an end-to-end churn prediction pipeline with automated training, validation, hyperparameter tuning, and explainability.</p></li></ul><div><hr></div><h3>2) Insurance Cross-Sell (AutoML + MLflow + FastAPI + Streamlit)</h3><p><strong>Repo:</strong> End-to-End AutoML Insurance (<a href="https://github.com/kennethleungty/End-to-End-AutoML-Insurance?utm_source=chatgpt.com">GitHub</a>)<br><strong>Why it&#8217;s strong:</strong> This is the perfect &#8220;hireable&#8221; format: model tracking (MLflow), API (FastAPI), and an interface (Streamlit). 
(<a href="https://github.com/kennethleungty/End-to-End-AutoML-Insurance?utm_source=chatgpt.com">GitHub</a>)</p><p><strong>What you learn</strong></p><ul><li><p>Experiment tracking and model management</p></li><li><p>Serving predictions through an API</p></li><li><p>Building a simple app stakeholders can use</p></li></ul><p><strong>How to upgrade</strong></p><ul><li><p>Add model monitoring (drift + data checks)</p></li><li><p>Add a &#8220;confidence score&#8221; and threshold slider</p></li></ul><p><strong>Resume bullet example</strong></p><ul><li><p>Deployed an AutoML classification system using MLflow tracking, FastAPI inference service, and Streamlit UI for business users.</p></li></ul><div><hr></div><h3>3) Credit Card Fraud Detection (FastAPI + Streamlit, batch scoring)</h3><p><strong>Repo:</strong> Fraud Detection System (<a href="https://github.com/muhammadparkar/fraud-detection?utm_source=chatgpt.com">GitHub</a>)<br><strong>Why it&#8217;s strong:</strong> Fraud is a real-world DS problem: imbalance, precision/recall tradeoffs, and operational workflows. This project is built like a deployable app. 
(<a href="https://github.com/muhammadparkar/fraud-detection?utm_source=chatgpt.com">GitHub</a>)</p><p><strong>What you learn</strong></p><ul><li><p>Handling imbalanced datasets (SMOTE, thresholds)</p></li><li><p>Building batch scoring pipelines</p></li><li><p>Delivering downloadable analysis reports</p></li></ul><p><strong>How to make it stand out</strong></p><ul><li><p>Add cost-based evaluation (false positive vs false negative cost)</p></li><li><p>Add a &#8220;review queue&#8221; dashboard for flagged transactions</p></li></ul><p><strong>Resume bullet example</strong></p><ul><li><p>Built and deployed a fraud detection system with batch scoring, dynamic column mapping, and reporting via FastAPI + Streamlit.</p></li></ul><div><hr></div><h3>4) House Price Prediction with ZenML + MLflow (Real MLOps flavor)</h3><p><strong>Repo:</strong> ZenML + MLflow House Price Pipeline (<a href="https://github.com/vn33/MLOps_House-Price-Prediction-using-ZenML-and-MLflow?utm_source=chatgpt.com">GitHub</a>)<br><strong>Why it&#8217;s strong:</strong> Shows reproducibility, pipelines, and CI/CD mindset, which is rare in typical DS portfolios. 
(<a href="https://github.com/vn33/MLOps_House-Price-Prediction-using-ZenML-and-MLflow?utm_source=chatgpt.com">GitHub</a>)</p><p><strong>What you learn</strong></p><ul><li><p>Pipeline orchestration for DS work</p></li><li><p>Experiment tracking + deployment flow</p></li><li><p>Production-grade project structure</p></li></ul><p><strong>Upgrade idea</strong></p><ul><li><p>Add feature store style transformations</p></li><li><p>Add automated retraining when data drifts</p></li></ul><p><strong>Resume bullet example</strong></p><ul><li><p>Implemented an end-to-end regression pipeline using ZenML for reproducible workflows and MLflow for tracking and deployment.</p></li></ul><div><hr></div><h3>5) Recommender System as an App (FastAPI + Streamlit)</h3><p><strong>Repo:</strong> FastAPI Movie Recommender (<a href="https://github.com/gurezende/FastAPI_Movie_Recommender?utm_source=chatgpt.com">GitHub</a>)<br><strong>Why it&#8217;s strong:</strong> Recommendations are common in DS interviews, and this includes an API + UI and even click tracking. 
(<a href="https://github.com/gurezende/FastAPI_Movie_Recommender?utm_source=chatgpt.com">GitHub</a>)</p><p><strong>What you learn</strong></p><ul><li><p>Ranking/recommendation logic</p></li><li><p>Product analytics mindset (tracking interactions)</p></li><li><p>Packaging DS into an interactive system</p></li></ul><p><strong>Upgrade ideas</strong></p><ul><li><p>Add evaluation metrics (MAP@K, NDCG@K)</p></li><li><p>Add hybrid recommendations (content + collaborative)</p></li></ul><p><strong>Resume bullet example</strong></p><ul><li><p>Built a recommendation system with FastAPI endpoints, Streamlit UI, and interaction tracking to measure engagement.</p></li></ul><div><hr></div><h2>Part 2: Complete Data Scientist Roadmap (0 &#8594; Job-Ready)</h2><h3>Phase 0: Setup (1&#8211;2 days)</h3><ul><li><p>Python environment, Git/GitHub, Jupyter/VS Code</p></li><li><p>Basic Linux commands</p></li><li><p>Readme writing habit (every project gets a clean README)</p></li></ul><div><hr></div><h3>Phase 1: Foundations (2&#8211;3 weeks)</h3><p><strong>Python</strong></p><ul><li><p>Data types, functions, OOP basics</p></li><li><p>Writing clean code, modular scripts</p></li></ul><p><strong>Math + Stats essentials</strong></p><ul><li><p>Probability, distributions, Bayes basics</p></li><li><p>Mean/variance, sampling, CLT intuition</p></li><li><p>Confidence intervals, hypothesis testing basics</p></li></ul><p><strong>Checkpoint</strong></p><ul><li><p>Solve 
30&#8211;50 small problems (python + probability + statistics)</p></li></ul><div><hr></div><h3>Phase 2: Data Skills (2&#8211;3 weeks)</h3><p><strong>SQL (non-negotiable)</strong></p><ul><li><p>Joins, window functions, CTEs</p></li><li><p>Aggregations, cohort queries</p></li></ul><p><strong>Data wrangling</strong></p><ul><li><p>pandas: joins, groupby, datetime, missing values</p></li><li><p>Data cleaning strategies and assumptions tracking</p></li></ul><p><strong>Visualization</strong></p><ul><li><p>matplotlib/plotly basics</p></li><li><p>telling a story with charts (not just plotting)</p></li></ul><p><strong>Checkpoint</strong></p><ul><li><p>Build a mini analytics report: raw CSV &#8594; cleaned dataset &#8594; SQL insights &#8594; dashboard chart pack</p></li></ul><div><hr></div><h3>Phase 3: Core Machine Learning (4&#8211;6 weeks)</h3><p><strong>Supervised learning</strong></p><ul><li><p>Regression: linear, regularization, tree models</p></li><li><p>Classification: logistic regression, trees, boosting</p></li><li><p>Metrics: precision/recall, ROC-AUC, PR-AUC, F1</p></li></ul><p><strong>Workflow</strong></p><ul><li><p>Train/val/test split</p></li><li><p>Cross-validation</p></li><li><p>Feature engineering</p></li><li><p>Leakage detection</p></li><li><p>Hyperparameter tuning (grid/random)</p></li></ul><p><strong>Explainability</strong></p><ul><li><p>Feature importance, SHAP basics</p></li><li><p>Error analysis: where the model fails and why</p></li></ul><p><strong>Checkpoint</strong></p><ul><li><p>One full ML project: EDA &#8594; model &#8594; evaluation &#8594; explainability &#8594; final business recommendations</p></li></ul><div><hr></div><h3>Phase 4: Specializations (pick 2, 3&#8211;6 weeks)</h3><p>Pick based on your target roles.</p><p><strong>Option A: NLP</strong></p><ul><li><p>TF-IDF &#8594; transformers</p></li><li><p>Text classification, embeddings, retrieval basics</p></li></ul><p><strong>Option B: Time Series</strong></p><ul><li><p>Baselines, 
backtesting, forecasting errors</p></li><li><p>Seasonality, trend, regressors</p></li></ul><p><strong>Option C: Recommenders</strong></p><ul><li><p>Collaborative filtering</p></li><li><p>Ranking metrics</p></li><li><p>Cold start strategies</p></li></ul><p><strong>Option D: Causal + Experimentation</strong></p><ul><li><p>A/B testing design</p></li><li><p>power and sample sizing (basic)</p></li><li><p>interpreting results for product decisions</p></li></ul><div><hr></div><h3>Phase 5: Production and &#8220;Hireable&#8221; DS (3&#8211;6 weeks)</h3><p>This is where you differentiate.</p><ul><li><p>Build APIs (FastAPI)</p></li><li><p>Make a small UI (Streamlit)</p></li><li><p>Track experiments (MLflow)</p></li><li><p>Add data checks (basic validation)</p></li><li><p>Containerize (Docker)</p></li><li><p>Optional: deploy (Cloud Run / AWS / Render)</p></li></ul><p><strong>Checkpoint</strong></p><ul><li><p>2 deployable projects (with a live demo or clear run instructions)</p></li></ul><div><hr></div><h3>Phase 6: Portfolio + Interview Readiness (ongoing)</h3><p><strong>Portfolio</strong></p><ul><li><p>3 strong projects max (quality &gt; quantity)</p></li><li><p>Each project must show:</p><ul><li><p>Problem framing</p></li><li><p>Metrics and why they matter</p></li><li><p>Error analysis</p></li><li><p>Business impact</p></li></ul></li></ul><p><strong>Interview prep</strong></p><ul><li><p>SQL daily practice</p></li><li><p>ML concepts: bias/variance, leakage, metrics, regularization</p></li><li><p>Case studies: churn, fraud, forecasting, recommendations</p></li></ul><div><hr></div><h2>The &#8220;Winning Portfolio&#8221; Strategy (Simple)</h2><p>If you do only this, you&#8217;ll be in a strong spot:</p><ul><li><p><strong>Project 1:</strong> Churn (classification + explainability + business insights)</p></li><li><p><strong>Project 2:</strong> Fraud (imbalance + thresholds + operational workflow)</p></li><li><p><strong>Project 3:</strong> Recommender (ranking + evaluation + product 
analytics tracking)</p></li></ul><p>All 3 should have: README, screenshots, clear setup, and a short &#8220;decision summary.&#8221;</p>]]></content:encoded></item><item><title><![CDATA[TCS Data Analyst Interview Questions (With Solutions + Short Explanations)]]></title><description><![CDATA[A friend recently cracked TCS Data Analyst and shared the exact questions asked in the interview. I&#8217;m posting the solutions + quick explanations here so you can practice smart and revise fast.]]></description><link>https://karthiktechdairy.substack.com/p/tcs-data-analyst-interview-questions</link><guid isPermaLink="false">https://karthiktechdairy.substack.com/p/tcs-data-analyst-interview-questions</guid><dc:creator><![CDATA[Karthik Adari]]></dc:creator><pubDate>Fri, 23 Jan 2026 15:58:04 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!UMJq!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ba28ade-da8d-46d1-89b3-fb1b8a34eaf7_800x800.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>1) INNER JOIN vs LEFT JOIN (SQL)</h2><p><strong>Concept</strong></p><ul><li><p><strong>INNER JOIN</strong> &#8594; returns only matching rows in both tables</p></li><li><p><strong>LEFT JOIN</strong> &#8594; returns all rows from left table + matching from right (non-matching becomes NULL)</p></li></ul><p><strong>Example</strong></p><pre><code><code>SELECT A.customer_id, B.order_id
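FROM Customers A                    -- left table: every customer is kept
LEFT JOIN Orders B                  -- right table: joined where a match exists
ON A.customer_id = B.customer_id;   -- customers with no orders get a NULL order_id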
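-- For contrast, a minimal sketch of the INNER JOIN form on the same
-- Customers and Orders tables: only customers with at least one
-- matching order are returned, so order_id is never NULL here.
SELECT A.customer_id, B.order_id
FROM Customers A
INNER JOIN Orders B
ON A.customer_id = B.customer_id;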
</code></code></pre><p><strong>When to use</strong></p><ul><li><p>INNER: only want customers who placed orders</p></li><li><p>LEFT: want all customers, even if no orders</p></li></ul><div><hr></div><h2>2) WHERE vs HAVING (Real use case)</h2><p><strong>Concept</strong></p><ul><li><p><strong>WHERE</strong> filters rows <strong>before</strong> grouping/aggregation</p></li><li><p><strong>HAVING</strong> filters groups <strong>after</strong> aggregation</p></li></ul><p><strong>Example</strong></p><pre><code><code>SELECT dept, COUNT(*)
FROM Employees
GROUP BY dept
HAVING COUNT(*) &gt; 5;
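-- A sketch of both clauses in one query: filter 2025 rows first,
-- then keep only regions whose total exceeds 1M.
-- The Sales table and its columns (region, amount, sale_date) are
-- hypothetical names for illustration; date literal syntax can vary
-- by database.
SELECT region, SUM(amount) AS total_sales
FROM Sales
WHERE sale_date &gt;= '2025-01-01'
  AND sale_date &lt; '2026-01-01'     -- WHERE: row filter, applied before grouping
GROUP BY region
HAVING SUM(amount) &gt; 1000000;      -- HAVING: group filter, applied after aggregation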
</code></code></pre><p><strong>Real use case</strong></p><ul><li><p>WHERE: keep only 2025 sales rows before grouping</p></li><li><p>HAVING: keep only regions where total sales &gt; 1M</p></li></ul><div><hr></div><h2>3) SQL: Find the 2nd highest salary</h2><p><strong>Simple approach (second-highest distinct value)</strong></p><pre><code><code>SELECT MAX(salary) AS Second_Highest
FROM employees
WHERE salary &lt; (SELECT MAX(salary) FROM employees);
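-- Alternative sketch, assuming a database that supports window
-- functions such as DENSE_RANK. Equal salaries share a rank, so
-- salary_rank = 2 is the second-highest distinct salary; change the 2
-- to get the Nth highest. NULL salaries are excluded because some
-- databases sort NULLs first in descending order.
SELECT DISTINCT salary AS Second_Highest
FROM (
    SELECT salary,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank
    FROM employees
    WHERE salary IS NOT NULL
) ranked
WHERE salary_rank = 2;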
</code></code></pre><p><strong>Short note</strong></p><ul><li><p>This returns the highest salary strictly below the maximum, so a tie at the top does not change the result. It returns NULL when there is no second distinct salary.</p></li></ul><p>(For the general Nth-highest case, a DENSE_RANK query is the usual choice.)</p><div><hr></div><h2>4) How do you handle missing values?</h2><p><strong>Common options</strong></p><ul><li><p><strong>Drop</strong> missing rows/columns (if the missing share is very small and the field is not important)</p></li><li><p><strong>Impute</strong> using mean/median/mode</p></li><li><p><strong>Predict</strong> missing values using regression / KNN imputer</p></li></ul><p><strong>Rule of thumb</strong></p><ul><li><p>If missing &lt; ~5% &#8594; dropping can be OK</p></li><li><p>If the column is important &#8594; impute or model it</p></li></ul><div><hr></div><h2>5) How do you detect outliers? (IQR / Z-score / boxplot)</h2><h3>IQR Method (most common)</h3><ul><li><p>Outliers are below <strong>Q1 &#8722; 1.5&#215;IQR</strong> or above <strong>Q3 + 1.5&#215;IQR</strong>, where IQR = Q3 &#8722; Q1</p></li></ul><h3>Z-score Method</h3><ul><li><p>If <strong>|z| &gt; 3</strong>, treat as outlier (common threshold)</p></li></ul><h3>Visual checks</h3><ul><li><p><strong>Boxplot</strong> and <strong>scatter plot</strong> for quick spotting</p></li></ul><div><hr></div><h2>6) Normalization (1NF, 2NF, 3NF)</h2><p><strong>Goal:</strong> reduce redundancy and avoid update anomalies.</p><ul><li><p><strong>1NF:</strong> atomic values (no lists inside a cell)</p></li><li><p><strong>2NF:</strong> no partial dependency (every non-key column depends on the whole composite key, not just part of it)</p></li><li><p><strong>3NF:</strong> no transitive dependency (a non-key column should not depend on another non-key column)</p></li></ul><p>Quick memory trick:<br><strong>1NF = clean cells</strong><br><strong>2NF = full key dependency</strong><br><strong>3NF = no indirect dependency</strong></p><div><hr></div><h2>7) OLTP vs OLAP (with examples)</h2><p><strong>OLTP (Transactional systems)</strong></p><ul><li><p>Fast inserts/updates</p></li><li><p>Highly 
normalized</p></li><li><p>Example: ATM, e-commerce checkout</p></li></ul><p><strong>OLAP (Analytics systems)</strong></p><ul><li><p>Fast reads + aggregations</p></li><li><p>Often denormalized (star schema)</p></li><li><p>Example: dashboards, reporting systems</p></li></ul><div><hr></div><h2>8) What is data cleaning + checklist?</h2><p><strong>Definition</strong><br>Data cleaning = making data accurate, consistent, and analysis-ready.</p><p><strong>My checklist</strong></p><ul><li><p>Remove duplicates</p></li><li><p>Fix missing values strategy</p></li><li><p>Standardize formats (dates, currencies, categories)</p></li><li><p>Handle outliers (remove/cap/transform)</p></li><li><p>Validate ranges (age &gt; 0, salary not negative)</p></li><li><p>Check consistency across columns (state vs zip, etc.)</p></li></ul><div><hr></div><h2>9) Power BI: Measures vs Columns</h2><p><strong>Column</strong></p><ul><li><p>Calculated per row (stored in the table)</p></li><li><p>Good for row-level logic or categories</p></li></ul><p><strong>Measure</strong></p><ul><li><p>Aggregated result evaluated on the fly (changes with filters/slicers)</p></li><li><p>Best for KPIs like Sales, Profit, YoY%</p></li></ul><p>Shortcut:<br><strong>Columns = row-wise</strong><br><strong>Measures = filter-context dependent</strong></p><div><hr></div><h2>10) DAX: What does CALCULATE() do?</h2><p><strong>Concept</strong><br><code>CALCULATE()</code> changes the <strong>filter context</strong> of a measure.</p><p><strong>Example</strong></p><pre><code><code>Total2023Sales =
CALCULATE(SUM(Sales[Amount]), Sales[Year] = 2023)
</code></code></pre><p>In simple terms:<br>It tells Power BI: &#8220;Compute this, but under these filters.&#8221;</p><div><hr></div><h2>11) Make Power BI reports faster for huge data</h2><p>High-impact optimizations:</p><ul><li><p>Remove unused columns</p></li><li><p>Use <strong>Star schema</strong></p></li><li><p>Prefer DAX measures over calculated columns, and push heavy transformations upstream (source or Power Query)</p></li><li><p>Aggregate at source/query level</p></li><li><p>Turn off Auto date/time (often helps)</p></li><li><p>Reduce visuals on a page (too many visuals slow rendering)</p></li></ul><div><hr></div><h2>12) Scenario: How to investigate &#8220;Sales dropped in one region&#8221;</h2><p>A clean interview flow:</p><ol><li><p>Compare <strong>MoM / YoY trend</strong> for that region</p></li><li><p>Break down by <strong>product, category, channel, customer segment</strong></p></li><li><p>Check <strong>pricing changes</strong>, discounts, stock-outs, returns</p></li><li><p>Look for <strong>customer churn</strong> or loss of key accounts</p></li><li><p>Validate external factors: holidays, competition, supply chain issues</p></li></ol><p>Bonus line:<br>&#8220;I&#8217;d validate whether it&#8217;s a data issue first (missing transactions, wrong filters, refresh failures).&#8221;</p><div><hr></div><h2>13) GROUP BY + common mistakes</h2><p><strong>Purpose</strong><br>Groups rows that share the same values and summarizes each group.</p><p><strong>Example</strong></p><pre><code><code>SELECT department, COUNT(*)
FROM employees
GROUP BY department;
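-- A sketch of a frequent GROUP BY mistake and its fix.
-- The employee_name column is a hypothetical name for illustration.
-- Rejected by most databases: employee_name is neither grouped nor aggregated.
--   SELECT department, employee_name, COUNT(*)
--   FROM employees
--   GROUP BY department;
-- Valid: every selected column is in GROUP BY or inside an aggregate.
SELECT department, COUNT(*) AS employee_count, MAX(salary) AS top_salary
FROM employees
GROUP BY department;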
</code></code></pre><p><strong>Common mistakes</strong></p><ul><li><p>Selecting a non-aggregated column not present in GROUP BY</p></li><li><p>Using WHERE instead of HAVING for aggregate filters</p></li><li><p>Grouping at wrong granularity (monthly vs daily mismatch)</p></li></ul><div><hr></div><h2>14) COUNT(*) vs COUNT(column)</h2><ul><li><p><strong>COUNT(*)</strong> &#8594; counts all rows (including NULLs)</p></li><li><p><strong>COUNT(column)</strong> &#8594; counts only rows where that column is <strong>NOT NULL</strong></p></li></ul><p>Interview-safe example:<br>&#8220;If some salaries are NULL, COUNT(salary) will be lower than COUNT(*)&#8221;</p><div><hr></div><h2>15) Tell me about a time you used data to drive a decision (project answer)</h2><p>Use this structure:<br><strong>Context &#8594; Action &#8594; Insight &#8594; Impact (with a number)</strong></p><p><strong>Sample</strong><br>&#8220;I built an e-commerce sales dashboard and analyzed product-wise revenue, profit margins, and region performance. I found returns were unusually high in one region. After drilling down, it pointed to a logistics issue. We adjusted the delivery partner for that region and reduced the return rate by ~18%.&#8221;</p><p>Tip:<br>Always add <strong>one metric</strong> (18%, 2x, 10 hours saved, etc.)<br><br><a href="https://drive.google.com/file/d/1QIkfWxFsiJ686TZllc9sQAIU5zFi3OHc/view?usp=sharing">Complete PDF solutions</a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://karthiktechdairy.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[The Ultimate Cold Outreach Template Guide]]></title><description><![CDATA[For LinkedIn & Email]]></description><link>https://karthiktechdairy.substack.com/p/the-ultimate-cold-outreach-template</link><guid isPermaLink="false">https://karthiktechdairy.substack.com/p/the-ultimate-cold-outreach-template</guid><dc:creator><![CDATA[Karthik Adari]]></dc:creator><pubDate>Wed, 21 Jan 2026 15:58:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!UMJq!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ba28ade-da8d-46d1-89b3-fb1b8a34eaf7_800x800.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1><strong>The Ultimate Cold Outreach Template Guide (v2.0)</strong></h1><p><strong>Optimized for LinkedIn &amp; Email</strong></p><h3><strong>Phase 1: The Connection Request (300 Character Limit)</strong></h3><p><em>Strategy: No links. No pitch. Just a specific hook.</em></p><p><strong>Option A: The &#8220;Fan&#8221; (Best for Hiring Managers)</strong></p><blockquote><p>Hi [Name], I recently saw your post about [Specific Topic] - the point about [Detail] really stood out to me. I&#8217;m currently building in this space and would love to connect to follow your updates.</p></blockquote><p><strong>Option B: The &#8220;Fellow Professional&#8221; (Best for Peers)</strong></p><blockquote><p>Hi [Name], I found your profile while researching [Company]. I see we both work in [Industry/Domain]. 
I&#8217;d love to connect to share insights in the field.</p></blockquote><p><strong>Option C: The &#8220;Alumni/Mutual&#8221; (Best for Warm Leads)</strong></p><blockquote><p>Hi [Name], I noticed we are both alumni of [University/Company]. I&#8217;m currently working in [Industry] and would love to connect with a fellow [Mascot/Alumni Name] in the space.</p></blockquote><div><hr></div><p><strong>Phase 2: The Hiring Manager (The &#8220;Value Pitch&#8221;)</strong></p><p><em>Strategy: Hyper-specific opening + Low friction CTA. Avoid attachments on LinkedIn initially.</em></p><p><strong>Subject:</strong> Question regarding [Role Title] / [Specific Project]</p><p><strong>Message:</strong></p><blockquote><p>Hi [Name],</p><p>I saw the [Role Title] opening and your team&#8217;s recent work on [Specific Initiative/News]. This role&#8217;s focus on [JD Theme, e.g., Scaling Systems] aligns perfectly with what I&#8217;ve delivered in my past work.</p><p><strong>Quick highlights of my fit:</strong></p></blockquote><ul><li><p><strong>Relevance:</strong> [Number] years focused on [Specific Domain].</p></li><li><p><strong>Impact:</strong> Built [Project] which resulted in [Metric/Result, e.g., 20% efficiency increase].</p></li><li><p><strong>Skillset:</strong> Strong technical command of [Key Tool A] and [Key Tool B].</p></li></ul><blockquote><p>I know you are busy. Instead of a meeting, would you be open to a 2-minute overview of how I could support the current priorities?</p><p>Best,</p><p>[Your Name]</p><p>[Portfolio Link - Only include if sending via Email]</p></blockquote><div><hr></div><p><strong>Phase 3: The Recruiter (The &#8220;Screening Checklist&#8221;)</strong></p><p><em>Strategy: Make their job easy. 
Give them the data they need to &#8220;pass&#8221; you immediately.</em></p><p><strong>Subject:</strong> Application for [Role Title] (ID: [Job ID]) - [Your Name]</p><p><strong>Message:</strong></p><blockquote><p>Hi [Name],</p><p>I&#8217;m writing to express strong interest in the [Role Title] role (Job ID: [Number]). Based on the requirements for [Skill A] and [Skill B], I believe I am a strong technical match.</p><p><strong>The Logistics (To save you time):</strong></p></blockquote><ul><li><p><strong>Location:</strong> [Your City] (Open to relocation/Remote)</p></li><li><p><strong>Authorization:</strong> [Citizen / Green Card / Visa Status]</p></li><li><p><strong>Notice Period:</strong> [2 Weeks / Immediate]</p></li><li><p><strong>Key Skills:</strong> [Skill 1], [Skill 2], [Skill 3]</p></li></ul><blockquote><p>I&#8217;ve attached my resume for review. I&#8217;d love to connect if my background aligns with what you are looking for.</p><p>Best,</p><p>[Your Name]</p></blockquote><div><hr></div><p><strong>Phase 4: The Peer Referral (The &#8220;Soft Ask&#8221;)</strong></p><p><em>Strategy: Low-effort questions. If they reply, offer to write the referral blurb for them.</em></p><p>Step 1: The Initial Outreach</p><p>Subject: Quick question about [Company/Team]</p><blockquote><p>Hi [Name],</p><p>I came across your profile while researching [Company]. I&#8217;ve built [Project 1] related to this domain, so I have always admired the team&#8217;s approach to [Topic].</p><p>I know you&#8217;re busy, but I&#8217;d love your quick take on one thing:</p></blockquote><ul><li><p><em>Is the team currently more focused on [Strategy A] or [Strategy B]?</em></p></li></ul><blockquote><p>No pressure at all - thanks for sharing your work!</p><p>Best,</p><p>[Your Name]</p></blockquote><p><strong>Step 2: The &#8220;Ask&#8221; (Send ONLY after they reply)</strong></p><blockquote><p>Thanks, [Name]. 
That insight is really helpful.</p><p>I actually noticed the [Job Title] role opened up on your team. Since I have a background in [Your Skill], I feel I&#8217;d be a great fit.</p><p>Would you be open to referring me? <strong>If yes, I can send over the job link and 3 bullet points about my experience so you don&#8217;t have to write anything.</strong></p></blockquote><div><hr></div><p><strong>Phase 5: The Follow-Up (The &#8220;Quick Bump&#8221;)</strong></p><p><em>Strategy: Short, direct, and zero guilt.</em></p><p><strong>Subject:</strong> Re: [Previous Subject Line]</p><blockquote><p>Hi [Name],</p><p>Quick bump on this - happy to send a 2-minute overview via email/message instead of scheduling time if that&#8217;s easier.</p><p>Let me know if the role is still a priority.</p><p>Best,</p><p>[Your Name]</p></blockquote><div><hr></div><p><strong>Key Strategy Notes</strong></p><ol><li><p><strong>The &#8220;Quick Highlights&#8221; Section:</strong> Bullet points are essential. Hiring managers scan emails; they do not read them word-for-word. This section allows them to see your value in 3 seconds.</p></li><li><p><strong>No Attachments on LinkedIn:</strong> LinkedIn compresses images and sometimes flags PDFs as security risks. In DMs, say: <em>&#8220;Happy to share my resume if useful&#8221;</em> and wait for them to say yes. For Email, attaching is fine.</p></li><li><p><strong>The &#8220;Easy Referral&#8221;:</strong> Never make a current employee work for you. Always offer to write the &#8220;blurb&#8221; (the 3 bullet points) they can simply copy-paste into their internal referral system.</p></li><li><p><strong>The &#8220;2-Minute Overview&#8221;:</strong> Asking for &#8220;30 minutes&#8221; feels like a burden (a meeting). Asking to &#8220;share a 2-minute overview&#8221; feels like you are being helpful and respectful of their time.</p></li></ol><div><hr></div><p><strong>3 Golden Rules for Cold Outreach</strong></p><p><strong>1. 
Mobile Optimization is Non-Negotiable</strong></p><p>Most recruiters and managers read LinkedIn messages on their phones. If your message looks like a &#8220;wall of text&#8221; (more than 3 sentences without a break), they will skip it. Use short paragraphs, bullet points, and bold text to guide their eyes.</p><p><strong>2. Don&#8217;t be Assumptive</strong></p><p>Avoid phrases like &#8220;I can solve your problems&#8221; (you don&#8217;t know their problems yet). Use softer language: &#8220;I believe I can help with the priorities this role supports&#8221; or &#8220;This aligns with my experience in [Area].&#8221;</p><p><strong>3. Low-Friction Calls to Action (CTA)</strong></p><p>End every message with a question that allows them to say &#8220;Yes&#8221; easily.</p><ul><li><p><em>Bad:</em> &#8220;Can I have 30 minutes to pick your brain?&#8221; (High effort).</p></li><li><p><em>Good:</em> &#8220;Are you open to a connection?&#8221; or &#8220;Can I send a 2-minute overview?&#8221; (Low effort).</p></li></ul>]]></content:encoded></item></channel></rss>