After using GitHub Copilot across a range of personal projects, I've compiled practical notes on what works, what doesn't, and how to maximize productivity while maintaining code quality.
Project Setup & Dependencies
| Challenge | Best Practice | Notes |
|---|---|---|
| LLM-generated configs broke builds | Use official CLI tools | Letting Copilot generate requirements.txt or pyproject.toml caused dependency issues. Using poetry new or uv init avoided this entirely. |
| Version mismatches | Install latest versions manually | LLMs are trained on older versions. Installing the latest frameworks first and then asking Copilot to code on top prevented deprecated API usage. |
| Starting from scratch caused fragility | Begin with a working baseline | Always ran the app once manually before involving Copilot. This made it clear whether later failures were Copilot-induced. |
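The "working baseline" habit can be reduced to a smoke check you run once before any generated edits land. A minimal stdlib-only sketch (the `create_app` and `smoke_check` names are illustrative, not from any framework):

```python
from wsgiref.util import setup_testing_defaults

def create_app():
    """A deliberately tiny WSGI app used as the known-good baseline."""
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"ok"]
    return app

def smoke_check(app):
    """Call the app once, in-process, before involving Copilot at all."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal valid WSGI environ
    captured = {}
    def start_response(status, headers):
        captured["status"] = status
    body = b"".join(app(environ, start_response))
    return captured["status"], body

if __name__ == "__main__":
    status, body = smoke_check(create_app())
    print(status, body)  # 200 OK b'ok'
```

If this check passes before Copilot touches the project and fails after, the failure is unambiguously Copilot-induced.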
Scope Management
| Challenge | Best Practice | Notes |
|---|---|---|
| Unintended edits across files | Explicitly constrain scope | Without constraints, Copilot modified unrelated files. Prompt patterns that helped: "Only modify user_service.py. Do not change imports or behavior elsewhere." or "Make the minimum number of changes required to accomplish this." |
| Excess debug code left behind | Clean up after fixes | Models often added print statements or logs during debugging. Running formatters and doing a final diff review helped catch leftover debug statements. |
| Unused code accumulated | Run linters and formatters after big changes | Linters caught unused imports, variables, and helper functions that Copilot created but never removed. Example: ruff, black, mypy after refactors. |
Technology Selection
| Challenge | Best Practice | Notes |
|---|---|---|
| Hard to debug unfamiliar stacks | Use languages/frameworks you know | Vibe coding only worked smoothly in ecosystems I already understood. In unfamiliar stacks, debugging Copilot’s mistakes took longer than writing code manually. |
| Hallucinated APIs | Prefer popular frameworks | FastAPI and Flask consistently produced better suggestions than niche frameworks due to larger training data. |
| Large diffs and boilerplate | Prefer less verbose technologies | Python worked better than Java; FastAPI better than Django REST Framework. Less boilerplate meant fewer opportunities for the model to mess up. |
| Reinvented common functionality | Import well-used libraries | Example: using pydantic for validation instead of custom validators, or httpx instead of raw urllib. Lightweight, popular libraries were handled more reliably by Copilot. |
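The pydantic point is worth making concrete: a model declaration replaces the kind of hand-rolled validator Copilot tends to reinvent. A minimal sketch, assuming pydantic is installed (the `User` model is a hypothetical example, not from any project above):

```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    # pydantic enforces types and required fields, replacing ad-hoc checks
    name: str
    age: int

# A well-formed payload parses cleanly
user = User(name="Ada", age=36)

# A malformed payload raises a structured error instead of failing silently
try:
    User(name="Ada", age="not-a-number")
except ValidationError as exc:
    print(len(exc.errors()), "validation error(s)")
```

Because the library's behavior is heavily represented in training data, Copilot's completions around models like this were noticeably more reliable than around custom validation helpers.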
Development Workflow
| Challenge | Best Practice | Notes |
|---|---|---|
| Large diffs were risky | Keep changes small and iterative | Smaller prompts like “Refactor only this function” worked better than “Clean up this module.” |
| Code worked but looked wrong | Review diffs, not just behavior | Even when tests passed, diff reviews caught duplicated logic and unnecessary abstractions introduced by Copilot. |
| Errors swallowed | Keep errors explicit | Copilot often wrapped logic in broad try/except blocks. Manual review ensured failures remained visible and actionable. |
| Risk of leaking secrets | Audit logs carefully | Debug-heavy iterations sometimes logged request payloads or headers. Extra care was needed to remove sensitive logging before merge. |
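The last two rows have a common fix at review time: keep exception handling narrow, and redact anything sensitive before it reaches a log line. A stdlib-only sketch (function and header names are illustrative):

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("payments")

SENSITIVE_HEADERS = {"authorization", "cookie", "x-api-key"}

def redact(headers: dict) -> dict:
    """Mask sensitive header values before they can be logged."""
    return {
        k: "***" if k.lower() in SENSITIVE_HEADERS else v
        for k, v in headers.items()
    }

def parse_amount(raw: str) -> int:
    # Catch only the failure we expect; let anything else propagate loudly,
    # instead of the broad try/except blocks Copilot tends to generate
    try:
        return int(raw)
    except ValueError:
        logger.warning("invalid amount %r, defaulting to 0", raw)
        return 0

print(redact({"Authorization": "Bearer abc123", "Accept": "application/json"}))
print(parse_amount("42"), parse_amount("oops"))
```

Reviewing generated diffs against these two rules (no bare `except`, no raw headers or payloads in log calls) caught most of the problems described above.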
Testing & Documentation
| Challenge | Best Practice | Notes |
|---|---|---|
| Silent regressions | Re-run tests frequently | Copilot changes sometimes broke unrelated code paths. Running tests after every non-trivial change caught this early. |
| Tests locked wrong behavior | Write tests after logic stabilizes | Generating tests too early made refactors harder. Waiting until behavior settled worked better. |
| Docs drifted quickly | Update docs last | Asking Copilot to update docs after code was final reduced stale or redundant documentation. |
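"Write tests after logic stabilizes" looks like this in practice: once a helper's behavior has settled, pin it down with small pytest-style tests so later Copilot refactors can't silently change it. A sketch around a hypothetical `slugify` helper:

```python
import re

def slugify(title: str) -> str:
    """Hypothetical helper whose behavior has stabilized enough to pin down."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

# Tests written only after the behavior settled; writing them earlier
# would have locked in intermediate behavior and made refactors harder
def test_slugify_basic():
    assert slugify("Hello, World!") == "hello-world"

def test_slugify_collapses_separators():
    assert slugify("a  --  b") == "a-b"

if __name__ == "__main__":
    test_slugify_basic()
    test_slugify_collapses_separators()
    print("all tests passed")
```

Re-running a suite like this after every non-trivial Copilot change is what surfaced the silent regressions mentioned above.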
Maintainability & Best Practices
| Challenge | Best Practice | Notes |
|---|---|---|
| Unclear generated code | Never merge what you don’t understand | If I couldn’t explain the code in my own words, I didn’t merge it. This rule prevented long-term technical debt. |
| Over-reliance on generation | Use Copilot as a reviewer | Asking Copilot “Review this diff and point out risks or unnecessary abstractions” often produced better results than generation alone. |
| Faster mistakes | Treat speed as a risk | Copilot increased speed significantly, but deliberate slowdowns at review time were essential to maintain architectural control. |