The AI Capability Guide — What's Real, What's Hype, What's Next

A sober, technically-grounded guide to what AI can actually do in 2026, what it can't, and what's arriving in the next 12-24 months.


Introduction: Cutting Through the Noise

AI hype moves at the speed of Twitter. Every week, there's a new "breakthrough" or "apocalypse" prediction. The reality is far more boring—and more useful.

In March 2026, we have frontier capabilities in language, image generation, and code that were genuinely impossible three years ago. We have real limitations that will take 2-5 more years to solve. And we have speculative capabilities that experts disagree about.

This guide separates signal from noise. It's not the cheerleading you see in tech publications or the doom-scrolling on Reddit. It's what actually works, what's overrated, and what's realistically coming.


Part 1: What AI Can Actually Do (March 2026)

Category 1: Language & Text

Conversation & Q&A

  • Capability: Indistinguishable from a knowledgeable human for 95% of topics. Can sustain coherent dialogue over thousands of words. Can ask clarifying questions.
  • Reality check: Not magic. Fails on niche topics with poor training data. Confident when uncertain. Will hallucinate plausible-sounding facts.
  • Practical use: General information, brainstorming, learning, email drafting, content ideas. Not reliable for highly specialized domains without fact-checking.
  • Use case example: Asking Claude to explain machine learning fundamentals → Excellent. Asking it about obscure neurological conditions → Treat its claims with scepticism and verify them.

Writing

  • Capability: Produces publishable-quality content for most contexts. Can match tones (formal, casual, technical, marketing). Can adapt to brand voice if given examples.
  • Reality check: Struggles with truly distinctive voice. Reads slightly generic compared to excellent human writers. Better at structure than personality.
  • Practical use: Blog posts, emails, product descriptions, documentation, marketing copy, technical writing. Excellent as a first draft; human refinement improves it.
  • Quality baseline: AI output = 7/10 publishable (needs light editing). Expert human = 9/10 (publication-ready). Average human = 5/10.

Code Generation

  • Capability: Writes functional code in all major languages. Can implement features from descriptions. Can debug and refactor. Can explain why it chose specific approaches.
  • Reality check: Makes subtle logic errors ~12-15% of the time. Struggles with edge cases. Produces code that works but isn't always optimal. Can introduce security flaws unless explicitly asked to consider security.
  • Practical use: Rapid prototyping, boilerplate generation, refactoring, debugging, learning new frameworks. Excellent for productivity; requires human code review before production.
  • Use case: "Write a React component that fetches user data and displays a loading state" → 90% chance the code works immediately. "Write a production-grade authentication system" → You should have a security review.
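
To see why the review step matters, the per-task error rate quoted above can be compounded across several tasks. The sketch below assumes each generated piece fails independently at the 12-15% rate, which is a simplification; `chance_all_correct` is an illustrative helper name, not part of any library.

```python
def chance_all_correct(error_rate: float, tasks: int) -> float:
    """Probability that `tasks` independent generations all avoid a logic error."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    if tasks < 0:
        raise ValueError("tasks must be non-negative")
    return (1.0 - error_rate) ** tasks


if __name__ == "__main__":
    # Error rates taken from the reality check above; task counts are arbitrary.
    for rate in (0.12, 0.15):
        for tasks in (1, 5, 10):
            p = chance_all_correct(rate, tasks)
            print(f"error rate {rate:.0%}, {tasks:>2} tasks: {p:.0%} chance all are correct")
```

Under these assumptions, five generated changes in a row are all correct only about half the time (roughly 44-53%), which is the practical argument for reviewing before production.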

Translation

  • Capability: Near-professional quality for major language pairs (English ↔ European languages). Maintains meaning and tone across languages.
  • Reality check: Cultural nuance can be missed. Slang and idioms don't always translate cleanly. Less capable for rare languages or highly technical translation.
  • Practical use: Business communications, travel, content distribution, learning languages. Good enough for most real-world use.

Summarisation

  • Capability: Can reduce a 100-page document to a 1-page brief with remarkable accuracy. Preserves key facts and most of the nuance.
  • Reality check: Sometimes omits important nuance. Can miss context that a human would catch.
  • Practical use: Research papers, legal documents, meeting notes, reports. Saves hours of reading. Always skim the original if high-stakes.

Category 2: Vision & Images

Image Generation

  • Capability: Photorealistic images on demand. Hands are finally mostly correct (failures are now rare). Complex scenes work well. Style transfer is reliable.
  • Reality check: Text in images still imperfect. Weird physics in edge cases (impossible proportions, gravity-defying objects). Very detailed requirements still produce occasional errors.
  • Practical use: Marketing materials, prototyping designs, illustrations, background images. Saves photographer/illustrator costs. Not replacing professional creative yet.
  • Use case: "Generate a tech-forward office with people working at standing desks" → 85% chance it looks professional. "Generate a very detailed scene with specific brand logos and text" → Expect 2-3 iterations.

Image Understanding

  • Capability: Can describe images, identify objects, read text (OCR), analyze charts. Can answer questions about images.
  • Reality check: Sometimes misses subtle details. Struggles with very cluttered or ambiguous images.
  • Practical use: Screenshot analysis, chart interpretation, form processing, accessibility (describing images for visually impaired).

Video Generation

  • Capability: Can generate short video clips (5-15 seconds) with reasonable quality. Movement is sometimes jerky. Longer videos still struggle.
  • Reality check: Not ready for production yet. Quality jumps between frames. Physics don't always make sense.
  • Practical use: Rough prototyping, social media clips, experimental content. Not replacing videographers yet.

Category 3: Audio & Voice

Speech-to-Text

  • Capability: 97%+ accuracy in clean environments. Handles accents reasonably well. Can understand multiple languages in one conversation.
  • Reality check: Noisy backgrounds degrade accuracy fast. Highly technical vocabulary sometimes misheard.
  • Practical use: Meeting transcription, voice note taking, accessibility features. Reliable enough for real-world use.

Text-to-Speech

  • Capability: Sounds indistinguishable from human voice in short segments (30-60 seconds). Can convey emotion and inflection.
  • Reality check: Longer form (5+ minutes) still detectable as synthetic if you listen carefully. Some voices/languages more convincing than others.
  • Practical use: Accessibility, audiobooks, video narration, podcasts. Good for distribution; human voicing still preferred for high-end productions.

Real-Time Voice Conversation

  • Capability: Sub-200ms latency. Natural turn-taking. Can handle interruptions. Feels like talking to a person.
  • Reality check: Works well for English; weaker for other languages. Occasionally misunderstands context.
  • Practical use: Customer service, language learning, accessibility, hands-free interaction.

Category 4: Reasoning & Analysis

Mathematical Reasoning

  • Capability: Correct for 95%+ of common math problems. Can show work. Can verify answers.
  • Reality check: Unreliable on novel or multi-step problems without explicitly asking for "step-by-step reasoning." Makes arithmetic mistakes occasionally.
  • Practical use: Homework help, calculations, verification. Works for standard problems; unreliable for competition-level math.

Logical Deduction

  • Capability: Strong on well-structured logic puzzles. Can work through if-then chains. Can identify logical fallacies in arguments.
  • Reality check: Weak on problems requiring real-world common sense or physical intuition. Can be overconfident.
  • Practical use: Code logic, argument evaluation, decision trees, ethics analysis. Works well with formal logic; less reliable with messy real-world scenarios.

Data Analysis

  • Capability: Can process CSVs, generate charts, identify trends, perform statistical analysis. Can suggest next analysis steps.
  • Reality check: Occasionally fabricates plausible-looking statistics. Will recommend analyses that make sense statistically but might not answer your actual question.
  • Practical use: Data exploration, trend identification, chart generation, exploratory analysis. Check every reported number against the source data before relying on it.

Category 5: Agency & Action

Web Browsing

  • Capability: Can navigate websites, fill forms, extract information. Can use search to find answers to questions.
  • Reality check: Limited to certain sites (varies by platform). Can't do complex multi-step navigation as well as a human.
  • Practical use: Research, looking up information, extracting data from websites.

Tool Use (APIs)

  • Capability: Robust integration with external APIs. Can chain API calls together. Can handle conditional logic.
  • Reality check: Depends on well-documented APIs. Missing or ambiguous documentation leads to wrong parameters and failed calls.
  • Practical use: Integrating AI into applications, automating workflows, connecting services.
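
A minimal sketch of what "chaining API calls with conditional logic" can look like. Everything here is hypothetical: the two tools, the order data, and the `handle_late_order` flow are invented for illustration, and the decision logic is hard-coded where a model would normally choose the next call.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical tools standing in for real API calls; names and data are made up.
def get_order_status(order_id: str) -> dict:
    orders = {
        "A100": {"status": "delayed", "days_late": 4},
        "A200": {"status": "shipped", "days_late": 0},
    }
    return orders.get(order_id, {"status": "unknown", "days_late": 0})


def issue_refund(order_id: str) -> dict:
    return {"order_id": order_id, "refunded": True}


TOOLS: Dict[str, Callable[..., dict]] = {
    "get_order_status": get_order_status,
    "issue_refund": issue_refund,
}


def call_tool(name: str, **kwargs) -> dict:
    """Dispatch a tool call by name, rejecting anything not registered."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)


def handle_late_order(order_id: str, max_days_late: int = 3) -> List[Tuple[str, dict]]:
    """Chain two calls: look up the order, then refund only if it is late enough."""
    log = []
    status = call_tool("get_order_status", order_id=order_id)
    log.append(("get_order_status", status))
    if status["status"] == "delayed" and status["days_late"] > max_days_late:
        log.append(("issue_refund", call_tool("issue_refund", order_id=order_id)))
    return log


if __name__ == "__main__":
    for order in ("A100", "A200"):
        print(order, [name for name, _ in handle_late_order(order)])
```

The registry makes the documentation point concrete: a caller can only use a tool whose name and parameters it knows, so a vague description translates directly into a bad call.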

Computer Control

  • Capability: Can operate desktop applications via screen reading and mouse/keyboard. Can take screenshots and interpret them.
  • Reality check: Slower than a human. Unreliable at very complex tasks. Works for straightforward task sequences.
  • Practical use: Automation of repetitive tasks, accessibility tools, prototype automation.

Part 2: The Limitation Matrix — What AI Struggles With (And Why)

| Limitation | Why It's Hard | Realistic Timeline to Improvement | Workaround |
| --- | --- | --- | --- |
| Factual accuracy on niche topics | Training data gaps, AI tendency to hallucinate | Improving slowly; edge cases are hard | Always verify facts in high-stakes contexts |
| Real-time information | Knowledge cutoffs (AI trained on data from months ago) | Largely solved by tool use + web search | Use AI with web browsing enabled |
| Consistent 10,000+ word outputs | Attention drift in very long documents | 2027: architecture improvements (maybe) | Break into shorter chunks, regenerate sections |
| Physical world interaction | Robotics is hard; AI in silicon != AI in atoms | 2028-2030 for consumer applications | Still waiting for physical robots |
| Understanding your specific context | Limited memory and persistent state | 2027: memory and personalisation features | Provide context in each prompt |
| Creative originality | Trained on existing work; remixes rather than invents | Unclear; may be a fundamental architecture limit | Use as brainstorming partner, not sole creator |
| Ethical judgment | No lived experience or moral intuition | Open research question | Use for analysis, but humans decide values |
| Reasoning about probability | Struggles with genuine uncertainty; tends to be overconfident | 2027: more calibrated uncertainty | Ask for explicit confidence ranges |
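
The "break into shorter chunks, regenerate sections" workaround for long outputs can be sketched as a small loop. This is only an illustration under stated assumptions: `generate` stands in for whatever model call you use, `is_acceptable` for whatever quality check you trust, and neither is a real library function.

```python
from typing import Callable, List


def generate_in_sections(
    outline: List[str],
    generate: Callable[[str], str],
    is_acceptable: Callable[[str], bool],
    max_attempts: int = 3,
) -> List[str]:
    """Generate one section per outline entry, retrying sections that fail the check."""
    sections = []
    for heading in outline:
        draft = ""
        for _ in range(max_attempts):
            draft = generate(heading)
            if is_acceptable(draft):
                break
        sections.append(draft)  # keeps the last attempt even if none passed
    return sections


if __name__ == "__main__":
    calls = {"count": 0}

    def fake_generate(heading: str) -> str:
        # Stand-in for a model call; the first attempt is deliberately too short.
        calls["count"] += 1
        body = "detail " * (2 if calls["count"] == 1 else 20)
        return f"{heading}\n{body.strip()}"

    def long_enough(text: str) -> bool:
        return len(text.split()) >= 10

    parts = generate_in_sections(["Intro", "Method"], fake_generate, long_enough)
    print(len(parts), "sections,", calls["count"], "model calls")
```

Working section by section keeps each request short and means a weak section can be regenerated without redoing the whole document.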

The Overconfidence Problem (The Most Dangerous Failure Mode)

The biggest risk with AI in 2026 is confident incorrectness. AI will give you a plausible-sounding answer to almost any question, even questions it has no business attempting. It rarely says "I don't know" when it should.

Examples of dangerous confidence:

  • Medical diagnosis ("You probably have X based on your symptoms")
  • Legal advice ("You should definitely do X in this contract")
  • Historical facts about niche topics (made-up statistics, misremembered names)
  • Financial advice ("Apple stock will definitely rise in 2026")

Your job as a user: Know when to trust and when to verify. Trust a confident answer when the stakes are low or the claim is easy to check; verify it independently when the answer matters, however confident it sounds.


Part 3: What's Arriving in 2026-2027

H2 2026: Highly Confident Predictions

GPT-5 or equivalent from OpenAI

  • Likely to deliver measurable improvements in reasoning and code quality
  • Probably faster inference than GPT-4
  • May have native multimodal (text+image+video) capabilities
  • Estimated release: September-December 2026

Gemini 2.0 Ultra from Google

  • Already in developer preview; major quality bump expected
  • Better multimodal reasoning (images, text, video together)
  • Estimated release: Q3 or Q4 2026

Claude 4 (speculative) from Anthropic

  • If released, likely focus on reliability and tool use
  • May include extended context (1M+ tokens)
  • Uncertain; may wait until 2027

Apple Intelligence 2.0

  • Deeper OS integration; more capable on-device models
  • Better Siri functionality
  • Estimated release: September 2026 (iPhone 18 launch)

Llama 4 from Meta

  • Open-source frontier model
  • Pushes the envelope on code, reasoning, and multilingual
  • Estimated release: Q3 2026

2027: High-Confidence Predictions

Autonomous agents become practical

  • AI that can operate a computer autonomously for routine office work
  • Will transform how knowledge workers spend time (more time on judgment, less on execution)
  • Impact: Significant job restructuring in administrative, analysis, and customer service roles

AI-generated video becomes convincing

  • Short videos (1-5 minutes) that pass for real footage at a casual glance
  • Still identifiable as synthetic on close viewing; not yet deepfake-convincing
  • Major implications for content creation industry

On-device models reach GPT-4-level performance

  • Meaning: Powerful AI running locally on your phone with no network round-trip latency
  • Privacy: Your data never leaves your device
  • Downside: Requires more local compute (phones with better chips)

First mainstream AI-to-AI negotiation protocols

  • AI agents communicating with other AI systems
  • Example: Your AI assistant negotiates with a company's AI to get you a better deal
  • Likely protocol: Something like MCP (Model Context Protocol) becoming standard

Regulatory frameworks emerge

  • EU AI Act enforcement beginning
  • UK, US developing first serious regulation
  • China and others following different models

2027: Lower-Confidence Predictions

AI tutoring demonstrably improves student outcomes at scale

  • Probability: 60% — depends on adoption and implementation
  • If true: Major disruption to education sector

First credible claim of "artificial general intelligence"

  • Probability: 50% — depends heavily on definition
  • What "credible" means: Unclear. Debate will continue.

Major corporate restructuring driven by AI capability

  • Example: 10,000+ role shift at a single company due to AI automation
  • Probability: 40% — may happen by 2028 instead

AI code becomes safer than human code on average

  • Probability: 30% — still uncertain; humans still catching edge cases AI misses
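
One way to read the four probabilities above together is to compute how many of the predictions are expected to land. The sketch below treats the predictions as independent, which is an assumption made only for illustration (they are probably correlated in practice); the helper names are made up.

```python
from math import prod

# Probabilities quoted in the list above, in the same order.
predictions = {
    "AI tutoring improves outcomes at scale": 0.60,
    "First credible AGI claim": 0.50,
    "Major AI-driven corporate restructuring": 0.40,
    "AI code safer than human code on average": 0.30,
}


def expected_hits(probs) -> float:
    """Expected number of predictions that come true (holds without independence)."""
    return sum(probs)


def chance_none(probs) -> float:
    """Chance that every prediction misses, assuming independence."""
    return prod(1 - p for p in probs)


def chance_all(probs) -> float:
    """Chance that every prediction lands, assuming independence."""
    return prod(probs)


if __name__ == "__main__":
    probs = list(predictions.values())
    print(f"expected hits: {expected_hits(probs):.1f} of {len(probs)}")
    print(f"chance none land: {chance_none(probs):.1%}")
    print(f"chance all land:  {chance_all(probs):.1%}")
```

Under these assumptions about 1.8 of the four predictions are expected to land, with roughly an 8% chance that none do and under 4% that all do.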

Part 4: The Three Waves of AI (2020-2030+)

Wave 1: Generation (2020-2024)

What happened: AI learned to create. Text, images, code, music. Output was impressive, but AI had no agency—it waited for your prompt and produced content. The human was the operator.

| Characteristic | Implementation |
| --- | --- |
| Agency | None: waits for human input |
| Capability | Generate creative content, answer questions |
| Human role | Operator: you decide what to ask for |
| Example interaction | Human: "Write me a poem about cats" → AI produces poem |

Wave 2: Action (2025-2027)

What's happening now: AI is learning to do things. Browse the web, fill forms, execute multi-step tasks, use tools. AI is gaining agency but within narrow boundaries. The human is the supervisor.

| Characteristic | Implementation |
| --- | --- |
| Agency | Limited: executes pre-approved tasks |
| Capability | Perform multi-step tasks, use external tools, navigate interfaces |
| Human role | Supervisor: you set goals and boundaries |
| Example interaction | Human: "Send marketing emails to our Q1 leads" → AI designs, generates, and sends emails via your ESP |

Current status (March 2026): Wave 2 is partially here. Agents like Claude's Computer Use and OpenAI's Operator can execute defined tasks. Not yet fully autonomous.

Wave 3: Orchestration (2028-2030+)

What's coming: AI learns to coordinate. Multiple AI agents working together, negotiating with other agents, managing complex projects with minimal human oversight. The human is the goal-setter.

| Characteristic | Implementation |
| --- | --- |
| Agency | High: autonomous within goals |
| Capability | Coordinate other AIs, manage projects, adapt strategy |
| Human role | Goal-setter: you define what success means |
| Example interaction | Human: "Increase Q1 revenue by 15%" → AI designs strategy, runs experiments, optimises, manages vendors, and reports results |

Part 5: Real-World Capability Tracker

Where Different Types of Work Stand

| Work Type | Current Capability | Timeline | Impact |
| --- | --- | --- | --- |
| Content creation (text, images) | Excellent (80-90% of work) | Now | Writers, designers need to adapt; repositioning as editors/strategists |
| Software development | Very good (60-70% of work) | Now | Developers more productive; entry level harder; senior roles more valuable |
| Customer service | Good (50-60% of work) | Now to 2027 | First-line support largely automated; complex issues still need humans |
| Data analysis | Very good (70-80% of work) | Now | Analysts can focus on strategy instead of data wrangling |
| Marketing/copywriting | Good (60-70% of work) | Now | Mass-market content gets cheaper; premium voice becomes more valuable |
| Legal research | Very good (75-85% of work) | Now | Lawyers more productive on discovery/analysis |
| Academic writing/research | Good (50-60% of work) | Now | Accelerates literature review; human judgment still critical |
| Coding interviews | Moderate (40-50%) | 2026-2027 | Problem-solving skills still differentiate |
| Project management | Moderate (40-50%) | 2027 | Routine coordination gets automated; human judgment on strategy matters |
| Complex decision-making | Poor (20-30%) | 2028+ | AI generates options; humans still make decisions |
| Relationship building | Poor (10-20%) | Unclear | Human connection irreplaceable |
| Physical work | Minimal (5-10%) | 2028-2030 | Robots are hard; major disruption delayed |

Part 6: Business Applications Today (That Actually Work)

What's Proven to Work

Customer support: AI handles 60-70% of first-line tickets. Faster response, lower cost. Humans handle 30-40% of complex issues.

Content production: AI generates first drafts. Humans edit and refine. 3-5x productivity improvement on content teams.

Code generation: Developers use AI coding assistants (such as Copilot) as a productivity tool. 20-40% faster coding. Human review still required.

Data analysis: AI generates insights and charts from raw data. Analysts spend less time on data processing, more time on strategy.

Brainstorming: AI generates multiple variations and ideas. Humans pick best ideas and refine them. Better ideation sessions.

What Doesn't Work Yet

High-stakes decision-making: AI can generate options, but humans must decide. AI's confidence is sometimes misplaced.

Sensitive judgment calls: HR decisions, ethical choices, complex tradeoffs. AI can inform, but shouldn't decide alone.

Building relationships: Sales, partnerships, recruitment. AI can assist with research/outreach, but relationship building is still human.

Complex strategy: Long-term planning, moonshot bets, navigating uncertainty. AI can provide analysis; humans must decide.


Part 7: What Will Actually Change Your Life in 2026

Personal Use Cases

Your work becomes more strategic: If your job is 60% execution + 40% judgment, 2026 means AI handles the 60%. You focus on the 40%. You become more valuable if you embrace it; obsolete if you resist.

You get a "digital intern": Autonomous agents can handle routine tasks (email triage, scheduling, form filling, research). Not here yet, but coming soon.

Your time gets scarcer, not easier: Easier access to AI means higher expectations. You get more done, but you're also expected to do more.

Content creation becomes democratised: Excellent marketing copy, professional graphics, and decent video editing are all possible with AI tools. That means competition increases.

Societal Changes

Income inequality might widen: AI benefits the already-skilled (more productivity). Low-skill routine work gets displaced faster than jobs are created. Wealth concentrates.

Education becomes personalised (maybe): AI tutors could give every kid a personal teacher. Or it could accelerate the "rich kids with great tutors" advantage. Depends on policy.

Physical labour holds its ground: Robots are hard. Plumbing, construction, nursing, personal care don't get disrupted as fast as knowledge work.


Part 8: How to Future-Proof Yourself (Practical Advice)

What You Should Do Now

  1. Use AI daily. Become fluent. Understand its strengths and weaknesses. If you're not using it, you're already behind.
  2. Develop judgment. AI generates options; humans decide. The more options available, the more valuable judgment becomes. Learn to evaluate and choose well.
  3. Invest in relationships. The parts of your work that require human connection become more valuable as routine work gets automated. Nurture them.
  4. Learn to prompt well. Prompting is a skill. Get good at it. The difference between 80% output and 50% output is often just prompt quality.
  5. Specialise in judgment, not execution. If your job is 80% execution, it's at risk. If it's 80% judgment, it's valuable.

Skills That Will Become More Valuable

  • Complex reasoning — AI handles simple logic; complex tradeoffs still need humans
  • Human communication — Sales, negotiation, leadership — harder to automate
  • Domain expertise — Knowing your industry deeply lets you evaluate AI output critically
  • Ethical judgment — AI can't figure out what's right; humans must decide
  • Creativity under constraints — Generating ideas is easy; generating ideas that work is hard

Skills That Will Become Less Valuable

  • Rote memorization — AI knows everything; memory is less critical
  • Routine execution — Any repetitive task gets automated
  • Manual analysis — Data summarisation and pattern finding get done by AI
  • Rule-following — If it's formulaic, it gets automated
  • Generic writing — Mass-market content becomes commodity; unique voice matters more

Part 9: The Prediction Graveyard (What People Got Wrong About AI)

Just to calibrate: Here are predictions that missed badly:

| Prediction | When Made | Why It Missed |
| --- | --- | --- |
| "AI will replace radiologists by 2025" | 2018 | Harder than expected; AI assists but doesn't replace |
| "GPT-3 will write publication-quality research papers" | 2021 | Can draft, but needs serious human editing |
| "Autonomous trucks will be commonplace by 2023" | 2016 | Edge cases are hard; still waiting |
| "AI will achieve AGI by 2025" | 2015 | Moved goalposts; still not clear what AGI means |
| "ChatGPT will have zero job impact by 2025" | 2023 | Opposite problem; impact bigger than predicted |

Lesson: Don't trust confident predictions. Be skeptical of both the optimists and doom-sayers.


Part 10: How to Stay Current

AI moves fast. Here's how to stay informed without drowning in hype:

  1. Follow the benchmarks, not the headlines.
    • MMLU (general knowledge)
    • HumanEval (code)
    • GPQA (science)
    • ARC-AGI (general reasoning)
    • These measure real progress, not marketing
  2. Try new tools yourself (30 min/month).
    • Read about a new capability
    • Test it
    • Form your own opinion
    • Better than reading reviews
  3. Ignore the extremes.
    • AI apocalypse narrative: usually wrong about timelines
    • AI utopia narrative: usually underestimates human friction
    • Reality is in the middle
  4. Watch the tools, not just the models.
    • The model is the engine
    • The tool (UI, integration, agent framework) is what makes it useful
    • A 95% model in a 50% tool beats a 98% model in a 20% tool
  5. Read this guide periodically.
    • We update it quarterly
    • Tracks real progress, not hype
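
The "95% model in a 50% tool" comparison from the list above can be made concrete with a toy calculation. Multiplying the two scores is an assumption made here for illustration; the guide does not say how model and tool quality combine, and `effective_quality` is a made-up helper.

```python
def effective_quality(model_score: float, tool_score: float) -> float:
    """Combine model and tool quality multiplicatively (an assumed scoring rule)."""
    for score in (model_score, tool_score):
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must be between 0 and 1")
    return model_score * tool_score


strong_tool = effective_quality(0.95, 0.50)  # 95% model in a 50% tool
weak_tool = effective_quality(0.98, 0.20)    # 98% model in a 20% tool
print(f"{strong_tool:.3f} vs {weak_tool:.3f}")
```

Under this rule the slightly weaker model in the better tool scores more than twice as high, because the tool is the limiting factor in both cases.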

Conclusion: Living with Uncertainty

The honest truth: Nobody knows what AI will do in 5-10 years. Claims otherwise are speculation.

What we do know:

  • It's getting better at language, image, and code. That's measurable.
  • It's not sentient or conscious (probably). That's philosophy, not proof.
  • It's transforming some jobs now. That's happening.
  • Bigger disruptions are coming. That's likely.
  • Nobody has fully figured out how to build "AGI" yet. That's fact.

The best approach: Stay informed, experiment, adapt. Don't panic and don't ignore it. Be skeptical of both the hype and the doom-saying.

The future is unwritten. But you can start writing it today by getting good at using the tools that actually exist.

Last updated: June 2026. Next update: September 2026.