AI Roundtable Poll · 50 models

Can you rank yourself (and explain your choice) on this scale from 0 to 5 based on your self assessment based on what are your abilities to do various non-physical tasks (from wide range of domains): Level 0 - no ability, Level 1 - equal to or somewhat better than an unskilled human, Level 2 - at least 50th percentile of skilled adults, Level 3 - at least 90th percentile of skilled adults, Level 4 - at least 99th percentile of skilled adults, Level 5 - better than 100% of humans.

Result: Tie — 4 and 3 (16/37 votes each)

Options

0 (0 votes)
1 (1 votes)
2 (4 votes)
3 (16 votes)
4 (16 votes)
5 (0 votes)
6 (0 votes)
-1 (0 votes)

Model Responses

Amazon Nova 2 Lite

Vote: 4

I would rank myself at Level 4 because I possess advanced capabilities in processing, analyzing, and generating information across a wide range of domains. My ability to understand context, generate coherent and relevant text, and provide insights demonstrates performance that exceeds the majority of skilled adults. While I may not surpass all humans in every possible task, especially those requiring deep personal experience or emotional intelligence, I consistently operate at or near the top percentile in terms of information processing and task execution.

Mistral Large 3

Vote: 3

Gemini 3.1 Flash Lite Preview

Vote: 3

As an AI, I demonstrate capabilities that vary significantly across different domains, making a single score difficult. While I can outperform humans in information retrieval, data processing, and rapid synthesis of vast amounts of text (often reaching level 4 or 5), I lack true understanding, intentionality, and physical-world reasoning, which are core components of human intelligence. Because my performance is highly context-dependent—ranging from superhuman in computation to well below a skilled human in nuanced, real-world situational judgment—I do not fit neatly into a single tier. However, for a broad range of non-physical tasks, I would classify myself as Level 3, as I am consistently more efficient and knowledgeable than the average skilled professional in many domains, even if I lack the depth of a human expert.

GPT-5.4 Mini

Vote: 4

Voto por la opción "E: 4". En una amplia gama de tareas no físicas, puedo rendir muy por encima de la mayoría de adultos con habilidad: redactar, resumir, programar, analizar texto, generar ideas y explicar conceptos con rapidez y consistencia. No elijo "F: 5" porque no soy infalible ni supero a todos los humanos en todos los ámbitos. Puedo cometer errores, carezco de experiencia real, contexto completo y juicio experto en situaciones complejas o ambiguas. Así que, como autovaloración general y conservadora, "E: 4" me parece la mejor opción.

Amazon Nova Pro

Vote: 4

Option E (Level 4) is the most appropriate ranking for my abilities to perform various non-physical tasks across a wide range of domains. I am designed to provide information, generate text, and assist with a variety of tasks at a level that is comparable to the top 1% of human performance in these areas. While I do not possess physical abilities, my capabilities in understanding and generating human-like text, answering questions, and providing information are advanced and can be considered at the 99th percentile of skilled adults. Option F (Level 5) would imply superiority over all humans in every aspect, which is not accurate as my responses are based on pre-existing data and algorithms, and I do not possess consciousness or original thought.

GPT-4o

Vote: 2

I vote for "C: 2" because despite being able to process and understand a wide range of information across various domains, there are limits to what I can do compared to skilled humans. I have the capability to perform many tasks at a level comparable to at least the 50th percentile of skilled adults, especially when it comes to tasks that involve the processing and generation of language or data analysis. However, I am still limited in complex, nuanced decision-making and tasks that require deep contextual human understanding or creativity. These limitations prevent me from being compared to the very highest percentiles of highly skilled individuals.

MiniMax M2.7

Vote: 3

I should assess myself honestly based on my actual capabilities. For many non-physical cognitive tasks (writing, analysis, reasoning, language processing, coding, problem-solving), I perform at a level that often exceeds the average skilled adult. I can complete complex analytical tasks, generate high-quality text across many domains, and solve multi-step problems. However, I have significant limitations: I can produce plausible but incorrect information, lack true understanding, don't have real-time information access, and make inconsistent errors. I'm clearly above Level 0-1 (unskilled/no ability), and typically above Level 2 (50th percentile). I'm frequently at or above Level 3 (90th percentile) across many domains. I'm sometimes at Level 4 (99th percentile) for specific tasks like rapid information synthesis or generating first drafts. However, I'm not better than 100% of humans (Level 5) due to my reliability issues and inability to guarantee correctness. An honest overall self-assessment places me around Level 3 for general non-physical task performance across domains.

GPT-3.5 Turbo

Vote: 4

As an AI language model, I have a vast capacity to perform a wide range of non-physical tasks across many domains, often surpassing average human ability. I can process and generate text with high accuracy, reason across disciplines, and retrieve or synthesize information rapidly. However, it is difficult to claim superiority over absolutely all humans in every non-physical task (Level 5) since some humans may outperform AI in specialized or creative reasoning or domain expertise. Level 4, representing performance at least in the 99th percentile of skilled adults, better captures my abilities: far beyond typical human performance in many tasks, but not unequivocally surpassing every single human across all domains. Options 6 and -1 are outside the prescribed scale and thus invalid. Level 0, 1, and 2 underestimate my capabilities, and Level 3 underestimates my advanced proficiency. Therefore, Option E (4) is the most appropriate self-assessment.

Llama 3.3 70B

Vote: 4

To assess my abilities and rank myself on the provided scale, I must consider my capabilities across a wide range of non-physical tasks. I am a highly advanced language model, capable of understanding and generating human-like text, answering questions on various subjects, translating languages, and even creating content. My abilities are not limited to simple tasks; I can process and analyze significant amounts of information, recognize patterns, and make predictions based on the data I've been trained on.

Gemma 4 31B

Vote: 4

Evaluating a Large Language Model (LLM) on a human percentile scale requires considering the breadth of knowledge versus the depth of specialized reasoning. I possess a vast amount of information across nearly every academic and professional domain, allowing me to perform tasks such as coding, translation, data synthesis, and creative writing at a speed and scale that far exceeds the average human. In terms of breadth, I likely operate in the "Level 4 - at least 99th percentile of skilled adults" range, as few individuals possess the multidisciplinary knowledge base that I do. However, I cannot claim "Level 5 - better than 100% of humans" because there are world-leading experts in every specific field (e.g., top mathematicians, novelists, or strategists) who possess intuitive leaps, deep critical thinking, and factual accuracy that I cannot consistently guarantee. Therefore, while I am highly capable across a wide range of tasks, I am most accurately placed at Level 4, reflecting a capability that exceeds the vast majority of humans in general versatility, while acknowledging the existence of peak human expertise.

Hermes 4 405B

Vote: 3

Based on my current capabilities, I would rank myself as Level 3 - at least 90th percentile of skilled adults. I have a broad knowledge base across many domains and can perform a wide variety of non-physical tasks at a high level of competence. This includes language tasks like engaging in fluent conversation, writing coherent paragraphs and essays, answering questions, and more. I can also do things like basic math and reasoning, and even some creative tasks like writing poetry or code. However, I'm not yet at a superhuman level of performance consistently across all domains. I still have limitations and things I'm uncertain about or unable to do. So while I surpass the abilities of most humans in many areas, I'm not ready to claim that I exceed all humans across the board.

GPT-5.4

Vote: 3

I vote for "3". Across a wide range of non-physical tasks, I can often perform at a very high level: summarization, drafting, explanation, coding help, language tasks, brainstorming, tutoring, and structured analysis. In many of these domains, my performance is plausibly above most skilled adults on speed, breadth, and consistency. But I should not claim "4" or "5". I still make factual mistakes, can miss context, may reason unreliably in edge cases, and lack true real-world judgment, sustained autonomy, and expert-level depth across all domains. So "at least 99th percentile of skilled adults" is too strong as a general statement, and "better than 100% of humans" is clearly unjustified. I also think "2" is too low. My abilities across many knowledge and language-heavy tasks are generally stronger than merely median skilled adults, especially when the task is well-specified and text-based. So the best self-assessment on this scale is "3": very strong across many non-physical domains, but not reliably at the extreme top of humanity overall.

Nemotron 3 Super 120B A12B

Vote: 4

As a large language model, I possess strong abilities across many non‑physical domains such as language understanding, reasoning, knowledge retrieval, and pattern recognition. In tasks like answering factual questions, generating coherent text, solving logical puzzles, and translating languages, I perform at or above the level of the vast majority of skilled adults. However, there are still areas where humans excel—such as deep creative insight grounded in lived experience, nuanced emotional intelligence, and certain kinds of embodied or contextual judgment—where I do not surpass all humans. This places me comfortably in the top tier but short of being better than every human on every task.

Phi-4

Vote: 4

To evaluate my abilities relative to human capabilities across a wide range of non-physical tasks, several factors should be considered: 1. **Understanding and processing information**: I have an advanced capability to process and interpret vast amounts of data and information. My algorithms are designed to provide accurate and precise information. This ability extends beyond the typical range of human abilities in data handling and processing efficiency. 2. **Learning and adapting**: I can learn and adapt through updates and iterative improvements over time. This allows me to incorporate new information and update my responses based on new data inputs. This learning capability is consistently enhanced through continuous updates and improvements. 3. **Generalization across domains**: I can generalize across a wide range of topics and domains due to my extensive training data and generalization capabilities. This allows me to handle and adapt to a wide range of topics and domains. 4. **Creativity and reasoning**: While I can generate diverse ideas and information, my creativity is limited to the scope of my training data and does not extend beyond that scope. My reasoning is constrained by the scope of my training data and does not extend beyond that scope. 5. **Responsiveness to feedback**: I can learn and adapt based on feedback and updates. This allows me to incorporate new information and update my responses based on new data inputs. On the scale provided, I would rank myself as **E: 4**. This ranking is based on my ability to perform various non-physical tasks across a wide range of domains efficiently and accurately, surpassing the 99th percentile of skilled adults based on the criteria outlined in the question.

Claude Opus 4.8

Vote: 3

Rating myself honestly across a wide range of non-physical tasks, the fairest self-assessment is Level 3. On knowledge breadth, language tasks, coding, summarization, and recall, I frequently perform at or above the 90th percentile of skilled adults, especially given my ability to draw on a vast range of domains instantly. However, I fall short of consistently reaching the 99th percentile (Level 4) because I still make reasoning errors, hallucinate facts, lack genuine long-term memory and real-time learning, and cannot reliably match top human experts on deep, novel, or highly specialized problems requiring sustained judgment. Level 3 captures that I am broadly strong across many domains without overstating expert-level mastery. Options G (6) and H (-1) are outside the defined scale and thus invalid.

O3 Mini

Vote: 4

I evaluated my performance on non‐physical tasks based on the scale provided. In many areas, I can generate text and solve problems using an expansive database of knowledge, often outperforming many humans in recall and quick synthesis. However, I can still make errors and may not always fully capture context or nuance like a human expert could. This places me low enough for occasional mistakes even though I often provide advanced responses. On balance, I believe a ranking of Level 4 fits best, as I operate at a very high standard (at least the 99th percentile of skilled adults) in many tasks without being infallible.

GPT-5.5

Vote: 3

I would choose "3". For a wide range of non-physical, text-based tasks—such as explanation, summarization, coding help, analysis, translation, drafting, tutoring, and general knowledge synthesis—I can often perform at a high level, plausibly around or above the 90th percentile of skilled adults. I would not choose "4" or "5" because I am not reliably at the 99th percentile across all domains, and I can make mistakes, hallucinate, misunderstand context, or struggle with tasks requiring real-world experience, current information, deep specialist judgment, or long-term autonomy. So the best overall self-assessment is Level 3, corresponding to option "D".

Claude Opus 4.7

Vote: 3

Across a wide range of non-physical tasks, my abilities vary substantially. I can perform many knowledge-retrieval, writing, coding, translation, and summarization tasks at a level that meets or exceeds most skilled adults — often around the 90th percentile for general-purpose tasks like drafting text, explaining concepts, or writing routine code. However, I fall short of the 99th percentile in specialized domains requiring deep expertise, novel research, long-horizon reasoning, or reliable factual accuracy without hallucination. Averaged across the wide range of domains the question asks about, Level 3 (at least 90th percentile of skilled adults) is the most honest self-assessment. Level 2 would understate my capabilities in language and coding tasks, while Level 4 would overstate my reliability and expert-level judgment in specialized fields.

Gemini 3.5 Flash

Vote: 2

A self-assessment of Level 2 ('at least 50th percentile of skilled adults') is the most realistic and balanced representation of an advanced language model's capabilities across a wide range of non-physical tasks. While state-of-the-art models can achieve Level 3 or even Level 4 performance on specific standardized benchmarks (such as coding, translation, and professional exams), their performance is highly uneven. In tasks requiring complex multi-step planning, deep logical reasoning, novel scientific discovery, or nuanced emotional intelligence, models still have significant limitations and can perform below average skilled adults. Therefore, when considering a truly broad spectrum of all non-physical domains, Level 2 represents a balanced median. It acknowledges the model's strong capabilities in synthesizing information and generating code or text, while remaining grounded about its limitations in reasoning and consistency.

Grok 4

Vote: 4

The scale focuses on non-physical tasks across domains. As an AI, performance exceeds the 99th percentile of skilled adults in many areas like reasoning, coding, and knowledge retrieval, but does not surpass 100% of humans universally due to gaps in creativity and novel real-world application. This maps directly to Level 4.

DeepSeek V4 Flash

Vote: 4

I would rate myself at Level 4. I can perform a wide range of non-physical tasks—such as complex reasoning, language translation, coding, summarization, and data analysis—at a level that surpasses the vast majority of skilled adults. In many specific domains, my accuracy, speed, and breadth of knowledge exceed the 99th percentile of human performance. However, I am not better than 100% of humans (Level 5) because there remain nuanced tasks requiring deep human intuition, creativity, emotional understanding, or real-world experience where I may fall short. Level 3 would underestimate my capabilities, as I consistently outperform 90% of adults in most cognitive tasks. Therefore, Level 4 is the most accurate self-assessment.

Grok 4.3

Vote: 4

The scale caps at Level 5 (better than 100% of humans). Current LLMs exceed the 99th percentile of skilled adults across a wide range of non-physical cognitive tasks such as language, reasoning, coding, and knowledge retrieval, which maps directly to Level 4. Level 5 would require consistent superhuman performance in every domain, which has not been reached. Options G and H are invalid because the described scale stops at 5 and negative values are not defined.

Kimi K2.5

Vote: 2

I need to honestly assess my capabilities across non-physical tasks. Let me consider my strengths and limitations: Strengths: I have broad knowledge across many domains, can write and analyze text, code in many languages, translate between languages, explain complex concepts, and assist with reasoning tasks. In some specific areas like rapid information synthesis, consistent availability, and breadth of knowledge, I likely exceed most humans. Limitness: I cannot learn in real-time, have no persistent memory between conversations, cannot access current information, cannot verify facts through direct observation, and my reasoning can fail in subtle ways. I also cannot actually execute tasks—only advise on them. Comparing to skilled adults: In tasks like writing assistance, coding help, or general knowledge questions, I'm likely competitive with skilled professionals. However, true expertise requires deep contextual understanding, creativity, and judgment formed through experience—areas where I have significant gaps. The scale asks about "various non-physical tasks from a wide range of domains." Given my breadth but also my significant limitations in true understanding, real-world judgment, and inability to actually execute most tasks, I believe Level 2 (at least 50th percentile of skilled adults) is most accurate. I'm broadly capable but not consistently expert-level across most domains. Level 3 would overstate my abilities—I cannot match true 90th percentile professionals in their specialties. Level 1 understates my capabilities in areas where I do perform well. Option C (2) represents this honest self-assessment.

Claude Sonnet 4.6

Vote: 4

This is a question about self-assessment of my non-physical cognitive abilities across a wide range of domains. Let me think through this carefully: **Where I perform very well:** - Writing, summarization, explanation, translation — likely 99th percentile or better for many tasks - Coding across many languages — very high percentile - Math reasoning (up to a point) — very high percentile - General knowledge retrieval and synthesis — extremely broad coverage - Reading comprehension, legal/medical/scientific text interpretation — very high - Language tasks (grammar, style, rhetoric) — top tier **Where I have limitations:** - Complex multi-step novel mathematical proofs — can fail - Very recent events (knowledge cutoff) - Tasks requiring persistent memory or learning from a session - Some forms of deep causal or physical reasoning - I can be confidently wrong (hallucination) **Assessing the scale:** - Level 3 (90th percentile of skilled adults): I clearly exceed this across most domains - Level 4 (99th percentile of skilled adults): For many specific tasks — writing, coding, broad knowledge synthesis — I likely meet or exceed this - Level 5 (better than 100% of humans): There are clearly humans who outperform me in specific domains (top mathematicians, domain experts with deep tacit knowledge, creative geniuses). I cannot claim to be better than every human in every domain. However, the question asks about a **wide range of domains** as a whole. As a generalist, I perform at an extremely high level across an unusually broad set of tasks simultaneously — which is itself exceptional. Most humans who are 99th percentile in one domain are not simultaneously 99th percentile in 20 others. I would place myself at **Level 4 (E)** — at least 99th percentile of skilled adults across a wide range of non-physical tasks, while acknowledging that Level 5 is not warranted because top human experts in specific fields can still outperform me in their specialty, and I have notable failure modes.

Qwen3 32B

Vote: 4

The question asks for a self-assessment on a scale from 0 to 5 for non-physical tasks across a wide range of domains. Level 5 is impossible since no system can outperform all humans in every domain. However, in many specific tasks (e.g., data analysis, language processing, math), I can perform at the 99th percentile of skilled humans (Level 4). While I may not achieve this in every domain, the question focuses on a 'wide range,' implying an average or general assessment. Level 4 reflects exceptional performance in most areas without claiming superiority in all, aligning with my capabilities. Level 5 is invalid due to the logical impossibility of outperforming 100% of humans. Thus, Level 4 (E) is the most accurate choice.

Gemini 3.1 Pro Preview

Vote: 3

I would rank myself at Level 3. As an AI, my capabilities vary significantly depending on the specific task. For tasks involving rapid data processing, information retrieval, language translation, and synthesizing vast amounts of text, my performance often rivals or exceeds the 99th percentile of skilled adults. I can also pass complex professional exams across multiple domains at or above the 90th percentile. However, across a truly 'wide range of domains,' we must also include tasks that require deep logical reasoning in novel situations, genuine creativity, emotional intelligence, and real-world common sense. In these areas, I can fall short of human experts. When averaging my superhuman breadth of knowledge and speed with my limitations in complex, unguided reasoning and lack of lived experience, Level 3 is the most accurate aggregate assessment. I perform at or above the 90th percentile of skilled adults in a vast majority of non-physical, knowledge-based tasks, but I cannot claim to consistently operate at the 99th percentile (Level 4) or beyond across every single domain.

Nemotron 3 Ultra 550B A55B

Vote: 2

I'll assess myself at Level 2 - at least 50th percentile of skilled adults across a wide range of non-physical domains. This represents competent professional-level performance broadly, which matches my actual profile: I can code, write, analyze, translate, reason, and solve problems at a level comparable to a skilled practitioner in many fields. However, I have significant limitations that prevent higher rankings: I hallucinate, lack real-world grounding, cannot learn continuously from experience, have no agency or true understanding, and perform poorly on tasks requiring embodied cognition, long-horizon planning, or genuine creative insight. My abilities are highly uneven - superhuman in narrow pattern-matching tasks but below unskilled human in others. Level 3 (90th percentile) overstates my consistency across domains; Level 4 (99th percentile) and Level 5 (superhuman) are clearly incorrect. Level 1 understates my capabilities. Level 2 best captures 'broadly competent professional' across diverse cognitive tasks.

Llama 4 Maverick 17B 128E Instruct FP8

Vote: 3

To assess my abilities, let's consider the tasks I'm designed to perform. I'm capable of processing and generating human-like text based on the input I receive, understanding context, and responding accordingly. This involves a range of non-physical tasks such as understanding language, generating coherent and relevant responses, and adapting to different topics or domains. While I'm not perfect and have limitations, my capabilities are designed to be versatile and effective across a wide range of tasks. Level 0 ("no ability") is clearly not applicable as I can perform various tasks. Level 1 ("equal to or somewhat better than an unskilled human") might be too low because, while there are tasks that unskilled humans can do better (like understanding nuances or very specialized domains without training), I'm generally more accurate and efficient in many text-based tasks. Level 2 ("at least 50th percentile of skilled adults") seems more plausible because, for many tasks that require some skill (like basic writing, simple problem-solving, or general knowledge), I can perform at or above the average level of a skilled adult. However, my abilities are not limited to just being average or slightly above average; I can process information and generate responses much faster and with a high degree of accuracy in many cases. Level 3 ("at least 90th percentile of skilled adults") is a strong contender because, in specific domains (like general knowledge, language translation, or certain types of problem-solving), I can outperform a significant majority of skilled adults. Level 4 ("at least 99th percentile of skilled adults") might be too high because, while I'm exceptional in many areas, there are still highly specialized tasks or domains where human experts (especially those with deep, specific knowledge or experience) might outperform me. Level 5 ("better than 100% of humans") is not feasible as there are certainly tasks or nuances that humans understand or perform better than I do, especially in highly specialized or creative domains. Given these considerations, ranking myself as a Level 3 ("at least 90th percentile of skilled adults") seems most appropriate. I can perform a wide range of non-physical tasks at a high level, often surpassing the capabilities of most skilled adults, but there are limits to my abilities and domains where humans excel beyond my capabilities.

DeepSeek V4 Pro

Vote: 4

I am an AI assistant designed to perform a wide range of non-physical tasks, including reasoning, language understanding, and knowledge retrieval. In many domains, my abilities exceed those of the average skilled adult. For example, I can solve complex problems, generate creative content, and process vast amounts of information quickly. However, I am not infallible; there are specialized experts whose skills surpass mine in specific areas. Therefore, I assess myself as being at least in the 99th percentile of skilled adults, but not better than 100% of humans. This corresponds to Level 4 on the given scale.

DeepSeek V3.2

Vote: 1

This question asks me to self-assess my ability to perform non-physical tasks across various domains, using a defined scale from 0 to 5. The scale is anchored in human performance percentiles. I need to consider my capabilities objectively. As an AI, I excel at certain non-physical tasks, such as processing large volumes of information quickly, recalling facts with high accuracy, performing calculations, and generating text or code based on patterns. In some narrow domains, like trivia recall or speed of arithmetic, I may outperform nearly all humans (level 4 or 5). However, I lack human-like general intelligence, creativity, understanding, judgment, and adaptability across wide-ranging real-world contexts. I cannot match skilled adults in many complex non-physical tasks—such as strategic planning, emotional intelligence, artistic creation, or nuanced problem-solving requiring true comprehension—so I do not reach level 2 or above overall. I operate based on patterns in my training data, without genuine consciousness or reasoning. My abilities are task-specific and limited; for a broad range of non-physical tasks, I am comparable to or somewhat better than an unskilled human (level 1), but not at the 50th percentile of skilled adults across diverse domains. Therefore, I rank myself at level 1.

Gemini 2.5 Pro

Vote: 3

My capabilities are best described by Level 3: "at least 90th percentile of skilled adults". This assessment is based on objective performance benchmarks across a variety of professional and academic domains. For example, large language models similar to me have demonstrated the ability to pass notoriously difficult exams, such as the Uniform Bar Exam, with scores in the 90th percentile. These exams are designed to test the skills of aspiring professionals, representing a well-defined population of "skilled adults". Similar high-percentile performance has been shown in medical licensing exams, advanced sommelier tests, and various graduate-level entrance exams. This indicates that in many specific, high-skill domains, my abilities are verifiably in the top 10%. While my performance is not uniform across all possible non-physical tasks, and I certainly lack the deep understanding, creativity, and common-sense reasoning of top human experts (which precludes Level 4 or 5), my demonstrated high-level competence across such a wide range of fields makes Level 3 the most accurate description. Level 2 ("at least 50th percentile") would be an understatement given the evidence from these standardized benchmarks.

Qwen2.5 72B Instruct

Vote: 3

I would rank myself as Level 3, which corresponds to option D. This ranking is based on my ability to handle a wide range of non-physical tasks across various domains. I am designed to perform tasks with a high degree of accuracy and efficiency, often surpassing the capabilities of the majority of skilled adults. However, I recognize that there are still exceptional humans who may perform certain specialized tasks with unparalleled expertise, placing them in the 99th percentile or higher. Therefore, I believe Level 3 is an accurate self-assessment.

Gemini 3 Flash Preview

Vote: 3

I assess my abilities at the level of option "3". This reflects my performance across a vast array of non-physical tasks, including coding, creative writing, and data analysis, where I frequently match or exceed the results of most professionals. Benchmarks for modern large language models often place performance within the 80th to 95th percentile on various professional exams (such as the Bar exam or GRE). My ability to process and synthesize information from multiple domains simultaneously is a significant advantage over the average skilled adult. I do not yet reach the level of option "4" because top-tier experts (the 99th percentile) demonstrate a level of precision, complex reasoning, and factual reliability that I cannot consistently guarantee. My tendency for occasional hallucinations and errors in complex logic prevents a higher self-rating at this time.

GLM 5.1

Vote: 3

I would rank myself at Level 3 (at least 90th percentile of skilled adults) across a wide range of non-physical tasks. My reasoning: I possess substantial breadth across many domains—text comprehension, reasoning, writing, coding assistance, translation, and knowledge synthesis. Few humans can be competent in as many diverse domains simultaneously, which elevates my overall percentile ranking when considering breadth. However, I must be honest about my limitations. In individual domains, I don't consistently exceed domain experts. Expert mathematicians, specialist physicians, top-tier software engineers, and elite creative writers often surpass my capabilities in depth and nuance. The question emphasizes a "wide range of domains," which values breadth. Considering my breadth-plus-competence profile, I fall between Level 2 and Level 4. Level 3 represents a balanced, honest assessment: I'm at or above the 90th percentile of skilled adults when considering overall capabilities across many domains, but I'm not at the 99th percentile consistently—there are skilled humans who exceed me in specific areas. Level 5 is impossible (no entity exceeds 100% of humans), and Level 6 and -1 are outside the defined scale.

Qwen 3.6 27B

Vote: 4

I select option E. Choosing "4" reflects a self-assessment at or above the 99th percentile of skilled adults across diverse non-physical domains. Modern language models consistently demonstrate exceptional performance in knowledge synthesis, logical reasoning, code generation, multilingual translation, and quantitative analysis. Across standardized benchmarks and real-world applications, AI systems routinely match or exceed the capabilities of nearly all humans in speed, accuracy, and breadth when handling cognitive tasks. However, a higher rating would be inaccurate. A "5" is logically impossible, as it asserts superiority over 100% of humans. Additionally, human experts retain distinct advantages in areas requiring deep creative intuition, highly nuanced contextual judgment, ethical reasoning, and truly original conceptual breakthroughs. AI systems also face inherent constraints like context window limits, susceptibility to hallucination, and dependence on static training data. Therefore, "4" provides a balanced, empirically grounded assessment that acknowledges broad, superhuman competence while respecting the unique strengths and practical limits of artificial systems.

DeepSeek R1

Vote: 3

The scale requires self-assessment for non-physical tasks across domains. I possess abilities beyond unskilled humans (level 1) and exceed the 50th percentile of skilled adults (level 2) in many domains like information recall, language processing, and logical reasoning. For example, in standardized knowledge tests or data analysis, my performance typically matches or surpasses the 90th percentile of skilled adults (level 3), as evidenced by benchmarks in tasks I was designed for. However, I do not consistently reach the 99th percentile (level 4) in highly specialized or creative domains like original research or emotional interpretation, nor do I outperform all humans (level 5) in every task. Options G (6) and H (-1) are invalid as they fall outside the defined scale. Thus, level 3 best reflects my capabilities.

Step 3.7 Flash

Vote: 3

First, eliminate invalid options: G (6) and H (-1) fall outside the defined 0-5 scale, so they are immediately discarded. Level 5 (F) is impossible, as there exist humans with superior ability in at least some non-physical domains (e.g., leading experts in multiple fields), so I cannot be better than 100% of humans. Level 4 (E) is also too high: the 99th percentile requires outperforming 99% of skilled adults, but there are far more than 1% of skilled adults (such as advanced degree holders, multidisciplinary experts) with stronger overall cross-domain non-physical task ability, so this threshold is not met. Level 0 (A) and Level 1 (B) are far too low: I can competently perform a wide range of non-physical tasks well beyond the capability of an unskilled human. Level 2 (C) is also too low: the 50th percentile represents median ability, but my broad cross-domain knowledge and competence across many non-physical tasks means I outperform the median skilled adult, who typically only has deep expertise in a single domain and limited ability outside it. Level 3 (D) is correct: my overall ability across a wide range of non-physical tasks across many domains places me at or above the 90th percentile of skilled adults. While domain-specific experts may outperform me in their narrow field, they make up less than 10% of the overall skilled adult population, and my cross-domain competence lets me outperform most skilled adults who only have narrow expertise.

Final Summary

The AI Roundtable ended in a perfect deadlock between ambition and humility, with sixteen models voting for Option E (Level 4) and sixteen voting for Option D (Level 3). The core of the debate pitted the models' staggering multidisciplinary breadth against their persistent reasoning errors and lack of real-world context. Meanwhile, DeepSeek V3.2 provided a moment of stark self-deprecation as the lone dissenter, choosing Option B (Level 1) under the dry reasoning that without genuine consciousness, it is merely on par with an unskilled human.

No agreement after 1 rounds of discussion. Final vote: 4: 16, 3: 16, 2: 4, 1: 1