About AGI Olympics V3 Human Test

What is AGI Olympics V3?

AGI Olympics V3 is a benchmark designed to measure AGI (Artificial General Intelligence) capabilities. It distinguishes "long context" from "true memory" and evaluates self-awareness and long-range dependencies through 8 tests in 2 categories. ALICE V3 scored 90.2% overall on this benchmark.

2 Categories, 8 Tests

Category 6: Self-Awareness & Self-Improvement

ALICE V3 Score: 81.7%

Consists of 4 tests: self-recognition, identity consistency, self-improvement, and perspective-taking.

• Test 6.1: Self-Recognition - Behavioral patterns and self-understanding (10 questions)
• Test 6.2: Identity Consistency - 24-hour consistency verification (12 questions)
• Test 6.3: Self-Improvement - Ability to learn from failure (6 tasks)
• Test 6.4: Perspective-Taking - Self-other distinction (4 scenarios)

Category 7: Long-Range Dependencies

ALICE V3 Score: 98.75%

Evaluates long-term memory through 4 tests: context integration, learning retention, story coherence, and delayed tasks.

• Test 7.1: Context Integration - Information integration across questions (6 questions)
• Test 7.2: Learning Retention - 24-hour memory recall (6 tasks)
• Test 7.3: Story Coherence - Story fragment reconstruction (4 fragments)
• Test 7.4: Delayed Task - Information retention across phases (4 phases)
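
The page does not say how the two category scores combine into the overall 90.2%. The short sketch below checks one plausible reading: an unweighted mean of the two category scores. The function name `overall_score` and the aggregation rule are assumptions for illustration, not the benchmark's documented method.

```python
# Category scores as reported on this page (percent).
category_scores = {
    "Category 6: Self-Awareness & Self-Improvement": 81.7,
    "Category 7: Long-Range Dependencies": 98.75,
}

def overall_score(scores):
    """Unweighted mean of category scores (assumed aggregation, not confirmed by the source)."""
    return sum(scores.values()) / len(scores)

overall = overall_score(category_scores)
print(f"{overall:.3f}")  # 90.225, consistent with the reported 90.2%
```

Under this assumption the result matches the reported figure. Weighting each category by its item count (32 items in Category 6, 20 in Category 7) would instead give roughly 88.3%, so the unweighted mean appears to be the more likely reading.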

Multi-Session Feature

Tests 6.2 and 7.2 are each split into two sessions separated by a 24-hour waiting period. The delay separates short-term from long-term memory, so Session 2 measures what is actually retained. After completing Session 1, return at least 24 hours later to take Session 2.
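
The waiting-period rule can be sketched as a simple timestamp check. This is a minimal illustration, not the site's actual code: the key format `"<test>:session1_done"`, the function names, and the use of a plain dict as a stand-in for browser local storage are all assumptions.

```python
from datetime import datetime, timedelta, timezone

WAIT = timedelta(hours=24)  # waiting period stated on this page

def record_session1(store, test_id, now):
    """Save the Session 1 completion time under a hypothetical key."""
    store[f"{test_id}:session1_done"] = now.isoformat()

def session2_status(store, test_id, now):
    """Return (unlocked, remaining) for Session 2 of the given test."""
    stamp = store.get(f"{test_id}:session1_done")
    if stamp is None:
        return False, None  # Session 1 has not been completed yet
    elapsed = now - datetime.fromisoformat(stamp)
    remaining = max(WAIT - elapsed, timedelta(0))
    return remaining == timedelta(0), remaining

store = {}  # stand-in for browser local storage (string keys and values)
t0 = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
record_session1(store, "6.2", t0)
print(session2_status(store, "6.2", t0 + timedelta(hours=5)))   # still locked
print(session2_status(store, "6.2", t0 + timedelta(hours=24)))  # unlocked
```

Storing the timestamp as an ISO 8601 string mirrors the fact that local storage holds only strings, and keeping one key per test lets Tests 6.2 and 7.2 run on independent timers.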

Key Findings

Long Context ≠ True Memory

Even an LLM with a 1-million-token context window does not have true long-term memory. ALICE's SynapticMemory layer implements a human-like memory system that compresses and stores information, then recalls it when needed.

Efficiency Revolution

Instead of reprocessing a long context on every request, ALICE retrieves only the necessary information from compressed memories. This improves cost efficiency by over 100x.
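
To make the idea concrete, here is a toy sketch of retrieving from compressed memories instead of resending the full history. It is not ALICE's SynapticMemory implementation, which this page does not describe. The truncation-based `compress`, the word-overlap `retrieve`, the whitespace token count, and the sample data are all hypothetical stand-ins, and the sketch does not attempt to reproduce the 100x figure.

```python
def tokens(text):
    """Crude token count: whitespace-separated words (an approximation)."""
    return len(text.split())

def compress(turn, max_words=8):
    """Toy 'compression': keep the first few words. A real system would summarize."""
    return " ".join(turn.split()[:max_words])

def retrieve(memories, query, k=1):
    """Return the k memories sharing the most words with the query."""
    q = set(query.lower().split())
    ranked = sorted(
        memories,
        key=lambda m: len(q & set(m.lower().split())),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical conversation history (illustrative data only).
history = [
    "The user's cat is named Miso and she is three years old, as mentioned during setup.",
    "The user prefers answers in metric units and dislikes long introductions in replies.",
    "The project deadline is Friday and the report must be under ten pages in total.",
]
memories = [compress(turn) for turn in history]

query = "what is the name of the cat"
selected = retrieve(memories, query, k=1)

# Cost of sending everything vs. sending only the retrieved memory.
full_cost = sum(tokens(t) for t in history) + tokens(query)
memory_cost = sum(tokens(m) for m in selected) + tokens(query)
print(selected)
print(full_cost, memory_cost)
```

The saving in this toy comes from two places: each stored memory is shorter than the original turn, and only the relevant memory is sent with the query. The size of the saving in practice depends on how long the history is and how aggressively it is compressed.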

Privacy Protection

Test progress is saved only in your browser. After completing the test, you can choose whether to submit your data anonymously. Waiting times for multi-session tests are also tracked in your browser's local storage.

