Published research results from Project A.L.I.C.E.
This study proposes and publicly releases "AGI Olympics V3," a comprehensive benchmark for measuring AGI (Artificial General Intelligence) capabilities. We designed a complete evaluation framework with a four-tier structure (Tier 1: Self-Awareness & Self-Improvement, Tier 2: Core Capabilities, Tier 3: Consciousness, Tier 4: Long-Term Memory). This paper reports detailed evaluation results for Tier 1 and Tier 4 only. The benchmark's validity is demonstrated by evaluating three systems (A.L.I.C.E. V3, Gemini 2.5 Pro, Claude Sonnet 4.5). A key finding is that "long context ≠ true memory." Test questions, evaluation protocols, and implementation guides are publicly available on the Extoria official website.
We present ALICE, a 28 MB multi-agent system that explores the six unsolved Millennium Prize Problems through collaborative mathematical research. Unlike traditional AI approaches that rely on massive parameter scaling, ALICE demonstrates that weak individual agents (24.5% baseline accuracy) can achieve expert-level performance (82.2% accuracy) through structured collaboration. This is a 3.35× performance gain through synergy alone, validating our principle: "When weak AIs collaborate, expert intelligence emerges." Through three key contributions (observation-driven formalization, autonomous tool creation, and resource efficiency), we present a paradigm shift in mathematical AI.
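As a quick check of the arithmetic behind the quoted gain, the sketch below recomputes the ratio from the two accuracy figures in the abstract above. The function and variable names are illustrative assumptions, not part of ALICE. The ratio comes out at roughly 3.355, in line with the reported 3.35× up to rounding of the published accuracies.

```python
# Accuracy figures quoted in the abstract (percent).
BASELINE_ACCURACY = 24.5       # individual weak agents
COLLABORATIVE_ACCURACY = 82.2  # agents under structured collaboration


def relative_gain(baseline: float, improved: float) -> float:
    """Return how many times larger `improved` is than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return improved / baseline


def absolute_gain(baseline: float, improved: float) -> float:
    """Return the difference in percentage points."""
    return improved - baseline


gain_ratio = relative_gain(BASELINE_ACCURACY, COLLABORATIVE_ACCURACY)
gain_points = absolute_gain(BASELINE_ACCURACY, COLLABORATIVE_ACCURACY)

print(f"relative gain: {gain_ratio:.3f}x")
print(f"absolute gain: {gain_points:.1f} percentage points")
```

The relative gain is the figure the abstract cites. The absolute difference in percentage points is shown alongside it only as a second way of reading the same two numbers.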
This study evaluates the autonomous defense capabilities of the ALICE Level 4 system over 1,000 rounds in a true adversarial co-evolution environment. Using ALICE as both the attacker and the defender AI, we conducted empirical experiments with a fully black-box methodology. The ALICE defense system achieved a 99.92% defense success rate, allowing only 7 breaches out of nearly 7,000 attacks. It also recorded zero false positives against 2,000 normal requests, a level of precision suitable for practical use. Although the attacker generated 438 mutation patterns, the defender's online learning outpaced the attacker's evolution and effectively neutralized attacks within just 100 rounds. This research demonstrates dynamic defense capabilities that conventional static security testing cannot show, and indicates the deployment readiness of autonomous AI systems.
These papers evaluate A.L.I.C.E. based solely on observable behaviors and outputs, without disclosing its internal implementation. Architectural details remain undisclosed for ethical and security reasons.