Penetration testing is a security testing methodology that simulates intrusion attempts against systems and networks from an attacker's perspective, in order to assess the presence of exploitable vulnerabilities and their actual impact.
Penetration testing (commonly known as pentesting) is a test in which security professionals attempt to infiltrate a system using the same methods as real-world attackers. Whereas fuzzing is an automated technique that mechanically probes input boundaries, pentesting uses human judgment as its core to comprehensively explore the question: "If I were attacking this organization, which route would I take?" Its strength lies in its ability to cross-functionally evaluate both technical and business logic dimensions—including network configuration, authentication flows, privilege escalation possibilities, and social engineering.
Pentests are classified according to the amount of prior information given to the tester.
In black-box testing, the tester attempts to infiltrate from the outside with no knowledge of the target organization's internal information whatsoever—the scenario closest to an actual external attacker. In white-box testing, source code, network diagrams, credentials, and other information are shared in advance, enabling comprehensive verification. Gray-box testing falls in between, with conditions often set such that the tester has credentials for a general user but no knowledge of the internal architecture details.
In practice, black-box testing is commonly used to assess resistance to external intrusion, while white-box testing is used when running regular code-level diagnostics within a DevSecOps pipeline.
AI red teaming is a methodology for verifying risks specific to AI systems—such as prompt injection, training data poisoning, and output bias—and differs from conventional pentesting in what it evaluates. However, the boundary between the two is becoming increasingly blurred. In web applications that incorporate LLMs, conventional SQL injection and prompt injection via LLMs coexist within the same system, meaning pentesters are now expected to have knowledge of AI security as well.
The case of Claude Mythos autonomously chaining multiple vulnerabilities in the Linux kernel to achieve privilege escalation from a general user to root demonstrated that AI can replicate in a matter of hours what pentesters conventionally take days to accomplish. CyberGym's score (Mythos 83.1%) quantitatively corroborates this capability.
Full automation of pentesting is still some way off, but a division-of-labor model—in which AI performs initial scans to rapidly identify attack surfaces while human pentesters focus on areas requiring judgment—is already entering practical use. Researchers in bug bounty programs have also begun leveraging AI tools to explore attack surfaces.



A2A (Agent-to-Agent Protocol) is a communication protocol that enables different AI agents to perform capability discovery, task delegation, and state synchronization, published by Google in April 2025.

Acceptance testing is a testing method that verifies whether developed features meet business requirements and user stories, from the perspective of the product owner and stakeholders.

AES-256 is the highest-strength encryption algorithm using a 256-bit key length within AES (Advanced Encryption Standard), a symmetric-key cryptographic scheme standardized by the National Institute of Standards and Technology (NIST).

A mechanism that controls task distribution, state management, and coordination flows among multiple AI agents.

Agent Skills are reusable instruction sets defined to enable AI agents to perform specific tasks or areas of expertise, functioning as modular units that extend the capabilities of an agent.