Notwithstanding these advantages, an understanding of DMARC's practical adoption and configuration patterns is essential. To this end, we conducted a comprehensive, large-scale analysis of the German country-code top-level domain (.de), examining the DNS records of over 11.6 million active .de domains to determine the prevalence of DMARC deployment, assess the configured policies, and gauge the email authentication maturity of a prominent country-code top-level domain.
Analyzing DMARC deployment, even within a single top-level domain, is a significant undertaking. We therefore devised a methodology emphasizing large-scale data acquisition and rigorous analytical procedures. This study concentrates exclusively on organizational domains registered under the `.de` top-level domain, explicitly excluding subdomains.
Data Acquisition
We aggregated domain names from multiple sources to build a comprehensive list. These included the Common Crawl URL index (specifically, version CC-MAIN-2024-30), various publicly available domain lists, and commercial domain datasets (versions current as of March 2025). We did not use .de zone files for this study, as DENIC does not generally share them.
The raw lists were processed to ensure quality and uniqueness. We performed domain validity checks using the Public Suffix List to filter out invalid entries and focused exclusively on organizational domains.
These aggregated lists may contain domains that have since expired or were never registered. Furthermore, some sources might include honeypots or digital watermarks (unique domain names inserted to track list usage), although we have no direct knowledge of such entries in our specific source mix.
Data Collection
We used massdns, a high-performance DNS stub resolver capable of handling massive numbers of concurrent lookups.
We intentionally selected a low-concurrency profile for massdns, with the parameters `hashmap-size 2`, `resolve-count 3`, and `retry never`. This configuration was chosen to minimize the load on the public resolvers. Initial testing indicated that retries were extremely rare with this setup, making the `retry never` setting acceptable.
To ensure reliability and avoid overloading any single provider, we configured massdns to query a pool of 27 trusted, public DNS resolvers, including well-known services like Google Public DNS (8.8.8.8), Cloudflare (1.1.1.1), and Quad9 (9.9.9.9). The queries were distributed across multiple AWS EC2 Spot instances to manage the workload effectively.
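Assuming massdns's long-form flags for the parameters listed above, an invocation might look roughly as follows (file names are placeholders, not our actual paths):

```shell
# Sketch of a massdns run with the low-concurrency profile described above.
# resolvers.txt holds the pool of 27 public resolver addresses.
massdns --resolvers resolvers.txt \
        --hashmap-size 2 \
        --resolve-count 3 \
        --retry never \
        -t TXT \
        -o J \
        -w responses.ndjson \
        domains.txt
```

The `-o J` output mode emits one JSON object per response, which is what the processing stage described below consumes.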
All DNS responses (including successes, timeouts, errors like NXDOMAIN, etc.) were logged in the newline-delimited JSON (ndjson) format, including metadata like Time-To-Live (TTL) values and the specific Resource Record (RR) data within the ndjson output. The complete raw dataset amounted to approximately 20 GB.
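A minimal sketch of how one such ndjson line can be reduced to (name, TTL, RR data) tuples is shown below; the field layout approximates massdns's JSON output mode, and the sample record is fabricated for illustration:

```python
import json

# One fabricated massdns-style ndjson line (field layout approximates
# massdns's JSON output; values are illustrative).
line = json.dumps({
    "name": "_dmarc.example.de.",
    "type": "TXT",
    "status": "NOERROR",
    "data": {"answers": [{
        "name": "_dmarc.example.de.",
        "type": "TXT",
        "ttl": 3600,
        "data": "\"v=DMARC1; p=reject\"",
    }]},
})

def extract_records(raw_line: str) -> list[tuple[str, int, str]]:
    """Return (name, ttl, rdata) for each answer in one response line."""
    obj = json.loads(raw_line)
    if obj.get("status") != "NOERROR":
        return []  # timeouts, NXDOMAIN, etc. are logged but carry no answers
    return [(a["name"], a["ttl"], a["data"])
            for a in obj.get("data", {}).get("answers", [])]

records = extract_records(line)
```

Error responses are kept in the raw log (they matter for coverage statistics) but contribute no resource records to the aggregated dataset.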
Due to the parallel nature of data collection across multiple instances, it was possible for the same domain to be queried more than once. These duplicate DNS response entries were identified and handled during the later data processing phase.
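Deduplication can be sketched as keeping the first response seen per (name, record type) pair; the sample responses below are fabricated for illustration:

```python
# Sketch: collapse duplicate responses for the same (name, type) pair,
# keeping the first response seen (a deterministic choice).
responses = [
    {"name": "example.de.", "type": "MX", "status": "NOERROR"},
    {"name": "example.de.", "type": "MX", "status": "NOERROR"},  # duplicate
    {"name": "example.de.", "type": "TXT", "status": "NOERROR"},
]

deduped: dict[tuple[str, str], dict] = {}
for resp in responses:
    deduped.setdefault((resp["name"], resp["type"]), resp)

unique = list(deduped.values())  # two entries remain
```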
For each domain in our list, we specifically queried for the following DNS record types:
- MX (Mail Exchanger) records for the domain itself.
- TXT (Text) records for the domain itself.
- TXT records for the corresponding _dmarc subdomain (e.g., _dmarc.example.de) to retrieve the DMARC policy.
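The expansion of each organizational domain into these three queries can be sketched as follows (the helper name is illustrative):

```python
# Sketch: expand one organizational domain into the three queries issued.
def queries_for(domain: str) -> list[tuple[str, str]]:
    return [
        (domain, "MX"),               # mail exchangers
        (domain, "TXT"),              # SPF and other TXT records
        (f"_dmarc.{domain}", "TXT"),  # DMARC policy record
    ]

qs = queries_for("example.de")
```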
The DNS data collection occurred over a focused period between April 17th and April 23rd, 2025. Querying the MX and organizational TXT records took approximately 4 days, while the specific _dmarc TXT record lookups were completed in 2 days.
Data Processing
The raw ndjson data was processed using custom Python scripts. Each line (representing a JSON object for a single DNS request and response) was parsed and analyzed. The relevant information was aggregated into a structured format suitable for analysis.
Recognizing the complexities and potential variations in DMARC record syntax, we developed a dedicated Python library specifically for this project. This library was designed for robustness against parsing errors, adhering strictly to the DMARC specification outlined in RFC 7489.
For SPF records (found within TXT records at the organizational domain level), we performed targeted analysis rather than full, recursive parsing. Our scripts focused on identifying the presence of SPF records and extracting key properties, such as the qualifier of the `all` mechanism (`~all`, `-all`, `?all`). SPF `include:` mechanisms were not resolved, as doing so would have required additional DNS requests.
To ensure the accuracy of our collection and processing pipeline, we performed manual validation checks. A random sample of domains was selected, and their relevant DNS records (MX, TXT, _dmarc.TXT) were queried manually using the standard dig command-line tool. The results obtained via dig were compared against the data processed by our framework to confirm consistency and correctness.
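The manual spot checks amount to lookups of the following form, with `example.de` standing in for a sampled domain:

```shell
# Spot check: manually re-query the three record types for a sampled
# domain and compare against the framework's processed output.
dig +short MX example.de
dig +short TXT example.de
dig +short TXT _dmarc.example.de
```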
We took procedural measures to minimize the operational burden on DNS servers during data acquisition. The datasets produced by this methodology are derived from publicly accessible records and are themselves openly available. Moreover, all collected information was handled in accordance with ethical standards and data protection practices throughout the process.