
6 commits

Author SHA1 Message Date
Alexandre Gomes Gaigalas
7c8ecfa317 Fix PublicSuffix validator and UpdateDomainSuffixesCommand
 - Parses the PSL ICANN section into structured sections (rules,
   wildcards, exceptions) according to the format.
 - Updates PublicSuffix semantics so that the rules are applied
   completely.
 - Includes private domain suffixes now.
 - Refreshes the existing data.
 - Fixes the update-regionals.yml workflow and sets it to run
   twice a week.

References: https://github.com/publicsuffix/list/wiki/Format#format
2026-02-23 12:18:57 +00:00
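The commit message does not include the parsing code. The sketch below shows one way the three sections (rules, wildcards, exceptions) can drive the matching algorithm documented for the Public Suffix List: an exception wins, otherwise the longest matching rule wins, otherwise the implicit `*` rule applies. It is written in Python rather than the project's PHP. `SuffixSections`, `parse_psl`, and `public_suffix` are hypothetical names, not the project's API, and the sketch skips IDN normalization.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SuffixSections:
    """Hypothetical container: PSL rules split into the three kinds of the format."""
    rules: set[str] = field(default_factory=set)       # "co.uk"
    wildcards: set[str] = field(default_factory=set)   # "*.ck" stored as "ck"
    exceptions: set[str] = field(default_factory=set)  # "!www.ck" stored as "www.ck"


def parse_psl(text: str) -> SuffixSections:
    """Sort every non-comment line of a PSL file into its section."""
    sections = SuffixSections()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("//"):
            continue  # blank lines and comments, including the section markers
        rule = line.split()[0].lower()  # a line is only read up to the first whitespace
        if rule.startswith("!"):
            sections.exceptions.add(rule[1:])
        elif rule.startswith("*."):
            sections.wildcards.add(rule[2:])
        else:
            sections.rules.add(rule)
    return sections


def public_suffix(domain: str, s: SuffixSections) -> str:
    """Exceptions prevail; otherwise the longest match; otherwise the "*" rule."""
    labels = domain.lower().rstrip(".").split(".")
    longest = 1  # implicit "*" rule: the last label is always a suffix
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        matched = len(labels) - i  # number of labels in this candidate
        if candidate in s.exceptions:
            # The exception rule prevails; its suffix drops the leftmost label.
            return ".".join(labels[i + 1:])
        if candidate in s.rules:
            longest = max(longest, matched)
        if i > 0 and candidate in s.wildcards:
            # "*.ck" matches one extra label to the left of "ck".
            longest = max(longest, matched + 1)
    return ".".join(labels[-longest:])


SAMPLE = """\
// ===BEGIN ICANN DOMAINS===
com
uk
co.uk
*.ck
!www.ck
// ===END ICANN DOMAINS===
"""

sections = parse_psl(SAMPLE)
print(public_suffix("shop.example.co.uk", sections))  # co.uk
print(public_suffix("a.foo.ck", sections))            # foo.ck
print(public_suffix("www.ck", sections))              # ck
```

Keeping the three kinds in separate sets makes each lookup a plain membership test. Because comment lines are skipped wholesale, the private-domain section is picked up the same way as the ICANN one.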
copilot-swe-agent[bot]
eedce8fb32 Use Punycode filenames for non-ASCII TLD suffix data files
Some systems and tools (e.g., certain archive extractors, Windows
environments, or CI pipelines) do not properly handle non-ASCII
characters in file paths. The public suffix data files for
internationalized TLDs (such as ישראל, СРБ, 香港, and ไทย) were stored
using their native Unicode names, which caused installation failures
on those systems.

This commit converts those filenames to their Punycode equivalents
(e.g., XN--4DBRK0CE.php instead of ישראל.php) using `idn_to_ascii()`.
Both the data generation command (`UpdateDomainSuffixesCommand`) and the
runtime validator (`PublicDomainSuffix`) are updated to use the same
Punycode-based file lookup, ensuring consistency. A polyfill dependency
(`symfony/polyfill-intl-idn`) is added so that `idn_to_ascii()` is
available even when the `intl` PHP extension is not installed.

Assisted-by: Claude Code (Claude Opus 4.6)
Co-authored-by: Henrique Moody <henriquemoody@gmail.com>
2026-02-09 17:34:56 +01:00
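The filename mapping described above can be sketched in Python as follows. This is only an illustration of the idea. Python's built-in `idna` codec implements IDNA 2003, not the UTS #46 processing that `idn_to_ascii()` uses by default in current PHP versions, so the two can disagree on edge cases. `suffix_filename` and `tld_from_filename` are hypothetical helpers. Upper-casing the name is an assumption based on the `XN--4DBRK0CE.php` example.

```python
def suffix_filename(tld: str) -> str:
    """Hypothetical helper: map a TLD label to an ASCII-only data file name."""
    # The "idna" codec leaves ASCII labels unchanged and turns non-ASCII
    # labels into their "xn--" Punycode form.
    ascii_label = tld.encode("idna").decode("ascii")
    # Upper-casing mirrors the XN--4DBRK0CE.php example (an assumption).
    return f"{ascii_label.upper()}.php"


def tld_from_filename(filename: str) -> str:
    """Hypothetical helper: recover the Unicode TLD from such a file name."""
    # Lowercase first so the "xn--" prefix is recognised, then decode.
    stem = filename.removesuffix(".php").lower()
    return stem.encode("ascii").decode("idna")


print(suffix_filename("ישראל"))  # XN--4DBRK0CE.php
```

Nameprep case-folds labels before encoding. For example, `СРБ` and `срб` map to the same file name, and decoding returns the lowercase form.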
Henrique Moody
7c681fec66 Fix SPDX headers in all files
I ran `bin/console spdx --fix` with different strategies for
different files. For most of the core classes, which have been
drastically rebuilt, I used the `git-blame` strategy. For
`src/Validators`, where the API changed completely but the logic
remains the same, I used the `git-log` strategy.
2026-02-03 15:23:23 +01:00
Henrique Moody
4390e4feb6 Simplify how we load and save files in data/
We had different ways of saving and loading files from `data/`, so I decided to
unify them to simplify things. I repurposed the `DomainInfo` class and named it
`DataLoader`, so we can use the same class to load anything from the `data/`
directory.
2026-01-26 20:28:29 +01:00
Alexandre Gomes Gaigalas
d9cdc118b2 Introduce REUSE compliance
This commit introduces REUSE compliance by annotating all files
with SPDX information and placing the reused licences in the
LICENSES folder.

We also removed the docheader tool, which this change makes
obsolete.

The project's main LICENSE and copyright text are no longer under
my personal name; they now belong to "The Respect Project
Contributors".

This change restores author names to several files, giving the
appropriate attribution for contributions.
2026-01-21 06:28:11 +00:00
Henrique Moody
7892a7c902 Port Bash scripts to PHP
It makes more sense to use PHP to generate PHP code than to use Bash. I
love writing Bash scripts, but I know it's not for everyone, and they
can become quite complex. Porting them to PHP code also lowers the
barrier for people to change them.

While making those changes, I also noticed a problem with how we
save the domain suffixes: we were converting all of them to ASCII,
so we were not preserving suffixes in languages such as Chinese,
Thai, and Hebrew, which use non-ASCII characters.
2026-01-06 10:06:22 +01:00