135 points | by todsacerdoti 2 days ago
There are several free ASN MMDBs [2,3] but you can also build your own MMDB files from any Prefix->Value mapping with the mmdbwriter library [4] or a CLI tool built on top of it like mmdbctl [5].
Assuming the ASN MMDB is fully loaded in memory, it would use around 60MB.
[1] https://maxmind.github.io/MaxMind-DB/
[2] https://dev.maxmind.com/geoip/docs/databases/asn/
[3] https://ipinfo.io/products/free-ip-data-downloads
[4] https://github.com/maxmind/mmdbwriter
[5] https://github.com/ipinfo/mmdbctl
(I work for IPinfo, but there are lots of other companies offering MMDB files).
… only if you pretend IPv6 does not exist, which isn't the case in the original blogpost.
But the idea can be extended to fit entire IP subnets (both v4 and v6) in a uint64. Given that the smallest IPv6 network that is globally routed is a /48, you could use the first byte of the uint64 to mark the address type (e.g. 0x4 for IPv4 and 0x6 for IPv6), the next 3 (for IPv4) or 6 (for IPv6) bytes for the actual prefix, and the last byte for the prefix length.
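A minimal Go sketch of that layout, assuming the tag sits in the top byte, the prefix bytes follow, and the length sits in the lowest byte. The names `packPrefix` and `packCIDR` are made up for illustration, and the /24 limit for IPv4 is an assumption that mirrors the /48 limit for IPv6:

```go
package main

import (
	"fmt"
	"net/netip"
)

// Address-family tags stored in the top byte, as suggested above.
const (
	tagV4 uint64 = 0x4
	tagV6 uint64 = 0x6
)

// packPrefix lays a prefix out as [tag:8][prefix bytes:48][length:8].
// It accepts IPv4 prefixes up to /24 and IPv6 prefixes up to /48.
func packPrefix(p netip.Prefix) (uint64, error) {
	p = p.Masked() // zero the host bits so equal prefixes get equal keys
	addr := p.Addr()
	var key uint64
	switch {
	case addr.Is4() && p.Bits() <= 24:
		b := addr.As4()
		key = tagV4<<56 | uint64(b[0])<<48 | uint64(b[1])<<40 | uint64(b[2])<<32
	case addr.Is6() && !addr.Is4In6() && p.Bits() <= 48:
		b := addr.As16()
		key = tagV6 << 56
		for i := 0; i < 6; i++ {
			key |= uint64(b[i]) << (48 - 8*i)
		}
	default:
		return 0, fmt.Errorf("prefix %v does not fit in 64 bits", p)
	}
	return key | uint64(p.Bits()), nil
}

// packCIDR is a convenience wrapper that parses CIDR notation first.
func packCIDR(s string) (uint64, error) {
	p, err := netip.ParsePrefix(s)
	if err != nil {
		return 0, err
	}
	return packPrefix(p)
}

func main() {
	for _, s := range []string{"192.0.2.0/24", "2001:db8:1::/48", "192.0.2.128/25"} {
		key, err := packCIDR(s)
		fmt.Printf("%-18s key=%#x err=%v\n", s, key, err)
	}
}
```

Anything longer than /24 or /48 is rejected rather than truncated, so a key never silently stands for a different network.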
The number 24 is larger, but the network is smaller.
The "/24" you refer to is a "prefix length", i.e. a network prefix being advertised by BGP.
store the entire thing in one big array, indexed by the uint24 value of the /24
so ASN for 0.0.0.0 stored in position 0, ASN for 0.0.1.0 stored in position 1, etc
and you can drop 0.0.0.0/8, 10.0.0.0/8, 127.0.0.0/8, 224.0.0.0/4, 240.0.0.0/4 to save 35 /8s
(256-16-16-3)*256*256*4 bytes needed -> about 55 MiB
mmap it if necessary
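A rough Go sketch of the flat-array idea, under some simplifying assumptions: it keeps all 2^24 slots (64 MiB) rather than dropping the reserved /8s, treats ASN 0 as "no entry", and uses made-up names (`table`, `set`, `lookup`). Loading from a file or mmap is left out:

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// table holds one ASN per IPv4 /24, indexed by the top 24 address bits.
// This sketch keeps all 2^24 slots (64 MiB); dropping the reserved /8s
// as described above would need an extra index-remapping step.
type table []uint32

func newTable() table { return make(table, 1<<24) }

// slot returns the index of the /24 that contains the IPv4 address a.
func slot(a netip.Addr) uint32 {
	b := a.As4()
	return uint32(b[0])<<16 | uint32(b[1])<<8 | uint32(b[2])
}

// set writes asn into every /24 covered by an IPv4 prefix of length <= 24.
// Insert shorter prefixes first so that more-specific ones overwrite them.
func (t table) set(cidr string, asn uint32) error {
	p, err := netip.ParsePrefix(cidr)
	if err != nil {
		return err
	}
	if !p.Addr().Is4() || p.Bits() > 24 {
		return errors.New("need an IPv4 prefix of length /24 or shorter")
	}
	first := slot(p.Masked().Addr())
	n := uint32(1) << (24 - p.Bits()) // number of /24s inside the prefix
	covered := t[first : first+n]
	for i := range covered {
		covered[i] = asn
	}
	return nil
}

// lookup returns the ASN stored for ip, or 0 if none was set.
func (t table) lookup(ip string) (uint32, error) {
	a, err := netip.ParseAddr(ip)
	if err != nil {
		return 0, err
	}
	if !a.Is4() {
		return 0, errors.New("IPv4 only")
	}
	return t[slot(a)], nil
}

func main() {
	tbl := newTable()
	// 64500 is a documentation ASN, used here only as sample data.
	if err := tbl.set("198.51.100.0/22", 64500); err != nil {
		panic(err)
	}
	asn, err := tbl.lookup("198.51.103.77")
	fmt.Println(asn, err)
}
```

A lookup is a single array read with no searching, which is the appeal of spending the memory up front.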
If you're still seeing `hallpass` using that much memory, let me know! We'll figure out what's going on.
The rest of the described approaches rely on the data being in memory rather than on disk. I'm not sure whether it was a microSD card, an SSD over USB, etc., but SQLite has an in-memory option too.
With that in mind, it's time to optimize storage: instead of text, store the two columns as 64-bit integers. As was already noticed, two separate indexes are not very helpful; create a single one that includes both columns, preferably as the primary key, so you don't waste space and the rest of the selected data is already available in a single lookup.
I'd be happy to see revisited results. I'm still not sure this is the best approach for your problem, but these seem like obvious improvements for a generic SQL approach.
Does anyone have a theory to how that 4294967296 might have made its way into the data? I'm getting 0 matches on the actual internet with:
birdc show route protocol peer4 | grep 4294967296 -c
And ASNs are certainly not >32 bit.
Edit: actually, I'm not seeing it in the linked iptoasn data either?
IP-to-ASN mappings are typically built from route collectors [2,3] that peer with various networks and receive their announcements. AFAIK route collectors don't filter anything and it's easy to find bogus announcements (e.g. private ASNs) in the data.
I can't find 4294967296 from a quick glance at the latest RouteViews data but I can find other private ASNs. For example AS7594 - AS2764 - AS4294901866 for 210.10.189.0/24 seen by the route-views.perth collector.
I don't know what kind of filtering iptoasn.com is doing but at work (ipinfo.io) we do filter bogus origins, as well as a bunch of other things like RPKI/IRR-invalid routes and hyper-specific prefixes (> /24 or /48) [4].
[1] https://bgpfilterguide.nlnog.net
[2] https://www.routeviews.org/routeviews/
[3] https://www.ripe.net/analyse/internet-measurements/routing-i...
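As a rough illustration of what a bogus-origin check might look like (not the filtering any particular provider actually runs), here is a Go sketch that flags a few well-known special-purpose ASN ranges. The function name `classifyASN` and its labels are made up, and the list is deliberately not exhaustive:

```go
package main

import "fmt"

// classifyASN returns a short reason why asn should not appear as a public
// origin, or "" if none of the checks below match. The list is illustrative,
// not exhaustive. It takes a uint64 so that values beyond 32 bits (such as
// 4294967296, i.e. 2^32) can be represented and rejected.
func classifyASN(asn uint64) string {
	switch {
	case asn > 4294967295:
		return "out of range"
	case asn == 0, asn == 65535, asn == 4294967295:
		return "reserved"
	case asn == 23456:
		return "AS_TRANS"
	case asn >= 64496 && asn <= 64511, asn >= 65536 && asn <= 65551:
		return "documentation"
	case asn >= 64512 && asn <= 65534, asn >= 4200000000 && asn <= 4294967294:
		return "private"
	}
	return ""
}

func main() {
	for _, asn := range []uint64{7594, 2764, 4294901866, 4294967296} {
		fmt.Printf("AS%d: %q\n", asn, classifyASN(asn))
	}
}
```

Under these ranges, 4294901866 falls in the 32-bit private-use block, while 4294967296 is one past the largest value a 32-bit ASN can hold.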
That said you're ultimately right that my upstream provider is filtering the 4294901866 value from the article as well anyways for the reasons you stated.
Originally I went to write your exact comment, because at a glance the value in the article looked like it should fit. Then I must have done the comparison backwards: I started pasting the 2^32 value into the rest of my comment and concluded it was actually too large, when really I had just jumbled things up.
Thanks for setting my mind straight!
996520704 996520959 4294901931 Unknown AS4294901931
3523770368 3523770623 4294901931 Unknown AS4294901931
3523771136 3523771391 4294901931 Unknown AS4294901931
Better still, use the free GeoLite ASN MMDB with geoip2-golang[0]. Or the lower-level maxminddb-golang[1] if you only need certain fields.
For IPv6 you need more. It just barely fits if you consider that all public Internet addresses start with 001 and only up to 48 bits can be a published prefix - that's 45 bits - and the public ASNs go up to 19 bits according to your article - that's exactly 64 in total (those high ASNs are bad data from wherever you're getting it from). But it won't work once the next 100000 ASNs get assigned, so you'd better go up to 96 or 128 bits and store the ASN properly in 32 bits.
For cache efficiency you should have one table of addresses and another parallel table of ASNs. Then every access to the address table that pulls in a cache line doesn't waste half the cache line with ASNs you aren't looking for. It won't affect the total memory use, however.
A separate table would hold the name and country for each ASN.
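A small Go sketch of the parallel-table layout, assuming IPv4 ranges sorted by start address, with gaps represented by an entry whose ASN is 0. The type and method names are hypothetical, and the sample values are taken from the ranges quoted earlier in the thread:

```go
package main

import (
	"fmt"
	"sort"
)

// index stores range starts and ASNs in two parallel slices. Each range runs
// up to the next start; gaps are represented by an entry with ASN 0.
type index struct {
	starts []uint32 // sorted ascending; the only slice touched while searching
	asns   []uint32 // asns[i] belongs to the range beginning at starts[i]
}

// lookup binary-searches starts and reads asns exactly once at the end.
func (ix *index) lookup(ip uint32) uint32 {
	// First position whose start is greater than ip; the match is just before.
	i := sort.Search(len(ix.starts), func(i int) bool { return ix.starts[i] > ip })
	if i == 0 {
		return 0 // ip precedes the first range
	}
	return ix.asns[i-1]
}

func main() {
	ix := &index{
		starts: []uint32{996520704, 996520960, 3523770368, 3523770624},
		asns:   []uint32{4294901931, 0, 4294901931, 0},
	}
	fmt.Println(ix.lookup(996520800), ix.lookup(996520960))
}
```

The binary search only ever reads `starts`, so each cache line it pulls in is full of comparison keys; the name and country table mentioned above would be a third structure keyed by ASN.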
Go has an excellent standard library, but the solutions in there rarely compete with others writing a dedicated library to solve a hard problem they really had to solve.
But if it is used for individual IPs, without worrying about the blocks they belong to, it probably won't yield big gains in that area.
[0] https://news.ycombinator.com/item?id=3015246
[1] https://github.com/openbsd/src/blob/2bd42e97200bee/sys/net/a...
ExecStartPre=/usr/bin/restic unlock
ExecStart=/usr/bin/restic backup
You just have to make sure that the type of Service used is correct, so that Systemd can track whether Restic has actually stopped running.