How to implement CRC-32 in Python

I have already introduced the basics of CRC and CRC-32 here and here. If you don’t know about CRC, please take a look at those guides.

Python has several libraries that can calculate CRC-32; two of them are zlib and binascii. Here is some sample code:

Using zlib

For applications that require data compression, the zlib module provides compression and decompression.

zlib has a crc32 function that computes a CRC (Cyclic Redundancy Check) checksum of data. If the optional second argument, value, is present, it is used as the starting value of the checksum; otherwise, a fixed default value is used.

The algorithm is not cryptographically strong, and should not be used for authentication or digital signatures. Since the algorithm is designed for use as a checksum algorithm, it is not suitable for use as a general hash algorithm.

import zlib
# crc32 expects bytes, so encode the string first
zlib.crc32('hello from loitools.com'.encode('utf-8'))
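
The binascii module mentioned above exposes a crc32 function as well. As a minimal sketch (the sample data and variable names are only illustrative), the following compares the two modules and uses the optional starting value to continue a checksum over data that arrives in chunks:

```python
import binascii
import zlib

data = b"hello from loitools.com"

# Both modules compute the same CRC-32 for the same bytes.
crc_zlib = zlib.crc32(data)
crc_binascii = binascii.crc32(data)

# Passing the previous result as the second argument continues the checksum,
# so chunked input gives the same value as checksumming everything at once.
running = 0
for chunk in (b"hello ", b"from ", b"loitools.com"):
    running = zlib.crc32(chunk, running)

print(f"{crc_zlib:#010x}", crc_zlib == crc_binascii == running)
```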

Is CRC-32 a hash function?

If you’ve read my previous post about CRC-32, you may have a question: Is CRC-32 a hash function?

CRC32 works very well as a hash algorithm. The whole point of a CRC is to hash a stream of bytes with as few collisions as possible. That said, there are a few points to consider:

  • CRCs are not secure. For secure hashing you need a much more computationally expensive algorithm. For a simple bucket hasher, security is usually a non-issue.
  • Different CRC flavors exist with different properties. Make sure you use the right algorithm, e.g. the polynomial 0x11EDC6F41 (CRC32C), which is a good general-purpose choice.
  • As a hashing speed/quality trade-off, the x86 CRC32 instruction is tough to beat. However, this instruction doesn’t exist in older CPUs, so beware of portability problems.
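
The "simple bucket hasher" mentioned above could look roughly like this. It is only a sketch: the function name bucket_for and the bucket count are made up for illustration, and zlib.crc32 uses the standard CRC-32 polynomial, not CRC32C.

```python
import zlib

def bucket_for(key: str, n_buckets: int = 8) -> int:
    """Map a string key to a bucket index using CRC-32 (not secure)."""
    return zlib.crc32(key.encode("utf-8")) % n_buckets

# Distribute a few example keys over the buckets.
buckets = {}
for key in ("alpha", "beta", "gamma", "delta"):
    buckets.setdefault(bucket_for(key), []).append(key)

print(buckets)
```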

So the answer is: no, CRC-32 is not a hash function in the strict sense, or at least it was not made for that purpose.

What is CRC-32?

A cyclic redundancy check (CRC) is an error-detecting code designed to detect accidental changes to raw computer data, and is commonly used in digital networks.

A CRC32 algorithm typically takes in a file stream or character array and calculates a 32-bit unsigned integer codeword from the input.

One can transmit this codeword and re-calculate it on the receiver end, then compare it to the transmitted one to detect an error.
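
The transmit-and-compare idea can be sketched as follows. The frame layout (payload followed by 4 big-endian CRC bytes) and the function names are assumptions made for this example, not a real protocol.

```python
import zlib

def make_frame(payload: bytes) -> bytes:
    """Sender side: append the CRC-32 of the payload as 4 big-endian bytes."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def check_frame(frame: bytes) -> bool:
    """Receiver side: recompute the CRC and compare it with the received one."""
    payload, received = frame[:-4], int.from_bytes(frame[-4:], "big")
    return zlib.crc32(payload) == received

frame = make_frame(b"some raw data")
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]  # flip one bit "in transit"

print(check_frame(frame), check_frame(corrupted))
```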

Types of CRC

The most commonly used CRCs and their polynomial lengths are (a CRC-n polynomial has n + 1 bits):

  • CRC-8: 9 bits
  • CRC-16: 17 bits
  • CRC-32: 33 bits
  • CRC-64: 65 bits

Data length and CRC Length

The math is pretty simple. An 8-bit CRC boils all messages down to one of 256 values. If your message is more than a few bytes long, the probability that multiple messages share the same hash value climbs higher and higher.

A 16-bit CRC, similarly, gives you one of the 65,536 available hash values. What are the odds of any two messages having one of these values?

A 32-bit CRC gives you about 4 billion available hash values.
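
To put rough numbers on those odds, here is a small sketch using the standard birthday approximation. It assumes CRC values behave like uniformly random numbers, which is a simplification, and the function name is made up for this example.

```python
import math

def collision_probability(n_messages: int, crc_bits: int) -> float:
    """Approximate chance that at least two of n random messages share a CRC."""
    n_values = 2 ** crc_bits
    return 1.0 - math.exp(-n_messages * (n_messages - 1) / (2 * n_values))

# Compare the three CRC widths for the same number of messages.
for bits in (8, 16, 32):
    print(bits, 2 ** bits, collision_probability(300, bits))
```

With 300 messages, a collision is practically certain for an 8-bit CRC, roughly a coin flip for a 16-bit CRC, and very unlikely for a 32-bit CRC.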

How to embed Base64 images into HTML

You can easily use this sample code to embed Base64 images into your HTML code.

<div>
  <p>An example of loitools.com</p>
  <img src="data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAAUA
    AAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO
        9TXL0Y4OHwAAAABJRU5ErkJggg==" alt="Red dot" />
</div>

So basically, you can use this syntax:

 data:image/jpeg;charset=utf-8;base64, 

According to this document.
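
If you want to generate such a value yourself, a minimal Python sketch could look like this. The function name to_data_uri is made up for this example, and the sample bytes are only the 8-byte PNG signature, not a complete image.

```python
import base64

def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Build a data: URI from raw image bytes."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

# Placeholder bytes (the PNG file signature), not a displayable image.
png_signature = b"\x89PNG\r\n\x1a\n"

print(f'<img src="{to_data_uri(png_signature)}" alt="Example" />')
```

In practice you would read the bytes from a real file, for example with open(path, "rb").read().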

How useful is Base64 encoding?

You may know some basics about Base64 from my last post. But where do we use Base64?

How does a Base64 encoder work?

A Base64 encoder is, basically, a way of encoding arbitrary binary data in ASCII text. It produces 4 characters for every 3 bytes of data, plus potentially a bit of padding at the end.

Essentially each 6 bits of the input is encoded in a 64-character alphabet. The “standard” alphabet uses A-Z, a-z, 0-9 and + and /, with = as a padding character. There are URL-safe variants.
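
To make the 6-bit grouping concrete, here is a hand-written encoder sketch. It is for illustration only (in real code you would use the standard base64 module), and the names ALPHABET and encode_base64 are made up for this example.

```python
import string

# The standard alphabet: A-Z, a-z, 0-9, then + and /.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_base64(data: bytes) -> str:
    """Encode by hand: 3 input bytes -> four 6-bit indexes -> 4 characters."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack the chunk into 24 bits, zero-filling a short final chunk.
        bits = int.from_bytes(chunk.ljust(3, b"\x00"), "big")
        chars = [ALPHABET[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # A chunk of n bytes yields n + 1 real characters; the rest is padding.
        out.append("".join(chars[:len(chunk) + 1]) + "=" * (3 - len(chunk)))
    return "".join(out)

print(encode_base64(b"Man"), encode_base64(b"Ma"), encode_base64(b"M"))
```

This prints TWFu TWE= TQ==, which shows how the padding grows as the last chunk gets shorter.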

Why Base64?

When you have some binary data that you want to ship across a network, you generally don’t do it by just streaming the bits and bytes over the wire in a raw format. Why? Because some media are made for streaming text. You never know – some protocols may interpret your binary data as control characters (like a modem), or your binary data could be screwed up because the underlying protocol might think that you’ve entered a special character combination (like how FTP translates line endings).

So to get around this, people encode the binary data into characters. Base64 is one of these types of encodings.

Base64 applications

Base64 can be used in a variety of contexts:

  • Evolution and Thunderbird use Base64 to obfuscate e-mail passwords[1]
  • Base64 can be used to transmit and store text that might otherwise cause delimiter collision
  • Base64 is often used as a quick but insecure shortcut to obscure secrets without incurring the overhead of cryptographic key management
  • Spammers use Base64 to evade basic anti-spamming tools, which often do not decode Base64 and therefore cannot detect keywords in encoded messages.
  • Base64 is used to encode character strings in LDIF files
  • Base64 is sometimes used to embed binary data in an XML file, using a syntax similar to …… e.g. Firefox’s bookmarks.html.
  • Base64 is also used when communicating with government Fiscal Signature printing devices (usually, over serial or parallel ports) to minimize the delay when transferring receipt characters for signing.
  • Base64 is used to encode binary files such as images within scripts, to avoid depending on external files.
  • Base64 can be used to embed raw image data into a CSS property such as background-image.

What is Base64?

Base64 is a group of binary-to-text encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation.

Base64 Examples

This is an example of the Base64 Encoder at loitools.com:

loitools.com-base64-encoder-examples

Now here is an example of the Base64 decoder:

loitools.com-base64-decoder-examples

You can see that the Encoder and Decoder work as expected: decoding the encoded text gives back the original input.
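
You can try the same round trip in Python with the standard base64 module. This is a small sketch with an arbitrary sample string:

```python
import base64

original = "hello from loitools.com"

# Encode text -> bytes -> Base64 text, then reverse each step.
encoded = base64.b64encode(original.encode("utf-8")).decode("ascii")
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(decoded == original)
```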

What is a Base64 Encoder?

Base64 encoding is the process of converting binary data to an ASCII string format by converting that binary data into a 6-bit character representation.

Base64 Encoding Table

In order to understand how Base64 works, the first important thing you need to deal with is the Base64 Table.

Take a look at that table here.
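
If you prefer to build the table yourself, this short sketch prints the standard alphabet (A-Z, a-z, 0-9, + and /) with each character's index and 6-bit pattern. The variable name alphabet is just for this example.

```python
import string

alphabet = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# One row per entry: decimal index, 6-bit binary value, Base64 character.
for index, char in enumerate(alphabet):
    print(f"{index:2d} {index:06b} {char}")
```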

Hash functions introduction

What is a hash function?

A hash function takes an arbitrary string as input and produces a fixed-length output called a Message Digest, a Hash Value, or just a Hash.

So, SHA-256 is a hash function that always produces 256 bit long output.
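
A quick way to see the fixed output length is to hash inputs of very different sizes with Python's hashlib. The sample inputs are arbitrary:

```python
import hashlib

# Each hex digit carries 4 bits, so 64 hex digits = 256 bits.
for message in (b"", b"abc", b"a" * 10_000):
    digest = hashlib.sha256(message).hexdigest()
    print(len(message), len(digest) * 4, digest[:16] + "...")
```

Whatever the input length, the second column is always 256.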

What is SHA?

SHA stands for “Secure Hash Algorithm”. It is an algorithm that produces a hash output from an input.

The enhanced version of the original algorithm is called SHA-1. The family offers five separate hash functions (SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512), which were created by the National Security Agency (NSA) and issued by the National Institute of Standards and Technology (NIST).

MD4, MD5, SHA-256, SHA-512

An n-bit hash is a map from arbitrary length messages to n-bit hash values.

Current popular hashes produce hash values of length n = 128 (MD4 and MD5) and n = 160 (SHA-1).

An n-bit cryptographic hash is an n-bit hash which is one-way and collision-resistant. A generic birthday attack finds a collision in about 2^(n/2) steps, so an n-bit hash can offer at most n/2 bits of security against collisions. Therefore the above hashes can provide no more than 64 or 80 bits of security, respectively, against collision attacks.

SHA-256 is a 256-bit hash and is meant to provide 128 bits of security against collision attacks.

SHA-512 is a 512-bit hash, and is meant to provide 256 bits of security against collision attacks.
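
The n/2 rule can be checked against the digest sizes reported by Python's hashlib. This sketch only prints the generic upper bound; it says nothing about known attacks on a particular algorithm. MD4 is left out because its availability depends on the local OpenSSL build.

```python
import hashlib

for name in ("md5", "sha1", "sha256", "sha512"):
    # digest_size is in bytes, so multiply by 8 to get n.
    n_bits = hashlib.new(name).digest_size * 8
    print(f"{name}: {n_bits}-bit hash, at most {n_bits // 2} bits against collisions")
```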

What is SHA-1?

In the previous posts, I covered what MD5 is, how MD5 decrypters work, and what a checksum is.

Today, we’ll look at SHA-1 hashing!

SHA-1 (short for Secure Hash Algorithm 1) is one of several cryptographic hash functions.

SHA-1 is most often used to verify that a file has been unaltered. This is done by producing a checksum before the file has been transmitted, and then again once it reaches its destination.

Vulnerabilities of the SHA Hash Function

SHA-1 is only one of the four algorithms in the Secure Hash Algorithm (SHA) family. Most were developed by the US National Security Agency (NSA) and published by the National Institute of Standards and Technology (NIST).

  • SHA-0 has a 160-bit message digest (hash value) size and was the first version of this algorithm.
  • SHA-1 is the second iteration of this cryptographic hash function. SHA-1 also has a message digest of 160 bits.
  • SHA-2 is stronger than SHA-1, and practical attacks against SHA-2 are unlikely to succeed with current computing power.
  • SHA-3 is the newest member of the family. Unlike the others, it was not developed by the NSA; it is based on the Keccak algorithm and was published by NIST.

What is a checksum?

A checksum is the outcome of running an algorithm, usually a cryptographic hash function, on a piece of data, usually a single file.

So if you have a single file and pass it to a hash generator, you get its checksum.

Comparing the checksum that you generate from your copy of the file with the one provided by the source of the file helps ensure that your copy is genuine and error-free.

That is how we detect whether someone has injected code into a downloaded file.
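
Here is a rough sketch of that comparison in Python. The function names, the chunk size and the choice of SHA-256 are assumptions made for this example; use whichever algorithm the file's source publishes.

```python
import hashlib
import hmac

def file_checksum(path: str, algorithm: str = "sha256") -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

def matches_published(path: str, published_hex: str) -> bool:
    """Compare our checksum with the one published by the file's source."""
    return hmac.compare_digest(file_checksum(path), published_hex.strip().lower())
```

You would call matches_published with the path of your download and the checksum string copied from the download page.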

A Checksum example

First of all, take a look at the picture below

loitools.com-md5-hash-generator-mobile

You can see, we have a string with the text “Abc”.

Now take a look at this MD5 Hash:

loitools.com-md5-hash-generator-desktop

You can see that the hash strings for Abc and abc are different, even though the inputs differ only in the case of one letter.
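
You can reproduce this comparison with Python's hashlib. This is a small sketch; the variable names are arbitrary:

```python
import hashlib

upper = hashlib.md5(b"Abc").hexdigest()
lower = hashlib.md5(b"abc").hexdigest()

print(upper)
print(lower)
print(upper != lower)
```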

How do MD5 Decrypters work?

As I explained in my introduction to MD5 (what it is, its history, and its pros and cons), MD5 is non-reversible. So why do you still see MD5 decrypters on the internet?

How does an MD5 decrypter work?

They (MD5 decrypters) are not really doing any reversal with your given MD5 String.

They actually store every string that users have entered, together with its MD5 output, in a database. Then, when a user submits an MD5 hash to the decrypter, they look it up in the database to see whether there is a record with that hash.

So it should be called MD5 Lookup.

loitools.com-md5-hash-generator-desktop

Given this explanation, you can see that MD5 decrypters only work for common phrases; for anything else, you are betting on your luck.
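
The lookup idea can be sketched in a few lines of Python. The in-memory dictionary and the list known_strings are hypothetical stand-ins for the service's database, and the function names are made up for this example.

```python
import hashlib

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Hypothetical stand-in for the database of previously seen strings.
known_strings = ["password", "123456", "abc", "hello"]
lookup_table = {md5_hex(s): s for s in known_strings}

def md5_lookup(digest: str):
    """Return the stored string for this digest, or None if it was never seen."""
    return lookup_table.get(digest.lower())

print(md5_lookup(md5_hex("abc")))
print(md5_lookup(md5_hex("a phrase nobody stored")))
```

The first lookup finds abc because it was stored earlier; the second returns None because nothing is actually reversed.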