Understanding the Google Chrome Extension Architecture

As we saw in the previous post, which introduced Google Chrome Extensions, most of our extension’s code runs in an environment that is separated and sandboxed from the JavaScript of regular web pages. In this post, we describe the Google Chrome Extension architecture.

Within this separate environment, we can think of an extension as having three big parts. Each one is defined by the most important thing in organizing how our software does its work, which is communication through various interfaces:

Internal communication
User input and browser API
External communication

We will go over each one and break down its architecture, explaining what is special about its execution environment and scope. But first, the only file that a Chrome Extension is required to have is:

The manifest.json file

The purpose of the manifest.json configuration file is to give the browser all the information it needs about the extension: anything from its name to the more complex configuration an extension might need in order to work efficiently across very different websites and pages.

Here is an example of a simple manifest.json configuration:

{
  "name": "Simple Extension",
  "description": "Simple Extension description",
  "version": "1.0",
  "manifest_version": 3,
  "icons": {
    "16": "icon_16.png",
    "32": "icon_32.png",
    "48": "icon_48.png",
    "128": "icon_128.png"
  },
  "background": {
    "service_worker": "background.js"
  },
  "permissions": ["activeTab"],
  "host_permissions": ["*://*.site.com/*"],
  "action": {
    "default_icon": "icon_16.png",
    "default_popup": "popup.html"
  }
}

We will go over the specifics later, when we create a simple extension ourselves. For now, let’s see the other parts of the architecture.

Internal communication of the extension

This is the main communication part, and how we organize it generally depends on which environment is best suited to our specific problem, that is, which environment gives us access to the functions we need to achieve our goal.

*Google Chrome Extension Architecture* – Credit: https://developer.chrome.com/docs/extensions/mv3/architecture-overview/

Here we have 3 different environments and they are:

The content scripts:

Basic JavaScript files that are loaded on top of but NOT directly into a web page.

An environment that is enough for most basic extensions to work.

Within this environment, you can read and modify most of the things you would want on a web page, mainly the HTML elements, through manipulation of the DOM (Document Object Model).
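As a minimal sketch of this kind of DOM access, the content script below counts the links on a page and adds a small banner to it. The file name content.js, the helper names, and the banner styling are assumptions made for illustration, and the sketch assumes the file is registered under "content_scripts" in manifest.json (the simple manifest above does not do this yet).

```javascript
// content.js: a hypothetical content script (the file name is an assumption).

// Builds the banner text; kept apart from the DOM code so it is easy to check.
function buildBannerText(title, linkCount) {
  return `"${title}" contains ${linkCount} link(s)`;
}

// Reads from the page's DOM (title, links) and adds a new element to it.
function addBanner(doc) {
  const linkCount = doc.querySelectorAll("a").length;
  const banner = doc.createElement("div");
  banner.textContent = buildBannerText(doc.title, linkCount);
  banner.style.cssText =
    "position:fixed;top:0;left:0;background:#ffd;padding:4px;z-index:9999";
  doc.body.appendChild(banner);
  return banner;
}

// Only touch the DOM when a real page is present, i.e. inside the browser.
if (typeof document !== "undefined") {
  addBanner(document);
}
```

Passing the document in as a parameter is only a design choice here; it keeps the function usable outside the browser.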

Content scripts run in an isolated environment (Chrome calls it an isolated world) that sits on top of the page and shares its DOM, but is not merged with the web page’s own JavaScript environment. This is why you can’t, for example, access the global JavaScript variables of the page directly from inside the content script itself.

Usually, the way around this is to inject code into the page itself by manipulating the DOM as mentioned above: we create a script element with “document.createElement('script')” and add it to the page.
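A rough sketch of that injection could look like the following. It assumes a packaged file named injected.js (a hypothetical name) that is listed under "web_accessible_resources" in manifest.json; loading a packaged file rather than inline code is a choice made here because many pages block inline scripts through their own CSP.

```javascript
// content.js: injecting a packaged file into the page's own JavaScript context.
// "injected.js" is a hypothetical file name; it is assumed to be listed under
// "web_accessible_resources" in manifest.json.

function injectPageScript(doc, url) {
  const script = doc.createElement("script");
  script.src = url;
  // Remove the tag once the file has loaded; its code has already run by then.
  script.onload = () => script.remove();
  (doc.head || doc.documentElement).appendChild(script);
  return script;
}

// Only run inside the browser, where both the extension API and a DOM exist.
if (typeof chrome !== "undefined" && typeof document !== "undefined") {
  injectPageScript(document, chrome.runtime.getURL("injected.js"));
}
```

The code inside injected.js then runs with the page’s own variables in scope, but without access to the extension APIs.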

The background script, the “back-end” of our extension:

Within background.js we have the highest level of access to the browser APIs, which is why it is usually the main processing part of more complex extensions.

This environment runs in the background of the browser as a service worker. It is started on demand by events, such as a message from a content script, and has no web page or DOM to accompany it. After it finishes its job and becomes idle, it is terminated. You can open its DevTools console from the extension’s entry on “chrome://extensions/” if the extension has a background script.

Most of the time, background.js is used to execute higher-level operations that content scripts either aren’t allowed to perform or aren’t well suited for. One such example is the common case of external fetch requests, explained below.

Content scripts are, for security reasons, locked to the same CORS policy (Cross-Origin Resource Sharing) as the page they are on. So in this case we would need to send a message from the content script to background.js, run the fetch there, and then return the required data back to the content script.
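The round trip described above could be sketched like this. The message type "fetchJson", the helper names, and the api.site.com URL are all assumptions made for illustration; the URL is only chosen so that it would match the "host_permissions" pattern from the manifest example.

```javascript
// ---- background.js (service worker) ----
// "fetchJson" is a hypothetical message type chosen for this sketch.
// fetchFn is a parameter only so the handler can be exercised without a network.
function handleMessage(message, sender, sendResponse, fetchFn = globalThis.fetch) {
  if (!message || message.type !== "fetchJson") {
    return false; // not our message; no asynchronous response will follow
  }
  fetchFn(message.url)
    .then((response) => response.json())
    .then((data) => sendResponse({ ok: true, data }))
    .catch((error) => sendResponse({ ok: false, error: String(error) }));
  return true; // keeps the message channel open until sendResponse is called
}

if (typeof chrome !== "undefined" && chrome.runtime?.onMessage) {
  chrome.runtime.onMessage.addListener(handleMessage);
}

// ---- content.js ----
// Asks the background script to run the request and resolves with its reply.
function requestJson(url) {
  return chrome.runtime.sendMessage({ type: "fetchJson", url });
}

// Example use inside a content script (hypothetical endpoint):
// requestJson("https://api.site.com/data.json").then((reply) => {
//   if (reply.ok) console.log(reply.data);
// });
```

Returning true from the listener matters: without it, the channel closes before the asynchronous fetch has finished and the content script never receives the data.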

The popup page:

As you probably have seen before, these are the HTML pages that are rendered within a popup window after the icon of the extension is clicked.

They are not required. Without a popup, the extension icon click can instead be bound to a function within background.js that executes some code without any additional visual UI.
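A small sketch of binding the icon click to a function is shown below. Note that chrome.action.onClicked only fires when the manifest’s "action" entry has no "default_popup", so it would not fire with the manifest example above as written. The helper name describeTab is an assumption.

```javascript
// background.js: reacting to a click on the extension icon.
// This listener only fires when "action" has no "default_popup" in the manifest.

// Builds a log line for the clicked tab; tab.url may be missing without permission.
function describeTab(tab) {
  return `Clicked on tab ${tab.id}: ${tab.url ?? "(URL not available)"}`;
}

if (typeof chrome !== "undefined" && chrome.action) {
  chrome.action.onClicked.addListener((tab) => {
    console.log(describeTab(tab));
  });
}
```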

They are limited in size and are used mostly as a simple UI aid for the user, although most extensions also present information to end users in other ways, such as desktop notifications and custom in-page HTML elements.

They have just the tools a basic HTML page needs to function, but they will probably need to communicate with background.js through the main internal communication API of Chrome extensions, message passing.

User input and browser API

The extension can detect and respond to many different user behaviors, such as navigation, tab/window/focus switching, muting and updating, zoom changing, and so on.

Most of this is done by attaching listeners to browser events through the Chrome APIs and responding as required.
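As an illustration, a background script might attach listeners for a few of the events mentioned above. This is only a sketch: the log messages and the helper summarizeZoom are assumptions, while the events themselves (tabs.onActivated, tabs.onUpdated, tabs.onZoomChange) are part of the chrome.tabs API.

```javascript
// background.js: attaching listeners to a few browser events.

// Turns a zoom change into a readable line, e.g. factor 1.25 becomes 125%.
function summarizeZoom(change) {
  const percent = Math.round(change.newZoomFactor * 100);
  return `Tab ${change.tabId} zoom is now ${percent}%`;
}

if (typeof chrome !== "undefined" && chrome.tabs) {
  // Fires when the user switches to another tab.
  chrome.tabs.onActivated.addListener(({ tabId, windowId }) => {
    console.log(`Switched to tab ${tabId} in window ${windowId}`);
  });

  // Fires when a tab changes; here we only react to muting and unmuting.
  chrome.tabs.onUpdated.addListener((tabId, changeInfo) => {
    if (changeInfo.mutedInfo) {
      console.log(`Tab ${tabId} muted: ${changeInfo.mutedInfo.muted}`);
    }
  });

  // Fires when the zoom level of a tab changes.
  chrome.tabs.onZoomChange.addListener((change) => {
    console.log(summarizeZoom(change));
  });
}
```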

However, just as there are a lot of things we can automate, there are also some that we can’t, and shouldn’t be able to, for security reasons. Examples include clicking or opening the popups of other extensions, changing the browser’s Omnibox, and programmatically accepting site permission requests for things like desktop notifications.

External communication

Here we can communicate with other extensions if we properly set up a communication channel in both of them. Most of the data entering and leaving an extension, though, travels over the internet.
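A channel between two extensions could be sketched as follows. The extension ID below is a made-up placeholder, and the allowlist and the reply shape are design assumptions for this sketch, not something the Chrome API requires.

```javascript
// ---- background.js of the receiving extension ----
// Hypothetical ID of the one extension we agree to talk to.
const ALLOWED_SENDERS = new Set(["abcdefghijklmnopabcdefghijklmnop"]);

// Replies only to extensions on the allowlist and echoes their message back.
function handleExternal(message, sender, sendResponse) {
  if (!ALLOWED_SENDERS.has(sender.id)) {
    sendResponse({ ok: false, error: "sender not allowed" });
    return;
  }
  sendResponse({ ok: true, echo: message });
}

if (typeof chrome !== "undefined" && chrome.runtime?.onMessageExternal) {
  chrome.runtime.onMessageExternal.addListener(handleExternal);
}

// ---- background.js of the sending extension ----
// Passing an extension ID as the first argument targets that other extension.
function pingOtherExtension(extensionId) {
  return chrome.runtime.sendMessage(extensionId, { type: "ping" });
}
```

Checking sender.id before replying is a cheap way to make sure that only the extensions we expect can use the channel.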

That means sending and receiving data to regular API endpoints, for example through fetch, and processing it accordingly.

The main things we have to be aware of here are the CORS policy (Cross-Origin Resource Sharing) and the CSP (Content Security Policy).

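Within Chrome extensions, you must have most, if not all, of your required resources already packaged inside your extension. In particular, Manifest V3 does not allow remotely hosted code, so every script has to ship inside the package.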

If you want to get external images, stylesheets, or fonts that the CSP would otherwise block, you can fetch them with a cross-origin request (for example through fetch) and then serve them via “blob:” URLs. For this to work, you also have to add the domain you fetch from to the “host_permissions” configuration option inside the manifest.json. In Manifest V3, host patterns no longer go in “permissions”.
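The fetch-then-blob approach might look roughly like this in a popup script. The cdn.site.com URL and the #logo element are hypothetical, and the sketch assumes the domain is covered by "host_permissions". It is written for an extension page such as the popup, since URL.createObjectURL is not available inside the background service worker.

```javascript
// popup.js: loading an external image and serving it through a "blob:" URL.
// fetchFn is a parameter only so the function can be exercised without a network.
async function loadAsBlobUrl(url, fetchFn = globalThis.fetch) {
  const response = await fetchFn(url);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const blob = await response.blob();
  // The returned string starts with "blob:" and can be used as an image src.
  return URL.createObjectURL(blob);
}

if (typeof document !== "undefined") {
  const img = document.querySelector("#logo"); // hypothetical element in popup.html
  if (img) {
    loadAsBlobUrl("https://cdn.site.com/logo.png") // hypothetical URL
      .then((blobUrl) => { img.src = blobUrl; })
      .catch((error) => console.error(error));
  }
}
```

When the image is no longer needed, calling URL.revokeObjectURL on the returned URL releases the memory held by the blob.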

This covers the main parts of a browser extension, although there are, of course, other useful tools and APIs, such as a dedicated options page and the chrome.storage.sync and chrome.storage.local APIs.
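To give a feel for the storage APIs, here is a minimal sketch of loading and saving settings. The setting names, the defaults, and the helper functions are assumptions made for illustration, and the manifest would additionally need the "storage" permission, which the simple example above does not declare.

```javascript
// Hypothetical default settings for this sketch.
const DEFAULT_SETTINGS = { enabled: true, theme: "light" };

// Stored values win over the defaults; missing keys fall back to the defaults.
function mergeSettings(defaults, stored) {
  return { ...defaults, ...stored };
}

// Reads the settings from a storage area such as chrome.storage.local.
async function loadSettings(area = chrome.storage.local) {
  const stored = await area.get(Object.keys(DEFAULT_SETTINGS));
  return mergeSettings(DEFAULT_SETTINGS, stored);
}

// Writes changed values; chrome.storage.sync would follow the user's profile.
async function saveSettings(changes, area = chrome.storage.local) {
  await area.set(changes);
}

// Example use inside the extension:
// saveSettings({ theme: "dark" }).then(() => loadSettings()).then(console.log);
```

Swapping chrome.storage.local for chrome.storage.sync in the calls is enough to have the values synchronized across the user’s signed-in browsers.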

Now that we understand how all the parts work, let’s go on to build our own simple Chrome extension in “How to create a simple Google Chrome Extension”.