Once a useful combination of keywords has been selected, we might want to use it as an economic indicator in production and display it on trendecon.org. This vignette describes the steps for adding new indicators to the production processes of “Trendecon”.
The names of production functions within the trendecon package start with proc_. They usually operate on the file system, which means that calling these functions will write files to the local disk. The proc_ functions save files within two folders, data and raw, which are located under the working directory (getwd()). The raw folder holds all the data downloaded from Google, plus some transformations, such as seasonally adjusted or combined series. The data folder, on the other hand, collects the final indicators, i.e. the data displayed on the website. If not present, these folders will be created when calling proc_keyword_init for the first time.
Within these folders, every geo location used will have its own subfolder. Thus, if we used Swiss and Austrian indicators, we would end up with the following directory structure:
│
├── data
│   ├── at
│   └── ch
│
└── raw
    ├── at
    └── ch
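As a rough illustration of this layout, the sketch below builds the same folder tree with base R only. It does not call any trendecon function; the `root` folder is a temporary directory chosen for the example, whereas the proc_ functions work relative to getwd().

```r
# Sketch: create the data/raw layout for two geo codes in a temporary folder.
# `root` is a stand-in for the working directory (an assumption of this example),
# so nothing is written to a real project.
root <- file.path(tempdir(), "trendecon_layout_demo")
geos <- c("at", "ch")

for (top in c("data", "raw")) {
  for (geo in geos) {
    dir.create(file.path(root, top, geo), recursive = TRUE, showWarnings = FALSE)
  }
}

# List the created folders relative to `root` (the root itself is returned as "")
created <- list.dirs(root, full.names = FALSE, recursive = TRUE)
created <- sort(created[nzchar(created)])
print(created)
```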
In order to create a new series we need to go through several steps:
As a rule, a series is composed of multiple related keywords and a geographical location. For example purposes, a series “homeoffice” will be created for “ch” (Switzerland) using the keywords “headset”, “monitor”, “maus”, and “hdmi”.
To include a new series, each keyword must first be initialized. Our goal is to produce long daily time series, ideally starting in 2006. However, Google does not provide daily or weekly data for such a long time period. Hence, we circumvent the problem by applying a moving window of daily and weekly queries over the whole time period. Be careful: this causes a lot of queries to Google and might result in a temporary IP address ban.
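To make the moving-window idea concrete, here is a minimal sketch that splits a date range into overlapping query windows. The helper `make_windows()` is hypothetical and not part of trendecon, and the window width and step are illustrative values only, not the ones used by proc_keyword_init.

```r
# Hypothetical helper (not part of trendecon): split [from, to] into
# overlapping windows that are `width` days long and advance by `step` days.
make_windows <- function(from, to, width, step) {
  from <- as.Date(from)
  to <- as.Date(to)
  starts <- seq(from, to, by = step)      # a numeric `by` is taken as days
  ends <- pmin(starts + width - 1, to)    # truncate windows at the end date
  data.frame(start = starts, end = ends)
}

# Illustrative settings (assumptions): 90-day windows, shifted by 30 days
windows <- make_windows("2006-01-01", "2006-12-31", width = 90, step = 30)
print(head(windows))
```

Each row would correspond to one query, which shows why initializing a keyword over many years adds up to a large number of requests.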
The commands below will download the data for our example series:
proc_keyword_init("headset", "CH")
proc_keyword_init("monitor", "CH")
proc_keyword_init("maus", "CH")
proc_keyword_init("hdmi", "CH")
After running the code above we will have the aggregated series at daily, weekly and monthly frequency, stored in the raw folder. For example, the files of the “headset” keyword will be stored under raw/ch/headset_d.csv, raw/ch/headset_w.csv, and raw/ch/headset_m.csv. Analogous files will be present for the remaining three keywords.
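A quick way to check that initialization produced everything is to build the expected file names and compare them with what is on disk. The sketch below uses base R only and assumes the naming pattern described above (keyword, underscore, frequency letter, in a lowercase geo folder).

```r
# Sketch: expected raw files for the example keywords, following the
# <keyword>_<d|w|m>.csv pattern described in the text.
keywords <- c("headset", "monitor", "maus", "hdmi")
geo <- "CH"
freqs <- c("d", "w", "m")

grid <- expand.grid(freq = freqs, keyword = keywords, stringsAsFactors = FALSE)
paths <- file.path("raw", tolower(geo), paste0(grid$keyword, "_", grid$freq, ".csv"))

# Files that are expected but not present under the current working directory
missing <- paths[!file.exists(paths)]
print(missing)
```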
Four actions need to be performed in order to combine multiple keywords under a single series:
All the above steps are performed by a single function call:
proc_index(c("headset", "monitor", "maus", "hdmi"), "CH", "homeoffice")
This function will combine the keywords passed as the first argument and store the resulting series under the name specified by the third argument. In this example, a new file holding the data for the “homeoffice” series will be created under data/ch/homeoffice_sa.csv. In addition, for each keyword, two intermediate preparation files will also be created, e.g. raw/ch/headset_mwd.csv and raw/ch/headset_sa.csv. Here “mwd” stores the combined monthly/weekly/daily data and “sa” contains the values after seasonal adjustment.
P.S. The same function, proc_index(), is also used for updating the data of an existing series.
The last step is to include the newly created series in the daily-update script. First, all the files prepared in steps 1-3 need to be added to the trendecon/data repository. Then, in order to schedule daily updates, the script proc_trendecon_ch.R in the trendecon/trendecon repository has to be modified. Note that the suffix of the file name (“ch”) specifies the geographic location, so for other locations (like “de”) there will be a separate file (e.g. proc_trendecon_de.R).
In order to add the series used in the example to this script, three changes are needed:
create a variable holding the set of keywords used for the series:
kw_homeoffice <- c(
"headset",
"monitor",
"maus",
"hdmi"
)
add a function call that creates/updates this series:
proc_index(kw_homeoffice, "CH", "homeoffice")
add the name of this series to production list:
After that, simply add the updated version of this script to the trendecon/trendecon repository and all is set.
The proc_trendecon_ functions produce the final indicators that we show on trendecon.org. They are called by an automated process which is set up on GitHub, so we do not need to call them manually when updating the data. The full list of active Swiss indicators can be found within the code of the proc_trendecon_ch() function.
To get a better understanding of how multiple keywords are combined into a single series, we need to examine the inner workings of the proc_index() function. The steps of series preparation are outlined below. Note that the functions displayed in this section are internal and are typically not called by users.
In the first step, the raw data series for each keyword are updated with the latest daily, weekly, and monthly data. For example, to update the raw series for the keyword "Rezession" for Switzerland, the script calls the following internal function:
proc_keyword_latest("Rezession", "CH")
This function downloads raw daily, weekly, and monthly data for the specified pair of keyword and geo location. If the data for a particular keyword is not yet available, proc_keyword_init() should be called instead.
To combine the three frequencies (monthly, weekly, and daily), we apply the following methodology: in a first step, we “bend” the daily series to the weekly values, by applying a variant of the Chow-Lin (1971) method. This preserves the movement of the daily series and ensures that weekly averages are identical to the original weekly series. We then use the same methodology to bend the series to the monthly values.
To combine the three frequencies for a given keyword, the script calls another internal function:
proc_combine_freq("Rezession", "CH")
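The idea of “bending” a daily series so that its weekly averages match a weekly series can be illustrated with a much simpler stand-in than the Chow-Lin variant used by the package. The sketch below shifts each week of simulated daily data by a constant so that the weekly mean hits a made-up target. This is not the trendecon method: a constant shift introduces jumps at week boundaries, which a Chow-Lin type procedure is designed to avoid.

```r
# Simplified stand-in for frequency combination (NOT the Chow-Lin variant):
# shift each week's daily values so that the weekly mean equals a target value.
set.seed(1)
daily <- data.frame(
  date = seq(as.Date("2020-01-06"), by = "day", length.out = 28),  # 4 full weeks
  value = 50 + cumsum(rnorm(28))                                   # simulated data
)
daily$week <- rep(1:4, each = 7)
weekly_target <- c(52, 49, 55, 51)   # made-up weekly values, one per week

week_mean <- ave(daily$value, daily$week)   # weekly mean, repeated for each day
daily$bent <- daily$value + (weekly_target[daily$week] - week_mean)

# Weekly means of the bent series now equal the targets,
# while the within-week movement of the daily series is unchanged.
print(tapply(daily$bent, daily$week, mean))
```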
Some keywords’ time series might display seasonal patterns. For example, it is not surprising that searches for gardening are higher in spring. In order to make meaningful comparisons over time, such seasonal patterns need to be removed. To achieve this, trendecon uses the “Prophet” procedure, which estimates an additive model where non-linear trends are fit with yearly and weekly seasonality and holiday effects.
To seasonally adjust a combined keyword, the script calls the following internal function:
proc_seas_adj("Rezession", "CH")
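To give a feel for what removing a seasonal pattern means in an additive model, the following sketch strips a day-of-week effect from simulated data by subtracting weekday means. It is a deliberately crude stand-in and not the Prophet procedure used by trendecon, which also models trend, yearly seasonality, and holidays; all data and effect sizes here are made up.

```r
# Crude additive adjustment (NOT Prophet): remove a day-of-week pattern
# from simulated daily data by subtracting weekday-specific means.
set.seed(42)
n <- 7 * 52
dates <- seq(as.Date("2020-01-06"), by = "day", length.out = n)
dow <- as.integer(format(dates, "%u"))          # 1 = Monday, ..., 7 = Sunday

weekday_effect <- c(3, 2, 1, 0, -1, -3, -2)     # made-up effects, summing to zero
x <- 50 + weekday_effect[dow] + rnorm(n)

dow_mean <- ave(x, dow)                         # mean of x for each weekday
adjusted <- x - (dow_mean - mean(x))            # subtract the seasonal component

# After adjustment, every weekday has the same average level
print(round(tapply(adjusted, dow, mean), 3))
```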
Once the raw data for each keyword has been processed, we end up with multiple time series, one for each keyword within the indicator. As an example, the main indicator uses the following keywords: "Wirtschaftskrise", "Kurzarbeit", "arbeitslos", and "Insolvenz". To turn the provided keywords into a single time series, the first principal component is used.
In order to achieve this, the script calls the ts_prcomp() function from the tsbox package.
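The first-principal-component step can be sketched with stats::prcomp() on simulated data. This is an illustration under assumptions, not the package code: the four series are random numbers sharing a common component, the column names merely reuse the keywords above, and the sign convention at the end is a choice made for this example.

```r
# Sketch: combine several keyword series into one index via the first
# principal component, using simulated data and stats::prcomp().
set.seed(123)
n <- 200
common <- cumsum(rnorm(n))   # shared movement driving all series (simulated)
keywords <- c("Wirtschaftskrise", "Kurzarbeit", "arbeitslos", "Insolvenz")
series <- sapply(keywords, function(k) common + rnorm(n, sd = 0.5))

pca <- prcomp(series, center = TRUE, scale. = TRUE)
index <- pca$x[, "PC1"]

# The sign of a principal component is arbitrary; orient the index so that
# it moves in the same direction as the average of the standardized inputs.
if (cor(index, rowMeans(scale(series))) < 0) index <- -index

# Share of total variance captured by the first component
share_pc1 <- pca$sdev[1]^2 / sum(pca$sdev^2)
print(round(share_pc1, 3))
```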
Finally the prepared index is saved to a file in the data folder:
write_keyword(prepared_data, "indicator", "CH")