View on GitHub

IRStats2

Statistical tools for EPrints

Installation Configuration API

Configuration

This section details how to configure IRStats2 and mostly relates to the file cfg/cfg.d/z_irstats2.pl.

I recommend that you edit your changes in a separate file (eg. zz_irstats2_local.pl - must be loaded AFTER z_irstats2.pl) as this will make Bazaar updates easier to apply.

Datasets/Datatypes

Since IRStats2 can handle any EPrints datasets (not just the 'access' dataset which records downloads), you can declare in the configuration which EPrints datasets to process. For each EPrints dataset configured, IRStats2 will pass on the records from the Database to each processing module. This is coupled to the Stats::Processor modules and you will see that, by default, IRStats2 processes:

Each module will provide specific datum, which is declared in the module itself. For instance, Stats::Processor::Access::Downloads provides us with the "downloads" and "views" data-types.

Configuration example and options

access => { 
	filters => [ 'Robots', 'Repeat' ], 
	incremental => 1 
}

The only two options which can be used are:

Remember that if you want to process new datasets (e.g. "user") then you must write the associated Stats::Processor modules, otherwise nothing would happen.

Sets

A Set tells IRStats2 how to group data points and it is done via an existing ("eprint") meta-field. Each value of that set (in essence, the distinct values of the field) will become a set value you can use in IRStats2 to give you statistics on the value. For instance, you can get download stats by author or by item type. Both "author" and "item type" are sets. Most Set definitions are straight-forward to declare, with the exception of "creators" (a.k.a. "authors").

Configuration example and options

        {
                'field' => 'divisions',
                'groupings' => [ 'authors' ]
        },

This defines the Set "divisions" - if the divisions field reflects the hierarchical structure of your institution (as it should) then you can get stats per division/school/faculty. You can also get "Top publications" per division.

Here are all the options you may use when defining a Set:

Note that "eprint" is a built-in Set and should not be defined in the configuration. The "eprint" Set is the collection of all the eprints (or "publications") of your repository. It is the assumed Set when no set is declared, as for the scenario "show me the top publications [among the entire repository]".

Reports

Reports are single pages which group different metrics together. The main report page (http://yourrepo.url/cgi/stats/report) is such an example. If you create a new report, "my_report", it will be available at the URL: http://yourepo.url/cgi/stats/report/my_report.

In the configuration, Reports can be seen as a top-to-bottom stack of Stats::View modules. Such modules know how to draw certain stats such as graphs, tables or pie charts, they just need to be position on the report. The module handling the generation of reports (Screen::IRStats2::Report) takes care of passing on the correct context to each Stats::View module. Such contexts include any date filters or set values selected by a visiting user.

A basic report showing the monthly downloads graph and the top downloaded publications:

my_report => {
	items => [
		{ 
			plugin => 'ReportHeader'
		},
		{
			plugin => 'Google::Graph',
                        datatype => 'downloads',
                        options => {
                                date_resolution => 'month',
                                graph_type => 'column',
                        },
		},
                {
                        plugin => 'Table',
                        datatype => 'downloads',
                        options => {
                                limit => 10,
                                top => 'eprint',
                                title_phrase => 'top_downloads'
                        },
                },

	],
};

The options are detailed on the API section.

Security aspects

Users must have the following two roles to view stats:

However these two roles are given to the "public" by default, meaning that anyone can view and/or export the stats. You can comment out these lines in the configuration to prevent that behaviour.