BIBB research data

Supply, processing and use of data via the BIBB-FDZ

Holger Alda, Annett Friedrich, Daniela Rohrbach-Schmidt

In its capacity as a departmental research institution of the Federal Government, BIBB has comprehensive VET data records at its disposal. This data is also made available outside BIBB via the “BIBB-Forschungsdatenzentrum” (BIBB-FDZ) [Research Data Centre], in compliance with data protection regulations and standardised academic research procedures. This article outlines the data available via the BIBB-FDZ, states the requirements which apply to the processing and use of research data, shows the extent to which and the purposes for which researchers use the data, and concludes by discussing the importance to VET research of having a research data infrastructure.

BIBB’s VET research remit

BIBB is a departmental research institution of the Federal Government. One of its statutory tasks is “contributing to vocational training research by means of scientific research.” (§ 90 Subsection 2 Vocational Training Act, BBiG). In order to fulfil this legal remit, BIBB collects large numbers of data sets within the scope of research projects. These form the basis for a wide range of empirical analyses conducted by BIBB. The main results are published as part of an indicators-based reporting system (such as the Data Report to accompany the Report on Vocational Education and Training) or in other specialist journals. As well as evaluating the data it collects, BIBB also endeavours to make information available to the wider academic research community via its FDZ in order to facilitate further analyses and research work outside the institute. The following section begins by describing the status of research data at BIBB on a general level.

Supply of research data at BIBB-FDZ


A thematic and methodological structuring of the research data available at BIBB-FDZ can be undertaken on the basis of characteristic stages of individual (vocational) education histories, the survey units (individual-level or firm-level data) and the survey designs (cross-sectional/longitudinal design) (cf. Figure 1). At the end of 2015, supply at BIBB-FDZ comprised 57 different firm-level and individual-level data sets, each unique in Germany. Their main thematic focus covers one or more characteristic phases of educational or working life and they have been collected on the basis of either a cross-sectional or longitudinal design.

The first life course stage is school. This thematic area encompasses data sets which look in particular at the vocational orientation or vocational choices of young people. Examples of relevant FDZ supply relating to the time of attendance at school (or to shortly afterwards) include the BIBB Student Surveys on Occupational Titles from the year 2005 and the six waves of the BIBB School Graduate Surveys covering a yearly time period from 2004 to 2012.

The three following stages focus on dual vocational education and training. At the 1st threshold, progression to qualifying training is normally considered. Surveys of young adults (for example the 2006 and 2011 BIBB Transition Surveys) address issues such as which young people progress to certain forms of vocational training or educational alternatives and the general institutional conditions under which this takes place. From a firm perspective (within the scope of the BIBB Company Panel on Training and Competence Development, BIBB Training Panel for short, which has been taking place annually since 2012, or the Reference-Firm-System), the focuses at the 1st threshold include which firms are able to offer or fill training places and the operational prerequisites that allow them to do so.

Data sets which are primarily aligned to the life course stage of vocational education and training provide information on quality aspects of firm-based VET (the 2008 BIBB Survey Vocational Training from the Trainees’ Point of View and a BIBB Survey on Design and Implementation of Vocational Training from a Firm Perspective, also from 2008). Both of these studies are examples of the creation of data sources for a research aspect at both individual and firm level, this being achieved by transferring 14 quality characteristics from the individual survey to the firm perspective. In thematic terms, this facilitates data analysis of the quality of vocational education and training from both an individual and a firm point of view. On the firm side, there is also detailed data relating to the costs and benefits of firm-based VET and on training motives (BIBB Surveys on the Costs and Benefits of Apprenticeship Training for Firms, 2000 and 2007, and the BIBB Training Panel).

As it relates to transition to the labour market, the 2nd threshold raises issues relating to firm training measures, firm retention strategy and actual firm behaviour with regard to offering trainees permanent employment. Databases for the analysis of these or similarly structured issues include the BIBB Training Panel and the Cost-Benefit Survey (CBS). An issue which arises at the individual level is whether employment is appropriate to the training once VET has been completed. Amongst other studies, the BIBB Transition Surveys contain information regarding this.

Data sets relating to the further characteristic stage of employment deal with matters such as working requirements and conditions, workplace demands and the relationship between (vocational) education and employment. As longitudinal studies, the Employment Surveys conducted by BIBB and the Bundesanstalt für Arbeitsschutz und Arbeitsmedizin [Federal Institute for Occupational Safety and Health] (BIBB/BAuA Employment Surveys) and their predecessors conducted by BIBB together with the Institut für Arbeitsmarkt- und Berufsforschung (IAB) [Institute for Employment Research] offer an opportunity to investigate these and other issues during the period from 1979 to 2012 on the basis of a large number of observations.

Finally, there are also data sets which are explicitly dedicated to continuing training. The focus here may be on the perspective and the structure of continuing training providers (BIBB Monitoring Continuing Education 2007 to 2013), on supply of and interest in continuing training on the part of firms (BIBB Survey on Staff Fluctuation and Employer-Provided Continuing Training, FluCT, 2011), or on the scope of and motives for continuing training at the individual level (BIBB Survey on Determinants of Individual Continuing Training, DICT, 2010).

It should be stressed that the data sets and topics mentioned merely constitute examples and that existing information from the BIBB databases is useful for research perspectives other than those stated. Secondary uses of BIBB databases are extremely creative and innovative in this regard, meaning that any representation of a possible spectrum of evaluation for the databases is necessarily incomplete. Alignment of the research data sets to the individual stages of the life course concept thus serves the purpose of providing initial guidance on the contents and topics covered by the BIBB-FDZ.

Processing project data to create research data

In order to make BIBB databases available to external academic researchers, standards relating to data processing, data access and documentation and the provisions of data protection law need to be met. If individual data rows exist for individual sampling units, general reference is also made to micro data in order to differentiate this data terminologically from aggregated data. Directly following the survey, micro data is initially available in the form of project data to the research projects at BIBB generating the data. After additional processing work at the FDZ for academic re-use, the databases thus created are then referred to as research data.

In Germany, the production of research data from project data and the establishment of access to this research data for the external academic research community is organised on a cross-institutional basis. The BIBB-FDZ thus forms part of a larger research data infrastructure. The coordination and quality assurance is administered by the Rat für Sozial- und Wirtschaftsdaten (RatSWD) [German Data Forum]. The processing of project data to create research data described below is closely aligned to the accreditation criteria of the RatSWD (cf. extract in the information box).

Extract from the Accreditation Criteria for Research Data Centers

  1. In the RDCs and DSCs accredited by the German Data Forum, research data are prepared and made available for scientific purposes. Additionally, service is offered for data users at these centers.
  2. Research data form the cornerstone of scientific understanding. Accordingly, they can be used for scientific purposes independent of their original collection purposes. This includes …associated objects of assessment.
  3. User access is … subject to field-specific ordinances of data protection and data security. …
  4. Access to microdata is governed by legal regulations that ensure equal treatment of data users. Accordingly, the data centers provide for transparent and standardized access arrangements. …
  5. In addition to the provision of access, the data centers produce detailed documentations of data. Furthermore, information about available data and data centers are made available in standardized form via the internet, data and methodology reports, and the provision of individualized advice. …

Source: German Data Forum: Accreditation Criteria for Research Data Centers, Berlin, September 2010. URL: www.ratswd.de/dl/doc/RatSWD_FDZCriteria.pdf.pdf (retrieved 24.03.2016).

Anonymisation of BIBB research projects

One of the essential requirements for a Research Data Centre is anonymisation of databases provided (cf. Criterion 3, RatSWD). In the case of BIBB, the legal foundation in this regard particularly includes the provisions contained within the German Data Protection Act.

The requirement here is for so-called direct and indirect identifiers to be removed from databases before research data is forwarded to third parties. Direct identifiers are information such as name, address, telephone number and e-mail address. Indirect identifiers, on the other hand, are characteristics which may also facilitate the re-identification of individual sampling units given a certain additional knowledge. In the case of persons, this may be occupations with very low numbers, very high levels of income or more or less unique combinations of job tasks. In the case of firms, the risk of re-identification mainly arises through combinations of characteristics which especially help determine highly specialised firms and market leaders (for example simply by combining firm size, economic activities, and location). Full-text data also sometimes contains information that makes it possible to re-identify individual sampling units.
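The distinction between direct and indirect identifiers can be illustrated with a minimal sketch. It is not the procedure used by the BIBB-FDZ: the field names, the example firms and the threshold `min_count` are assumptions chosen purely for illustration. Direct identifiers are simply dropped, whereas indirect identifiers are checked for combinations that occur so rarely that they could single out one sampling unit.

```python
from collections import Counter

# Hypothetical field names, chosen only for illustration.
DIRECT_IDENTIFIERS = {"name", "address", "telephone", "email"}
QUASI_IDENTIFIERS = ("firm_size", "economic_activity", "location")


def drop_direct_identifiers(records):
    """Return copies of the records without any direct identifier fields."""
    return [
        {key: value for key, value in record.items() if key not in DIRECT_IDENTIFIERS}
        for record in records
    ]


def flag_rare_combinations(records, keys=QUASI_IDENTIFIERS, min_count=3):
    """Return the combinations of indirect identifiers shared by fewer than
    min_count records. min_count is an illustrative threshold only."""
    counts = Counter(tuple(record.get(key) for key in keys) for record in records)
    return {combo for combo, n in counts.items() if n < min_count}


# Invented example data: three similar small firms and one highly specialised firm.
firms = [
    {"name": "A", "firm_size": "small", "economic_activity": "retail", "location": "Bonn"},
    {"name": "B", "firm_size": "small", "economic_activity": "retail", "location": "Bonn"},
    {"name": "C", "firm_size": "small", "economic_activity": "retail", "location": "Bonn"},
    {"name": "D", "firm_size": "large", "economic_activity": "aerospace", "location": "Bonn"},
]

cleaned = drop_direct_identifiers(firms)
rare = flag_rare_combinations(cleaned)
print(rare)  # the single large aerospace firm remains identifiable without its name
```

In this invented example, removing the firm name is not sufficient: the combination of size, economic activity and location still points to exactly one firm, which is why such variables would need to be coarsened or withheld.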

The challenge for a Research Data Centre is to anonymise databases in such a way as to exclude the re-identification of individual sampling units whilst still permitting academic research evaluations to take place in as unrestricted a manner as possible. There are no generally valid rules governing this process. Each form of data anonymisation for a research data set is a unique approach for which existing examples are only able to provide limited guidance.

Securing access to research data

Alongside anonymisation, a further major task is the securing of transparent and standardised access to BIBB research data (cf. RatSWD Criterion 4).

Academic researchers outside BIBB receive Scientific-Use Files (SUF’s) to use in research projects. Campus Files (CF’s) are offered for course work purposes (Master’s and Bachelor’s theses, seminar papers) and for use in lectures. SUF’s or CF’s are made available to researchers for a comparatively long but always restricted period. This means that they are able to work with the original set of variables of the primary data in their usual working environment.

For this reason, the anonymisation process requires the removal (or coarsening) of all variables with re-identification potential contained in the SUF’s or CF’s. The sensitive variables separated out under data protection law are stored in secondary data sets. A SUF is then produced from the remaining major data set. CF’s are created via simple random selection of about two thirds of all sampling units within the relevant SUF. In CF’s, some individual characteristics present in differentiated form in SUF’s are also consolidated onto higher aggregated levels.
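The derivation described above (separating out sensitive variables, producing the SUF from the remainder, then sampling and coarsening for the CF) can be sketched roughly as follows. This is only an illustration under stated assumptions: the variable names, the income bands and the fixed random seed are hypothetical and do not reflect the actual BIBB-FDZ processing.

```python
import random

# Hypothetical variable names; which variables count as sensitive
# is decided separately for each real data set.
SENSITIVE_VARIABLES = {"detailed_occupation", "municipality"}


def split_sensitive(records, id_key="case_id"):
    """Separate sensitive variables into a secondary data set keyed by case id."""
    main, secondary = [], []
    for record in records:
        main.append({k: v for k, v in record.items() if k not in SENSITIVE_VARIABLES})
        secondary.append(
            {k: v for k, v in record.items() if k == id_key or k in SENSITIVE_VARIABLES}
        )
    return main, secondary


def coarsen_income(record, width=1000):
    """Replace an exact income by the lower bound of its band (illustrative only)."""
    coarse = dict(record)
    coarse["income"] = (record["income"] // width) * width
    return coarse


def make_campus_file(suf, fraction=2 / 3, seed=0):
    """Draw a simple random sample (without replacement) of about `fraction`
    of the SUF records and coarsen the sampled records."""
    rng = random.Random(seed)  # fixed seed only to keep the sketch reproducible
    sample_size = round(len(suf) * fraction)
    return [coarsen_income(record) for record in rng.sample(suf, sample_size)]


# Invented example data with nine sampling units.
records = [
    {"case_id": i, "income": 1500 + 370 * i,
     "detailed_occupation": "occupation", "municipality": "place"}
    for i in range(9)
]

suf, secondary = split_sensitive(records)
cf = make_campus_file(suf)
print(len(suf), len(cf))
```

The secondary data set keeps the case id so that the sensitive variables could later be linked back to the main data set in a protected environment.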

Because of the high risk of re-identification, firm-level data cannot be provided as an SUF or CF. For this reason, firm-level data (and the sensitive variables from individual data sets not included in SUF’s/CF’s) are only made available for research via Remote Data Access or On-site Use at the Safe Centre of BIBB-FDZ in Bonn/Germany.
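The access rules set out in this section can be summarised as a simple decision rule. The following sketch is a simplified reading of the text, not an official BIBB-FDZ procedure; the function name, parameters and return labels are hypothetical.

```python
def access_routes(unit_level, needs_sensitive_variables=False, purpose="research"):
    """Return the access routes described in the text for a data request.

    unit_level: "firm" or "individual"
    needs_sensitive_variables: True if variables withheld from SUF/CF are required
    purpose: "research" or "teaching"
    """
    if unit_level not in {"firm", "individual"}:
        raise ValueError("unit_level must be 'firm' or 'individual'")
    # Firm-level data and withheld sensitive variables never leave the protected setting.
    if unit_level == "firm" or needs_sensitive_variables:
        return ["remote data access", "on-site use"]
    # Individual-level data without sensitive variables can be handed out as files.
    if purpose == "teaching":
        return ["campus file"]
    return ["scientific-use file"]


print(access_routes("individual"))
print(access_routes("individual", purpose="teaching"))
print(access_routes("firm"))
```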

Data documentation and classification provision

Data work carried out by the BIBB-FDZ and in particular the anonymisation processes conducted are documented and published (cf. RatSWD Criterion 5). All of the BIBB-FDZ’s research data sets are registered at the “Registrierungsagentur für Sozial- und Wirtschaftsdaten” (da|ra) [Registration Agency for Social and Economic Data] and are citable to an unlimited extent by dint of having been allocated a Digital Object Identifier (DOI). For larger data sets and data sets which are in frequent demand, all data processing steps including the genesis of the original data materials are collated in the BIBB-FDZ “Data and Methodological Reports” series. Smaller data sets are described by means of brief documentation via the BIBB-FDZ Metadata Portal (https://metadaten.bibb.de/).

Figure 2: Examples of classification provision in the BIBB-FDZ Metadata Portal

The documentation of data sets includes tools such as questionnaires and the field and methodological reports of the surveying institutes. These tools are additionally made available in English for data sets which are in particular demand internationally (i.e. the BIBB/BAuA Employment Surveys and the Supplemental Task Survey 2012). In order to be able to assess and compare the evaluation potential of individual research data in a better way, the Metadata Portal also documents common socio-economic classification variables in the areas of occupations, economic activities, regions, and socio-demographics (cf. the selection of individual classifications in Figure 2) which are either already present in the research data sets or can be generated. Stata do-files and SPSS syntaxes document the generation of all classification variables not included in the original project data, thus enabling users to gain a detailed understanding of the creation of relevant variables. In addition to this, information is provided on the origin/copyright holder, method, structure and development of the individual classifications.

Use of research data

A number of demand indicators from the reporting system for research data infrastructure institutions within the scope of the annual accreditation carried out by the RatSWD provide a meaningful picture of the re-use of BIBB data by external academic researchers. By the end of 2015, around 500 research projects conducted outside of BIBB were working with BIBB research data (cf. Table 1). These projects may encompass more than one person (total number of data users in 2015: 990) or more than one data set (total number of data sets in 2015: 1,044). Around 13 per cent of research projects being conducted outside BIBB are domiciled abroad, including at institutions such as the London School of Economics and Political Science and Harvard University.

Table 1: Extent of use of research data from BIBB-FDZ

The thematic areas which are addressed in these projects by using the research data are multifarious. They range from occupational research to continuing training and also extend to encompass healthcare topics (cf. selected examples in Table 2).

The main research results produced by external academic researchers are published in peer-reviewed national journals (such as the Kölner Zeitschrift für Soziologie und Sozialpsychologie or the Zeitschrift für Erziehungswissenschaft) and in international journals (for example the European Sociological Review or Social Science Research). BIBB-FDZ is also used to prepare dissertations, reports for ministries and papers for communication between academic research and practice.

In addition to this, the CF’s which have been available since 2014 are helping to train junior scientists. The CF’s have thus far been used for more than 20 seminar papers and Bachelor and Master theses and have been deployed as a database within the scope of eleven university teaching courses outside BIBB.

Table 2: Examples of external research topics using research data from BIBB-FDZ

Benefits of a research data infrastructure

The implementation and operation of a research data infrastructure institution for the re-use of research data require the deployment of technical, organisational and content-related data protection measures as well as a certain associated expenditure on documentation. Such work is highly valuable from an academic research, policy making and societal perspective. External use of BIBB data makes VET research more multi-faceted and links it more closely with BIBB’s internal research. Academic researchers operating outside BIBB also provide valuable feedback and indications regarding data quality and further possible improvements to data collection. Such proposals can be acted upon in future surveys and lead to enhancements in quality. By the same token, the provision of research data acts as a vehicle for the sustainable integration of VET research at BIBB into the national and international research landscape.

Vocational education and training research as a whole is the main beneficiary of a Research Data Centre such as that in place at BIBB. Academic researchers are able to analyse data which at least in some cases has been generated via considerable expenditure of (public) money, time, and know-how. The comprehensive documentation also provides quality-assured information to enable external researchers to conduct their own data surveys (for example using questionnaires or the experiences gained from the survey methods deployed in the field). The best-case scenario is that access to secondary data material renders planned data surveys obsolete.

The provision of quality-checked, standardised and citable research data also plays a role in the publication of research results. The more established and recognised the database forming the foundation of the analytical results of the relevant research work is, the better the acceptance of and response to an academic research publication will often be. Provision of generally accepted research data material that is precisely the same for all users also fosters progress in terms of arriving at quantitative and empirical findings and evidence-based policy guidance in the economic, social and educational sciences which form essential sub-disciplines of vocational education and training research.


Holger Alda
Dr., Head of the BIBB Research Data Centre (BIBB-FDZ)

Annett Friedrich
Research Associate at the BIBB Research Data Centre (BIBB-FDZ)

Daniela Rohrbach-Schmidt
Dr., Research Associate at the BIBB Research Data Centre (BIBB-FDZ)

Translation from the German original (published in BWP 2/2016): Martin Kelsey, Global SprachTeam, Berlin