Learning to Read Data
The Göttingen Campus represents a network of universities and
university-affiliated institutions recognised for rigour and
robustness in research. As such, it is at the forefront of
development in the field of Data Science. By drawing on
established expertise and infrastructural resources, the project
“Learning to Read Data” is aimed at promoting data-processing
skills throughout the Campus as a whole. On this basis,
the project initiated by the Göttingen Campus aims to offer
broadly based and generally available tools for acquiring
basic data-handling skills in all bachelor degree courses.
The project rests on a three-pronged concept:
offering a purpose-designed lecture course
entitled “Data Literacy Basics”, setting up a DataLab,
and, to round things off, curating and providing “Open Educational Resources”.
This concept was submitted by the University of Göttingen
as part of its participation in the
Data Literacy Education project
sponsored by the “Stifterverband” [Donors’ Association for the Advancement of Science]
and the Heinz Nixdorf Foundation.
The project initiated by the Göttingen Campus thus aims to offer broadly based and readily available tools for teaching basic data-handling skills to all students working towards a bachelor’s degree. Three building blocks are essential to achieving this:
- the interactive lecture course on Data Literacy Basics, available to students of any subject doing a bachelor’s degree, teaching the fundamentals of data competence, practice-focused and research-related,
- setting up a DataLab that acts as an interface not only between the different fields of study from which the students come, but also with regional business and industry, social actors and the research scientists working with CIDAS [Campus Institute Data Science], the aim being to apply the theory of the lecture course to practical projects,
- Open Educational Resources, carefully selected and checked for high standards, which complement the lecture course and DataLab in teaching data-handling skills. Adapted to the examination requirements of the degree programme in question, they can contribute towards the overall assessment.
The overarching aim in the teaching concept is that “all graduates may acquire skills in data handling that are important to them for their studies, research, job and social participation”. In a peer-to-peer assessment procedure, promoted by the “Stifterverband” and CHE [Centre for the Development of Universities], heads of universities, lecturers, students and ancillary services together have analysed the strengths and weaknesses and identified priority areas for implementation.
The “Data Literacy Basics” course
The summer term of 2019 will see the start of “Data Literacy Basics”, a joint course offered by the Göttingen Campus and spearheaded by the Centre for Statistics, with accompanying tutorials. As a new course, it teaches the fundamentals of data-handling skills to students studying for a bachelor’s degree. It will take place once a week, with two hours of lectures and tutorials in the DataLab, so that students can learn the essentials of practical application by analysing suitable datasets.
In order to ensure that this course, designed for undergraduate students of any subject, is recognised as an option on the curriculum, it will be available from the summer term of 2019 as part of the university-wide core competence courses. From the summer term of 2020, the course will be available to the Faculty of Arts, the Faculty of Social Sciences and the Faculty of Economics, and in the teacher training content area. The special focus here is on the subjects offered by the Faculty of Arts and the Faculty of Social Sciences, where teaching data-handling skills has particularly great potential for innovation. The intention is to include further faculties by the summer term of 2021.
In terms of content, the planned course focuses especially on
- a lecture with a method-based approach,
- exercise material in the tutorials that is subject-specific,
- providing learning content and suitable datasets for practice in the form of modules that are open and accessible online, and complementary to the lecture,
- an integrated, hands-on approach, directly applying the methods taught,
- involving, at an early stage, local businesses, research institutions and social actors not only in the planning of the course content but also in providing practical exercise material and examination projects.
The lecture is supported interactively by a browser-based programming
environment that students can readily access by logging into a
university account on their laptop.
Learning a scripting language
So that students are able to use tools and libraries in handling data,
they need to learn a scripting language. The decision as to which language
is suitable depends not only on how easy it is to learn it but also on
how broadly it can be applied. The aim is to give students the tools with
which they can subsequently work in any research area no matter what the
subject matter is. Python offers great advantages here because of its
widespread use in many disciplines and because of its open-source nature.
With Jupyter Notebooks and JupyterHub, the Python ecosystem also
offers two browser-based tools that greatly facilitate teaching and
learning and allow a prompt start.
Collecting, reading, writing and editing data
In this part of the lecture the focus is not only on introducing
the student to collecting and managing data but also on facilitating
(automated) reading, writing and editing data in standard formats.
For this purpose, students require an introduction to the nature of
data that is short and practical, and that puts what is learned into
a greater social and scientific context. Next, the lecture shows how
to deal with questions on organising, manipulating and converting
data with the help of established and generally used tools from the
Python “toolkit” (Python Standard Library, Pandas). Finally, the
lecture discusses the issue of incomplete or corrupted entries in
datasets and the effects that incomplete datasets can have on
analytical results.
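The effect of incomplete or corrupted entries can be illustrated with a minimal sketch using Pandas. The dataset below is invented for illustration (the column names `weekly_hours` and `hourly_wage` and the helper `load_survey` are assumptions, not course material); it shows how unreadable entries can be marked as missing and how the choice between "use every available value" and "use complete rows only" changes a simple result.

```python
import io

import pandas as pd

# Hypothetical survey extract: all names and values are invented for illustration.
RAW_CSV = """respondent,weekly_hours,hourly_wage
1,40,18.5
2,35,
3,,22.0
4,20,15.0
5,abc,30.0
"""


def load_survey(text: str) -> pd.DataFrame:
    """Read the CSV text and turn unreadable numeric entries into missing values."""
    df = pd.read_csv(io.StringIO(text))
    for column in ["weekly_hours", "hourly_wage"]:
        # errors="coerce" replaces entries such as "abc" with NaN instead of failing
        df[column] = pd.to_numeric(df[column], errors="coerce")
    return df


df = load_survey(RAW_CSV)

# Count the missing entries in each column
missing_per_column = df.isna().sum()

# Keep only the rows in which every column has a value
complete_cases = df.dropna()

# Mean over all available wages (NaN is skipped) versus mean over complete rows only
mean_wage_all = df["hourly_wage"].mean()
mean_wage_complete = complete_cases["hourly_wage"].mean()

print(missing_per_column)
print(mean_wage_all, mean_wage_complete)
```

In this invented example the two means differ noticeably, because dropping incomplete rows also discards valid wage values. Which treatment is appropriate depends on the question being asked and on why the entries are missing.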
Exploring data
This part of the lecture deals in more detail with the question of
how a library and its methods can be explored and used. Generally
accessible libraries (Pandas, Matplotlib) that facilitate dealing
with data are used to calculate and visualise simple statistical
quantities. With the help of these tools students can learn how to
interpret the results.
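As a rough illustration of this step, the following sketch calculates a few simple statistical quantities with Pandas and visualises group means with Matplotlib. The data and the column names `faculty` and `weekly_study_hours` are invented for the example, and the off-screen `Agg` backend is an assumption made so that the sketch also runs without a display; in a Jupyter Notebook the figure would normally be shown inline.

```python
import io

import matplotlib

matplotlib.use("Agg")  # assumption: render off-screen so no display is required
import matplotlib.pyplot as plt
import pandas as pd

# Invented example data: not taken from the course
df = pd.DataFrame(
    {
        "faculty": [
            "Arts", "Arts",
            "Economics", "Economics",
            "Social Sciences", "Social Sciences",
        ],
        "weekly_study_hours": [12.0, 18.0, 20.0, 24.0, 15.0, 19.0],
    }
)

# Simple statistical quantities for one column (count, mean, std, quartiles, ...)
summary = df["weekly_study_hours"].describe()

# The same quantity broken down by group
mean_by_faculty = df.groupby("faculty")["weekly_study_hours"].mean()

# Visualise the group means as a bar chart
fig, ax = plt.subplots()
mean_by_faculty.plot.bar(ax=ax)
ax.set_xlabel("Faculty")
ax.set_ylabel("Mean weekly study hours")
fig.tight_layout()

# Write the figure to an in-memory PNG; a file name could be passed instead
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)

print(summary)
print(mean_by_faculty)
```

Reading the table and the chart side by side is a first exercise in interpretation: with only two invented observations per group, the differences between the bars say very little about the groups themselves.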
Statistical analysis
Here again, the focus is not so much on acquiring in-depth theoretical knowledge of statistics as on the ability to apply methods. With the help of libraries (SciPy, scikit-learn), linear regressions and simple methods from the subject area of machine learning can be quickly applied to accessed datasets. This makes it possible to draw attention to the problems that may occur when using the methods or interpreting the results. The aim here is to give guidance in questions relating to
- what methods are suitable to deal with the respective questions and data,
- how the results gained can be used to find answers to the original questions and where the weaknesses of different methods lie.
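The points above can be made concrete with a small sketch that fits a linear regression using `scipy.stats.linregress`. The numbers are invented (years of experience against hourly wage), and the second fit deliberately includes one extreme value to hint at a known weakness of ordinary least squares, namely its sensitivity to outliers. This is an illustration under stated assumptions, not the course's own exercise.

```python
from scipy import stats

# Invented data: years of experience and hourly wage
experience = [1, 2, 3, 4, 5]
wage_clean = [3.1, 4.9, 7.2, 8.8, 11.0]

# Same data, but the last observation is replaced by an extreme value
wage_with_outlier = [3.1, 4.9, 7.2, 8.8, 30.0]

# Ordinary least-squares fit of wage on experience
fit_clean = stats.linregress(experience, wage_clean)
fit_outlier = stats.linregress(experience, wage_with_outlier)

# slope: estimated change in wage per additional year of experience
# rvalue: correlation coefficient; its square is the share of variance explained
print(f"clean data:   slope={fit_clean.slope:.2f}, r^2={fit_clean.rvalue ** 2:.3f}")
print(f"with outlier: slope={fit_outlier.slope:.2f}, r^2={fit_outlier.rvalue ** 2:.3f}")
```

A single unusual observation changes the estimated slope substantially here, which is the kind of weakness that should be checked before the result is used to answer the original question.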
Ethics, data protection and publishing data
As part of a broad review, the lecture on “Data Literacy Basics”
puts the discussed contents into the overall context of working
with data and underlines the significance of aspects of data
ethics, data protection and publishing. In this way, students
are encouraged to extend and enhance their successfully acquired
skills according to their individual needs in further course
modules. For this purpose, they can choose from a collection of
suitable Open Educational Resources, which forms the third pillar
of this project.
Setting up a DataLab
The second integral part of our initiative involves setting up a DataLab that is closely linked to the lecture and can be practically applied in the different disciplines. The DataLab offers an opportunity to hold tutorials and supervise examination projects connected with the lecture described above. It also incorporates the existing advice services for undergraduate and PhD students at the University of Göttingen and uses them for teaching purposes. In addition, the DataLab benefits from the systematic involvement of young academics, that is, more than 3,000 PhD students at the University of Göttingen and the Göttingen Campus, and also an expected 100 or so students per academic year on Data Science degree courses. A large number of PhD students have already acquired well-grounded expertise in subject-related data analysis and would be able to pass this on to participants of the course described. Students from the Data Science degree courses working as tutors, on the other hand, can contribute their skills on the technical side of data handling. In this way, the DataLab is able to teach participating students the subject-specific application of data skills within their own area of study. In addition, the DataLab is a focal point for regional businesses, promoting cooperation in all aspects of data analysis.
Tutorials and project papers
Weekly tutorials complement the lecture on “Data Literacy Basics”. In the first phase of these tutorials, students learn how to work independently using their data-handling skills with the help of practical and subject-specific examples. The emphasis here is on help in solving subject-specific problems and on acquiring further independent skills in applying the scripting language and the tools to deal with data-related problems. The following are three practical examples taken from the Faculties of Economics, Social Sciences and Arts respectively:
- In tutorials for students of the Faculty of Economics the complex relationships between socio-economic variables such as working hours, wages and unemployment can be analysed with the help of comprehensive panel databases and by using regression techniques.
- In a tutorial for the Faculty of Social Sciences the election manifestos of the parliamentary parties, which are freely available online, can be uploaded and compared with the help of corpus-based research methods.
- The acquired data-handling skills can be used in the Faculty of Arts, for example, to elucidate topometric tests carried out in connection with the 3D campus laboratory in the Department of Classical Archaeology. The computer-based comparison and analysis here can give the students more detailed insight.
In addition to the tutorials, the examination projects at the
end of the lecture on “Data Literacy Basics” are also supervised
by the DataLab. Students work on a data-related problem that
is relevant to their subject and thus demonstrate that they can
apply the acquired skills to a practical situation. Projects
are allotted to small groups of 3-5 undergraduate students,
whereby each group is supervised by a tutor. Intensive
supervision thus guarantees all-round support for the students.
Producing practical examples on the basis of data consulting
Since 2012 the Centre for Statistics has been offering advice in
statistics for those completing a bachelor’s thesis in any area
of study at the University of Göttingen. This is complemented by
a range of subject-specific advice options offered to undergraduate
and postgraduate students by seven different faculties.
Setting up a DataLab creates a central advisory facility that gives
undergraduate and postgraduate students at the Göttingen Campus
an opportunity to follow up any questions about data handling.
In this way, existing facilities are being improved and extended.
A policy of central and focused information raises awareness
among students of the DataLab as a source of support.
Part of the work of the advice service is to systematically collect
relevant questions and datasets for the purpose of providing,
where possible, feedback for the university teaching programme.
In particular, questions, datasets and insights gained are to be
prepared in such a way that they can be used as practical examples
in the introductory lecture and the corresponding tutorials,
the aim being to cover as many areas of data-handling skills as
possible. In addition, our Data Consulting also helps tutors to
deal with questions in examination projects relating to the planned
course.
A centre for social actors and regional industry
Based on already existing networks within the Göttingen Campus,
the SüdNiedersachsenInnovationsCampus
[Southern Lower Saxony Innovation Campus] and the
Measurement Valley,
the DataLab is intended to function not only as a focal
point for businesses
and further social actors but also as a platform for an
exchange of questions between students, researchers, businesses
and non-university institutions addressing specific issues
relating to data analysis. This feeds into the planned project
dealing with data-specific issues from non-university partners
that can be a topic for examination projects.
Open Educational Resources (OER)
In addition to the lecture and DataLab, there are further courses offered by other universities, representing a third pillar of the project and providing students with freely available study material. This could involve issues like “data organisation”, “quoting databases”, or in-depth study of data ethics, computerised decision-making or the publication of data.
Content and scope
The emphasis here is primarily on less demanding courses
that are also viable for students from the Faculties of
Arts and Social Sciences. Within the framework of the
project only a limited number of innovatory ideas relating
directly to the planned lecture are to be included
(for example, recordings of the lecture); instead,
with the target group in mind, material already available
from other providers is to be compiled and catalogued.
The project staff are responsible for this on the one hand,
but on the other hand students and teachers may make
relevant suggestions and state specific needs.
Modules varying in length of time and depth of content
can meet the different needs.
Recognition of course credits
With relevant extramural and internal advisory boards
collaborating, there is a project-related regulatory
procedure in place that determines the overall framework
and practical issues of recognition of credits within
the curricula. For the purpose of cataloguing, module
descriptions are compiled and approved on the basis of this.
Similar modules, that are also offered at the same time,
are systematically compared as to their suitability and
recommended accordingly.
Accessing material
The assembled information on the overall procedure and
the catalogued OER is coordinated with partners in the
eCULT-Netzwerk
and published online. Based on this arrangement,
students and course coordinators at other universities
can also access data that is linked to the respective
OER material. In this way, relevant information about
corresponding courses can be made available beyond the
University of Göttingen.
Evaluation and quality management of courses on offer
The evaluation of new courses teaching broad-ranging data-handling skills is subject to different procedures. The EvaSys evaluation system, already a standard procedure at the university, allows an initial evaluation of the individual courses, especially of the lecture on “Data Literacy Basics” and the DataLab. The intention is to make increased use of this possibility in order to receive detailed feedback from the students, especially in newly introduced courses. The lecture on “Data Literacy Basics” and the DataLab are also undergoing a formative evaluation process which includes social media channels.
At the same time, a board of advisors will ensure that the idea of the project, teaching data-handling skills to undergraduate students from all the faculties, is implemented. The board will consist of researchers from universities and university-affiliated research institutions, stakeholders in industry and society, and student representatives who, once a year, will together evaluate the courses and the goals achieved so far, offering advice to the project team for changes and improvements. The evaluation will be based on an annual report that contains indicators denoting course attendance and capacity, the evaluation results of the EvaSys evaluation system and further relevant information.
Why is learning data handling skills so important?
The increasing digitalisation of industry, science and society is making the handling of data and understanding the information obtained from data a core competence for participation in society. This is reflected on the one hand in the increasing importance of the corresponding qualifications required for job applications in many fields of work. On the other hand, there has been a significant increase in the number of specialised data science courses that are focused on computer science, statistics or mathematics. As a result, there is a critical imbalance between the training of data scientists and the great demand for young professionals who have data-handling skills but are not specialised data scientists.
What are the credentials of the Göttingen Campus for the “Data Literacy Basics” course?
The Göttingen Campus has collaborated for many years with the University of Göttingen, the Faculty of Medicine, five Max Planck Institutes, the Deutsches Zentrum für Luft- und Raumfahrt [German Aerospace Centre], the Deutsches Primatenzentrum [German Primate Centre] and other non-university research institutions. These are excellent credentials for creating a cross-faculty, campus-wide teaching programme for data-handling skills. The support programme “Data Literacy Education” gives the Göttingen Campus an opportunity to establish a programme of this kind, targeted at undergraduate students, and to secure its place permanently in the curriculum.
As members of the Board of Trustees at the university, the Vice President for Study and Teaching and the full-time Vice President for Facilities, Infrastructure and Operations are responsible for promoting the digital teaching programme. The Göttingen Centre for Statistics, with a staff of academics from seven faculties, has longstanding experience in interdisciplinary and cross-faculty tuition, not only in the master’s degree course Applied Statistics but also in the doctoral and certificate programme Applied Statistics and Empirical Methods. The Centre for Statistics also offers statistical advice in various forms (for undergraduate and PhD students, and research institutions) and is therefore well equipped to assess the current demand for data competence in various subject areas.
With the support of the eResearch Alliance the university has also, over the past few years, gained substantial experience and competence in the area of data management covering the whole cycle of data processing (designing and collecting, editing and analysing, storing long-term, publishing, and reusing existing data). The Service für Digitales Lernen und Lehren [Service for Digital Learning and Teaching] acts as a partner to teachers, offering didactic advice on questions of digital skills and on using new media. There is an intensive exchange between the Lower Saxony network eCult+, promoted in the second phase by BMBF, and twelve other universities on the subject of learning and teaching digital technology. The Society for Scientific Data Processing (GWDG) also offers professional support for advanced IT infrastructure. In addition, the GWDG and the eResearch Alliance, together with other partners like the Centre for Statistics, are already highly successful in organising annual, interdisciplinary Data Science Summer Schools. This underscores how well the various institutions are interlinked and points to their shared focus on data science. The Göttingen Campus has excellent contacts with regional industry, management and society, especially through the SüdNiedersachsenInnovationsCampus and the Measurement Valley.
The Campus Institute for Data Science (CIDAS) is at present in the process of establishing itself as a partner institution of the Göttingen Campus. CIDAS is at the interface between computer science, statistics and mathematics, and applied disciplines, especially within the fields of economics, the social sciences and the natural sciences. Its purpose is to combine the most recent developments in scientific methods in various areas of data science (e.g., machine learning, artificial intelligence, simulation-based methods and applied statistics) with internationally competitive research in the profile fields of the University of Göttingen and its associated research institutions. This involves not only scientific topics that are currently being researched in various scientific centres but also the integration of new key areas of work by way of four professorships yet to be announced (in the fields of data science, artificial intelligence and machine learning). CIDAS will also host the bachelor degree courses Applied Data Science and Mathematical Data Science, to commence in the winter semester of 2018/2019.
The Göttingen Campus offers an excellent environment for research and teaching in data science for students specialising in this area. Especially because of the way it is organised, CIDAS is set to give ongoing impetus to future projects and to continue its dynamic development. The three components of the project “Learning to Read Data” make it possible to add an important dimension to these developments and to teach data-handling skills covering the whole spectrum of university education for undergraduate students in all fields of study. The planned concept also allows for flexibility in adapting and scaling course contents and methods for a large number of students. It makes it possible to link theory and practice on a broad basis and to involve stakeholders in the process of development. In the medium term, there will be a review of whether the measures in place can be brought together and complemented to produce a Data Literacy Certificate, giving students formal confirmation of their acquired data-handling skills, over and beyond recognition of credits within the curriculum.
Structures and Team
The project team consists of two essential groups: a board of directors, which accompanies and directs the implementation of the planned goals, and a group of individuals whose task it is to take care of the concrete implementation of the underlying project goals. The board of directors consists of the following persons:
- Prof. Dr. Andrea Bührmann (Vice-President of the University of Göttingen for Studies, Teaching, Equality, and Diversity)
- Prof. Dr. Norbert Lossau (Vice-President of the University of Göttingen for research and information infrastructure)
- Prof. Dr. Thomas Kneib (Speaker of the Centre for Statistics, Professor for Statistics)
- Prof. Dr. Ramin Yahyapour (Managing Director of the GWDG, Speaker of the Center for Applied Computer Science, Professor for Practical Informatics)
- Prof. Dr. Stephan Herminghaus (Max Planck Institute for Dynamics and Self-Organization)
- Prof. Dr. Stefan Halverscheid (Professor for Mathematics Education)
- Prof. Dr. Albert Busch (Dean of Studies of the Faculty of Philosophy)
- Prof. Dr. Stefan Dierkes (Dean of Studies of the Faculty of Economic Sciences)
- Prof. Dr. Timo Weishaupt (Dean of Studies of the Faculty of Social Sciences)
- Claudia Trepte (Manager Measurement Valley)
The board of directors thus combines scientific expertise with the necessary decision-making
authority with regard to university teaching, so that the course can be successfully
organised and promptly and broadly anchored in the university.
The team for implementation consists of
- Dr. Benjamin Säfken (scientific coordinator at the Centre for Statistics with extensive experience in establishing quantitative courses in university curricula)
- Dr. Alexander Silbersdorff (Statistical consulting at the Centre for Statistics with extensive experience in the design and implementation of statistical courses for heterogeneous audiences)
- Jana Lasser and Dr. Debsankha Manik (MPI for Dynamics and Self-Organization, with experience in delivering multidisciplinary Data Literacy courses using Python and Open Educational Resources)
- Dr. Wolfgang Radenbach (Head of the digitization department for studies and teaching with extensive experience in Open Educational Resources)
- Lea M. Dammann (Research Assistant and Doctoral Candidate at the Chair of Statistics)
- René-M. Kruse (Research Assistant and Doctoral Candidate at the Chair of Statistics)