Author Archives: gonzalezwsu22

Data Science who?

What Is Data Science?
First of all, data science is an interdisciplinary field. In this article, we’ll cover the key aspects that you can expect to encounter in a data scientist role. Data science is, in fact, a very broad discipline, which continues to expand with new data-related needs.

Types of data used
Activities performed on the job
Time allocation
Key skills
Frequently used data science methods

What Types of Data Do Data Scientists Use in Their Analysis?
The answer is that they use both structured and unstructured data. Structured data comes in the form of Excel spreadsheets and CSV files. Examples of such data are client tables and spreadsheets with transaction information. Unstructured data, on the other hand, is everything else: images, video and audio files, and all other types of data. Reportedly, unstructured data represents more than 80% of all enterprise data, so every data scientist worth their salt should be able to take advantage of it.

What Are the Main Data Scientist Responsibilities?
It depends mostly on company size. In larger enterprises, there will be a higher degree of specialization as the company is able to afford more resources. The main activities that a data scientist can perform – but not necessarily does – in a business environment are:

Data collection and storage
Data preprocessing (also referred to as data cleaning)
Data organization
Data visualization and the creation of KPI dashboards
Experimentation and A/B testing
Statistical inference
Building ML models
Evaluating ML models
Deploying ML models
Monitoring how ML models perform
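
To make the experimentation and A/B testing item a little more concrete, here is a minimal sketch in Go of a two-proportion z-test, one common way to judge whether two variants really convert at different rates. The function name twoProportionZ and the visitor counts are made up for illustration; they do not come from the source article.

```go
package main

import (
	"fmt"
	"math"
)

// twoProportionZ returns the z statistic and the two-sided p-value for the
// difference between two conversion rates, using the pooled standard error.
// It does not guard against a pooled rate of exactly 0 or 1.
func twoProportionZ(convA, totalA, convB, totalB int) (z, p float64) {
	pA := float64(convA) / float64(totalA)
	pB := float64(convB) / float64(totalB)
	pooled := float64(convA+convB) / float64(totalA+totalB)
	se := math.Sqrt(pooled * (1 - pooled) * (1/float64(totalA) + 1/float64(totalB)))
	z = (pB - pA) / se
	// Two-sided p-value under the standard normal distribution.
	p = math.Erfc(math.Abs(z) / math.Sqrt2)
	return z, p
}

func main() {
	// Hypothetical counts: variant A converts 100 of 1000 visitors,
	// variant B converts 130 of 1000.
	z, p := twoProportionZ(100, 1000, 130, 1000)
	fmt.Printf("z = %.3f, p = %.4f\n", z, p)
}
```

A small p-value suggests the difference between the variants is unlikely to be noise, but the threshold to act on is a business decision, not something the test decides for you.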

Which Data Science Tasks Take Up the Most Time?
Ask anyone in the industry and you will hear the same answer. They’ll tell you they spend 80% of their time forming a hypothesis, finding the necessary data, and cleaning it. Only 20% of the useful hours are dedicated to performing analysis and interpreting the findings.

What Are the Key Data Scientist Skills?
There are many abilities that you should have in order to become a skilled data scientist. Some of the most frequently used data science techniques are:

Statistical inference.
Linear regression.
Logistic regression.
Machine Learning techniques such as decision trees, support vector machines, clustering, dimensionality reduction.
Deep Learning methods – supervised, unsupervised, and reinforcement learning.
Regardless of the method, a data scientist’s end goal would be to make a meaningful contribution to the business – to create value for the company.
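
As a small illustration of one method from the list, here is a sketch of simple linear regression (one input, one output) fitted by ordinary least squares in Go. The function name fitLine and the spend and sales numbers are invented for the example; real work would normally lean on a statistics library rather than hand-rolled sums.

```go
package main

import (
	"errors"
	"fmt"
)

// fitLine fits y = intercept + slope*x by ordinary least squares.
func fitLine(xs, ys []float64) (slope, intercept float64, err error) {
	if len(xs) != len(ys) || len(xs) < 2 {
		return 0, 0, errors.New("need at least two paired observations")
	}
	n := float64(len(xs))
	var sumX, sumY float64
	for i := range xs {
		sumX += xs[i]
		sumY += ys[i]
	}
	meanX, meanY := sumX/n, sumY/n

	// Accumulate the covariance and variance terms around the means.
	var sxy, sxx float64
	for i := range xs {
		dx := xs[i] - meanX
		sxy += dx * (ys[i] - meanY)
		sxx += dx * dx
	}
	if sxx == 0 {
		return 0, 0, errors.New("all x values are identical")
	}
	slope = sxy / sxx
	intercept = meanY - slope*meanX
	return slope, intercept, nil
}

func main() {
	// Hypothetical data: ad spend (x) and sales (y).
	xs := []float64{1, 2, 3, 4, 5}
	ys := []float64{3, 5, 7, 9, 11}
	slope, intercept, err := fitLine(xs, ys)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("sales = %.2f + %.2f * spend\n", intercept, slope)
}
```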

How Does Data Science Make a Meaningful Contribution to the Business?
We can distinguish between two main ways to do that. First, data science helps a company make better decisions when it comes to its customers and employees. We hope you enjoyed this article and learned something new. Now that you have a basic understanding of the field of data science, you might be wondering where to start your learning journey. Our Introduction to Data Science course offers a beginner-friendly overview of the entire field of data science and all its complexities.

https://365datascience.com/trending/data-science-explained-in-5-minutes/

From the blog CS@Worcester – The Dive by gonzalezwsu22 and used with permission of the author. All other rights reserved by the author.

The Bug

Internal bug finding, where project developers find bugs themselves, can be superior to external bug finding because developers are more in tune with their own project’s needs, and they can schedule bug finding efforts for when they are most effective, for example after landing a major new feature. Thus, an alternative to external bug finding campaigns is to create good bug finding tools and make them available to developers, preferably as open-source software. Bug finders are motivated altruistically (they want to make the targeted software more reliable by getting lots of bugs fixed) but also selfishly (a bug finding tool or technique is demonstrably powerful if it can find a lot of previously unknown defects). External bug finders would like to find and report as many bugs as possible, and if they have the right technology, it can end up being cheap to find hundreds or thousands of bugs in a large, soft target. A major advantage of external bug finding is that since the people performing the testing are presumably submitting a lot of bug reports, they can do a really good job at it.

More recently, it has become common to treat “external bug finding,” looking for defects in other people’s software, as an activity worth pursuing on its own. The OSS-Fuzz project is a good example of a successful external bug finding effort; its web page mentions that it “has found over 20,000 bugs in 300 open-source projects.” As an external bug finder, it can be hard to tell which kind of bug you have discovered (and the numbers are not on your side: the large majority of bugs are not that important). Every bug report requires significant attention and effort, usually far more effort than was required to simply find the bug. One thing that helps is a priority for “a valid bug that should be fixed, but not something that will ever be at the top of the list unless something unforeseen happens.” I’ve found FindBugs’ “Mostly Harmless” useful for this and have lobbied to include it in every bug tracker I’ve used since, though it’s more of a severity than a priority. An all-too-common attitude (especially among victims of bug metrics) is that bug reporters are the enemy, while those making pull requests are contributors. If a project’s developers actively don’t trust you, for example because you’ve flooded their bug tracker with corner-case issues, then they’re not likely to ever listen to you again, and you probably need to move on to do bug finding somewhere else. Occasionally, your bug finding technique will come up with a trigger for a known bug that is considerably smaller or simpler than what is currently in the issue tracker.


GitLab and the growth within

Collectives™, which launched this past June, is a new offering that creates a set of spaces where content related to certain languages, products, or services is grouped together on Stack Overflow. These spaces make it easier for users to discover and share knowledge around their favorite technologies. With the launch of its Collective, GitLab will continue to build on the collaboration that already exists with the community of developers and contributors using its platform. “Community is at the core of GitLab’s mission. With more than 1 million active license users and a contributor community of more than 2,400 people, we have a strong community aligned with our mission – to create a world where everyone can contribute,” said Brendan O’Leary, Senior Developer Evangelist at GitLab. “GitLab’s Collective on Stack Overflow aligns with our mission. This new space will help us to expand our open-source collaboration so contributors and developers can share and learn about version control, CI/CD, DevSecOps, and all-remote workflows. We believe the GitLab Collective will be a place where we can discover feedback and create opportunities for the GitLab community to contribute to Stack Overflow’s community.”

GitLab’s Collective is defined by a set of specific tags related to the company’s technology, such as ‘gitlab’ and ‘gitlab-ci’. Users who join the Collective can easily find the best answers and get in-depth technical product information about GitLab’s platform and application through how-to guides and knowledge articles. They can also see how they stack up on the leaderboard, and top contributors can be selected by GitLab as Recognized Members, users the company approves to respond to questions or recommend answers. When Collectives launched on Stack Overflow with Google Cloud and Go Language earlier this summer, thousands of community members joined. The contributions of the Collectives’ community, taken together, can help the millions of curious question askers who visit Stack Overflow, as well as users looking for a solution to a problem or a way to improve their skills. GitLab’s efforts to expand the pool of open-source collaborators align with Stack Overflow’s mission: to empower the world to develop technology through collective knowledge.

With such developments happening at GitLab, we can foresee GitLab leading the way for developers and engineers to further their knowledge and expand their skills. Learning about version control, CI/CD, and DevSecOps in this space should continue the growth of both the people who use GitLab and GitLab itself.


K8 Crash Course

Blog Discovery

As a newcomer to the software industry, I found Kubernetes foreign. This is an article on K8s, so please indulge me.

Applications: developers who provide a predefined service to stakeholders. Docker is a well-known runtime environment for building and running applications in containers. Creating a sample Go app: let’s create a basic Go app that we’ll use to deploy to our minikube cluster.
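
The source article’s sample app is not reproduced in this post, so here is a minimal sketch of what a basic Go app for this purpose might look like: a tiny HTTP server with one handler. The names greeting and helloHandler, the greeting text, and port 8080 are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

// greeting builds the response text; an empty name falls back to "world".
func greeting(name string) string {
	if name == "" {
		name = "world"
	}
	return "Hello, " + name + "!"
}

// helloHandler replies with a plain-text greeting, optionally using the
// "name" query parameter (for example /?name=minikube).
func helloHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, greeting(r.URL.Query().Get("name")))
}

func main() {
	http.HandleFunc("/", helloHandler)
	// Port 8080 is an arbitrary choice; the container port in the
	// Kubernetes manifests would need to match it.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Packaged into a container image, a server like this is the kind of application that the pods described below would run.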

Kubernetes: the engineering manager. Kubernetes is an open-source container management and deployment platform. It orchestrates clusters of virtual machines and schedules containers to run on those virtual machines based on their available compute resources and the resource requirements of each container.

What are Pods?

A Kubernetes pod is a group of one or more containers that share storage and network resources.

Nodes and clusters: the workstation that provides resources to developers. A node can be thought of as the workstation that provides these resources, and Kubernetes as the manager that allocates these workstations to employees. In Kubernetes, a node is a worker machine that can be virtual or physical. A node can have many pods, and Kubernetes automatically manages the scheduling of pods among the nodes in the cluster.

Deployment: the team’s objectives and structure defined at the start of the year. A deployment provides declarative updates for Pods and ReplicaSets. In a deployment, we set a desired state, and the deployment controller gradually converts the current state to the desired state. The deployment file is like a declarative template for Pods and ReplicaSets.

The deployment, named myapp in metadata.name, creates a ReplicaSet that brings up two pods of myapp.
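
The deployment file itself is not included in this post. As a rough stand-in, the sketch below builds a minimal apps/v1 Deployment in Go and prints it as JSON, which kubectl apply -f accepts as well as YAML. Only the name myapp and the two replicas come from the text; the image myapp:latest, the app label, and container port 8080 are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// deploymentManifest returns a minimal apps/v1 Deployment as a nested map.
// The selector and the pod template share the same "app" label so the
// Deployment's ReplicaSet can find the pods it owns.
func deploymentManifest(name, image string, replicas int) map[string]interface{} {
	labels := map[string]interface{}{"app": name}
	return map[string]interface{}{
		"apiVersion": "apps/v1",
		"kind":       "Deployment",
		"metadata":   map[string]interface{}{"name": name},
		"spec": map[string]interface{}{
			"replicas": replicas,
			"selector": map[string]interface{}{"matchLabels": labels},
			"template": map[string]interface{}{
				"metadata": map[string]interface{}{"labels": labels},
				"spec": map[string]interface{}{
					"containers": []interface{}{
						map[string]interface{}{
							"name":  name,
							"image": image, // placeholder image name
							"ports": []interface{}{
								map[string]interface{}{"containerPort": 8080},
							},
						},
					},
				},
			},
		},
	}
}

func main() {
	manifest := deploymentManifest("myapp", "myapp:latest", 2)
	out, err := json.MarshalIndent(manifest, "", "  ")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

Changing the replicas argument and applying the output again is the declarative update in action: you state the desired number of pods and the deployment controller works toward it.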

What are ReplicaSets?

Creating a K8s deployment: kubectl apply -f deploy.

Kubernetes Services: the SPOC team that routes relevant external communications to developers. Pods are ephemeral resources. In Kubernetes, a service is an abstraction that defines a logical set of pods and a policy for accessing them. Kubernetes services provide addresses through which the associated pods can be accessed. Create a Kubernetes service: kubectl apply -f service.

The minikube tunnel command can be used to expose LoadBalancer services. The tunnel runs as a process on the host and creates a network route to the service CIDR of the cluster, using the cluster’s IP address as a gateway. You can then use the service’s external IP address to open the service in the browser.

The section on services is incomplete without an analogy with the life of developers. Imagine an external team uncertain or confused about the use of a feature developed by the development team. With that, we come to the end of the article on K8s and the Adventures of the Freshest Day One.

https://betterprogramming.pub/relating-with-docker-and-kubernetes-as-developers-an-analogue-5e662b1f817b
